...And You Will Know Me By The Trail of Papers
OR: o.O
Thursday, August 22, 2019
Language representation in MUSE embeddings
I just posted a notebook to github that explores how language is represented in MUSE word embeddings.
Friday, August 9, 2019
Sentiment analysis does not find bias on NFL reddit
I previously published results using sentiment analysis to show that commenters on the NBA reddit had higher sentiment towards young players, high-scoring white players, and white coaches. In this post, I am going to extend that analysis to the NFL reddit. My overall finding is that, as in the NBA, NFL redditors like rookies and players that gain more yards. However, I did not find any significant coefficients for race. For coaches, I found that commenters like coaches who outperform expectations, but again did not find evidence of bias.
Brief review of the method
To try to understand what factors (e.g. performance, age, race) influence player popularity, I scraped millions of comments from r/NFL from 2013-2018. I then quantified each commenter's opinion toward players using the sentiment analyzer VADER. This analysis scores whether the words in a comment are positive (“GOAT!”) or negative, and ties that feeling to a player. Sentiment scores generally ranged from -0.2 to 0.3. Finally, I quantified the impact of each factor on popularity by performing a least-squares regression with an outcome variable of sentiment towards a player. Details of the analysis are in the previous post, and a series of notebooks covering scraping data, sentiment quantification, and regression.
Unique difficulties of performing sentiment analysis
Compared to the NBA, two factors make analyzing the NFL more challenging: the larger number of players; and the smaller number of players with comparable stats.
The larger number of players made resolving which player was being talked about more difficult (named entity recognition). I performed named entity recognition by identifying comments that contained player first or last names, then matched those names to players. If an NBA commenter mentioned "Blake," I could link that comment to Blake Griffin since there is only one active "Blake" in the NBA. However, in the NFL, "Blake" could refer to Blake Bortles, Blake Bell, Blake Jarwin, or others. This means that the best way to identify comments about NFL players is to find full name matches, in contrast to how people normally talk about players (just their first or last name). A second side effect of the larger number of players was that comment-player matching took longer, as the way I implemented it took O(n^2) time.
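To make the full-name matching concrete, here is a minimal sketch. It is not the code from the notebooks: the roster, the comments, and the helper names `build_matcher` and `match_players` are all made up for illustration. It compiles the roster into a single regular expression, so each comment is scanned once rather than being compared against every player in a nested loop.

```python
import re

def build_matcher(roster):
    """Compile one regex matching any full name in the roster (hypothetical helper)."""
    # Longest names first, so that overlapping names prefer the longer match.
    names = sorted(roster, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)

def match_players(comment, matcher, canonical):
    """Return the set of roster players whose full name appears in the comment."""
    return {canonical[m.lower()] for m in matcher.findall(comment)}

# Illustrative roster only; a real roster would hold every active player.
roster = ["Blake Bortles", "Blake Bell", "Blake Jarwin", "David Johnson"]
canonical = {name.lower(): name for name in roster}
matcher = build_matcher(roster)

comments = [
    "Blake Bortles is the GOAT!",
    "Blake had a rough game",  # first name only: ambiguous, so no match
    "David Johnson and Blake Bell both scored",
]
for comment in comments:
    print(match_players(comment, matcher, canonical))
```

Comments that match more than one player (like the third one) can then be dropped or handled separately, since a single sentiment score cannot be cleanly tied to one player.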
The nature of the NFL also made it harder to compare player stats. In basketball, all players score, rebound, and assist. In the NFL, half the players play defense; and on offense, linemen rarely touch the ball; I ended up only analyzing skill-position players. This means that despite the NFL having more players overall, there were fewer players to analyze, and the statistical power of the analysis was lower.
Most and least popular players
To try to compare players between different positions (QB, RB, WR, and TE), I used Football Outsiders' advanced metric DVOA, which is a defense-adjusted z-score of yards above average. For example, if a QB and WR both had a DVOA of 1, they would both be one standard deviation better than the average player at their position.
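The idea of putting positions on a common scale can be illustrated with a within-position z-score. This is only a sketch of the standardization step, not how Football Outsiders computes DVOA, and the player labels and yardage numbers are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_by_position(players):
    """Return {name: z-score of the raw stat within that player's position}."""
    by_pos = defaultdict(list)
    for _name, pos, stat in players:
        by_pos[pos].append(stat)
    z = {}
    for name, pos, stat in players:
        mu, sigma = mean(by_pos[pos]), pstdev(by_pos[pos])
        z[name] = (stat - mu) / sigma if sigma > 0 else 0.0
    return z

# Made-up yardage totals, for illustration only.
players = [
    ("QB A", "QB", 4500), ("QB B", "QB", 3500), ("QB C", "QB", 4000),
    ("TE A", "TE", 900), ("TE B", "TE", 500), ("TE C", "TE", 700),
]
z = zscore_by_position(players)
print(z["QB A"], z["TE A"])
```

In this toy data the best quarterback and the best tight end end up with the same z-score even though their raw yardage differs by a factor of five, which is the property that lets different positions share one regression.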
Here are the most and least popular players for the years 2013-2018:

Lowest Sentiment Seasons
| Player | Year | Avg Sentiment |
| --- | --- | --- |
| Michael Bennett | 2017 | -0.25 |
| Danny Trevathan | 2015 | -0.15 |
| Vontaze Burfict | 2015 | -0.15 |
| Vontaze Burfict | 2016 | -0.14 |

Highest Sentiment Seasons
| Player | Year | Avg Sentiment |
| --- | --- | --- |
| David Johnson | 2016 | 0.24 |
| JJ Watt | 2017 | 0.22 |
| JJ Watt | 2016 | 0.22 |
On the unpopular side, Michael Bennett in 2017 was one of the first players to kneel during the national anthem; and Vontaze Burfict has a reputation as a dirty player. For the popular players, all three are Pro Bowlers. While I am not an avid NFL fan, these results pass my smell test.
Regression results
As I did with the NBA, to quantify which features were important, I ran a weighted least-squares regression with standard errors clustered at the player level. Due to the lack of features, I only present two specifications here. First, I ran a regression with DVOA as the only feature (spec 1). This single coefficient was statistically significant (t-stat of 2.1), albeit weakly, and shows people prefer high-achieving players. I then ran a second regression adding features for age and race for players, and city-level statistics for commenters (spec 2). In this regression, DVOA was no longer significant, while the rookie coefficient was. Neither the overall age coefficient nor the race coefficient was significant.
| Coefficient (t-statistic) | (1) | (2) |
| --- | --- | --- |
| DVOA | 0.0057 (2.1) | 0.0051 (1.4) |
| Rookie | | 0.018 (2.4) |
| 1 year of youth (<27) | | 0.0011 (0.7) |
| Race (white) | | 0.0067 (1.2) |
Compared to the NBA results, fewer coefficients were significant, and the t-statistics were smaller. I believe this is in part due to the problems mentioned previously: difficulty in matching players to comments; and fewer player performance features to compare with.
Beyond statistical power, this analysis may be complicated by the assumption that NFL fans can (or do) compare players from different positions fairly. In this analysis, different positions are mixed together, but we are using a synthetic stat (DVOA) to compare them. While two players may be equivalent by DVOA, it may be hard for a casual fan to see that; instead they likely consider more basic stats, which could skew results (e.g. quarterbacks gain more yards than tight ends). While I had a categorical variable for position which was not significant, there may be more subtle influences.
Another difference between the NFL and NBA is that NFL players wear helmets, making personal connections weaker.
NFL coaches
We can also calculate sentiment towards NFL coaches. Here are the most popular and least popular coaches:
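Most popular coaches
| Coach | Year | Avg Sentiment |
| --- | --- | --- |
| Sean McVay | 2017 | 0.28 |
| Adam Gase | 2016 | 0.27 |
| Todd Bowles | 2015 | 0.27 |
| Marc Trestman | 2013 | 0.24 |

Least popular coaches
| Coach | Year | Avg Sentiment |
| --- | --- | --- |
| Gregg Williams | 2017 | -0.29 |
| Gregg Williams | 2016 | -0.23 |
| Sean Payton | 2015-2016 | -0.11 |
| Mike Smith | 2013 | -0.08 |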
The most popular coach, Sean McVay, was a young, rookie coach who improved the Rams' wins by 7 in a single year. On the unpopular side, Gregg Williams was involved in "Bountygate," a scandal in which coaches paid players for causing injuries.
Having looked at the most and least popular coaches, we can again perform a regression:
| Coefficient | Magnitude (t-statistic) |
| --- | --- |
| Win % over expectation | 0.22 (2.1) |
| Age (years) | 0.2W (2.0) |
| Tenure with team (years) | 0.25W (1.95) |
| Race (white) | Not significant |

(For the non-win coefficients, I expressed the magnitude in terms of wins, W.)
As with NBA coaches, NFL coaches were more popular when they outperformed expectations, were younger, or had longer tenure. In contrast to the NBA, where there was a significant and large bias against black coaches worth ~10 wins / year, we could not detect bias against NFL coaches. This could be due to a lack of power (there are far fewer black NFL coaches than black NBA coaches); reflect differences in media coverage; or indeed reflect decreased bias among NFL commenters. It is interesting that NFL coaches are perceived to be much more important than NBA coaches, yet bias is less prevalent.
Saturday, February 16, 2019
What makes athletes popular? A sentiment regression analysis
By Michael Patterson, and Matt Goldman
(Standard disclaimer: The analyses contained here were done on personal time, and do not reflect the views of our employers.)
The 2018 Cleveland Cavs made the NBA finals while generating some of the hottest memes of 2018 ("We got an [expletive] squad now," "He boomed me."). Following the Cavs on Reddit, I (Mike) noticed something odd. Turkish rookie Cedi Osman was a particular fan favourite. Cedi played limited minutes with energy, and everyone joked that Cedi was the "GOAT" (Greatest Of All Time) carrying LeBron. In contrast, Tristan Thompson, a hero of the 2016 season, had an off year, and was the center of a meme for being traded ("Shump, TT, and the Nets pick"). For the Cavs at least, it seemed commenters gave the white players an easier time. And being a data scientist, I thought, "I could measure that!"
In this project, we used sentiment analysis and regression models to measure how performance, demographics, and race systematically predict sentiment towards players and coaches on the r/NBA and r/NFL subreddits. In doing so, we get a useful window into opinion formation in the online communities that increasingly dominate social and political discourse. Movements as varied as Black Lives Matter, #metoo and r/the_donald have leveraged online communities to connect disparate supporters around a common set of values and ideas. Social commentators have expressed concerns that herd mentalities and outgroup biases can lead to the formation of distorted opinions in these settings, but it is hard to study such bias due to confounding factors. Studying professional athletes offers key advantages: we can measure sentiment associated with a large sample of athletes of varying race; and there are objective performance metrics.
This blog post includes a brief overview of our methodology and results. For details of the analysis, we have written a series of three Jupyter notebooks (linked below). We find that:
- NBA players
  - Reddit commenters like players who perform well; in the NBA, scoring 1 more PPG is worth approximately 0.02 standard deviations of sentiment
  - Commenters particularly like both young players (each year below the mean age of 26.7 is worth ~2.5 PPG), and old players (1.5 PPG for each year above 26.7)
  - The coefficient for race was overall not statistically significant (t <= 1.76)
  - In the NBA, scoring points for white players is worth ~3x as much as for black players
  - Commenters from “blue” cities (cities that supported Hillary Clinton in the 2016 general election) had higher sentiment towards NBA players, but this effect was smaller than 1 PPG
- NBA coaches
  - The overall sentiment towards coaches was lower than that towards players (mean and median of 0.1 vs 0.13 for players)
  - One win above expectation was worth 0.06 standard deviations
  - There is a significant bias against black coaches in the NBA, worth ~10 wins
- NFL
  - We measured performance in the NFL using Football Outsiders' DVOA statistic; one point of DVOA was worth 0.02 standard deviations of sentiment
  - We did not detect any effect of race in the NFL, either for players or coaches
Sentiment Modeling
To quantify how redditors felt towards players, we used a natural language processing (NLP) technique called sentiment analysis. “Sentiment” is just a jargony way of saying whether someone is liked or disliked. The sentiment analysis technique we used (VADER) sums the positive and negative sentiment of the words in a sentence, and then normalizes them for an overall score. To tie these sentiment scores to players, we used a technique called named entity recognition. For details of the NLP, see this notebook.
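As a rough illustration of the "sum, then normalize" idea, here is a toy scorer. It is not VADER itself: the mini-lexicon and its valences are hypothetical, and the real analyzer also handles negation, intensifiers, punctuation, and capitalization. The squashing step `total / sqrt(total^2 + alpha)` with `alpha = 15` follows the normalization used by the reference VADER implementation, as far as we know.

```python
import math

# Hypothetical mini-lexicon; the real VADER lexicon has thousands of rated entries.
TOY_LEXICON = {
    "goat": 3.0, "great": 3.1, "love": 3.2,
    "dirty": -2.0, "trash": -2.5, "awful": -3.0,
}

def toy_compound(sentence, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences, then squash the total into the open interval (-1, 1)."""
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    total = sum(lexicon.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + alpha)

for text in ["Cedi is the GOAT!", "What a dirty player", "He plays tonight"]:
    print(text, round(toy_compound(text), 3))
```

A sentence with no lexicon words scores exactly 0, and piling on more positive or negative words pushes the score towards 1 or -1 without ever reaching it.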
Using these techniques, we analyzed over 2.5 million reddit comments from the 2013-2018 seasons (see here for how to scrape reddit) for the NBA and NFL. Since sentiment towards players can change over time, we calculated sentiment on a yearly basis. To get the sentiment towards a player in a year, we first calculated the average sentiment towards each player from each commenter, then averaged over all commenters (a mean-of-means).
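The mean-of-means can be sketched in a few lines. The row layout `(player, year, commenter, sentiment)`, the usernames, and the scores are assumptions for illustration; the notebooks may organize the data differently.

```python
from collections import defaultdict
from statistics import mean

def mean_of_means(rows):
    """rows: iterable of (player, year, commenter, sentiment) tuples.

    Returns {(player, year): mean over commenters of each commenter's mean score}.
    """
    per_commenter = defaultdict(list)
    for player, year, commenter, score in rows:
        per_commenter[(player, year, commenter)].append(score)
    per_player = defaultdict(list)
    for (player, year, _commenter), scores in per_commenter.items():
        per_player[(player, year)].append(mean(scores))
    return {key: mean(vals) for key, vals in per_player.items()}

# Made-up comments: one prolific fan and one occasional critic.
rows = [
    ("Cedi Osman", 2018, "user_a", 0.6),
    ("Cedi Osman", 2018, "user_a", 0.4),
    ("Cedi Osman", 2018, "user_a", 0.5),
    ("Cedi Osman", 2018, "user_b", -0.1),
]
result = mean_of_means(rows)
print(result)
```

Averaging per commenter first keeps one prolific commenter from dominating: here the pooled mean of the four comments would be 0.35, while the mean-of-means weights the two commenters equally and gives 0.2.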
Using this sentiment model, scores generally range from -1 to 1, with 0 being neutral. The mean sentiment across all players was slightly positive, 0.13. We can check the results of our sentiment analysis by looking at the highest and lowest sentiment player-years:
Lowest Sentiment Seasons
| Player | Year | Avg Sentiment |
| --- | --- | --- |
| Mike Dunleavy | 2016 | -0.11 |
| Kelly Olynyk | 2016 | -0.08 |
| Steve Blake | 2015 | -0.07 |
| Zaza Pachulia | 2017 | -0.07 |

Highest Sentiment Seasons
| Player | Year | Avg Sentiment |
| --- | --- | --- |
| Brandon Ingram | 2015 | 0.27 |
| Karl-Anthony Towns | 2015 | 0.26 |
| Marc Gasol | 2014 | 0.25 |
| Gordon Hayward | 2014 | 0.24 |

In general, these sentiment values pass the sniff test. Dirty players like Kelly Olynyk and Zaza Pachulia each received their low score in seasons immediately following incidents where they injured high-profile players; Brandon Ingram and Karl-Anthony Towns in 2015 were young players with potential, and Marc Gasol is the franchise player of Memphis. For a full table of player sentiment, see this .tsv, where the column ‘compound_mean_mean’ represents player sentiment.
Fig. 1 (left panel) plots a histogram of this mean sentiment score across white and black players. Overall, the distributions are similar (unpaired t-test, p=0.07), with a standard deviation of 0.053 (calculated on players having at least 200 commenters). However, this need not be the whole story. White and black players differ on many other characteristics that may also determine sentiment. For example, Fig. 1 (right panel) plots player age versus average sentiment score. Here we can see that both young and old players are more liked than middle-aged NBA players. In order to make useful statements about the role of race in determining sentiment, we need to consider how other player characteristics can confound and mediate such a relationship.
Fig. 1: Graphs exploring sentiment distributions. Left panel: Histogram of sentiment towards white and black players. Overall the distributions are similar. Right panel: Average sentiment towards players for each age. Young and older players are more popular than average.
Regression Analysis: NBA Players
The above graphs are informative, but not conclusive, since many factors can be correlated with each other, and we can’t make causal inferences. For example, young players might be popular because they are full of potential, or they might be popular because they are underpaid. To disentangle these effects, we can use multivariate regression analysis, where we consider all of these factors simultaneously. Rather than analyze data at the player-year level, we can analyze it at the player-user-year level to gain more samples.
In our regression analysis, we set our target variable to be the average player sentiment from a commenter in a year. We start our analysis with simple models, and gradually add more and more covariates (features). Starting with a simple regression using PPG as a covariate, we find that the PPG coefficient is significant, and 1 PPG is worth about 0.01 standard deviations of sentiment (0.0007 compared to the standard deviation of 0.053). In this regression, I also included the covariate of minutes played, which was not significant.
Coefficient (t-statistic) by specification. Each row lists its entries in specification order, (1) through (5); covariates that were not included in every specification have fewer entries.

Intercept: 0.07; 0.067; 0.054; 0.0853; 0.08
PPG: 0.0007 (1.985); 0.0014 (3.5); 0.0012 (2.3); 0.0007 (1.91); 0.001 (2.2)
Rookie: 0.020 (5.4); 0.021 (4.7); 0.022 (4.8)
Youth: 0.0026 (3.5); 0.0028 (3.6); 0.0026 (3.2)
Oldness: 0.0014 (2.04); 0.0014 (1.9); 0.0012 (1.7)
White Player (race): 0.0004 (0.08); 0.0095 (1.76); 0.008 (1.6)
White Player X PPG: 0.0022 (2.6); 0.0016 (2.6)
Commenter City “Blueness”: 0.0096 (3.2)
Blue Commenter X White Player: 0.005 (0.67)

All stats were downloaded from basketball-reference.com. Youth (oldness) is defined as years below (above) the average NBA age (26.7 years). Regression was done at the commenter-player-year level, weighted with the square root of comment count, and with clustered errors at the player level. For details, see future notebook.
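For readers who want to see what "weighted, with errors clustered at the player level" means mechanically, here is a minimal pure-Python sketch for a single covariate plus an intercept. It is not the notebook code (a statistics package would normally do this), it applies no small-sample correction, and the function name `wls_clustered` and the data rows are made up.

```python
import math
from collections import defaultdict

def wls_clustered(x, y, w, groups):
    """Weighted least squares of y on [1, x] with cluster-robust standard errors.

    Returns ((intercept, slope), (se_intercept, se_slope)).
    """
    # A = X'WX and b = X'Wy for the two-column design [1, x]
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_wxx - s_wx * s_wx
    # Inverse of the symmetric 2x2 matrix A
    inv = ((s_wxx / det, -s_wx / det), (-s_wx / det, s_w / det))
    b0 = inv[0][0] * s_wy + inv[0][1] * s_wxy
    b1 = inv[1][0] * s_wy + inv[1][1] * s_wxy

    # Sum the weighted scores (w * residual * [1, x]) within each cluster
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, wi, g in zip(x, y, w, groups):
        resid = yi - b0 - b1 * xi
        scores[g][0] += wi * resid
        scores[g][1] += wi * resid * xi

    # Sandwich variance: sum over clusters of (A^-1 s_g)(A^-1 s_g)'
    var0 = var1 = 0.0
    for s0, s1 in scores.values():
        v0 = inv[0][0] * s0 + inv[0][1] * s1
        v1 = inv[1][0] * s0 + inv[1][1] * s1
        var0 += v0 * v0
        var1 += v1 * v1
    return (b0, b1), (math.sqrt(var0), math.sqrt(var1))

# Made-up commenter-player-year rows: (player, PPG, mean sentiment, comment count)
rows = [
    ("A", 25.0, 0.16, 9), ("A", 25.0, 0.12, 4),
    ("B", 10.0, 0.11, 1), ("B", 10.0, 0.13, 16),
    ("C", 18.0, 0.15, 4), ("C", 18.0, 0.10, 9),
]
players, ppg, sent, counts = zip(*rows)
weights = [math.sqrt(c) for c in counts]  # square root of comment count
(beta0, beta1), (se0, se1) = wls_clustered(ppg, sent, weights, players)
print(f"PPG coefficient {beta1:.4f}, t-statistic {beta1 / se1:.2f}")
```

Clustering matters here because the same player appears in many commenter rows; treating those rows as independent would understate the standard errors and inflate the t-statistics.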
After this initial regression, we start to increase complexity. For the full list of specifications that we used, please see this Google spreadsheet. The next bit of complexity we added was more performance variables, and simple demographics. Here we found, surprisingly, that no other performance variable was significant for sentiment. For age, we found that commenters preferred both young and old players, and rookies the most. Being a rookie was worth ~14 PPG, one year of youth was worth ~2-2.5 PPG (0.0026 vs 0.0014 in this specification), and one year of oldness was worth 1 PPG (0.0014 for each). This might be explained by the potential of youth, and the survivorship bias of players who get older.
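The conversions into "PPG equivalents" are just ratios of coefficients. A quick check, assuming the specification (2) values quoted above (PPG 0.0014, rookie 0.020, youth 0.0026, oldness 0.0014) and the sentiment standard deviation of 0.053; the helper name `in_ppg` is made up:

```python
# Coefficients as quoted for specification (2)
ppg, rookie, youth, oldness = 0.0014, 0.020, 0.0026, 0.0014
sentiment_sd = 0.053

def in_ppg(coef, ppg_coef=ppg):
    """Express a coefficient as the number of PPG with the same predicted effect."""
    return coef / ppg_coef

print(round(in_ppg(rookie), 1))      # 14.3 PPG for being a rookie
print(round(in_ppg(youth), 1))       # 1.9 PPG per year of youth
print(round(in_ppg(oldness), 1))     # 1.0 PPG per year of oldness
print(round(ppg / sentiment_sd, 3))  # 0.026 standard deviations per PPG
```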
In spec (3), we add race as a covariate, whose coefficient was not significant. However, the confidence interval on this effect is fairly wide, and we can only conclude that race is less important than 3 PPG. In specs (4) and (5), however, we see that white players received 2-3x the benefit of scoring as black players did. In fact, in specification (4) we see that the PPG coefficient dips below statistical significance for black players alone.
All the previous coefficients were measured at the player level, but there may also be bias at the commenter level. To measure this, we took each user’s flair, and assigned it to a city (on Reddit, users can express affiliation with a team, e.g. “[CLE] Cedi Osman”). We found that when a user has flair for a city that Clinton won by a larger margin, they had a higher sentiment towards players. (To check whether we could detect changes in politicization over time, we interacted year with Clinton vote share, but did not find significant coefficients.) Overall, these results are in line with research that shows Democrats have higher favorability towards the NBA and NFL than Republicans do. Interpreting this coefficient, however, is difficult, as it is correlated with many other factors of a city.
Regression analysis: NBA coaches
Using the same techniques, we can quantify sentiment towards coaches. Here are the most liked and disliked coach-seasons since 2013. This list looks reasonable, as popular coaches like Brad Stevens and Steve Kerr are at the top. Overall, the mean and median sentiment towards coaches was 0.1. For the full table, see this .tsv.
Highest Sentiment Seasons
| Coach | Year | Avg Sentiment |
| --- | --- | --- |
| Brad Stevens (BOS) | 2014-2015 | 0.3 |
| Erik Spoelstra (MIA) | 2014-2015 | 0.26 |
| Brad Stevens (BOS) | 2017-2018 | 0.24 |
| Steve Kerr (GSW) | 2014-2015 | 0.23 |

Lowest Sentiment Seasons
| Coach | Year | Avg Sentiment |
| --- | --- | --- |
| George Karl (SAC) | 2015-2016 | -0.15 |
| Earl Watson (PHX) | 2017-2018 | -0.1 |
| Kurt Rambis (NYK) | 2015-2016 | -0.1 |
| Fred Hoiberg (CHI) | 2015-2016 | -0.06 |

As before, we can use regression analysis to understand what factors influence coach sentiment. However, the coach analysis has less power than the player analysis for a few reasons: 1.) There is only one coach per team, limiting the sample size, and increasing the influence of outlier coaches. 2.) People talk less about coaches, making estimates of coach sentiment less reliable. 3.) Coaches have fewer covariates compared to players, increasing the chance of omitted variable bias; for example, coaches may be well liked for interviews, which we don’t quantify here.
We can start with the simplest regression, predicting sentiment using variables based on wins: the raw win percentage in a season; career win percentage for the coach; and winning percentage compared to the over-under for wins in a season, as a proxy for over- or underachievement. Surprisingly, the coefficient for wins alone is negative, but the coefficient for wins above expectation was highly positive (specification (1)). There is also a positive coefficient for career win percentage, albeit half the size of the in-season coefficient. The next covariates we can add are time-related, like tenure with team, or age (specification (2)). Both of these coefficients are significant: one year of age is worth approximately 0.5 wins; the effect of tenure is twice as big as age. The opposing signs of these coefficients would allow young coaches to retain their popularity as they stay with the same team. We also tested a covariate for former players, which was not significant.
| Coefficient (t-statistic) | (1) | (2) | (3) |
| --- | --- | --- | --- |
| Intercept | 0.01 | 0.14 | 0.14 |
| Season Win % | 0.14 (2.2) | 0.13 (1.94) | 0.08 (1.5) |
| Win % - preseason over/under | 0.51 (6.8) | 0.48 (6.6) | 0.39 (4.5) |
| Career Win % | 0.23 (3) | 0.25 (3.3) | 0.17 (2.3) |
| Age (years) | | 0.0034 (2.4) | 0.0033 (2.7) |
| Tenure with team (years) | | 0.005 (2.1) | 0.0048 (2.2) |
| Race (White) | | | 0.047 (2.8) |

Finally, we can add a covariate for race. Here, we find the coefficient is significant and large, worth approximately 10 wins (0.047, vs the coefficient for one win of 0.0047). This coefficient was surprisingly large, and different enough from our player analysis, that we wanted to double check it. First, we plotted the residual of our predicted sentiment (measured sentiment - predicted sentiment) using a model that ignored race, splitting the data for white and black coaches (Fig. 2, left panel). Here we can see that a segment of black coaches have negative residuals, meaning that their measured sentiment was less than predicted, which is consistent with a penalty for black coaches (equivalently, a positive coefficient for white coaches).
Another reason for concern is that there are relatively few coaches, which means a few outlier coaches could be driving our results. To verify this was not the case, we performed a bootstrapped regression where we resample our data at the coach level (namely, we take a sample where half the coaches are missing, and fit a regression; Fig. 2, right panel). If we do this, we find that the distribution of coefficients for race is different from zero.
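The resampling step can be sketched as follows. This is a simplified stand-in, not the actual analysis: it regresses sentiment on a single 0/1 indicator (for which the OLS coefficient is just the difference in group means) rather than the full set of covariates, and the coach labels, indicator values, and sentiment numbers are made up.

```python
import random
from statistics import mean

def group_gap(rows):
    """OLS coefficient of sentiment on a 0/1 indicator, i.e. the difference in group means."""
    ones = [s for _, flag, s in rows if flag == 1]
    zeros = [s for _, flag, s in rows if flag == 0]
    return mean(ones) - mean(zeros)

def half_sample_coefs(rows, n_draws=1000, seed=0):
    """Refit on random halves of the coaches; all seasons of a drawn coach stay together."""
    rng = random.Random(seed)
    coaches = sorted({coach for coach, _, _ in rows})
    coefs = []
    for _ in range(n_draws):
        keep = set(rng.sample(coaches, len(coaches) // 2))
        sub = [r for r in rows if r[0] in keep]
        if {flag for _, flag, _ in sub} == {0, 1}:  # skip draws that lost a whole group
            coefs.append(group_gap(sub))
    return coefs

# Made-up coach-season rows: (coach, indicator, sentiment)
rows = [
    ("coach_1", 1, 0.18), ("coach_1", 1, 0.15),
    ("coach_2", 1, 0.20), ("coach_2", 1, 0.16),
    ("coach_3", 1, 0.17), ("coach_4", 1, 0.19),
    ("coach_5", 0, 0.08), ("coach_5", 0, 0.05),
    ("coach_6", 0, 0.10), ("coach_6", 0, 0.07),
    ("coach_7", 0, 0.06), ("coach_8", 0, 0.09),
]
coefs = half_sample_coefs(rows)
print(len(coefs), round(min(coefs), 3), round(max(coefs), 3))
```

Sampling whole coaches rather than individual coach-seasons is the important design choice: if a single outlier coach were driving the result, the coefficient would swing towards zero in the draws that leave that coach out.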
Fig. 2: Left: Histogram of residuals of sentiment using a model that did not include race. Right panel: distribution of coefficients for race, using bootstrapped samples.
Conclusion
We found that factors like age, performance, and race were related to sentiment towards players. The overall sentiment on reddit was positive, probably due to moderation policies that remove abusive language. This moderation policy limits our ability to measure overt racism, as those posts are removed.
We only found one piece of evidence of racial bias for NBA players, namely that high-scoring white players are well liked. In general, this fits with our personal observations, as players like Luka Doncic and Gordon Hayward (both legitimately great players) receive lots of attention and favoritism on reddit. We do not know the cause of this bias: it could be due to the relative scarcity of white players yielding a novelty factor; or it could reflect unconscious bias by a subset of reddit users. On the flip side, some of the least popular players on reddit are low-skill white players known for a bruising style and dangerous play.
We also found evidence that commenters from cities that supported Clinton had slightly higher sentiment towards the NBA. This could align with recent research that sports are becoming politicized. For example, conservative sentiment towards the NFL has dropped since the national anthem protests. We did not, however, find any change in time for this effect.
We found a significant, large bias against black NBA coaches. Anecdotally, we can think of two successful black coaches, Tyronn Lue and Dwane Casey, who have gotten consistent criticism. In contrast, probably the three most popular coaches, Steve Kerr, Brad Stevens, and Gregg Popovich, are all white. It is possible that black coaches systematically differ from white coaches in ways we have not quantified, although we did not detect a significant coefficient for ex-players.
This analysis could be improved by using more refined sentiment analysis and named entity recognition, incorporating other social media sources, and taking a finer-grained approach to time. For sentiment analysis and NER, we used quick methods like VADER and filtering comments to those about a single player. These analyses could both be improved simultaneously by using a combined sentiment-entity extraction; this would require training a model on a significant amount of labeled data. For other social media sources, we could also try to use Twitter, or perhaps go down to the team subreddit level to gather more data; this would allow us to test the robustness of these results. For time, it would be interesting to analyze sentiment on a game-by-game basis; for example, white players might receive more praise for a high-scoring game than black players.
So far, we have only presented results from the NBA, but we also performed a similar analysis for the NFL. We’ll be putting those out shortly.