...And You Will Know Me By The Trail of Papers: 2019

Saturday, September 7, 2019

Fortnite Reddit prefers female skins

Fortnite is the most popular game of 2018-2019 (and I would argue one of the best designed games ever). “Skins” in Fortnite are costumes for your character, letting you dress up as an astronaut, bigfoot, or a fish controlling a robot body. The release of each new skin is met with discussion online; recently, a black female skin “Luxe” was released, and people seemed unusually critical of the skin. Since I had all of the code from my NBA and NFL analyses*, I figured I would investigate to see how skin race and gender influence sentiment on Fortnite reddit. In terms of results, the two headlines are:

Raw mentions for skins are primarily driven by time since release, and by whether a skin was included in a Battle Pass
Sentiment is higher for female skins

* Don’t let “I already have the code” fool you on a data science project. Most of the work was in collecting the data and cleaning it.

Methods

For details of my methods, see this blog post. In brief, I scraped Fortnite reddit for comments from January 2018 through July 2019, with the help of pushshift.io. I then performed named entity recognition* to identify which posts were about Fortnite skins. To quantify which skins were most liked, I used VADER with a lexicon modified for Fortnite. For covariates, I scraped two Fortnite skins websites, Gamepedia and Progameguides. All notebooks used for the project can be found on github.

* For NER this time, I tried fine-tuning spaCy’s NER model. I labeled entities for ~300 comments. I found that for skins that had 5+ labels, the NER worked fairly well (decent recall on spot-checked posts). However, for skins that had <= 3 labels, the recall was abysmal. Rather than hand labeling 1,500 comments (5 for each skin), I decided to go back to simple regex extraction for skin names.
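As a rough illustration of what "simple regex extraction" can look like, here is a minimal sketch. The skin list, `build_skin_pattern`, and `find_skins` are hypothetical names made up for this example; the actual extraction code lives in the project notebooks and may differ.

```python
import re

# Hypothetical sample of skin names; the real list came from the scraped skin sites.
SKIN_NAMES = ["Omega", "John Wick", "Drift", "Skull Trooper", "Black Knight"]

def build_skin_pattern(names):
    """Compile one case-insensitive pattern that matches any skin name as a whole word."""
    # Longest names first, so a multi-word name wins over any shorter name it contains.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)

def find_skins(comment, pattern, names):
    """Return the canonical names of the skins mentioned in a comment, sorted."""
    canonical = {name.lower(): name for name in names}
    return sorted({canonical[m.group(1).lower()] for m in pattern.finditer(comment)})

pattern = build_skin_pattern(SKIN_NAMES)
print(find_skins("omega is fine but Skull Trooper is the real OG", pattern, SKIN_NAMES))
```

The word boundaries keep "drifting" from counting as a mention of Drift, but a plain regex cannot tell the skin "Drift" apart from the ordinary word "drift", which is one reason this approach trades precision for simplicity.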

Analysis of which skins get discussed the most

Before diving into which skins had the highest sentiment, I first wanted to see which skins were discussed the most. Here are the five most commented skins:

Skin	Mentions
Omega	40,300
John Wick	29,900
Drift	29,400
Skull Trooper	23,500
Black Knight	17,900

To understand why these skins are popular, it helps to know a bit about how Fortnite is played. Every 3-4 months, the developers release a “Season,” which includes a “Battle Pass.” The Battle Pass costs around $10, and includes access to a large number of skins that get unlocked as you play. Three of the above skins are from the Battle Pass (Omega, Drift, and Black Knight). John Wick is a skin from cross-promotional advertisement from John Wick 3; and "John Wick" was also the nickname for a Battle Pass skin, The Reaper. Finally, Skull Trooper is an old skin from October 2017 that was the signature skin of a Fortnite streamer, Myth.

The number of comments per skin followed an exponential distribution. Here is the number of comments per skin, in rank order (note the log scale of the y-axis):

To better understand what is driving skin discussion, I performed a regression with a target variable of log(skin mentions). The covariates for this regression were:

Skin gender (including non-human skins)
Skin race ("non-human"; or "not visible" for some skins)
Number of days since release
Whether the skin was part of a Battle Pass
Whether the skin was a tier 1 or 100 Battle Pass skin
Whether the skin / character was featured in the Fortnite “story”

The results of this regression were that the primary drivers of skin discussion were time since release, and whether it was part of a battle pass. Skin “demographic” features did not matter. Here are the coefficients and p-values for the different features (coefficient is in log units):

Feature	Coefficient (p-value)
Battle Pass	1.4 (< 0.001)
Tier 1 skin	0.6 (0.17)
Tier 100 skin	1.3 (0.015)
Story skin	2.1 (0.002)
Days since release	0.004 (< 0.001)
Race (black)	-0.3 (0.44)
Gender (male)	-0.1 (0.46)
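Because the target is log(mentions), each coefficient is easier to read as a multiplier on raw mentions. The sketch below assumes the log is a natural log (the post only says "log units"), so treat the multipliers as approximate; `multiplier` is a helper name invented for this example.

```python
import math

# Coefficients copied from the table above; assumed to be in natural-log units.
coefficients = {
    "Battle Pass": 1.4,
    "Tier 100 skin": 1.3,
    "Story skin": 2.1,
    "Days since release": 0.004,
}

def multiplier(coef, units=1.0):
    """Multiplicative change in mentions for a `units` increase in the feature."""
    return math.exp(coef * units)

for feature, coef in coefficients.items():
    print(f"{feature}: x{multiplier(coef):.2f}")

# Effect of a skin being 100 days older, under the same natural-log assumption.
print(f"100 extra days: x{multiplier(coefficients['Days since release'], 100):.2f}")
```

Under that assumption, a Battle Pass skin gets roughly four times the mentions of an otherwise similar skin, and 100 extra days since release adds roughly 50%.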

Analysis of what drives skin sentiment

In addition to analyzing which skins are discussed most, I wanted to understand what makes skins liked or disliked. I used VADER to analyze sentiment towards skins on a sentence-by-sentence level, then averaged the sentiment to get average sentiment towards each skin. The sentiment for each sentence can range from -1 to 1. Here are the most liked and disliked skins in the sample (minimum 20 mentions):

Most liked skins		Least liked skins
Skin	Mean sentiment	Skin	Mean sentiment
Straw Ops	0.23	Shaman	-0.11
Psion	0.22	Birdie	-0.06
Scarlet Defender	0.21	Hypernova	-0.05
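The per-skin averaging behind this table can be sketched as follows. This is a simplified stand-in, not the notebook code: it assumes the sentence-level scores have already been computed, and `mean_sentiment_by_skin` is a name invented for the example.

```python
from collections import defaultdict
from statistics import mean

def mean_sentiment_by_skin(scored_sentences, min_mentions=20):
    """Average per-sentence scores for each skin, dropping rarely mentioned skins.

    `scored_sentences` is an iterable of (skin_name, score) pairs, where each
    score is a sentence-level sentiment value in [-1, 1].
    """
    scores = defaultdict(list)
    for skin, score in scored_sentences:
        scores[skin].append(score)
    return {
        skin: mean(values)
        for skin, values in scores.items()
        if len(values) >= min_mentions
    }

# Toy data: made-up scores, not values from the analysis.
toy = [("Omega", 0.5), ("Omega", -0.1), ("Drift", 0.3)]
print(mean_sentiment_by_skin(toy, min_mentions=2))
```

The `min_mentions` cutoff matters: with only a handful of sentences, a single enthusiastic or angry comment can push a skin to the top or bottom of the ranking.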

All of these skins are less popular skins, which highlights one of the biases of this analysis: people who discuss skins may have stronger opinions; and this bias may be biggest for the least discussed skins. Of these skins, only Hypernova is male, which may indicate that female skins have wider variance (both more liked and disliked).

To investigate that hypothesis, we can plot the distribution of sentiment towards both male and female skins. The overall mean sentiment across skins was 0.084, with STD of 0.58:

While there are both well liked male and female skins, there is a large swath of male skins with neutral opinion (0-0.1).

To complete the analysis, I ran a regression targeting mean sentiment for each skin, with the same features as before. I started with a simple specification with covariates for race and gender. In this specification, the coefficient for gender was significantly negative for male skins (-0.01, ~ 0.15 standard deviations); no racial coefficient was significant. I then ran a complete specification with all covariates, and got similar results.

Covariate	Coefficients for spec 1 (p-value)	Coefficients for spec 2 (p-value)
Race	NS	NS
Gender (Male)	-0.011 (0.012)	-0.012 (0.007)
Battle Pass, etc.		NS

Discussion

In the first part of this analysis, I found that skins featured in the Battle Pass were discussed more often. This makes intuitive sense, as these skins are featured on splash screens and marketing for Fortnite. Many of these skins also have unlockable content, which people discuss how to unlock.

In terms of sentiment, I found that male skins had lower sentiment than female skins. One possible explanation for this is that Fortnite reddit skews towards young men, who might simply be more attracted to female skins. Popular streamers like Daequan often objectify female characters, making comments like, “Gimme that booty!” Another potential explanation is that female skins may have more diverse aesthetics, which allows people who prefer those aesthetics to attach to those skins. For example, many male skins share standard military profiles, and are relatively indistinguishable. In contrast, female skins can express a wider range of emotions, and may have more variety in clothes (skirts, tops, etc.). Some tangential evidence for this may be the large number of male skins with neutral sentiment.

As a final note to myself, if I want to revisit this type of analysis in the future, I need to improve the sentiment analysis. While I believe the assumption of my current model – that sentence level sentiment reflects skin sentiment – is broadly true, in checking my data I found that a large minority of samples have inaccurate sentiment. While performing more sophisticated sentiment analysis may take more time, it should give me a better estimate of entity sentiment, and frankly feel less hacky.

Thursday, August 22, 2019

Language representation in MUSE embeddings

I just posted a notebook to github that explores how language is represented in MUSE word embeddings.

Friday, August 9, 2019

Sentiment analysis does not find bias on NFL reddit

I previously published results using sentiment analysis to show that commenters on the NBA reddit had higher sentiment towards young players, high-scoring white players, and white coaches. In this post, I am going to extend that analysis to the NFL reddit. My overall finding is that, as with the NBA, NFL redditors like rookies and players who gain more yards. However, I did not find any significant coefficients for race. For coaches, I found that commenters like coaches who outperform expectations, but again did not find evidence of bias.

Brief review of the method

To try to understand what factors (e.g. performance, age, race) influence player popularity, I scraped millions of comments from r/NFL from 2013-2018. I then quantified each commenter's opinion toward players using the sentiment analyzer VADER. This analysis calculates whether a word is positive (“GOAT!”) or negative, and ties that feeling to a player. Sentiment scores generally ranged from -0.2 to 0.3. Finally, I quantified the impact of each factor on popularity by performing a least-squares regression with an outcome variable of sentiment towards a player. Details of the analysis are in the previous post, and a series of notebooks covering scraping data, sentiment quantification, and regression.

Unique difficulties of performing sentiment analysis

Compared to the NBA, two factors make analyzing the NFL more challenging: the larger number of players; and the smaller number of players with comparable stats.

The larger number of players made resolving which player was being talked about more difficult (named entity recognition). I performed named entity recognition by identifying comments that contained player first or last names, then matched those names to players. If an NBA commenter mentioned "Blake," I could link that comment to Blake Griffin since there is only one active "Blake" in the NBA. However, in the NFL, "Blake" could refer to Blake Bortles, Blake Bell, Blake Jarwin, or others. This means that the best way to identify comments about NFL players is to find full name matches, in contrast to how people normally talk about players (just their first or last name). A second side effect of the larger number of players was that comment-player matching took longer, as the way I implemented it took O(n^2).
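One way to avoid comparing every comment against every player is to index full names in a dictionary and look up adjacent word pairs. The sketch below is an alternative to the quadratic matching described above, not the implementation used in the notebooks. The roster and function names are hypothetical, and it assumes every player name is exactly two words (so suffixes like "Jr." or hyphenated names would need extra handling).

```python
import re

# Small illustrative roster; the real analysis covered the whole league.
PLAYERS = ["Blake Bortles", "Blake Bell", "Blake Jarwin", "David Johnson"]

def build_name_index(players):
    """Map each lowercased full name to its canonical spelling."""
    return {name.lower(): name for name in players}

def match_full_names(comment, index):
    """Return players whose full name appears in the comment.

    Every pair of adjacent words is looked up in the index, so the cost per
    comment depends on its length rather than on the size of the roster.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    found = []
    for first, last in zip(words, words[1:]):
        player = index.get(f"{first} {last}")
        if player and player not in found:
            found.append(player)
    return found

index = build_name_index(PLAYERS)
print(match_full_names("Blake Bortles looked better than Blake tonight", index))
```

A bare "Blake" is deliberately left unmatched here, which mirrors the full-name requirement above and its cost in recall.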

The nature of the NFL also made it harder to compare player stats. In basketball, all players score, rebound, and assist. In the NFL, half the players play defense, and on offense, linemen rarely touch the ball, so I ended up only analyzing skill-position players. This means that despite the NFL having more players overall, there were fewer players to analyze, and the statistical power of the analysis was lower.

Most and least popular players

To try to compare players between different positions (QB, RB, WR, and TE), I used Football Outsiders' advanced metric DVOA, which is defense adjusted z-score of yards above average. For example, if a QB and WR both had a DVOA of 1, they would both be one standard deviation better than the average player at their position.
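To make the "one standard deviation better than the average player at their position" idea concrete, here is a small sketch of standardizing a stat within each position. It illustrates only the within-position z-score, not Football Outsiders' actual DVOA calculation (there is no defensive adjustment here), and the yardage numbers are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_by_position(players):
    """Standardize a stat within each position: (value - position mean) / position std.

    `players` is a list of (name, position, value) tuples. Each position needs
    at least two distinct values, otherwise its standard deviation is zero.
    """
    by_position = defaultdict(list)
    for _, position, value in players:
        by_position[position].append(value)
    stats = {pos: (mean(vals), pstdev(vals)) for pos, vals in by_position.items()}
    return {
        name: (value - stats[pos][0]) / stats[pos][1]
        for name, pos, value in players
    }

# Invented yardage totals for three quarterbacks and three wide receivers.
toy = [
    ("QB1", "QB", 4000), ("QB2", "QB", 3000), ("QB3", "QB", 2000),
    ("WR1", "WR", 1200), ("WR2", "WR", 900), ("WR3", "WR", 600),
]
z = zscore_by_position(toy)
print(f"QB1: {z['QB1']:.2f}, WR1: {z['WR1']:.2f}")
```

In this toy data the best QB and the best WR end up with the same z-score even though their raw yardage differs by thousands of yards, which is the property that lets positions be pooled in one regression.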

Here are the most and least popular players for the years 2013-2018:

Lowest Sentiment Seasons			Highest Sentiment Seasons
Player	Year	Avg Sentiment	Player	Year	Avg Sentiment
Michael Bennett	2017	-0.25	David Johnson	2016	0.24
Danny Trevathan	2015	-0.15	JJ Watt	2017	0.22
Vontaze Burfict	2015	-0.15	JJ Watt	2016	0.22
Vontaze Burfict	2016	-0.14

On the unpopular side, Michael Bennett in 2017 was one of the first players to kneel during the national anthem; and Vontaze Burfict has a reputation as a dirty player. For the popular players, all three are Pro Bowlers. While I am not an avid NFL fan, these results pass my smell test.

Regression results

As I did with the NBA, to quantify which features were important, I ran a weighted least squares regression with clustered standard errors at the player level. Due to the lack of features, I only present two specifications here. First, I ran a regression with DVOA as the only feature (spec 1). This single coefficient was statistically significant (t-stat of 2.1), albeit weakly, and shows people prefer high-achieving players. I then ran a second regression adding features for player age and race, and city-level statistics for commenters (spec 2). In this regression, DVOA was no longer significant, while the rookie coefficient was. Neither the overall age nor the race coefficient was significant.

Coefficient	(1)	(2)
DVOA	0.0057 (2.1)	0.0051 (1.4)
Rookie	-	0.018 (2.4)
1 year of youth (<27)	-	0.0011 (0.7)
Race (white)	-	0.0067 (1.2)

Compared to the NBA results, fewer coefficients were significant, and the t-statistics were smaller. I believe this is in part due to the problems mentioned previously: difficulty in matching players to comments; and fewer player performance features to compare with.

Beyond statistical power, this analysis may be complicated by the assumption that NFL fans can (or do) compare players from different positions fairly. In this analysis, different positions are mixed together, but we are using a synthetic stat (DVOA) to compare them. While two players may be equivalent by DVOA, it may be hard for a casual fan to see that; instead they likely consider more basic stats, which could skew results (e.g. quarterbacks gain more yards than tight ends). While I had a categorical variable for position which was not significant, there may be more subtle influences.

Another difference between the NFL and NBA is that NFL players wear helmets, making personal connections weaker.

NFL coaches

We can also calculate sentiment towards NFL coaches. Here are the most popular and least popular coaches:

Coach	Year	Avg Sentiment	Coach	Year	Avg Sentiment
Sean McVay	2017	0.28	Gregg Williams	2017	-0.29
Adam Gase	2016	0.27	Gregg Williams	2016	-0.23
Todd Bowles	2015	0.27	Sean Payton	2015-2016	-0.11
Marc Trestman	2013	0.24	Mike Smith	2013	-0.08

The most popular coach was a young, rookie coach who improved the Rams' wins by 7 in a single year. On the unpopular coach side, Gregg Williams was involved in "bountygate," a scandal where coaches were paying players for causing injuries.

Having looked at the most and least popular coaches, we can again perform a regression:

Coefficient	Magnitude (t-statistic)
Win % over expectation	0.22 (2.1)
Age (years)	-0.2W (-2.0)
Tenure with team (years)	0.25W (1.95)
Race (white)	Not significant

(For non Win coefficients, I expressed the magnitude in terms of wins.)

As with NBA coaches, NFL coaches were more popular when they outperformed expectations, were younger, or had longer tenure. In contrast to the NBA, where there was a significant and large bias against black coaches worth ~10 wins / year, we could not detect bias against black NFL coaches. This could be due to a lack of power (there are far fewer black coaches in the NFL than in the NBA); reflect differences in media coverage; or indeed reflect decreased bias among NFL commenters. It is interesting that NFL coaches are perceived to be much more important than NBA coaches, yet bias is less prevalent.

Saturday, February 16, 2019

What makes athletes popular? A sentiment regression analysis

By Michael Patterson, and Matt Goldman

(Standard disclaimer: The analyses contained here were done on personal time, and do not reflect the views of our employers.)

The 2018 Cleveland Cavs made the NBA finals while generating some of the hottest memes of 2018 ("We got an [expletive] squad now," "He boomed me."). Following the Cavs on Reddit, I (Mike) noticed something odd. Turkish rookie Cedi Osman was a particular fan favourite. Cedi played limited minutes with energy, and everyone joked that Cedi was the "GOAT" (Greatest Of All Time) carrying Lebron. In contrast, Tristan Thompson, a hero of the 2016 season, had an off year, and was the center of a meme for being traded ("Shump, TT, and the Nets pick"). For the Cavs at least, it seemed commenters gave the white players an easier time. And being a data scientist, I thought, "I could measure that!"

In this project, we used sentiment analysis and regression models to measure how performance, demographics, and race systematically predict sentiment towards players and coaches on the r/NBA and r/NFL subreddits. In doing so, we get a useful window into opinion formation in the online communities that increasingly dominate social and political discourse. Movements as varied as Black Lives Matter, #metoo and r/the_donald have leveraged online communities to connect disparate supporters around a common set of values and ideas. Social commentators have expressed concerns that herd mentalities and out-group biases can lead to the formation of distorted opinions in these settings, but it is hard to study such bias due to confounding factors. Studying professional athletes offers key advantages: we can measure sentiment associated with a large sample of athletes of varying race; and there are objective performance metrics.

This blog post includes a brief overview of our methodology and results. For details of the analysis, we have written a series of three Jupyter notebooks (linked below). We find that:

NBA Players

Reddit commenters like players who perform well; in the NBA, scoring 1 more PPG is worth approximately 0.02 standard deviations of sentiment
Commenters particularly like both young players (each year below the mean age of 26.7 is worth ~2.5 PPG), and old players (1.5 PPG for each year above 26.7)
The coefficient for race was overall not statistically significant (t <= 1.76)
In the NBA, scoring points for white players is worth ~3x as much as for black players
Commenters from “blue” cities (cities that supported Hillary Clinton in the 2016 general election) had higher sentiment towards NBA players, but this effect was smaller than 1 PPG.

NBA coaches

The overall sentiment towards coaches was less than that towards players (mean and median of 0.1 vs 0.13 for players)
One win above expectation was worth 0.06 standard deviations
There is a significant bias against black coaches in the NBA, worth ~10 wins

NFL

We measured performance in the NFL using Football Outsiders' DVOA statistic; one point of DVOA was worth 0.02 standard deviations of sentiment
We did not detect any effect of race in the NFL, either for players or coaches

Sentiment Modeling

To quantify how redditors felt towards players, we used a natural language processing (NLP) technique called sentiment analysis. “Sentiment” is just a jargon-y way of saying whether someone is liked or disliked. The sentiment analysis technique we used (VADER) sums the positive and negative sentiment of the words in a sentence, and then normalizes them for an overall score. To tie these sentiment scores to players, we used a technique called named entity recognition. For details of the NLP, see this notebook.

Using these techniques, we analyzed over 2.5 million reddit comments from the 2013-2018 seasons (see here for how to scrape reddit) for the NBA and NFL. Since sentiment towards players can change over time, we calculated sentiment on a yearly basis. To get the sentiment towards a player in a year, we first calculated the average sentiment towards each player from each commenter, then averaged over all commenters (a mean-of-means). This was performed over the 2013-2018 seasons.
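The mean-of-means step can be sketched like this. It is a simplified illustration with invented data, and `mean_of_means` is a name chosen for this example rather than a function from the notebooks.

```python
from collections import defaultdict
from statistics import mean

def mean_of_means(rows):
    """Yearly player sentiment as a mean over commenters of per-commenter means.

    `rows` is an iterable of (player, year, commenter, score) tuples, one per
    scored comment.
    """
    per_commenter = defaultdict(list)
    for player, year, commenter, score in rows:
        per_commenter[(player, year, commenter)].append(score)

    per_player_year = defaultdict(list)
    for (player, year, _commenter), scores in per_commenter.items():
        per_player_year[(player, year)].append(mean(scores))

    return {key: mean(values) for key, values in per_player_year.items()}

# Toy data (invented): one prolific commenter and one occasional commenter.
rows = [
    ("Player A", 2017, "user1", 1.0),
    ("Player A", 2017, "user1", 1.0),
    ("Player A", 2017, "user1", 1.0),
    ("Player A", 2017, "user2", 0.0),
]
print(mean_of_means(rows))
```

In this toy example the pooled average over all four comments would be 0.75, while the mean-of-means is 0.5: each commenter counts once, so a single prolific user cannot dominate a player's score.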

Using this sentiment model, scores generally range from -1 to 1, with 0 being neutral. The mean sentiment across all players was slightly positive, 0.13. We can check the results of our sentiment analysis by looking at the highest and lowest sentiment player-years:

Lowest Sentiment Seasons			Highest Sentiment Seasons
Player	Year	Avg Sentiment	Player	Year	Avg Sentiment
Mike Dunleavy	2016	-0.11	Brandon Ingram	2015	0.27
Kelly Olynyk	2016	-0.08	Karl-Anthony Towns	2015	0.26
Steve Blake	2015	-0.07	Marc Gasol	2014	0.25
Zaza Pachulia	2017	-0.07	Gordon Hayward	2014	0.24

In general, these sentiment values pass the sniff test. Dirty players like Kelly Olynyk and Zaza Pachulia each received their low scores in seasons immediately following incidents where they injured high-profile players; Brandon Ingram and Karl-Anthony Towns in 2015 were young players with potential, and Marc Gasol is the franchise player of Memphis. For a full table of player sentiment, see this .tsv, where the column ‘compound_mean_mean’ represents player sentiment.

Fig 1 (left panel) plots a histogram of this mean sentiment score across white and black players. Overall, the distributions are similar (unpaired t-test, p=0.07), with a standard deviation of 0.053 (calculated on players having at least 200 commenters). However, this need not be the whole story. White and black players differ on many other characteristics that may also determine sentiment. For example, Fig. 1 (right panel) plots player age versus average sentiment score. Here we can see that both young and old players are more liked than NBA middle-age players. In order to make useful statements about the role of race in determining sentiment, we need to consider how other player characteristics can confound and mediate such a relationship.

Fig. 1: Graphs exploring sentiment distributions. Left panel: Histogram of sentiment towards white and black players. Overall the distributions are similar. Right panel: Average sentiment towards players for each age. Young and older players are more popular than average.

Regression Analysis: NBA Players

The above graphs are informative, but not conclusive, since many factors can be correlated with each other, and we can’t make causal inferences. For example, young players might be popular because they are full of potential, or they might be popular because they are underpaid. To disentangle these effects, we can use multi-variate regression analysis, where we consider all of these factors simultaneously. Rather than analyze data at the player-year level, we can analyze it at the player-user-year level to gain more samples.

In our regression analysis, we set our target variable to be the average player sentiment from a commenter in a year. We start our analysis with simple models, and gradually add more and more covariates (features). Starting with a simple regression using PPG as a covariate, we find that the PPG coefficient is significant, and 1 PPG is worth about 0.01 standard deviations of sentiment (0.0007 compared to the standard deviation of 0.053). In this regression, I also included the covariate of minutes played, which was not significant.

	Specification
Coefficient (t-statistic)	(1)	(2)	(3)	(4)	(5)
Intercept	0.07	0.067	0.054	0.0853	0.08
PPG	0.0007 (1.985)	0.0014 (3.5)	0.0012 (2.3)	0.0007 (1.91)	0.001 (2.2)
Rookie		0.020 (5.4)	0.021 (4.7)		0.022 (4.8)
Youth		0.0026 (3.5)	0.0028 (3.6)		0.0026 (3.2)
Oldness		0.0014 (2.04)	0.0014 (1.9)		0.0012 (1.7)
White Player (race)			0.0004 (0.08)	0.0095 (1.76)	0.008 (1.6)
White Player X PPG				0.0022 (2.6)	0.0016 (2.6)
Commenter City “Blueness”					0.0096 (3.2)
Blue Commenter X White Player					-0.005 (-0.67)

All stats were downloaded from basketball-reference.com. Youth (oldness) defined as years below (above) the average NBA age (26.7 years). Regression was done at commenter-player-year level, weighted with square root of comment count, and with clustered errors at player level. For details, see future notebook.

After this initial regression, we start to increase complexity. For the full list of specifications that we used, please see this Google spreadsheet. The next bit of complexity we added was more performance variables, and simple demographics. Here we found, surprisingly, that no other performance variable was significant for sentiment. For age, we found that commenters preferred both young and old players, and rookies the most. Being a rookie was worth ~14 PPG, one year of youth was worth ~2-2.5 PPG (0.0026 vs 0.0014 in this specification), and one year of oldness was worth 1 PPG (0.0014 for each). This might be explained by the potential of youth, and the survivorship bias of players who get older.
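The "worth N PPG" figures are just ratios of coefficients. A quick sketch of that arithmetic, using the spec (2) values from the table (`ppg_equivalent` is a helper name made up for this example):

```python
# Spec (2) coefficients copied from the regression table above.
PPG_COEF = 0.0014
spec2 = {
    "Rookie": 0.020,
    "Youth (per year)": 0.0026,
    "Oldness (per year)": 0.0014,
}

def ppg_equivalent(coef, ppg_coef=PPG_COEF):
    """Express a sentiment coefficient as the number of PPG with the same effect."""
    return coef / ppg_coef

for name, coef in spec2.items():
    print(f"{name}: {ppg_equivalent(coef):.1f} PPG")
```

Because these are ratios of two noisy estimates, the PPG equivalents shift from one specification to the next and should be read as rough magnitudes.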

In spec (3), we add race as a covariate, whose coefficient was not significant. However, the confidence interval on this effect is fairly wide, and we can only conclude that race is less important than 3 PPG. In specs (4) and (5), however, we see that white players received 2-3x the benefit of scoring as black players did. In fact, in specification (4) the PPG coefficient dips below statistical significance for black players alone.

All the previous coefficients were measured at the player level, but there may also be bias at the commenter level. To measure this, we took each user’s flair, and assigned it to a city (on Reddit, users can express affiliation with a team, e.g. “[CLE] Cedi Osman”). We found that when a user has flair for a city that Clinton won disproportionately more, they had a higher sentiment towards players. (To check whether we could detect changes in politicization over time, we interacted year with Clinton vote share, but did not find significant coefficients.) Overall, these results are in line with research that shows Democrats have higher favorability towards the NBA and NFL than Republicans do. Interpreting this coefficient, however, is difficult, as it is correlated with many other factors of a city.

Regression analysis: NBA coaches

Using the same techniques, we can quantify sentiment towards coaches. Here are the most liked and disliked coach-seasons since 2013. This list looks reasonable, as popular coaches like Brad Stevens and Steve Kerr are at the top. Overall, the mean and median sentiment towards coaches was 0.1. For the full table, see this .tsv.

Highest Sentiment Seasons			Lowest Sentiment Seasons
Coach	Year	Avg sentiment	Coach	Year	Avg Sentiment
Brad Stevens (BOS)	2014-2015	0.3	George Karl (SAC)	2015-2016	-0.15
Erik Spoelstra (MIA)	2014-2015	0.26	Earl Watson (PHX)	2017-2018	-0.1
Brad Stevens (BOS)	2017-2018	0.24	Kurt Rambis (NYK)	2015-2016	-0.1
Steve Kerr (GSW)	2014-2015	0.23	Fred Hoiberg (CHI)	2015-2016	-0.06

As before, we can use regression analysis to understand what factors influence coach sentiment. However, the coach analysis has less power than the player analysis for a few reasons: 1.) There is only one coach per team, limiting the sample size, and increasing the influence of outlier coaches. 2.) People talk less about coaches, making estimates of coach sentiment less reliable. 3.) Coaches have fewer covariates compared to players, increasing the chance of omitted variable bias; for example, coaches may be well liked for interviews, which we don’t quantify here.

We can start with the simplest regression, predicting sentiment using variables based on wins: the raw win percentage in a season; career win percentage for the coach; and winning percentage compared to the over-under for wins in a season as a proxy for over- or under-achievement. Surprisingly, the coefficient for wins alone is negative, but the coefficient for wins above expectation was highly positive (specification (1)). There is also a positive coefficient for career win percentage, albeit half the size of the in-season coefficient. The next covariates we can add are time-related, like tenure with team, or age (specification (2)). Both of these coefficients are significant: one year of age is worth approximately 0.5 wins; the effect of tenure is twice as big as age. The opposing signs of these coefficients would allow young coaches to retain their popularity as they stay with the same team. We also tested a covariate for former players, which was not significant.

	Specification
Coefficient (t-statistic)	(1)	(2)	(3)
Intercept	-0.01	0.14	0.14
Season Win %	-0.14 (-2.2)	-0.13 (-1.94)	-0.08 (-1.5)
Win% - pre-season over/under	0.51 (6.8)	0.48 (6.6)	0.39 (4.5)
Career Win %	0.23 (3)	0.25 (3.3)	0.17 (2.3)
Age (years)		-0.0034 (-2.4)	-0.0033 (-2.7)
Tenure with team (years)		0.005 (2.1)	0.0048 (2.2)
Race (White)			0.047 (2.8)

Finally, we can add a covariate for race. Here, we find the coefficient is significant and large, worth approximately 10 wins (0.047, vs the coefficient for one win of 0.0047). This coefficient was surprisingly large, and different enough from our player analysis, that we wanted to double check it. First, we plotted the residual of our predicted sentiment (measured sentiment - predicted sentiment) using a model that ignored race, splitting the data for white and black coaches (Fig. 2, left panel). Here we can see that a segment of black coaches have negative residuals, meaning that their measured sentiment was less than predicted, which is consistent with a positive coefficient for white coaches.
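One plausible reading of the per-win figure is that it comes from dividing the win-percentage-over-expectation coefficient by the 82 games in an NBA season. That derivation is an assumption on our part for this sketch; it is not stated in the table itself.

```python
GAMES_PER_SEASON = 82  # length of an NBA regular season

# Spec (3) coefficients copied from the coach regression table.
win_pct_over_expectation = 0.39
race_white = 0.047

# Assumption: one extra win moves win% by 1/82, so its sentiment effect is
# the win%-over-expectation coefficient divided by 82.
per_win = win_pct_over_expectation / GAMES_PER_SEASON
race_in_wins = race_white / per_win

print(f"sentiment per win: {per_win:.4f}")
print(f"race coefficient in wins: {race_in_wins:.1f}")
```

Under that assumption, the race coefficient comes out at just under ten wins, in line with the "approximately 10 wins" figure.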

Another reason for concern is that there are relatively few coaches, which means some outlier coaches could be influencing our results. To verify this was not the case, we can perform a bootstrapped regression where we re-sample our data at the coach level (namely, we take a sample where half the coaches are missing, and fit a regression; Fig. 2, right panel). If we do this, we find that the distribution of coefficients for race is different from zero.

Fig. 2: Left: Histogram of residuals of sentiment using a model that did not include race. Right panel: distribution of coefficients for race, using bootstrapped samples.
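The coach-level resampling can be sketched as below. To keep the example self-contained it uses synthetic data and replaces the full regression with a difference in group means (which is what OLS gives when a white-coach indicator is the only covariate), so it shows the resampling scheme rather than reproducing the actual model. All names here are hypothetical.

```python
import random
from statistics import mean

def race_gap(coach_rows):
    """Mean sentiment of white coaches minus mean sentiment of black coaches.

    This equals the OLS coefficient on a white-coach indicator when it is the
    only covariate; the full model in the post has more covariates.
    """
    white = [s for _, is_white, s in coach_rows if is_white]
    black = [s for _, is_white, s in coach_rows if not is_white]
    return mean(white) - mean(black)

def bootstrap_gaps(coach_rows, n_draws=1000, seed=0):
    """Re-estimate the gap on random halves of the coaches."""
    rng = random.Random(seed)
    coaches = sorted({coach for coach, _, _ in coach_rows})
    gaps = []
    for _ in range(n_draws):
        # Keep or drop a coach's seasons together, never season by season.
        kept = set(rng.sample(coaches, len(coaches) // 2))
        subset = [row for row in coach_rows if row[0] in kept]
        # Skip draws that happen to contain only one group.
        if len({is_white for _, is_white, _ in subset}) == 2:
            gaps.append(race_gap(subset))
    return gaps

# Synthetic coach-seasons (coach_id, is_white, sentiment); not real data.
data_rng = random.Random(1)
rows = []
for i in range(40):
    is_white = i % 2 == 0
    for _ in range(3):
        center = 0.12 if is_white else 0.08
        rows.append((f"coach{i}", is_white, data_rng.gauss(center, 0.02)))

gaps = bootstrap_gaps(rows)
share_positive = sum(g > 0 for g in gaps) / len(gaps)
print(f"draws: {len(gaps)}, share of gaps above zero: {share_positive:.2f}")
```

Resampling whole coaches rather than individual seasons is the important design choice: seasons from the same coach are correlated, so dropping them together gives a more honest picture of how much the estimate depends on a few individuals.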

Conclusion

We found that factors like age, performance, and race were related to sentiment towards players. The overall sentiment on reddit was positive, probably due to moderation policies that remove abusive language. This moderation policy limits our ability to measure overt racism, as those posts are removed.

We only found one piece of evidence of racial bias for NBA players, namely that high scoring white players are well liked. In general, this fits with our personal observations, as players like Luka Doncic and Gordon Hayward (both legitimately great players), receive lots of attention and favoritism on reddit. We do not know the cause of this bias: it could be due to the relative scarcity of white players yielding a novelty factor; or it could reflect unconscious bias by a subset of reddit users. On the flip side, some of the least popular players on reddit are low-skill white players known for a bruising style and dangerous play.

We also found evidence that commenters from cities that supported Clinton had slightly higher sentiment towards the NBA. This could align with recent research that sports are becoming politicized. For example, conservative sentiment towards the NFL has dropped since the national anthem protests. We did not, however, find any change in time for this effect.

We found a significant, large bias against black NBA coaches. Anecdotally, we can think of two successful, young black coaches, Tyronn Lue and Dwane Casey, that have gotten consistent criticism. In contrast, probably the three most popular coaches, Steve Kerr, Brad Stevens, and Gregg Popovich, are all white. It is possible that black coaches systematically differ from white coaches in ways we have not quantified, although we did not detect a significant coefficient for ex-players.

This analysis could be improved by using more refined sentiment analysis, named entity recognition, incorporating other social media sources, and by taking a finer-grained approach to time. For sentiment analysis and NER, we used quick methods like VADER and filtering comments to those about a single player. These analyses could both be improved simultaneously by using a combined sentiment-entity extraction; this would require training a model on a significant amount of labeled data. For other social media sources, we could also try to use Twitter, or perhaps go down to the team subreddit level to gather more data; this would allow us to test the robustness of these results. For time, it would be interesting to analyze sentiment on a game-by-game basis; for example, white players might receive more praise for a high scoring game than black players.

So far, we have only presented results from the NBA, but we also performed a similar analysis for the NFL. We’ll be putting those out shortly.
