...And You Will Know Me By The Trail of Papers

Wednesday, March 22, 2023

Ingredient recommendation with Word2Vec

As a project during paternity leave, I used ML to build an ingredient recommender. The model is relatively simple, using Word2Vec and co-occurrence counts to recommend ingredients.

One interesting observation is that the model appeared to perform better using smaller ingredient embeddings. For example, with 100-dimensional embeddings, "carrot" was similar to "bone," but with 16 dimensions that was less true.

The base embeddings already had some intuitive ingredient recommendations. For example, if you asked for an ingredient that goes with "kiwi and banana," "yogurt" was a top answer. Or if you asked what to do with "lettuce and feta cheese," it suggested a wrap.

To make the embedding into a more fleshed out recommender, I calculated how often ingredients appeared together. This then allowed the recommender to find ingredients that are similar to each other but don't occur often (as those would be more "novel").

On a few examples, this seemed to work, as it recommended "kumquats" for oranges.
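Here is a minimal sketch of that idea: rank ingredients by embedding similarity, but drop any pair that already co-occurs in recipes. The vectors, recipes, and the `max_cooccurrence` cutoff are all made up for illustration; the real recommender uses trained Word2Vec embeddings and may combine the two signals differently.

```python
import math
from collections import Counter
from itertools import combinations

# Toy embeddings and recipes, invented for illustration (not the trained model's data).
EMBEDDINGS = {
    "orange": [0.9, 0.1, 0.0],
    "kumquat": [0.8, 0.2, 0.1],
    "lemon": [0.85, 0.15, 0.05],
    "beef": [0.0, 0.1, 0.9],
}
RECIPES = [
    {"orange", "lemon"},
    {"orange", "lemon", "beef"},
    {"kumquat", "lemon"},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cooccurrence_counts(recipes):
    """Count how many recipes contain each (alphabetically ordered) ingredient pair."""
    counts = Counter()
    for recipe in recipes:
        for pair in combinations(sorted(recipe), 2):
            counts[pair] += 1
    return counts


def novel_recommendations(query, embeddings, counts, max_cooccurrence=0, top_n=3):
    """Most similar ingredients that appear with `query` at most `max_cooccurrence` times."""
    scored = []
    for other, vec in embeddings.items():
        if other == query:
            continue
        pair = tuple(sorted((query, other)))
        if counts[pair] > max_cooccurrence:
            continue  # already a common pairing, so not "novel"
        scored.append((cosine(embeddings[query], vec), other))
    return [name for _, name in sorted(scored, reverse=True)[:top_n]]


COUNTS = cooccurrence_counts(RECIPES)
recommendations = novel_recommendations("orange", EMBEDDINGS, COUNTS)
print(recommendations)
```

In this toy data, lemon and beef are filtered out because they already appear alongside orange, which leaves kumquat as the only "novel" suggestion.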

If you are interested in the details of the training, I wrote up a notebook.

And if you want to try the recommender directly, you can use it here.

Tuesday, April 21, 2020

Salaries for Software engineers appear to have stagnated

Salaries for software engineers appear to have stagnated over the last three years:
* engineers in 2019 made ~$1k more than in 2018
* engineers in 2020 made ~$10k more than in 2019

Salaries are obviously still high, but it appears the explosive growth is over.

To estimate these numbers, I scraped salary data for large companies in 2018-2020, then ran a regression for salary with coefficients for level, company, location, and year.

Most coefficients made sense: Facebook, Google, and Uber paid top of market, and Salesforce was at the bottom.

(I left Microsoft out due to unusual levels)

Surprisingly, the California salary was only ~$11k more than Washington state.

Full table (coefficients in $1,000s):

Term	coef	P>|t|
common_level[senior]	313	0
common_level[staff]	486	0
common_level[swe2]	207	0
company[T.Apple]	-7	0.329
company[T.Facebook]	46	0
company[T.Google]	37	0
company[T.Salesforce]	-94	0
company[T.Uber]	48	0
C(year)[T.2019]	1	0.814
C(year)[T.2020]	13	0.01
cali[T.True]	11	0.009
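To make the table concrete, here is a small sketch of how a prediction would be read off an additive model like this one. It assumes (based on the term names, not on anything stated above) that the omitted company, the year 2018, and non-California locations are the baselines and contribute zero. The function name is hypothetical.

```python
# Coefficients copied from the table above, in $1,000s.
COEF = {
    "level": {"swe2": 207, "senior": 313, "staff": 486},
    "company": {"Apple": -7, "Facebook": 46, "Google": 37, "Salesforce": -94, "Uber": 48},
    "year": {2019: 1, 2020: 13},
    "california": 11,
}


def predicted_salary(level, company=None, year=2018, california=False):
    """Sum the matching coefficients; baseline categories (assumed) add 0."""
    total = COEF["level"][level]
    total += COEF["company"].get(company, 0)
    total += COEF["year"].get(year, 0)
    total += COEF["california"] if california else 0
    return total


# Senior engineer at Google in California in 2020: 313 + 37 + 13 + 11
senior_google_ca_2020 = predicted_salary("senior", "Google", 2020, california=True)
print(senior_google_ca_2020)
```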

Saturday, September 7, 2019

Fortnite Reddit prefers female skins

Fortnite is the most popular game of 2018-2019 (and I would argue one of the best designed games ever). “Skins” in Fortnite are costumes for your character, letting you dress up as an astronaut, bigfoot, or a fish controlling a robot body. The release of each new skin is met with discussion online; recently, a black female skin “Luxe” was released, and people seemed unusually critical of the skin. Since I had all of the code from my NBA and NFL analyses*, I figured I would investigate to see how skin race and gender influence sentiment on Fortnite reddit. In terms of results, the two headlines are:

Raw mentions for skins are primarily driven by time since release, and whether a skin was included in a Battle Pass
Sentiment is higher for female skins

* Don’t let “I already have the code” fool you on a data science project. Most of the work was in collecting the data and cleaning it.

Methods

For details of my methods, see this blog post. In brief, I scraped Fortnite reddit for comments from January 2018 through July 2019, with the help of pushshift.io. I then performed named entity recognition* to identify which posts were about Fortnite skins. To quantify which skins were most liked, I used VADER with a lexicon modified for Fortnite. For covariates, I scraped two Fortnite skins websites, Gamepedia and Progameguides. All notebooks used for the project can be found on github.

* For NER this time, I tried fine-tuning spaCy’s NER model. I labeled entities for ~300 comments. I found that for skins that had 5+ labels, the NER worked fairly well (decent recall on spot-checked posts). However, for skins that had <= 3 labels, the recall was abysmal. Rather than hand-labeling 1,500 comments (5 for each skin), I decided to go back to simple regex extraction for skin names.
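As a rough sketch of the pipeline described above, the snippet below finds skin names with a regex, scores each sentence, and averages the scores per skin. The skin list and comments are made up, and `toy_sentence_score` with its tiny lexicon is only a stand-in so the example runs without dependencies; it is not VADER. With the real library one would presumably pass a function that returns the analyzer's compound score instead.

```python
import re
from collections import defaultdict

# Hypothetical inputs: a tiny skin list and a toy word lexicon standing in for VADER.
SKINS = ["Omega", "John Wick", "Drift", "Skull Trooper"]
TOY_LEXICON = {"love": 0.6, "clean": 0.4, "ugly": -0.5, "sweaty": -0.3}

# Longest names first, so a multi-word name wins over any shorter overlapping name.
SKIN_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in sorted(SKINS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)
CANONICAL = {s.lower(): s for s in SKINS}


def split_sentences(comment):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", comment) if s.strip()]


def toy_sentence_score(sentence):
    """Stand-in scorer: sum of lexicon values, clamped to [-1, 1]."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return max(-1.0, min(1.0, sum(TOY_LEXICON.get(w, 0.0) for w in words)))


def mean_sentiment_by_skin(comments, score=toy_sentence_score):
    """Average sentence-level sentiment over every sentence that mentions a skin."""
    scores = defaultdict(list)
    for comment in comments:
        for sentence in split_sentences(comment):
            mentioned = {CANONICAL[m.lower()] for m in SKIN_PATTERN.findall(sentence)}
            for skin in mentioned:
                scores[skin].append(score(sentence))
    return {skin: sum(vals) / len(vals) for skin, vals in scores.items()}


COMMENTS = [
    "I love the Omega skin. Drift looks ugly though.",
    "omega is so clean!",
]
result = mean_sentiment_by_skin(COMMENTS)
print(result)
```

Note that a sentence mentioning two skins gives both the same score, which is one way the sentence-level assumption can go wrong.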

Analysis of which skins get discussed the most

Before diving into which skins had the highest sentiment, I first wanted to see which skins were discussed the most. Here are the five most commented skins:

Skin	Mentions
Omega	40,300
John Wick	29,900
Drift	29,400
Skull Trooper	23,500
Black Knight	17,900

To understand why these skins are popular, it helps to know a bit about how Fortnite is played. Every 3-4 months, the developers release a “Season,” which includes a “Battle Pass.” The Battle Pass costs around $10, and includes access to a large number of skins that get unlocked as you play. Three of the above skins are from the Battle Pass (Omega, Drift, and Black Knight). John Wick is a skin from cross-promotional advertisement from John Wick 3; and "John Wick" was also the nickname for a Battle Pass skin, The Reaper. Finally, Skull Trooper is an old skin from October 2017 that was the signature skin of a Fortnite streamer, Myth.

The distribution of the number of comments per skin followed an exponential distribution. Here is the number of comments per skin, in rank order (note the log scale of the y-axis):

To better understand what is driving skin discussion, I performed a regression with a target variable of log(skin mentions). The covariates for this regression were:

Skin gender (including non-human skins)
Skin race ("non-human"; or "not visible" for some skins)
Number of days since release
Whether the skin was part of a Battle Pass
Whether the skin was a tier 1 or 100 Battle Pass skin
Whether the skin / character was featured in the Fortnite “story”

The results of this regression were that the primary drivers of skin discussion were time since release, and whether it was part of a battle pass. Skin “demographic” features did not matter. Here are the coefficients and p-values for the different features (coefficient is in log units):

Feature	Coefficient (p-value)
Battle Pass	1.4 (< 0.001)
Tier 1 skin	0.6 (0.17)
Tier 100 skin	1.3 (0.015)
Story skin	2.1 (0.002)
Days since release	0.004 (< 0.001)
Race (black)	-0.3 (0.44)
Gender (male)	-0.1 (0.46)
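Since the target is log(skin mentions), each coefficient is easier to read as a multiplier on mentions. The sketch below does that conversion, assuming the "log units" are natural logs (the post does not say which base was used). The function name is just for illustration.

```python
import math

# Coefficients from the table above, assumed to be in natural-log units.
COEFFICIENTS = {
    "Battle Pass": 1.4,
    "Tier 100 skin": 1.3,
    "Story skin": 2.1,
}
DAYS_COEF = 0.004  # per day since release


def multiplier(coef, units=1.0):
    """Multiplicative change in expected mentions for `units` of a covariate."""
    return math.exp(coef * units)


battle_pass_factor = multiplier(COEFFICIENTS["Battle Pass"])
hundred_days_factor = multiplier(DAYS_COEF, units=100)

for name, coef in COEFFICIENTS.items():
    print(f"{name}: x{multiplier(coef):.1f} mentions")
print(f"100 extra days since release: x{hundred_days_factor:.2f} mentions")
```

Under that assumption, a Battle Pass skin gets roughly four times the mentions of a comparable non-Battle Pass skin, and an extra 100 days since release is worth about 1.5x.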

Analysis of what drives skin sentiment

In addition to analyzing which skins are discussed most, I wanted to understand what drives which skins are liked or disliked. I used VADER to analyze sentiment towards skins on a sentence-by-sentence level, then averaged the sentiment to get average sentiment towards each skin. The sentiment for each sentence can range from -1 to 1. Here are the most liked and disliked skins in the sample (minimum 20 mentions):

Most liked skins		Least liked skins
Skin	Mean sentiment	Skin	Mean sentiment
Straw Ops	0.23	Shaman	-0.11
Psion	0.22	Birdie	-0.06
Scarlet Defender	0.21	Hypernova	-0.05

All of these skins are less popular skins, which highlights one of the biases of this analysis: people who discuss skins may have stronger opinions; and this bias may be biggest for the least discussed skins. Of these skins, only Hypernova is male, which may indicate that female skins have wider variance (both more liked and disliked).

To investigate that hypothesis, we can plot the distribution of sentiment towards both male and female skins. The overall mean sentiment across skins was 0.084, with a standard deviation of 0.58:

While there are both well liked male and female skins, there is a large swath of male skins with neutral opinion (0-0.1).

To complete the analysis, I ran a regression targeting mean sentiment for each skin, with the same features as before. I started with a simple specification with covariates for race and gender. In this specification, the coefficient for gender was significantly negative for male skins (-0.01, ~ 0.15 standard deviations); no racial coefficient was significant. I then ran a complete specification with all covariates, and got similar results.

Covariate	Coefficients for spec 1 (p-value)	Coefficients for spec 2 (p-value)
Race	NS	NS
Gender (Male)	-0.011 (0.012)	-0.012 (0.007)
Battle Pass, etc.		NS

Discussion

In the first part of this analysis, I found that skins featured in the Battle Pass were discussed more often. This makes intuitive sense, as these skins are featured on splash screens and marketing for Fortnite. Many of these skins also have unlockable content, which people discuss how to unlock.

In terms of sentiment, I found that male skins had lower sentiment than female skins. One possible explanation for this is that Fortnite reddit skews towards young men, who might simply be more attracted to female skins. Popular streamers like Daequan often objectify female characters, making comments like, “Gimme that booty!” Another potential explanation is that female skins may have more diverse aesthetics, which allows people who prefer those aesthetics to attach to those skins. For example, many male skins share standard military profiles, and are relatively indistinguishable. In contrast, female skins can express a wider range of emotions, and may have more variety in clothes (skirts, tops, etc.). Some tangential evidence for this may be the large number of male skins with neutral sentiment.

As a final note to myself, if I want to revisit this type of analysis in the future, I need to improve the sentiment analysis. While I believe the assumption of my current model (that sentence-level sentiment reflects skin sentiment) is broadly true, in checking my data I found that a large minority of samples have inaccurate sentiment. While performing more sophisticated sentiment analysis may take more time, it should give me a better estimate of entity sentiment, and frankly feel less hacky.

Thursday, August 22, 2019

Language representation in MUSE embeddings

I just posted a notebook to github that explores how language is represented in MUSE word embeddings.

Friday, August 9, 2019

Sentiment analysis does not find bias on NFL reddit

I previously published results using sentiment analysis to show that commenters on the NBA reddit had higher sentiment towards young players, high-scoring white players, and white coaches. In this post, I am going to extend that analysis to the NFL reddit. My overall finding is that like the NBA, NFL redditors like rookies and players that gain more yards. However, I did not find any significant coefficients for race. For coaches, I found the commenters like coaches that outperform expectations, but again did not find evidence of bias.

Brief review of the method

To try to understand what factors (e.g. performance, age, race) influence player popularity, I scraped millions of comments from r/NFL from 2013-2018. I then quantified each commenter's opinion toward players using the sentiment analyzer VADER. This analysis calculates whether a word is positive (“GOAT!”) or negative, and ties that feeling to a player. Sentiment scores generally ranged from -0.2 to 0.3. Finally, I quantified the impact of each factor on popularity by performing a least-squares regression with an outcome variable of sentiment towards a player. Details of the analysis are in the previous post, and in a series of notebooks covering scraping data, sentiment quantification, and regression.

Unique difficulties of performing sentiment analysis

Compared to the NBA, two factors make analyzing the NFL more challenging: the larger number of players; and the smaller number of players with comparable stats.

The larger number of players made resolving which player was being talked about more difficult (named entity recognition). I performed named entity recognition by identifying comments that contained player first or last names, then matched those names to players. If an NBA commenter mentioned "Blake," I could link that comment to Blake Griffin since there is only one active "Blake" in the NBA. However, in the NFL, "Blake" could refer to Blake Bortles, Blake Bell, Blake Jarwin, or others. This means that the best way to identify comments about NFL players is to find full name matches, in contrast to how people normally talk about players (just their first or last name). A second side effect of the larger number of players was that comment-player matching took longer, as my implementation took O(n^2) time.
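To illustrate the ambiguity problem, here is a small sketch of name matching that only accepts a name when it points to exactly one player. The roster is a hypothetical mini-example, and the dictionary index is just one way to avoid scanning every player for every comment; it is not the implementation I actually used.

```python
from collections import defaultdict

# Hypothetical mini-roster for illustration.
ROSTER = ["Blake Bortles", "Blake Bell", "Blake Jarwin", "Vontaze Burfict"]


def build_name_index(roster):
    """Map each lowercase full, first, and last name to the players it could mean."""
    index = defaultdict(set)
    for player in roster:
        first, last = player.split(" ", 1)
        for key in (player, first, last):
            index[key.lower()].add(player)
    return index


def match_players(comment, index):
    """Return the players that a comment names unambiguously."""
    tokens = [t.strip(".,!?'\"").lower() for t in comment.split()]
    # Check single words and adjacent word pairs (for full names).
    candidates = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    matched = set()
    for key in candidates:
        players = index.get(key, set())
        if len(players) == 1:  # skip names shared by several players
            matched |= players
    return matched


INDEX = build_name_index(ROSTER)
ambiguous = match_players("Blake was awful today", INDEX)
resolved = match_players("Blake Bortles was awful today, unlike Burfict.", INDEX)
print(ambiguous, resolved)
```

A bare "Blake" matches nothing here because three players share it, while the full name and the unique last name both resolve.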

The nature of the NFL also made it harder to compare player stats. In basketball, all players score, rebound, and assist. In the NFL, half the players play defense; and on offense, linemen rarely touch the ball, so I ended up only analyzing skill-position players. This means that despite the NFL having more players overall, there were fewer players to analyze, and the statistical power of the analysis was lower.

Most and least popular players

To try to compare players between different positions (QB, RB, WR, and TE), I used Football Outsiders' advanced metric DVOA, which is a defense-adjusted z-score of yards above average. For example, if a QB and WR both had a DVOA of 1, they would both be one standard deviation better than the average player at their position.
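The "one standard deviation better at their position" idea can be sketched as a within-position z-score. The yardage numbers below are made up, and this only illustrates the standardization step as described above; it does not reproduce how Football Outsiders actually computes DVOA (there is no defense adjustment here).

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up yardage numbers for illustration only.
PLAYERS = [
    ("QB A", "QB", 4000), ("QB B", "QB", 3000), ("QB C", "QB", 3500),
    ("WR A", "WR", 1200), ("WR B", "WR", 800), ("WR C", "WR", 1000),
]


def zscore_by_position(players):
    """Standardize each player's value against the mean and std of their own position."""
    by_position = defaultdict(list)
    for _, position, value in players:
        by_position[position].append(value)
    stats = {pos: (mean(vals), pstdev(vals)) for pos, vals in by_position.items()}
    return {
        name: (value - stats[pos][0]) / stats[pos][1]
        for name, pos, value in players
    }


Z = zscore_by_position(PLAYERS)
for name, z in Z.items():
    print(f"{name}: {z:+.2f}")
```

In this toy data the best QB and the best WR end up with the same z-score, even though their raw yardage differs by thousands of yards.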

Here are the most and least popular players for the years 2013-2018:

Lowest Sentiment Seasons			Highest Sentiment Seasons

Player	Year	Avg Sentiment	Player	Year	Avg Sentiment
Michael Bennett	2017	-0.25	David Johnson	2016	0.24
Danny Trevathan	2015	-0.15	JJ Watt	2017	0.22
Vontaze Burfict	2015	-0.15	JJ Watt	2016	0.22
Vontaze Burfict	2016	-0.14

On the unpopular side, Michael Bennett in 2017 was one of the first players to kneel during the national anthem; and Vontaze Burfict has a reputation as a dirty player. For the popular players, all three seasons belong to Pro Bowlers. While I am not an avid NFL fan, these results pass my smell test.

Regression results

As I did with the NBA, to quantify which features were important, I ran a weighted least squares regression with clustered standard errors at the player level. Due to the lack of features, I only present two specifications here. First, I ran a regression with DVOA as the only feature (spec 1). This single coefficient was statistically significant (t-stat of 2.1), albeit weakly, and shows people prefer high-achieving players. I then ran a second regression adding features for age and race for players, and city-level statistics for commenters (spec 2). In this regression, DVOA was no longer significant, while the rookie coefficient was. Neither the overall age nor the race coefficient was significant.

Coefficient	(1)	(2)
DVOA	0.0057 (2.1)	0.0051 (1.4)
Rookie	-	0.018 (2.4)
1 year of youth (<27)	-	0.0011 (0.7)
Race (white)	-	0.0067 (1.2)

Compared to the NBA results, fewer coefficients were significant, and the t-statistics were smaller. I believe this is in part due to the problems mentioned previously: difficulty in matching players to comments; and fewer player performance features to compare with.

Beyond statistical power, this analysis may be complicated by the assumption that NFL fans can (or do) compare players from different positions fairly. In this analysis, different positions are mixed together, but we are using a synthetic stat (DVOA) to compare them. While two players may be equivalent by DVOA, it may be hard for a casual fan to see that; instead they likely consider more basic stats, which could skew results (e.g. quarterbacks gain more yards than tight ends). While I had a categorical variable for position which was not significant, there may be more subtle influences.

Another difference between the NFL and NBA is that NFL players wear helmets, making personal connections weaker.

NFL coaches

We can also calculate sentiment towards NFL coaches. Here are the most popular and least popular coaches:

Coach	Year	Avg Sentiment	Coach	Year	Avg Sentiment
Sean McVay	2017	0.28	Gregg Williams	2017	-0.29
Adam Gase	2016	0.27	Gregg Williams	2016	-0.23
Todd Bowles	2015	0.27	Sean Payton	2015-2016	-0.11
Marc Trestman	2013	0.24	Mike Smith	2013	-0.08

The most popular coach was a young, rookie coach who improved the Rams' wins by 7 in a single year. On the unpopular coach side, Gregg Williams was involved in "bountygate," a scandal where coaches were paying players for causing injuries.

Having looked at the most and least popular coaches, we can again perform a regression:

Coefficient	Magnitude (t-statistic)
Win % over expectation	0.22 (2.1)
Age (years)	-0.2W (-2.0)
Tenure with team (years)	0.25W (1.95)
Race (white)	Not significant

(For non Win coefficients, I expressed the magnitude in terms of wins.)

As with NBA coaches, NFL coaches were more popular when they outperformed expectations, were younger, or had longer tenure. In contrast to the NBA, where there was a significant and large bias against black coaches worth ~10 wins / year, we could not detect bias against NFL coaches. This could be due to a lack of power (there are many fewer black coaches in the NFL than in the NBA); reflect differences in media coverage; or indeed reflect decreased bias among NFL commenters. It is interesting that NFL coaches are perceived to be much more important than NBA coaches, yet bias is less prevalent.
