...And You Will Know Me By The Trail of Papers

OR: o.O
By Mike (http://www.blogger.com/profile/13460212509225238067)

Ingredient recommendation with Word2Vec
Posted 2023-03-22

As a project during pat leave, I used ML to build an ingredient recommender. The model is relatively simple, using Word2Vec and co-occurrence counts to recommend ingredients. One interesting observation is that the model appeared to perform better using smaller ingredient embeddings. For example, with 100-dimensional embeddings, "carrot" was similar to "bone." But with 16 dimensions, that was less
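The co-occurrence half of that idea can be sketched in a few lines. This is only an illustration under stated assumptions: the recipes are made up, `build_cooccurrence` and `recommend` are hypothetical names, and the Word2Vec half of the model is not shown.

```python
from collections import Counter
from itertools import combinations

# Toy recipes, made up for illustration (not the post's dataset).
RECIPES = [
    {"carrot", "onion", "celery", "chicken"},
    {"carrot", "onion", "potato"},
    {"onion", "garlic", "tomato"},
    {"garlic", "tomato", "basil"},
    {"carrot", "celery", "onion"},
]


def build_cooccurrence(recipes):
    """Count how often each unordered pair of ingredients shares a recipe."""
    counts = Counter()
    for recipe in recipes:
        for a, b in combinations(sorted(recipe), 2):
            counts[(a, b)] += 1
    return counts


def recommend(ingredient, counts, top_n=3):
    """Rank other ingredients by how often they co-occur with `ingredient`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == ingredient:
            scores[b] += n
        elif b == ingredient:
            scores[a] += n
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


COUNTS = build_cooccurrence(RECIPES)
print(recommend("carrot", COUNTS))  # ['onion', 'celery', 'chicken']
```

Raw counts favor ingredients that appear everywhere (onion, in this toy data), so a real recommender would probably normalize them in some way.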

Salaries for Software engineers appear to have stagnated
Posted 2020-04-21

Salaries for software engineers appear to have stagnated over the last three years:
* engineers in 2019 made ~$1k more than in 2018
* engineers in 2020 made ~$10k more than in 2019
Salaries are obviously still high, but it appears the explosive growth is over. To estimate these numbers, I scraped salary data for large companies in 2018-2020, then ran a regression for salary with coefficients for level, company,

Fortnite Reddit prefers female skins
Posted 2019-09-07

Fortnite is the most popular game of 2018-2019 (and I would argue one of the best designed games ever). “Skins” in Fortnite are costumes for your character, letting you dress up as an astronaut, bigfoot, or a fish controlling a robot body. The release of each new skin is met with discussion online; recently, a black female skin “Luxe” was released, and people seemed unusually critical of the

Language representation in MUSE embeddings
Posted 2019-08-22

I just posted a notebook to GitHub that explores how language is represented in MUSE word embeddings.

Sentiment analysis does not find bias on NFL reddit
Posted 2019-08-09

I previously published results using sentiment analysis to show that commenters on the NBA reddit had higher sentiment towards young players, high-scoring white players, and white coaches. In this post, I am going to extend that analysis to the NFL reddit. My overall finding is that, like NBA redditors, NFL redditors like rookies and players who gain more yards. However, I did not find any

What makes athletes popular? A sentiment regression analysis
Posted 2019-02-16

By Michael Patterson and Matt Goldman

(Standard disclaimer: The analyses contained here were done on personal time, and do not reflect the views of our employers.)

The 2018 Cleveland Cavs made the NBA finals while generating some of the hottest memes of 2018 ("We got an [expletive] squad now," "He boomed me."). Following the Cavs on Reddit, I (Mike) noticed something odd. Turkish rookie

Paper trail day trip: Compositional Coulee
Posted 2018-02-16

The distributional hypothesis states that the meaning of a word is related to the words that usually surround it. For example, "table" is usually near words like "sit," "chair," and "wood," so we can get a sense of what "table" means. For phrases, you can usually figure out a phrase's meaning by combining the meanings of its constituent words. For example, "table tennis" has something to do with
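To make the distributional hypothesis concrete, here is a minimal sketch that represents each word by counts of its neighbors and compares words with cosine similarity. The corpus is made up, the window size is an arbitrary choice, and `context_vectors` and `cosine` are hypothetical helper names, not anything from the paper.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
CORPUS = [
    "we sit at the wood table on a chair",
    "we sit at the wood desk on a chair",
    "the dog chased the ball in the park",
]


def context_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


VECTORS = context_vectors(CORPUS)
# "table" and "desk" appear in identical contexts in this toy corpus,
# so they score higher than "table" and "dog".
print(cosine(VECTORS["table"], VECTORS["desk"]))
print(cosine(VECTORS["table"], VECTORS["dog"]))
```

Even in this toy case the common word "the" gives "table" and "dog" some overlap, which is one reason raw counts are usually reweighted or replaced by learned embeddings.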

Paper trail day trip: Lac Lison
Posted 2017-12-28

(Blogger note: I am now a Data Scientist in Microsoft Support Engineering, working with natural language processing (NLP). I have been reading NLP papers for work, and rather than just post summaries to our Teams (Microsoft Slack) channel, I figured I could summarize them here for my future self, and others.) I use word embeddings regularly at work, specifically word2vec. Word2vec creates

Using KDTrees in Apache Spark
Posted 2017-01-01

OR: The best Coffee Shop in Hong Kong to catch Pokemon

I work with spatial data all the time, and one of the most common things I do with spatial data is find the nearest locations between two sets of objects. For example, in the context of Pokemon Go, you might ask, "what is the nearest Pokestop to a given Pokemon?" The standard way to do this is to use a data structure called a KDTree. In the
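For readers who have not met the data structure, here is a minimal pure-Python sketch of a KD-tree with a nearest-neighbor query. It is not the Spark approach from the post: the coordinates are invented, and `build_kdtree`, `nearest`, and `euclid` are hypothetical names chosen for illustration.

```python
import math


def euclid(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])           # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # the median point splits the space
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def nearest(node, target, depth=0, best=None):
    """Return (distance, point) for the stored point closest to `target`."""
    if node is None:
        return best
    point, left, right = node
    dist = euclid(point, target)
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best


# Hypothetical Pokestop coordinates and one Pokemon location.
POKESTOPS = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
TREE = build_kdtree(POKESTOPS)
distance, stop = nearest(TREE, (9.0, 2.0))
print(stop)  # (8.0, 1.0)
```

The pruning step is what makes the tree worthwhile: whole branches are skipped whenever the splitting plane is farther away than the best match found so far.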

A simple GUI for analyzing BioDAQ data
Posted 2016-03-31

The Palmiter lab often monitors food and water intake in response to a variety of stimuli. To quantify these measurements, we house mice in BioDAQ chambers, which will record how much a mouse eats or drinks down to 0.01g. While the chambers are nice, the software that comes with them is terrible. In addition to being slow, it outputs data for each cage separately, which means you get to enjoy

A simple GUI for analyzing thermal images
Posted 2016-03-28

One of the grad students in the lab has started a project on thermoregulation, and he measures mouse tail temperature using an infrared camera from FLIR. FLIR has an analysis tool for its cameras which works OK, but is not really designed for analyzing hundreds of images. To save him some time, I made a simple GUI for analyzing thermal images. In this post I'm going to outline the design of the

Introducing the mechanisms Twitter bot
Posted 2016-01-16

When describing what we know about the world, scientists often have to state what we don't know. Rather than simply stating, "We don't know how X works," scientists (and especially biologists) have come up with the beautiful syntax, "The mechanisms underlying X are not yet understood." Why use five syllables when you could use 14! In a previous blog post, I explored the history of this syntax,

Exploring random forest hyper-parameters using a League of Legends dataset
Posted 2015-12-28

Over the last few blog posts, I have used random forests to investigate data from the game League of Legends. In this last post, I will explore model optimization. Specifically, I will look at how hyper-parameters like forest size and node size can influence classification accuracy, show that dimensionality reduction doesn't help random forests, and compare random forest performance to Naive

Influence of region, skill and patch on predictability in League of Legends
Posted 2015-11-20

Last month, I used random forests to predict the winner of League of Legends games. Since then I have downloaded datasets from different regions, different ELOs, and on different patches, and checked how these factors influence predictability. This post will summarize the results, but for details of the analyses, please see this notebook.

Differences between regions

Korea is famed

Visualizing champion difficulty in Jupyter
Posted 2015-10-22

I've been playing around some more with League of Legends analysis and Jupyter. This time I scraped champion information from League of Graphs, and calculated which champions were the easiest / hardest to play, and which champions are best in the early / late game. Unfortunately, Blogger does not interact with Jupyter, so I have posted the notebook on nbviewer, if you are interested in taking a

Playing in random forests in League of Legends
Posted 2015-10-16

I've wanted to learn more about machine learning, specifically Python's scikit-learn module. I'm an avid League of Legends player (summoner names lemmingo and Umiy), and Riot Games provides a thorough API for querying game data, so I decided to explore machine learning using League of Legends (LoL). Specifically, I wanted to see if I could predict the eventual winner of a game long before the

Walk Along the Paper Trail: Garfield Gap
Posted 2015-09-21

It's been three years since I did a Walk Along paper summary! Wow! Recently in our journal club, we discussed a paper by Garfield et al. from the Lowell lab. In discussion, some unusually interesting points were raised, and I'd like to think about them here.

Background

I've written about this before, but here is a quick refresher on hunger in the brain. Many types of neurons control metabolism,

The catch-22 of recording from identified cell populations
Posted 2015-06-11

Recording from individual cells in genetically identified populations is the hottest technique in systems neuroscience right now (I am, of course, totally biased since that's what I'm trying to do). To record from identified populations, you first choose a mouse line that expresses Cre driven by a cell-specific marker like D1R. Then you transduce those cells with floxed ChR2 or GCaMP so

Playing with deconvolution and GCaMP6 imaging data
Posted 2015-05-18

The Palmiter lab recently got an Inscopix microscope. We are still troubleshooting our surgeries and recordings right now, so we don't have any imaging data yet. Given that, I wanted to set up our analysis pipeline ahead of time. Specifically, I wanted to see how we can identify calcium events. In this post I will explore how well deconvolution works on calcium imaging data from the Svoboda lab.
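As a rough illustration of what deconvolution means here, the sketch below assumes the fluorescence trace is a spike train convolved with an exponential decay (a first-order autoregressive model) and inverts that model directly. It is not the method used in the post: the decay constant, threshold, and spike train are invented, the trace is noise-free, and `simulate_calcium` and `deconvolve` are hypothetical names.

```python
def simulate_calcium(spikes, gamma=0.9):
    """Convolve a spike train with an exponential decay: c[t] = gamma * c[t-1] + s[t]."""
    trace, c = [], 0.0
    for s in spikes:
        c = gamma * c + s
        trace.append(c)
    return trace


def deconvolve(trace, gamma=0.9, threshold=0.5):
    """Invert the decay model and keep only events larger than `threshold`."""
    events, prev = [], 0.0
    for c in trace:
        s = c - gamma * prev   # the part of the signal that decay does not explain
        events.append(s if s > threshold else 0.0)
        prev = c
    return events


# Made-up spike counts per frame (assumption, not real imaging data).
SPIKES = [0, 0, 1, 0, 0, 0, 2, 0, 0, 1, 0, 0]
TRACE = simulate_calcium(SPIKES)
EVENTS = deconvolve(TRACE)
print([i for i, e in enumerate(EVENTS) if e > 0])  # [2, 6, 9]
```

On real recordings this naive inverse filter amplifies noise, which is why practical methods add constraints such as non-negativity and sparsity instead of a fixed threshold.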

Location: Mount Rainier, Mount Rainier National Park, Washington 98304, USA

A cheap source for 230 μm ferrules
Posted 2015-05-06

UPDATE: Somewhat embarrassingly, and fortuitously, I hadn't researched enough ferrule suppliers before making this post. After contacting a couple more, I found a supplier that sells 230 μm ID ferrules for $1.5 / pc. It is the Shenzhen Han Xin Hardware Mold Co. They do not have the 230 μm ID ferrules listed on their Alibaba page, but you can contact them for a custom

Where's Bregma?
Posted 2015-01-29

OR: A fun game for every surgery!

If you're interested in any of the important parts of the brain (i.e. not cortex), you're going to need to do stereotaxic surgery to target the area you're interested in. And to do stereotaxic surgery, you need fiducial coordinates on the skull, which canonically are bregma and lambda. You would think that the definitions of bregma and lambda are well known,

The (Near) Future of Cell Type Specificity
Posted 2015-01-08

I have been growing more interested in genomics, so this quarter I took a class on new techniques in genomics.* What I learned is that the most important aspect of modern genomics is cost. Sequencing gets exponentially cheaper every year, which makes answering old questions less expensive, and opens up new experimental possibilities. For example, for cancer, in the past doctors might

Deciphering the syntax of Nature article titles
Posted 2014-11-20

We all want to publish in Nature. Papers in Nature are (supposed to be) the complete package: reliable results that show something novel; cool techniques; a famous corresponding author. And if you want to get one, you need a title that shows you are a refined gentleperson who belongs in the Nature club. So to help you, dear blog reader, I have scoured the archives of Nature* to decipher the

Channelrhodopsinning: Your light doesn't always do what you want.
Posted 2014-11-17

Two years ago, I wrote a post about the common mistakes I notice in channelrhodopsin papers. Since then, two labs have developed improved photoactivatable chloride channels for inhibition, binary logic has been introduced to neurons, and you can use one vector to photostimulate and record from a neuron type. Beyond those headline advances, though, were some smaller papers that highlight some of

Questions of Taste
Posted 2014-02-03

Some time ago, Neuroecology asked people in the twitterverse what the biggest questions in their field are. While I'm no longer working on taste, these are the three big questions to my mind. Take these with a rock of salt, as I'm not 100% up to date on the literature, and this is just my opinion.

0. To what extent, and where, is taste a labeled line system versus a combinatoric one?
