Monday, August 22, 2011

Compendium of Analyses, Part II, Ensembles

A couple weeks ago, I listed a variety of standard analyses which can be used to characterize single neurons. Today, I'm going to cover analyses that describe how populations of neurons encode information. These analyses are more complex than the single-cell analyses, and I do not know the details of how they are all implemented. Unlike the single-cell analyses, which are all from a similar Weltanschauung, each population analysis requires a slightly different perspective.


The simplest population analysis requires looking at the smallest population: two neurons. One way to do this is via cross-correlation, which answers the question, "how often do two neurons fire near each other in time?"

To do this, you start with the spike trains of two neurons.  For each spike neuron A fires, you identify the spikes that neuron B fires around the same time, and note the time difference. As you repeat this, you will build a histogram of these time differences, centered on t=0 lag. If two neurons' spikes are uncorrelated, the histogram will be flat, as the neurons fire at random times; if the neurons' firing is correlated, you will see peaks in the histogram.

Cross-correlation between two gustatory cortex neurons. The two neurons' firing is normally uncorrelated (thin traces), but when two tastants are applied, they become correlated (purple and blue lines).
From Katz et al, 2002
Given the simplicity of this analysis, you would think it's trivial to implement, but it's not.  I was playing around with some olfactory data, and looked into using MATLAB's xcorr() function, given there's a neuroscience blog named after it. And xcorr works great, for analog data. However, action potentials are digital. Someone else in the lab had implemented an autocorrelation using pdist(), but that doesn't work for pairs of neurons. So I had to root around the internet for a quick and dirty implementation of cross-correlation for spikes. You would think this would be standard by now.


When you think about action potentials, it's natural to take a cell-centric view, and think about how ions flow in and out of cells. From an expanded perspective, though, large groups of neurons can significantly effect the electrical milieu around them.  This is called the local field potential (LFP), which you can measure during extracellular recording. The LFP typically oscillates; in the olfactory bulb, the most prominent oscillations are at the gamma frequency.

While the LFP does not reflect population coding in the traditional sense, it does reflect population firing, and the modulatory state.  Modulatory centers can change the LFP's amplitude or frequency, both changing how individual neurons fire, and the environment all neurons fire in.

(Update from July 2012: Looking back on this, it's  embarrassing that I didn't mention the actual analyses you can use to look at LFPs. In any case, for the curious, the basic ones are: power spectrum analyses of epochs (using FFTs), spectrograms (via wavelet decomposition or FFTs), coherence/correlation, and spike-triggered LFPs (and vice-versa)).

Population response vectors and PCA

For single-cell analyses, you typically describe neurons in terms of what stimuli they respond to. On the population level, you need to reverse this, and ask, how is a stimuli encoded by the population?

The easy way to do this is to create a population response vector for different stimuli. To do this, you calculate the firing rate of each neuron you recorded from following a stimulus. Then you take all the firing rates, and put them into a vector, which gives you the population spike response.  Then you repeat this for different stimuli, or time points.  To find out how similar or different the representation of two stimuli are, you just subtract the population vector for one stimuli from the other to get the population spike distance.

Schematic of population spike response. Each cell responds to a stimuli (top row).  You convert these responses into a single number, the firing rate, then put these responses into a vector, where each row is a cell. You can then look at how the population representation changes over time, or with different stimuli, by subtracting one population vector from another.
From Bathellier et. al., 2008
Population spike vectors can get unwieldy for large number of neurons, so people often reduce the dimensionality via principal component analysis (PCA). I have covered PCA before, but the basic idea is that the responses of different neurons in the population vector are correlated, and you can create artificial variables called "principle components" that include this correlation.  If you are lucky, the first few components will explain a majority of the response.

Once you have the population vectors (or principle components) for a a set of responses, you can really have fun. For example, you can see whether the population spike distance between odors is correlated with their perceived similarity. Or you can see how the principal components of an odor response change over time, forming dynamic cycles (below). Principle components are a great way to make data easier to visualize and manipulate.

This shows the PCA response over time to different odor mixtures in the zebrafish. Trajectories start at the arrowheads. The trajectories for the different stimuli generally follow two trajectories.
From Niessing and Friedrich, 2010
Hidden Markov Models

Another way to think about the population response is that neuronal firing represents a "brain state." For taste, this state could be, "I am tasting something sweet." This idea of neuronal firing reflecting internal states can be represented by a Hidden Markov Model (HMM). You assume that the animal has an internal state that you do not know (it is "hidden"), and try to infer the state by the firing patterns of neurons. The math of how this is done is complex, but it involves making assumptions about what states are hidden, and then testing different states to see if they fit the data better.

The beauty of HMMs is that rather than worrying about receptive fields, and firing rates, you simply try to measure the "state" of a set of neurons. This further frees you from trying to guess when states start and end, and lets you find state transitions as they naturally occur. The downside is that it is more difficult to interpret what these states mean in human terms.

A. Spike trains from 10 neurons in GC in response to sucrose. Different states identified by HMM are numbered 1-4. B. Four more example responses. Note that the state change occurs at different times, something that would be missed by typical PSTH analyses. C. Firing rates of the neurons in each state.
From Jones et al, 2007.
Hidden Markov Models have been uncommonly used in neuroscience. I tried implementing them in MATLAB for some of our data. MATLAB has an HMM function (hmmdecode), but like xcorr(), it is optimized for single observations of analog data. All of our olfactory bulb recordings are multiple observations of digital data. At some point I'm going to have to write some code myself, or ask the Katz lab for it, to see if it yields anything interesting.*


That's all for today. To be over-reductionist, I would say most analyses are some variation on receptive fields (single neurons) and population response vectors. There are obviously many more analyses out there, like stimulus (odor) prediction, or spectral power, which I will save for when I better understand them. Hopefully this is useful as an inventory of all the simple, standard things you can ask of data.

* It amazes me how much time people spend re-implementing simple techniques. For example, implementing a HMM for multiple observations is a moderately tricky thing, but should be general enough to be useful for anyone analyzing multiple spike trains. Yet, there's no code on the internet.

There are some reasons for this. No one wants to put their code out and find there's a typo. And each lab's experiments are different enough to not make them completely generalizable. Shoot, I don't have any code out there myself. But reusable code for simple things like HMM, or cross-correlation would save man-years of time. Which is why I really hope the NIH develops better software for neuroscientists.


  1. Xcorr should work fine for 'digital' signals. After all, if you are working in Matlab, you are never working with a truly analog representation of the data (unless you are doing your analysis on symbolic representations, which would be interesting but unusual). We give Matlab our sampled data, which is digital.

    So I think you mean it won't apply for data represented as a point process (a list of spike times). Chronux ( implements cross correlation for point processes as pxcorr. I haven't tried it yet to see how different it looks from simply using xcorr with binned data.

    However, you can avoid all that and just use xcorr easily. Just take your vector of spike times, create a histogram of that data for your two neurons, and run them through xcorr.

    Once you have your histogram-representation of the data (formed by running 'binned_data=hist(data,bin_centers)' on your time-stamp representation of your data), just do something like:
    cc=xcorr(data1, data2, max_lags)
    where max_lags is the number of bins before and after zero lag that you want to estimate the cross-correlation on.

    Then plot it with
    bar(lag_bins, cc,1,'k');

    Does that make sense? Let me know if it doesn't, I didn't put in every step obviously.

    That leaves out stuff about significance, shift-predictors, and such. I'm just starting to look into that. Right now I'm just plotting my data.

    You are tight, though, it is written for my specific data in a specific format. I do have psth significance algorithm and classification suite of algorithms I wrote that other people might be able to use, but it takes so long to do that it is easier to just keep it in an 'in house' format. Plus, if you find errors then it becomes a pain to keep track of everything.

  2. Concerning the HMM, if you're too lazy to implement vector output, you can use a standard HMM with scalar output with N = number of neurons + 1 symbols. Use small bins (say, .5ms) and encode the vector output at time t with scalar symbols as follows:0 -> no spike, 1 -> spike from neuron 1, 2-> spike from neuron 2 etc. In the off chance there's two spikes in a single time step assign the spike to one of the neurons at random. Then you can use standard HMM toolboxes, like this one (dhmm subtype).

  3. Wow, comments!

    Re: cross-correlations, I posted about this on Google+, and Tim Hanson suggested the same histogram trick you did, and Stephen Shepherd recommended Chronux. In the end I found this script which does it: The run time is obviously worse for large data sets, but it's quick with what I've been doing.

    Re: HMM. I tried the scalar trick on an old set of data when I was looking at this, but I summed the spike collisions instead of assigning them randomly; random would probably be better. If I remember correctly, the result was a single state, which probably means I did it wrong, or I had bad data. I've recently got more clearcut responses which would probably be more useful.