Monday, May 18, 2015

Playing with deconvolution and GCaMP6 imaging data

The Palmiter lab recently got an Inscopix microscope. We are still troubleshooting our surgeries and recordings right now, so we don't have any imaging data yet. Given that, I wanted to set up our analysis pipeline ahead of time. Specifically, I wanted to see how we can identify calcium events. In this post I will explore how well deconvolution works on calcium imaging data from the Svoboda lab.

Why deconvolution

Inscopix provides image analysis software, Mosaic, which is pretty good at motion correction, and identifying ROIs. For Mosaic's signal detection, it identifies events using the simple criterion:
ΔF / F > F0 + 3SD
This works adequately for low frequency events on a low background, but completely ignores the magnitude of events, or what happens when events occur in quick succession. While I've done imaging before, I've never done deconvolution, so I thought it would be fun to try it out.

To get sample GCaMP6 imaging data, I turned to the CRCNS, which has a bunch of neuroscience datasets. The most relevant dataset was from the original GCaMP6 paper wherein they recorded calcium fluorescence in parallel with loose seal electrophysiology, to allow comparisons between calcium and spikes. For this example, I loaded the processed data from the GCaMP6s imaging dataset. If you don't care about loading data in python, you can skip the next section and head straight to Deconvolution time!

Loading Svoboda lab data into python

The data is stored in .mat files using the Svoboda lab data format. Since understanding someone else's data structure is as difficult as understanding their musical taste, I'll walk through how to load the data in python. To load the data, you can simply use the function, with the following arguments:

import as scio

svoboda_data = scio.loadmat( 'data_20120521_cell5_003', squeeze_me = True, struct_as_record = False)

It is critical to use the squeeze_me and struct_as_record flags! If you don't, python will load a weird object that will crash your python instance. This took me an embarrassingly long time to figure out.

svoboda_data is a dictionary with four fields; the only one we care about is the 'obj' field, so we can extract it:

svoboda_obj = svoboda_data['obj']

svoboda_obj has a bunch of fields as well; the most important one is timeSeriesArrayHash.value, which contains the data. The value field is an array of five structs, number 0-4, which have:

0: average fluorescence of an ROI
1: average fluorescence of the surrounding neuropil
2: raw e-phys trace from the cell
3: high-pass filtered e-phys
4: detected spikes

The value struct has two fields we care about, .time (which has the... time) and .valueMatrix (which has the values of the fluorescence or voltage). Knowing this, we can load the fluorescence and spike data:

ca_data = svoboda_obj.timeSeriesArrayHash.value[0].valueMatrix

ca_time = svoboda_obj.timeSeriesArrayHash.value[0].time

spike_data = svoboda_obj.timeSeriesArrayHash.value[4].valueMatrix

spike_time = svoboda_obj.timeSeriesArrayHash.value[4].time

I wrote a small wrapper function in python to load full Svoboda lab experiments. You can then plot the calcium fluorescence and spike times on each other:
Fluorescence (blue trace, DF / F) and spike data (green) from the GCaMP6s data set, recorded May 15, 2012, cell 1.
For this cell, most spikes generated calcium events, and multiple spikes in a short time period generated larger calcium events. However, some spikes did not yield much (if any) calcium response. Calcium events, in general, had mostly exponential decays.

Deconvolution time!

Now that the data is loaded, we can deconvolve it. Thankfully, Alistair Muldal has implemented the fast non-negative deconvolution described in Vogelstein, 2010 (PyFNND. To be honest, I only partially understand the deconvolution algorithm. It starts by assuming that calcium can be modeled by spikes that return to baseline with exponential decay, but how they turn that generative model into a fit went beyond me.

In any case, we can run the PyFNND deconvolution on our calcium trace (I've calculated the ΔF / F separately), and get both a fit of the calcium trace, and an imputed spike train:

from pyfnnd import deconvolve, plotting 

n_best, c_best, LL, theta_best = deconvolve(
df_f.reshape, dt=0.0166, verbosity=1, learn_theta=(0, 1, 1, 1,0) )

plotting.plot_fit(df_f.reshape(-1, 1).T, n_best, c_best, theta_best, 0.0166)
Top: Original fluoresence (blue) and fit fluorescence (red), given the spike heuristics shown below.
Bottom: Imputed spike heuristics for the calcium trace shown above. Note that n_hat does not give a spike train, but rather a spike heuristic.
I played around with the theta parameters to see if I could get a better fit, but the best results I got were with the default ones above.

To get a spike train, you need to threshold the spike heuristic n_hat. Here I used a threshold of 0.1 (to avoid getting false positive spikes), and plotted the imputed spike train, real spike train, and calcium signal.

The imputed spike train (red) for a threshold of 0.1, and the real spikes (green).
Given how often the real spikes did not generate clean calcium events, I feel like the deconvolution did a pretty good job identifying spikes. The deconvolved spikes were often slightly delayed compared to the actual ones (although this could probably be fixed).

To get a sense for how accurately the deconvoluted spike train matched the real spike train, I calculated the Victor-Purpura distance (VP distance) between the two. The VP distance calculates how many times you would have to insert, delete, or move a spike to turn one spike train into another. VP was handily implemented by Nicolas Jimenez in the python module fit_neuron. (Note, if you want to use this module you should download the .tar.gz from Github, as the pip install has a bug at the moment.) I also wanted to use this metric to get a better sense of the optimal threshold for n_hat, so I ran the VP distance for thresholds of n_hat ranging from 0 to 1. I used a cost of q=5, so that spikes would be inserted or deleted only if there was not a nearby spike within 200 ms.

from fit_neuron.evaluate import spkd_lib as spkd

vp_cost = 5

spk_times = ca_data.spike_time[ca_data.spike_train>0.5]

vp_dist = spkd.victor_purpura_dist(ca_data.fluor_time[n_best >thresh], spk_times, vp_cost)

Similarity between the imputed spike train and real spike train. The normalized VP distance is the VP distance divided by the total number of real spikes. The threshold is the threshold for n_hat. 
A normalized VP distance of 0 means all the spikes were the same for each spike trains; and a distance of 2 means all the spikes from one train had to be deleted and re-inserted to recreate the other spike train. I think a distance of 0.8 means that just over half the real spikes were shared with the imputed spikes. I ran the deconvolution for all 9 neurons in the GCaMP6s dataset, and the minimum normalized distance ranged from 0.73-1. Some cells were undermined by bursts of firing, which caused a lot of false positives in the deconvolution. Others were difficult due to random non-spike related calcium fluctuations. I tried some filtering or baseline subtraction to make the results better but could not do significantly better than the default settings

A few concluding thoughts. Python is sweet, and people have implemented a lot of useful algorithms in it. This allowed me to try things out that would have been otherwise impossible. In the future I will try to clean up python posts by using iPython notebooks.

Deconvolution works pretty well at turning large calcium events into number of spikes. The timing won't be exact, but if you're doing calcium imaging, precise timing (<50ms) doesn't matter anyway.

[The following comments reflect my understanding after a few days of coding / playing. If I make any mistakes here, hopefully my commenters can correct me!] Finally, GCaMP6s was sold as having single spike resolution, but in my hands it does not, at least not for single trials. In Fig. 3F of the GCaMP6 paper they claim to detect 99% of single spikes in visual cortex with a 1% false positive rate. When I read that, I thought they had done something similar to what I just did: given a calcium fluorescence trace, and no prior knowledge, try to identify as many action potentials as you can, and compare to ground truth. What they actually did was answer the question: if you know what a single spike's fluorescent trace is, can you tell the difference between known spike events and random noise? In real imaging, of course, you don't know what spike calcium events look like (it's certainly not a simple exponential!). Also, that false positive rate, while sounding stringent, is quite permissive: if you image at 60Hz, with a 1% false positive rate, you will detect false spikes at a rate of 0.6Hz; if your cell fires at 0.5Hz, as their example cells do, that's a problem.

Having said that, using deconvolution with GCaMP6 works well depending on the type of data you're interested in. If you are recording from low frequency neurons where each action potential can cause spiking in downstream neurons, missing half the spikes is important. But if your neuron fires has a high baseline firing rate, and individual spikes aren't too important, the combination works well.

Wednesday, May 6, 2015

A cheap source for 230 μm ferrules

UPDATE: Somewhat embarrassingly, and fortuitously, I hadn't researched enough ferrule suppliers before making this post. After contacting a couple more, I found a supplier that sells 230 μm ID ferrules for $1.5 / pc. It is the Shenzhen Han Xin Hardware Mold Co. They do not have the 230 μm ID ferrules listed on their Alibaba page, but you can contact them for a custom order. Our lab ordered a set of 10 from them, and they worked well. If you use a lot of ferrules, I suggest ordering a test set yourself (shipping is $50).

If you want an American supplier, you can use Kientec. They sell 225 uμm ferrules (FZI-LC-225) for $3.25 / pc with an order of 100. These ferrules fit the fibre a little more snugly.

(The below is now outdated.)

Tl;dr: I am trying to get a bunch of labs to pool money to get create a new source for 230 μm ferrules.

The Palmiter lab makes its own fibre optic cannulae for implantation, using the protocol described in Sparta and Stuber. This has saved us a lot of money, making each cannula cost less than $10. As we have ramped up our optogentics, however, the monthly cost has risen enough that I wanted to find a way to save more money. The biggest cost in cannula construction is the zirconia ferrule (1.25 mm OD, 230 μm ID) we use, which cost $5 a piece. This is in contrast to ferrules with smaller 100 μm bores, which cost $1 a piece.

After hearing about Alibaba in the news, I looked there, and found lots of manufacturers for 100 μm ID ferrules, but none for 230 μm ID ferrules. Curious, I contacted Huangshi Sunshine Photoelectric, Co. to see if they could make 230 μm ID ferrules. They responded that they could, but that they would have to make a new "model." Making the new model would entail a one-time fixed cost of $7500, but once the model is made, the ferrules will cost $0.95 / piece for orders of 1000. (I sent them the specs of the Precision Fiber ferrules. They recommended a slight reduction of the diameter of the epoxy concavity, from 0.9 to 0.8mm; otherwise they should be the same. Once we have the money, they will make a confirmation diagram.)

I would like to get a group of labs together to cover the cost of the $7500 model (labs at UW should be able to cover $1500). Once the model is made, if your lab is big enough, you will be able to order directly for Sunshine Photoelectric in batches of 1000. If you can't use that many, the Palmiter lab should be able to pool together a bunch of smaller orders, and then distribute them.

If your lab uses these ferrules, and would be interested in contributing to getting the model made, please e-mail me at map222 at

UPDATE: The Zweifel lab here is on board, so we only need $6000 more!