Monday, May 2, 2011

(Neuro) Biology is computer science

When I was an undergrad trying to figure out my major, I asked a professor if there was some way to combine my favourite subjects, neuroscience and computer science.  And lo! there is computational neuroscience.  What I didn't realize is that these subjects interact in a far more practical, if less significant way: to be a great biologist, you need to be a competent programmer.

I'm biased about this. I'm a computer dork. I got an Android phone because I like playing with ROMs, and I have written multiple website scrapers to get data that I want. As such, I have long thought that all science majors, biology and chemistry included, should require introductory programming classes, because all data analysis is done on computers. Yet when I tell people that, they patronizingly say, "Yeah, that's a good idea," as if it would be nice but not that useful. Let me try to convince you by speaking from experience and by looking at the kinds of techniques and analyses used today.

In graduate school I worked in an imaging lab. A layman may think of images as pictures made of colored pixels, but a computer programmer knows what they really are: two-dimensional arrays of integers (or 3D for color images). In our lab, these images could even be four-dimensional, since each pixel was also recorded over time. When you analyze imaging data, you need to understand several things. Images are just multidimensional arrays of data. Drawing an ROI (region of interest) is not just drawing a circle; it means masking the data. And you need to know how to filter in time and space. None of this requires high-level math, but it does require familiarity with using arrays in programs. I saw firsthand how people with little to no programming experience struggled and were at the mercy of others' programs (including the main ones written by the boss). There was almost a division in the lab between those who could program and those who could not.
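To make the "images are arrays" point concrete, here is a minimal sketch in plain Python. It assumes an image stack stored as nested lists indexed [t][y][x]; real analysis code would more likely use NumPy arrays. Drawing a circular ROI becomes building a boolean mask. Averaging the masked pixels in each frame gives a time trace. A moving average is about the simplest possible temporal filter. All function names here are hypothetical.

```python
def circular_mask(height, width, cy, cx, radius):
    """Boolean mask that is True for pixels inside a circular ROI."""
    return [[(y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 for x in range(width)]
            for y in range(height)]


def roi_trace(stack, mask):
    """Mean intensity inside the mask for every frame of a [t][y][x] stack."""
    trace = []
    for frame in stack:
        # Keep only the pixels where the mask is True.
        values = [frame[y][x]
                  for y, row in enumerate(mask)
                  for x, inside in enumerate(row) if inside]
        trace.append(sum(values) / len(values))
    return trace


def moving_average(trace, window):
    """Simple temporal filter: mean over a sliding window (no edge padding)."""
    return [sum(trace[i:i + window]) / window
            for i in range(len(trace) - window + 1)]


# Usage: four 5x5 frames whose pixels all equal the frame index.
stack = [[[t for _ in range(5)] for _ in range(5)] for t in range(4)]
mask = circular_mask(5, 5, cy=2, cx=2, radius=1)
trace = roi_trace(stack, mask)
smoothed = moving_average(trace, window=2)
```

In a real recording, the trace would be a fluorescence signal over time rather than a ramp. The point is that "drawing a circle" on screen is really selecting array elements.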

Now I am working on an electrophysiology project, recording multiunit data. Understanding electrophysiology obviously requires some math, since it involves voltages and currents. Beyond the theory, there is data analysis. We record from 32 electrodes, which generate gigabytes of data. From this data we need to extract spikes. That involves deciding whether voltage changes are real or noise, and clustering the spikes recorded from the different electrodes (thankfully, this has largely been solved by others). Then, once we have the spikes, we have to interpret them: make histograms with different time indices; cross-correlate the spikes with each other and with the stimulus; perform principal component analyses; and run stats to see if any of it is true. In short, we have to program.
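Here is a minimal sketch, in plain Python, of two of these steps. The first is threshold-based spike detection on a single channel. The second is binning spike times and cross-correlating two spike trains. The robust noise estimate (median absolute voltage divided by 0.6745) and the 5x threshold are one common convention, assumed here rather than taken from our pipeline. The function names are hypothetical. The clustering step is left to dedicated spike-sorting software.

```python
import statistics


def detect_spikes(voltage, fs, k=5.0, refractory_s=0.001):
    """Return spike times (s) where the trace crosses below -k * sigma.

    sigma = median(|v|) / 0.6745 is a robust noise estimate (an assumption;
    other labs use other detectors). A dead time suppresses double counts.
    """
    sigma = statistics.median([abs(v) for v in voltage]) / 0.6745
    threshold = -k * sigma
    dead = int(refractory_s * fs)
    spikes, last = [], -dead - 1
    for i in range(1, len(voltage)):
        crossed = voltage[i] < threshold <= voltage[i - 1]
        if crossed and i - last > dead:
            spikes.append(i / fs)
            last = i
    return spikes


def histogram(times, t_start, t_stop, bin_s):
    """Count events in [t_start, t_stop) using bins of width bin_s."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for t in times:
        if t_start <= t < t_stop:
            counts[min(int((t - t_start) / bin_s), n_bins - 1)] += 1
    return counts


def cross_correlogram(ref_times, target_times, max_lag_s, bin_s):
    """Histogram of (target - reference) lags within [-max_lag_s, max_lag_s)."""
    lags = [t2 - t1 for t1 in ref_times for t2 in target_times]
    return histogram(lags, -max_lag_s, max_lag_s, bin_s)


# Usage: low-level noise sampled at 1 kHz with two large negative spikes.
fs = 1000
voltage = [0.1, -0.1] * 50
voltage[20] = voltage[60] = -5.0
spike_times = detect_spikes(voltage, fs)  # spikes near 0.02 s and 0.06 s
counts = histogram(spike_times, 0.0, 0.1, 0.05)
```

Cross-correlating spikes with the stimulus works the same way: pass stimulus onset times as `ref_times`.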

Neuroscience is moving from recording a single cell in a single nucleus over a short period to recording many cells in different nuclei over longer periods. The amount of data is increasing by orders of magnitude, so the analysis is becoming more sophisticated and relies more on automated (i.e. programmed) processes. I've already shown how important programming is to imaging and electrophysiology, but the trend is pervasive. In molecular biology, people now use microarrays to identify interesting genes, which requires statistics. In developmental biology, people use scripts to identify synapses where synaptic markers overlap. In electron microscopy (EM), as serial sectioning becomes feasible, you need algorithms to reconstruct spines and whole cells. Even in the land of Western blots, programming will be necessary: as we generate more blots, we will need statistics to keep track of whether the differences are significant.
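Two of these tasks can be sketched in a few lines of plain Python. Both functions are hypothetical illustrations. Pixel-wise overlap above a single intensity threshold is a crude stand-in for real synapse detection, which would segment individual puncta. The exact permutation test on a difference of means is only one reasonable way to compare, say, a handful of blot band intensities per condition.

```python
from itertools import combinations
from statistics import mean


def overlap_fraction(channel_a, channel_b, threshold):
    """Fraction of above-threshold pixels in channel A that are also above threshold in B."""
    a_pixels = both = 0
    for row_a, row_b in zip(channel_a, channel_b):
        for va, vb in zip(row_a, row_b):
            if va > threshold:
                a_pixels += 1
                if vb > threshold:
                    both += 1
    return both / a_pixels if a_pixels else 0.0


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way to relabel the pooled values into groups of the
    original sizes and returns the fraction whose |mean difference| is at
    least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Usage: two tiny image channels, and two conditions with four bands each.
frac = overlap_fraction([[0, 5], [5, 0]], [[0, 5], [0, 5]], threshold=1)
p = permutation_test([1, 2, 3, 4], [10, 11, 12, 13])
```

Enumerating every split is only practical for small groups. Eight samples give 70 splits. For larger data sets, one would draw random permutations instead.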

Other, more successful scientists have beaten me to the punch and stated that "biology is information science." This was hinted at when the first genome was sequenced. It is plain now, in the age of hundreds of genomes, when people are trying to extract meaning from them. While neuroscience may not yet be an information science, I would say it is a computer science.
