Saturday, January 16, 2016

Introducing the mechanisms Twitter bot

When describing what we know about the world, scientists often have to state what we don't know. Rather than simply stating, "We don't know how X works," scientists (and especially biologists) have come up with the beautiful syntax, "The mechanisms underlying X are not yet understood." Why use five syllables when you could use 14! In a previous blog post, I explored the history of this syntax, and found its basic form dates to 1950. Later, I invented the Mechanisms rating system, the number of sentences until the mechanisms syntax is used. Since I have grown more interested in data science, I decided to look at the mechanisms syntax from a medium data perspective.

So today, I'm going to present the highlights of 25,000 abstracts that contain mechanisms syntax. Then I will introduce a twitter bot that will once a day tweet out a new mechanism that we don't understand.

Some quick stats about mechanisms

I started by querying Pubmed for 100,000 abstracts containing both a mechanism word like "mechanism" or "pathway", and a clarity word like "unknown." I then filtered the abstracts for those which used a mechanism and clarity word in the same sentence, yielding a final pool of 25,000 abstracts. If you want to see a bunch of these sentences, you can visit the Jupyter notebook for this project.

The shortest sentence in this corpus is concise, "The mechanisms are unclear."

The longest sentence highlights the beauty of the syntax, how you can write dozens of words about what we do know, then switch it up and talk about what we don't know:
Although an increasing number of studies have identified misregulated miRNAs in the neurodegenerative diseases (NDDs) Alzheimer's disease, Parkinson's disease, Huntington's disease, and amyotrophic lateral sclerosis, which suggests that alterations in the miRNA regulatory pathway could contribute to disease pathogenesis, the molecular mechanisms underlying the pathological implications of misregulated miRNA expression and the regulation of the key genes involved in NDDs remain largely unknown. PMID: 26663180
If you didn't feel like reading that, I'll summarize: micro RNAs are important for brain diseases, but we don't know how they work.

For some summary stats of the 25,000 sentences:
  • 19,500 ended with some variant of "unknown"
  • 8,900 included "however"
  • 2,800 used "although"
  • 600 used "while"

Mechanisms Twitter bot

Since I can never get enough of these sentences, I made a twitter bot to find them for me. Each day, the bot queries Pubmed for new abstracts using the syntax, and tweets it with a link.

While this was a fun side project, when you look at thousands of these sentences, they form a sort of catalogue of everything we don't know. Biofilms, leukocytes, 6-gingerol... For many of these papers, I would have to spend some time to figure out what we do and don't know.

So if you want a fun daily reminder of everything we don't know, please follow.