Multi-sample SNP calling circa 1994

Last November, when news of Fred Sanger’s death was making its way around scientific circles, so too were many images of Sanger DNA sequencing reactions visualized as autoradiograms. These images brought back memories of a style of Sanger sequencing gel that I first saw in an undergraduate class on population genetics taught by Charles (“Chip”) Aquadro at Cornell University in the autumn of 1994, which left a deep impression on me. My personal Photograph 51, if you will.

At the time, I was on course to be a high-school biology teacher, a plan that was scuppered by being introduced to the then-emerging field of molecular population genetics covered in Aquadro’s class. I distinctly remember Aquadro putting up a transparency on the overhead projector showing an image of a Sanger gel where each of the four bases was run in sets that included each individual in the sample, allowing single nucleotide polymorphisms (“SNPs”) to be easily identified by eye. This image made an extremely strong impression on me, transforming the abstract A and a alleles typically discussed in population genetics into concrete molecular entities. Together with the rest of the material in Aquadro’s class, this image convinced me to pursue a career in evolutionary genetics.

I emailed Aquadro around that time last year to see if he had such an image digitized, and he said he’d try to dig one out. A few weeks ago he sent me the following image, which shows the state of the art in multi-sample SNP calling circa 1994:


Multi-sample Sanger sequencing gel of a fragment of the Drosophila melanogaster rosy (Xdh) gene (credit: Charles Aquadro). The first four lanes represent the four bases of the “reference” sequence, followed by four sets of lanes (one for each base) containing sequencing reactions for each individual in the sample. Notice how when a band is missing from a set for one individual, it is present in a different set for that same individual. This format allowed the position and identity of variable sites in a sample to be identified quickly, without having to read off the complete sequence for each individual.
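
For readers more used to reading code than gels, the logic of this format can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not a description of how the gel was actually scored: the sequences in `sample`, the function names `gel_sets` and `variable_sites`, and the 0-based positions are all hypothetical. Each base gets its own set of lanes (one lane per individual), and a site is flagged as variable when a band is present in some lanes of a set but missing from others.

```python
BASES = "ACGT"

def gel_sets(sequences):
    """For each base, build a band matrix: one row per position, one lane per individual."""
    length = len(sequences[0])
    return {
        base: [[seq[pos] == base for seq in sequences] for pos in range(length)]
        for base in BASES
    }

def variable_sites(sets):
    """Map each variable position (0-based) to the base carried by each individual."""
    n_positions = len(sets["A"])
    sites = {}
    for pos in range(n_positions):
        rows = {base: sets[base][pos] for base in BASES}
        # Variable site: some set shows a band in some lanes but not in all of them.
        if any(any(row) and not all(row) for row in rows.values()):
            n_lanes = len(rows["A"])
            # The band missing from one set turns up in another set for that lane.
            sites[pos] = [
                next(b for b in BASES if rows[b][lane]) for lane in range(n_lanes)
            ]
    return sites

# Hypothetical toy sample: four individuals, eight aligned positions.
sample = ["ACGTACGT", "ACGTACGT", "ACATACGT", "ACGTACTT"]
print(variable_sites(gel_sets(sample)))
# → {2: ['G', 'G', 'A', 'G'], 6: ['G', 'G', 'G', 'T']}
```

As on the gel, only the rows where a set is "broken" need to be examined; the invariant positions never have to be read at all.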

For those of us who now perform multi-sample SNP calling at the whole-genome scale using something like an Illumina->BWA->SAMtools pipeline, it is sometimes hard to comprehend how far things have progressed technologically in the last 20 years.

Perhaps equally dramatic are the changes in the larger social and scientific value placed on the use of sequence analysis and the identification of variation in natural populations. At that time, the Aquadro lab was referred to in a friendly, if somewhat disparaging, way as the “Sequence and Think Lab” by others in the department (because “all they do in that lab is sequence and think”). As the identification of natural molecular variation in humans quickly becomes the basis for personalized medicine, and as next-generation sequencing is incorporated into more basic molecular biological techniques, it is impressive to see how quickly the “sequence and think” model has moved from a peripheral to a central role in modern biology.

