If you’ve been reading the comments to the posts I’ve written about Steven Greer’s alleged six-inch alien specimen, you’ll notice that several folks have presumed that the 9% unidentified DNA means the specimen is alien or an alien-human hybrid.

That isn’t the case. But since I’m no expert in genetics, I asked a friend who is to comment on that issue. (He has a PhD in biochemistry and expertise in genetics and bioinformatics.) He kindly obliged.

Here’s an excerpt from my email to him of a couple days ago and his response:

MSH: The discussion of this anomaly is ongoing. . . . The sticking point for many readers is the notion that 9% of the DNA cannot be identified. Many jump to the conclusion that this means the specimen is 9% alien. . . . The Stanford researcher who did the analysis says that the DNA he examined was not contaminated. You’d told me before (in relation to another “alien hybrid” specimen) that this inability to identify DNA happens because (if memory serves) parts of the genome being examined don’t show up in a registry of some kind. Could you give me a paragraph (for non-specialists) explaining why some DNA can’t be identified in DNA tests?

RESPONSE:

“9% unidentified” and “9% different” are two completely different comparisons. They’re not even related.

“9% different” is based on alignments. Comparing two sequences requires an alignment, in which we can say for certain that some percentage of the nucleotides in the complete alignment don’t match. Like so:

AGCTAGCTAGCTAGCTAGCTAGCT
AGCTAGCTAGGTAGCTAGCTAGCT

In the second sequence there’s a G where the first sequence has a C, and I know that’s the difference because it’s surrounded by matching nucleotides. There’s no funny business going on; it’s a genuine sequence mismatch, a real difference.
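To make the “percent different” idea concrete, here is a minimal Python sketch that counts mismatches between the two example sequences. It assumes the sequences are already aligned with no gaps (real aligners also have to handle insertions and deletions), and the function names are hypothetical, chosen only for illustration.

```python
def mismatch_positions(seq_a: str, seq_b: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences disagree."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_different(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions that don't match.

    Assumption: an ungapped alignment, so both sequences have the same length
    and position i in one lines up with position i in the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles ungapped, equal-length alignments")
    return 100.0 * len(mismatch_positions(seq_a, seq_b)) / len(seq_a)


first = "AGCTAGCTAGCTAGCTAGCTAGCT"
second = "AGCTAGCTAGGTAGCTAGCTAGCT"
print(mismatch_positions(first, second))           # [10]
print(round(percent_different(first, second), 2))  # 4.17
```

One mismatch out of 24 aligned positions gives about 4.17% different. The key point is that this number only exists because the two sequences line up; without an alignment there is nothing to compare position by position.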

“9% unidentified” means there’s no alignment. So we have to ask, why is there no alignment? There can be a whole host of answers to that. It could be a sequence of human DNA that’s never been sequenced before (unlikely, but it could happen), so it’s not in the database. That could happen in the case of local insertional variations. It could be poor-quality sequence: DNA that the sequencing machine just didn’t read correctly. It could be chimeric sequence, where two pieces are stuck together that aren’t connected in the actual chromosome, so they don’t match the correct sequences in the database.

At 9%, the most likely explanation is fungal and bacterial contamination that’s not in the database. If they say there’s no contamination, that’s because they don’t understand contamination. The vast majority of forensic DNA (including ancient DNA, or aDNA) that gets sequenced is contamination. It’s impossible not to get contamination from aDNA extractions.
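The difference between “unidentified” and “different” can also be sketched in code. The toy Python example below is an assumption-laden simplification, not how any real pipeline works: it treats a read as “identified” if it shares at least one short exact substring (a k-mer) with a reference in a tiny, hypothetical database, loosely in the spirit of the seed step used by search tools such as BLAST. Reads with no shared k-mer have nothing to align to and get counted as unidentified, whatever organism they actually came from.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unidentified_fraction(reads: list[str], reference_db: list[str], k: int = 8) -> float:
    """Fraction of reads that share no k-mer with any reference sequence.

    Simplification: a single shared k-mer counts as a match. A read with no
    shared k-mer is 'unidentified'; the code cannot tell whether it is novel
    human sequence, a bad read, a chimera, or a contaminant.
    """
    db_kmers: set[str] = set()
    for ref in reference_db:
        db_kmers |= kmers(ref, k)
    unmatched = [r for r in reads if not (kmers(r, k) & db_kmers)]
    return len(unmatched) / len(reads)


# Hypothetical toy data: one reference stands in for an entire database.
reference_db = ["AGCTAGCTAGCTAGCTAGCTAGCT"]
reads = [
    "AGCTAGCTAGGTAGCT",  # one mismatch, but still anchors to the reference: identified
    "TTGACCATGGTTCAAG",  # e.g. a microbe missing from the database: unidentified
    "AGCTAGCTAGCTAGCT",  # exact match: identified
    "GGCCTTAAGGCCTTAA",  # no shared k-mer: unidentified
]
print(unidentified_fraction(reads, reference_db))  # 0.5
```

Half of these reads come back “unidentified,” yet nothing about that number says they are exotic; it only says the database had nothing to match them against.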