Steve Pittelli, M.D.
This article was originally written in conjunction with my poster presentation at the American Society of Human Genetics (ASHG) annual conference last year in Washington DC. I had put it up as a preprint and, presumably, it was too much for even the preprint server to take, so it seems it was taken down. I am republishing it here. It’s a longer piece, but I think an easy read for those interested in this important issue who are looking for a more skeptical view of the field of behavioral genetics.
Abstract
For more than half a century, the field of behavioral genetics has touted studies claiming that differences in personality, intelligence and susceptibility to psychiatric disorders have a significant genetic component. Despite this long history of positive findings, however, little of substance has been learned about how genes influence a particular trait, or by what mechanism. Moreover, the field has not been able to replicate its findings. Yet, the gene-hunting expedition continues unabated by its failures.
Fifteen years of candidate gene studies purporting to demonstrate genetic correlations for behavioral traits have been refuted in their entirety. With little self-reflection, the field moved on to genome-wide association studies (GWAS). These studies have not been able to confirm the heritability claims derived from twin studies, instead citing minuscule “variance explained” calculations. Polygenic risk scores (PRS), derived from GWAS, have had little success in predicting traits.
Despite the lack of success, researchers have promoted their findings to mainstream media, which then sensationalize and amplify the findings, creating a sense of breakthrough discoveries and a perception in the public that genetic differences play a significant role in human character.
This perception has been harmful historically and still affects our society in many ways, with a potential for further harm. It seems unlikely that any grand findings will be found in the future, leaving only ideological persistence as a driving force for continuing this pursuit. It is time to concede that behavioral genetics is a null field.
Introduction
This article serves as a retrospective examination of the ongoing field of behavioral genetics to consider whether it is unfolding as a null field. A null research field is defined as a field in which there are no true findings at all to be discovered.i Such an absolute definition quickly raises objections from researchers in the field who assert that, at the very least, there is a small genetic effect seen in their studies. However, these small effects generally have little or no practical relevance. This belies the original premise of the field, which is that human behavior is significantly influenced by genetic variation. Thus, from a practical point of view, such small findings, even if valid, are effectively null findings. Researchers might argue that these small findings will eventually lead to larger, significant findings, but the assertion of this paper is that such is not likely and that the evidence to date, or lack thereof, is sufficient to abandon the premise of behavioral genetics as a significant driving force for differences in human behavior.
Most in the scientific community would agree that examples of historic null fields include astrology, phrenology and alchemy, all of which had countless reported positive findings, albeit not held to as stringent a scientific method as we see today. The purpose of this article, then, is not to critique individual behavioral genetic studies that claim a positive result, as such studies have been reported regularly for decades, but rather to examine how a field can have such a long record of positive findings without producing anything of significance or clinical usefulness to date.
It may be that the reason for this failure is that the genetics of behavioral traits are far more complex than we first suspected, which is the prevailing opinion in the field. However, another rarely considered possibility is that the very foundation of the field, the idea that differences in behavioral traits among people or groups are driven by genetic variation, is a flawed assumption and, in fact, there is nothing of significance to find. There is no definitive moment at which to arrive at such a conclusion. It would require a consensus in the field and the discarding of a scientific ideology. In this case, the former is probably easier than the latter. Thus, the case being made here is that ideology is driving the field, rather than scientific findings, leading to an unconscious bias toward producing positive results. Consequently, there is a never-ending sense, in the moment, that the latest study is more evidence for a genetic basis for one trait or another, despite the fact that so many studies in the past were accepted as true findings, then faded from consciousness when they could not be consistently replicated. If you set aside the last few years of studies in the field, this is inarguably the case, as nothing older than that has held up to replication. The question, then, is whether the focus on “the last few years of studies” is a timeless myopia, keeping alive expectations of discoveries on the horizon that are asymptotically never reached.
To be clear, the focus here is on the primary, canonical premise of behavioral genetics: the assumption that a quantitative summation of common genetic variants in individuals, or shared within groups, influences human character, intelligence or the risk of developing a mental disorder. It is not a denial of the importance of genetics in human biology. Certainly, pathological genetic/chromosomal disorders, such as Down Syndrome, Fragile X Syndrome, Huntington’s chorea or other genetic disorders, can lead to significant intellectual, behavioral or neurological difficulties or deficiencies. In fact, this is often used to argue by extension for the plausibility of behavioral genetics in the canonical sense. However, this compares specific mutations with clear, consistent effects, including concomitant neuropathological or physical problems (seizures, heart problems, changes in physical appearance or stature, etc.), to a theory in which hundreds or thousands of genetic variants in a polygenic model, each with an imperceptible effect on a trait, somehow combine quantitatively to produce, at best, a somewhat higher likelihood that a person will present with a trait phenotype, perhaps without possessing the trait at all, and without any concomitant pathology. In short, we are talking about two very different things, the former of which is clearly observable and reasonably consistent, while the latter is largely a theoretical assumption with arguments based on statistical probabilities.
The term “null” is, of course, rather absolute and there are gray areas to consider. For example, a less functional variant of the gene responsible for the enzyme aldehyde dehydrogenase, which is involved in the metabolism of alcohol, is often seen in people of Asian descent. This can cause a temporary buildup of acetaldehyde in the system when drinking alcohol and lead to symptoms sometimes described as “Asian flush,” including hives, migraines and other largely histaminergic symptoms that are unpleasant, perhaps leading to an aversion to drinking alcoholic beverages and lowering the rate of alcoholism among those with this variant. There is certainly debate related to the validity of this finding but, nonetheless, it would constitute a plausible, causal genetic mechanism affecting a trait (alcohol abuse) that is technically under the purview of behavioral genetics. Arguably, though, this skirts the current canonical premise of behavioral genetics. It is a monogenically derived physical phenomenon, whereas the implication of behavioral genetics is that a genetically derived, neurological or, at the very least, endocrinological mechanism would in some way affect a person’s proclivity for a trait, in this case leading to a greater or lesser urge to drink alcohol. Such anomalies are worth exploring, but this is not the primary thrust of the research in behavioral genetics and presumably, with the plethora of genetic studies to date, most such monogenic variants of any significance have already been discovered and such examples now serve mostly as a cudgel to justify further behavioral genetics research.
Some also like to point to obvious physical similarities and differences related to genetic variation, such as eye and hair color, facial features, height, etc., to argue by extension for a genetic origin for behavioral traits like intelligence or personality. This argument bypasses centuries of philosophical discussion pertaining to the nature of the mind, confidently declaring that the minds of individuals are a function of neurophysiological characteristics of their brain, with variation presumably derived from genetic differences. This is a bit of a circular argument and leaves little room for any interpretation of the human condition beyond a “nature vs. nurture” argument (a term originally coined by the eugenicist, Francis Galton) where nature is presumed to be exclusively of genetic origin and nurture describes the family, society and circumstances of a person’s life.ii There is perhaps a bit of arrogance and a lack of intellectual curiosity in that assumption. It’s worth noting that genes related to physical traits, such as skin color, height, facial characteristics, etc., can certainly affect someone’s station in life and, by extension, some behavioral traits. Again, though, this sidesteps consideration of behavioral genetics in the canonical sense.
Behavioral genetics was a valid scientific pursuit. The theory that genetic variation is a significant factor in what makes each human unique in character is plausible enough to explore. However, after decades without any tangible success, can we rule it out? There is an opportunity, in real time, to assess the field of behavioral genetics and to consider whether it is a null field. The assumption that human character is genetic in origin, whether true or not, has real-world implications that are potentially detrimental to society and individuals, as we have already seen historically. If this assumption is based on a null field, there is little reason to continue this pursuit and, in fact, it would be unconscionable to do so.
The Origins of an Assumption
The idea that people inherit traits from their parents or ancestors is probably as old as civilization. Anyone can observe physical similarities between parents and their children and will extend that to perceived similarities in character. However, our own civilization has developed a more directly physical understanding of such inheritance, beginning with the work of Charles Darwin and, particularly, the ideas of his half cousin, Francis Galton, who applied Darwin’s ideas to human traits, viewing different races and social classes as inherently inferior or superior in character and intelligence as a result of evolution. These ideas led to the eugenics movement, responsible for unfortunate institutionalization and sterilization policies in the U.S. and other countries and culminating in the Nazi atrocities. Thus, post-World War II, eugenics had become taboo and, perhaps in response to that, many psychologists of that era leaned toward a decidedly behavioral, “nurture” model.
By the time that behavioral genetics became a field, generally demarcated as 1960 with the book, “Behavior Genetics,” iii a few things had changed. The most notable was the discovery of the DNA double helix. This was perhaps amplified by the burgeoning computer era, presenting the possibility that human traits, whether physical or behavioral, could be “coded” in some sense and such coding could be delineated for various traits, at least at some point in the future, when scientific technology was advanced enough.
Early on, the leaders in the field wanted to separate it from the eugenic past, initially focusing more on animal studies to give it a scientific sheen, although they maintained some of the statistical methodology of Galton and other eugenicists like Karl Pearson and Ronald Fisher. With little concrete evidence, it would seem the basis for the belief underlying behavioral genetics was more a scientific faith in Darwinian evolution extended to human behavioral characteristics than any scientific findings. Anyone can observe significant differences in the proclivities and preferences of siblings, even monozygotic twins, so the claims have always required a statistical argument. Moreover, convoluted evolutionary theories are needed to explain the existence of disorders like schizophrenia, depression, autism, or even homosexuality, which until 1974 was classified as a mental disorder in psychiatry’s Diagnostic and Statistical Manual (DSM).
The early animal studies were not particularly successful, and little if anything could be easily transferred to human behavior. Studies of dog breeds, for example, failed to identify consistent differences in behavior between breedsiv, nor did researchers succeed in breeding a dog specifically for intelligence, a project funded by the eugenically leaning Rockefeller Foundation in the hope of increasing the acceptance of genetic determinism.v
The only successful line of research was twin studies. Thus, early on and arguably to this day, the backbone of behavioral genetic findings has been twin and adoption studies, which garner high heritability calculations and bolster claims that there is a significant genetic basis for character traits, psychiatric disorders and human intelligence.
Twin and Adoption Studies, and Clones
Through movies, books, legends and soap operas, we have developed a mythology around twins, long lost siblings, and even clones. This no doubt contributes to an enthusiasm around twin and adoption studies and an acceptance of any positive results. Reunited twins who reportedly were not aware of each other are noted to have “remarkable” similarities:
“Both of them are fast eaters, both failed algebra as students and both always pull out three paper towels in public restrooms.”
“Both were volunteer firefighters, carried big key rings on their belts, which each had big buckles, and both drank only Budweiser beer.”
“Both had very similar hairstyles, and both enjoyed freaking people out by rolling their eyes upward so far that only the whites were exposed.”
“Both were very fashion-conscious, despite one being raised in the country and the other in the city.”
Such coincidences make for good television, but are hardly scientific, and perhaps reminiscent of the Lincoln/Kennedy Assassination Coincidences many of us marveled over in grade school. Nonetheless, one well-known twin researcher stated, “I do not regard these really as coincidences; rather, they’re genetically influenced commonalities that may ‘masquerade’ as coincidences.” This seems to assign near magical qualities to monozygotic twins.
In more controlled twin studies, it does generally appear to be the case that monozygotic twins have more behavioral similarities than dizygotic twins. That is not always the case, however, particularly when studies are subjected to more scrutiny. For example, the Minnesota Study of Twins Reared Apart (MISTRA) has been brought into question regarding its claim of IQ heritability in monozygotic twins, whose IQ similarity was not significantly different from that of dizygotic twins.vi Twin studies have also had issues of fraud, such as Cyril Burt’s twin studies, used by Arthur Jensen to claim that Black people had lower IQs due to genetic differences.vii
Even if one accepts the notion that monozygotic twins are more alike than dizygotic twins, taking this as evidence that their similarities are genetically driven, and moreover that a simple heritability equation can determine exactly how genetically driven a trait is, without considering other plausible reasons for those similarities, seems dubious at best. For example, monozygotic twins are not treated the same as dizygotic twins, are perceived more as a pair by others and by themselves, and often model their behavior accordingly. Moreover, behavioral phenotypes like mental illnesses are hardly definitively measurable.
If one looks more closely at twin studies, some of the claims seem less impressive. As an example, we can look at a recent Danish twin study of schizophrenia.viii Denmark has excellent national records, identifying monozygotic and dizygotic twins and the diagnoses they might have, including schizophrenia. This is far more likely to be reliable and inclusive than earlier studies, which were generally conducted in institutions and relied on noticing that a person diagnosed with schizophrenia had a monozygotic or dizygotic twin. The Danish study claimed a 79% heritability for schizophrenia, which is consistent with earlier studies. However, of the 81 monozygotic twin pairs in which one twin had a diagnosis of schizophrenia, the co-twin was also diagnosed with schizophrenia in only 12 pairs. This gives a 14.8% concordance rate. This alone is interesting, as it is still generally taught to this day that the concordance rate for monozygotic twins is 50%. In fact, an earlier study from Finland, another country with good registry records, found an 11% concordance rate.ix
With a 14.8% concordance, a 79% heritability claim might seem surprising to those not familiar with heritability calculations. The reason is that the calculation rests on a comparison between monozygotic and dizygotic twins. In this case, the concordance rate for dizygotic twins was 4%.x Such high heritability claims based on very low concordance rates might fit a mathematical model, but it is a stretch to make such a claim when in only one of seven instances does a genetically identical twin also get diagnosed with this severe, debilitating, life-long disorder. Appeals to stochastic development or “non-shared environment” do not shed any light on the cause of the disorder or on why it would be so variable in monozygotic twins, and are little better than “There’s something in the ether” as an explanation.
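To make the arithmetic concrete, here is a minimal sketch of the kind of liability-threshold calculation involved (a toy illustration, not the Danish study’s actual ACE model; the assumed 1% lifetime prevalence and the Falconer-style doubling of the MZ-DZ difference are simplifications, so the output will not match the published 79% exactly). The point is only to show how very low concordance rates can still be converted into a high heritability estimate.

```python
# Toy liability-threshold calculation: convert twin concordance rates into
# correlations on an assumed normally distributed "liability" scale, then
# apply Falconer's approximation h^2 = 2 * (r_MZ - r_DZ).
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def liability_correlation(concordance, prevalence):
    """Twin liability correlation implied by a probandwise concordance rate."""
    t = norm.ppf(1 - prevalence)  # diagnostic threshold on the liability scale
    def gap(r):
        joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])
        p_both = joint.cdf([-t, -t])          # P(both twins above threshold), by symmetry
        return p_both / prevalence - concordance
    return brentq(gap, 0.001, 0.999)          # solve for the correlation matching the data

prevalence = 0.01                                 # assumed ~1% lifetime prevalence of schizophrenia
r_mz = liability_correlation(0.148, prevalence)   # 14.8% MZ concordance from the Danish registry
r_dz = liability_correlation(0.04, prevalence)    # 4% DZ concordance
h2 = 2 * (r_mz - r_dz)                            # Falconer's approximation
print(f"liability r_MZ = {r_mz:.2f}, r_DZ = {r_dz:.2f}, heritability estimate = {h2:.2f}")
```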
It is also worth noting that psychiatrists are taught to favor the diagnosis of schizophrenia if a patient’s monozygotic twin already has that diagnosis, which biases the clinician toward giving the diagnosis, so even this 14.8% figure might be inflated for that reason alone. Moreover, regardless of diagnosis, monozygotic twins are three times more likely to live together than dizygotic twins, so they are more likely to share the same mental health treatment, which could also inflate the concordance rate for monozygotic twins.xi It would also be interesting to see to what extent the concordant monozygotic twins have a specific genetic disorder affecting neuropsychiatric function, with an additional diagnosis of schizophrenia, an issue discussed in more detail shortly. Thus, even this low concordance rate might be significantly inflated, suggesting that the poor results of genetic studies might be much closer to reality than twin studies, in which case the idea of “missing heritability” might be replaced by “twin study heritability inflation.”xii
Like twin studies, adoption studies have been held up as evidence for a genetic basis for behavioral traits and, like twin studies, have the potential to inflate claims of genetic influence. Generally, the studies attempt to demonstrate that adoptees more closely resemble the behavior of their birth parents and birth siblings than that of their adoptive parents and adoptive siblings. Results tend to be more modest than those of twin studies and have some of their own inherent difficulties. For example, individuals who are adopted, even near birth, often have life-long psychological effects from the experience of being adopted, as any therapist who has worked with adoptees could confirm. The circumstances of birth parents putting a child up for adoption also cannot be equated with those of the adopting parents in terms of socioeconomic status, education, etc.
Moreover, the studies often do not confirm the genetic assumption. For example, the Texas Adoption Project purported to show that IQ was more closely correlated between birth parents and their adopted-away children than between those children and their adoptive parents, but ignored the facts that the adoptive parents’ IQs were also not correlated with those of their own biological children and that the birth mothers’ IQs correlated more closely with those of their birth children’s adoptive siblings.xiii This is a good example of how the bias of researchers will affect the conclusions of these studies. Another study of personality through the large Colorado Adoption Project (CAP) did not find that personality was more similar between adoptees and their birth parents than between adoptees and their adoptive parents.xiv The authors noted, “On the face of it, these results from CAP suggest that neither nature nor nurture contribute importantly to individual differences in self-reported personality.” The authors go on to say, “Until this issue is settled, the high heritabilities from twin studies cannot be assumed to be correct…. The major methodological factors that might make twin estimates of heritability too high involve various violations of the equal environments assumption. That is, identical twins might be treated by others or by themselves more similarly than fraternal twins, which would inflate twin estimates of heritability. For self-report questionnaires, identical twins might exaggerate their similarity (called assimilation effects) or fraternal twins might underestimate their similarity (called contrast effects).”
The fact of the matter is that twin and adoption studies are blunt scientific instruments that, while giving some early justification for the field of behavioral genetics, have little to offer in the modern era, when we can look at the actual genetics of individuals to see whether the heritability claims of twin and adoption studies hold up. In truth, they have not. On the contrary, the differences between the heritability claims of twin and adoption studies and the results of molecular genetic studies are profound, leading researchers to search for this missing heritability without ever considering whether the heritability was there in the first place. Twin and adoption studies had their purpose, but at this point in time, they seem like little more than quaint exercises for graduate psychology students interested in an archaic vestige of their field.
It’s worth also mentioning clone experiments in this context, as they might be more likely to be performed in the coming years (presumably not in humans). An interesting finding in clone experiments observing behavior in animals such as micexv and fishxvi is that they generally find significant individual behavioral differences among the cloned animals. This curiosity is frequently reported anecdotally by pet owners who have had a loved (generally deceased) pet cloned and notice significant differences in the cloned animal. For example, a woman who had two clones of her pet poodle remarked, “One likes TV [as did the original] and the other doesn’t care. One is lazy and the other is always on the run. One is scared of a leaf in the yard while the other is fearless. One is a bully and the other is more submissive.” In one of the early cloning endeavors, a rancher who had an attachment to a docile bull he had kept as a pet had it cloned, and the cloned bull attacked him, almost killing him on more than one occasion. Despite this, he did not give up on the idea that the bull would one day be more like the original. Such is the power of belief in genetic determinism.
It seems likely that future clone studies will continue to demonstrate that the cloned animals are going to exhibit significant differences in behavior. One might expect attempts to explain these differences while still accepting genetic explanations for behavior, but this result is also consistent with a null field.
Candidate Gene Studies: The Null in Action
Attempts to use genetic linkage studies to find specific genes for disorders like schizophrenia and bipolar disorder, in the way that genes for sickle cell anemia and Huntington’s disease were identified, proved fruitless and were abandoned by the early 1990s. This led to the assumption that perhaps a few genes in combination were involved in mental disorders (rather than abandoning the premise that they had a genetic basis at all). This ushered in what were known as “candidate gene studies,” which involved searching for predetermined, specific genetic variants, usually chosen because they might correspond to plausible mechanisms based largely on presumed pharmacological successes, such as variants in serotonin- or monoamine oxidase-related genes for depression, reflecting selective serotonin reuptake inhibitors like Prozac (fluoxetine) and the old MAO inhibitors like Nardil (phenelzine). For schizophrenia, dopamine-related genes were generally the focus, owing to the success of dopamine-receptor-blocking antipsychotic medications like Haldol (haloperidol) and Thorazine (chlorpromazine).
Hundreds of candidate gene studies were performed for behavioral traits, mostly in the 1990s and early 2000s, claiming correlations between specific genetic variants and psychiatric disorders, personality traits and human intelligence. One notorious example was a repeat polymorphism in the promoter region (5-HTTLPR) of the serotonin transporter gene (SLC6A4) that was initially implicated in depression, with other studies claiming links to seasonal affective disorder, insomnia, Alzheimer’s disease, “nostalgia proneness,” etc. None of them consistently replicated. This led researchers to refer to it as an “orchid gene” that only expressed itself under the right conditions, suggesting that the depressive symptoms would only affect someone under stressful circumstances, such as children living in adverse situations, adolescent girls being bullied and children with depressed mothers. Much of this found its way into national media stories. Mechanisms were proposed, generally related to activation of the amygdala, a region of the brain associated with emotional responses. None of these studies held up.xvii
As replications were failing, meta-analyses were conducted to pool positive results. Researchers trying to understand their lack of success attributed it to various issues, including poor phenotyping (failing to properly identify the trait), environmental moderators, and variable number tandem repeats (VNTRs) and other regions of the genome that do not directly correspond to a gene but might moderate gene expression. The possibility that these studies might simply be false positives was not acknowledged for many years.
By the early 2000s, expectations were quite muted, at least among some in the field, with behavioral geneticist Eric Turkheimer discussing the “gloomy prospect,”xviii suggesting that large causal genetic findings might be out of reach. In sharp contrast to this, the public was being given a much more optimistic view of the findings, with regular news stories touting gene discoveries for various mental illnesses, personality traits and intelligence. A 2003 article in Science listed “Decoding Mental Illness” among the year’s top discoveries, stating that genes had been found that increase the risk for schizophrenia, depression and bipolar disorder. Anyone relying entirely on newspapers and magazines would have had little doubt that researchers were finding more and more genetic evidence for mental illnesses, personality traits and human intelligence. Despite this, absolutely none of this evidence stood up for depression, schizophreniaxix or any other psychiatric trait.
Thus, we already have an example of this field producing false positive results for years, with near unanimous acceptance of the results by researchers, who then publicized those results in mainstream media outlets, creating a consensus in the public eye that our differences are genetic in nature. These studies were blithely waved in the face of skeptics,xx who were characterized as unscientific and even had their mental health questioned for criticizing such studies.xxi They were also used as a rationale for a more biological approach to psychiatry, transforming the psychiatric profession, to the delight of pharmaceutical companies and health care insurers. Despite this, all of these studies were abandoned with little self-reflection or public correction. It is an embarrassment to science. Instead, there was a pivot in the field to a new type of genetic research, seemingly without concern that the entire debacle might be repeated once again.
Genome-Wide Association Studies
With advances in technology, instead of studying one or a few genetic variants correlated with a particular phenotype, it became possible to explore thousands of potential genetic variants at regions throughout the genome (loci) in a hypothesis-free manner by conducting a genome-wide association study (GWAS). Any trait that could be gleaned from a questionnaire could then be evaluated for genetic correlations by this method. The trait could be a traditional behavioral genetic trait, such as a psychiatric diagnosis, a personality trait or intelligence, but also a more dubious trait like “church attendance,” “ice cream flavor preference” or “walking at a brisk pace.” It’s worth pointing out that “hypothesis-free” does not eliminate potential biases or ideological positions, as some researchers imply, since there is a lot of room for interpretation and for choice of potential phenotypes. For example, when examining a trait like “educational attainment,” basically a measure of how far someone went in school, researchers rarely consider the fact that they are among the most highly educated people in our society and generally come from privileged backgrounds themselves. Blue bloods looking for blue-blood genes that might demonstrate some genetic influence on their educational success beyond their station in life are unconsciously going to be primed for a positive result.
The switch from candidate gene studies to GWAS also ushered in a philosophical shift in how behavioral genetics is understood. Previously, it was assumed that a few genes with larger effects would influence behavioral traits. GWAS are meant to detect genetic variants with much smaller effects. It is worth noting that this shift was not due to discoveries in the field, but rather to a lack of discoveries. Since candidate gene studies failed to find genetic variants significantly affecting a trait, the assumption became that variants with much smaller effects likely combine quantitatively to express a phenotype, and that these could be identified with a GWAS. This is the “polygenic” assumption: rather than one or a few genetic variants leading to a phenotype, a combination of hundreds or even thousands of genetic variants is involved. This conclusion is inescapable if one holds on to the premise that behavioral differences have a significant genetic basis. However, it again ignores the possibility that behavioral differences are not primarily genetic. Again, ideology takes precedence.
Beginning in the early 2000s, GWAS were conducted for most traditional behavioral phenotypes and were generally successful in identifying correlations that reached statistical significance for many behavioral traits, giving the field a shot in the arm. However, after a few years, the results became somewhat contradictory. Each new study would find correlations, but tended to find different “statistically significant” correlations. The accepted threshold for statistical significance is generally a p value less than 5 × 10⁻⁸, corresponding roughly to a Bonferroni correction of a 0.05 significance level for on the order of one million independent common variants. While there is some basis for this, it is perhaps somewhat arbitrary. Nonetheless, if it is viewed as a cutoff point for identifying possible correlations that could then be replicated, it seems a reasonable threshold. The problem for behavioral geneticists is that such replications, where two independent studies identify a locus or SNP that is statistically significant in both, just didn’t happen. This created a “replication crisis” in the field.
Much like candidate gene studies, this might suggest that studies are mostly finding false positive results. Once again, this did not appear to be a serious consideration among those in the field. Candidate genes had already been rejected but this was not taken as a cautionary tale. Instead, the primary focus then became increasing the number of participants in GWAS, under the assumption that the effects were simply too small to be picked up consistently in smaller GWAS, ignoring the fact that the same was said about candidate gene studies and few still propose larger candidate gene studies.
Increasing the study sizes soon became possible, thanks to the UK Biobank and commercial genetic ancestry sites such as 23andMe, allowing researchers to perform GWAS with hundreds of thousands of participants. With the increase in study size, it was possible to find hundreds of genetic correlations in a single study. It should not be surprising that larger studies find a larger number of statistically significant loci. By itself, this does not validate the correlations, which could merely be a larger accumulation of false positives. Therefore, one would still want to replicate these findings with another independent GWAS. In practice, this rarely occurs anymore. For most of the standard behavioral phenotypes, whether it be Schizophreniaxxii, Depressionxxiii, Educational Attainmentxxiv, etc., rather than perform an independent GWAS from a new dataset, new data is simply added to previous GWAS in what is described as a “meta-analysis,” despite much of the new data never having been assessed in an independent study.
Other methods of “replication” are often used, such as comparing the correlations to a “replication sample” to check for sign concordance, that is, whether the effects are in the same direction. If a p value is not significant but the effect is in the same direction (both correlated toward the phenotype or both away from it), this is counted as a replication. This has a p-hacking quality to it, since it allows one to skirt the 5 × 10⁻⁸ p threshold. Moreover, since the replication sample is generally drawn from a population similar to the study population, attenuated population stratification affecting the replication data sets cannot be ruled out, and this by itself could explain the sign concordance. In any case, there really is no excuse for bypassing an independent GWAS on new data before combining it with previous results in a meta-analysis. One might consider whether there is an unconscious avoidance of such an analysis.
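To illustrate what this amounts to in practice, here is a minimal, simulated sketch of a sign-concordance check (all numbers invented). Note that the weak shared component in the simulation could just as easily represent residual stratification common to both samples as true signal; the sign test cannot tell the difference, which is the concern raised above.

```python
# Simulated sign-concordance "replication": count how many discovery SNPs
# have the same direction of effect in a replication sample and compare the
# count to the 50% expected by chance with a one-sided binomial test.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
n_snps = 200                                        # hypothetical set of discovery hits
discovery_beta = rng.normal(0.0, 0.02, n_snps)      # simulated discovery effect sizes
# Replication effects: mostly noise plus a weak component shared with discovery
# (true signal or shared stratification -- the test cannot distinguish them).
replication_beta = 0.3 * discovery_beta + rng.normal(0.0, 0.02, n_snps)

same_sign = int(np.sum(np.sign(discovery_beta) == np.sign(replication_beta)))
result = binomtest(same_sign, n_snps, p=0.5, alternative="greater")
print(f"{same_sign}/{n_snps} effects concordant in sign, one-sided p = {result.pvalue:.3g}")
```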
Another work-around for replication is what is called “enrichment.” This involves looking at the function of the SNPs in statistically significant loci to determine whether they are predominantly cognitively relevant, such as neurotransmitter genes, or at least more so than one would expect from random false positives. There is room for embellishment in such an analysis, since a significant locus may have more than one SNP to choose from, and a researcher is more likely to choose a SNP with neurocognitive function. Moreover, genes can have more than one function, and most genes will have some cognitive function. Such an analysis would be more convincing if it were done blindly, with the parameters set prior to conducting the GWAS. If an independent GWAS were performed and the phenotype could be determined algorithmically from the statistically significant loci alone, that would make a stronger case. Moreover, such an analysis could be tested against a null phenotype for comparison.
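For concreteness, the statistical core of a typical enrichment test looks something like the sketch below (the gene counts are invented for illustration): a hypergeometric test of whether the genes tagged by significant loci fall into a predefined “neuronal” gene set more often than chance would predict. The test itself is simple; the concerns raised above are about how the gene set and the SNP-to-gene assignments are chosen.

```python
# Toy enrichment test: is the overlap between GWAS-implicated genes and a
# "neuronal" gene set larger than expected by chance? (Hypergeometric test;
# all counts below are invented for illustration.)
from scipy.stats import hypergeom

total_genes = 20000      # genes in the annotation (assumed)
neuronal_set = 4000      # genes labeled "neuronal" (assumed)
hit_genes = 300          # genes tagged by statistically significant loci (assumed)
hits_in_set = 75         # of those, how many fall in the neuronal set (assumed)

expected = hit_genes * neuronal_set / total_genes          # overlap expected by chance
p_enrich = hypergeom.sf(hits_in_set - 1, total_genes, neuronal_set, hit_genes)
print(f"expected overlap ~ {expected:.0f}, observed {hits_in_set}, enrichment p = {p_enrich:.2g}")
```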
Population Stratification and GWAS
Generally, researchers performing GWAS attempt to remove population stratification, which will give spurious genetic correlations due to varying frequencies of minor alleles in genetically distant ancestries. A classic example would be chopstick use. Clearly, a GWAS would find many spurious correlations for chopstick use that are simply ancestral markers for people of Asian descent from countries that favor chopsticks. That is a clearer example, but when conducting GWAS and looking for tiny correlations, there is potential for more subtle population stratification due to geography and genetic drift, as well as cultural identity, socioeconomic status, religious affiliation, etc., via assortative mating. Clearly, there will be genetic correlations that have no causal connection to the phenotype. Researchers use principal component analysis (PCA) in an attempt to control for population stratification, identifying and adjusting for broad clustering of alleles unrelated to the trait in question. This helps eliminate false positives and presumably allows us to see only causal alleles for a trait. Such techniques are only partially successful, so we would want to know what proportion of the remaining variants are causal, true positives.
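The simulated sketch below (toy sizes, not any study’s actual pipeline) illustrates both the problem and the standard remedy: a trait that differs between two subpopulations for entirely non-genetic reasons produces spurious per-SNP associations, and including the top genetic principal components as covariates removes most, though not necessarily all, of them.

```python
# Simulated population stratification and PCA adjustment: the phenotype
# differs by subpopulation for non-genetic reasons, allele frequencies also
# differ by subpopulation, and no SNP has any causal effect on the trait.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, m = 1000, 50                                        # individuals, SNPs (toy sizes)
ancestry = rng.integers(0, 2, n)                       # two hypothetical subpopulations
freqs = np.where(ancestry[:, None] == 0, 0.2, 0.4)     # allele frequency differs by group
genotypes = rng.binomial(2, np.broadcast_to(freqs, (n, m)))   # 0/1/2 allele counts
phenotype = 0.5 * ancestry + rng.normal(0.0, 1.0, n)   # trait tracks group, not any SNP

# Top principal components of the standardized genotype matrix
G = (genotypes - genotypes.mean(0)) / genotypes.std(0)
pcs = np.linalg.svd(G, full_matrices=False)[0][:, :10]

def snp_pvalue(j, adjust_for_pcs):
    covariates = np.column_stack([genotypes[:, j], pcs]) if adjust_for_pcs else genotypes[:, [j]]
    fit = sm.OLS(phenotype, sm.add_constant(covariates)).fit()
    return fit.pvalues[1]                              # p value for the SNP term

raw_hits = sum(snp_pvalue(j, False) < 0.05 for j in range(m))
adj_hits = sum(snp_pvalue(j, True) < 0.05 for j in range(m))
print(f"SNPs with p < 0.05: {raw_hits} unadjusted vs {adj_hits} with principal components")
```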
Generally, the variance explained in a GWAS by such correlations is already quite small, even if one accepts them all as true positives. For example, the most recent schizophrenia “meta-analysis” had 2.6% of the variance explained by the statistically significant correlations (interestingly, the previous meta-analysis gave the variance explained as 3.6%; thus, doubling the number of cases actually lowered the variance explained, despite claims that increasing sample size would account for more of it). Of course, 2.6% is not zero, but even accepting its validity, it appears to have reached a peak with no discernible clinical usefulness or advancement in our understanding. Thus, it is effectively a null result and seems just as likely to shrink as to expand with larger datasets, particularly as study diversity increases.
Behavioral geneticist Eric Turkheimer famously posited three laws of behavioral genetics,xxv the first being: all human behavioral traits are heritable. If we focus on the results of GWAS, though, perhaps we can look at this in another way and say: all human behavioral traits can be genetically correlated. In this sense, the genetic correlations do not need to be causal. It is impossible in a world of different cultures, nations, religions, wealth disparities, etc., to have a pure data set, and the trait correlations are likely little more than markers for inevitable subpopulations, known or unknown, that have their own preferences and predilections. This is further confounded by the datasets themselves (such as the UK Biobank), which have their own unique participation biases.xxvi Moreover, is it more likely that studies are finding causal genetic correlations for ice cream flavor preferencexxvii, church attendance and walking at a brisk pace, or that the correlations in such studies are false positives?
Introducing Null Traits in a GWAS
With the larger study sizes, it appears that most GWAS will find significant correlations for just about any trait (at least among those that are published). If one assumes, however, that the significant loci for schizophrenia, depression, IQ and personality traits are valid, what can we say about more suspect traits like “church attendance,” “ice cream flavor preference” or “going for a brisk walk”? Some will laugh these off, but what is the basis for making a distinction if they generally yield a similar number of significant correlations? Others will try to find a rationale for legitimate genetic correlations for these unconventional traits. In either case, we can say little about the likelihood that the correlations found are valid and causal for a trait.
The author suggests that one way to gain more insight into this conundrum would be to compare the results to an accepted null. If the number or quality of correlations for a null phenotype is comparable to that of the phenotype being questioned, this would raise questions about the validity of the results. Two different types of null will be suggested here. The first is a pure null. A pure null trait should show no significant correlations, and any correlation that did occur would indicate a random false positive. For example, one could combine the cases and controls, randomly reassign them to one group or the other, and perform another genome-wide association scan. The assumption would be that no significant correlations would be found in such a scenario except by chance.
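A minimal, simulated sketch of this pure null is shown below: pool the cases and controls, randomly permute the labels, rerun the association scan, and compare the number of genome-wide-significant hits with the number obtained from the real labels. In the toy data both counts will be near zero, since nothing is associated with anything; with real data, any hits from the permuted scan are false positives by construction, giving a baseline against which to judge the real scan.

```python
# Permuted-label ("pure null") comparison scan, on simulated toy data.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
n, m = 2000, 2000                                      # individuals, SNPs (toy sizes)
genotypes = rng.binomial(2, 0.3, size=(n, m))          # simulated, unassociated genotypes
status = np.array([1] * 1000 + [0] * 1000)             # real case/control labels would go here

def scan_pvalues(geno, labels):
    """Simple per-SNP genotype-by-status chi-square test (a stand-in for a GWAS scan)."""
    pvals = np.empty(geno.shape[1])
    for j in range(geno.shape[1]):
        table = np.array([[np.sum(geno[labels == grp, j] == count) for count in (0, 1, 2)]
                          for grp in (0, 1)]) + 1       # +1 guards against empty cells
        pvals[j] = chi2_contingency(table)[1]
    return pvals

threshold = 5e-8
null_labels = rng.permutation(status)                  # the permuted "pure null" trait
real_hits = int(np.sum(scan_pvalues(genotypes, status) < threshold))
null_hits = int(np.sum(scan_pvalues(genotypes, null_labels) < threshold))
print(f"hits with real labels: {real_hits}, hits with permuted labels: {null_hits}")
```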
A second, more useful null would be a null trait that produces several spurious correlations by taking advantage of population stratification, as one might see in the classic “chopstick” example. With a bit of imagination, any number of such null traits could be devised. As an example, one could perform a GWAS on Americans using their Social Security Number as a trait. If we used the last digit of the Social Security Number, we should have a pure null without any significant correlations, since no one is going to have genetic variants that increase the likelihood of a particular last digit in their randomly assigned Social Security Number. Conversely, if a GWAS were performed using the first digit, or first few digits, one might find significant correlations, because the first three digits of the Social Security Number were historically assigned based on the region of the country where the application was made (generally, the region a person is from). It has already been established that genetic correlations can be found based on geographic location due to population stratification.
It’s difficult to predict which null traits would produce significant correlations. Birth day or month would seem likely to behave as pure nulls (unless one subscribes to astrology), while the first or last letter of a person’s name might produce correlations for various reasons, particularly ethnicity. These are simply suggestions, and nulls could be established by researchers, who could even keep a collection of such nulls for indexed use.
The usefulness would extend beyond a simple comparison for expected false positives. If one conducts a “within family” analysis on a null trait, would all the correlations evaporate? This would add validity to a within family analysis of a trait, while if many correlations remained, this would highlight its limitations. Likewise, one could blindly compare an enrichment analysis on a trait versus the same for a null trait to make the case that the enrichment for the trait in question is significantly different than what is seen by a null enrichment.
Again, these are simply suggestions. The larger point is to consider a change in focus from doggedly pursuing positive results to starting with a null assumption from which one needs to be convinced of the validity of positive results. This is the more classic scientific approach, and it would likely lead to a further attenuation of the claimed positive results and a better assessment of whether we are repeating the candidate gene error.
Polygenic Risk Scores
As the limitations of GWAS are becoming clearer, there has been a pivot of late among researchers to what are called “polygenic risk scores.” A polygenic risk score (PRS), also referred to as a polygenic score (PGS), tries to estimate an individual’s “genetic risk” for a trait. These are developed from the results of a GWAS by aggregating and quantifying the effects of many SNPs, or common variants, across the genome, under the assumption that each variant can have a small effect on a person’s genetic risk for a given disease or condition. If a person has a high PRS for a trait, this suggests that they would be more likely to have the trait in question. For example, someone with a higher PRS for schizophrenia would be more likely to be diagnosed with schizophrenia, which doesn’t usually present until the late teens or early twenties. The idea would be that a person’s risk for schizophrenia could be known even at birth, presumably allowing some sort of treatment intervention before the person presents with symptoms, although it is difficult to imagine what kind of intervention could be done in such a situation, even if these scores had validity.
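A minimal sketch of the mechanics is below (toy numbers, not any published score): each person’s score is the sum of their allele counts weighted by the effect sizes reported in a GWAS, restricted to SNPs passing some p-value threshold. That threshold is a free parameter, which is relevant to a point returned to below: scores are routinely built from SNPs that fall far short of genome-wide significance, since almost nothing survives the strict cutoff.

```python
# Toy polygenic score construction from GWAS summary statistics:
# score_i = sum over kept SNPs of (allele count_ij * reported effect size_j).
import numpy as np

rng = np.random.default_rng(3)
m, n = 1000, 5                                     # SNPs in the summary statistics, people scored
gwas_beta = rng.normal(0.0, 0.01, m)               # simulated per-SNP effect estimates
gwas_pval = rng.uniform(0.0, 1.0, m)               # simulated per-SNP p values
genotypes = rng.binomial(2, 0.3, size=(n, m))      # allele counts (0/1/2) for each person

def polygenic_score(geno, beta, pval, p_threshold):
    keep = pval < p_threshold                      # which SNPs enter the score is a free choice
    return geno[:, keep] @ beta[keep]

strict = polygenic_score(genotypes, gwas_beta, gwas_pval, 5e-8)   # genome-wide-significant SNPs only
lenient = polygenic_score(genotypes, gwas_beta, gwas_pval, 0.05)  # the far more common practice
# With these simulated p values, essentially no SNP survives the strict cutoff,
# so the strict scores are all zero; real scores lean heavily on sub-threshold SNPs.
print("strict-threshold scores: ", np.round(strict, 3))
print("lenient-threshold scores:", np.round(lenient, 3))
```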
PRS appears to be another conceptual shift, from finding “causal” genetic variants to finding “risk” genetic variants, with one researcher arguing that this risk is itself causalxxviii. This bypasses the issue of “missing heritability,” since even with low variance explained from a GWAS, one can be said to have a high PRS in comparison to others. One issue, though, is that two people with identical risk scores might not both have the trait. Moreover, someone might have a high PRS and not have the trait, and someone with a low PRS might still have the trait. Again, we are left with the question of what is in the ether.
Another issue with PRSs is that variants that did not reach statistical significance in the GWAS are often used to develop the score. In other words, the generally accepted p value threshold of 5 × 10⁻⁸ is not met, and variants with larger (less stringent) p values are included, much like the sign-concordance replication noted previously for GWAS, which is arguably another form of p-hacking.
It has also been observed that, in samples not representative of the population from which the PRS was constructed, the PRS has very little value. For example, a PRS constructed from a white European population has little validity in a population of African ancestry. Behavioral geneticist Kathryn Paige Harden noted in her book, “The Genetic Lottery,” “I anticipate that scientists will have developed a polygenic score that is as strongly related statistically to academic achievement in Black students as it is in White students.” Segregated PRSs in this context would seem to raise not only ethical concerns but might also lead one to question the premise.
For example, if a schizophrenia PRS built from a White European population draws on a different set of SNPs than one built from a population of African or other ancestry, it isn’t clear how the two would produce the same phenotype. For a supposedly complex phenotype like schizophrenia, if one assumes a multifactorial neurological cause, it seems unlikely that a very different collection of genetic variants, presumably affecting the brain in different ways, would somehow create a phenotype that is functionally indistinguishable between different ancestries.
Moreover, different PRSs for different ethnicities might also suggest that the PRS is picking up a large amount of population stratification. One could easily create a “chopstick” PRS with some predictive ability. So, to call a PRS “causal” without being able to specify what any of the SNPs involved have to do with the phenotype is, charitably, premature.
PRSs, to date, have not had much success in predicting a phenotype in any clinically useful way.xxix To give an example, a PRS developed for schizophrenia was used in a sample of individuals from the Netherlands to test how well it could predict whether someone had a diagnosis of schizophrenia.xxx The performance was dismal, with a 0.5% prediction success. To put that in perspective, predicting that someone had the diagnosis of schizophrenia based solely on the fact that he was a young male would be twice as good a predictor. Whether a person had gone to the doctor at some point in their life complaining of pain would be four times more accurate in predicting schizophrenia than the derived polygenic score.
The argument, of course, is that increasing sample sizes will produce PRSs that are better at predicting a phenotype. As with candidate gene studies and GWAS, though, there is little basis for the belief that PRSs will become clinically useful predictors of behavioral traits, and it is becoming clear that they will not. There are already millions of subjects in these GWAS. One can expect diminishing returns from larger studies, especially as the studies become more ethnically diverse. It would be difficult to rule out the possibility that a PRS is largely a measure of population stratification for the phenotype in the population studied, and that this is the basis for even its modest predictive capabilities.
Another issue with PRSs is that, whether or not they are valid, they have the potential to create a fait accompli if someone is told at an early age that they don’t have the genes for, say, musical ability, mathematical aptitude or “educational attainment.” If you start encouraging individuals with a high musical-aptitude PRS to pursue music, you will begin to “validate” the PRS. If you begin developing academic curricula around a PRS for mathematical aptitude, you are reinforcing the PRS, even if it is invalid. More disturbing would be informing someone that they have a high PRS for schizophrenia. This has a real ability to alter the course of a person’s life, without any evidence that such a risk is genuine. If you inform a potential spouse of your high PRS for schizophrenia, how might that be received? There is little recourse when these are only “risk” scores, giving plausible deniability when the prediction proves incorrect. Assuming these scores have no validity, they could take generations to refute, with ever-changing scores as new studies arise. It would be akin to astrology, which maintains its popularity despite a lack of evidence of validity, but a PRS arguably has more harmful repercussions than the generally more playful astrological predictions (it is worth pointing out that astrology studies continue to be produced with claimed positive resultsxxxi).
Clinical Issues with Psychiatric Phenotypes
When claims are made about the genetics of mental illness, there is little discussion or study of how or why individuals are given a particular diagnosis. This might seem irrelevant when performing a GWAS, which does not necessarily require an understanding of psychiatric diagnoses to scan for genetic correlations with a diagnosis. However, this assumes that these diagnoses are as definitive and unbiased as physical traits like height or body mass index. As a psychiatrist with 30 years of clinical experience, I think it useful to briefly discuss some of the issues related to diagnoses that might confound a GWAS.
For starters, diagnoses come and go and change over time. For example, ADHD and autism have changed significantly in the past two decades, being far more commonly diagnosed. Twenty years ago, it was highly unusual to diagnose an adult with ADHD, even more so if they had not been given that diagnosis in early childhood. Likewise, highly functional adults and children were rarely given the diagnosis of autism. Leaving aside debates about the validity of these diagnoses, it would be difficult to demonstrate that what is being called ADHD in adults and the traditional “hyperactive child” lie on a continuum, or why one would assume these two designations have genes in common. Moreover, to classify both a young child who is mute and unable to function without assistance and a high-functioning student at U.C. Berkeley as “autistic,” with the assumption that they have similar presentations on a continuum that might even have a similar genetic basis, simply stretches credulity. It is an embarrassment to the field that it has been so influenced by pop psychology and by pharmaceutical companies promoting (in the case of ADHD) stimulant medications. If one is finding genetic correlations for these diagnoses, this should not serve as proof of their relationship, but should instead raise the level of skepticism one has for the results of any GWAS.
Another issue with practical clinical diagnosing is what one might refer to as a soft diagnosis. Classically, patients diagnosed with bipolar disorder have, in addition to severe depressive episodes, clear-cut episodes of mania: extreme amounts of energy, no need for sleep for days at a time and a grandiosity with psychotic delusions, such as a belief that they are a billionaire, or secretly married to a celebrity, or hearing the voice of God. Such intense symptoms might lead to speculation that the disorder is neurological, perhaps with a genetic basis. The problem from a research standpoint is that patients with such symptoms are probably a minority of those given that diagnosis. In practice, most patients given a “bipolar disorder” diagnosis present far differently. Generally, they have mood swings that might be more intense and frequent than what would be viewed as the normal range. Sometimes such presentations are given a diagnosis of “Bipolar II Disorder,” but the “II” often fades by the time the designation reaches a genetic study. The idea that this has some relationship, genetic or otherwise, to classic “Bipolar I Disorder” has little basis beyond the semantic use of words like “mood” and “grandiosity” to describe quite different phenomena, fitting them into a rigid diagnostic category in the Diagnostic and Statistical Manual (DSM) and legitimizing prescriptions and hospital admissions. Thus, when one comes across a study finding genetic correlations for “bipolar disorder,” some up-front skepticism is warranted, since many, if not most, of the cases in such a study probably do not have classic bipolar disorder symptoms. Again, positive results from a GWAS for this diagnosis should raise questions about the validity of GWAS results generally.
As with bipolar disorder, not all patients diagnosed with schizophrenia have what might be considered the classical symptoms of the disorder, such as organized paranoid delusions, believing groups (the CIA, Secret Service, Masons, etc.) are monitoring them in some way, hearing specific voices that talk to them and that they experience as definitively auditory, as well as disorganization and poor social functioning. It is not as common to give the diagnosis of schizophrenia to individuals who lack classical symptoms as it is with bipolar disorder, but it still happens frequently, particularly with a specific subset of individuals with mental disabilities, whether from a genetic cause or otherwise. These individuals often run into significant behavioral problems that bring them into the mental health system. From an early age, their difficulties are pathologized, often in the framework of a mental disorder, describing their impulses as “voices” telling them to do whatever behavior got them into trouble. In reality, this is more akin to poor impulse control and difficulty with emotional modulation. They might also describe a kind of general paranoia about people talking about them, mocking them, or being against them in some way. Such concerns would not be unusual for people who have been marginalized and mocked from an early age due to their mental disabilities. Again, although it is semantically reasonable to then check off “auditory hallucinations” and “paranoia” as DSM symptoms of schizophrenia, this does not resemble classic schizophrenia. Many of these individuals get the diagnosis of “schizoaffective disorder,” which incorporates their mood swings and justifies the use of mood stabilizing medication (lithium, sodium valproate, carbamazepine, etc.) in addition to antipsychotic medication. This is often lumped in with schizophrenia in genetic studies. In practice, one cannot simply use a diagnosis of mental disability (or, in earlier editions of the DSM, mental retardation) to justify putting a patient on medications or providing treatment in the mental health system. It’s an unfortunate reality of the mental health system and is worth discussing in its own right, but is mentioned here only as it pertains to potential confounding in genetic studies.
Although their smaller numbers are less likely to directly confound a schizophrenia GWAS in the way that a bipolar disorder study might be confounded, they do contribute to contentions about “rare variants” being part of the missing heritability from GWAS for schizophrenia. Arguments that schizophrenia is primarily a genetic disorder are often buttressed by the fact that many individuals with genetic disorders that interfere with normal neuropsychiatric development and cause intellectual disabilities, such as 22q11.2 deletion syndrome and Fragile X syndrome, are at much higher risk of being diagnosed with schizophrenia. The argument, by extension, is that people without obvious pathology who are diagnosed with schizophrenia might have rarer genetic variants that are not yet picked up in a GWAS, and that these cause schizophrenia without the concomitant pathology usually seen in genetic disorders. This misses the point that the neuropathology is exactly the reason these individuals are being given the diagnosis of schizophrenia (or schizoaffective disorder).
The broader point here is that psychiatric diagnoses are not definitive, nor can they be considered wholly scientific, despite efforts to make them so. They are subject to clinician biases, regional biases, class and racial biases, diagnostic trends, the limitations of the DSM, the bureaucratic and economic realities of mental health treatment facilities, and the influence of pharmaceutical companies, among other factors. One could go through the DSM, a manual subject to the whims and biases of those who created it, and imagine, for any diagnosis, issues that could bias and confound genetic studies. It is a bit of folly to make assumptions about genetic correlations for traits conceived within a particular cultural milieu as if they were definitive human traits.
Discussion
There is a long-running television series called “The Curse of Oak Island,” premised on the belief, based on more than 200 years of folklore, that this isolated Canadian island contains buried treasure. The series chronicles the treasure hunters’ attempts to find it. Each episode involves some excavation or exploration, usually with some small trinket discovered, like a coin, a piece of metal or wood, a stone carving, a bone, a nail, or hints of objects beneath the ground detected by equipment. These small findings lead to wild speculation about the Knights Templar, Roman shipwrecks, the Aztec Empire, etc. What isn’t found, of course, is a treasure. The treasure hunters seem undaunted, however, using more and better equipment to dig deeper and with more precision. After eleven years, the show has become formulaic: find some small “artifact,” take it as evidence for the treasure, and move on to the next episode, leaving viewers with a kind of insouciant demonstration that progress is being made, without ever questioning the existence of a real treasure. The show lacks a null, which would be any other island or plot of land that has been inhabited for a period of time, since one would inevitably find small trinkets almost anywhere one digs for them.
The analogy is not subtle. The gene-hunting expedition has produced little more than trinkets of genetic correlations, with researchers presenting the most optimistic view of their meaning after decades of “digging” with better and better scientific techniques. Yet no real treasure has been found. The optimism appears to rest on the unflagging belief that behavior has a genetic component, with changes in our understanding of the genetics of that behavior proposed only because nothing of substance has been found. If the field of behavioral genetics had operated under the assumption that there were no genes to find, no treasure, it might have died after the failure of the candidate gene studies. At this point, it seems to have become too big to fail. Resources and careers are at stake. Even without evidence, it is difficult to say whether the field of behavioral genetics will ever wind down or simply continue indefinitely.
Nonetheless, even a null field is not inconsequential to society if the perception is that it has validity. Decades of positive spin on the part of researchers, coupled with sensationalist media reports, have given rise to a sense of genetic determinism that affects our society. Government policies on immigration, financial assistance, education and criminal justice are influenced by attitudes about what is genetic. Psychiatric treatment revolves around a medical model justified by genetic assumptions. Classist and racist assumptions are often justified by claims of genetic superiority. On an individual level, beliefs about one’s own genetic limitations create a sense of fate. If behavioral genetics had strong validity, then perhaps such would be our fate. However, it seems more likely that it is a house of cards. Toppling this house of cards could be a starting point for a different perception of behavioral science, and even of humanity.
The author would like to close this article with a clarion call for any brave researchers in the field to earnestly explore the possibility that it is a null field!
i Ioannidis, John P., Why Most Published Research Findings are False, 2005
ii Beauchamp, et al., Nature-Nurture Interplay: Evidence from Molecular Genetics and Pedigree Data in Korean American Adoptees, 2023.
iii Fuller, John, Behavior Genetics, 1960
iv Scott JP & JL Fuller, Genetics and the Social Behavior of the Dog, Chicago, 1965
v Panofsky, Aaron, Misbehaving Science, Chicago, 2014
vi Joseph, Jay, A Reevaluation of the 1990 “Minnesota Study of Twins Reared Apart” IQ Study, 2022.
vii Tucker, W H, Re-considering Burt: Beyond a Reasonable Doubt, 1997.
viii Hilker, et al., Heritability of Schizophrenia and Schizophrenia Spectrum Based on the Nationwide Danish Twin Register, 2018.
ix Koskenvuo et al., Psychiatric Hospitalizations in Twins, 1984.
x It should be noted that despite the true (pairwise) concordance rate calculated at 14.8%, the Hilker study claimed a 33% “probandwise” concordance rate, with 7% for dizygotic twins. The rationale for this is unclear, other than yielding a higher concordance rate and, subsequently, a higher heritability estimate (a sketch of how the two rates are conventionally calculated follows these notes).
xi Koskenvuo et al., Psychiatric Hospitalizations in Twins, 1984.
xii Chen, et al., Dominant Genetic Variation and Missing Heritability for Human Complex Traits: Insights from Twin versus Genome-wide Common SNP Models, 2015
xiii Richardson, K, Sarah H. Norgate, A Critical Analysis of IQ Studies of Adopted Children, 2006.
xiv Plomin, et al., Adoption Results for Self-Reported Personality: Evidence for Nonadditive Genetic Effects? 1998.
xv Neiderhiser, JM, Between-litter and Within-litter Variance in Inbred Strains of Mice as Evidence of Shared and Nonshared Environment, Behav Genet, 1989.
xvi Laskowski, et al., The Emergence and Development of Behavioral Individuality in Clonal Fish, 2022.
xvii Border, et al., No support for Historical Candidate Gene or Candidate gene-by-interaction Hypothesis for Major Depression Across Multiple Large Samples, 2019.
xviii Turkheimer, Eric, Heritability and Biological Explanation, 1998.
xix Farrell, et al. Evaluating Historical Candidate Genes for Schizophrenia, 2015.
xx The author notes that he literally experienced this on at least two occasions.
xxi The author notes that, after sending a letter to the Journal of Clinical Psychiatry critical of a genetic study for schizophrenia, they wrote him back and said they would only consider printing the letter if he could get two other psychiatrists to vouch for his character in writing. The author declined the request.
xxii Ripke, et al., Mapping Genomic Loci Prioritizes Genes and Implicates Synaptic Biology in Schizophrenia, 2020.
xxiii Howard, et al., Genome-wide Association Study of Depression Phenotypes in UK Biobank Identifies Variants in Excitatory Synaptic Pathways, 2018.
xxiv Okbay, et al., Polygenic Prediction of Educational Attainment Within and Between Families From Genome-wide Association Analyses in 3 Million Individuals, 2022.
xxv Turkheimer, Eric, Three Laws of Behavior Genetics and What They Mean, 2000.
xxvi Schoeler, et al., Participation Bias in the UK Biobank Distorts Genetic Associations and Downstream Analyses, 2023.
xxvii https://blog.23andme.com/articles/genes-scream-for-ice-cream
xxviii Plomin, Robert, Blueprint: How DNA Makes Us Who We Are, London, 2018.
xxix Hingorani, et al., Performance of Polygenic Risk Scores in Screening, Prediction, and Risk Stratification, 2022.
xxx Marsman, et al., Do Current Measures of Polygenic Risk for Mental Disorders Contribute to Population Variance in Mental Health? 2020
xxxi Bhandary, et al., Prediction of Mental Illness Using Indian Astrology: Cross-sectional Findings From a Prospective Study, 2018.
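As context for note x above, the following is a minimal sketch, using hypothetical counts rather than the Hilker data, of how pairwise and probandwise twin concordance are conventionally defined; the function names and the numbers are illustrative assumptions only. The probandwise calculation counts each affected twin in a concordant pair as a separate proband, which is why it always comes out at least as high as the pairwise rate.

```python
# Minimal sketch (hypothetical counts, not the Hilker data) contrasting the two
# conventional ways of reporting twin concordance for a diagnosis.

def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Proportion of affected pairs in which both twins are affected: C / (C + D)."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)

def probandwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Counts each twin in a concordant pair as a separate proband: 2C / (2C + D),
    assuming every concordant pair is independently ascertained twice."""
    return (2 * concordant_pairs) / (2 * concordant_pairs + discordant_pairs)

if __name__ == "__main__":
    c, d = 15, 85  # hypothetical: 15 concordant, 85 discordant affected pairs
    print(f"pairwise:    {pairwise_concordance(c, d):.1%}")    # 15.0%
    print(f"probandwise: {probandwise_concordance(c, d):.1%}") # 26.1%
```

With these hypothetical counts, the same underlying data yield roughly 15% pairwise but 26% probandwise concordance, illustrating how the choice of measure alone can inflate the headline figure.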
Thanks for this comprehensive and sobering piece. I came to it by way of reading your comment on another Substack, and I’m lucky I did. I feel sheepish reading it—not because I worked in behavioural genetics, but because I was close enough to possibly (?) know better and didn’t ask myself critical questions.
I still remember my first Society for Neuroscience meeting in 2010. A leading autism researcher stood on stage, gestured at a pie chart showing the vast majority of cases labeled “idiopathic,” and said, “We’ll fill the genetics in within five years.” I was a little astounded by the chutzpah—but everyone else seemed to believe him, so I did too. Bigger samples, better tools—it felt inevitable that the field would get there, even as GWAS after GWAS failed to replicate.
But I never questioned the core assumptions: How were these traits defined? What does heritability really capture? Were any of these models falsifiable, or just endlessly adjustable? In hindsight, it was faith in the shape of empiricism.
I see the same pattern in Alzheimer’s research, where I’ve been writing about the slow collapse of the amyloid cascade hypothesis. But you could see all the signs back in the heady days of developing the first transgenic mouse model—it was all there in the very first paper that launched a thousand clinical trials. Critics have called it “too big to fail,” and that feels about right. Despite decades of failed predictions, billions in sunk cost, and no meaningful therapeutic breakthroughs, the field hasn’t let go—the originators of the hypothesis just doubled down.
And this isn’t just about behavioural genetics or amyloid. It’s about a broader problem in science: where incentives reward positive results, flashy headlines, and publication over rigor, replication, or humility. What happened to Platt’s strong inference: clear, testable hypotheses; competing explanations; and a willingness to walk away when the evidence says we should?