What happened to the XPNPEP2 signal?

The last two releases of COVID-19 test results for the UK Biobank (18 May and 31 May) have not reproduced the signal of genome-wide significance in XPNPEP2. By contrast, another signal in chromosome 3 has meanwhile been detected in an independent cohort and appears to be supported by the COVID-19 Host Genetics Initiative's May 15th meta-analysis.

So what is the current status of the signal of association at rs2076205 in XPNPEP2? This signal was unearthed in a comparison of COVID-19 PCR positive versus negative individuals of European ancestry. The rare allele was commoner than expected among PCR negative individuals. Depending on how the analysis was performed, the signal was significant or very significant in the UK Biobank cohort. However, there was a difficult-to-explain sensitivity to whether individuals with close relatives in the rest of the cohort were included in the analysis. Even so, I was unable to explain away the signal with measured confounders or population stratification.

The signal is no longer genome-wide significant in UK Biobank, whether or not you exclude individuals with close relatives. Nor was its significance supported by the May 15th meta-analysis combining UK Biobank results with other international cohorts, notably Lifelines, FinnGen and the Netherlands Twin Register.

Therefore it is tempting to write off the association as noise in a relatively small sample. Without dismissing that as a possible explanation, here I do a little more digging to ask why else there might have been a signal that is now much reduced.

Perhaps surprisingly, the deviation of allele frequencies from their expectation under the null hypothesis (of no association, as judged against overall allele frequencies among individuals of European ancestry in the Biobank) was driven mainly by PCR negative individuals, rather than PCR positive individuals. PCR negative individuals showed an enrichment for the rare allele. While PCR positive individuals showed a corresponding depletion of the rare allele, the signal was weaker. The trend is still apparent in the data, but its magnitude - and therefore significance - is now reduced.

The graph shows that the significance of the association generally increased with sample size between mid-March and May 1st, before reversing. Why should that be?

Figure: Observed and expected genotype counts and statistical significance of the rare allele at rs2076205 in members of the UK Biobank with European ancestry, classified by SARS-CoV-2 test result: anypos.in (ever tested positive, ever tested while an inpatient), anypos.nin (ever tested positive, never tested while an inpatient), neg.in (never tested positive, ever tested while an inpatient), neg.nin (never tested positive, never tested while an inpatient). Red lines represent analyses including individuals with close relatives in UK Biobank. Blue lines represent analyses excluding individuals with close relatives in UK Biobank. Significance calculated crudely with a binomial test. Expected genotype frequencies calculated from all UK Biobank participants of European ancestry.

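
To make the "crude binomial test" in the caption concrete, here is a minimal Python sketch of one way such a test could be set up. It is not the code behind the graph. The counts and the expected frequency are invented for illustration and are not UK Biobank values, the function names are hypothetical, and the sketch counts alleles as independent draws rather than working with genotypes, which is a simplification.

```python
import math

def binom_logpmf(k, n, p):
    """Log-probability of k successes in n Binomial(n, p) trials, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed count k under the null frequency p."""
    log_obs = binom_logpmf(k, n, p)
    tol = 1e-7  # guards against floating-point ties between equally likely outcomes
    total = 0.0
    for i in range(n + 1):
        lp = binom_logpmf(i, n, p)
        if lp <= log_obs + tol:
            total += math.exp(lp)
    return min(1.0, total)

# Hypothetical numbers for illustration only (NOT taken from UK Biobank).
expected_freq = 0.05   # assumed rare-allele frequency in the reference group
n_alleles = 2000       # assumed number of allele copies among tested individuals
observed_rare = 130    # assumed observed count of the rare allele

p_value = binom_test_two_sided(observed_rare, n_alleles, expected_freq)
print(f"expected {expected_freq * n_alleles:.0f} rare alleles, "
      f"observed {observed_rare}, two-sided p = {p_value:.3g}")
```

The null frequency plays the role of the expectation calculated from all participants of European ancestry; a test of this kind ignores relatedness and covariates, which is one reason to call it crude.
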
Given the reliance of the signal on the interpretation of PCR negative individuals, one possible explanation could be a change in inpatient testing at the end of April. The testing criteria before and after the end of April probably differed by hospital, and the date of any change in testing would have varied too, but my clinical colleagues in Oxford have characterized it as follows (and apologies if there are any errors in reproducing the account here):

  • Before circa 25th April, only individuals deemed likely to have COVID-19 were tested.
  • From around 25th April onwards, testing of inpatients was drastically broadened.

There is some evidence of an increased rate of testing in the graphs: the curves of allele counts for negative inpatients steepen around the end of April.

The idea - and this is only an idea - is that PCR negative individuals before 25th April contained a sizeable subgroup of individuals exposed to SARS-CoV-2 who did not present detectable levels of virus, perhaps because they have a degree of resistance to infection. After 25th April, many individuals without true exposure to SARS-CoV-2 were also tested, diluting the signal.
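
The dilution idea can be sketched numerically. The Python snippet below is a toy model under stated assumptions, not an analysis of the real data: PCR negative individuals are treated as a mixture of a fixed-size subgroup enriched for the rare allele and a growing number of individuals at the baseline frequency. All frequencies and counts are invented, and the function names are hypothetical.

```python
import math

def mixture_allele_freq(baseline_freq, subgroup_freq, subgroup_fraction):
    """Rare-allele frequency in a group where a fraction of allele copies comes
    from an enriched subgroup and the rest sits at the baseline frequency."""
    return subgroup_fraction * subgroup_freq + (1 - subgroup_fraction) * baseline_freq

def expected_z(baseline_freq, subgroup_freq, n_subgroup_alleles, n_total_alleles):
    """Approximate expected z-score for enrichment relative to baseline, using
    the normal approximation to the binomial under the null hypothesis."""
    fraction = n_subgroup_alleles / n_total_alleles
    freq = mixture_allele_freq(baseline_freq, subgroup_freq, fraction)
    se = math.sqrt(baseline_freq * (1 - baseline_freq) / n_total_alleles)
    return (freq - baseline_freq) / se

# Hypothetical values: baseline frequency 5%, enriched subgroup at 10%,
# and a subgroup fixed at 500 allele copies while total testing broadens.
for n_total in (2000, 4000, 8000):
    z = expected_z(0.05, 0.10, 500, n_total)
    print(f"total alleles {n_total}: expected z = {z:.2f}")
```

In this toy model the expected z-score falls in proportion to one over the square root of the total sample once the enriched subgroup stops growing, so a larger sample can give a less significant result.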

When UK Biobank releases more detailed data on hospital episodes, it may be possible to test this idea, for example by comparing individuals with and without a diagnosis of COVID-19.

There are other possible explanations, including unmeasured confounding, ascertainment bias and noise. The explanation offered above does not account for why the signal in positive inpatients (weaker though it was) also reversed. There are also alternative interpretations of negative inpatients. Another clinical colleague of mine has suggested that they contain a subgroup of individuals whose disease is more advanced (i.e. worse) at the time of admission: there may be a window of opportunity early after infection for detecting the virus from throat swabs, and that window was missed in this subgroup.

Whatever the explanation, there is excitement at the discovery of signals elsewhere in the genome by others (which appear to be replicated in independent cohorts including UK Biobank), and the enhanced prospects the discovery represents for finding new ways to tackle the disease.
