Interpreting UK Biobank COVID-19 test data

Following the press release earlier this week, UK Biobank have released the COVID-19 lab-confirmed test results for English participants that we helped link from Public Health England's Second Generation Surveillance System. In this post I describe how the data is obtained and make some recommendations for interpreting the test results.

Registering for the data

To access the data, you need to be a registered researcher attached to an approved project. (This can be a lengthy process.)

UK Biobank emailed existing researchers this week with information on how to access the new data: "If you wish to receive the primary care data restricted for COVID-19 research purposes, or the more frequent updates of other health outcome data, then please go to UK Biobank's Access Management System (AMS) and go to the Data tab of your Project/Application where you will find a button that takes you to a sign-up page for requesting these data. 

"Researchers requiring access to the primary care data will be asked to confirm that they will only use them for COVID-19 related research. The Application's Principal Investigator on an approved project will need to sign-up to apply for these data before they will be released."

Downloading the data

UK Biobank have created instructions explaining how researchers with existing projects who have successfully registered for additional access to COVID-19 data can download it.

The data are downloaded through the data portal, which requires some knowledge of SQL. To download (updated 5 June 2020):
  • Log in as usual at bbams.ndph.ox.ac.uk 
  • Navigate to your Project
  • View/Update it
  • Click the Data tab
  • Select the Go to Showcase to refresh or download data button
  • Select the Data Portal tab
  • Click Connect
  • At the bottom of the page, click the Table Download tab
  • In the Name of table box, enter covid19_result and click Fetch Data
  • Click the link produced to download the file.

Interpreting the data

NB: a description of the data format has been provided by UKB. The data format and definitions may change.

The data look like this:
eid      specdate    spectype  laboratory  origin  result
XXXXXXX  29/03/2020  1         9           1       1
XXXXXXX  29/03/2020  2         10          1       0
XXXXXXX  31/03/2020  2         10          0       1

The first thing to notice is that each row of the data refers to a COVID-19 test. So most users will need to reinterpret that table at the level of an individual participant, using the eid column.

Here I will explain how my colleagues and I envisage the data being used, which will be explained in more detail in draft 2 of the preprint. We believe the data are suited to addressing the question "Why do some people suffer from severe COVID-19?" For statistical analyses, we are defining:
  • A case as anyone with severe COVID-19: any person with at least one positive test while an inpatient. This means anyone with one or more test results with origin=1 (inpatient) AND result=1 (infected).
  • A control as anyone not known to have COVID-19: any person with no positive test results at all. This means anyone with no tests or anyone whose tests were all negative (result=0).
  • An excluded person as anyone with COVID-19 of unknown severity: any person with at least one positive test but never while an inpatient.
We define cases this way because UK policy since 16 March 2020 has been to admit to hospital only patients with severe disease. To help restrict cases to severe disease, no test results from before this date have been included. Many individuals with positive tests who were not hospital inpatients at the time of the test are likely to be healthcare workers, so they did not necessarily have severe disease. We are assuming that members of the control group (anyone not known to have COVID-19) have not been exposed, or have not suffered severe disease.
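The three definitions above can be sketched as a small function. This is only an illustration of the logic, assuming each participant's tests have already been collected as (origin, result) pairs coded as in the table; the function name classify and the labels it returns are my own choices, not part of the UKB data.

```python
def classify(tests):
    """Classify one participant from their (origin, result) test pairs.

    origin: 1 = inpatient, 0 = not an inpatient
    result: 1 = positive, 0 = negative
    """
    tests = list(tests)
    # Case: at least one positive test taken while an inpatient.
    if any(origin == 1 and result == 1 for origin, result in tests):
        return "case"
    # Excluded: positive at least once, but never while an inpatient.
    if any(result == 1 for _, result in tests):
        return "excluded"
    # Control: no tests at all, or only negative tests.
    return "control"


print(classify([(1, 1)]))          # positive as an inpatient
print(classify([(1, 0), (0, 1)]))  # positive only outside hospital
print(classify([]))                # never tested
```

Note that a participant with a negative inpatient test and a positive test elsewhere is excluded rather than counted as a case, because the positive result did not come from an inpatient sample.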

In the first tranche of data, using the above definitions, there were 572 cases, 97 excluded participants and 805 participants who appear in the results file but never tested positive, and who therefore qualify as controls. [This article previously incorrectly said that all 97+805=902 non-cases in the results file were excluded participants.]

Note that only test results in England are reported. One way of excluding individuals who cannot be cases because they do not live in England is to assume participants live in the same country as the recruitment centre they originally attended. Assessment centre at recruitment is field 54, and the following are the codes for English recruitment centres:
Code   Centre
11012  Barts
11021  Birmingham
11011  Bristol
11008  Bury
11024  Cheadle
11020  Croydon
11018  Hounslow
11010  Leeds
11016  Liverpool
11001  Manchester
11017  Middlesborough
11009  Newcastle
11013  Nottingham
11002  Oxford
11007  Reading
11014  Sheffield
10003  Stockport
11006  Stoke
11025  Cheadle
11026  Reading
11027  Newcastle
11028  Bristol
For me, this removed 56,650 participants, none of whom had test results (as expected). Excluding other participants who cannot be cases, such as those lost to follow-up, is also recommended.
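One way to apply this filter is sketched below. It assumes you have already extracted field 54 into a mapping from eid to centre code; the mapping, the invented eids and the helper name keep_english are hypothetical, and only the set of codes comes from the list above.

```python
# Codes for English recruitment centres (field 54), copied from the list above.
ENGLISH_CENTRES = {
    11012, 11021, 11011, 11008, 11024, 11020, 11018, 11010, 11016, 11001,
    11017, 11009, 11013, 11002, 11007, 11014, 10003, 11006, 11025, 11026,
    11027, 11028,
}


def keep_english(centre_by_eid):
    """Keep only participants whose recruitment centre code is in the list."""
    return {
        eid: centre
        for eid, centre in centre_by_eid.items()
        if centre in ENGLISH_CENTRES
    }


# Hypothetical participants; 11004 stands in for any code not in the list.
recruited = {"1000001": 11012, "1000002": 11004, "1000003": 10003}
english = keep_english(recruited)
print(sorted(english))
```

This relies on the same assumption as the text: that participants still live in the country where they were recruited.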
