
Presentation on identifying COVID-19 inpatients from Public Health England data

This is a presentation I gave at the COVID-19 Host Genetics Initiative meeting on 2nd July 2020 about using Public Health England's Second Generation Surveillance System to identify COVID-19 inpatients among SARS-CoV-2 positive individuals in England.

For further information, please see this bugbank blog post comparing inpatients identified using SGSS and Hospital Episode Statistics.

Identifying inpatients: comparison to Hospital Episode Statistics

Since April, the bugbank project has fed information on SARS-CoV-2 PCR-based swab tests from Public Health England (PHE)’s Second Generation Surveillance System (SGSS) to UK Biobank (UKB) and other cohorts. The lag in this database is very short (around 4 days, or closer to 10 days by the time the data reach UKB researchers), allowing agile analysis of risk factors. However, SGSS has a range of limitations, including a lack of clinical information. Since a primary research goal is to understand why some people suffer severe COVID-19 from SARS-CoV-2 infection, we sought to identify a subset of SARS-CoV-2 positive individuals that was enriched for hospital inpatients, and therefore enriched for severe disease. Inpatient status is not directly recorded in SGSS, nor could it be, because tests are often ordered before a decision to admit has been made, e.g. in the emergency department. Therefore we tried to identify inpatients indirectly using available information (Armstrong et al. 2020). The aim of this post is to assess the performance of that approach.

 

Working assumptions

There are two key assumptions underlying the approach:

 

1. SARS-CoV-2 positive inpatients are enriched for severe COVID-19.

 

Starting around 27 April 2020, English hospitals began routine screening for SARS-CoV-2 infection among inpatients. Before this date, when testing capacity was limited and testing was focused on suspected cases, we consider assumption (1) to be reasonable. After late April, we consider it to be questionable, so the focus here is on the period 16 March (when hospitalization was restricted to severe cases only) to 30 April (46 days).

 

2. Inpatients can be identified indirectly from other data.

 

SGSS is a microbiology database, so information on whether a test subject was hospitalized can only be inferred indirectly. The fields we used to infer inpatient status (all of which are now available in the UKB covid19_result table) are:

· The ‘Acute Trust’ flag (acute), meaning that the test came from a healthcare institution delivering emergency care;

· The ‘Hospital Acquired Infection’ flag (hosaq); and

· The Requesting Organization Type associated with the test (reqorg).

 

Previously proposed inpatient indicators

In the paper, we considered two indicators:

· A more specific method: reqorg==1 (Hospital inpatient)

· A more sensitive method: (reqorg==1 OR reqorg==5 (Hospital A&E) OR acute==1 OR hosaq==1) AND reqorg!=4 (Healthcare worker testing)

The second, more sensitive method, recorded in the origin column of UKB’s covid19_result table, was the one we recommended, because we preferred to trade some specificity for more cases.
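As a minimal sketch, the two indicators above can be written as Python predicates over a single test record. This assumes each record is a dict keyed by the field names used in the text (reqorg, acute, hosaq), holding integer codes, with None for missing values; the dict representation and the treatment of missing values are assumptions, not a description of how the UKB table is distributed.

```python
from typing import Mapping, Optional

# Requesting Organization Type (reqorg) codes named in the text
HOSPITAL_INPATIENT = 1
HOSPITAL_AE = 5
HEALTHCARE_WORKER = 4

def specific_inpatient(test: Mapping[str, Optional[int]]) -> bool:
    """More specific indicator: reqorg == 1 (hospital inpatient)."""
    return test.get("reqorg") == HOSPITAL_INPATIENT

def sensitive_inpatient(test: Mapping[str, Optional[int]]) -> bool:
    """More sensitive indicator (the one recorded as origin == 1):
    (reqorg == 1 OR reqorg == 5 OR acute == 1 OR hosaq == 1) AND reqorg != 4.
    Missing values (None) simply fail the equality checks."""
    reqorg = test.get("reqorg")
    flagged = (
        reqorg in (HOSPITAL_INPATIENT, HOSPITAL_AE)
        or test.get("acute") == 1
        or test.get("hosaq") == 1
    )
    return flagged and reqorg != HEALTHCARE_WORKER

# Hypothetical records: field names from the text, values invented for illustration
examples = [
    {"reqorg": 1, "acute": 1, "hosaq": 0},     # inpatient request
    {"reqorg": 5, "acute": 1, "hosaq": 0},     # A&E request
    {"reqorg": 4, "acute": 1, "hosaq": 0},     # healthcare worker testing
    {"reqorg": None, "acute": 0, "hosaq": 0},  # no hospital signal
]
for t in examples:
    print(specific_inpatient(t), sensitive_inpatient(t))
```

The sensitive rule flags the first two records and the specific rule only the first; the healthcare-worker exclusion overrides the acute flag on the third.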

 

Hospital Episode Statistics as a gold standard

Assessing the indirect identification of inpatients requires an independent source of reliable information. This is now possible using Hospital Episode Statistics (HES).

 

HES offer a comprehensive source of data on spells of continuous stay in hospital, episodes under the care of each consultant within those spells, and diagnoses. The lag for HES is around three months. In mid-August, UKB released HES data for England up to 31 May 2020, with a warning that the last month may be less complete than previous months. Censoring is important because only ‘finished consultant episodes’ are recorded in HES, so episodes still in progress at the cutoff are missing. Length of hospital stay for COVID-19 in the UK in March and April was typically under 30 days, with a median of 14 days, but exceeded 60 days for some patients (Karagiannidis et al. 2020). The latest HES data are therefore likely to be fairly complete, but not definitive, for the period of interest (16 March to 30 April).
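To illustrate why censoring matters, the sketch below assumes, as a simplification, that a stay appears in HES only if it finished on or before the 31 May 2020 cutoff. Under that assumption, the longest observable stay shrinks as the admission date moves later in the study window. The function name is hypothetical.

```python
from datetime import date

CENSOR_DATE = date(2020, 5, 31)  # HES coverage in this release ends 31 May 2020

def max_observable_stay(admission: date, censor: date = CENSOR_DATE) -> int:
    """Longest stay (days) that could have finished by the censoring date,
    assuming a stay is recorded only once it has finished."""
    return (censor - admission).days

window_start, window_end = date(2020, 3, 16), date(2020, 4, 30)
print((window_end - window_start).days + 1)  # 46 days in the window, inclusive
print(max_observable_stay(window_start))     # 76
print(max_observable_stay(window_end))       # 31
```

Admissions early in the window have ample time to finish, whereas for admissions at the end of April only stays of about a month or less can be observed. Long stays that began late in the window are therefore the ones most likely to be missing.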

 

Assessing Performance

We performed three comparisons of HES and SGSS data, treating HES as the gold standard, restricting attention to 16 March to 30 April, and excluding UKB participants not resident in England or lost to follow-up:

 

1. Detection of inpatients

a. For all participants with a SARS-CoV-2 test, inpatient status (from SGSS) versus inpatient status (from HES).

2. Detection of COVID-19-diagnosed inpatients

a. For all participants, SARS-CoV-2 positive inpatient status (from SGSS) versus inpatient diagnosis codes for COVID-19 (from HES).

b. For all participants with a positive SARS-CoV-2 test, inpatient status (from SGSS) versus inpatient diagnosis codes for COVID-19 (from HES).

 

Comparison 1a differs from 2a and 2b in that it does not require COVID-19 diagnosis codes in HES, only the presence of a hospital episode. It tests the stated aim of distinguishing inpatients from non-inpatients among all participants tested for SARS-CoV-2.

 

Comparisons 2a and 2b use the ICD-10 diagnosis codes U071 or U072 to identify COVID-19-diagnosed inpatients in HES. They test the implied aim of identifying inpatients specifically suffering from COVID-19. Two baseline populations are considered: all participants (2a) and SARS-CoV-2 positive participants (2b).
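A minimal sketch of the HES diagnosis rule, assuming the diagnosis codes recorded for a spell are available as a list of strings. Removing dots and matching on prefix are assumptions made so that both "U07.1" and "U071" styles are recognised; the function name is hypothetical.

```python
from typing import Iterable

COVID19_PREFIXES = ("U071", "U072")  # ICD-10 U07.1 and U07.2

def has_covid19_diagnosis(diagnosis_codes: Iterable[str]) -> bool:
    """True if any diagnosis code for a spell is U071 or U072.
    Dots are removed and a prefix match is used (assumptions about code format)."""
    return any(
        code.replace(".", "").strip().upper().startswith(COVID19_PREFIXES)
        for code in diagnosis_codes
    )

print(has_covid19_diagnosis(["J189", "U071"]))  # True
print(has_covid19_diagnosis(["J189"]))          # False
```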

 

Both the specific (reqorg==1) and sensitive (origin==1) methods of identifying inpatients in SGSS were assessed.

 

For each comparison, a two-by-two contingency table was constructed, each cell giving a count of UKB participants. The following performance metrics are reported, where truth is determined by HES and discovery by SGSS data:

· Sensitivity: % true discoveries among all true inpatients

· Specificity: % true non-discoveries among all true non-inpatients

· Positive predictive value (PPV): % true discoveries among all discoveries

· Negative predictive value (NPV): % true non-discoveries among all non-discoveries
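The four metrics, together with the baseline PPV and the false omission rate (100% − NPV) used in the Summary, follow directly from the cell counts. The sketch below plugs in the comparison 1a origin==1 counts reported in the Results; the class and method names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Counts of participants: truth from HES, discovery (flag) from SGSS."""
    tp: int  # HES inpatient, flagged by SGSS
    fn: int  # HES inpatient, not flagged
    fp: int  # not an HES inpatient, flagged
    tn: int  # not an HES inpatient, not flagged

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def false_omission_rate(self) -> float:
        """100% - NPV: share of true inpatients among those not flagged."""
        return self.fn / (self.fn + self.tn)

    def baseline_ppv(self) -> float:
        """Share of true inpatients in the whole comparison population."""
        return (self.tp + self.fn) / (self.tp + self.fn + self.fp + self.tn)

# Comparison 1a, origin==1
t = TwoByTwo(tp=1499, fn=377, fp=641, tn=575)
for name in ("sensitivity", "specificity", "ppv", "npv",
             "false_omission_rate", "baseline_ppv"):
    print(f"{name}: {100 * getattr(t, name)():.0f}%")
# sensitivity: 80%, specificity: 47%, ppv: 70%, npv: 60%,
# false_omission_rate: 40%, baseline_ppv: 61%
```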

 

Other methodological details:

· English residence was determined by recruitment centre.

· Tests outside 16 March to 30 April were ignored.

· Hospital spells ending before 16 March or beginning after 30 April were ignored.

· Hospital spells with incomplete admission or discharge dates were ignored.

· Hospital spells with patient classes other than classpat_uni==1000 (inpatient) were ignored.

· In the first pass, the date of the test and the date of the hospital spell were not required to overlap. This decision is scrutinized below.

· In comparisons 2a and 2b, identifying SARS-CoV-2 positive inpatient status from SGSS required the positive result and the inpatient flag to coincide on the same test.
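The exclusion rules above can be sketched as simple filters. The Spell record, the "positive" and "specdate" keys on each test, and all function names are hypothetical stand-ins for however the HES and SGSS tables are actually loaded. The last function illustrates the requirement that the positive result and the inpatient flag come from the same test.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Mapping, Optional

WINDOW_START = date(2020, 3, 16)
WINDOW_END = date(2020, 4, 30)
INPATIENT_CLASS = 1000  # classpat_uni value for inpatients

@dataclass
class Spell:
    """Hypothetical minimal record for one HES hospital spell."""
    admidate: Optional[date]
    disdate: Optional[date]
    classpat_uni: int

def keep_spell(s: Spell) -> bool:
    """Spell-level exclusions from the list above."""
    if s.admidate is None or s.disdate is None:  # incomplete dates
        return False
    if s.classpat_uni != INPATIENT_CLASS:         # not an inpatient class
        return False
    # drop spells ending before 16 March or beginning after 30 April
    return s.disdate >= WINDOW_START and s.admidate <= WINDOW_END

def in_window(specdate: date) -> bool:
    """Tests with specimen dates outside 16 March to 30 April are ignored."""
    return WINDOW_START <= specdate <= WINDOW_END

def sgss_positive_inpatient(tests: Iterable[Mapping],
                            flag: Callable[[Mapping], bool]) -> bool:
    """True only if a single in-window test is both positive and flagged."""
    return any(t["positive"] and flag(t)
               for t in tests if in_window(t["specdate"]))

# Hypothetical participant: positive result and inpatient flag on different tests
tests = [
    {"specdate": date(2020, 4, 2), "positive": True, "reqorg": None},
    {"specdate": date(2020, 4, 5), "positive": False, "reqorg": 1},
]
print(sgss_positive_inpatient(tests, lambda t: t.get("reqorg") == 1))  # False
```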

 

Results

1a. For all participants with a SARS-CoV-2 test, inpatient status (from SGSS) versus inpatient status (from HES).

SGSS (origin==1):

| HES / SGSS | Tested inpatient | Tested other | |
|---|---|---|---|
| Any inpatient | 1499 | 377 | Sens 80% |
| Other | 641 | 575 | Spec 47% |
| | PPV 70% | NPV 60% | |

SGSS (reqorg==1):

| HES / SGSS | Tested inpatient | Tested other | |
|---|---|---|---|
| Any inpatient | 844 | 1032 | Sens 45% |
| Other | 140 | 1076 | Spec 88% |
| | PPV 86% | NPV 51% | |

Baseline PPV: 61% of participants with a SARS-CoV-2 test were inpatients (by HES).

 

2a. For all participants, SARS-CoV-2 positive inpatient status (from SGSS) versus COVID-19-diagnosed inpatient status (from HES).

SGSS (origin==1):

| HES / SGSS | SARS-CoV-2 positive inpatient | Other | |
|---|---|---|---|
| COVID-19 inpatient | 557 | 253 | Sens 69% |
| Other | 262 | 425821 | Spec 100% |
| | PPV 68% | NPV 100% | |

SGSS (reqorg==1):

| HES / SGSS | SARS-CoV-2 positive inpatient | Other | |
|---|---|---|---|
| COVID-19 inpatient | 289 | 521 | Sens 36% |
| Other | 62 | 426021 | Spec 100% |
| | PPV 82% | NPV 100% | |

Baseline PPV: 0.2% of all participants were COVID-19-diagnosed inpatients (by HES).

 

2b. For all participants with a positive SARS-CoV-2 test, inpatient status (from SGSS) versus COVID-19-diagnosed inpatient status (from HES).

SGSS (origin==1):

| HES / SGSS | SARS-CoV-2 positive inpatient | SARS-CoV-2 positive other | |
|---|---|---|---|
| COVID-19 inpatient | 557 | 69 | Sens 89% |
| Other | 262 | 221 | Spec 46% |
| | PPV 68% | NPV 76% | |

SGSS (reqorg==1):

| HES / SGSS | SARS-CoV-2 positive inpatient | SARS-CoV-2 positive other | |
|---|---|---|---|
| COVID-19 inpatient | 289 | 337 | Sens 46% |
| Other | 62 | 421 | Spec 87% |
| | PPV 82% | NPV 56% | |

Baseline PPV: 56% of all SARS-CoV-2 positive participants were COVID-19-diagnosed inpatients (by HES).

 

Dates of SARS-CoV-2 tests and hospital spells

We looked at the effect of insisting that the SARS-CoV-2 test specimen date (specdate) from SGSS overlaps with the hospital spell (admidate, disdate) from HES, focusing on comparison 2a/b.

 

For origin==1, the dates did not overlap exactly for 37/557 true COVID-19 inpatients, falling to 4/557 when allowing a ±7 day tolerance. For reqorg==1, the dates did not overlap exactly for 9/289 true COVID-19 inpatients, falling to 2/289 with the same tolerance. So the test dates and hospital spell dates were compatible in almost all cases, and the results are not strongly affected by whether chronological overlap (with tolerance) is required.
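The overlap check can be sketched as an interval test with a symmetric tolerance. The function name and example dates are hypothetical; for a participant with several spells, one would presumably apply the check to each spell in turn, though how multiple spells were handled here is an assumption.

```python
from datetime import date, timedelta

def spec_overlaps_spell(specdate: date, admidate: date, disdate: date,
                        tolerance_days: int = 0) -> bool:
    """True if the specimen date falls within the spell [admidate, disdate],
    widened by tolerance_days on each side."""
    slack = timedelta(days=tolerance_days)
    return admidate - slack <= specdate <= disdate + slack

# Hypothetical test taken 3 days before admission
spec, adm, dis = date(2020, 4, 1), date(2020, 4, 4), date(2020, 4, 20)
print(spec_overlaps_spell(spec, adm, dis))                    # False: no exact overlap
print(spec_overlaps_spell(spec, adm, dis, tolerance_days=7))  # True: within ±7 days
```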

 

Summary

The relative performance of the two indicators was as intended, with origin==1 more sensitive and reqorg==1 more specific (Table 1).

 

 

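| % | origin==1 1a | origin==1 2a | origin==1 2b | reqorg==1 1a | reqorg==1 2a | reqorg==1 2b |
|---|---|---|---|---|---|---|
| Sensitivity | 80 | 69 | 89 | 45 | 36 | 46 |
| Specificity | 47 | 100 | 46 | 88 | 100 | 87 |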

Table 1 Sensitivity and specificity of SGSS-derived inpatient indicators

 

Table 2 shows the baseline PPV (the proportion of true inpatients in the comparison population), against which the PPV and 100%-NPV (respectively, the proportions of true inpatients among those flagged and not flagged as inpatients using SGSS) can be compared. Baseline PPV indicates how enriched the baseline population already is for inpatients.

 

 

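| % | origin==1 1a | origin==1 2a | origin==1 2b | reqorg==1 1a | reqorg==1 2a | reqorg==1 2b |
|---|---|---|---|---|---|---|
| Baseline PPV | 61 | 0.2 | 56 | 61 | 0.2 | 56 |
| PPV | 70 | 68 | 68 | 86 | 82 | 82 |
| 100% - NPV | 40 | 0.0 | 24 | 49 | 0.0 | 44 |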

Table 2 Positive and negative predictive values of SGSS-derived inpatient indicators

Baseline PPV varied widely depending on the baseline population considered (1a: SARS-CoV-2 tested, 2a: all participants, 2b: SARS-CoV-2 positive) and inpatient definition (1a: any inpatient, 2a/2b: COVID-19-diagnosed inpatients).

 

Directly comparing PPV and 100%-NPV highlights the trade-off between the origin==1 and reqorg==1 flags. The origin==1 flag wrongly excluded true inpatients less often; the reqorg==1 flag wrongly included false inpatients less often. Thus, the former was better at excluding: it achieved a lower 100%-NPV, or “false omission rate” (40% vs 49% in 1a, 24% vs 44% in 2b). The latter was correspondingly better at including: it achieved a higher PPV (86% vs 70% in 1a, 82% vs 68% in 2a and 2b).

 

The origin==1 flag returned more putative inpatients, at the cost of greater dilution by false inpatients. Nevertheless, in every comparison it also returned many more true inpatients than the reqorg==1 flag (1499 vs 844 in 1a; 557 vs 289 in 2a and 2b). Of course, the largest number of true inpatients would be achieved by retaining all participants in the comparison, but the origin==1 flag's PPV exceeded the baseline PPV in every comparison, so it did enrich for inpatients.


These analyses have some limitations. In particular, not all relevant hospital stays were considered: some were excluded because HES only includes finished episodes under the care of a particular consultant. Further, records without discharge dates were excluded, so only spells of continuous stay completed before the censoring date of 31 May 2020 were analysed. A more complete picture will become available in the coming months; until then, the performance metrics presented here are provisional.

 

The PPV and NPV depend on the baseline population and its proportion of true inpatients, so the metrics calculated here apply specifically to the period 16 March to 30 April. After April, there were several changes in the tested population. Testing in the community was expanded, although, due to technical limitations, only positive test results are currently imported to SGSS from the commercial ‘lighthouse’ laboratories handling this Pillar 2 testing. More importantly from the perspective of identifying participants with more severe COVID-19, inpatient screening for SARS-CoV-2 became routine after April. Therefore, after April it is not reasonable to assume that SARS-CoV-2 positive inpatients necessarily showed symptoms of COVID-19, mild or severe, irrespective of how well inpatients were identified using SGSS data.
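To make the dependence on the baseline population concrete: if sensitivity and specificity stayed fixed, Bayes' theorem gives PPV and NPV as functions of the proportion of true inpatients. The sketch below plugs in the 1a origin==1 values. Holding sensitivity and specificity fixed is a simplifying assumption, since changes to the tested population after April could alter them too; the function names are hypothetical.

```python
def ppv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """PPV implied by Bayes' theorem for fixed sensitivity and specificity."""
    tp = sens * prevalence              # expected share of true discoveries
    fp = (1 - spec) * (1 - prevalence)  # expected share of false discoveries
    return tp / (tp + fp)

def npv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """NPV implied by Bayes' theorem for fixed sensitivity and specificity."""
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tn / (tn + fn)

# Sensitivity and specificity of origin==1 in comparison 1a
sens, spec = 1499 / 1876, 575 / 1216
for prev in (0.2, 1876 / 3092, 0.8):
    print(f"proportion of true inpatients {prev:.0%}: "
          f"PPV {ppv_from_prevalence(sens, spec, prev):.0%}, "
          f"NPV {npv_from_prevalence(sens, spec, prev):.0%}")
```

At the observed 1a baseline of about 61%, this reproduces the reported PPV of 70% and NPV of 60%; a lower proportion of true inpatients gives a lower PPV for the same flag.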

 

By September, HES will cover most of the epidemic peak in England, so switching to HES to identify COVID-19 is strongly recommended for retrospective analyses. A more difficult question is what to do if a second wave occurs during autumn or winter. There might then be a case for enriching for inpatients using rapidly available SGSS data. By then, more data should be available to evaluate how the post-April changes to testing affect the identification of inpatients from SGSS, allowing a better-informed judgement, although the situation is likely to remain dynamic.