FOCUS: Signature Science Funded Modeling Project to Forecast COVID-19 in the United States

VP Nagraj
Bioinformatics Data Scientist

The COVID-19 pandemic has demonstrated the need for timely, data-driven projections of infectious disease activity. Near-term forecasts can provide useful information to public health decision makers. In April 2020, the University of Massachusetts Amherst partnered with the Centers for Disease Control and Prevention (CDC) to launch the COVID-19 Forecast Hub. The consortium brings together infectious disease modelers and data science experts from industry, academia, and government. Participants submit weekly forecasts of cases, deaths, and hospitalizations in the United States. The organizers combine submissions into an ensemble model, which the CDC reports along with the predictions of the constituent models.

The COVID-19 Forecast Hub Visualization Tool

Figure 1: National incident case forecasts from the FOCUS team (“SigSci-TS”) are displayed alongside the COVID-19 Forecast Hub ensemble and baseline model results for the week of March 7, 2021. The dots correspond to the point estimates and the shading represents the 50% prediction interval around the estimate. The image was captured in July 2021 and shows the observed case counts before, during, and after the selected horizon. Note that while more than 50 teams have contributed to the COVID-19 Forecast Hub, for simplicity none of the other individual team forecasts are shown here.

Signature Science began participating in the COVID-19 Forecast Hub in January 2021 via the Forecasting COVID-19 in the United States (FOCUS) project. The internally funded project aimed to build capacity in modeling infectious diseases, designing cloud infrastructure, and developing machine learning pipelines. Between January and April 2021, the FOCUS team issued 14 weekly entries to the COVID-19 Forecast Hub, all of which were included in the ensemble model and reported by the CDC. Submissions included 1- to 4-week-ahead distributional forecasts of incident cases, incident deaths, and cumulative deaths, both nationally and by state. The team developed a time series modeling approach that performed particularly well for predicting case counts. The project also produced an open-source R package (focustools) that implements the models and automates data engineering tasks. The forecasting pipeline was further automated with custom cloud architecture built on Amazon Web Services (AWS). The “FOCUS: Forecasting COVID-19 in the United States” preprint provides additional details on statistical methods, automation pipelines, and performance evaluation.
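To make the idea of a distributional forecast concrete, here is a minimal Python sketch that turns a weekly count series into quantile forecasts at horizons of one to four weeks. It uses a naive random-walk baseline with normally distributed errors, which is an assumption chosen for illustration and is not the SigSci-TS model or the focustools implementation. The function name, the quantile levels, and the toy counts are all hypothetical.

```python
from statistics import NormalDist, stdev


def random_walk_quantile_forecast(
    weekly_counts,
    horizons=(1, 2, 3, 4),
    quantiles=(0.025, 0.25, 0.5, 0.75, 0.975),
):
    """Naive random-walk forecast (illustrative only, not the FOCUS method).

    The median forecast at every horizon is the last observed count.
    Uncertainty is estimated from week-over-week differences and grows
    with the square root of the horizon.
    """
    if len(weekly_counts) < 3:
        raise ValueError("need at least three weekly observations")

    # Week-over-week changes in the observed series.
    diffs = [b - a for a, b in zip(weekly_counts, weekly_counts[1:])]
    sigma = stdev(diffs)
    last = float(weekly_counts[-1])

    forecast = {}
    for h in horizons:
        if sigma == 0:
            # A perfectly flat series: every quantile equals the last value.
            forecast[h] = {q: last for q in quantiles}
            continue
        dist = NormalDist(mu=last, sigma=sigma * h ** 0.5)
        # Counts cannot be negative, so clip each quantile at zero.
        forecast[h] = {q: max(0.0, dist.inv_cdf(q)) for q in quantiles}
    return forecast


# Toy weekly incident counts (made up for the example).
example_counts = [100, 110, 105, 120, 130]
example_forecast = random_walk_quantile_forecast(example_counts)
```

In this sketch, the 0.25 and 0.75 quantiles at a given horizon bound a 50% prediction interval, the same kind of interval shown as shading in Figure 1.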

Cloud Computing Automation Pipeline

Figure 2: The weekly workflow starts with a scheduled AWS CloudWatch event that triggers an AWS Lambda function. The function uses IAM credentials to launch the pipeline instance from a template that includes bootstrapping code, storage specifications, and the configuration of the instance to be run. When the forecasting instance starts, it installs the necessary software, generates forecast output, writes submission-ready files to an S3 bucket, and then self-terminates. Another EC2 instance hosts a web app with access to the same S3 bucket, allowing users to interactively review and download forecast output prior to submission. After reviewing the output in the web app, users download the validated submission file, commit it to their fork of the COVID-19 Forecast Hub repository, and submit a pull request upstream, which triggers an automated validation process.
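The first step of the workflow in Figure 2, a scheduled event invoking a Lambda function that launches the forecasting instance from a template, might look roughly like the Python sketch below. It is not the FOCUS team's code. The environment variable FORECAST_LAUNCH_TEMPLATE, the default template name, and the optional ec2_client argument are hypothetical, and the sketch assumes the template is an EC2 launch template that the function's IAM role is allowed to use.

```python
import os


def build_launch_request(template_name, version="$Latest"):
    """Arguments for EC2 run_instances that start one instance from a launch template."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateName": template_name,
            "Version": version,
        },
        "MinCount": 1,
        "MaxCount": 1,
    }


def lambda_handler(event, context, ec2_client=None):
    """Entry point invoked by the scheduled event (hypothetical sketch)."""
    if ec2_client is None:
        # Imported lazily so the module can be exercised without AWS access.
        import boto3

        ec2_client = boto3.client("ec2")

    # Hypothetical configuration: the template name comes from an env variable.
    template_name = os.environ.get(
        "FORECAST_LAUNCH_TEMPLATE", "forecast-pipeline-template"
    )

    response = ec2_client.run_instances(**build_launch_request(template_name))
    return {"launched": [inst["InstanceId"] for inst in response["Instances"]]}
```

Accepting the client as an optional argument lets the handler be tested locally with a stand-in object. Under this sketch, the bootstrapping code, storage specifications, and shutdown behavior would live in the launch template rather than in the function itself.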

The FOCUS project played a small but meaningful part in the ongoing public health efforts to study, understand, and anticipate the trajectory of COVID-19 in the United States. The success of FOCUS demonstrates proficiency in epidemiological modeling, automated forecasting, scientific software development, and cloud computing. Signature Science is positioned to address needs in these areas, which are common to a range of markets and domains.