
Predictive models for health outcomes due to SARS-CoV-2, including the effect of vaccination: a systematic review

Abstract

Background

The interaction between modelers and policymakers is becoming more common due to the increase in computing speed seen in recent decades. The recent pandemic caused by the SARS-CoV-2 virus was no exception. Thus, this study aims to identify and assess epidemiological mathematical models of SARS-CoV-2 applied to real-world data, including immunization against coronavirus disease 2019 (COVID-19).

Methodology

PubMed, JSTOR, medRxiv, LILACS, EconLit, and other databases were searched for studies employing epidemiological mathematical models of SARS-CoV-2 applied to real-world data. We summarized the information qualitatively, and each included article was assessed for risk of bias using the Joanna Briggs Institute (JBI) and PROBAST checklist tools. The PROSPERO registration number is CRD42022344542.

Findings

In total, 5646 articles were retrieved, of which 411 were included. Most of the articles were published in 2021. The countries with the highest number of studies were the United States, Canada, China, and the United Kingdom; no studies were found for low-income countries. The SEIR model (susceptible, exposed, infectious, and recovered) was the most frequently used approach, followed by agent-based modeling. Moreover, the most commonly used software packages were R, MATLAB, and Python, and the most frequently reported health outcomes were death and recovery. According to the JBI assessment, 61.4% of articles were considered to have a low risk of bias.

Interpretation

The utilization of mathematical models increased following the onset of the SARS-CoV-2 pandemic. Stakeholders have begun to incorporate these analytical tools more extensively into public policy, enabling the construction of various scenarios for public health. This contribution adds value to informed decision-making. Therefore, understanding their advancements, strengths, and limitations is essential.


Introduction

Coronavirus disease 2019 (COVID-19) is a disease caused by the SARS-CoV-2 virus. Since 2020, it has resulted in numerous cases and fatalities [1]. Its transmission occurs primarily from person to person, and it exhibits high transmissibility and a variable, unpredictable course [2]. Consequently, in March 2020, the World Health Organization (WHO) classified COVID-19 as a pandemic due to its global spread.

Throughout the initial 2 years of the pandemic, cases spread swiftly across the world, albeit asynchronously, yielding heterogeneous effects among different territories. At present, there exist multiple highly effective vaccines against COVID-19, with over 12.5 billion doses administered worldwide [3]. It is important to note that the success of COVID-19 vaccination hinges on factors such as the duration of immunity conferred by vaccines, their efficacy against new SARS-CoV-2 variants, and the implementation protocols in each country [4,5,6].

The scientific community has been actively engaged in describing and studying epidemiological phenomena through theoretical and methodological modeling. Hence, this review covers several facets, including the types of epidemiological mathematical models employed, the simulation software used, and sociodemographic, socioeconomic, clinical, and vaccination-related factors [7]. Epidemiological mathematical modeling furnishes vital information for informed decision-making in public policy. It plays a pivotal role in understanding and managing infectious diseases, as its tools simplify complex and uncertain scenarios [8]. These quantitative models support scenario planning: they are used to construct possible scenarios, evaluate them, and compare their potential risks across different health outcomes.

Furthermore, the interaction between modelers and policymakers has become more prevalent due to the increase in computing speed in recent decades. This relationship is characterized by complex models that capture in detail both the epidemiological situation and the interventions intended to mitigate it [7]. The recent SARS-CoV-2 pandemic is closely intertwined with this interaction. Accordingly, the objective of this research is to compile and analyze the predictive models developed for studying diverse health outcomes stemming from COVID-19. The analysis adopts a mathematical epidemiological approach and focuses on models applied to real-world data. Given the new generation of quantitative health analyses applied to the pandemic with real-world data, we sought to review which new approaches had been developed that could serve as a basis for future applications of predictive analytics.

Methodology

A systematic literature search was conducted in accordance with the rapid review format guidelines established in the Cochrane international methods [9]. This approach was complemented with the literature review methodology from software engineering [10, 11]. The PROSPERO registration number for this study is CRD42022344542. A comprehensive and generic search strategy was formulated and subsequently tailored for the diverse sources of information. Language, study type, and date restrictions were not applied. The search strategy encompassed articles available up to April 1, 2022.

The study’s target population comprised individuals who had been vaccinated against COVID-19 within the context of mathematical epidemiological models. The intervention under examination was the COVID-19 vaccine, regardless of brand name. The health outcomes sought were the numbers of deaths and of recovered, hospitalized, infected, and susceptible individuals due to COVID-19. The inclusion criteria therefore required the development of mathematical epidemiological models that used real-world data for their analysis.

The consulted databases included PubMed, JSTOR, medRxiv, LILACS, EconLit, IEEE Transactions on Software Engineering, ACM Transactions on Software Engineering Methodology (TOSEM), Empirical Software Engineering Journal, Journal of Systems and Software, and Information and Software Technology. Google Scholar was used to search the grey literature. The selection of these databases was based on the quantitative approach of the literature review and the expertise of the authors.

The search strategy, outlined in Supplement N°. 1, was also complemented with manual searches. The articles included in this review provided information on mathematical epidemiological models of the SARS-CoV-2 virus, incorporating vaccination and relying on real-world data for analysis. Mendeley software facilitated reference deduplication, and Microsoft Excel® software was employed for the screening process. Two independent groups of reviewers (OE, DR, JR, and VB, AR, CS) evaluated each title and abstract in a blinded manner, with conflicts resolved by a third evaluator (OE or LM) when necessary. Full-text screening followed a similar process. Studies that passed the full-text review phase were then incorporated into the data extraction phase. Excluded studies are listed in Supplement N°. 2.

Five reviewers (VB, DR, AR, CS, JR) extracted the information, which was independently verified by two reviewers (OE and LM) using a data extraction form designed in Microsoft Excel® (Supplement N°. 3A). Variable definitions are provided in Supplement N°. 3B. The extracted characteristics were summarized descriptively. Results were categorized based on the country’s income, determined by gross national income (GNI) per capita [12], into high-income countries (USD > 12695), upper-middle income (USD 4096–12695), lower-middle income (USD 1046–4095), low-income (USD < 1046), and multi-income studies encompassing countries from different income levels.
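The income categorization described above can be sketched as a small lookup. This is only an illustration: the thresholds are those quoted in the text, while the function names and the handling of values that fall between the quoted integer bounds are assumptions, not part of the review's extraction form.

```python
def income_group(gni_per_capita_usd: float) -> str:
    """Classify one country using the GNI per capita thresholds quoted in the text.

    Assumption: non-integer values between the quoted bounds (for example 4095.5)
    are assigned to the lower category.
    """
    if gni_per_capita_usd > 12695:
        return "high"
    if gni_per_capita_usd >= 4096:
        return "upper-middle"
    if gni_per_capita_usd >= 1046:
        return "lower-middle"
    return "low"


def study_income_category(gni_values) -> str:
    """Hypothetical helper: a study spanning several income groups is 'multi-income'."""
    groups = {income_group(v) for v in gni_values}
    return groups.pop() if len(groups) == 1 else "multi-income"
```

For example, a study covering one country with a GNI per capita of USD 500 and another with USD 20000 would fall into the multi-income category under this sketch.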

Methodological quality was assessed by independent reviewers using two tools: the Joanna Briggs Institute (JBI) checklists, a classic set of checklists chosen according to the methodological design [13, 14], and PROBAST, a specialized tool for assessing risk of bias and applicability in predictive modeling studies in the health sciences [15]. The assessment was carried out by an expert (LM) and, when needed, agreed upon with a second methodological expert (OE). The methodological tools are shown in Supplement N°. 4.

Results

Literature search results

A total of 5646 references were identified from indexed databases, and an additional 197 documents were located in the grey literature. Among these, 2362 (40.4%) were excluded as duplicates, leaving 3481 (59.6%) references for title and abstract screening. From these, 398 (11.43%) references proceeded to full-text evaluation, resulting in the inclusion of 202 (5.8%) articles. Furthermore, by applying a snowball strategy using the Connected Papers platform (which uses advanced textual analytics techniques) and reviewing the reference lists of the included articles, an additional 209 references were incorporated, yielding a total of 411 documents included in this review. The information flowchart is depicted in the PRISMA figure (see Supplement N°. 5), and the list of included studies is presented in Supplement N°. 6.
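The counts and percentages in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the figures reported there. The choice of denominators is inferred rather than stated in the text: the duplicate and screening percentages match the 5843 identified records, and the full-text and inclusion percentages match the 3481 screened records.

```python
# Counts reported in the literature search results.
identified_databases = 5646
identified_grey = 197
duplicates = 2362
full_text = 398
included_from_screening = 202
included_from_snowball = 209

total_identified = identified_databases + identified_grey        # 5843
screened = total_identified - duplicates                          # 3481
total_included = included_from_screening + included_from_snowball  # 411


def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


print(pct(duplicates, total_identified))        # share removed as duplicates
print(pct(screened, total_identified))          # share screened by title/abstract
print(pct(full_text, screened, 2))              # share sent to full-text review
print(pct(included_from_screening, screened))   # share included after full text
print(total_included)
```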

In terms of publication distribution, 16 [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31], 280 [32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311], and 115 [312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426] articles were published in 2020, 2021, and 2022, respectively. 
The majority of these articles were concentrated in high-income countries, constituting 69.3% of the publications [16, 17, 20,21,22,23,24, 26, 29,30,31, 33, 34, 37, 39,40,41,42,43, 46,47,48,49, 51, 52, 57,58,59, 61, 62, 66, 68,69,70, 72,73,74,75,76,77, 79, 80, 82,83,84,85,86,87, 90, 92,93,94, 98,99,100,101,102, 106,107,108,109,110,111,112,113,114, 116,117,118,119,120,121,122,123,124, 126,127,128, 130,131,132,133,134,135,136,137, 140, 142, 145, 147, 150, 151, 153,154,155,156,157,158,159,160,161, 164, 165, 167, 169,170,171,172,173, 175, 178,179,180,181,182,183,184,185,186,187,188, 191,192,193,194,195, 197,198,199,200,201,202, 204, 205, 207, 209, 211, 212, 217, 220, 222, 224, 227, 232,233,234,235,236,237, 239,240,241,242,243,244,245,246,247,248,249,250,251,252, 254, 255, 257, 258, 260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283, 285, 286, 288,289,290,291,292,293,294,295,296,297,298, 300, 302,303,304, 307, 310, 314,315,316,317, 323, 324, 326,327,328, 332, 333, 335,336,337,338, 340, 342,343,344,345, 347,348,349, 351,352,353, 355, 357,358,359,360,361,362,363,364,365,366,367,368, 372,373,374, 376,377,378, 381,382,383,384,385, 387,388,389,390,391, 393, 394, 397, 400, 401, 403,404,405, 407, 410, 412, 415, 416, 418, 420, 421, 423,424,425,426]. Only 8% of the articles were for low-middle income countries [36, 38, 44, 45, 54, 56, 89, 104, 105, 115, 129, 138, 143, 144, 146, 163, 206, 218, 223, 256, 287, 299, 306, 312, 318, 319, 331, 339, 369, 380, 398, 399, 414]. 
The United States (USA) had the most articles related to the topic, with 106 articles (25.8%) [16, 17, 21, 24, 26, 29, 31, 34, 42, 43, 47, 49, 51, 57, 62, 74, 76, 82, 85, 87, 90, 93, 94, 107, 112, 113, 118,119,120,121,122, 131, 133, 135, 137, 142, 147, 151, 153, 155, 159, 172, 179, 180, 182,183,184, 191,192,193,194, 198, 209, 217, 220, 227, 234,235,236, 241,242,243,244,245, 248, 251, 254, 255, 263, 265,266,267, 270, 273, 281, 290, 292, 296, 310, 328, 332, 333, 335, 337, 345, 348, 349, 351, 353, 358, 363, 366, 367, 377, 382,383,384, 388, 390, 391, 400, 401, 407, 410, 415, 424]. Canada followed with 22 documents (5.4%) [22, 33, 41, 66, 77, 111, 126, 145, 154, 156, 181, 197, 200, 202, 205, 240, 269, 276, 300, 317, 357, 364], and China with 20 articles (4.9%) [25, 55, 63, 81, 88, 91, 95, 97, 174, 208, 216, 221, 225, 229, 230, 320, 329, 330, 409, 411]. No information from exclusively low-income countries was found (see Table 1).

Table 1 General characteristics of the articles included by income level of the country

Description of epidemiological mathematical models

The primary mathematical model employed was the SEIR (susceptible, exposed, infectious, and recovered) compartmental model, utilized in 47% (193) of the retrieved articles [17, 19, 24,25,26,27,28, 31,32,33,34,35,36, 42, 45, 46, 57, 60, 65, 67, 69, 71, 73, 77, 80, 87, 89, 91, 92, 94, 96,97,98,99,100, 103,104,105,106,107, 109, 111,112,113, 115, 118, 122,123,124,125,126,127,128, 130, 131, 133, 135, 138,139,140,141,142,143,144,145,146, 148,149,150, 156, 157, 159,160,161,162,163,164,165, 167, 168, 174,175,176,177, 179, 180, 183, 185, 187, 188, 192, 199, 210, 212, 215, 218,219,220, 223,224,225,226,227, 233,234,235,236,237, 241, 243, 247, 251,252,253,254,255, 258, 260, 261, 264, 266, 267, 269, 271, 272, 275, 276, 279,280,281, 283, 284, 288, 289, 291, 294, 297,298,299,300,301, 304, 306, 307, 310, 312,313,314, 317,318,319, 326, 327, 331, 341, 344,345,346,347, 350, 352,353,354,355, 357, 359,360,361, 369, 371, 381, 382, 384, 390,391,392, 396, 397, 399, 401, 403, 405, 408, 412,413,414,415,416, 418, 420, 422, 423, 426]. The category “other models” encompasses less commonly used models, such as Bayesian networks, Poisson models, and other compartmental models (see Table 2).
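For readers unfamiliar with the SEIR structure named above, the following is a minimal sketch of a generic textbook SEIR model integrated with forward Euler steps. It is not the model of any reviewed study. The function name, the parameter values in the usage note, and the optional vaccination term `nu` (which simply moves susceptibles to the recovered/immune compartment) are illustrative assumptions.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1, nu=0.0):
    """Forward-Euler integration of a basic SEIR model with a closed population.

    beta  : transmission rate per day
    sigma : rate of leaving the exposed compartment (1 / latent period)
    gamma : rate of leaving the infectious compartment (1 / infectious period)
    nu    : assumed per-capita daily vaccination rate of susceptibles (S -> R)
    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            vaccinated = nu * s * dt              # S -> R (illustrative extension)
            s -= new_exposed + vaccinated
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered + vaccinated
        trajectory.append((s, e, i, r))
    return trajectory


# Illustrative parameter values only; they are not estimates from the review.
baseline = simulate_seir(0.3, 0.2, 0.1, 999_990, 0, 10, 0, days=200)
with_vaccination = simulate_seir(0.3, 0.2, 0.1, 999_990, 0, 10, 0, days=200, nu=0.01)
peak_baseline = max(state[2] for state in baseline)
peak_vaccinated = max(state[2] for state in with_vaccination)
```

Comparing `peak_baseline` with `peak_vaccinated` shows the kind of scenario comparison such models support: in this sketch, vaccinating susceptibles lowers the peak number of infectious individuals.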

Table 2 Characterization of the types of mathematical models by income level of the country

Regarding the operational characteristics of the included studies and the accessibility of their databases and models, only 16.3% of the references did not engage in mathematical development of the proposed model [18, 20, 30, 40, 43, 52, 59, 62, 81, 83, 90, 120, 121, 131, 137, 142, 151, 154, 157, 167, 170, 172, 173, 177, 178, 188, 189, 192, 194, 202, 205,206,207,208, 211, 215, 217, 222, 242, 245, 247,248,249,250, 254, 273, 277, 285, 295, 301, 305, 324, 332, 334, 339, 340, 351, 358, 365, 372, 385, 387, 390, 393, 404, 417, 426]. The main software utilized was R, accounting for 19.7% of cases [27, 29, 32, 33, 40, 41, 53, 60, 62, 67, 76, 82, 91, 92, 94,95,96, 106, 109, 111, 114, 117, 132, 138, 157, 164,165,166, 179, 180, 211, 218, 219, 222, 238, 244,245,246, 248, 253, 254, 261, 264, 275, 277, 279, 283, 288, 292, 295, 296, 311, 313, 319, 323, 325, 326, 328, 330, 332, 334, 336, 346, 347, 349, 355, 361, 363, 373, 378, 380, 382,383,384, 388, 389, 392, 394, 397, 407, 415]. A majority of the references, 59.9%, did not make their programming code openly available. Only 3.6% of the included references developed a dashboard [60, 74, 84, 114, 180, 250, 256, 260, 280, 295, 331, 338, 378, 380, 382] (see Table 3).

Table 3 Operational characteristics of the mathematical models

Description of sociodemographic aspects

When analyzing the sociodemographic characteristics reported in the included articles, it was observed that 252 articles (61.3%) incorporated age in their models [16,17,18, 21, 23, 25, 26, 29, 31, 33, 34, 36, 39,40,41,42, 44, 47, 48, 50, 52,53,54,55, 58,59,60, 64,65,66, 68,69,70, 72, 74, 77, 79, 80, 82,83,84,85,86,87,88, 90, 92, 95, 96, 99, 102,103,104,105,106, 108, 109, 111,112,113,114, 116,117,118,119,120,121,122,123,124,125,126, 129, 133, 134, 137, 138, 140,141,142, 144, 147,148,149,150,151, 153,154,155,156,157, 160,161,162, 164, 165, 167,168,169,170, 174, 175, 178, 179, 184, 187, 188, 192, 196, 199, 201,202,203,204,205,206, 208, 210, 211, 213, 215,216,217, 219, 221, 222, 227, 229, 234,235,236,237, 241, 242, 245,246,247,248,249, 253,254,255, 258,259,260, 262, 263, 266, 269, 271,272,273,274,275,276,277, 279, 281,282,283,