Predictive models for health outcomes due to SARS-CoV-2, including the effect of vaccination: a systematic review

Background: The interaction between modelers and policymakers has become more common owing to the increase in computing speed in recent decades, and the pandemic caused by the SARS-CoV-2 virus was no exception. This study therefore aims to identify and assess epidemiological mathematical models of SARS-CoV-2 applied to real-world data, including immunization against coronavirus disease 2019 (COVID-19). Methodology: PubMed, JSTOR, medRxiv, LILACS, EconLit, and other databases were searched for studies applying epidemiological mathematical models of SARS-CoV-2 to real-world data. We summarized the information qualitatively and assessed each included article for risk of bias using the Joanna Briggs Institute (JBI) and PROBAST checklists. The PROSPERO registration number is CRD42022344542. Findings: In total, 5646 articles were retrieved, of which 411 were included. Most were published in 2021. The countries with the most studies were the United States, Canada, China, and the United Kingdom; no studies were found in low-income countries. The SEIR model (susceptible, exposed, infectious, and recovered) was the most frequently used approach, followed by agent-based modeling. The most commonly used software packages were R, MATLAB, and Python, and the most frequent health outcomes were death and recovery. According to the JBI assessment, 61.4% of articles had a low risk of bias. Interpretation: The use of mathematical models increased after the onset of the SARS-CoV-2 pandemic. Stakeholders have begun to incorporate these analytical tools more extensively into public policy, enabling the construction of various public health scenarios and supporting informed decision-making. Understanding their advances, strengths, and limitations is therefore essential.


Introduction
Coronavirus disease 2019 (COVID-19) is caused by the SARS-CoV-2 virus. Since 2020, it has resulted in numerous cases and deaths [1]. It is transmitted primarily from person to person, with high transmissibility and a variable, unpredictable clinical course [2]. Consequently, in March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic because of its global spread.
During the first 2 years of the pandemic, cases spread rapidly across the world, albeit asynchronously, with heterogeneous effects across territories. There are now multiple highly effective vaccines against COVID-19, with over 12.5 billion doses administered worldwide [3]. The success of COVID-19 vaccination, however, depends on factors such as the duration of vaccine-induced immunity, vaccine efficacy against new SARS-CoV-2 variants, and the implementation protocols in each country [4][5][6].
The scientific community has been actively engaged in describing and studying epidemiological phenomena through theoretical and methodological modeling. This review therefore examines several facets of these models, including the types of epidemiological mathematical models employed, the simulation software used, and the sociodemographic, socioeconomic, clinical, and vaccination-related factors considered [7]. Epidemiological mathematical modeling provides vital information for informed public policy decisions. It plays a pivotal role in understanding and managing infectious diseases, because its tools simplify complex and uncertain scenarios [8]. These quantitative models support scenario planning: building possible scenarios, evaluating them, and analyzing their potential risks in terms of different health outcomes.
Furthermore, the interaction between modelers and policymakers has become more prevalent owing to the increase in computing speed in recent decades. This relationship is characterized by complex models that capture, in detail, the epidemiological situation and the interventions to mitigate it [7]. The recent SARS-CoV-2 pandemic is closely intertwined with this interaction. Accordingly, the objective of this research is to compile and analyze the predictive models developed to study the various health outcomes of COVID-19. The analysis adopts a mathematical epidemiological approach and focuses on models applied to real-world data. Given this new generation of quantitative health analyses applied to the pandemic with real-world data, we aimed to review which new approaches had been developed that could serve as a basis for future applications of predictive analytics.

Methodology
A systematic literature search was conducted following the Cochrane guidelines for the rapid review format [9], complemented with the literature review methodology used in software engineering [10,11]. The PROSPERO registration number for this study is CRD42022344542. A comprehensive, generic search strategy was developed and then tailored to each information source. No language, study type, or date restrictions were applied. The search covered articles available up to April 1, 2022.
The study's target population comprised individuals vaccinated against COVID-19 as represented in mathematical epidemiological models. The intervention under examination was COVID-19 vaccination, regardless of brand. The health outcomes sought were the numbers of deaths, recoveries, hospitalizations, infections, and susceptible individuals due to COVID-19. The inclusion criterion was therefore the development of mathematical epidemiological models that used real-world data for their analysis.
The databases consulted were PubMed, JSTOR, medRxiv, LILACS, EconLit, IEEE Transactions on Software Engineering, ACM Transactions on Software Engineering and Methodology (TOSEM), Empirical Software Engineering Journal, Journal of Systems and Software, and Information and Software Technology. Google Scholar was used to search the grey literature. The databases were selected based on the quantitative approach of the literature review and the authors' expertise.
The search strategy, outlined in Supplement N°. 1, was complemented with manual searches. The articles included in this review provided information on mathematical epidemiological models of the SARS-CoV-2 virus that incorporated vaccination and relied on real-world data for analysis. Mendeley was used for reference deduplication, and Microsoft Excel ® for the screening process. Two independent groups of reviewers (OE, DR, and JR; VB, AR, and CS) assessed each title and abstract in a blinded manner, with conflicts resolved by a third evaluator (OE or LM) when necessary. Full-text screening followed a similar process. Studies retained after full-text review were then carried forward to data extraction. Excluded studies are listed in Supplement N°. 2.
Five reviewers (VB, DR, AR, CS, and JR) extracted the data, which were independently verified by two reviewers (OE and LM) using a data extraction form designed in Microsoft Excel ® (Supplement N°. 3A). Variable definitions are provided in Supplement N°. 3B. The extracted characteristics were summarized descriptively. Results were categorized by country income level, based on gross national income (GNI) per capita [12]: high income (GNI per capita > USD 12,695), upper-middle income (USD 4,096 to 12,695), lower-middle income (USD 1,046 to 4,095), and low income (< USD 1,046), plus a multi-income category for studies covering countries from different income levels.
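The income classification above is a simple threshold rule. The sketch below expresses it in Python, assuming the GNI-per-capita cut-offs stated here [12] (these cut-offs are revised annually) and a hypothetical rule that a study modeling countries from more than one group is labeled multi-income; the function names are illustrative only.

```python
def income_group(gni_per_capita_usd: float) -> str:
    """Map one country's GNI per capita (USD) to an income group.

    Cut-offs follow the Methodology text; because they are revised
    annually, this hypothetical helper would need updating for other years.
    """
    if gni_per_capita_usd > 12_695:
        return "high"
    if gni_per_capita_usd >= 4_096:
        return "upper-middle"
    if gni_per_capita_usd >= 1_046:
        return "lower-middle"
    return "low"


def study_income_category(country_gnis: list[float]) -> str:
    """Label a study by the income groups of the countries it models.

    Hypothetical rule: if all countries share one group, keep that label;
    any mix of groups is labeled "multi-income".
    """
    if not country_gnis:
        raise ValueError("a study must model at least one country")
    groups = {income_group(gni) for gni in country_gnis}
    return groups.pop() if len(groups) == 1 else "multi-income"


print(income_group(12_695))                     # upper-middle
print(study_income_category([45_000, 2_300]))   # multi-income
```

Note that a value exactly at USD 12,695 falls in the upper-middle group, because the high-income category is defined as strictly greater than that cut-off.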
Methodological quality was assessed by independent reviewers using two tools: the Joanna Briggs Institute (JBI) checklists, which are chosen according to study design [13,14], and PROBAST, a tool specifically designed to assess risk of bias and applicability in prediction model studies in the health sciences [15]. The assessment was carried out by an expert (LM) and, when needed, agreed upon with a second methodological expert (OE). The assessment tools are shown in Supplement N°. 4.

Literature search results
A total of 5646 references were identified from indexed databases, and an additional 197 documents were located in the grey literature. Of these 5843 records, 2362 (40.4%) were excluded as duplicates, leaving 3481 (59.6%) for title and abstract screening. Of these, 398 (11.43%) proceeded to full-text evaluation, resulting in the inclusion of 202 articles (5.8% of the screened records). In addition, a snowball strategy using the Connected Papers platform (which uses advanced textual analytics techniques) and a review of the reference lists of the included articles yielded 209 further references, for a total of 411 documents included in this review. The PRISMA flow diagram is shown in Supplement N°. 5, and the list of included studies is presented in Supplement N°. 6.
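As a quick consistency check, the screening flow can be recomputed from the counts reported above. This is a minimal sketch; the denominators (all identified records for the duplicate and screening shares, screened records for the later stages) are our reading of the text, and the snowball additions are simply added to the total because they did not pass through the same screening stages. The function name is hypothetical.

```python
def prisma_counts(database: int, grey: int, duplicates: int,
                  full_text: int, included: int, snowball: int) -> dict:
    """Recompute the screening flow and percentages from reported counts."""
    identified = database + grey          # all records before deduplication
    screened = identified - duplicates    # records screened by title/abstract
    return {
        "identified": identified,
        "screened": screened,
        "total_included": included + snowball,  # search plus snowball additions
        # Percentages use the denominators implied by the text.
        "pct_duplicates": round(100 * duplicates / identified, 1),
        "pct_screened": round(100 * screened / identified, 1),
        "pct_full_text": round(100 * full_text / screened, 2),
        "pct_included": round(100 * included / screened, 1),
    }


flow = prisma_counts(database=5646, grey=197, duplicates=2362,
                     full_text=398, included=202, snowball=209)
print(flow["identified"], flow["screened"], flow["total_included"])  # 5843 3481 411
```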

Types of vaccines, number of vaccines, and heterologous vaccination
Although most articles included vaccination in their models, many did not specify a particular vaccine. Specifically, 319 articles (77.6%) considered the use of only one type of vaccine [16-32, 34, 35, 37-41, 43 [91,96,115,122,127,201,322,333,341,422].
Heterologous vaccination was reported in two articles (0.5%), both from high-income countries [106,382]. In 95 articles (23.1%), the interval between administered doses was three months or longer [203,211,222,269,300,314,317,333,334,350,361]. Four articles had dose intervals longer than 150 days, all from high-income countries [211,222,314,333], one of which used a 240-day interval for the booster dose [333].

Vaccine effectiveness and differences in effectiveness
Most studies were conducted in high-income countries. Across all included articles, effectiveness values of 50%, 60%, 70%, 80%, and 100% were used in 63, 40, 47, 52, and 33 articles, respectively. Articles focused on lower-middle-income countries predominantly used an effectiveness of 80%. Of the 33 articles that assumed 100% effectiveness, 23 (5.6% of all included articles) were conducted in high-income countries.
Most articles did not specify the outcome against which effectiveness was measured. Among those reporting 100% effectiveness, two specified effectiveness against death [99,282]. In 14 articles (3.4%), the reported effectiveness was below 10% [23,28,78,106,112,179,229,311,319,326,364,379,391,400]. Two of these reported an effectiveness of 0%: the first concerned the CoronaVac vaccine against the Gamma variant [311], and the second concerned subsequent infection [379].
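Why the assumed effectiveness matters can be illustrated with the textbook threshold for a homogeneously mixing population: with an all-or-nothing vaccine of effectiveness E given to a fraction c of the population, R_eff = R0 (1 - cE), so transmission is suppressed only if c > (1 - 1/R0) / E. The sketch below assumes that simple model, no prior immunity, and a hypothetical R0 of 3 (not a value taken from the reviewed articles); it shows that the effectiveness values reported above (50% to 100%) imply very different coverage requirements.

```python
def effective_r(r0: float, coverage: float, effectiveness: float) -> float:
    """R_eff = R0 * (1 - coverage * effectiveness), homogeneous mixing assumed."""
    return r0 * (1.0 - coverage * effectiveness)


def critical_coverage(r0: float, effectiveness: float) -> float:
    """Coverage at which R_eff reaches 1; a value above 1 is unattainable."""
    if r0 <= 1.0:
        return 0.0                  # R_eff is already <= 1 without vaccination
    if effectiveness <= 0.0:
        return float("inf")         # a 0%-effective vaccine can never suffice
    return (1.0 - 1.0 / r0) / effectiveness


R0 = 3.0  # hypothetical basic reproduction number, for illustration only
for eff in (0.5, 0.6, 0.7, 0.8, 1.0):
    c = critical_coverage(R0, eff)
    note = "attainable" if c <= 1.0 else "not attainable by vaccination alone"
    print(f"effectiveness {eff:.0%}: critical coverage {c:.1%} ({note})")
```

Under these assumptions, a vaccine with 50% or 60% effectiveness cannot bring R_eff below 1 even at full coverage, which is one reason the effectiveness assumption (and the outcome it refers to) deserves to be stated explicitly.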
Using the PROBAST tool, 65.3% of the articles were rated as low risk in the risk-of-bias domain and 89.8% as low risk in the applicability domain. Articles rated as high risk of bias often lacked information about the population studied, whereas those rated as unclear for applicability lacked sufficient methodological detail for evaluation. Many of these articles also lacked information on the methods used to control for third variables (confounders) or to analyze predictive outcomes (Supplement N°. 7).
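PROBAST rates each article domain by domain (participants, predictors, outcome, and analysis for risk of bias; participants, predictors, and outcome for applicability) and then combines these ratings into an overall judgment. The sketch below encodes the general combination rule as we understand it from the PROBAST guidance: any high-risk domain gives an overall high rating, otherwise any unclear domain gives unclear, and only all-low domains give low. It deliberately omits the guidance's special cases (for example, for models developed without external validation), and the example ratings are hypothetical.

```python
from enum import Enum


class Judgment(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


def overall_judgment(domains: dict[str, Judgment]) -> Judgment:
    """Combine per-domain judgments into one overall judgment.

    Any HIGH domain makes the overall judgment HIGH; otherwise any UNCLEAR
    domain makes it UNCLEAR; only all-LOW domains give LOW.
    """
    if not domains:
        raise ValueError("at least one domain judgment is required")
    ratings = set(domains.values())
    if Judgment.HIGH in ratings:
        return Judgment.HIGH
    if Judgment.UNCLEAR in ratings:
        return Judgment.UNCLEAR
    return Judgment.LOW


# Hypothetical article: the four risk-of-bias domains.
risk_of_bias = {
    "participants": Judgment.LOW,
    "predictors": Judgment.LOW,
    "outcome": Judgment.LOW,
    "analysis": Judgment.UNCLEAR,
}
# The same hypothetical article: the three applicability domains.
applicability = {
    "participants": Judgment.LOW,
    "predictors": Judgment.LOW,
    "outcome": Judgment.LOW,
}
print(overall_judgment(risk_of_bias).value, overall_judgment(applicability).value)
# unclear low
```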

Discussion
In this review, most models were developed in high-income countries, and the SEIR model was the most frequently used. Compared with the first year of the pandemic (2020), the number of articles increased by 1650% in 2021 and by 618.8% in 2022 (for which the search covered only up to April 1, 2022), which reflects the unavailability of vaccines until the end of 2020.
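Both percentages are relative increases over the 2020 count, that is, 100 × (n_year - n_2020) / n_2020. The yearly counts are not reported in this section, so the sketch below only checks which counts would be consistent with the percentages, under the explicit assumption that all 411 included articles were published between 2020 and 2022. The function names are hypothetical, and the result is a back-calculation, not reported data.

```python
def pct_increase(new: int, base: int) -> float:
    """Relative increase of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


def consistent_counts(total: int, inc_2021: float, inc_2022: float,
                      tol: float = 0.051) -> list[tuple[int, int, int]]:
    """Brute-force yearly counts (2020, 2021, 2022) that sum to `total`
    and reproduce the reported increases within `tol` percentage points.

    ASSUMPTION: every included article falls between 2020 and 2022.
    The default tolerance is half of the last reported decimal place plus
    a small margin for floating-point error.
    """
    solutions = []
    for n2020 in range(1, total + 1):
        for n2021 in range(total - n2020 + 1):
            n2022 = total - n2020 - n2021
            if (abs(pct_increase(n2021, n2020) - inc_2021) <= tol
                    and abs(pct_increase(n2022, n2020) - inc_2022) <= tol):
                solutions.append((n2020, n2021, n2022))
    return solutions


print(consistent_counts(411, 1650.0, 618.8))
```

Under that assumption, the only solution is 16, 280, and 115 articles for 2020, 2021, and 2022, respectively, which agrees with the observation that most information was published in 2021.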
High-income countries have used these strategies extensively to inform health decision-making. In this context, it is striking that only Singapore, the United Arab Emirates, South Korea, Japan, Italy, and Canada are among the countries that exceeded 80% vaccination coverage of their population [427]. The USA, where the largest number of identified models were developed, has one of the highest crude mortality rates from the disease, even after vaccine rollout, which suggests a disconnect between model developers and the actual implementation of, and articulation with, public health policy.
This systematic review provided insight into the main features incorporated into mathematical epidemiological models worldwide, giving a clear picture of the advantages and disadvantages of the multiple analytical approaches used to describe COVID-19 dynamics and vaccine effectiveness. For governmental decision-making, and depending on the public policy question to be answered, this comprehensive compilation provides sufficient information on the different options from which scientific teams can choose. Depending on the health outcomes to be predicted and the sociodemographic variables to be included, among other aspects, a wide range of algorithms is available for mathematical epidemiological modeling. One point to keep in mind when modeling COVID-19 is that the inputs should ideally come from reliable sources close to the context of the community to which the model is applied. This underscores the importance of an effective surveillance system for capturing cases and disease outcomes. Many high-income countries adopted new technologies to analyze case patterns and strengthen their surveillance systems [428].
Among the main advantages of mathematical modeling of infectious diseases in public health is the ability to represent multiple scenarios, which can predict essential health outcomes at lower cost [429]. However, developing these models often requires financial support for the multidisciplinary teams that carry them out, and not all institutions and governments can afford this. In our systematic review, this gap is visible in the difference in scientific output between high-income countries and lower-middle- and low-income countries.
One limitation of this study is that the predictive capacity of the analytical models was not reviewed, because this characteristic could not be measured homogeneously. In addition, most of the articles did not make their code openly available, which made it impossible to replicate and review their computational work. Another limitation is that we could not take into account predictive models that some governments used for decision-making, because those governments considered them a national security issue and never shared them publicly.
Although the most widely used model (SEIR, in its classic version) has limitations [430], such as the absence of a defined case model, discrepancies between the information available at the population level and at the individual level [431], the lack of incorporation of individual behavior and social influence, and limited flexibility to incorporate new evidence [432], it is a very useful and practical model for different purposes in environments with high uncertainty. Because of these limitations, however, models need to be adjusted dynamically as the pandemic evolves.
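To make the model structure discussed here concrete, the sketch below extends the classic SEIR model with a vaccinated compartment V. It is a minimal illustration under stated assumptions, not a model from any reviewed article: all parameter values are hypothetical, vaccination is all-or-nothing (only the effectively protected fraction leaves S), vaccinated people who are not protected remain fully susceptible, and the equations are integrated with a fixed-step forward Euler scheme for readability (an adaptive solver such as SciPy's `solve_ivp` would be a more robust choice).

```python
from dataclasses import dataclass


@dataclass
class Params:
    """All values are hypothetical placeholders, not estimates from the review."""
    beta: float = 0.3         # transmission rate per day
    sigma: float = 1 / 5.2    # 1 / mean latent period (days)
    gamma: float = 1 / 7.0    # 1 / mean infectious period (days)
    nu: float = 0.005         # fraction of susceptibles vaccinated per day
    eff: float = 0.8          # all-or-nothing vaccine effectiveness


def simulate_seirv(pop: float, i0: float, days: int, p: Params,
                   dt: float = 0.1) -> list[tuple[float, ...]]:
    """Forward-Euler integration of an SEIR model with a vaccinated
    compartment V. Returns (day, S, E, I, R, V) at each whole day."""
    s, e, i, r, v = pop - i0, 0.0, i0, 0.0, 0.0
    history = [(0, s, e, i, r, v)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            infections = p.beta * s * i / pop   # S -> E
            protected = p.nu * p.eff * s        # S -> V (protected fraction only)
            onsets = p.sigma * e                # E -> I
            recoveries = p.gamma * i            # I -> R
            s -= (infections + protected) * dt
            e += (infections - onsets) * dt
            i += (onsets - recoveries) * dt
            r += recoveries * dt
            v += protected * dt
        history.append((day, s, e, i, r, v))
    return history


def peak_infectious(history: list[tuple[float, ...]]) -> float:
    """Largest number in compartment I over the simulated period."""
    return max(row[3] for row in history)


# Scenario comparison: no vaccination versus different assumed effectiveness.
baseline = simulate_seirv(1e6, 10, 300, Params(nu=0.0))
print(f"no vaccination: peak I = {peak_infectious(baseline):,.0f}")
for eff in (0.5, 0.8, 1.0):
    run = simulate_seirv(1e6, 10, 300, Params(eff=eff))
    print(f"effectiveness {eff:.0%}: peak I = {peak_infectious(run):,.0f}")
```

Varying `eff` or `nu` is one simple way to build alternative scenarios of the kind discussed above; the limitations listed in this paragraph (homogeneous mixing, no individual behavior or social influence) apply equally to this sketch.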
Based on our exhaustive compilation, we consider it good international practice for future analytical modeling of infectious diseases to (i) use as many parameters as possible from real-world evidence from the area being studied; (ii) state clearly the mathematical structure of the model being applied (annexes can provide these details); (iii) report all the parameters used in the base-case model; and (iv) where possible, make the programming code openly available or available to readers on request.
Likewise, a significant current challenge is to determine the long-term impact (effectiveness) of vaccination, whether hybrid protection (natural and vaccine-induced immunity) is superior, and which vaccination schedules (one, two, three, or four doses) or heterologous schemes produce a greater response. Mathematical modeling is of great help in epidemiology and public health; however, increasing the number of parameters can make the analysis, calibration, implementation, and interpretation of results more difficult.
Epidemiological mathematical models are tools that allow us to predict, within certain limits, the behavior of the virus, providing information for decisions on public health control measures. Developing more and better epidemiological mathematical models helps mitigate negative scenarios and helps policymakers navigate uncertain contexts. It will be essential to standardize the methods used for epidemiological modeling to guarantee high-quality results. Similarly, it will be vital to secure human and financial resources so that these models are built as well as possible, with accurate, real-time data. Health policies must be evidence-based to achieve the best results for the population.

Table 2
Characterization of the types of mathematical models by income level of the country

Table 5
Variants identified in the articles by income level of the country

Table 6
Outcomes identified in the included articles by income level of the country