Modeling Approach: Avicenna

Challenges with Traditional Modeling Approaches

Our capability is built on ROSS, Rensselaer’s Optimistic Simulation System. ROSS is the world’s largest-scale simulation engine, having been developed and tested on millions of CPU cores.


Over the past 20 years, hundreds of models have been developed for ROSS, including our own pandemic modeling framework. This framework, collectively known as Avicenna, provides high-fidelity models of infectious disease spread, human behavior, supply chains, mobility and much more.


Avicenna's primary model is the epidemiological model, which drives the transmission of disease between individuals as they interact at different locations over time. This approach to modeling disease spread has been implemented at small scale as “agent-based models” for decades. Avicenna is unique in that the fidelity of the scenarios is nearly limitless given the scalability of the underlying engine. That fidelity is what enables Avicenna to produce results that are unprecedented in the field of infectious disease spread.


The traditional approach to studying infectious disease spread is mathematical and can be traced back to the 1700s, when Daniel Bernoulli studied smallpox. Today, the most prevalent approach is SIR modeling. A typical SIR model generates curves that show the sizes of the Susceptible, Infectious and Recovered groups within a population over time. These models can mathematically incorporate factors that adjust for differences between diseases and for social distancing measures such as school closures or work-from-home policies.
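As a point of reference, the classic SIR curves can be reproduced in a few lines. The sketch below is a minimal illustration and is not part of Avicenna. It integrates the standard equations dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I and dR/dt = gamma*I with a simple forward Euler step. The function name and the values chosen for beta, gamma and the population are illustrative assumptions, not figures from any study.

```python
def simulate_sir(population, initial_infectious, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with a forward Euler step.

    beta  : assumed transmission rate per day (illustrative)
    gamma : assumed recovery rate per day (illustrative)
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    s = float(population - initial_infectious)
    i = float(initial_infectious)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    curve = simulate_sir(population=1000, initial_infectious=1,
                         beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(curve)), key=lambda d: curve[d][1])
    print("peak infectious on day", peak_day)
```

In a model of this kind, a mitigation such as a school closure is typically represented by lowering beta for some period; the whole population still shares a single set of parameters.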


SIR models are used by researchers to broadly study the impact of various mitigation strategies, and to provide guidance to decision makers on which steps might be most important to take for a given disease. What these models do not do well is account for variations in population density, population demographics and population mobility over time, and they very frequently neglect human behavior. As we have seen in the United States, even simple mitigations can become highly personal, individual choices.


For example, we have recently seen large variations in the impact of COVID-19 across Italy. While the northern areas were hit very hard, some towns in the north were “skipped over”. Most traditional models assume that the mobility of individuals between areas is unimportant. SIR models assume that the demographics of a region are unimportant. Almost all models neglect human behavior and the choices individuals make for themselves. And so we are left with ad-hoc observations that cannot be explained by the math, because the math was never designed to account for all of these factors.

Global Pandemic Modeling with Avicenna

With Avicenna, we capture all of these effects and more. We account for mobility, and for the decrease in mobility over time. We account for population demographics and densities, which are the largest predictors of disease impact on a community. We achieve this through a combination of scalability and the availability of data provided by our partners, which lets us understand how the disease impacts people at the individual level. We are also able to take into account the features that make each individual unique in how they respond to the spread of disease in their local area over time.

When we aggregate the results in Avicenna up to the full population, we find the same graphs as you would see in any traditional model. The power of Avicenna is not in understanding the impact of infectious disease at the top level, but rather at the smallest level possible, right down to the individual. For visualization, Avicenna results are typically aggregated at the neighborhood level, or the Census Tract level in the United States. We also include individual location data for first responders, such as Fire Stations, EMS Stations, and Law Enforcement. We provide highly detailed data on over 4.7 million persons in the U.S. healthcare sector (for which we have data available) and align them with Hospital, Veterans Affairs, Urgent Care and other medical facilities to show the impact on hospital operations over time. We are able to break out other areas of interest, such as food producers (for example, poultry producers) or long-term care facilities (for example, nursing homes), to understand the impact of the disease on these unique communities. For our commercial customers we are able to provide information on their workplace locations and their workforce community down to the individual level.

Pandemic Spread in Avicenna

The Avicenna pandemic model is based largely on the seminal epidemiology research of Ferguson, Longini and Eubank. Spread of the disease follows the general Susceptible-Exposed-Infected-Recovered (SEIR) and Susceptible-Exposed-Infected-Susceptible (SEIS) algorithms. In the SEIR algorithm, an individual contracts the disease through interaction with other agents and passes through the four stages of the disease based on a variety of disease parameters such as time of incubation, time of contagion, time of recovery, mortality rate, and human behaviors. The SEIR algorithm is implemented through mixing in home, office, and school locations in the metropolitan area.
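The stage progression described above can be pictured as a small per-agent state machine. The following is a minimal sketch under stated assumptions, not Avicenna's implementation: the names Stage, Agent, expose and advance are hypothetical, durations are passed in as fixed numbers, and mortality and behavior are left out. The reinfectable flag switches between SEIR (recover with immunity) and SEIS (return to susceptible).

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    SUSCEPTIBLE = "S"
    EXPOSED = "E"
    INFECTED = "I"
    RECOVERED = "R"


@dataclass
class Agent:
    stage: Stage = Stage.SUSCEPTIBLE
    days_in_stage: float = 0.0


def expose(agent):
    """Move a susceptible agent into the exposed stage after an infectious contact."""
    if agent.stage is Stage.SUSCEPTIBLE:
        agent.stage = Stage.EXPOSED
        agent.days_in_stage = 0.0


def advance(agent, dt_days, incubation_days, contagious_days, reinfectable=False):
    """Advance the disease clock for one agent by dt_days.

    reinfectable=False gives SEIR behavior; reinfectable=True gives SEIS.
    """
    if agent.stage in (Stage.SUSCEPTIBLE, Stage.RECOVERED):
        return  # no disease clock is running for this agent
    agent.days_in_stage += dt_days
    if agent.stage is Stage.EXPOSED and agent.days_in_stage >= incubation_days:
        agent.stage = Stage.INFECTED
        agent.days_in_stage = 0.0
    elif agent.stage is Stage.INFECTED and agent.days_in_stage >= contagious_days:
        agent.stage = Stage.SUSCEPTIBLE if reinfectable else Stage.RECOVERED
        agent.days_in_stage = 0.0


if __name__ == "__main__":
    person = Agent()
    expose(person)
    for day in range(1, 8):
        advance(person, 1.0, incubation_days=2.0, contagious_days=4.0)
        print(day, person.stage.name)
```

In an agent-based setting each individual carries its own durations, so two people exposed on the same day can become infectious and recover on different days.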


The home locations were derived from census tract population data in the U.S. Census Bureau 2019 dataset. Office, school and other daytime locations were derived from Department of Transportation data that measured commuting and mobility patterns between census tracts. Disease transmission is then computed as a function of the number of individuals at a given location and the duration of time spent in proximity to one another. Locations may be rooms, buildings such as homes or offices, and, when data is available, planes, trains and other modes of public transportation.
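To make "duration of time spent in proximity" concrete, the sketch below computes how many hours each pair of people shared at a single location. It is an illustrative helper only: the function name overlap_hours and the (person, arrive, depart) visit format are assumptions made for this example, not Avicenna's data model.

```python
from itertools import combinations


def overlap_hours(visits):
    """Return shared hours for every pair of people at one location.

    visits: list of (person_id, arrive_hour, depart_hour) tuples (hypothetical format).
    Pairs that never overlap are omitted from the result.
    """
    overlaps = {}
    for (a, a_in, a_out), (b, b_in, b_out) in combinations(visits, 2):
        shared = min(a_out, b_out) - max(a_in, b_in)
        if shared > 0:
            overlaps[(a, b)] = shared
    return overlaps


if __name__ == "__main__":
    office = [("ann", 9, 17), ("bob", 12, 18), ("cy", 18, 20)]
    print(overlap_hours(office))
```

The pairwise durations produced this way, together with the size of the group, are the kind of inputs a per-contact transmission probability would need.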


At the individual level, we apply high-fidelity demographic data to determine disease dynamics, such as mortality, and also to model human behavior. For example, what is the probability, based on a person’s individual characteristics, that they will follow social distancing guidelines or wear a face mask? Our paid services incorporate a minimum of 1500 features per individual over the age of for all persons in the United States. These features are used to determine whether a person works while symptomatic, socially distances while well (the “worried well”), adheres to a work-from-home or stay-at-home policy in their region, and many other effects.


Each time a person enters or leaves a location, the group of individuals changes, and the probability of infection must be computed for the time those individuals spent together in their respective households, working groups or communities. The probability of infection is computed from the transmission probabilities for each potential infectious contact, based on the group size and duration of time. If the infectious contact is receiving antiviral treatment, this transmission probability is further reduced by the antiviral efficacy for infectiousness. Similarly, if an individual has been vaccinated, the vaccine efficacy for infectiousness reduces the transmission probability. The transmission probability can be further reduced for asymptomatic (and possibly infectious) contacts. The probability P of a susceptible person becoming infected is then computed as a product over all of the possible infectious contacts each day. A Bernoulli trial is conducted by generating a uniform [0,1] random number; if this number is less than P, the susceptible individual becomes infected and enters the latent phase of infection.
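The calculation above can be sketched as follows. This is a hedged illustration, not Avicenna's code. It assumes that contacts act independently, so that P is one minus the product of the probabilities of escaping infection from each contact, which is one common way to read "a product over all contacts". The function names, the per-hour base probability and the way the efficacy terms are multiplied in are all assumptions made for this example.

```python
import random


def contact_probability(base_per_hour, hours, antiviral_efficacy=0.0,
                        vaccine_efficacy=0.0, asymptomatic_factor=1.0):
    """Transmission probability for one infectious contact (illustrative).

    base_per_hour       : assumed per-hour transmission probability
    antiviral_efficacy  : assumed reduction in infectiousness from antivirals (0..1)
    vaccine_efficacy    : assumed reduction in infectiousness from vaccination (0..1)
    asymptomatic_factor : multiplier below 1.0 for asymptomatic contacts
    """
    per_hour = (base_per_hour * (1.0 - antiviral_efficacy)
                * (1.0 - vaccine_efficacy) * asymptomatic_factor)
    # Probability of at least one transmission over the shared hours.
    return 1.0 - (1.0 - per_hour) ** hours


def infection_probability(contact_probabilities):
    """Combine independent contacts: P = 1 - product of (1 - p_i)."""
    escape = 1.0
    for p in contact_probabilities:
        escape *= 1.0 - p
    return 1.0 - escape


def bernoulli_infect(p, rng):
    """Bernoulli trial: infected if a uniform draw in [0, 1) falls below p."""
    return rng.random() < p


if __name__ == "__main__":
    rng = random.Random(42)
    contacts = [
        contact_probability(0.01, hours=8),                          # coworker
        contact_probability(0.01, hours=8, antiviral_efficacy=0.6),  # treated coworker
        contact_probability(0.02, hours=12, asymptomatic_factor=1 / 3),
    ]
    p = infection_probability(contacts)
    print(round(p, 4), bernoulli_infect(p, rng))
```

Passing a seeded random.Random instance keeps individual runs reproducible, which helps when comparing scenarios that differ only in their parameters.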


Additional increases or reductions in transmission probability can be applied by accounting for the location type, person type or both. For example, in hospital settings we assume healthcare workers are applying additional precautions that can reduce their probability of transmission by up to 90%. We can also apply an infectious probability to the location itself to model the spread of the disease through surfaces or airborne contact for a period after infectious individuals have left the location.
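One simple way to picture these adjustments is a lookup of multipliers keyed by location type and person type, plus a decaying probability attached to the location itself. The sketch below is only an assumption about how such adjustments could be organized. The table contents, the function names and the exponential half-life decay are hypothetical choices made for this example; the 0.10 multiplier corresponds to the "up to 90%" reduction mentioned above.

```python
# Hypothetical multipliers keyed by (location_type, person_type).
# 0.10 models the "up to 90%" reduction for healthcare workers in hospitals.
DEFAULT_MODIFIERS = {
    ("hospital", "healthcare_worker"): 0.10,
}


def adjusted_probability(p, location_type, person_type, modifiers=None):
    """Scale a transmission probability by a location/person multiplier."""
    table = DEFAULT_MODIFIERS if modifiers is None else modifiers
    factor = table.get((location_type, person_type), 1.0)
    return min(1.0, p * factor)


def residual_location_probability(initial, hours_since_departure, half_life_hours):
    """Infectiousness left at a location after infectious people have gone.

    Exponential decay with a half-life is an assumed form, chosen for simplicity.
    """
    return initial * 0.5 ** (hours_since_departure / half_life_hours)


if __name__ == "__main__":
    print(adjusted_probability(0.2, "hospital", "healthcare_worker"))
    print(adjusted_probability(0.2, "office", "adult"))
    print(residual_location_probability(0.04, hours_since_departure=2, half_life_hours=2))
```

Multipliers above 1.0 can represent settings that increase transmission; the result is capped at 1.0 so it remains a valid probability.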

Natural Disease History

For COVID-19 we use an influenza natural history model that includes the latent, incubation, and contagious periods. The duration of each is sampled from a discrete distribution over the ranges 0.5-10 days, 0-2 days and 8-16 days, respectively. The contagious period includes both the slight difference between the latent and incubation periods and the standard post-incubation period when symptoms appear in the majority of infected people. Any infectiousness that is not accompanied by overt symptoms (namely, the post-latent part of the incubation period, if any, and the 33% of infected people who never develop symptoms) is assumed to be one-third as great as the infectiousness of symptomatic individuals, reducing the transmission probability by a factor of 3. Infection during the incubation phase of COVID-19 appears to be real, though its probability is as yet undefined. For this phase we define an infection rate that is an additional factor of 3 lower than that of the infectious period. Each of these parameters can be adjusted for a given modeling study.
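The sampling step can be illustrated with a short sketch. The text gives only the ranges, so the shape of each discrete distribution here is an assumption: durations are drawn uniformly on a half-day grid. The function names are hypothetical, and treating the incubation-phase rate as one-ninth of the symptomatic rate is just one reading of "an additional factor of 3".

```python
import random

# Ranges in days, taken from the text above.
PERIOD_RANGES_DAYS = {
    "latent": (0.5, 10.0),
    "incubation": (0.0, 2.0),
    "contagious": (8.0, 16.0),
}


def sample_duration(low, high, rng, step=0.5):
    """Draw a duration uniformly from a discrete grid (assumed distribution)."""
    n_steps = int(round((high - low) / step))
    return low + step * rng.randint(0, n_steps)


def sample_history(rng, asymptomatic_fraction=0.33):
    """Sample one person's disease timeline and whether symptoms ever appear."""
    history = {name: sample_duration(low, high, rng)
               for name, (low, high) in PERIOD_RANGES_DAYS.items()}
    history["symptomatic"] = rng.random() >= asymptomatic_fraction
    return history


def relative_infectiousness(shows_symptoms, incubation_phase=False):
    """Infectiousness relative to a symptomatic person (1.0)."""
    if incubation_phase:
        return 1.0 / 9.0  # assumed reading: one-third, then a further factor of 3
    return 1.0 if shows_symptoms else 1.0 / 3.0


if __name__ == "__main__":
    rng = random.Random(7)
    print(sample_history(rng))
    print(relative_infectiousness(shows_symptoms=False))
```

Because every value sits in a table or a keyword argument, each parameter can be changed for a given study without touching the sampling logic.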