Monday, May 11, 2020

Reactions to R0, the SIR model and herd immunity



Reactions to R0, the SIR model and herd immunity

Nowadays, anybody with internet access who has read a couple of papers fancies himself/herself to be an epidemiologist. As a physicist, by training, I also want to jump in with my reactions to what little I have read, to highlight the small fraction that (I think) I have understood.

Time Variations: exponential, logistic, SIR and SIRD models:

In the initial phases the growth is exponential. In the later phases, the growth slows down and can be fitted to the logistic curve, as done by Rhett Allain [1,2], because there is only a finite population that the virus can infect, while the exponential would keep going forever. Ranjan [2] points out that the exponential curve predicts the initial build-up, but fails to predict the eventual flattening out. The logistic curve:

I (t) = K/[ 1 + A exp(-rt)]

Where A = (K/I0) -1, reduces to the exponential for small values of t: 

I (t) = I0 exp(rt).

But the logistic curve, once fitted to the data, fails to fit the initial time variation.

However, most epidemiologists follow the basic SIR model. Proposed by Kermack & McKendrick [3], the model considers three categories of people: those susceptible (S) to infection, those infected (I) and those who have recovered (R). It is cast in terms of 3 coupled differential equations:

dS/dt = - a SI
dI/dt = aSI – b I
dR/dt = bI

where N is the population size, S is the number of susceptible people, I is the number of infected people, R is the number of recovered people, (such that N = S + I +R) , a is the infection rate (in units of 1/day), b is the recovery rate (1/day) and c is the mortality rate (1/day). This is expressed schematically in terms of transitions between compartments:

The basic SIR model was modified to take the probability of death, after getting infected, into account, through the mortality rate c (1 /day), becoming the SIRD model by adding another differential equation and adding one term to the dI/dt equation:

   dS/dt = - a SI
dI/dt = aSI – (b +c)  I
dR/dt = bI
dD/dt = cI
such that N = S + I +R + D.

This is expressed in compartments as:
Basic Reproduction Number (BRN), R0:

But what of R0? According to Aronson et al [4], the parameter R0 only characterizes the transmissibility, or the average reproduction number, of the virus at the very beginning of the epidemic in a homogeneous population (when there is zero immunity in the population to the virus, so the whole population is susceptible).
The importance of R0 is that if it is less than unity, the virus cannot spread (a physicist would call it ‘sub-critical’); but if it is more than one, the virus multiples, exponentially (in physicist-speak:  avalanche multiplication).
As the number of infected people changes, the transmissibility decreases, and it is given by an effective reproduction rate Re. This is related to R0 by the formula:

Re = R0 (1 – Pi), 

where Pi is the proportion of the population that is now immune to the virus – which keeps increasing with time. But, to back up a bit, R0 is affected by [3]:

(i)                  The size of the population N, and the fraction of the population that is susceptible at the beginning Ps(0) (Above Aronson said all are susceptible?)
(ii)                The infectiousness of the organism
(iii)               The recovery rate b (in the SIR model, recovery includes death in addition to remission).

Both R0 and Re increases as N increases, and as Ps(0) increases, and  both decrease as the recovery rate b increases. The effective reproduction rate Re is also affected by the behavior of people e.g. physical distancing will decrease it.

According to Jones [5] and Delamater et al [6], it must be emphasized that R0, the basic reproduction number (BRN), is dimensionless; it is often misleadingly characterized as a ‘rate’, which could only happen if it had units of (time)-1:

R0 = tcavd

Where d is the duration of infectiousness (in days), cav is the average rate of contact between susceptible and infected people (number/day) and t is the transmissibility of the infection, or the probability that the contact will result in infection.

Jones [5] also differentiates between R0 and Ri. The first generation of an epidemic is all the secondary infections that result from infectious contact with the index case, or generation zero. Ri refers to the BRN of the ith generation, and R0 refers to the number of infections generated by the index case (Gen zero).  This is similar to the parameter Re mentioned above [4]. The time between generations is called the serial interval, or the average time between successive infections (1/b), or the number of infectious days [7]. The problem with accurately estimating R0 is that, initially, numbers are small and subject to statistical fluctuations. The estimation of R0 is generally done by modeling and is tricky [5].

Similarly, Fernandez-Villaverde and Jones [8] express the parameter R0 as a product of two variables in an intuitive way: it is the product of the number of infectious days (1/b) and the number of (lengthy) contacts per day (a) (or 1/a is the time between contacts):

 R0 = (1/b)(a).
Where a = tcav.

In the SIR model: R0 = a/b, 
while in the SIRD model: R0 = a/(b + c). 

If there is no mortality, it reduces to the SIR formula. The number of infectious days is characteristic of the virus, and not controllable. But people can increase the time between contacts by social distancing.

Ranjan [2] uses the formula:

R0 = (a/b) [1 – (I0/N)]
But initially I0 << N.

Similar to the above formula: Re = R0 (1 – Pi) is another formula (they are the same if we neglect the recovered population) used by Caccavo in his SIRD model [9]:

Re = R0 (S/N)

Initially, S and N are almost the same, but as the number of susceptible people decreases Re declines. The million dollar question is: when does it hit one? That is the point in time at which the number of infections peaks, and beyond that the epidemic dies out as Re decreases further.

Herd Immunity:

The concept of herd immunity [4] was introduced in the context of vaccination [10,11]. If a population is immunized above a threshold H then the remainder of the population is given indirect protection, because the infection/virus encounters too may immunized individuals and is unable to propagate in the population. The herd immunity threshold is related to R0 by:

H = 1 – (1/R0)

For the case of covid-19, R0 is somewhere between 2.5 and 3.5, so the safe value of H is about 71% of the population, and about 29% of the population, though susceptible (never having been infected), will remain protected from infection.

What is interesting is a different equation derived by Bastin [11] and by Jones [5]. This relates R0 to the maximum fraction xim of the population that ever gets infected, just before the epidemic collapses to zero [11]:

R0 = - [ln(1 –xim)]/xim

Jones [5] expresses it equivalently in terms of the final susceptible fraction s¥:

ln (s¥) = R0 (1 - s¥)

Bastin also gives an equation for the maximum value of infected individuals, Imax:

Imax = N [1 – {{1 + ln(R0)}/R0}]

Bastin [11] even provides an I-S plot with an epidemic trajectory:

The maximum number of infections occurs at the value of S = b/a (or when Re = 1).

Compare this with the herd immunity threshold: H = 1 – (b/a), and with the formula for the maximum number of s¥.

For simplicity, assume R0 = a/b = 3. The peak Imax occurs when (S/N) = 1/3.
The herd immunity threshold is H = 2/3.
But what is the fraction s¥? it takes the value 0.06. That is, the fraction of susceptible individuals is not 1/3 as expected from the herd immunity formula but lower.

Here it is necessary to clarify: Bastin derives the formula for xim in the context of a model that is different from the SIR model mentioned above. Without going into details of his derivation, he introduces another compartment called ‘exposed’, so it is now an SEIR model. A person who is ‘exposed’ has been exposed to the virus, but is not infectious yet. The SEIR model takes into account the processes of births, deaths and vaccinations.

The question then really is: does herd immunity apply when there is no vaccination? Can the infection itself be considered as equivalent to vaccination, since the infected person is now immune to further infection (never mind whether it was due to a live virus or an inactivated virus)?

The problem within the SIR model – which does not consider vaccination – is that the infections continue beyond the herd immunity threshold, until s has fallen to s¥ (see the Figure), at which point the number of infections drops to zero.

One might try to define the herd immunity threshold as the point at which herd immunity just starts up, without really giving all of the susceptible individuals immunity immediately - just an increasing level of immunity to more and more individuals as s approaches s¥.  This is a consistent way of interpreting the equations – but that is not what many prominent epidemiologists have claimed in public. The advocates of herd immunity vocally state that all we have to do is get 70% of the population infected, and the remaining 30% will be protected. The advocates of this approach, initially in the U.K. and even today in Sweden [4], argue that we need only isolate the elderly and the sick (with co-morbidities), who have higher chances of dying than the young and healthy majority (but see below!) According to Aronson et al [4]:

“The problem with leaving people to catch the infection spontaneously, leading to herd immunity, would increase the death rate. For example, on 10 April, the number of confirmed cases in Sweden was 9685 with 870 deaths (9.0%), compared with Norway with 6219 confirmed cases and 108 deaths (1.7%) and Denmark with 5830 confirmed cases and 237 deaths (4.4%).”

However, Aronson et al [4] also argue:
“For example, if R0 = 2, immunization needs to be achieved in 50% of the population. However, if R0 = 5 the proportion rises steeply, to 80%. Beyond that the rise is less steep; an increase in R0 to 10 (for measles) increases the need for immunization to 90%.
When other children become immune the infected child who encounters 10 children will not be able to infect them all; the number infected will depend on Re. When immunity is 90% or more the chances that the child will meet enough unimmunized children to pass on the disease falls to near zero, and the population is protected.” (emphasis added by me).

The point being made here is that the number to be infected is higher: 94%, not 70% for R0 = 3. The lower and upper limits of R0 are also tabulated:

R0
(1/R0)
s¥
 s¥ (from SIR program)
2.5
0.400
0.11
0.103
3
0.333
0.06
0.056
3.5
0.286
0.035
0.0306
5
0.200
0.007
0.00500
7
0.143
0.0008
0.00047

But look at the last 2 rows: for a high R0 of 5 only a very small fraction (0.7 %) is ‘protected’. And for even higher values, that fraction becomes even smaller (0.05%).

So the question about the SIR vs the SEIR model has real consequences.

The results of the SIR simulation (using Euler’s method [11]) for R0 = 3 are similar to Bastin’s plot above. The arrow on the S-axis indicates S¥ = 0.056. The arrow indicating the maximum has its position at S/N = 0.33 = 1/R0 as expected, and the value of the maximum is Imax = 3.07 x 105, close to the number expected (3.00 x 105) from the equation for Imax given above (by Bastin).

I just found an article by Prof. Gautam Menon [13] which states that there are ‘sound methodological reasons’ to prefer the SEIR model to the SIR model. Among others, the covid-19 has asymptomatic patients (the ‘E’ in SEIR) which SIR does not capture. More importantly, Menon states that the SIR model: “predicts the numbers of people with COVID-19 in India will decline immediately after a lockdown is imposed. In contrast, an SEIR model would have predicted that the case load would continue to increase before beginning to drop”.

There is plenty of criticism of the compartment models. For example, Delamater et al [6] points out that R0 will fluctuate if human-human or human-vector interactions vary with time and space. These compartment models need to be modified to take into account the fact that populations may be homogeneous or heterogeneous (e.g. different social classes or age groups), that interactions are dynamic – either deterministic or stochastic (with various possible distribution functions) – and pathogen pathways may be complex involving intermediate hosts. In other words, some of the basic simplifying assumptions made in early models have had to be modified to take complex realities into account. Prof.Menon raises these concerns as well [13].

Similar criticisms have been leveled by Huppert and Katriel [14]:

a)  the assumption of “well-mixed populations” disregards the facts of geographical and social proximity in enabling or preventing contacts
b)  individuals are different, some are more prone to infection, others may be more effective in spreading infection (super-spreaders).
c) the duration of infection is assumed to be exponentially distributed, but an individual need not become infectious immediately after contracting an infection, and recovery from infection may depend upon how much time has passed since getting infected.
d) populations are assumed to be  large, so they can be treated deterministically by differential equations; while small populations (a village or a school) may be dominated by stochastic effects.

Menon [13] points out that there are other approaches than the compartment model: a) an agent-based model b) Monte Carlo models c) purely statistical or machine learning models d) network models. 

Interestingly, Roda et al [16] claim to have fitted the post-lockdown data in Wuhan to both SIR and SEIR models, and say that the former is preferable, because the latter has too many parameters that need to be estimated, and so the simpler SIR model is better.

Regalado [17] says that for an R0 of 3, 66% of the population has to be immune before the effect (of herd immunity) ‘kicks in’, according to ‘the simplest model’. Regalado is being careful, because herd immunity, even at 66%, is not a free pass…

As a parting shot, note the following paragraph in a UK newspaper [18]:

“About 60 per cent is the sort of figure you need to get herd immunity.” The coronavirus won’t stop at 60 per cent, though, according to the British scientists. Models promoted by Dr. Vallance and Britain’s chief medical adviser, Chris Whitty, indicate that 80 per cent will ultimately become infected. The trick is to limit that 80 per cent to those who can be safely immunized, since infection for the others could mean death.

This is not the dominant narrative: see the characterization by Aronson et al above. Are epidemiologists just dumbing down the narrative for a public that they believe is incapable of understanding the details? But Chris Whitty did explain the details... 

To sum up: getting to 80% is harder than getting to 60%, since there are more deaths along the way. The idea of herd immunity is not as attractive as it seems to be at first sight.

References:
1.       Rhett Allain Wired 24th March 2020 https://www.wired.com/story/the-promising-math-behind-flattening-the-curve/
2.       Rajesh Ranjan,”Predictions for covid-19 outbreak in India using epidemiological models,” 3rd April 2020     preprint: https://www.researchgate.net/publication/340314461
3.       W.O.Kermack and A.G.McKendrick Proc.Roy.Soc. (Lond) A115 (1927) 700
4.       J.K.Aronson et al “’When will it be over?’: An introduction to viral reproduction
numbers, R0 and Rewww.cebm.net/oxford-covid-19/
5.       J.H.Jones “Notes  on R0”  1st May 2007
https://web.stanford.edu/~jhj1/teachingdocs/Jones-on-R0.pdf
6.       P.L.Delamater et al, “Complexity of the Basic Reproduction Number, R0” Emerging Infectious Diseases 15 (2019) 1       www.cdc.gov/eid
7.       Max Fisher, “R0, the messy metric,” https://www.nytimes.com/2020/04/23/world/europe/coronavirus-R0-explainer.html

8.       J.Fernandez-Villaverde and C.L.Jones, “ Estimating and simulating a SIRD model of Covid-19 for many countries, states and cities,” 28th April 2020 Ver 0.6
9.       D.Caccavo, “Chinese and Italian outbreaks can be correctly described by a modified SIRD model,”
17th Apr.2020 medRxiv preprint doi: https://doi.org/10.1101/2020.03.19.20039388
10.   P.E.M.Fine, “Herd immunity: history, theory, practice”, Epidemiologic Rev. 15 (1993) 265
11.   G.Bastin, “Lectures on mathematical modeling of biological systems,” 22nd Aug.2018 (GBIO 2060)
12.   Peter Adams,” Theory and Practice in Science”  https://people.smp.uq.edu.au/PeterAdams/SCIE1000/scie1000_notes_part2.pdf
14.   A.Huppert and G.Katriel,”Mathematical modeling and prediction in infectious disease
epidemiology, “ Clinical Microbiology and Infection (Nov.2013) DOI: 10.1111/1469-0691.12308
16.    W.C.Roda et al, “Why is it difficult to accurately predict the covid-19 epidemic?” Infectious Diseases Modeling 5 (2010) 271
17.   A.Regalado, “What is herd immunity and can it stop the coronavirus?” 17th Mar 2020 https://www.technologyreview.com/2020/03/17/905244/what-is-herd-immunity-and-can-it-stop-the-coronavirus/
18.   Lawrence Solomon 16th March 2020, “Britain’s approach to coronavirus: will herd immunity work?” https://www.theglobeandmail.com/opinion/article-britains-novel-approach-to-coronavirus-will-herd-immunity-work/












No comments: