Saturday, December 26, 2020

Risk of catching covid-19 on a long-haul flight

 

Plane risk on long-haul flights

Health Warning: the following is both long and messy, and in parts, reflects my lack of clarity.

Freedman studied [1] all peer-reviewed or public health publications in a 6-month period of possible, likely or unproven in-flight transmission of covid-19. Freedman argues that the absence of large numbers of confirmed cases of in-flight transmission of covid-19 is encouraging – but not definitive evidence that fliers are safe. He also notes that a few cases have been noted of infections having occurred at more than 3 rows away from the index case, so the standard ‘two row rule’ may need to be re-examined.

Doucleff [2] cites Freedman’s evidence [1] that Emirates airlines has a strict masking policy and that even though they transported covid-positive patients nobody else got infected because everyone rigorously wore masks (Emirates staff ensured this policy was implemented).

Barnett & Wilder-Smith [3a] found that the odds of getting the virus in a standard 2-hour flight are about 1/4300, and the odds of getting the virus are about half that, 1/7700, if airlines leave the middle seat empty.

More recently, based on data from late September 2020 [3b], Barnett & Fleming revised the odds to about 1 in 3,900 for full flights and 1 in 6,400 when middle seats are kept empty.


In making his estimates, Barnett approximated the probability that a given airline passenger has COVID-19, the probability that universal masking could prevent a contagious passenger from spreading the disease, and how risk of infection changes based on the locations of the infected and non-infected passengers. [4]

Barnett [3] assumed that everyone on a plane is wearing masks (all U.S. airlines have mandated mask policies), and that the primary risk to passengers comes from others in the same row and, to a lesser degree, the rows behind or in front of the passenger. Seatbacks provide some measure of protection from passengers in other rows, Barnett said. “Other passengers do not pose as much of a risk because of the air purification systems on airplanes,” he said [5].

Barnett’s estimate of  1/(4,300) was for the U.S. (1 in 6,500 people confirmed positive daily); his estimate for the U.K. was about 10 times lower i.e. about 1/(40,000) [6], because of the lower prevalence rate of infection in the U.K. at the time (1 in 60,000 people confirmed positive daily).
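The step from the U.S. figure to the U.K. figure can be checked with a few lines. This is only a sketch: it assumes, as the paragraph above implies, that the in-flight risk scales linearly with the daily confirmed-case rate, and the helper name `scale_risk` is mine, not Barnett's.

```python
from fractions import Fraction

def scale_risk(base_risk, base_prevalence, new_prevalence):
    """Rescale a per-flight risk in proportion to prevalence (an assumption)."""
    return base_risk * new_prevalence / base_prevalence

us_risk = Fraction(1, 4300)     # Barnett's 2-hour full-flight estimate for the U.S.
us_prev = Fraction(1, 6500)     # daily confirmed positives per capita, U.S.
uk_prev = Fraction(1, 60000)    # daily confirmed positives per capita, U.K.

uk_risk = scale_risk(us_risk, us_prev, uk_prev)
print(f"U.K. risk is about 1 in {float(1 / uk_risk):,.0f}")
```

The result lands close to the quoted 1/(40,000), so the "about 10 times lower" figure is consistent with simple proportional scaling.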

An infected couple flew from China to Canada on 22 January; none of the other 350 passengers on the 15-hour flight were infected, probably because masks were worn.

The air in planes is replaced every 3 to 5 minutes, and the air that is recirculated goes through HEPA filters that should remove almost all droplets containing viruses [6]. “The ventilation systems on planes are very effective in reducing the overall concentration of any airborne pathogen exhaled by passengers,” says Dr.Julian Tang (University of Leicester). The main risk may be face-to-face conversations where air can be exchanged before being pulled away – along with any conversations before or after the flight.

Barnett argues that: "…three things have to go wrong for you to get infected (on a flight). There has to be a Covid-19 patient on board and they have to be contagious," he says. "If there is such a person on your flight, assuming they are wearing a mask, it has to fail to prevent the transmission. They also have to be close enough that there's a danger you could suffer from the transmission" [7]. Barnett says he took all of these probabilities into account before determining an overall transmission risk.

Barnett [7] states that there isn't much of a difference in terms of risk between passengers sitting in an aisle seat on a full flight and those in the window seat. However, the chances of becoming infected are ever so slightly higher for those in aisle seats, because they simply have more people around them.

Barnett [7] states that because of his age (72) he will not travel by plane soon – but he advises that for any high-risk person, one should wear not merely a mask but also a face shield – to prevent aerosols from entering in due to mask in-leakage.

Additive:

Dr Henry Wu [8], associate professor at Atlanta's Emory School of Medicine, said the findings were inconclusive on their own because the minimum infective dose remains unknown, and risks increase in step with exposure time.

Boeing tests concluded that sitting beside an infected economy passenger is comparable to seven-foot distancing in an office, posing an acceptably low risk with masks [8]. Airbus showed similar findings, while Embraer tested droplet dispersal from a cough. Some 0.13% by mass ended up in an adjacent passenger's facial area, falling to 0.02% with masks [8].

"It's simply additive," said Wu, who would prefer middle seats to be left empty. "A 10-hour flight will be 10 times riskier than a one-hour flight" [8].

If the risk of infection in a 2 hr flight is 1/4300 (as calculated by Barnett [3]), for a 10 hr flight it is 5/4300 i.e. about 0.116% according to Wu’s logic.

Exponential:

a)     If the risk is constant per unit time, then the risk for 1 hr is:

(1/4300)^(1/2) = 0.0152;  compared to: 1/4300 = 2.32 x 10^-4

The chance of not getting infected in 1 hr is 1 - 0.0152 = 0.9847.

So the chance of not getting infected in 10 hrs is: (0.9847)^10 = 0.8571.

So the chance of getting infected in 10 hrs is 1 – 0.8571 = 0.1429.

That is, about 14%.

This method is wrong because the risk for 1 hr has to be less than the risk for 2 hrs!

b)     

For 1 hr:

[1 – (1/4300)]^(1/2) = 0.99988

For 10 hrs: 1 - (0.99988)^10 = 0.00116  i.e. 0.116%

(Whereas for 5 hrs: 1 - (0.99988)^5 = 0.0006  i.e. 0.06%.

So this method is correct.)

This gives the same result as Wu’s additive method. Probably because of the binomial approximation:

1 - (1 - t)^n ≈ nt.

Usually the condition for the approximation to be valid is: t << 1.

In this case, when n can be very large: nt << 1 (n is large when t is given as a rate per minute instead of a rate per hour).
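A quick numerical check of methods (a) and (b) against the additive rule. This is a rough sketch, assuming the same starting point as above (risk of 1/4300 per 2-hour flight, constant per unit time); the variable names are only illustrative.

```python
risk_2h = 1 / 4300                       # Barnett's estimate for a 2-hour flight

# Method (a): square root of the risk itself (the approach rejected above)
wrong_1h = risk_2h ** 0.5

# Method (b): square root of the 2-hour *survival* probability
survive_1h = (1 - risk_2h) ** 0.5
risk_1h = 1 - survive_1h

hours = 10
compound_10h = 1 - survive_1h ** hours   # constant-rate (exponential) model
additive_10h = (hours / 2) * risk_2h     # additive rule: five 2-hour blocks

print(f"(a) 1-hour 'risk': {wrong_1h:.4f} (larger than the 2-hour risk {risk_2h:.6f})")
print(f"(b) 1-hour risk:   {risk_1h:.6f}")
print(f"10 hours, compounded: {compound_10h:.5%}")
print(f"10 hours, additive:   {additive_10h:.5%}")
```

The two 10-hour numbers differ only in the seventh decimal place, which is the binomial approximation at work: nt is still far below 1.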

Recent CDC guidelines suggest that a total of 15 minutes exposure in a day to a virus-infected person is enough to cause infection; it need not be a continuous 15 minute exposure to an infected person [9].

The critical viral load for infection is still not known. But since the virus can survive up to 9 hours on human skin [10], and aerosols from an infected person are being continuously emitted, the ‘additive’ model seems less likely to be correct than the constant rate (exponential) model.

c)

Hertzberg [11] has used the risk of infection as 1.8% per minute on the basis of an incident in 1977 in which 38 out of 54 passengers stuck in a plane on the tarmac for 4.5 hrs without air circulation were infected with an influenza-like illness (ILI). However, bio-mathematician Howard Weiss (Penn State Univ), himself a co-author, has criticized this study by saying that nobody knows these probabilities for SARS or covid-19, and that assuming an omnidirectional flow of virions is a crude assumption. Nevertheless, it is interesting that using this number for a 10 hr flight gives the risk of infection as: 1 – (1 – 0.018)^600 = 0.99998, or about 100%. If instead we assume 1.8% per hour, then we get: 1 – (1 – 0.018)^10 = 0.17, or 17%. Hertzberg does mention that the number 1.8% per minute is ‘conservatively high’ and based on an incident when the aircraft ventilation system was off (the plane was on the ground).
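Both figures in the paragraph above come from the same compounding formula, 1 – (1 – p)^n, with n counted in minutes or in hours. A minimal sketch (the helper name `cumulative_risk` is my own, for illustration):

```python
def cumulative_risk(p_per_step, steps):
    """Chance of at least one infection over `steps` independent, equal steps."""
    return 1 - (1 - p_per_step) ** steps

p = 0.018                                  # the 1.8% figure used by Hertzberg

per_minute_10h = cumulative_risk(p, 600)   # 1.8% per minute for 10 hours
per_hour_10h = cumulative_risk(p, 10)      # 1.8% per hour for 10 hours

print(f"1.8% per minute over 10 h: {per_minute_10h:.5f}")
print(f"1.8% per hour over 10 h:   {per_hour_10h:.3f}")
```

The unit attached to the 1.8% matters far more than the model: the same number gives near-certainty per minute and roughly one chance in six per hour.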


Figure from [12a]: The airflow in the plane is also designed to keep any droplets spewed by a passenger from floating around the cabin. The air flows from the top of a passenger’s head and is collected down by their feet, which keeps whatever we exhale from spreading too far horizontally.

d)

A study (partly funded by airlines) [12a, 12b,14] showed that it would take 54 flight hours for an individual sitting next to an infected individual to get the assumed infectious viral dose. The assumed dose has not been confirmed by virologists because it is not known for covid-19. “Mannequins with and without face masks sat in various seats on the aircraft while fluorescent tracer particles were released at intervals of two seconds to simulate breathing for a minute during ground and in-flight tests. Real-time fluorescent particle sensors were placed throughout the aircraft at the breathing zone of passengers to measure concentration over time,” in a Boeing 767. The other issue is that aerosols in covid-19 may not be the same as the tracer particles used. Also: the researchers assumed only one person aboard the plane was infected, that everyone was wearing masks at all times, and that the infected passenger (mannequin!) never turned their head and sat facing forward for the entire flight.

Anyway, just for fun assume 1.8% per hour, with 54 hrs:

1 – (1 – 0.018)^54 = 0.625, i.e. 62.5% chance of getting infected.

e)

But Barnett [3b], estimated the transmission risk on different flights per minute in the range of 0.0005 to 0.05.

(0.9995)^120 = 0.942: for 2 hours; low transmission risk t: 6% (Tel Aviv-Frankfurt)

(0.95)^120 = 0.0021: for 2 hours; high transmission risk t: 99.8% (Vietnam flight)

Assuming the Tel Aviv rate for 10 hours:

1 – (0.9995)^600 = 0.259.   That is: 26%.

Assume masking reduces transmission risk by 50% from 0.05 to 0.025.

Then: 1 - (0.975)^600 ≈ 1
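The per-minute figures above can be run through the same compounding formula for both ends of the range. A small sketch, assuming the 0.0005 to 0.05 range and the 50% mask reduction exactly as stated in the text; the function name is illustrative.

```python
def risk_over_flight(t_per_minute, minutes):
    """Chance of infection if the per-minute transmission risk stays constant."""
    return 1 - (1 - t_per_minute) ** minutes

low_t, high_t = 0.0005, 0.05      # per-minute range quoted above from [3b]
masked_high_t = high_t * 0.5      # assumed 50% reduction from masking

for label, t in [("low", low_t), ("high", high_t), ("high, masked", masked_high_t)]:
    two_h = risk_over_flight(t, 120)
    ten_h = risk_over_flight(t, 600)
    print(f"{label:13s} t = {t:<7} 2 h: {two_h:.4f}   10 h: {ten_h:.4f}")
```

At the high end even a halved per-minute risk compounds to near-certainty over 10 hours, so the spread in t dominates everything else in the estimate.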

How does one explain the range of t?

An IHME study [12] by Mokdad was a meta-analysis of the effect of mask wearing. Mokdad concluded that if 95% of people wore masks this could avert 30% of the deaths. This is based on a particular case of prevalence in the U.S. where the degree of transmission goes down, decreasing the R0. IHME made a similar prediction for India that 95% mask use could prevent 200,000 deaths [13]. This analysis is disputed by others (such as Natalie Dean [12]), who doubt whether the model captures real world effects. However, it is clear that it does not account for the variation in transmission coefficient (0.0005 to 0.05!), which can only be the result of variations in prevalence in the group of actual flyers and in their behavior (social distancing, hand-washing etc.).

Using Wu’s additive method for 10 hrs: (600)(0.0005) = 0.3

Most likely risk for 10 hrs: t = 1/(3,900) for a full flight of 2 hrs with everyone wearing masks [3b]. So for 10 hrs (five 2-hr blocks, added up), the risk of infection is: 5/(3,900) or 0.13%.

Note:  this risk (as pointed out by Barnett) depends upon the assumed prevalence. The risk may go down if the prevalence is lower than the assumed value, and go up if it is higher.

The TRANSCOM report [14] - not peer reviewed - uses the figure of 4,000 virions/hr, but acknowledges the fact that this is only indicative. Another study by Jianxin et al [15] is quoted by them as mentioning a figure of 103,000 to 22,500,000 viral particles/hr produced by infected individuals in exhaled breath by 14 out of 52 patients – but the remainder had no detectable virus in their breath. Further, these are particles of viral copy numbers derived from RT-PCR – but not all of these viral copies are infectious, and the ratio of infectious copies to total viral copies may be as low as 1/300 [14]. This becomes a range of 350-75,000 virus/hr (infectious).

The study by the airlines (Airbus, Boeing & Embraer) [16] said that the HEPA filters and airflow patterns in an airplane meant that 1 foot of separation on an airplane corresponds to 6 feet of separation on the ground in open air.

Sophie Bushwick [17] points out that the TRANSCOM study [14] assumes that passengers are always wearing surgical masks, whereas people often take them off for meals and talking, and does not account for movement within the plane. She also cites two peer-reviewed studies by Sebastian Hoehl [18,19], who said: “An airplane cabin is probably one of the most secure conditions you can be in.”

 In the  NEJM letter [18] Hoehl et al found 2 out of 114 patients tested positive by RT-PCR (1.8%) and cell culture indicated potential infectivity, in a German air force evacuation of mostly German passengers from Wuhan. In the JAMA study [19], Hoehl et al looked at passengers in a 4 hr 40 min flight from Tel Aviv to Frankfurt and found that 7 out of 24 members of a tourist group tested positive by RT-PCR and concluded that there were likely 7 index cases and two potential in-flight covid-19 transmissions, that may also have occurred before the flight, but both infected passengers sat within two rows of an index case. No one wore masks, so this low risk could have been further reduced had all passengers worn masks.

A 2011 study [20] by A.Ruth Foxwell et al found a 3.6% increased risk of contracting H1N1 from a symptomatic passenger within two rows in a plane, and a 7.7% risk if the index case was even closer: 2 seats in front, 2 seats behind or 2 seats either side.

Is this risk per hour? Hertzberg [11] assumed 1.8% per minute!

The size of the covid-19 virus is thought to be between 0.06 and 1.4 microns [21], whereas the HEPA filters in the plane remove 99.97% of particles above the size of 0.3 microns. What about particles smaller than 0.3 microns? Not clear, but presumably a lower percentage.

Khanh et al [22] studied a few in-flight transmission clusters during a long-haul (10 hours) commercial flight from London to Hanoi in March 2020. 16 persons were later found to be infected, of whom 12 had been sitting in business class with just one symptomatic person. Seating proximity to an infected person was strongly correlated with risk of infection (risk ratio 7.3): 11 of the 12 passengers seated within 2 rows of the index case were infected (92%), against just 1 of those seated more than 2 rows away (13%). The most likely mode of infection was aerosol or droplet transmission from the index case. Face masks were not recommended, or widely used, in March 2020 – and the authors say there is no data on whether they were used in this flight.

One study (quoted in [23]) suggests that “infrared thermal image scanners for mass screening of travellers at airport have a specificity of 71% and sensitivity of 86% to detect fever, but there are variations depending on where the camera is positioned, which part of the body is being scanned, and other environmental and individual factors that can affect the precision of these thermal scanners.” Neither the sensitivity nor the specificity seems to be particularly high! This probably explains why no one relies exclusively on IR thermal scanners, apart from the additional problem that many patients infected with covid-19 get a fever for a limited time period – or not at all.
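To see what a specificity of 71% and a sensitivity of 86% mean in practice, one can work out what fraction of flagged travellers actually have a fever. The sketch below uses the two figures quoted in [23]; the fever prevalence of 1 in 100 is purely my assumption for illustration, not a number from any of the references.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive screens that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sensitivity, specificity = 0.86, 0.71   # figures quoted in [23]
assumed_fever_prevalence = 0.01         # hypothetical: 1 traveller in 100 has a fever

ppv = positive_predictive_value(sensitivity, specificity, assumed_fever_prevalence)
print(f"Share of flagged travellers who really have a fever: {ppv:.1%}")
```

Under that assumed prevalence, the great majority of alarms are false ones, which fits with nobody relying on the scanners alone.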

Currently airlines require a negative RT-PCR test conducted within 72 hours of travel. This certainly would help reduce the numbers of escapees, but would not bring it down to zero because even RT-PCR tests do not have 100% sensitivity, and one might get infected at any point after the test was taken. Still, this is a useful step taken by airlines and public health authorities.

Lastly, one must mention a recent Korean study by Kwon et al [24] of infection that occurred in a restaurant. The index case was 6.5 metres away and the person who got infected was exposed for just 5 minutes, as determined from CCTV footage. The infection happened because the airflow due to the AC system happened to direct the viruses from the index case to the person who got infected. It is not clear if the index case was coughing, sneezing or speaking in a loud voice. Kwon et al quote the CFD study by Dbouk and Drikakis [25] to the effect that droplets could travel 6 metres with a 4 km/hr wind, whereas they measured the airflow as between 3.6-4.3 km/hr.

If we take the lower limit of 103,000 viral particles/hr [15] mentioned above, and the 5 mins exposure time [24], the critical viral load for infection may be as low as 8,600 viral particles. Silcott [14] states that the literature has values as low as 300 and as high as several thousand as the infectious dose (compared with 280 for SARS-COV-1).
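The arithmetic behind these viral-load figures is short enough to spell out. This is a back-of-envelope sketch using only numbers already quoted from [14], [15] and [24]; it assumes (generously) that the exposed person takes in everything the index case emits during the exposure.

```python
copies_per_hour_low = 103_000        # lower limit of exhaled viral copies/hr, from [15]
copies_per_hour_high = 22_500_000    # upper limit, from [15]
infectious_fraction = 1 / 300        # lowest infectious-to-total ratio mentioned in [14]
exposure_minutes = 5                 # restaurant exposure time, from [24]

# Infectious particles emitted per hour at each end of the range
infectious_low = copies_per_hour_low * infectious_fraction
infectious_high = copies_per_hour_high * infectious_fraction

# Total copies emitted during the 5-minute exposure (assumed fully inhaled)
copies_in_exposure = copies_per_hour_low * exposure_minutes / 60

print(f"Infectious range: {infectious_low:,.0f} to {infectious_high:,.0f} per hour")
print(f"Copies emitted in {exposure_minutes} min: {copies_in_exposure:,.0f}")
```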

Bottomline:

Barnett [6] has calculated the risk of infection as 1/4,300 for a full flight (middle seat occupied) that lasts 2 hours in a high prevalence area (1 positive case in 6,500 per day). He calculated 1/4,300 based on an average of the per capita rate of infection per week in Texas (high, 1/184) and New York (low, 1/1,000) [3a] as 1/310.

Wu pointed out that the risk for a 10 hour flight is 5X greater (since the risk is low, using the binomial approximation, as above).

Barnett added that in low prevalence areas (1 positive case in 60,000 per day) the risk is even lower (~1/40,000). Barnett also has assumed the probability of infection as a function of the distance d between the index case and the infectee is given by:

T = 0.13 exp(- 0.69d)

based on a meta-analysis by Chu et al [26].
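To get a feel for this curve, it helps to tabulate it. The sketch below simply evaluates the formula as quoted; the post does not state the unit of d, so the values of d here are in whatever unit the fit uses, and the function name is my own.

```python
import math

def transmission_probability(d):
    """Barnett's curve as quoted above: T = 0.13 * exp(-0.69 * d)."""
    return 0.13 * math.exp(-0.69 * d)

for d in (0, 1, 2, 3):
    print(f"d = {d}: T = {transmission_probability(d):.4f}")

# Distance over which T falls by half: ln(2) / 0.69
halving_distance = math.log(2) / 0.69
print(f"T halves roughly every {halving_distance:.2f} units of d")
```

Since 0.69 is very nearly ln 2, the formula amounts to saying that the risk halves with each additional unit of separation.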

Barnett very rightly emphasizes that these are only estimates, because of the large number of assumptions made in the calculations.

Currently in the US (25th December 2020), the number of cases every day is roughly 200,000 based on worldometers data [27]. On a per capita basis, that is about 1/1650 per day. This is about 4X of the value quoted above (1 in 6,500 per day). Thus the risk for a 2 hour flight is about 4/4,300. Take a 10 hour flight, and the risk is 20/4,300 or about 1/215 or about 0.45%.

The TRANSCOM study [14] estimated 54 hours of flight time to reach an infectious dose. Just as a check we can now calculate with current US covid-19 numbers of 200,000 per day:

1 – {1 – [4/4300]}^54 = 1 - 0.95 = 0.05

I.e. a 5% chance of getting covid. Note that the TRANSCOM study predicts a higher risk (maybe 95%?) because an aerosol particle generator (proxy for an infected person) is definitely present, unlike possibly present as in Barnett’s calculations [3].
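The bottom-line numbers can be reproduced in a few lines. This is a sketch only: the U.S. population of roughly 330 million is my assumption (it is what the 1/1650 figure implies), the multiplier is rounded to 4 as in the text, and the last line reproduces the 54-step check exactly as written above, i.e. applying the 2-hour risk 54 times.

```python
us_population = 330_000_000       # assumption: rough U.S. population implied by 1/1650
daily_cases = 200_000             # worldometers figure quoted above [27]
baseline_prevalence = 1 / 6500    # daily case rate behind Barnett's 1/4,300
baseline_risk_2h = 1 / 4300       # Barnett's 2-hour full-flight risk

prevalence_now = daily_cases / us_population
multiplier = round(prevalence_now / baseline_prevalence)   # about 4
risk_2h_now = multiplier * baseline_risk_2h

risk_10h = 5 * risk_2h_now                    # additive rule: five 2-hour blocks
risk_54_steps = 1 - (1 - risk_2h_now) ** 54   # the check against TRANSCOM, as written

print(f"Prevalence now: 1 in {1 / prevalence_now:,.0f} per day (x{multiplier})")
print(f"10-hour risk:   {risk_10h:.3%}")
print(f"54-step check:  {risk_54_steps:.3f}")
```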

My conclusion is that today a 10 hr flight from the U.S. would give you about a 0.45% chance of catching covid in-flight.

References:

1.     D.Freedman & A.Wilder-Smith  J.Travel Med. doi: 10.1093/jtm/taaa178 (18th Sep.2020)

2.     M.Doucleff https://www.npr.org/sections/goatsandsoda/2020/10/20/925892185/do-masks-really-cut-your-risk-of-catching-covid-19-on-long-plane-flights

3.     a). Arnold Barnett &  doi: https://doi.org/10.1101/2020.07.02.20143826.this version posted August 2, 2020

b) Arnold Barnett & Keith Fleming doi: https://doi.org/10.1101/2020.07.02.20143826   22nd Oct.2020

4. https://www.bloombergquint.com/opinion/is-it-safe-to-fly-here-are-the-odds-of-catching-covid-on-a-plane

5. https://mitsloan.mit.edu/ideas-made-to-matter/study-empty-middle-seats-make-flying-safer-during-covid-19

6. Michael LePage https://www.newscientist.com/article/2252152-how-likely-are-you-to-be-infected-by-the-coronavirus-on-a-flight/

7. Tamara Hardingham-Gill https://edition.cnn.com/travel/article/odds-catching-covid-19-flight-wellness-scn/index.html

8. Laurence Frost https://in.mobile.reuters.com/article/amp/idUSKBN27411C

9.  Ryan Malosh https://theconversation.com/amp/an-epidemiologist-explains-the-new-cdc-guidance-on-15-minutes-of-exposure-and-what-it-means-for-you-148707

10. https://www.livescience.com/amp/coronavirus-survives-9-hours-on-skin.html

11.  V.S.Hertzberg et al PNAS 115 (2018) 3263-67

12. a) Julia Belluz & Brian Resnick: https://www.vox.com/21525068/covid-19-airplane-risk-coronavirus-pandemic-airports

b) https://www.ustranscom.mil/cmd/panewsreader.cfm?ID=C0EC1D60-CB57-C6ED-90DEDA305CE7459D

12. https://www.npr.org/sections/health-shots/2020/07/23/894425483/can-masks-save-us-from-more-lockdowns-heres-what-the-science-says

13. https://www.hindustantimes.com/india-news/mask-use-can-prevent-200k-covid-deaths-in-india-study/story-y525iKAHAJdJiZVz0wySwI.html

14. David Silcott et al, “TRANSCOM/AMC Commercial Aircraft Cabin Aerosol Dispersion Tests” (final TRANSCOM report).

15. Jianxin Ma et al Clinical Infectious Diseases, 28Aug.2020, ciaa1283, https://doi.org/10.1093/cid/ciaa1283 

16. Adam Rogers in Wired (18th Nov.2020) https://www.wired.com/story/can-you-get-covid-19-on-an-airplane-yeah-probably/

17. Sophie Bushwick et al Sci.Am. (19thNov.2020) https://www.scientificamerican.com/article/evaluating-covid-risk-on-planes-trains-and-automobiles2/

18.Sebastian Hoehl et al NEJM letter 382 (26th Mar.2020)

19. Sebastian Hoehl et al (18th Aug.2020) JAMA Network Open. 2020;3(8):e2018044. doi:10.1001/jamanetworkopen.2020.18044

20.A.Ruth Foxwell et al EID 17 (2011) 1188-94

21. Sandee LaMotte CNN, 16th Dec.2020 

https://edition.cnn.com/travel/article/air-travel-risk-covid-19-wellness/index.html

22.N.C.Khanh et al EID 26 (Nov.2020) 2617

23. Aisha Khatib et al J.Travel Medicine (2020) doi: 10.1093/jtm/taaa212

24. K.-S.Kwon et al  J Korean Med Sci. 2020 Nov 30;35(46):e415

https://doi.org/10.3346/jkms.2020.35.e415

25. T.Dbouk & D.Drikakis Phys. Fluids 32, 053310 (2020); https://doi.org/10.1063/5.0011960

26. D.Chu et al The Lancet, 395(10242), pp. 1973-1987, June 27, 2020

https://doi.org/10.1016/S0140-6736(20)31142-9

27. https://www.worldometers.info/coronavirus/country/us/

Saturday, November 14, 2020

Scrabble Bingo and Word Density

 

Scrabble bingo & Wordensity.

I thought I could work out the probability of getting a 7-letter word in Scrabble. Apparently this is known as Scrabble bingo.

I  could not do it, because it depends on how many total letters are there to start with, and how many are left at a given stage of the game – not to mention the total numbers of each letter in the beginning, and the number remaining at a given point in the game.

 Finally, I found a thread which discussed the probability of finding a 7-letter word on the starting draw in Scrabble. The estimated numbers are higher than what I obtained (for reasons beyond my ken). But a fair amount of searching online got me some pointers, which I quote later in this blog.

Since I found Scrabble probability too tough, I tried to get an upper limit by calculating the word density: namely, the number of actual English words out of the total number of possible combinations of letters.

But first some background (in other words: time-wasting tactics) [1]:

“Most adult native test-takers range from 20,000–35,000 words. Average native test-takers of age 8 already know 10,000 words. Average native test-takers of age 4 already know 5,000 words. Adult native test-takers learn almost 1 new word a day until middle age.”

The total number of words in English [2a] according to particular criteria:

“There are an estimated 171,476 words currently in use in the English language, according to the Oxford English Dictionary, not to mention 47,156 obsolete words.”

That is, most adult native speakers of English know about 10% of the total.

Another website [2b], dated 8th July 2008, however, also quotes the OED, but as saying that there are 600,000 words, a number predicted to go up to 1 million by 2009. So the number should be much higher by now… The prediction is by Paul Payack, founder of Global Language Monitor and yourdictionary.com.

In other languages [3]:

Language      Words in the Dictionary
Korean        1,100,373
Japanese      500,000
Italian       260,000
English       171,476
Russian       150,000
Spanish       93,000
Chinese       85,568

 

 

a)      According to [4a]: the number of 7 letter words in English is: 32,909

While: the total number of 7 letter combinations in English is:


C (26,7) = (26!)/[(19!)(7!)]

Assuming perfect knowledge and recall, the maximum probability of getting a 7-letter word in Scrabble is:

P(7) = 32,909/657,800 = 0.0500   i.e. 5%.

 

b) The number of 8-letter words [4a] is: 40,161.

The total number of 8 letter combinations in English is:


C(26,8) = (26!)/[(18!)(8!)]

The binomial coefficients can be calculated using an online calculator, for example:  [5].

Assuming perfect knowledge and recall, the maximum probability of getting an 8-letter word in Scrabble is:

P(8) = 40,161/1,562,275 = 0.0257  i.e. 2.57%.
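Instead of an online calculator, the binomial coefficients and the two ratios can be checked with Python's standard library. A minimal sketch, using the word counts from [4a]; note that C(26, n) counts sets of n different letters, so this remains the same rough upper-limit estimate as above.

```python
from math import comb

words_7, words_8 = 32_909, 40_161   # 7- and 8-letter word counts from [4a]

combos_7 = comb(26, 7)              # ways to choose 7 different letters out of 26
combos_8 = comb(26, 8)

p7 = words_7 / combos_7
p8 = words_8 / combos_8

print(f"C(26,7) = {combos_7:,}   P(7) = {p7:.4f}")
print(f"C(26,8) = {combos_8:,}   P(8) = {p8:.4f}")
```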

In general for n letters, where n goes from 3 to 20:

 

No. of letters   No. of words   Total number of    Ratio (%)
in word                         combinations
3                1,292          2,600              49.69
4                5,454          14,950             36.48
5                12,478         65,780             18.97
6                22,157         230,230            9.62
7                32,909         657,800            5.003
8                40,161         1,562,275          2.571
9                40,727         3,124,550          1.303
10               35,529         5,311,735          0.669
11               27,893         7,726,160          0.361
12               20,297         9,657,700          0.210
13               13,857         10,400,600         0.133
14               9,116          9,657,700          0.0943
15               5,757          7,726,160          0.0745
16*              783            5,311,735          0.0147
17*              407            3,124,550          0.0130
18*              70             1,562,275          0.00448
19*              84             657,800            0.0128
20*              49             230,230            0.0213

N.B. Ref.[4a] stops at 15 letter words; so the number of words with 16 and more (indicated by *)  is from [4b].

Note that the binomial coefficient term in Col.3 goes through a maximum at n =13 as expected, and is symmetric about 13.

The total number of words (up till, and including, 12 letters) in the above Table in column 2 is: 238,897 - that may be compared with the OED number [2]: 171,476 + 47,156 = 218,632 – which is roughly comparable.


The maximum number of words in the wordlist seems to be at about 9 letters. 

Is this due to our cognitive limitations – or is it that we just do not need more words?

The average person can only remember 7 digit numbers reliably, but it's possible to do much better using mnemonic techniques [6]. That is similar – but chunking letters is a lot easier than remembering blocks of numbers – so the comparison is, at best, suggestive.

The word density (ratio) exhibits roughly an exponential decrease as the number of letters increases.
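The whole table, and the claims made about it, can be regenerated from the word counts. A sketch, with the counts copied from the table above ([4a] up to 15 letters, [4b] beyond); the step factors at the end are just my way of eyeballing the "roughly exponential" decrease.

```python
from math import comb

# Word counts by word length, copied from the table above
word_counts = {3: 1292, 4: 5454, 5: 12478, 6: 22157, 7: 32909, 8: 40161,
               9: 40727, 10: 35529, 11: 27893, 12: 20297, 13: 13857,
               14: 9116, 15: 5757, 16: 783, 17: 407, 18: 70, 19: 84, 20: 49}

# Word density in percent: words of length n divided by C(26, n)
density = {n: 100 * count / comb(26, n) for n, count in word_counts.items()}

for n in sorted(density):
    print(f"{n:2d}  {word_counts[n]:6,d}  {comb(26, n):10,d}  {density[n]:.4f}%")

peak_length = max(word_counts, key=word_counts.get)
total_up_to_12 = sum(c for n, c in word_counts.items() if n <= 12)

# Ratio of each density to the previous one, for word lengths 3 to 12.
# A roughly constant factor is what an exponential decrease looks like.
step_factors = [density[n + 1] / density[n] for n in range(3, 12)]

print("Most common word length:", peak_length)
print("Words with up to 12 letters:", f"{total_up_to_12:,}")
print("Step factors:", [round(f, 2) for f in step_factors])
```

From 4 letters to 12 letters the density drops by a factor of roughly one half per extra letter, which is the exponential-looking part of the curve.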

 

Getting back to Scrabble, I found a math thread which gives details of a correct Scrabble calculation [7]. The answer is: “There are 26,514 unique sets of 7-letters that generate a legitimate Scrabble word, and 3,199,724 possible Scrabble racks of 7 letters, so the probability that a starting draw is a valid 7-letter word is 0.0082.” (i.e. 0.82%).

The number of 7-letter words is a bit lower than the number tabulated above [4] (32,909), but that is okay. The problem is the 3-million number! It is larger by a factor of 4.86 than my estimate. This calculation takes into account the number of letters in the Scrabble starting draw:

“My program assumes a rack size of 7 and these tile counts:
A=>9, B=>2, C=>2, D=>4, E=>12, F=>2, G=>3, H=>2, I=>9, J=>1, K=>1, L=>4, M=>2, N=>6, O=>8, P=>2, Q=>1, R=>6, S=>4, T=>6, U=>4, V=>2, W=>2, X=>1, Y=>2, Z=>1, '#'=>2.”

It is possible that the calculation takes into account the probability of getting each word (out of the 26,514 or 32,909) considering the available number of letters (100) and the relative distribution of letters listed above. This is indeed what the next post [8] states (see below).

In addition, the contributors to the thread used the SOWPODS word list (the Scrabble tournament word list used in most countries, other than US, Canada & Thailand) and the Perl programming language. Needless to say, about such matters (among others), I am completely clueless.

However, another (more recent) post on Scrabble bingo probability [8] explains the calculation as follows:  “there are C(100,7) or 16 billion (actually: 16,007,560,800) equally likely ways to draw a rack of 7 tiles from the 100 tiles in the North American version of the game.  But since some tiles are duplicated, there are only 3,199,724 distinct possible racks (not necessarily equally likely)” – the same number quoted by [7].

He adds:

“According to the 2014 Official Tournament and Club Word List (the latest for which an electronic version is accessible), there are 25,257 playable words with 7 letters.”

After that, though, the post goes into lots of details about the frequency of words in the English language, rank ordering the most frequently used on top, and the least used at the bottom of the list – on the argument that most words in the list of 25,257 would not be either known or recalled.

Finally he concludes:

“if we include the entire official word list, the probability of drawing a playable 7-letter word is 21226189/160075608, or about 0.132601. “

This (13.3%) is even higher than my estimate of 5%! And he says he has used Mathematica in his calculations, but where he gets this number from…I haven’t a clue.

He concludes:

“A coarse inspection of the list suggests that I confidently recognize only about 8 or 9 thousand– roughly a third– of the available words, meaning that my probability of playing all 7 of my tiles on the first turn is only about 0.07.”

However, interestingly, the title of the blog is: “Possibly wrong.”

 

Naturally, a lot hinges on this number 3,199,724. Where did it come from? I reproduce here an argument from another blog [9] which explains this (full disclosure: I do not understand it):

There are:

·         4 letters with 7 tiles;

·         3 letters with 6 tiles;

·         4 letters with 4 tiles;

·         1 letter with three tiles;

·         10 letters with 2 tiles; and

·         5 letters with 1 tile.



 

4 letters with ≥7 tiles: A, E, I & O.

3 letters with 6 tiles: N, R & T.

4 letters with 4 tiles: D, L, S & U.

1 letter with 3 tiles: G.

10 letters with 2 tiles: B, C, F, H, M, P, V, W, Y & blank.

5 letters with 1 tile: J, K, Q, X & Z.

A comparison with the list of letters shows that the polynomials do follow what is described…but what these equations are for…and how Wolfram Alpha spits out ‘3,199,724’ [7b]:

1 + 27x + 373x^2 + 3509x^3 + 25254x^4 + 148150x^5 + 737311x^6 + 3199724x^7 + 12353822x^8 + 43088473x^9 + 137412392x^10 ...
+ x^100

The thread [7b] also contains an ab initio calculation of how to get the number 3,199,724 … but it will take me a while to figure it out (like, t = ∞).
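One way to read the series above, as far as I can tell, is as a generating function: a tile type with c copies contributes a factor (1 + x + ... + x^c), and the coefficient of x^7 in the product of all 27 factors counts the distinct 7-tile racks. The sketch below is my own reconstruction along those lines, not the code from the thread (which used Wolfram Alpha and Perl); it multiplies the polynomials by brute force.

```python
# Tile counts as listed above; '#' stands for the blank
TILE_COUNTS = {"A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3,
               "H": 2, "I": 9, "J": 1, "K": 1, "L": 4, "M": 2, "N": 6,
               "O": 8, "P": 2, "Q": 1, "R": 6, "S": 4, "T": 6, "U": 4,
               "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "#": 2}

def multiply(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

series = [1]
for count in TILE_COUNTS.values():
    # Factor for one tile type: 1 + x + ... + x^count
    series = multiply(series, [1] * (count + 1))

print("First coefficients:", series[:8])
print("Distinct 7-tile racks:", series[7])
```

The coefficient of x^k is the number of distinct racks of k tiles, which is why the series starts 1 + 27x (27 tile types) and ends with a single x^100 (the whole bag).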

Anyway, another post by Derek, the Word Buff [10a], confidently states that the probability of a Scrabble bingo by the first player from the first rack is about 15% (no details of the calculation given). He assumes that the Official Scrabble Dictionary is being used and that the players have perfect memory and perfect ‘anagramming’ skills.

But others also give similar estimates [10b]:

“If we instead only require that the 7 tiles be rearranged to form a 7-letter word, then it's much easier: of the C(100,7)=16,007,560,800 ways to draw 7 tiles from the bag, 2,068,621,350 of those may be arranged to spell a word, for a probability of about 0.129228. If we prohibit blanks, then there are only 1,075,220,956, or about half as many ways to draw 7 "spellable" tiles.” 

And one more [10c]:

 “There are 24,029 valid tournament 7 letter words in English. Treating the fresh bag of tiles as a multivariate hypergeometric distribution with parameters 7 and the number of each type of tile, and getting the PMF for the distinct combinations of letters and blanks that result in at least one of the valid scrabble words results in probability 13790809/106717072, or ~12.9% chance you pull the letters/blanks to form one of the valid words.

N.B.: I use English word list and the American English tile distribution. The results will differ for different language tournament lists/tiles.”

And in the blog [10d]:

“I ran it on /usr/share/dict/words, as I say, which on this machine is an edition of the well-known online word list claimed to have been legally derived from Webster's New International Dictionary, 2nd Edition. The complete list is 235,882 words. And the results were:

7-letter words in dictionary: 20552
2017799913/16007560800 = 12.61%

However, there are two major sources of error here. First, the
word list includes a large number of obscure words, which in
practice few people would know.”

According to Mark Spahn [11], the process of determining the probability of getting the word ‘boot’ in the very first draw consists of 4 steps:

1. specify the problem

2. create a dictionary of words

3. compute the non-words

4. compute the chances

The probability is calculated as 0.71% of obtaining the word ‘boot’ in the very first draw.

According to Mark Spahn [11], the probability of drawing the tiles for the word “MINIMAL” in the very first draw is calculated as follows:

“There is C(2,2)=1 subset of 2 M's from the 2 M's in the bag.
There are C(9,2)=36 subsets of 2 I's from the 9 I's in the bag.
There are C(6,1)=6 subsets of 1 N from the 6 N's in the bag.
There are C(9,1)=9 subsets of 1 A from the 9 A's in the bag.
There are C(4,1)=4 subsets of 1 L from the 4 L's in the bag.

Thus there are 1*36*6*9*4=7776 distinct subsets of 2 M's, 2 I's, and 1 each N, A, L. The 7 distinct tiles can be arranged in 7! ways. Thus there are 7776*7! distinct permutations of tiles that can be rearranged to spell MINIMAL.
The probability of drawing such a set of tiles from the bag is this number of permutations, divided by the number of all permutations of 7 tiles drawn (without repetition) from the bagful of 100 tiles, which is P(100,7) = 100*99*98*97*96*95*94.

So P{MINIMAL} = 7776*7!/P(100,7) = 7776/(P(100,7)/7!)
= 7776/C(100,7).

C(100,7) = 2^5*3*5^2*7*11*19*47*97 = 16,007,560,800.

P{MINIMAL} works out to 81/166,745,425.”
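The MINIMAL derivation quoted above can be replayed in a few lines. The sketch below uses only the tile counts named in the quote and checks that the ordered (permutation) and unordered (combination) versions give the same probability.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Ways to pick the physical tiles for MINIMAL:
# 2 of 2 M's, 2 of 9 I's, 1 of 6 N's, 1 of 9 A's, 1 of 4 L's.
tile_subsets = comb(2, 2) * comb(9, 2) * comb(6, 1) * comb(9, 1) * comb(4, 1)

# Ordered view: arrangements of the chosen tiles over ordered draws of 7.
p_ordered = Fraction(tile_subsets * factorial(7), perm(100, 7))

# Unordered view: tile subsets over all 7-tile subsets of the bag.
p_unordered = Fraction(tile_subsets, comb(100, 7))

print(tile_subsets)              # 7776
print(p_unordered)               # 81/166745425
print(p_ordered == p_unordered)  # True
```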

Spahn [11] also quotes from Albert Weissman [12] the probability of AEINORT (the most likely rack, but, sorry, no corresponding word) as:

 9*12*9*6*8*6*6/C(100,7) = 17,496/166,745,425 = 1/9530.488

and the least likely combination of letters as BBJKQXZ, with a probability of 1/C(100,7), that is, about 1 in 16 billion.

According to Weissman [12], the most likely actual Scrabble bingo is the rack that spells TRAINEE or RETINAE, with a probability of 1/13,870. Spahn [11] gets a slightly different value:

“I get P{AEEINRT} = P{EEAINRT} = C(12,2)*9*9*6*6*6/C(100,7)
= 2,187/30,317,350 = 1/13,862.53.”

The tile counts used in these calculations (with '#' standing for a blank) are:

A=>9, B=>2, C=>2, D=>4, E=>12, F=>2, G=>3, H=>2, I=>9, J=>1, K=>1, L=>4, M=>2, N=>6, O=>8, P=>2, Q=>1, R=>6, S=>4, T=>6, U=>4, V=>2, W=>2, X=>1, Y=>2, Z=>1, '#'=>2.
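All of the rack probabilities above follow the same recipe: multiply the number of ways to pick each letter's tiles, then divide by C(100,7). The following is a minimal sketch of that recipe; the function name `rack_probability` is made up for illustration, and it treats blanks as ordinary tiles rather than wildcards.

```python
from collections import Counter
from fractions import Fraction
from math import comb, prod

# Standard English tile counts; '#' stands for a blank.
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "#": 2,
}

def rack_probability(rack: str) -> Fraction:
    """Probability that the first draw is exactly this multiset of tiles."""
    needed = Counter(rack)
    # For each letter, choose which of its physical tiles end up on the rack.
    ways = prod(comb(TILE_COUNTS[letter], k) for letter, k in needed.items())
    return Fraction(ways, comb(sum(TILE_COUNTS.values()), len(rack)))

for rack in ("AEINORT", "AEEINRT", "BBJKQXZ"):
    p = rack_probability(rack)
    print(rack, p, f"(about 1 in {float(1 / p):,.2f})")
```

With these tile counts the sketch reproduces Spahn's 1/13,862.53 for AEEINRT rather than Weissman's 1/13,870.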

 

--------------------------------------------------------------------------------------------------------------------------------

 

So what is the probability of Scrabble bingo? The numbers 3,199,724 and 16,007,560,800 occur everywhere. The first estimate gives a Scrabble bingo probability of 0.82% [7]. But all the others give higher values:

13.26% [8], 15% [10a], 12.92% [10b], ~12.9% [10c] and 12.61% [10d].

Clearly, nobody believes the estimate of 0.82%, most likely because it does not take into account the fact that the probability of each word is different (as explained in detail by Mark Spahn [11]). Am I sure of this? No! Because most of this combinatorics is, to me, gibberish!
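The two recurring numbers can be reproduced with a small counting routine: 3,199,724 is the number of distinct racks (multisets of 7 tiles), while 16,007,560,800 is the number of equally likely ways to pick 7 physical tiles. The sketch below counts both in one pass; `count_racks` is a hypothetical helper name, and the closing comment about 0.82% is an assumption rather than something the sources state.

```python
from math import comb

# Tile counts for A..Z followed by the two blanks (standard English set).
TILE_COUNTS = [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,
               6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1, 2]

def count_racks(counts, size):
    """Return (number of distinct racks, number of physical tile subsets)."""
    distinct = [1] + [0] * size   # distinct[k]: multisets of k tiles so far
    weighted = [1] + [0] * size   # weighted[k]: ways to pick k physical tiles
    for n in counts:
        new_distinct = [0] * (size + 1)
        new_weighted = [0] * (size + 1)
        for have in range(size + 1):
            for take in range(min(n, size - have) + 1):
                new_distinct[have + take] += distinct[have]
                new_weighted[have + take] += weighted[have] * comb(n, take)
        distinct, weighted = new_distinct, new_weighted
    return distinct[size], weighted[size]

racks, subsets = count_racks(TILE_COUNTS, 7)
print(racks)    # 3199724
print(subsets)  # 16007560800

# Assumption: if the low estimate divided a word count by the number of
# distinct racks, it treated every rack as equally likely, which they are not
# (AEINORT can be drawn in 1,679,616 ways, BBJKQXZ in only 1).
print(f"{26_514 / racks:.2%}")
```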

Note that two writers in the thread [7b] give almost the same value (12.9%) as [10b] and [10c]. Well, 12.9% is popular, for sure…

It is interesting that the Scrabble bingo probability (12.9%) is higher than the word density (5.0%). It all boils down to the fact that in Scrabble there are 100 tiles (with a different number of tiles for each letter, plus two blanks), while in the word density calculation there are only 26 letters (with equal probability). And add to that: these geeks computed the probability for 26,514 words from the Scrabble dictionary using tools that I had never heard of (Perl, a general-purpose programming language originally developed for text manipulation, and the SOWPODS word list) as well as some I had heard of (Mathematica and R), plus Monte Carlo simulation...
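For readers curious what the Monte Carlo approach looks like, here is a rough sketch: draw many random racks from the bag and count how often the rack can be rearranged into a word, with blanks acting as wildcards. The names `is_bingo` and `estimate_bingo_probability` and the three-word `TOY_WORDS` list are illustrative assumptions; a meaningful estimate would need a full tournament word list, which is not included here.

```python
import random
from itertools import combinations_with_replacement
from string import ascii_uppercase

# Standard English tile counts; '#' stands for a blank.
TILE_COUNTS = dict(zip(ascii_uppercase + "#",
                       [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,
                        6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1, 2]))
BAG = [tile for tile, n in TILE_COUNTS.items() for _ in range(n)]

def is_bingo(rack, signatures):
    """True if the rack can spell some word; signatures are sorted-letter keys."""
    letters = [t for t in rack if t != "#"]
    blanks = len(rack) - len(letters)
    # Try every way of turning the blanks into letters (one empty way if none).
    for fill in combinations_with_replacement(ascii_uppercase, blanks):
        if "".join(sorted(letters + list(fill))) in signatures:
            return True
    return False

def estimate_bingo_probability(words, trials=20_000, seed=0):
    """Monte Carlo estimate of P(first rack of 7 tiles spells a 7-letter word)."""
    rng = random.Random(seed)
    signatures = {"".join(sorted(w.upper())) for w in words if len(w) == 7}
    hits = sum(is_bingo(rng.sample(BAG, 7), signatures) for _ in range(trials))
    return hits / trials

# Hypothetical toy list, far too small to approach the ~12.9% quoted above.
TOY_WORDS = ["RETINAE", "TRAINEE", "MINIMAL"]
print(estimate_bingo_probability(TOY_WORDS))
```

Anagrams such as RETINAE and TRAINEE share one sorted-letter key, so each rack is counted once no matter how many words it spells.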

 

I will just stick to my pay-grade and my crude estimate and plot of word density…

References:

1. https://www.economist.com/johnson/2013/05/29/lexical-facts

2a. https://www.bbc.com/news/world-44569277#:~:text=We%20considered%20dusting%20off%20the,to%20mention%2047%2C156%20obsolete%20words.

2b. https://blogs.illinois.edu/view/25/4641#:~:text=With%20more%20than%20326%20million,it%20was%20400%20years%20ago.&text=Payack%20also%20predicts%20that%20some,millionth%20English%20word%20will%20appear.

3. https://blog.ititranslates.com/2018/03/07/which-language-is-richest-in-words/

4a. www.bestwordlist.com

4b. https://www.bestwordlist.com/8letterwords.htm

4c. www.yougowords.com

5. https://miniwebtool.com/binomial-coefficient-calculator/?n=26&k=9

6. https://humanbenchmark.com/tests/number-memory

7a. https://www.reddit.com/r/AskStatistics/comments/47a3z0/what_are_the_odds_of_being_able_to_spell_a_7/

7b. http://godplaysdice.blogspot.com/2007/08/how-many-scrabble-racks-are-there.html?showComment=1228611720000#c2396124566679894906

8. https://possiblywrong.wordpress.com/2017/01/20/probability-of-a-scrabble-bingo/

9. https://math.stackexchange.com/questions/243685/how-many-possible-scrabble-racks-are-there-at-the-beginning-of-the-game

10a. http://www.word-buff.com/what-is-the-probability-of-a-scrabble-player-making-two-seven-letter-words-on-their-first-two-turns.html

10b. https://www.reddit.com/r/theydidthemath/comments/33noxv/request_i_pull_seven_scrabble_tiles_out_of_the/

10c. https://www.reddit.com/r/theydidthemath/comments/40t22p/request_how_many_ifferent_combination_of/

10d. https://groups.google.com/g/rec.puzzles/c/3dFsrDa9_oE?pli=1

11. https://stats.stackexchange.com/questions/74468/probability-of-drawing-a-given-word-from-a-bag-of-letters-in-scrabble

12. Albert Weissman, Scrabble Players Newspaper (Feb. 1980)