Fearbola, Ebola and the Web

My nasty “cold” has been diagnosed as Influenza A, so it’s bed rest for 48 hours. And, of course, blogging about why Ebola gets all the news while good ol’ killers like influenza don’t. I got CDC figures for deaths and then ran Google searches for the related terms, totaling the number of hits. I was surprised at first: the number of hits seemed to roughly correspond to the death rate. Ebola was way off, massively overreported, but the general trend seemed right. However . . . .

But that’s just an artifact of cancer and heart disease, which kill four times as many Americans as the “runner up,” respiratory diseases.


Once we remove these two, the data show what I was looking for: presence on the web and mortality have no discernible relationship. In fact, what weak correlation there is runs negative.

Respiratory diseases are the number one killer after cancer and heart disease, but they are not, it seems, web savvy. Same for kidney disease. Anyone have a t-shirt from the “Nephrotic Syndrome 5K and Fun Run”? Didn’t think so. And don’t get me started on the flu, the Rodney Dangerfield of infectious diseases.

In some cases, the abundance of websites makes sense. HIV/AIDS transmission has plummeted because of public education. But why is Alzheimer’s a web sensation, whereas stroke is ho-hum? And in some cases, these mismatches point to dangerous public confusion about risk. Heart attacks are considered a “man’s problem,” but heart disease is a major cause of death for women. The relatively weak web presence of heart disease probably flags this gendered misperception, which then leads to the under-diagnosis and under-treatment of women.

Name | Web hits | Deaths | Web search term | CDC term
--- | --- | --- | --- | ---
Ebola | 54,800,000 | 1 | Ebola deaths US | Ebola
Whooping cough | 549,000 | 7 | Whooping cough deaths US | Whooping cough
HIV/AIDS | 30,500,000 | 15,529 | HIV AIDS deaths US | Human immunodeficiency virus (HIV) disease
Murder | 50,000,000 | 16,238 | Murder deaths US | Assault (homicide)
Parkinson’s disease | 6,760,000 | 23,111 | Parkinson’s disease deaths US | Parkinson’s disease
Liver disease | 14,050,000 | 33,642 | Liver disease deaths US | Chronic liver disease and cirrhosis
Suicide | 40,100,000 | 39,518 | Suicide deaths US | Intentional self-harm (suicide)
Kidney disease | 7,780,000 | 45,591 | Kidney disease deaths US | Nephritis, nephrotic syndrome, and nephrosis
Influenza and pneumonia | 13,350,000 | 53,826 | Influenza deaths US PLUS Pneumonia deaths US | Influenza and Pneumonia
Diabetes | 18,700,000 | 73,831 | Diabetes deaths US | Diabetes
Accidents | 28,500,000 | 84,974 | Accidents deaths US | Accidents (unintentional injuries)
Alzheimer’s disease | 42,900,000 | 84,974 | Alzheimer’s deaths US | Alzheimer’s disease
Stroke | 24,100,000 | 128,932 | Stroke deaths US | Stroke (cerebrovascular diseases)
Respiratory diseases | 9,310,000 | 142,943 | Respiratory disease deaths US | Chronic lower respiratory diseases
Cancer | 64,100,000 | 576,691 | Cancer deaths US | Cancer
Heart disease | 27,200,000 | 596,577 | Heart disease deaths US | Heart disease
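For anyone who wants to check the “roughly corresponds” and “weak negative correlation” claims against the table, here is a minimal Python sketch. It assumes a plain Pearson correlation on the raw counts, which is my choice: the post doesn’t say how the relationship was judged, and ranks or log counts would be reasonable alternatives. The numbers are transcribed from the table as printed.

```python
# Web hits and CDC death counts, transcribed from the table above.
# Each entry: cause -> (Google hits, US deaths)
DATA = {
    "Ebola": (54_800_000, 1),
    "Whooping cough": (549_000, 7),
    "HIV/AIDS": (30_500_000, 15_529),
    "Murder": (50_000_000, 16_238),
    "Parkinson's disease": (6_760_000, 23_111),
    "Liver disease": (14_050_000, 33_642),
    "Suicide": (40_100_000, 39_518),
    "Kidney disease": (7_780_000, 45_591),
    "Influenza and pneumonia": (13_350_000, 53_826),
    "Diabetes": (18_700_000, 73_831),
    "Accidents": (28_500_000, 84_974),
    "Alzheimer's disease": (42_900_000, 84_974),
    "Stroke": (24_100_000, 128_932),
    "Respiratory diseases": (9_310_000, 142_943),
    "Cancer": (64_100_000, 576_691),
    "Heart disease": (27_200_000, 596_577),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def hits_vs_deaths(rows, exclude=()):
    """Correlate web hits with deaths, optionally dropping some causes."""
    kept = [value for cause, value in rows.items() if cause not in exclude]
    hits = [h for h, _ in kept]
    deaths = [d for _, d in kept]
    return pearson(hits, deaths)


r_all = hits_vs_deaths(DATA)
r_trimmed = hits_vs_deaths(DATA, exclude=("Cancer", "Heart disease"))
print(f"all causes:                   r = {r_all:.2f}")
print(f"without cancer/heart disease: r = {r_trimmed:.2f}")
```

With all sixteen causes the correlation comes out positive; drop cancer and heart disease and it turns weakly negative. That fits the point above: Pearson on raw counts is dominated by the two giant outliers, so they manufacture most of the apparent trend.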



Visualizing Ebola

The Guardian recently posted a dataviz comparing Ebola to other infectious diseases. It’s from a forthcoming book entitled Knowledge is Beautiful, and it is indeed beautiful. Unfortunately, it’s a really bad viz. Below is my alternative viz (using the Guardian’s data), along with a critique.

The basic issue is evolution. Because viruses and other pathogens reproduce quickly, they’re a great example of Darwin at work. Basically, a win for a virus is to reproduce a lot. A lot, a lot, a lot. Darwin is simple that way. So once a virus has infected a host, it makes sense to breed like crazy. With one caveat: if you over-reproduce and kill the host, you might lose your transmission vector. So be careful. And if you wait too long, the host might recover: her immune system might learn how to wipe you out. So viruses have to balance virulence and transmission efficiency. You can kill your host quickly, but then you’d better have lots of means of infecting other people. Alternatively, if you’re willing to let your host drag around for a week with the sniffles, going to work and school, then you don’t need to be especially infectious. The host will give you plenty of occasions to find new hosts. (I’m blogging with a head cold, so this is personal.) But overall we should see a clear pattern: more lethal viruses should be more transmissible.
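One standard way to put numbers on this trade-off (my framing, not the Guardian’s) is a simple SIR-style model in which an infected host stops spreading the pathogen either by dying, at a disease-induced rate α (“virulence”), or by recovering, at rate γ. In that model the basic reproduction number is R0 = β / (α + γ), where β is the transmission rate. The sketch below assumes that model, ignores background mortality, and uses hypothetical rates chosen only for illustration.

```python
def basic_reproduction_number(beta, virulence, recovery):
    """R0 = beta / (virulence + recovery) in a simple SIR-type model.

    An infected host stops transmitting when it dies (rate `virulence`)
    or recovers (rate `recovery`); background mortality is ignored.
    """
    return beta / (virulence + recovery)


def beta_needed(r0, virulence, recovery):
    """Transmission rate required to hit a target R0 under the same model."""
    return r0 * (virulence + recovery)


RECOVERY = 1 / 7  # hypothetical: about a week of sniffles, as a rate per day

# Hypothetical virulence levels (deaths per infected host per day).
for label, virulence in [("mild", 0.0), ("moderate", 0.1), ("lethal", 0.5)]:
    beta = beta_needed(2, virulence, RECOVERY)
    print(f"{label:>8}: beta needed for R0 = 2 is {beta:.2f} per day")
```

The faster a strain kills, the shorter the infectious period, so the transmission rate it needs just to keep R0 fixed climbs with virulence. That is the “kill fast, so you’d better spread fast” logic in equation form.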

Indeed, my viz below (using the Guardian’s data) shows this rough correlation between virulence and transmissibility. Salmonella doesn’t last long on surfaces, but it lets its infected host live and spread the disease through other means. C. diff and tuberculosis are more lethal, and correspondingly they can survive on surfaces for longer. Norovirus seems like an outlier, but this makes sense: it spreads primarily through surface contact, so its durability on surfaces is unexpectedly high for its lethality. By contrast, bird flu is unexpectedly weak on surfaces, but it spreads primarily through droplets. And Ebola is weak on surfaces because it spreads overwhelmingly through bodily fluids.


But it’s clear that the Guardian’s data are extremely buggy. They are scraped from the web and full of errors: HIV does NOT survive on dry surfaces for seven days. That’s probably seven hours. Same for syphilis.

An even bigger problem is that the Guardian viz seems to refute Darwin. On their graph, deadly diseases seem LESS infectious. What’s going on? First, their x-axis doesn’t make much sense. The reported average rate of infection doesn’t tell us how well a virus might spread under neutral or ideal conditions. Rather, it tells us how people and public health systems respond to outbreaks. HIV transmission, for example, has dropped around the world because people have intervened to cut off disease vectors. The differences in HIV prevalence around the world tell us about education, public health, and culture, but not much about the virus itself. Also, the x-axis should be on a log scale, and the y-axis should be on a logit scale. Plotting the fatality rate on a linear scale builds a non-linearity into the relationship, since fatality is bounded by 0% and 100% and has to flatten out as it approaches either limit.
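To see what the logit scale buys, here is a small Python sketch of the transform, logit(p) = ln(p / (1 − p)). The fatality rates in the loop are hypothetical values picked only to show how points spread out on each scale; they are not the Guardian’s data.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


# Hypothetical case fatality rates, chosen only to show spacing on each scale.
for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(f"fatality {p:>5.0%}  ->  logit {logit(p):+.2f}")

# On a linear axis, 90% -> 99% (9 points) looks smaller than 40% -> 50%
# (10 points); on the logit axis the move near the ceiling is far larger.
near_ceiling = logit(0.99) - logit(0.90)
mid_range = logit(0.50) - logit(0.40)
print(f"logit gap 90%->99%: {near_ceiling:.2f}, 40%->50%: {mid_range:.2f}")
```

The transform is symmetric around 50% and stretches the ends, so a disease going from 90% to 99% fatal no longer gets squashed against the top of the chart. If you plot with matplotlib, `ax.set_xscale("log")` and `ax.set_yscale("logit")` should give both axes directly.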

So the Guardian graph is indeed beautiful. But it also misuses faulty data to refute evolution. Outside of that it’s great. I’m going to take more ibuprofen now.