    The Wikipedia Pages You Read Could Help Predict Disease Outbreaks

    Scientists have found that tracking visits to Wikipedia articles about some diseases can spot when outbreaks are happening.

    Patterns in the Wikipedia pages that people around the world visit could help predict outbreaks of diseases, according to a new paper published today.

    According to the research by a team from Los Alamos National Laboratory, published today in PLOS Computational Biology, monitoring visits to Wikipedia articles related to various diseases could provide a fast and accurate way of detecting when outbreaks are occurring.

    Being able to tell quickly when disease outbreaks are happening is vital if health authorities are to respond to them. Traditionally, this tracking relies on reports of visits to physicians, laboratory tests, and data collected from medical centres across a country. This approach yields accurate data, but it can be expensive and, crucially, slow at a time when a rapid response can be vital.

    Tracking disease outbreaks based on people's internet activity has already had some success, most notably with Google's Flu Trends.

    Spikes in Google search queries related to flu have been shown to closely match actual flu cases, with the added benefit that these spikes can potentially be identified more quickly than through the standard medical methods of collecting data.

    But one downside of this approach is that the data isn't publicly available: Google controls access to it. That is why the team looked for a public, open-source alternative. They settled on Wikipedia's access logs, which record visits to every Wikipedia article in the world and which anybody can download.

    They found that their model was able to accurately track outbreaks of dengue fever in Brazil and Thailand, influenza in the United States, Poland, Japan, and Thailand, and tuberculosis in Thailand and China.


    In total, in eight out of the 14 pairings of disease and country they looked at over the course of three years, Wikipedia visits were an accurate guide to disease outbreaks. That's despite the model, developed largely as a proof of concept, being relatively crude: it looks at only a small selection of articles potentially related to each disease, and because the Wikipedia logs don't record visitors' geographical locations, the researchers had to use the language of the Wikipedia articles as a rough guide to where users were.

    In addition to "nowcasting" the state of diseases (accurately tracking their incidence from day to day), the successful models were also able to forecast with some accuracy how the outbreaks would play out. Much as with Google's Flu Trends, that could be because people are likely to search the internet for information about their symptoms before they go to their doctor.

    However, other disease/country pairings weren't successful, such as cholera in Haiti, Ebola in Uganda and the Democratic Republic of Congo, and HIV/AIDS in China and Japan. The researchers think this might be due to noise in the Wikipedia data, to diseases either changing too slowly or producing too few cases for a pattern to emerge, or to the country having relatively poor internet connectivity (as is the case in Haiti).

    Despite some of the drawbacks of the approach, though, the researchers think the technique shows promise.

    The paper acknowledges some of the problems with using Wikipedia as an early warning system, such as the fact that it tends to "strongly over-represent people and places with good internet access and technology skills". But the fact that the model seems to work for some diseases, and appears to be easily transferable between countries, suggests that a more sophisticated model could be a useful tool in helping health authorities spot outbreaks as early as possible.

    Lead researcher Sara Del Valle said in a statement: "A global disease-forecasting system will change the way we respond to epidemics. In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast."
