
Data from: A machine learning method to monitor China’s AIDS epidemics with data from Baidu Trends


Nan, Yongqing; Gao, Yanyan (2019), Data from: A machine learning method to monitor China’s AIDS epidemics with data from Baidu Trends, Dryad, Dataset,


Background: AIDS victims’ unwillingness to report their disease, due to social discrimination against them, makes it hard for disease control departments to accurately monitor the disease’s dynamics through traditional surveillance tools, such as over-the-counter drug sales and hospital or self-reported data. With the diffusion and adoption of the Internet, the ‘big data’ aggregated from Internet search engines, which reflect users’ concerns about, or the reality of, their health status, provide a new opportunity for AIDS surveillance. This paper uses search engine data to monitor and forecast AIDS in China. Methods: A machine learning method, artificial neural networks (ANNs), is used to forecast AIDS occurrences and deaths. Search trend data related to AIDS from the largest Chinese search engine, Baidu, are collected and selected as the input variables of the ANNs, and officially reported AIDS occurrences and deaths are used as the output variables. Three criteria, the mean absolute percentage error, the root mean squared percentage error, and the index of agreement, are used to test the forecasting performance of the ANN method. Results: Based on monthly time-series data from January 2011 to June 2017, this article finds that, under all three criteria, the ANN method yields satisfactory forecasts of AIDS occurrences and deaths, regardless of changes in the number of search queries. Conclusions: Internet-based data should be adopted as a real-time, cost-effective complement to traditional AIDS surveillance systems.
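The three evaluation criteria named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: this record does not give the formulas, so the sketch assumes the commonly used textbook definitions (with the index of agreement taken in Willmott's original form). The function names and the sample numbers are hypothetical and are not drawn from the dataset.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed standard definition)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmspe(actual, predicted):
    """Root mean squared percentage error, in percent (assumed standard definition)."""
    squared = sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted))
    return 100.0 * math.sqrt(squared / len(actual))


def index_of_agreement(actual, predicted):
    """Index of agreement d, assuming Willmott's original form; 1.0 is a perfect match."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = sum(
        (abs(p - mean_actual) + abs(a - mean_actual)) ** 2
        for a, p in zip(actual, predicted)
    )
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Made-up monthly counts for illustration only; not taken from the dataset.
    reported = [100, 200, 400]
    forecast = [110, 180, 400]
    print("MAPE (%):", round(mape(reported, forecast), 3))
    print("RMSPE (%):", round(rmspe(reported, forecast), 3))
    print("Index of agreement:", round(index_of_agreement(reported, forecast), 4))
```

Lower MAPE and RMSPE values indicate smaller relative forecast errors, while an index of agreement closer to 1 indicates closer agreement between forecast and reported values. Both percentage errors are undefined when a reported value is zero, so months with zero counts would need separate handling.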

Usage Notes