Using Twitter to Predict Heart Attack Deaths

Here’s a question for anyone who is active on Twitter, the popular microblogging website that allows users to share brief messages (called Tweets) with friends and followers: How would you characterize the tone of others in your social network, particularly those who live in your surrounding area?

Based on the Tweets you see in your feed, would you say people in your community are generally happy, depressed, anxious, angry, optimistic?

I ask because, as it turns out, Twitter reveals important information about a community’s psychological and physical health.

In a fascinating paper published earlier this year in the journal Psychological Science, researchers at the University of Pennsylvania analyzed 148 million tweets from 1,347 U.S. counties and discovered that online expressions of negative emotions, such as anger, and disengagement constitute a significant risk factor for death due to atherosclerotic heart disease (AHD), the leading cause of heart attacks and stroke.

Specifically, after controlling for socioeconomic factors (e.g., income and education), the researchers found that tweets expressing anger and other negative emotions generally come from counties that have higher AHD mortality rates. Conversely, tweets reflecting engagement and positive emotions generally come from counties that have lower AHD mortality rates (see Note 1).
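
For readers curious about what "controlling for" a variable means in practice, one common textbook approach is a partial correlation, which asks how strongly two measures are related once the linear influence of a third has been removed. The sketch below is only an illustration of that idea, not the researchers' actual analysis. The four counties, the variable names, and every number in it are invented, and the values are centered scores (deviations from an average) rather than real rates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented, centered scores for four hypothetical counties (NOT study data).
income_deficit = [-3, -1, 1, 3]   # stand-in socioeconomic variable
anger_language = [-2, -2, 0, 4]   # income_deficit plus unrelated noise
ahd_mortality = [-2, -4, 4, 2]    # income_deficit plus different noise

raw_r = pearson(anger_language, ahd_mortality)
controlled_r = partial_correlation(anger_language, ahd_mortality, income_deficit)

print(f"raw correlation: {raw_r:.2f}")
print(f"after controlling for the socioeconomic variable: {controlled_r:.2f}")
```

In this made-up example, angry language and mortality look related (a raw correlation of about 0.65) only because both track the socioeconomic variable, so the association drops to essentially zero once that variable is controlled for. This is the kind of spurious link the control is meant to rule out. The notable thing about the study is that the association between Twitter language and AHD mortality remained after the socioeconomic controls were applied.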

More surprising was that a predictive model of AHD mortality based solely on Twitter language patterns outperformed a model based on traditional risk factors, such as demographic variables (e.g., percentages of Black, Hispanic, married, and female residents), socioeconomic variables (e.g., income and education), and health variables (e.g., incidence of diabetes, obesity, smoking, and hypertension). (See Note 2.)

This means that, at the county level, mortality rates from heart disease could actually be better predicted from language on Twitter than from information about demographics and individuals’ health!

(Of course, AHD mortality was best predicted by a statistical model that included Twitter language patterns as well as traditional risk factors.)

Figure 1 below shows a side-by-side comparison of actual AHD mortality (as reported by the Centers for Disease Control and Prevention; CDC) vs. AHD mortality as predicted by language on Twitter. The similarity between the two is obvious and striking.

Figure 1: Heart disease mortality rates as reported by the CDC (left) vs. heart disease mortality rates predicted by language on Twitter (right).


Image credit: Eichstaedt et al. (2015).

Now at this point you might be wondering, what about the fact that the average Twitter user is much younger than the average person at risk for heart disease?

Of course, this is true, and as the researchers acknowledge, “the people tweeting are not the people dying.”

Furthermore, the researchers make no claim about a cause-effect relationship between tweets and AHD mortality. As you can imagine, it’s highly unlikely that tweets from younger people cause older adults to suffer and die from heart attacks. So rest assured that you’re probably not going to kill anyone the next time you log on to Twitter to vent, although there are other reasons why venting online might not be a good idea.

Rather, the researchers speculate that tweets from younger individuals likely betray important characteristics of their surrounding community – and therefore reflect a sort of shared economic, physical, and psychological environment not already accounted for by traditional risk factors for heart disease.

Essentially, this means your behavior on Twitter provides a window not only into your unique daily hassles and stressors, but also the daily hassles and stressors you share with others in your community. It’s just that others in your community might not share their anger or anxiety about a particular circumstance on social media.

So, the next time you log on to a social network to vent about life, stop for a moment to think about where your negative language comes from and what it reflects. Does it reflect something that you alone are facing at this particular moment? Or does it reflect a more general challenge you share with others in your community?

If misery loves company, as they say, then taking a moment to entertain the possibility that your sour mood stems from experiences shared by many others in your community might just be enough to brighten your day, as well as your friends’ Twitter feeds.

Ultimately, though, the major value of this work is that it brings a new set of twenty-first-century tools to epidemiological research. As the authors conclude:

Traditional approaches for collecting psychosocial data from large representative samples, such as the Behavioral Risk Factor Surveillance System of the CDC and Gallup polls, tend to be expensive, are based on only thousands of people, and are often limited to a minimal, predefined list of psychological constructs. A Twitter-based system to track psychosocial variables is relatively inexpensive and can potentially generate estimates based on 10s of millions of people with much higher resolution in time and space.

They go on to say:

Our approach opens the door to a new generation of psychological informational epidemiology (Eysenbach, 2009; Labarthe, 2010) and could bring researchers closer to understanding the community-level psychological factors that are important for the cardiovascular health of communities.

One can only hope that this new generation of psychological informational epidemiology will eventually lead to improved community-wide efforts to target at-risk populations, educate citizens, and prevent future illness.

 

Notes:

1 Interestingly, use of the word “love” was identified as a risk factor for AHD mortality, as tweets containing “love” were associated with higher AHD mortality rates. After reading through a sample of tweets, the researchers determined that this was likely because most tweets containing the word “love” were statements about loving things, rather than people. Excluding “love” from the analysis reduced the correlation between positive-relationship words and AHD mortality to a non-significant level.

2 The advantage was slight, however, and only barely reached the conventional level of statistical significance.

 

Article Reference:

Eichstaedt, J.C., Schwartz, H.A., Kern, M.L., Park, G., Labarthe, D.R., Merchant, R.M., Jha, S., Agrawal, M., Dziurzynski, L.A., Sap, M., Weeg, C., Larson, E.E., Ungar, L.H., & Seligman, M.E.P. (2015). Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science, 26(2), 159–169. DOI: 10.1177/0956797614557867

 

Brian Kurilla is a psychological scientist with a Ph.D. in cognitive psychology. You can follow Brian on Twitter @briankurilla 

4 thoughts on “Using Twitter to Predict Heart Attack Deaths”

  1. The anger also could reflect financial stress. I wonder if there is any way to obtain the data to ascertain if it was any kind of leading indicator for certain investments or financial markets.

    1. Possibly, and that would likely be something that correlates across individuals within a particular county. However, for the purposes of this study, I don’t think it mattered much what was causing negative vs. positive emotions. As for obtaining the data, the journal Psychological Science has adopted several new policies to improve the integrity and transparency of psychological research, and all of the raw data from this study are available via the Open Science Framework.

  2. Using Twitter as a window into a community’s collective mental state may provide a useful tool in epidemiology and for measuring the effectiveness of public-health interventions. Lyle Ungar, a professor of computer and information science.
