Research using statistical NLP algorithm reveals Twitter users' age, gender & socioeconomic status

Computer Science Research Associates Dr Vasileios Lampos and Dr Nikolaos Aletras have collaborated with researchers from the University of Pennsylvania on a paper about Twitter. The study connects a person's words to age, gender and even socioeconomic status, and goes a step further by linking the online behaviour of more than 5,000 Twitter users to their income bracket.

The team published their results in the journal PLOS ONE.

Some of the results confirmed what was already known: for instance, that a person's words can reveal age and gender, and that these are tied to income. But there were also some surprises: those who earn more tend to express more fear and anger on Twitter, and users perceived as optimists have a lower mean income.

Text from those in lower income brackets includes more swear words, whereas those in higher brackets more frequently discuss politics, corporations and the nonprofit world.

Nikolaos Aletras noted an overall picture that emerged about Twitter use:

'Lower-income users or those of a lower socioeconomic status use Twitter more as a communication means among themselves,' he said.

'High-income people use it more to disseminate news, and they use it more professionally than personally. This work attempts to highlight some of the potential causal factors in these relationships.'

To determine Twitter users' socioeconomic status, the team used the UK's job code system, which sorts occupations into nine classes. Using that hierarchy, the researchers determined the average income for each code, then sought a representative sample from each by analysing 5,191 Twitter users and more than 10 million tweets.
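
As a rough illustration of that labelling step, the sketch below averages income per job code and then assigns each user the mean income of their code. It is only a minimal sketch: the function names, the record layout and all figures are hypothetical placeholders, not values or code from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_income_by_code(salary_records):
    """Average income for each job code from (code, income) pairs."""
    incomes = defaultdict(list)
    for code, income in salary_records:
        incomes[code].append(income)
    return {code: mean(values) for code, values in incomes.items()}


def label_users(user_codes, income_by_code):
    """Give each user the mean income of their job code."""
    return {user: income_by_code[code] for user, code in user_codes.items()}


# Hypothetical placeholder data, NOT figures from the paper.
salary_records = [(1, 40000), (1, 60000), (9, 15000)]
user_codes = {"user_a": 1, "user_b": 9}

income_by_code = mean_income_by_code(salary_records)
user_income = label_users(user_codes, income_by_code)
```

Each user's income label here is a class-level average rather than a personal figure, which matches the article's description of determining average income per code.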

They created a statistical natural language processing algorithm that identified the words people in each job code class use distinctively. Most people tend to use the same or similar words, so the algorithm's job was to 'understand' which words were most predictive for each class.
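
One simple way to picture 'most predictive words per class' is to compare how often a word appears inside a class with how often it appears everywhere else. The sketch below does this with a smoothed log-ratio. This is a stand-in technique chosen for illustration, not the algorithm used in the paper, and the function name, class labels and example texts are all hypothetical.

```python
import math
from collections import Counter


def distinctive_words(texts_by_class, top_n=3, smoothing=1.0):
    """Rank words per class by the log-ratio of their smoothed
    in-class rate to their smoothed rate in all other classes."""
    counts = {
        cls: Counter(word for text in texts for word in text.lower().split())
        for cls, texts in texts_by_class.items()
    }
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)
    vocab_size = len(total)
    grand_total = sum(total.values())

    ranked = {}
    for cls, class_counts in counts.items():
        n_in = sum(class_counts.values())
        n_out = grand_total - n_in
        scores = {}
        for word in total:
            in_rate = (class_counts[word] + smoothing) / (n_in + smoothing * vocab_size)
            out_rate = (total[word] - class_counts[word] + smoothing) / (
                n_out + smoothing * vocab_size
            )
            scores[word] = math.log(in_rate / out_rate)
        # Highest score first; ties broken alphabetically for stable output.
        ranked[cls] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return ranked


# Hypothetical toy texts, NOT data from the study.
example = {
    "class_a": ["the board met the charity", "the board voted"],
    "class_b": ["the game was great", "the game tonight"],
}
top_words = distinctive_words(example)
```

A word such as 'the', which every class uses at a similar rate, scores near zero and drops out of the ranking, while words concentrated in one class rise to the top.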

Posted 30 Sep 15 10:19