The text accumulating on social media is a massive source of rich data. Consider, for example, the many millions of Facebook users who regularly express emotions and attitudes by updating their status and exchanging messages with friends. This data has the potential to substantially advance psychological research. Social media language is an especially good match for psychological study for several reasons. Social media contains an unprecedented amount of publicly available written language, written in natural social settings, that can be accessed retroactively. In addition, users disclose information about themselves at unusually high rates. If researchers can build models from social media data, this information could be leveraged to create a fast, valid and stable method for assessing personality online.
Park et al. set out to build a model that accurately predicts personality from the written language people post on social media. This primary research study drew its participants from users of myPersonality, a third-party application on the Facebook social network.
Using a large sample of over 66,000 Facebook users, each of whom volunteered samples of their language and completed a personality questionnaire, the authors built a predictive model of personality with an open-vocabulary approach to language analysis. Rather than counting words from predefined dictionaries, open-vocabulary methods extract numerous, rich features directly from the language samples, such as single words or topics. The resulting language-based assessments (LBAs) were compared with self-report questionnaires, informant reports and external criteria, such as information from online profiles. This approach predicted personality more accurately than previous language-based models.
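The prediction step described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' pipeline. It assumes a simple regex tokenizer, uses only single-word relative frequencies as open-vocabulary features (no phrases or topics), and fits a ridge regression as an illustrative linear model. The function names (`relative_frequencies`, `solve`, `fit_ridge`, `predict`), the penalty `lam`, and the toy texts and scores are all invented for this example.

```python
import re
from collections import Counter

# Hypothetical sketch: open-vocabulary word features + ridge regression.
# Not the authors' implementation; phrases and topics are omitted.


def relative_frequencies(text):
    """Map each word to its share of all word tokens in `text`."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ridge(texts, scores, lam=1.0):
    """Fit trait score ~ word frequencies with an L2 penalty `lam` (> 0)."""
    feats = [relative_frequencies(t) for t in texts]
    vocab = sorted({w for f in feats for w in f})  # vocabulary comes from the data
    X = [[f.get(w, 0.0) for w in vocab] for f in feats]
    n, p = len(X), len(vocab)
    # Center features and scores so the intercept is not penalized.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(scores) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [y - y_mean for y in scores]
    # Normal equations: (Xc^T Xc + lam * I) w = Xc^T yc
    A = [[sum(r[j] * r[k] for r in Xc) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(r[j] * y for r, y in zip(Xc, yc)) for j in range(p)]
    w = solve(A, rhs)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return vocab, w, intercept


def predict(model, text):
    """Score a new user's text; words unseen in training are ignored."""
    vocab, w, intercept = model
    f = relative_frequencies(text)
    return intercept + sum(wj * f.get(word, 0.0) for word, wj in zip(vocab, w))


# Toy, invented data: status updates paired with self-reported extraversion (1-5).
train_texts = ["party with friends tonight", "love the party crowd",
               "quiet night reading alone", "alone with a book"]
train_scores = [4.8, 4.5, 1.6, 1.4]
model = fit_ridge(train_texts, train_scores, lam=1.0)
print(round(predict(model, "party"), 2), round(predict(model, "alone"), 2))
```

Centering the features and scores keeps the intercept out of the penalty, and `lam` controls how strongly the word weights are shrunk toward zero. Scoring a new user is a single pass over their word frequencies, which is why applying a fitted model is fast; at the scale of tens of thousands of users and a far larger vocabulary, a real implementation would rely on a numerical library rather than this pure-Python solver.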
Language from social media can be used to accurately assess participants' key personality traits. LBAs capture true personality variance, which suggests that social media language can be harnessed to create a valid and reliable measure of personality. In addition, compared with self-report questionnaires, LBAs are fast and cheap and impose little burden on participants. Once the model has been built, applying it to a new user's language data takes only seconds. Using computational techniques for LBA can reveal new layers of psychological richness in language and avoid the biases inherent in self-reports.
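Agreement between an LBA and a self-report measure is commonly summarized as a correlation across participants. The sketch below computes a Pearson correlation between LBA scores and self-reported scores for the same users. The function name `pearson_r` and all the numbers are invented for illustration and are not results from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Sum of cross-products and root sums of squares; the 1/n factors cancel.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Invented example: LBA predictions vs. self-reported scores for four users.
lba_scores = [3.1, 2.4, 4.0, 3.6]
self_report = [3.0, 2.0, 4.5, 3.5]
print(f"convergent validity r = {pearson_r(lba_scores, self_report):.2f}")
```

Such correlations would typically be computed on held-out users whose language was not used to fit the model, so that the value reflects prediction rather than memorization of the training data.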
Language-based assessments offer a cost-effective alternative to questionnaires and self-reports, allowing psychological characteristics to be assessed when other options are impractical. LBAs may also enable new approaches to studying geographic and temporal trends, such as comparing psychological differences between regions and tracking within-person variation over time and across locations. Because LBAs can be generated retroactively, this approach could give researchers, investigators and situational awareness analysts a clearer portrait of the mentality behind online actions.
Language-based assessments using text from social media are capable of representing personality variance.