Assistant professor’s two NSF grants aim to better sort social media content, identify online trolls
Projects totaling more than $737,000 will fund Blackburn's continuing research into bad actors on the internet.
The discussions happening on social media, both healthy and unhealthy, drive a lot of the public discourse and news coverage in our 21st-century world. Some people use platforms such as Facebook, Twitter and Reddit to make positive connections, but others prefer to sow misinformation and hate.
Given the popularity of those platforms and similar ones, which see millions of posts each day, it can be difficult for researchers to wrap their heads around what is being shared and how it affects our opinions on political and social topics.
Assistant Professor Jeremy Blackburn — a faculty member in the Department of Computer Science at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science since 2019 — is devising ways to make online content easier to gather and sort, particularly from emerging social media platforms.
Blackburn recently received a five-year, $517,484 National Science Foundation CAREER Award for his project “Towards a Data-Driven Understanding of Online Sentiment.” The CAREER Award supports early-career faculty who have the potential to serve as academic role models in research and education.
The project includes four objectives:
- Create a multiplatform social media dataset, developing tools to leverage prior experience in large-scale data collection to perform continuous identification and collection of multimedia data from emerging social media platforms.
- Develop data-driven techniques to understand coded language used in social media, both text and images.
- Develop a new system for rating the sentiment of content by comparing pieces rather than looking at them individually.
- Explore user and community-level modeling of online sentiment.
“A big focus is on images,” Blackburn said. “Can we infer the sentiment or the underlying meaning of an image? Images are used almost as much as text on the internet, and it’s hard to figure out what people are talking about if you can’t understand the visual language they’re using.”
Current algorithms classify the sentiment of an image by assessing it and assigning it an independent score, he said. For instance, one tweet may get a 0.4 on a predetermined “happiness scale,” while another one may get a 0.5 — but what does that incremental difference mean for humans?
Instead, by showing two pieces of content and asking which is more positive, Blackburn hopes to get a better gauge of the emotion behind it. Complicating that endeavor, however, is knowing how images become memes among certain subsets of online commenters.
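One standard way to turn such pairwise "which is more positive?" judgments into per-item scores is a Bradley-Terry model. The article does not name the specific model Blackburn's project will use, so the sketch below is only an illustration of the general approach, fit here with a simple majorization-minimization loop:

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """Estimate a positive 'strength' for each item from pairwise outcomes.

    comparisons: list of (winner, loser) pairs, where the winner is the
    item an annotator judged more positive. Returns a dict of scores
    whose ordering reflects relative sentiment.
    """
    items = {x for pair in comparisons for x in pair}
    wins = defaultdict(int)      # total wins per item
    matches = defaultdict(int)   # number of comparisons per unordered pair
    for w, l in comparisons:
        wins[w] += 1
        matches[frozenset((w, l))] += 1

    p = {i: 1.0 for i in items}  # initial strengths
    for _ in range(iterations):
        new_p = {}
        for i in items:
            # MM update: W_i / sum over opponents j of n_ij / (p_i + p_j)
            denom = sum(n / (p[i] + p[j])
                        for pair, n in matches.items() if i in pair
                        for j in pair if j != i)
            new_p[i] = wins[i] / denom if denom else p[i]
        total = sum(new_p.values())
        p = {i: v * len(items) / total for i, v in new_p.items()}  # normalize
    return p

# Toy usage: A judged more positive than B twice, B beat C twice, A beat C once
scores = bradley_terry([("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("A", "C")])
```

Because the model only ever consumes relative judgments, it sidesteps the question of what an absolute 0.4 versus 0.5 "happiness" score means to a human.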
“We’re not interested in just saying what’s in the image — we’re interested in saying how it’s being used,” he said. “We’re taking the adage that ‘a picture is worth 1,000 words’ and treating it as a piece of vocabulary. We have ways that can capture the look of it, but we’re also going to treat it like a word as we do in a language model and place it where it was used.
“For instance, if you tweet a picture, you may also include some words, and if we have enough of those samples, we can now figure out that someone is upset or sad or whatever the underlying meaning is. We can translate it into regular words.”
Although this new technology for monitoring online sentiment could have many uses, in politics and business among other realms, Blackburn has a specific goal in mind.
“We could better understand violent content or hate speech online that is very coded, or we could identify misinformation so that people can’t hide this type of behavior by using just images,” he said. “That’s my personal passion and the reason why I’m developing it.”
Another recently awarded NSF project takes aim at better detecting so-called “troll” accounts that disseminate false information as part of larger influence campaigns on social media.
The two-year, $220,000 grant — a collaboration with Assistant Professor Gianluca Stringhini from Boston University — will collect information about the troll accounts identified by Twitter and Reddit as belonging to disinformation campaigns spearheaded by countries that are U.S. adversaries.
These malicious users are different from “bot” accounts that automatically post the same message in multiple places. They coordinate to interact with one another and take opposing sides of the same argument just to sow discord among those watching.
One example, Blackburn said, is two troll accounts “arguing” about “Black Lives Matter” versus “All Lives Matter” not as a matter of principle but merely to spark drama among other users.
“Over time, the same troll account may take different positions on the same issue, because ultimately they don’t have a particular opinion — they just want to cause trouble,” he said. “They have to convince people to become engaged.”
The data collected for this project will be used to train machine-learning algorithms to identify troll accounts by codifying patterns of interactions that are uncommon in real accounts. Social media platforms then would be able to shut down the trolling without needing someone to moderate every questionable post.
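One interaction pattern the article describes — the same account flipping sides on the same issue over time — can be codified as a simple numeric feature. The function below is a hypothetical sketch (the feature name and post representation are assumptions, not the project's actual design); a real system would combine many such features as input to a trained classifier:

```python
def stance_flip_rate(posts):
    """Fraction of an account's repeat posts on a topic that reverse stance.

    posts: chronologically ordered list of (topic, stance) tuples,
    with stance encoded as +1 or -1. A genuine account arguing a
    consistent position scores near 0; an account that repeatedly
    switches sides on the same topic scores near 1.
    """
    last_stance = {}        # topic -> most recent stance seen
    flips = repeats = 0
    for topic, stance in posts:
        if topic in last_stance:
            repeats += 1
            if stance != last_stance[topic]:
                flips += 1
        last_stance[topic] = stance
    return flips / repeats if repeats else 0.0

# A troll-like account flip-flopping on one issue while posting normally on another
troll = [("blm", +1), ("blm", -1), ("blm", +1), ("tax", -1)]
# A consistent account
genuine = [("blm", +1), ("blm", +1), ("tax", -1), ("tax", -1)]
```

Features like this, computed over the labeled troll accounts released by Twitter and Reddit, are the kind of signal a machine-learning model could use to flag coordinated accounts without a human moderating every post.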