Intel agency awards contract to company that harvests social media text, data

IARPA awards contracts for extracting data from multilingual text documents

IARPA awards a research contract for extracting data from text to Raytheon BBN, which harvests the text of social media postings and other data.

“An opportunity to develop better methods to extract complex semantic information from documents”

The Intelligence Advanced Research Projects Activity (IARPA) has awarded contracts for its Better Extraction from Text Towards Enhanced Retrieval (BETTER) program.

Among those awarded contracts last week were several universities, including Johns Hopkins University, Brown University, and the University of Southern California Information Sciences Institute, as well as Raytheon BBN.

Raytheon has been a valuable asset to the intelligence community for many, many years, and it is a company that has its hands in just about every technology suited for “defense, civil government and cybersecurity solutions.”

Through its multimedia monitoring system (M3S), Raytheon scours the internet to make sense of data in multiple languages, so it can be easily digested by analysts.

“M3S not only harvests the text of social media postings, but also gathers metadata such as geotags, hashtags, user references, URLs, topic threading information and retweet information.”

Raytheon BBN M3S screenshot

“Using this information, the system can accurately reconstruct interactions across networks, setting the stage for social media analytics,” according to the Raytheon website.
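To make the idea of post metadata concrete, here is a minimal Python sketch of pulling hashtags, user references, URLs and a retweet flag out of a single post. It is an illustration only and not Raytheon's implementation: the regular expressions, the `extract_metadata` function name and the "RT @" retweet convention are all assumptions made for this example.

```python
import re

# Hypothetical, simplified patterns. Real platforms apply more involved rules.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def extract_metadata(post: str) -> dict:
    """Return the hashtags, user references and URLs found in one post."""
    urls = URL.findall(post)
    # Strip URLs first so a fragment such as "#section" inside a link
    # is not counted as a hashtag.
    text = URL.sub(" ", post)
    return {
        "hashtags": HASHTAG.findall(text),
        "mentions": MENTION.findall(text),
        "urls": urls,
        # Assumption: a leading "RT @" marks a retweet.
        "is_retweet": text.lstrip().startswith("RT @"),
    }


if __name__ == "__main__":
    print(extract_metadata("RT @alice: big news #IARPA https://example.com/a#frag"))
```

Records like these, collected across many posts, are the kind of raw material that could be used to reconstruct who referenced whom.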

One of Raytheon’s specialties is collecting data from social media platforms such as Facebook, YouTube, and Twitter.

Some of Raytheon’s M3S services include:

  • Automatic multi-lingual data collection and mirroring of user-identified Web sites, broadcast media, and social media (Twitter and Facebook)
  • Automatic extraction and translation of text
  • Search across multi-lingual sites, channels, and posts
  • Visualization tools and automatic topic detection for enhanced analysis
  • Collected media archived for later use
  • Story segmentation of broadcast media
  • Designed for inter-agency data sharing
  • Effective retrieval and triage for human analysts who must deal with overwhelming volumes of continuously accumulating media.

IARPA finds these qualities useful in its mission “to tackle some of the most difficult challenges of the agencies and disciplines in the intelligence community.”

“There’s too much for any one individual person to read or any team of people to read on a given day”

What Raytheon’s M3S does is almost verbatim what IARPA said it was looking for when it held a proposers’ day on March 29, 2018.

The goal of the BETTER program is “to provide a user [analyst] with a system that quickly and accurately extracts complex semantic information, targeted for a specific user, from text. The system then uses this extracted information to discover and triage relevant documents from a large corpus.”

As I see it, IARPA wants to:

  1. Extract data from mountains of text written in multiple languages (sources could be from news media, social media, websites, etc.)
  2. Organize and personalize that data, so it is tailored to the specific needs of the analyst
  3. Recommend other documents to analyze that are relevant to the analyst’s interests

At the proposers’ day, IARPA program manager Dr. John Beieler explained the problem the intelligence community was looking to solve.

“#SpoilerAlert! We’re re-ranking document lists”

John Beieler

“There’s constantly more text that’s being generated. There’s too much for any one individual person to read or probably any team of people to read on a given day.”

“A lot of times this stuff comes across in multiple languages. If you’re trying to look at all the news that was published across the world yesterday — multiple languages — too much.

“Then this concept of finding orthogonal information when you’re trying to discover something, it’s kind of like looking for your keys on a dark night under the streetlight.

“You’re kind of looking in this one place, but really they’re probably somewhere else.”

The solution, according to Dr. Beieler, is an automated process that analyzes a ton of text and makes recommendations on where to look next.

“Hashtag ‘Spoiler Alert!’ we’re re-ranking document lists […] We’re re-ranking documents based on relevancy.”
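What re-ranking by relevancy could look like in its simplest form is sketched below in Python. This is not the BETTER program's method, which the talk does not spell out. It assumes a plain bag-of-words cosine similarity between a short description of the analyst's interest and each document, and the names `tokens`, `cosine` and `rerank` are invented for this example.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(interest: str, documents: list[str]) -> list[str]:
    """Return the documents ordered from most to least similar to the interest."""
    profile = Counter(tokens(interest))
    return sorted(
        documents,
        key=lambda doc: cosine(profile, Counter(tokens(doc))),
        reverse=True,
    )


if __name__ == "__main__":
    docs = [
        "football scores from the weekend",
        "protest erupts in the capital city",
        "weather forecast for tomorrow",
    ]
    for doc in rerank("protest capital", docs):
        print(doc)
```

A system of the kind Beieler describes would presumably score documents on extracted semantic information rather than raw word overlap, and would do so across languages.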

“What we’re really trying to do is smash together information extraction, information retrieval, and active learning”

The BETTER program will look to exploit machine learning technology, and because of this, foreign language expertise will be disallowed.

“How we envision the technology in this program is to provide automated suggestions and provide the most relevant documents — a kind of a recommender system (i.e. ‘you read this, maybe you want to read this’) and then automatically tag information according to that individual analyst’s knowledge,” the BETTER program manager said.

He added, “What we’re really trying to do is smash together information extraction, information retrieval, and active learning to enable this downstream use of extraction and retrieval of fine-grain personalized knowledge.

“We’re doing things across languages and we’re doing things for an individual rather than these kind of overarching ‘one ontology to rule them all’ approaches.”
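As a rough illustration of how retrieval and active learning can be combined for one individual, the Python sketch below keeps a per-analyst set of term weights, updates it from relevant/not relevant feedback, and picks the unlabeled document it is least sure about to ask the analyst about next (uncertainty sampling, one common form of active learning). Everything here is an assumption for illustration: the `AnalystProfile` class, the additive update and the score-nearest-zero rule are not taken from IARPA or any BETTER performer.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class AnalystProfile:
    """Hypothetical per-analyst term weights learned from feedback."""

    def __init__(self) -> None:
        self.weights: Counter = Counter()

    def feedback(self, document: str, relevant: bool, step: float = 1.0) -> None:
        """Raise weights for terms in relevant documents, lower them otherwise."""
        delta = step if relevant else -step
        for term in set(tokens(document)):
            self.weights[term] += delta

    def score(self, document: str) -> float:
        """Sum of learned weights over the distinct terms in the document."""
        return sum(self.weights[term] for term in set(tokens(document)))

    def most_uncertain(self, unlabeled: list[str]) -> str:
        """Pick the document whose score is closest to zero to ask about next."""
        return min(unlabeled, key=lambda doc: abs(self.score(doc)))


if __name__ == "__main__":
    profile = AnalystProfile()
    profile.feedback("election protest in the capital", relevant=True)
    profile.feedback("football match results", relevant=False)
    pool = ["protest near the capital", "weather update", "football results tonight"]
    print(profile.most_uncertain(pool))
```

Because each analyst has their own profile, two people reading the same corpus would see different rankings, which is the opposite of a single shared ontology.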

According to Beieler, IARPA’s BETTER program “gives us an opportunity to develop better methods to extract complex semantic information from documents and then to use this extracted semantic information to do basically document triage.”

Tim Hinchliffe

The Sociable editor Tim Hinchliffe covers tech and society, with perspectives on public and private policies proposed by governments, unelected globalists, think tanks, big tech companies, defense departments, and intelligence agencies. Previously, Tim was a reporter for the Ghanaian Chronicle in West Africa and an editor at Colombia Reports in South America. These days, he is only responsible for articles he writes and publishes in his own name. tim@sociable.co
