Stock performance prediction prototype shows 62% accuracy using NLP, Deep Learning

December 11, 2017

Accurately predicting stock performance involves acquiring highly coveted data, and a new prototype using Natural Language Processing (NLP) and Deep Learning is showing very promising results.

“For investment firms, predicting likely under-performers may be the most valuable prediction of all, allowing them to avoid losses on investments that will not fare well,” writes Patty Ryan, Principal Data and Applied Scientist at Microsoft.

Patty Ryan

Working with a financial services company “to develop a model to predict the future stock market performance of public companies in categories where they invest,” the Microsoft team built its prototype on just one industry, biotechnology, which had the largest within-industry sample.

The project’s goal was to see whether the model could beat the chance accuracy of 33.33%, and the results went well beyond it.

They found 62% accuracy in predicting under-performing companies, almost double the chance rate.
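The arithmetic behind “almost double” is easy to check. A 33.33% chance rate corresponds to guessing uniformly among three equally likely outcomes; that three-class setup is an assumption inferred from the figure, since the article does not spell it out. This minimal Python sketch, with hypothetical helper names, computes the baseline and the ratio of the reported 62% to it.

```python
def chance_accuracy(num_classes: int) -> float:
    """Accuracy expected from uniform random guessing over balanced classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return 1.0 / num_classes


def lift_over_chance(model_accuracy: float, num_classes: int) -> float:
    """Ratio of a model's accuracy to the random-guessing baseline."""
    return model_accuracy / chance_accuracy(num_classes)


if __name__ == "__main__":
    # Assumption: three balanced performance classes.
    baseline = chance_accuracy(3)
    # 62% is the accuracy the article reports for under-performers.
    lift = lift_over_chance(0.62, 3)
    print(f"chance = {baseline:.2%}, lift = {lift:.2f}x")  # chance = 33.33%, lift = 1.86x
```

A lift of about 1.86 is what the article rounds to “almost double.”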

But how did they do it? The answer lies in a handful of industry buzzwords and a whole lot of code.

If you are a developer or are familiar with these tools and concepts, Patty Ryan’s post on the Microsoft Developer blog does a remarkable job of walking you through the entire process, step-by-meticulous-step.

However, I will attempt to summarize and lay out how they did it here.

Technical aspects of the stock performance prediction prototype

Natural Language Processing (NLP), pre-processing, and Deep Learning were used “to prototype a predictive model to render consistent judgments on a company’s future prospects, based on the written textual sections of public earnings releases extracted from 10k releases and actual stock market performance.”

For input, the team gathered “a text corpus of two years of earnings release information for thousands of public companies worldwide.” They “extracted as source the sections 1, 1A, 7 and 7A from each company’s 10k — the business discussion, management overview, and disclosure of risks and market risks.”

Additionally, they “gathered the stock price of each of the companies on the day of the earnings release and the stock price four weeks later,” categorizing the public companies by industry category.
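The two prices per company give a four-week return, which can then be turned into performance classes within an industry. The article does not say how the classes were cut; this sketch assumes three equal-size groups (terciles), which is consistent with the 33.33% chance rate. The class, function, and label names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PriceObservation:
    ticker: str                    # hypothetical identifier for one company
    price_on_release: float        # stock price on the earnings-release day
    price_four_weeks_later: float  # stock price four weeks after the release


def four_week_return(obs: PriceObservation) -> float:
    """Fractional price change over the four weeks after the release."""
    return (obs.price_four_weeks_later - obs.price_on_release) / obs.price_on_release


def tercile_labels(observations: list[PriceObservation]) -> dict[str, str]:
    """Rank one industry's companies by return and split them into three equal-size bins.

    Assumption: equal-size terciles; the article does not describe the cut points.
    """
    ranked = sorted(observations, key=four_week_return)
    n = len(ranked)
    labels = {}
    for rank, obs in enumerate(ranked):
        bucket = rank * 3 // n  # 0, 1 or 2
        labels[obs.ticker] = ("under", "middle", "over")[bucket]
    return labels
```

Ranking within a single industry, rather than across the whole market, matches the article’s point that companies were categorized by industry before modeling.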

The tools used included Python with Azure Machine Learning Workbench, Jupyter Notebook, and NLP tools including the Gensim library.

Executive Producer Dave Mendlen (R) on the set of Decoded with Host John Shewchuk (L).

Machine Learning, according to Microsoft General Manager and Executive Producer of the Decoded Show, Dave Mendlen, is something that happens “on the server side” to “bring the best information forward.”

“Let’s take this technology and enable it to learn on its own,” says Mendlen on Machine Learning, adding, “and put that in the back-end for developers to take use of. If you are building an application, you can use that to do amazing things. They tend to be things that I’ll call back-end things or processing things that the user doesn’t necessarily see directly.”

Overcoming adversity

One difficulty the team ran into was that the “pre-trained word vectors” they used had a limited vocabulary of some 400,000 words, while many industries use specialized terms that rarely appear outside their niche. Even so, the “GloVe pre-trained model of all of Wikipedia’s 2014 data” proved useful, allowing the team to “vectorize” its document set and prepare it for deep learning toolkits.
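GloVe vectors are distributed as plain text, one word per line followed by its vector components. The sketch below shows one way such a file might be parsed, how much of a document’s vocabulary it covers, and how text becomes the padded index sequences an embedding layer typically expects. The toy vectors and helper names are made up for illustration; the team used Gensim, and its exact preprocessing is not described. A drug name is exactly the kind of domain term that can fall outside a general-purpose vocabulary.

```python
import re


def load_glove(lines) -> dict[str, list[float]]:
    """Parse GloVe's text format: each line is a word followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def tokenize(text: str) -> list[str]:
    # The 400,000-word GloVe release is lower-cased, so tokens are lower-cased too.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def vocabulary_coverage(tokens: list[str], vectors: dict) -> float:
    """Share of tokens that have a pre-trained vector."""
    if not tokens:
        return 0.0
    return sum(t in vectors for t in tokens) / len(tokens)


def to_index_sequence(tokens, vocab_index, max_len, pad_index=0, oov_index=1):
    """Map tokens to embedding-row ids, truncating or padding to max_len."""
    ids = [vocab_index.get(t, oov_index) for t in tokens][:max_len]
    return ids + [pad_index] * (max_len - len(ids))


def build_embedding_matrix(vectors, vocab_index, dim):
    """Rows 0 (padding) and 1 (out-of-vocabulary) stay zero; other rows copy the vectors."""
    matrix = [[0.0] * dim for _ in range(len(vocab_index) + 2)]
    for word, row in vocab_index.items():
        matrix[row] = list(vectors[word])
    return matrix


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for a real GloVe file.
    toy_words = ["the", "drug", "trial", "met", "its", "endpoint", "revenue", "rose"]
    vectors = load_glove([f"{w} 0.1 0.2 0.3" for w in toy_words])
    vocab_index = {word: i + 2 for i, word in enumerate(vectors)}
    tokens = tokenize("The drug trial met its endpoint; pembrolizumab revenue rose.")
    print(f"{vocabulary_coverage(tokens, vectors):.3f}")  # 0.889
    print(to_index_sequence(tokens, vocab_index, max_len=12))  # [2, 3, 4, 5, 6, 7, 1, 8, 9, 0, 0, 0]
```

With a real file, the same load_glove call would take an open text file handle, for example one opened with encoding="utf-8".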

After embedding all the documents and data, they were then “able to take advantage of a convolutional neural network (CNN) model to learn the classifications.”
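The core of a convolutional text classifier can be sketched in plain Python. Filters slide over windows of consecutive word vectors, each filter keeps its strongest (ReLU) response across the document, and a dense layer with softmax turns those features into class probabilities. This is an assumed, commonly used single-layer design, not the architecture from the Microsoft post, and it shows only the forward pass with random, untrained weights; training would normally be done in a deep learning toolkit.

```python
import math
import random


def conv_max_over_time(embedded, filters, biases):
    """Apply each filter to every window of consecutive word vectors; keep its max ReLU output."""
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)  # number of consecutive words this filter covers
        best = 0.0  # ReLU: negative responses count as 0
        for start in range(len(embedded) - width + 1):
            window = embedded[start:start + width]
            response = bias + sum(
                w * x
                for w_row, x_row in zip(filt, window)
                for w, x in zip(w_row, x_row)
            )
            best = max(best, response)
        features.append(best)
    return features


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedded, filters, biases, dense_weights, dense_biases):
    """Forward pass: convolution and max-over-time pooling, then a dense layer and softmax."""
    features = conv_max_over_time(embedded, filters, biases)
    logits = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(dense_weights, dense_biases)
    ]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, width, n_filters, n_classes, doc_len = 4, 3, 5, 3, 10

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    embedded = rand_matrix(doc_len, dim)  # stand-in for one document's word vectors
    filters = [rand_matrix(width, dim) for _ in range(n_filters)]
    biases = [0.0] * n_filters
    dense_weights = rand_matrix(n_classes, n_filters)
    dense_biases = [0.0] * n_classes
    # Untrained weights, so these three probabilities carry no meaning yet.
    print(classify(embedded, filters, biases, dense_weights, dense_biases))
```

Max-over-time pooling makes the feature vector length depend only on the number of filters, so documents of different lengths map to the same-sized input for the dense layer.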

More number crunching, embedding, and model training followed, and in the end, the “prototype model results, while modest, suggest there is a useful signal available on future performance classification in at least the biotechnology industry based on the target text from the 10-K.”

The future looks promising

Ryan says that “while the model needs to be improved with more samples, refinements of domain-specific vocabulary, and text augmentation, it suggests that providing this signal as another decision input for investment analyst would improve the efficiency of the firm’s analysis work.”

“Overall, this prototype validated additional investment by our partner in natural language based deep learning to improve efficiency, consistency, and effectiveness of human reviews of textual reports and information.”

So, the project started from a chance accuracy of 33.33%, and NLP, Deep Learning with a convolutional neural network, and a host of developer tools raised that to 62% accuracy in predicting under-performers.

Tim Hinchliffe

The Sociable editor Tim Hinchliffe covers tech and society, with perspectives on public and private policies proposed by governments, unelected globalists, think tanks, big tech companies, defense departments, and intelligence agencies. Previously, Tim was a reporter for the Ghanaian Chronicle in West Africa and an editor at Colombia Reports in South America. These days, he is only responsible for articles he writes and publishes in his own name.
