Technology

Stock performance prediction prototype shows 62% accuracy using NLP, Deep Learning

Accurately predicting stock performance involves acquiring highly coveted data, and a new prototype using Natural Language Processing (NLP) and Deep Learning is showing very promising results.

“For investment firms, predicting likely under-performers may be the most valuable prediction of all, allowing them to avoid losses on investments that will not fare well,” writes Patty Ryan, Principal Data and Applied Scientist at Microsoft.

Partnering with a financial services company “to develop a model to predict the future stock market performance of public companies in categories where they invest,” the team at Microsoft built its prototype on a single industry, biotechnology, which had the most abundant within-industry sample.

The project goal was to discern whether the model could outperform the chance accuracy of 33.33%, the rate expected from guessing among three equally likely outcomes, and the results went well beyond that.

They found 62% accuracy in predicting under-performing companies, almost double the chance accuracy.

But how did they do it? The answer lies among industry buzzwords and a whole lot of code.

If you are a developer or familiar with these tools and concepts, Patty Ryan does a remarkable job of walking readers through the entire process, step by meticulous step, on the Microsoft Developer blog.

However, I will attempt to summarize and lay out how they did it here.

The technical aspects of the stock performance prediction prototype

Natural Language Processing (NLP), pre-processing, and Deep Learning were used “to prototype a predictive model to render consistent judgments on a company’s future prospects, based on the written textual sections of public earnings releases extracted from 10k releases and actual stock market performance.”

For input, the team gathered “a text corpus of two years of earnings release information for thousands of public companies worldwide.” They “extracted as source the sections 1, 1A, 7 and 7A from each company’s 10k,” which cover “the business discussion, management overview, and disclosure of risks and market risks.”

Additionally, they “gathered the stock price of each of the companies on the day of the earnings release and the stock price four weeks later,” categorizing the public companies by industry category.
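To make the labeling step concrete, here is a minimal sketch of how the two prices could be turned into a performance class. It is not the team's code. The function names, the made-up prices, and the idea of splitting companies into three equal-sized groups by rank are all my assumptions, chosen only because three equally likely classes match the 33.33% chance accuracy.

```python
from typing import List

def four_week_return(price_at_release: float, price_four_weeks_later: float) -> float:
    """Fractional price change between the release day and four weeks later."""
    return (price_four_weeks_later - price_at_release) / price_at_release

def label_by_tercile(returns: List[float]) -> List[str]:
    """Rank the returns and split them into three equal-sized classes (assumed scheme)."""
    n = len(returns)
    order = sorted(range(n), key=lambda i: returns[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        if rank < n / 3:
            labels[idx] = "under"
        elif rank < 2 * n / 3:
            labels[idx] = "middle"
        else:
            labels[idx] = "over"
    return labels

# Hypothetical (release day, four weeks later) prices for six made-up companies.
prices = [(10.0, 8.0), (20.0, 21.0), (5.0, 7.5), (40.0, 38.0), (12.0, 12.6), (30.0, 39.0)]
returns = [four_week_return(start, end) for start, end in prices]
labels = label_by_tercile(returns)
print(labels)
```

With classes of equal size, a model that guesses at random is right about one time in three, which is the baseline the prototype set out to beat.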

The tools used included Python with Azure Machine Learning Workbench, Jupyter Notebook, and NLP tools including the Gensim library.

Executive Producer Dave Mendlen (R) on the set of Decoded with Host John Shewchuk (L).

Machine Learning, according to Microsoft General Manager and Executive Producer of the Decoded Show, Dave Mendlen, is something that happens “on the server side” to “bring the best information forward.”

“Let’s take this technology and enable it to learn on its own,” says Mendlen on Machine Learning, adding, “and put that in the back-end for developers to take use of. If you are building an application, you can use that to do amazing things. They tend to be things that I’ll call back-end things or processing things that the user doesn’t necessarily see directly.”

Overcoming adversity

One of the difficulties that arose in the stock performance prediction prototype was that the “pre-trained word vectors” they used as a model had a limited vocabulary of some 400,000 words. Many industries have specific vocabulary that is not used outside their particular niche. However, the “GloVe pre-trained model of all of Wikipedia’s 2014 data” did prove useful in allowing the team to “vectorize” its document set and prepare it for deep learning toolkits.
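The vocabulary problem is easy to picture with a toy example. The sketch below looks up each word of a document in a small dictionary of vectors and records the words it cannot find. The three-dimensional vectors, the `toy_vectors` dictionary, and the `embed_tokens` helper are hypothetical stand-ins I made up for illustration; real pre-trained vectors have far more dimensions and the team's actual pipeline may differ.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a pre-trained word-vector table (values are invented).
toy_vectors: Dict[str, List[float]] = {
    "revenue": [0.2, 0.7, -0.1],
    "risk": [-0.5, 0.1, 0.4],
    "trial": [0.3, -0.2, 0.6],
}

def embed_tokens(
    tokens: List[str], vectors: Dict[str, List[float]]
) -> Tuple[List[List[float]], List[str]]:
    """Return the vectors of known words in order, plus the words with no vector."""
    embedded: List[List[float]] = []
    missing: List[str] = []
    for token in tokens:
        vector = vectors.get(token.lower())
        if vector is None:
            missing.append(token)  # out-of-vocabulary word, e.g. industry jargon
        else:
            embedded.append(vector)
    return embedded, missing

tokens = ["Revenue", "risk", "pharmacokinetics", "trial"]
embedded, missing = embed_tokens(tokens, toy_vectors)
coverage = len(embedded) / len(tokens)
print(missing, coverage)
```

Here the specialist term has no vector and is simply skipped, so whatever signal it carried never reaches the model. That is why a fixed general-purpose vocabulary can be a handicap for niche industries.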

After embedding all the documents and data, they were then “able to take advantage of a convolutional neural network (CNN) model to learn the classifications.”
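For readers wondering what a CNN does with embedded text, the core operation is a filter that slides across consecutive word vectors and keeps its strongest response. The sketch below shows one such filter in plain Python. It is only an illustration of the general technique, not the architecture from the blog post, and the function name, vectors, and filter weights are invented.

```python
from typing import List

def conv1d_max_pool(
    sequence: List[List[float]], kernel: List[List[float]], bias: float = 0.0
) -> float:
    """Slide one filter over consecutive word vectors, apply ReLU, keep the maximum."""
    width = len(kernel)
    responses: List[float] = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for word_vector, kernel_row in zip(window, kernel):
            total += sum(x * w for x, w in zip(word_vector, kernel_row))
        responses.append(max(0.0, total))  # ReLU: negative responses become zero
    # A document shorter than the filter produces no windows, so default to 0.0.
    return max(responses, default=0.0)

# Three made-up two-dimensional word vectors and a filter spanning two words.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]
feature = conv1d_max_pool(sequence, kernel)
print(feature)
```

A real network learns many filters of this kind during training and feeds their pooled outputs into a final layer that picks the performance class.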

More number crunching, embedding, and model training ensued, and in the end, the “prototype model results, while modest, suggest there is a useful signal available on future performance classification in at least the biotechnology industry based on the target text from the 10-K.”

The future looks promising

Ryan says that “while the model needs to be improved with more samples, refinements of domain-specific vocabulary, and text augmentation, it suggests that providing this signal as another decision input for investment analyst would improve the efficiency of the firm’s analysis work.”

“Overall, this prototype validated additional investment by our partner in natural language based deep learning to improve efficiency, consistency, and effectiveness of human reviews of textual reports and information.”

So, against a chance accuracy of 33.33%, the prototype reached 62% accuracy using NLP, Deep Learning, a convolutional neural network, and a host of developer tools.

@TimHinchliffe

Tim Hinchliffe is a veteran journalist whose passions include writing about how technology impacts society and Artificial Intelligence. He prefers writing in-depth, interesting features that people actually want to read. Previously, he worked as a reporter for the Ghanaian Chronicle in West Africa, and Colombia Reports in South America. tim@sociable.co
