March Product Update: Better, Faster Data Extraction

March 29, 2018 Geoffrey Gualano

If you uploaded a bill, receipt, or invoice to Hubdoc in the last few weeks, you might have noticed that the data was accurately extracted from some of those documents in seconds. You may have even asked yourself, “What the heck is going on?!”

In this month’s product update, we’ll show you how we’re improving data extraction from uploaded documents – bringing you one step closer to “real-time” bookkeeping.

Better, faster data extraction from uploaded documents

Accurately extracting data from documents in seconds isn’t new to Hubdoc. We do it on tens of thousands of auto-fetched documents (from our 700+ automated connections) every day! What we’re introducing is better, faster data extraction on some documents that are uploaded to Hubdoc via desktop, mobile device, email, and the Fujitsu ScanSnap scanner.

Before we dive into the how, let’s take a moment to talk about the why.

Why is “better, faster data extraction” so important?

For accountants and bookkeepers, it means that bookkeeping gets done faster, resulting in time saved, improved client experiences, and growth opportunities.

For small businesses, it means access to financial data in real-time, resulting in higher-quality, more contextual decision-making.

It’s clear to see why access to real-time, accurate financial data is so important, but how are we making it happen?

Using machine learning to accurately extract data from uploaded documents, in real-time

If you guessed machine learning, you were right! ;)

We recently launched the latest version of our machine learning-powered data extraction. This version accurately extracts key data from uploaded documents in less than five seconds. It’s currently working on a small percentage of documents, and that percentage will increase dramatically over time.

“Machine learning is an application of artificial intelligence (AI) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.” – Expert System

How does Hubdoc’s machine learning-powered data extraction work?

When a document is uploaded to Hubdoc, it's sent to a neural network (a computer system modeled on the human brain and nervous system) that reads the document and determines what data needs to be extracted. If the network is very confident that it extracted the right data, that data will be available to Hubdoc customers within five seconds.

Accurate data extraction within five seconds! That’s really fast.
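
For the curious, here is a minimal sketch of what a "very confident" check could look like. It is an illustration only, not Hubdoc's actual code: the `Extraction` type, the `route` function, the 0.95 cutoff, and the "review" fallback for less confident results are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the post does not say how confident "very confident" is.
CONFIDENCE_THRESHOLD = 0.95


@dataclass
class Extraction:
    field: str         # e.g. "total" or "date"
    value: str         # the text the model pulled from the document
    confidence: float  # the model's own estimate (0 to 1) that the value is right


def route(extractions, threshold=CONFIDENCE_THRESHOLD):
    """Return "publish" only when every extracted field clears the threshold.

    Anything else falls back to a slower path, called "review" here
    (an assumed name for whatever happens to less confident results).
    """
    if extractions and all(e.confidence >= threshold for e in extractions):
        return "publish"
    return "review"


confident = [Extraction("total", "42.10", 0.99), Extraction("date", "2018-03-29", 0.97)]
unsure = [Extraction("total", "42.10", 0.99), Extraction("date", "2018-03-29", 0.60)]

print(route(confident))
print(route(unsure))
```

Requiring every field to clear the bar, rather than the average, means one shaky value is enough to hold a document back from the fast path.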

[Animation: Hubdoc machine learning-powered data extraction]

Our machine learning-powered data extraction was trained on tens of millions of financial documents. The model analyzed each and every document to learn what data needed to be extracted. If, during this process, the wrong data was extracted from a document, our data scientists mathematically calculated how far off the model was and made adjustments to it. The next time the model attempted to extract data from an uploaded document, it got closer to the right answer. Adjustments are made to the model every time it extracts incorrect data, so it continues to improve over time.
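
The "measure how far off, then adjust" loop described above can be shown with a toy example. This sketch uses gradient descent, a standard way of automating those adjustments, on a model with a single number to learn. It is an assumption-laden illustration rather than Hubdoc's model: the `train` function, the learning rate, and the tiny data set are all made up for this example.

```python
def train(examples, learning_rate=0.1, epochs=200):
    """Fit prediction = weight * x by repeatedly correcting the weight."""
    weight = 0.0  # start with a deliberately bad guess
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x
            error = prediction - target          # how far off the guess was
            weight -= learning_rate * error * x  # nudge the weight to shrink that error
    return weight


# Toy data in which the right answer is always 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight = train(examples)
print(round(weight, 3))
```

Each pass leaves the weight a little closer to the value that makes the errors disappear, which is the same idea as a model improving every time its mistakes are measured.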

When will Hubdoc’s new machine learning-powered data extraction work on every uploaded document?

Exclusively using machines to extract data from uploaded documents – without the use of human quality assurance – is really challenging. Companies that take this approach prioritize speed over accuracy, which can result in poor customer experiences.

Instead, we’re going to slowly roll out this version of machine learning-powered data extraction, until we’re confident that it can accurately extract data from every uploaded document. As the model gets better, it will be applied to more and more documents, improving your Hubdoc experience, and bringing you one step closer to “real-time” bookkeeping.

How do I stay up-to-date or provide feedback on Hubdoc’s new machine learning-powered data extraction?

We’ll use the monthly product update to keep you in the loop on any and all changes to our machine learning-powered data extraction.

New Automated Connections

We added 11 new automated connections in March! Hubdoc can now auto-fetch documents from:

United States

  • Expedia
  • US Foods
  • Vistaprint

Canada

  • Sysco Canada Payment Hub

Australia

  • CommSec Adviser Services

United Kingdom

  • Ecenica
  • Fasthosts
  • Parcelforce
  • CloudConvert
  • NatWest Credit
  • British Gas

Featured this month

Did you know that Hubdoc can collect AR reports? Check the following connections:

  • Chase Paymentech
  • PayPal
  • MindBody

If you don't have a Hubdoc account and you want to see what all the fuss is about, book a demo with a member of our team!

About the Author

Geoffrey Gualano

Geoffrey Gualano is the Director of Marketing at Hubdoc. He has a passion for customers and bringing products to market. An ex-musician and aspiring chef, he spends his free time cooking and listening to music. Follow his cloud bookkeeping journey in 'How I Learned Cloud Bookkeeping'!
