Using machine learning to index text from billions of images

In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence into our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits that users will see from these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can search for English text within images and PDFs using a system we’re describing as automatic image text recognition.
The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous. People have stored more than 20 billion image and PDF files in Dropbox. Of those files, 10-20% are photos of documents—like recepits and whiteboard images—as opposed to documents themselves. These are now candidates for automatic image text recognition. Similarly, 25% of these PDFs are scans of documents that are also candidates for automatic text recognition.
From a computer vision perspective,


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/WMLxjuzY3K8/

Original article

Comments are closed.

Proudly powered by WordPress | Theme: Baskerville 2 by Anders Noren.

Up ↑

%d bloggers like this: