Tesseract OCR 4.0.0 release notes

This page keeps the most up-to-date release notes.

New OCR engine
Added a new OCR engine that uses a neural network system based on LSTMs, with major accuracy gains.
This includes new training tools for the LSTM OCR engine. A new model can be trained from scratch or by fine tuning an existing model.
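As a rough illustration of the fine-tuning path, the sketch below assembles an argument list for the lstmtraining tool that continues from an existing model. The file names (eng.lstm, eng.traineddata, eng.training_files.txt, out/finetuned) and the iteration count are placeholders chosen for this example, not values from the release notes, and the helper function fine_tune_command is hypothetical.

```python
def fine_tune_command(base_lstm, traineddata, train_list, output_prefix,
                      max_iterations=400):
    """Build an lstmtraining argument list that continues from an existing model.

    All paths are placeholders; the base .lstm file is assumed to have been
    extracted from an existing traineddata file beforehand.
    """
    return [
        "lstmtraining",
        "--continue_from", base_lstm,        # existing model to fine tune
        "--traineddata", traineddata,        # traineddata the model came from
        "--train_listfile", train_list,      # text file listing training files
        "--model_output", output_prefix,     # prefix for checkpoints
        "--max_iterations", str(max_iterations),
    ]


cmd = fine_tune_command(
    "eng.lstm", "eng.traineddata", "eng.training_files.txt", "out/finetuned"
)
print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps it safe to pass to subprocess.run without shell quoting. Training from scratch would use a different set of options and is not shown here.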
Added trained data that includes LSTM models for 123 languages.
Added optional accelerated code paths for the LSTM recognizer:
Using OpenMP
Using SIMD: AVX2 / AVX / SSE4.1
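When the OpenMP code path is compiled in, the number of threads can usually be capped with the standard OpenMP environment variable OMP_THREAD_LIMIT. The sketch below prepares such an environment for a child process; it assumes a tesseract binary built with OpenMP, and the helper name tesseract_env is hypothetical.

```python
import os


def tesseract_env(thread_limit, base_env=None):
    """Return a copy of the environment with OMP_THREAD_LIMIT set.

    OMP_THREAD_LIMIT is a standard OpenMP variable; it only has an effect
    if the binary being launched was built with OpenMP support.
    """
    if thread_limit < 1:
        raise ValueError("thread_limit must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = str(thread_limit)
    return env


env = tesseract_env(1, base_env={"PATH": "/usr/bin"})
# Pass it along when launching the binary, for example:
# subprocess.run(["tesseract", "page.png", "out"], env=env, check=True)
```

Limiting the recognizer to one thread can be useful when many pages are processed in parallel by separate processes, since the processes would otherwise compete for the same cores.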

Added a new parameter, lstm_choice_mode, that allows alternative symbol choices to be included in the hOCR output.
The new LSTM engine does not yet support all features of the legacy engine (see missing features).
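A minimal sketch of how lstm_choice_mode might be passed on the command line: parameters are set with -c NAME=VALUE, and the hocr config requests hOCR output. The value 1 used here is an assumption for illustration (these notes do not list the accepted values), as are the file names and the helper hocr_command.

```python
def hocr_command(image, output_base, lang="eng", choice_mode=1):
    """Build a tesseract argument list that writes hOCR with alternative choices.

    choice_mode=1 is an illustrative value; check the parameter's help text
    in your build for the values it accepts.
    """
    return [
        "tesseract", image, output_base,
        "-l", lang,
        "--oem", "1",                              # LSTM engine only
        "-c", f"lstm_choice_mode={choice_mode}",   # enable alternative choices
        "hocr",                                    # config that selects hOCR output
    ]


cmd = hocr_command("page.png", "page")
print(" ".join(cmd))
```

With these arguments the output would be written next to the given output base with an .hocr extension, so the alternatives can be inspected with any HTML or XML parser.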

Other OCR engines
The pattern matching OCR engine that was the primary OCR engine in previous versions is still available in this version.
Removed the ‘Cube’ OCR engine from the codebase. It was used for Hindi and for Arabic. The new LSTM engine performs much better.
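The choice between the legacy engine and the LSTM engine is made at run time with the --oem option. The sketch below maps readable names to the numeric modes; the names in ENGINE_MODES and the helper ocr_command are hypothetical, and the legacy mode assumes a traineddata file that still contains the legacy models.

```python
# Numeric values accepted by the tesseract --oem option.
ENGINE_MODES = {
    "legacy": 0,    # legacy pattern matching engine only
    "lstm": 1,      # LSTM engine only
    "combined": 2,  # legacy and LSTM together
    "default": 3,   # whatever the traineddata provides
}


def ocr_command(image, output_base, engine="lstm", lang="eng"):
    """Build a tesseract argument list for the chosen engine mode."""
    if engine not in ENGINE_MODES:
        raise ValueError(f"unknown engine {engine!r}")
    return [
        "tesseract", image, output_base,
        "-l", lang,
        "--oem", str(ENGINE_MODES[engine]),
    ]


legacy_cmd = ocr_command("page.png", "page_legacy", engine="legacy")
lstm_cmd = ocr_command("page.png", "page_lstm", engine="lstm")
```

Running both commands on the same page is a simple way to compare the two engines on your own material.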


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/JoBPaqsTWZk/ReleaseNotes
