Ask HN: Open source OCR library?


As others pointed out, Tesseract with OpenCV (for identifying and cropping the text region) is quite effective. On top of that, Tesseract is fully trainable with custom fonts.

In our use case, we’ve mostly had to deal with handwritten text, and that’s where none of these tools really did well. Your next best bet would be to use HOG (Histogram of Oriented Gradients) features along with SVMs. OpenCV has really good implementations of both.
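To make the HOG idea concrete, here is a minimal pure-Python sketch of the core step: binning gradient directions, weighted by gradient magnitude, over one small grayscale cell. It is a simplification and not OpenCV's implementation; the function name `hog_cell` and the 9-bin layout are assumptions for illustration, and a full descriptor would also add block normalization before the vectors go to a classifier such as an SVM.

```python
import math

def hog_cell(cell, bins=9):
    """Unsigned (0-180 degree) orientation histogram for one grayscale cell."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Skip the border so every pixel has both neighbours for central differences.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # "% bins" guards against an angle that rounds to exactly 180.0.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical edge: dark on the left, bright on the right.
cell = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
hist = hog_cell(cell)
strongest_bin = hist.index(max(hist))
```

For this cell all of the gradient energy lands in the first bin (horizontal gradient direction), which is the kind of signal that separates a vertical stroke from a horizontal one.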

Even then, we’ve had to write extra heuristics to disambiguate between lookalike characters such as 2 and z, or s and 5. That was too much work and a lot of if-else. We’re currently putting our efforts into CNNs (Convolutional Neural Networks). As a start, you can look at Torch or Caffe.
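As an illustration of the kind of heuristic this involves (not the rules we actually wrote), one simple approach is to let the unambiguous characters in a token vote on whether the ambiguous ones should be digits or letters. The lookup tables and the function name `disambiguate` below are hypothetical.

```python
# Hypothetical lookalike tables built from the pairs mentioned above.
TO_DIGIT = {"z": "2", "Z": "2", "s": "5", "S": "5"}
TO_LETTER = {"2": "z", "5": "s"}

def disambiguate(token):
    """Resolve lookalike characters by majority vote of the clear characters."""
    ambiguous = set(TO_DIGIT) | set(TO_LETTER)
    clear = [c for c in token if c not in ambiguous]
    digits = sum(c.isdigit() for c in clear)
    letters = sum(c.isalpha() for c in clear)
    if digits > letters:
        table = TO_DIGIT
    elif letters > digits:
        table = TO_LETTER
    else:
        return token  # no evidence either way, so leave the token alone
    return "".join(table.get(c, c) for c in token)

fixed_number = disambiguate("1s0")
fixed_word = disambiguate("pi2za")
untouched = disambiguate("s5")
```

Every new lookalike pair needs another table entry and more special cases, which is how this turns into a pile of if-else.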



Tesseract is ok, but I gather that a lot of the good work in the last few years on it has remained closed source within Google.

If you want to do text extraction, look at things like Stroke Width Transform to extract regions of text before passing them to Tesseract.



I’ve used Tesseract to great effect. I don’t know what your images look like, but if only part of the image has text in it, you should send only that part to the OCR engine. If you send the entire image and only a portion of it has text, the chances of the OCR engine extracting that text are slim. There are pre-processing techniques [1] you can use to crop out the part of the image that has text.

[1]: https://en.wikipedia.org/?title=Hough_transform
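A rough sketch of the cropping step, in pure Python: it finds the bounding box of all dark pixels and cuts the image down to it. This is a much simpler technique than the Hough transform linked above, and it assumes dark text on a clean light background; the function names and the threshold of 128 are illustrative assumptions.

```python
def text_bounding_box(image, threshold=128):
    """Return (top, left, bottom, right) around pixels darker than threshold, or None."""
    rows = [y for y, row in enumerate(image) if any(p < threshold for p in row)]
    cols = [x for x in range(len(image[0])) if any(row[x] < threshold for row in image)]
    if not rows:
        return None  # nothing dark enough to count as text
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

def crop(image, box):
    """Cut a list-of-rows grayscale image down to the given box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

# A 5x6 white image with three dark "ink" pixels.
image = [[255] * 6 for _ in range(5)]
image[1][2] = image[1][3] = image[2][2] = 0

box = text_bounding_box(image)
cropped = crop(image, box)
```

Only the cropped region would then be handed to the OCR engine. Noisy photos would need something sturdier than a single global threshold.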



Tesseract does no layout analysis.

So if the source image contains text columns or pull quotes or similar, the output text will just be each row of text, from the far left to the far right.
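The effect is easy to reproduce with a toy example. The sketch below uses made-up word positions `(x, y, text)` for a two-column page and compares plain row-by-row ordering with ordering that knows where the column gutter is. The `gutter_x` value is assumed to be known here; finding it is the hard part of real layout analysis.

```python
# Hypothetical word boxes for a two-column page: (x, y, text).
words = [
    (0, 0, "Left"), (10, 0, "one"), (100, 0, "Right"), (110, 0, "one"),
    (0, 1, "left"), (10, 1, "two"), (100, 1, "right"), (110, 1, "two"),
]

def row_order(words):
    """Read top to bottom, far left to far right, ignoring columns."""
    ordered = sorted(words, key=lambda w: (w[1], w[0]))
    return " ".join(text for _, _, text in ordered)

def column_order(words, gutter_x):
    """Read the whole left column first, then the whole right column."""
    left = [w for w in words if w[0] < gutter_x]
    right = [w for w in words if w[0] >= gutter_x]
    return row_order(left) + " " + row_order(right)

interleaved = row_order(words)
by_column = column_order(words, gutter_x=50)
```

Here `interleaved` mixes the two columns line by line, while `by_column` keeps each column's text together.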



Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/FHpfbOnXyW4/item

Original article

Now They’ve Gone And Stuck Android Onto A Graphing Calculator

Today in our ongoing series of people putting one thing into another thing, we present Android running on a Texas Instruments TI-Nspire CX, a robust graphing calculator popular with the pre-calc set. The calculator has about 100MB of storage and 64MB of RAM but has enough power to run Android 1.6 aka Donut. Obviously you’re not going to make very many calls using your graphing calculator…


Original URL: http://feedproxy.google.com/~r/Techcrunch/~3/5rM77Do2Dto/

Original article

CellBreaker Finds Breaches in Your Contract to Help You Switch Carriers

Once you sign a two year contract with a cell carrier, you’re stuck with them, right? Not necessarily. If the carrier violates your contract with significant changes to your service, you may be able to break out early and switch. CellBreaker helps you find out if that’s happened to you.


Original URL: http://feeds.gawker.com/~r/lifehacker/full/~3/h0VSf4eIiWE/cellbreaker-finds-breaches-in-your-contract-to-help-you-1713844864

Original article

Dropbox could be king of the one-page app

In 2013 when we were getting our browser-based outliner ready, Les Orchard, a longtime reader of this blog, and contributor to our community (he wrote the initial S3 glue for Frontier, a huge gift), suggested we look at using Dropbox as our storage system.

I was already a serious Dropbox user, and loved how it virtualized my file system. Using Dropbox meant I could go anywhere, with a laptop, and have access to my full work environment. This was part of the dream of using networks since I started using them in the 70s. Dropbox was a big piece of the puzzle.

But Les had shown me how Dropbox could be even more.

Fargo, my Dropbox-based writing environment

We hooked our outliner up to their file system, and shipped it. That’s Fargo.

I’m using Fargo to write this. Scripting News, my blog, is a Fargo site.

Later, I put a content management system in Fargo, so you could now publish a website without any extra server software. It still amazes me that this experiment worked.

Now I read articles that Dropbox is facing increased competition from Microsoft and Google. They need something extra, something different from the Office suites both companies offer. Imho it should be of the web, using the most modern approach to development, the single-page JavaScript app.

Developers, developers, developers

I think independent developers have the key to giving them a competitive edge.

There’s a universe of possible one-page apps and a vast sea of developer creativity to tap into. They just have to help create the market, a little more than they already have.


Original URL: http://scripting.com/2015/06/25/dropboxCouldBeKingOfTheOnepageApp.html

Original article

LittleBits Raises Big $44.2 Million Round

LittleBits, a 3.5-year-old, New York-based maker of electronic components that both children and designers can snap together to create everything from toy robots to lightweight industrial products, has raised $44.2 million in Series B funding led by DFJ Growth, with managing director Barry Schuler joining the board.
Morgan Stanley, Alternative Investment Partners, Grishin Robotics and Wamda…


Original URL: http://feedproxy.google.com/~r/Techcrunch/~3/oRG64OC5JYc/

Original article

Warner Bros. Halts Sales of AAA Batman PC Game Over Technical Problems

An anonymous reader writes: The Batman: Arkham series of video games has been quite popular over the past several years. But when the most recent iteration, Batman: Arkham Knight, was released a couple of days ago, users who bought the PC version of the game found it suffered from crippling performance issues. Now, publisher Warner Bros. has made an official statement in the community forums saying it is suspending sales of the PC version until the quality issues can be sorted out. Gamers and journalists are using the episode as a rallying point to encourage people to stop preordering games, since preordering rewards studios for releasing broken content.



Original URL: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/8FVKZa-XpUw/warner-bros-halts-sales-of-aaa-batman-pc-game-over-technical-problems

Original article
