An exclusive gaming industry community targeted at, and designed for, professionals, businesses and students in gaming, new media and the Web, and in the business and industry closely related to them. A rich, content-driven service including articles, contributed discussion, news, reviews, networking, downloads and debate.
We strive to cater to cultural influencers, technology decision makers, early adopters and business leaders in the gaming industry.
A place to share or contribute your ideas, experiences, questions and points of view, or to network with colleagues here at iVirtua Community.
OCR (Optical Character Recognition) can really come in handy. For example, I previously wrote about how I use Timesnapper as a black box to recover work which would otherwise be lost. Since most of my work is text based (C#, SQL, HTML, documentation, communications, etc.), the obvious next step is to grab the code from a screenshot. Of course I can retype it, but OCR would be better.
There are some great commercial OCR packages out there. My company recently used OmniPage Pro in a project which loaded data from hundreds of PowerPoint slides into SQL Server for reporting and analysis[1]. OmniPage is great software, but it costs $149 for the basic version, which doesn't really make sense if you're just using it to avoid retyping a little text from a screenshot every now and then.
I looked around for free OCR software, and was a little bit surprised that there wasn't much out there. Here's a rundown of what I found, wrapping up with a program that wasn't technically free, but I already had it. There's a good chance you've got it, too.
GOCR
I first tried out GOCR (a.k.a. JOCR). The easiest way to try it out is the GOCR Win Frontend, which installs GOCR as well. My opinion matched Pitor's:
To let things be clear - gocr is not ready, to say the least. Personally I'd even say the effect of trying to OCR a page is so crappy it is not even worth installing the gocr engine (seems like the total rewrite in 0.40 did not help much). And I am talking about an ASCII black text on a white page, without other elements. Gocr seems to go all the way down here - error in 98% of recognized characters, randomly added spaces, etc. For example: "content" is "C unrir" in gocr, sounds like drunken elvish to me.
It is only configured to build under MSVC++6 for Windows.
It only accepts uncompressed bitonal TIFFs.
It's command-line only.
No GUI.
It performed abysmally on the provided testimage.tif.
But it did build.
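The "error in 98% of recognized characters" figure quoted above is essentially a character error rate. One common way to compute such a rate, sketched here in Python purely as an illustration (not necessarily how that figure was measured), is the Levenshtein edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. The function names `levenshtein` and `character_error_rate` are hypothetical helpers, not part of GOCR.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, recognized):
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, recognized) / len(reference)
```

For the quoted example, `character_error_rate("content", "C unrir")` comes out at 1.0: the only character the two strings share is the "n", and lining it up costs as much as substituting all seven characters. Because insertions count as errors, the rate can exceed 1.0 when the OCR output adds many spurious characters, such as randomly added spaces.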
Microsoft Office Document Imaging
By accident, I stumbled across Microsoft Office Document Imaging. It's included with Microsoft Office, under Microsoft Office Tools ("Microsoft Office \ Microsoft Office Tools" folder in the Start menu; the default installation location is "C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\"). The interface looks like a "My First VB5 Application" reject, but it works great.
It handles scanned documents via TWAIN. The image import's a bit lame: it only handles TIF files. You can convert to TIF in just about any graphics application (e.g. in MSPAINT, open the file and Save As a TIF file). An easier method is to just copy the image to the clipboard and paste it as a new page into MODI.
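GOCR only accepts uncompressed bitonal TIFFs, and MODI only imports TIF files. To make "uncompressed bitonal TIFF" concrete, here is a minimal, standard-library-only Python sketch that thresholds grayscale pixels to black and white and writes them as a single-strip, uncompressed, 1-bit baseline TIFF. The helper names (`binarize`, `pack_rows`, `bitonal_tiff`), the fixed threshold of 128 and the 300 dpi resolution are illustrative assumptions, and whether GOCR or MODI accept this exact output is untested here; saving as TIF from a graphics program remains the simpler route.

```python
import struct


def binarize(gray_rows, threshold=128):
    """Map 8-bit grayscale rows (0 = black, 255 = white) to bits: 1 = ink, 0 = paper.
    A single global threshold is an assumption; uneven scans may need adaptive methods."""
    return [[1 if v < threshold else 0 for v in row] for row in gray_rows]


def pack_rows(bits_rows, width):
    """Pack each row most-significant-bit first; every row is padded to a whole byte."""
    out = bytearray()
    for row in bits_rows:
        for start in range(0, width, 8):
            byte = 0
            for i, bit in enumerate(row[start:start + 8]):
                if bit:
                    byte |= 0x80 >> i
            out.append(byte)
    return bytes(out)


def bitonal_tiff(bits_rows, dpi=300):
    """Return an uncompressed, 1-bit-per-pixel, little-endian baseline TIFF as bytes."""
    height = len(bits_rows)
    width = len(bits_rows[0])
    pixels = pack_rows(bits_rows, width)

    n_entries = 11
    ifd_offset = 8                                      # IFD follows the 8-byte header
    xres_offset = ifd_offset + 2 + n_entries * 12 + 4   # rationals follow the IFD
    yres_offset = xres_offset + 8
    data_offset = yres_offset + 8                       # pixel data comes last

    def short_tag(tag, value):   # TIFF type 3 = SHORT, value left-justified in 4 bytes
        return struct.pack("<HHIH2x", tag, 3, 1, value)

    def long_tag(tag, value):    # TIFF type 4 = LONG
        return struct.pack("<HHII", tag, 4, 1, value)

    def rational_tag(tag, offset):  # TIFF type 5 = RATIONAL, stored at offset
        return struct.pack("<HHII", tag, 5, 1, offset)

    entries = [  # tags must appear in ascending order
        long_tag(256, width),            # ImageWidth
        long_tag(257, height),           # ImageLength
        short_tag(258, 1),               # BitsPerSample: bitonal
        short_tag(259, 1),               # Compression: 1 = none
        short_tag(262, 0),               # PhotometricInterpretation: WhiteIsZero (1 = black)
        long_tag(273, data_offset),      # StripOffsets: one strip holds the whole image
        long_tag(278, height),           # RowsPerStrip
        long_tag(279, len(pixels)),      # StripByteCounts
        rational_tag(282, xres_offset),  # XResolution
        rational_tag(283, yres_offset),  # YResolution
        short_tag(296, 2),               # ResolutionUnit: 2 = inch
    ]
    header = b"II" + struct.pack("<HI", 42, ifd_offset)  # "II" = little-endian
    ifd = struct.pack("<H", n_entries) + b"".join(entries) + struct.pack("<I", 0)
    resolution = struct.pack("<II", dpi, 1) * 2  # X then Y resolution, both dpi/1
    return header + ifd + resolution + pixels
```

To save a page, something like `open("page.tif", "wb").write(bitonal_tiff(binarize(gray_rows)))` would do, where `gray_rows` is a hypothetical list of rows of 0-255 values taken from a screenshot.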