SimpleOCR Info






Do you dread having to retype that document you are holding in your hand?  If only you had the electronic file, your life would be so much easier.   With SimpleOCR, you could easily and accurately convert that paper document into editable electronic text for use in any application including Word and WordPerfect. 

Not only is SimpleOCR up to 99% accurate, it is 100% free.

Download SimpleOCR now or learn more about its features and functions.

Accuracy

With optical character recognition up to 99% accurate, there is no better OCR application for the price.  This accuracy greatly reduces the need for post-recognition proofreading and correction.  And after all, isn't that why you want to OCR the document in the first place?  Of course it is!

Features      

Huge Dictionary - With more than 120,000 words, it is unlikely that SimpleOCR will run into a word it does not know.  In the rare event that it does, our improved text editor allows you to easily add the new word to the dictionary.  By adding new words to the dictionary, SimpleOCR becomes better with every use.

Despeckle - For documents that are not particularly clear (e.g. faxes or copies of copies), SimpleOCR provides a despeckle or "noisy document" option which increases SimpleOCR's accuracy.

Format Retention - SimpleOCR can keep certain elements of the document's format in the recognized document.  From varying font sizes to font formatting elements such as underline, italic, and bold, SimpleOCR recognizes it all.  For certain documents, it retains the original document's format with up to 99% accuracy.      

Image Retention - Along with the document's text, SimpleOCR has the uncanny ability to capture and retain pictures from the document.  This is a great feature which reduces the need to import images from a document by other means.       

Plain Text Extraction - Just need the plain text from the original document?  No problem.  SimpleOCR can be set to recognize the characters and words but ignore the formatting.  The resulting file is ready for your word processor or your HTML/web editor and your own custom formatting.      

Simplified Error Correction - Our text editor highlights suspected errors in the recognized text for easier correction.  This simplifies the otherwise time-consuming task of proofreading the recognized text for errors.  But because SimpleOCR has up to 99% accuracy, you may never need this feature.

Batch OCR - Do you have several documents to OCR?  Just point SimpleOCR to them and it will OCR them from start to finish without delay.      

Zone OCR - Sometimes all you need is to extract the text from a certain area in a document.  Maybe one column.  Maybe a footnote.  Maybe just one paragraph.  Unlike other OCR applications, SimpleOCR can limit its OCR to a user-defined area.  There is no need to OCR an entire document only to use a small portion of it.  With SimpleOCR, OCR only what you need.

Input Formats - SimpleOCR works with all fully compliant TWAIN scanners and also accepts input from TIFF files.

Output Formats - SimpleOCR can save the documents it acquires in text formats (TXT and RTF) that can be imported into almost any program, such as Word, WordPerfect, HTML editors, and e-mail programs, either fully formatted or as plain text.  Additionally, it can save scanned documents in the industry-standard TIFF format, a format as widely accepted as PDF.

Multiple Language Recognition - SimpleOCR currently supports English and French recognition.  We are in the process of adding recognition for additional languages.

System Requirements

SimpleOCR works on any version of Windows, from Windows 95 through 2008 and beyond.  Your scanner needs only a TWAIN driver, the driver that comes with the majority of scanners sold.  In short, SimpleOCR will most likely work with the PC and scanner you already have.

Pricing

Our software is free for all non-commercial purposes. It may be re-distributed freely, but only in its original, unaltered form.




The SimpleOCR SDK is a fast, lightweight OCR engine designed to let developers add basic OCR functions to an application with minimal cost and none of the drawbacks of open source solutions.

SimpleOCR is implemented as two C++ DLLs with a total file size under 1 megabyte, making it perfect for mobile OCR applications, shareware and freeware applications, or any solution where the 100-500 megabyte footprint of modern OCR engines is impractical. Wrapper DLLs and sample code for easy ActiveX and .NET integration are also provided.

The SimpleOCR SDK contains several groups of functions, including image manipulation, image I/O with TIFF files, image acquisition with TWAIN-compliant scanners, and of course, OCR. Note that the SimpleOCR SDK can read most bi-level and grayscale TIFF images, and it creates TIFF files containing bi-level (i.e. black & white) images using the CCITT Group IV compression scheme.

SimpleOCR SDK Documentation

Latest Features in SDK Version 3.5

Version 3.5 adds many important new features to the SimpleOCR engine.  If you purchased SimpleOCR prior to 2007 please contact us to receive an update.
  • Template matching in OCR engine
      Extract only the characters you want from a region, ignoring the rest
     
  • Ability to limit the character set used in OCR
      Improve accuracy by limiting results to certain letters or numbers
     
  • New auto-rotate function finds correct image orientation with high accuracy
     
  • Improved noise removal for shaded regions and speckles on images
     
  • Greatly improved error handling
     
  • Comes with example projects in C++, Visual Basic, ASP, VB.NET
     
  • Returns coordinates of recognized words and images
      Allows mapping of text to original image, typically for correction of hidden text PDF

SDK Pricing and Licensing

Frequently Asked Questions




SimpleOCR doesn't recognize my scanner
SimpleOCR scans my document, but I don't see it

SimpleOCR can't load my TIFF file
My software can't read the images from SimpleOCR
SimpleOCR crashes during the OCR process
How can I add a dictionary for German, Spanish, or Italian?
The OCR results are poor
I have a "Too many shapes" error
May I use SimpleOCR to process screen captures?

SimpleOCR doesn't recognize my scanner.

Your scanner must be a TWAIN-compliant scanner that can acquire a black and white or grayscale image. A common problem is an old 16-bit driver that can't communicate with 32-bit applications like SimpleOCR. Try to get an updated driver by downloading it from your scanner manufacturer's website.

If it still doesn't work, you can always scan from your scanning software, save the resulting image in a black and white TIFF file, and process the file with SimpleOCR.

SimpleOCR scans my document, but I don't see it.

SimpleOCR handles only bi-level (black & white) and grayscale images. Please don't scan in color mode.

SimpleOCR can't load my TIFF file.

SimpleOCR handles only bi-level (black & white) and grayscale images. It can't read color TIFF documents. Convert the file into a bi-level or grayscale format and then load it into SimpleOCR.
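Most image editors can do this conversion for you. If you are curious what it involves, here is a minimal Python sketch. It is an illustration only, not SimpleOCR code: pixels are modeled as rows of (R, G, B) tuples, the gray value uses the common Rec. 601 luma weights, and the threshold of 128 is an assumed default that you would tune for your own scans.

```python
# Illustrative only: pixels are rows of (R, G, B) tuples with values 0-255.

def to_grayscale(rgb_rows):
    """Convert RGB pixels to 8-bit gray using the Rec. 601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def to_bilevel(gray_rows, threshold=128):
    """Map gray values to 0 (black) or 1 (white).

    The threshold is an assumed default, not a SimpleOCR setting.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in gray_rows]


# A tiny 2x2 "color scan": white, black, dark red, pale yellow.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 30, 30), (250, 250, 210)],
]
gray = to_grayscale(image)
bilevel = to_bilevel(gray)
print(gray)
print(bilevel)
```

A lower threshold turns fewer pixels black (a lighter image), and a higher one turns more pixels black (a darker image).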

My software can't read the images from SimpleOCR.

SimpleOCR TIFF files use a CCITT Group IV (a.k.a. ITU T-6) compression scheme. Some software applications are not able to decode TIFF files compressed this way.

SimpleOCR crashes during the OCR process.

Most crash bugs have been fixed in version 3.0 of SimpleOCR.  Please download the new version to eliminate these errors and greatly improve your SimpleOCR experience!

First of all, please be sure you have the latest version of SimpleOCR and restart your computer. Then try the following.

Load the SimpleOCR.tif sample file shipped with SimpleOCR and try to OCR it. If it crashes, please uncheck the "extract images" option and retry.  This option doesn't work on old versions of Windows 95.

Look at the SimpleOCR status bar once your document is displayed. You should see the image size (something like 1728x2200) and the horizontal and vertical image resolution in dots per inch (DPI). If you see strange values for the resolution (like 87687689x76876987 DPI), it means that your scanning software doesn't fill in the resolution fields properly, which can make SimpleOCR crash. You can report the problem to your scanner manufacturer. Unfortunately, you can't modify the resolution fields by hand from SimpleOCR. Save the document in a TIFF file and then use a TIFF file editor to change the resolution fields.

If it works with the sample file but not on your document, please try to select the text areas by hand using the "Create Area" tool.

In any case, please mail us a bug report.
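If you want to check the resolution fields of a TIFF file yourself, the Python sketch below shows one way to do it. It reads the XResolution and YResolution fields (tags 282 and 283 in the TIFF 6.0 specification) from the first image directory of the file. The function names and the 50 to 2400 "plausible" range are our own assumptions for illustration, not part of SimpleOCR. Note that the values are only DPI when the file's ResolutionUnit is inches, which is the TIFF default.

```python
import struct

X_RESOLUTION, Y_RESOLUTION = 282, 283  # TIFF 6.0 tag numbers
RATIONAL = 5                           # TIFF field type: two unsigned 32-bit ints


def read_resolution(data):
    """Return (x, y) resolution from the first IFD of TIFF bytes.

    A field that is missing, or has a zero denominator, comes back as None.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or len(data) < 8:
        raise ValueError("not a TIFF file")
    if struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a TIFF file")
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    found = {}
    for i in range(entry_count):
        # Each directory entry is 12 bytes: tag, type, count, value/offset.
        tag, ftype, count, value = struct.unpack_from(
            order + "HHII", data, ifd_offset + 2 + 12 * i)
        if tag in (X_RESOLUTION, Y_RESOLUTION) and ftype == RATIONAL and count == 1:
            num, den = struct.unpack_from(order + "II", data, value)
            found[tag] = num / den if den else None
    return found.get(X_RESOLUTION), found.get(Y_RESOLUTION)


def looks_plausible(dpi, low=50, high=2400):
    """Assumed sanity range for a scanner resolution; adjust as needed."""
    return dpi is not None and low <= dpi <= high
```

To check a file on disk, read it in binary mode and pass the bytes to read_resolution. If looks_plausible returns False for either value, the resolution fields are probably the cause of the crash.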

How can I add a dictionary for German, Spanish, or Italian?

SimpleOCR can only recognize the characters used in the English and French languages. Therefore, it cannot recognize characters like ß, ü, ñ, and ú, for instance.

The OCR results are poor.

The scanning quality is very important. You should obtain a quality comparable with the quality of the sample file shipped with SimpleOCR.

Usually you should use a scanning resolution of 300 DPI.  A lower resolution can work if the characters are quite big, and a higher one helps if the characters are small.

Next, carefully tune the scanning brightness. Look at the characters in the resulting image. They should be clean. If you have a lot of characters in several pieces (i.e. image is too light) or many characters stuck together (i.e. image is too dark), you should tune the scanning brightness (or try a higher scanning resolution).

If your document has a sophisticated page layout, you can help SimpleOCR by selecting the text areas by hand using the "Create Area" tool.

Avoid scanning an area bigger than the document.  Doing so can leave black borders around the document, which SimpleOCR handles poorly.
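To get a feel for what the scanning resolution means in practice, the short Python sketch below works out how many pixels a page produces and roughly how tall the text is in pixels. The function names are hypothetical and the US Letter page size (8.5 x 11 inches) is only an example. Text height is estimated from the point size (1 point = 1/72 inch), so treat it as a rough guide and not as the exact height of each character.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixels needed to cover a page of the given size (inches) at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def approx_text_height_px(point_size, dpi):
    """Rough height of text in pixels, using 1 point = 1/72 inch."""
    return point_size / 72 * dpi


# Example: a US Letter page (assumed size) with 10-point text.
for dpi in (200, 300, 400):
    width, height = pixel_dimensions(8.5, 11, dpi)
    text_px = approx_text_height_px(10, dpi)
    print(f"{dpi} DPI: {width}x{height} pixels, 10 pt text is about {text_px:.0f} px tall")
```

Small print has fewer pixels per character at the same resolution, which is why it benefits from a higher DPI setting.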

I have a "Too many shapes" error.

This problem occurs when there are more than 20,000 distinct "shapes" in a document (a shape is a set of connected pixels). This probably means your document is "noisy" (a lot of small dots everywhere) or there is an image in your document which looks like a cloud of points.

Two solutions:

  • select the "noisy" option in the bottom toolbar
  • select the text areas with the "area" tool.
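
To see why a noisy page produces so many shapes, here is a small Python sketch that counts sets of connected black pixels in a bi-level image. It is an illustration of the idea only and not SimpleOCR's actual code: the function name is hypothetical, and treating diagonal neighbors as connected (8-connectivity) is an assumption. Only the 20,000 limit comes from the description above.

```python
from collections import deque

MAX_SHAPES = 20000  # the limit mentioned in the FAQ answer


def count_shapes(bitmap):
    """Count sets of connected black pixels (1 = black, 0 = white).

    Pixels touching horizontally, vertically, or diagonally are treated
    as connected; this 8-connectivity rule is an assumption.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    shapes = 0
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Found an unvisited black pixel: flood-fill its whole shape.
            shapes += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return shapes


page = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
total = count_shapes(page)
print(total, "shapes; over the limit:", total > MAX_SHAPES)
```

Every isolated speck counts as its own shape, so a page covered in small dots reaches the limit far sooner than a clean page of text.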

May I use SimpleOCR to process screen captures?

Sure, but SimpleOCR usually returns poor results with screen captures.