How does optical character recognition work?
The physical document is digitized using a multifunction device or a scanner. The scanned document is analyzed for light and dark areas. The light areas are identified as the background and the dark areas as characters to be recognized.
The dark areas are then processed further to find letters and digits. Recognition usually proceeds one character, one word or one block of text at a time.
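The light/dark separation described above can be sketched as a simple threshold on pixel brightness. This is only a minimal illustration: the sample values in SCAN, the fixed threshold of 128 and the function names binarize and render are assumptions made for this example, and real OCR software typically chooses its threshold adaptively.

```python
# Hypothetical 5x5 grayscale scan: 0 = black ink, 255 = white paper.
SCAN = [
    [250, 245,  30, 248, 252],
    [247,  40, 243,  35, 249],
    [ 25,  20,  28,  22,  31],
    [ 33, 246, 251, 244,  27],
    [ 29, 250, 248, 247,  36],
]


def binarize(gray, threshold=128):
    """Mark each pixel as dark (1, possible character) or light (0, background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def render(bitmap):
    """Draw dark pixels as '#' and background pixels as '.' for inspection."""
    return "\n".join("".join("#" if px else "." for px in row) for row in bitmap)


if __name__ == "__main__":
    print(render(binarize(SCAN)))
```

Everything darker than the threshold is treated as part of a character and everything else as background; the resulting bitmap is what the later recognition steps work on.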
Two methods of character recognition:
1.) Feature matching: Each character is identified by its characteristic features, such as the number of angled lines, crossed lines or curves. For example, the letter A can be stored as two diagonal lines joined across the middle by a horizontal line. Once the character has been identified, it is converted into a character code for further processing by the computer.
2.) Pattern recognition (pattern matching): The software compares each character to be recognized against the characters stored in its own database and selects the closest match.
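The two methods can be contrasted with a small sketch. Everything in it is an assumption made for illustration: the feature table FEATURES, the 5x5 bitmaps in TEMPLATES, and the function names are hypothetical and far simpler than what real OCR software uses.

```python
# Method 1 (feature matching): each character is described by counted strokes.
# Hypothetical feature table; real systems use many more features.
FEATURES = {
    "A": {"diagonal": 2, "horizontal": 1, "vertical": 0, "curves": 0},
    "H": {"diagonal": 0, "horizontal": 1, "vertical": 2, "curves": 0},
    "L": {"diagonal": 0, "horizontal": 1, "vertical": 1, "curves": 0},
}

# Method 2 (pattern matching): each character is stored as a reference bitmap.
# Hypothetical 5x5 character database; '#' = dark pixel, '.' = background.
TEMPLATES = {
    "A": ["..#..", ".#.#.", "#####", "#...#", "#...#"],
    "H": ["#...#", "#...#", "#####", "#...#", "#...#"],
    "L": ["#....", "#....", "#....", "#....", "#####"],
}


def match_by_features(observed):
    """Return the character whose stored features equal the observed ones."""
    for char, features in FEATURES.items():
        if features == observed:
            return char
    return None  # no stored character has exactly these features


def similarity(glyph, template):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    pairs = [(g, t) for g_row, t_row in zip(glyph, template)
             for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in pairs) / len(pairs)


def match_by_pattern(glyph):
    """Return the database character whose bitmap is most similar to the glyph."""
    return max(TEMPLATES, key=lambda char: similarity(glyph, TEMPLATES[char]))


# A scanned "A" with one pixel missing from its crossbar.
NOISY_A = ["..#..", ".#.#.", "####.", "#...#", "#...#"]

if __name__ == "__main__":
    by_features = match_by_features(
        {"diagonal": 2, "horizontal": 1, "vertical": 0, "curves": 0})
    by_pattern = match_by_pattern(NOISY_A)
    # The recognized character is finally converted into a character code.
    print(by_features, by_pattern, ord(by_pattern))
```

The feature lookup only succeeds when the counted strokes match a stored entry exactly, whereas the bitmap comparison picks the closest stored character and so tolerates a few wrong pixels. In both cases the final step is the conversion into a character code, shown here with Python's built-in ord.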
What are the areas of application of OCR technology?
• The automatic processing of documents (delivery notes, order documents, orders).
• The automation of data entry, processing and extraction.
• Converting printed documents into text that can be edited in a word processor such as Microsoft Word.
• Translating specific words within a captured document into another language.
• Entering important legal documents into a database.
• Sorting letters for postal delivery.
• Etc.