What is OCR?
Gizem Baruk | 14.12.2021

Optical character recognition (OCR) is a technology that recognizes and reads handwritten or printed text in digitized documents, for example a scanned paper document. The text is examined character by character, and each character is translated into a character code that can be used for data processing. An OCR system usually combines hardware and software to convert physical documents into machine-readable text: a camera, scanner or multifunction device captures the document, and the software does the actual recognition.
Using structure recognition (layout analysis), the software can distinguish text blocks from graphic elements, break text down into sentences, words and characters, and store them for context analysis so that relationships in the content can be determined later. More advanced character recognition methods may rely on artificial intelligence (AI), for example to recognize different languages or handwriting.
How does optical character recognition work?
The physical document is digitized using a multifunction device or a scanner. The scanned document is analyzed for light and dark areas. The light areas are identified as the background and the dark areas as characters to be recognized.
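The light/dark separation described above can be sketched as a simple threshold over grayscale pixel values. This is a minimal illustration, not how any particular OCR product works: the function name `binarize`, the fixed threshold of 128 and the tiny sample scan are all assumptions made for the example. Real systems often choose the threshold adaptively.

```python
def binarize(gray, threshold=128):
    """Return a grid of booleans: True where a pixel is dark (possible ink).

    `gray` is a list of rows of grayscale values, 0 = black, 255 = white.
    The fixed threshold is an assumption made for this sketch.
    """
    return [[pixel < threshold for pixel in row] for row in gray]


# Hypothetical 3x5 grayscale scan, invented for illustration.
scan = [
    [250, 30, 245, 240, 20],
    [248, 25, 250, 35, 240],
    [251, 28, 247, 244, 22],
]

ink = binarize(scan)
for row in ink:
    # Show dark pixels as '#' and background as '.'.
    print("".join("#" if dark else "." for dark in row))
```

Everything marked as dark is passed on as a candidate character; everything else is treated as background.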
To find letters or digits, the dark areas are then processed further. Often only a single character, word or block of text is recognized at a time.
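One simple way to isolate single characters, so that they can be recognized one at a time, is to cut a line of text wherever a column contains no dark pixels. The sketch below assumes an already binarized line and clearly separated characters; the function name `split_columns` and the sample data are hypothetical.

```python
def split_columns(ink):
    """Split a binarized text line into character spans.

    `ink` is a list of rows of booleans (True = dark pixel). Returns a list
    of (start, end) column ranges, end exclusive, separated by columns that
    contain no dark pixels.
    """
    if not ink:
        return []
    width = len(ink[0])
    # A column "has ink" if any row is dark at that position.
    has_ink = [any(row[x] for row in ink) for x in range(width)]

    spans = []
    start = None
    for x, dark in enumerate(has_ink):
        if dark and start is None:
            start = x                    # a character begins here
        elif not dark and start is not None:
            spans.append((start, x))     # a character ended before this gap
            start = None
    if start is not None:
        spans.append((start, width))     # last character touches the edge
    return spans


# Hypothetical binarized line with three shapes separated by empty columns.
line = [
    "##..#.###",
    "#...#.#.#",
    "##..#.###",
]
line_ink = [[ch == "#" for ch in row] for row in line]
print(split_columns(line_ink))
```

Touching or overlapping characters, as in handwriting, would not be separated by this approach and need more elaborate segmentation.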

Two methods of character recognition:
1.) Feature matching: Each character is identified by its characteristic features, such as the number of angled lines, crossed lines or curves. For example, the capital letter A can be stored as two diagonal lines joined by a horizontal line across the middle. Once the character has been identified, it is converted into a character code for further processing by the computer.

2.) Pattern recognition (pattern matching): The software compares each character to be recognized with the character images stored in its own database and selects the closest match.
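The two methods can be contrasted in a small sketch. Everything here is an illustrative assumption: the feature names and counts, the 3x5 templates and the function names `match_by_features` and `match_by_pattern` are invented for the example, and the stroke counts are assumed to have been extracted already.

```python
# Feature matching: each character is described by counted stroke features.
# Feature names and values are assumptions made for this sketch.
FEATURES = {
    "A": {"diagonal": 2, "horizontal": 1, "vertical": 0, "curve": 0},
    "H": {"diagonal": 0, "horizontal": 1, "vertical": 2, "curve": 0},
    "L": {"diagonal": 0, "horizontal": 1, "vertical": 1, "curve": 0},
    "O": {"diagonal": 0, "horizontal": 0, "vertical": 0, "curve": 1},
}


def match_by_features(observed):
    """Return the character whose stored features differ least from `observed`."""
    def distance(stored):
        # Sum of absolute differences; missing features count as 0.
        return sum(abs(stored[name] - observed.get(name, 0)) for name in stored)
    return min(FEATURES, key=lambda ch: distance(FEATURES[ch]))


# Pattern matching: compare the pixel grid with stored character images.
# Templates and input are assumed to have the same size (3 wide, 5 high).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_by_pattern(glyph):
    """Return (character, score), where score is the share of agreeing pixels."""
    def score(template):
        total = sum(len(row) for row in template)
        same = sum(
            a == b
            for t_row, g_row in zip(template, glyph)
            for a, b in zip(t_row, g_row)
        )
        return same / total
    best = max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))
    return best, score(TEMPLATES[best])


# Two diagonal lines and one horizontal line, as in the letter A example.
print(match_by_features({"diagonal": 2, "horizontal": 1}))

# A "T" with one stray dark pixel; the closest template should still win.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(match_by_pattern(noisy_t))
```

Feature matching tolerates different fonts and sizes as long as the strokes are found, whereas pattern matching depends on the input resembling a stored image closely.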

Areas of application of OCR technology
The automatic processing of documents (delivery notes, order documents, orders).
The automation of data entry, processing and extraction.
Converting printed documents into text that can be edited in Microsoft Word and similar programs.
Translating specific words within a captured document into another language.
Entering important legal documents into a database.
Sorting letters for postal delivery.
Etc.
