September 26, 2022 by Simeon Hermann
How to transform a photo-scanned document into a professionally scanned document
As a junior image processing developer for the mobile document management app Docutain, I am eager to tell you more about how we create authentic document scans from simple photo-scanned documents. But first, let us step back a bit to explain the general issue.
For a long time, classic document scanners were the only real option if you wanted to digitize documents for purposes such as digital archiving or paperless bureaucratic processes. Nowadays, it has become more and more common to use smartphone cameras as mobile scanners to digitize documents. While the use of handy, mobile cameras as scanners, especially in private settings, has some obvious advantages over separate, stationary scanners, the resulting scans exhibit some optical weaknesses.
The reason for this is that the imaging conditions for simple photographs are largely uncontrolled. Flatbed scanners, by contrast, ensure that the document lies flat and is evenly illuminated, and thus provide largely controlled imaging conditions. The uncontrolled environment while casually photo-scanning with your phone can result in a wide variety of undesirable effects, which on the one hand can make digital post-processing such as text recognition (OCR) more difficult, and on the other hand can impair the readability and aesthetics of the document photos for the human viewer.
Those effects can be divided into geometric and photometric distortions. Geometric distortions primarily result from the fact that a document was not photographed from a perpendicular perspective and does not lie completely flat on a surface. You may know this from folded letters that must be held flat manually. But this will not be the focus now. Photometric distortions, on the other hand, manifest themselves as illumination artifacts such as brightness gradients, shading, shadows or color casts. They might result from various light sources that are not aimed directly at the document, but also from objects occluding the incoming light. A really common scenario is the smartphone and photographer occluding the ceiling light and thereby casting a more or less pronounced shadow onto the document photo. Color casts especially occur with white paper, as it reflects the light's whole visible spectrum. Usually, you have to deal with a warm yellowish or a cold bluish lighting mood. You can see some of those typical effects in the images below:
Various photometric effects such as color casts (left), phone shadows (middle) or darkening shade (right)
With Docutain it is possible to remove those distortions. To handle geometric distortions, the photos are cropped to the documents themselves and then corrected for perspective. The included filters can then be applied to correct the photometric distortions. The result should be a photo without any illumination artifacts, showing only the document's surface properties. Just like in the filtered photos below, which are taken directly from Docutain:
illumination corrected versions of the previous document photos
To remove those artifacts, illumination correction must be applied to the document image. The basic idea is to find out which structures in the photo can be attributed to the illumination and which ones to the actual content of the document. According to the Retinex theory, the image formation model can simply be expressed by
N = R * I.
The original illuminated image is N, which consists of the reflectance R, basically the content of the document, and the illumination term I. The most common basic approach is to estimate the illumination image by removing the reflectance structures. Once you know the illumination image, you can remove the illumination influence from the input image by simply dividing N by I. This might sound quite simple, but the mentioned estimation of the illumination often is not that easy.
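To make the model a bit more tangible, here is a tiny, purely illustrative NumPy sketch (not code from Docutain) that fabricates a smooth illumination gradient, multiplies it onto a synthetic document and recovers the reflectance again by dividing N by I:

```python
import numpy as np

# Synthetic reflectance R: almost white "paper" with one dark printed block
h, w = 200, 300
R = np.full((h, w), 0.95)
R[80:120, 50:250] = 0.1

# Smooth horizontal brightness gradient as illumination I
I = np.tile(np.linspace(0.4, 1.0, w), (h, 1))

# Image formation model: observed photo N = R * I
N = R * I

# Illumination correction by division (epsilon guards against division by zero)
R_recovered = N / np.maximum(I, 1e-6)
print(np.allclose(R, R_recovered))  # True
```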
Check out our Docutain SDK
Integrate high quality document scanning, text recognition and data extraction into your apps. If you would like to learn more about the Docutain SDK, contact us anytime via SDK@Docutain.com.
When it comes to documents, there are luckily a lot of assumptions that we can make about the content of the original document. This makes the illumination correction a lot easier than for arbitrary subjects such as outdoor photos. One key assumption is that the document's content is printed on uniformly colored paper. Mostly it is pure white, but basically it can be any brighter color. So if we somehow know the original paper color, we can turn the task of illumination estimation into a question of background estimation. We just remove the printed content of the document, i.e. the foreground, from the photo to obtain an illuminated background image. Since the original background is uniform, all structures in the background image can be traced back to the illumination. Applying the image formation model to the background, we now have the input background image N_BG as well as the (homogeneous) original background color M, i.e. the paper material color. With this, the illumination image I can be calculated. It basically describes, in the form of a pixelwise gain factor, how strongly the document was distorted by the influence of the illumination. Since the illumination is independent of whether it was derived from the original or the background image, we can use it for the illumination correction of the original image as well. The formula for that is
R = (M / N_BG) * N,
where M is the original paper material color, N_BG the illuminated background image, and N the input image. The result is R, which shows only the content of the document including the given uniform background color. This is exactly what we strive for.
In summary, the illumination estimation is practically equivalent to a background estimation if there is a given homogeneous background or paper color.
separation of the original image (left) into reflectance (middle) and illumination (right)
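To make the formula concrete, the following is a minimal sketch (in Python with NumPy, not our actual implementation) of how such a correction could be applied once an illuminated background image and a paper color are given; the function name and the assumption that images are float arrays in [0, 1] are purely for illustration:

```python
import numpy as np

def correct_illumination(N, N_bg, M):
    """Apply R = (M / N_BG) * N pixelwise.

    N    -- observed document photo, float array in [0, 1], shape (h, w) or (h, w, 3)
    N_bg -- estimated illuminated background image, same shape as N
    M    -- assumed original paper color, scalar or per-channel values in [0, 1]
    """
    gain = M / np.maximum(N_bg, 1e-6)   # pixelwise gain factor derived from the background
    return np.clip(gain * N, 0.0, 1.0)  # corrected reflectance image R
```

The clipping at the end simply keeps over-corrected pixels within the valid value range.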
Nevertheless, at this point it is still open how we can determine a background image. In order to remove the foreground from the input image using suitable image processing methods, we need additional assumptions about the influence of the illumination, the document's content and the background itself. There is quite a range of assumptions about document photos. The problem is, however, that hardly any of them are always correct. With increasing complexity of the document's content as well as of the illumination situation, their correctness becomes more and more uncertain.
When it comes to plain text documents with black text on white paper, we have quite specific expectations of the document photo. Usually, text is rather small and can be detected by high intensity gradients. The illumination component, on the other hand, mostly consists of large-scale and smoothly varying structures. Just think of a typical brightness gradient or shading that is almost unavoidable when taking photos, especially of bright objects. Typical image processing methods to remove the text from such a document photo without losing information about the illumination component are low-pass filters, rank filters or morphological filters. Low-pass filters keep the smooth gradient along the whole document but filter out high frequency components such as text. Rank filters such as the median filter can be used based on the assumption that the foreground is rather small. Then, the foreground pixels make up only a small portion of the pixels in a neighborhood and are replaced by background pixels. Of course, the kernel size must be adjusted to the typical pixel size of the text. Furthermore, a percentile filter with a high percentile or even a morphological dilation both exploit the fact that the foreground is significantly darker than the background and, thus, can be eliminated more effectively by choosing a rather bright pixel value from the neighborhood.
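To give a rough idea of how such filters look in practice, here is a small OpenCV sketch (a simplified illustration, not Docutain's implementation) that estimates the background of a black-text-on-white-paper photo with a median filter and, alternatively, a morphological dilation; the file name and the kernel sizes are assumptions that would have to be tuned to the actual text size:

```python
import cv2

# Load a document photo (hypothetical file name) and convert it to grayscale
img = cv2.imread("document_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Rank filter: a median filter removes small dark text as long as it covers
# only a minority of each neighborhood; the kernel must exceed the stroke width
bg_median = cv2.medianBlur(gray, 21)

# Morphological filter: dilation lets the bright background "overwrite" the
# darker foreground pixels
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
bg_dilated = cv2.dilate(gray, kernel)
```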
But it is not always as easy as that, for two reasons: On the one hand, the illumination artifacts to be removed are not always smooth and large-scale. Common exceptions are strong shadows with distinct edges or small shaded areas resulting from wrinkles in the paper. On the other hand, documents cannot always be reduced to text alone. They might contain any kind of illustration as well, and the possibilities of their look, shape or size are virtually unlimited.
In these cases, our simple foreground removal methods would have the following effects: During illumination estimation, some illumination artifacts like the ones just mentioned may be falsely removed, while larger printed content such as illustrations may remain in the estimated illumination. Conversely, for the illumination correction this means that those illumination artifacts will not be removed, while the correction tries to remove the illustrations and thereby impairs them.
So if we do not want to focus on plain text documents only, the most common approach is segmentation and interpolation. If we manage to identify the foreground elements, we can simply mask them and estimate the hypothetical background in the masked areas via interpolation. Accordingly, the two main tasks are the segmentation of background and foreground, and a suitable interpolation method.
The segmentation approaches are mostly based, in one way or another, on the assumption that foreground elements are separated from the background by gradients. But to what extent this actually applies is uncertain. Due to the sheer variety of illumination artifacts and, even more so, of different looking illustrations, the assumption cannot always be guaranteed. In the worst case, an illustration merges smoothly into the background. However, since we assume that the two segments are separated, the approaches either detect foreground elements directly, for example by edge detection, or first detect a contiguous background, for example by region growing.
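As a minimal sketch of the first variant, foreground elements could be detected directly via their edges and then dilated so that the resulting mask safely covers the printed content; the Canny thresholds and the kernel size below are assumptions, not tuned values from our filters:

```python
import cv2

img = cv2.imread("document_photo.jpg")           # hypothetical input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect the gradients separating foreground from background
edges = cv2.Canny(gray, 50, 150)

# Grow the edges so the mask covers the printed content, not just its outline
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
foreground_mask = cv2.dilate(edges, kernel)      # 255 where content is suspected
background_mask = cv2.bitwise_not(foreground_mask)
```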
When it comes to interpolation techniques, ordinary interpolation methods such as bilinear interpolation can be a legitimate option. However, one must keep in mind that in our scenario the masked regions can be very unevenly distributed. The problem is also called scattered data interpolation, since there might be a lot of known points in an unprinted area but no data at all in a large illustrated area. Therefore, specialized methods such as natural neighbor interpolation might work more effectively. More generally, the interpolation can be done by inpainting methods: the printed areas are treated as damaged or missing parts of our background image. There are several techniques for that, for example one based on the Fast Marching Method, described by Alexandru Telea in 2004. Usually, the masked regions are gradually filled by weighted averages of the surrounding known pixels.
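OpenCV ships an implementation of exactly this Telea method, so a sketch of the inpainting step could look like the following; the file names and the inpainting radius are illustrative assumptions:

```python
import cv2

img = cv2.imread("document_photo.jpg")                                     # input photo N
foreground_mask = cv2.imread("foreground_mask.png", cv2.IMREAD_GRAYSCALE)  # 255 = printed content

# Treat the masked (printed) regions as damaged parts of the background image
# and fill them with Telea's Fast-Marching-based inpainting (radius in pixels)
background = cv2.inpaint(img, foreground_mask, 5, cv2.INPAINT_TELEA)
cv2.imwrite("background_estimate.png", background)
```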
Now that we have derived a full background image, we only need the original uniform paper color to generate a multiplicative shading map. We can manually set a fixed color such as the widespread pure white, or estimate it from the known background pixels. In the latter case, it is of course difficult to remove color casts, but the illumination inhomogeneity of the document will still be corrected.
Finally, with the estimated illuminated background image N_BG and original paper color M, we can multiply the shading map with the original input photo N and obtain the illumination corrected document photo. This is done with the previously mentioned formula
R = (M / N_BG) * N.
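Putting the last steps together, a simplified sketch could estimate the paper color from the pixels the mask marks as background and then apply the shading map; the file names and the 90th percentile are illustrative choices rather than Docutain's actual parameters:

```python
import cv2
import numpy as np

img = cv2.imread("document_photo.jpg").astype(np.float32)                  # input photo N
background = cv2.imread("background_estimate.png").astype(np.float32)      # N_BG from inpainting
foreground_mask = cv2.imread("foreground_mask.png", cv2.IMREAD_GRAYSCALE)  # 255 = printed content

# Estimate the paper color M per channel from the known background pixels;
# a high percentile is robust against residual dark pixels. Alternatively,
# set M = (255, 255, 255) for pure white, which also removes color casts.
bg_pixels = img[foreground_mask == 0]        # shape (n, 3)
M = np.percentile(bg_pixels, 90, axis=0)

# Shading map of pixelwise gain factors and final correction R = (M / N_BG) * N
shading_map = M / np.maximum(background, 1.0)
corrected = np.clip(shading_map * img, 0, 255).astype(np.uint8)
cv2.imwrite("corrected.jpg", corrected)
```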
As you might have realized, the biggest challenge is the distinction between illumination-based structures and content-based structures. It is an ill-posed, underdetermined problem and thus, as things stand today, it seems almost impossible to achieve flawless results for all scenarios all the time. Nevertheless, we at Docutain are very satisfied with the results of our filters, which are based on these principles of illumination correction for document photos.
Below you can see an example of how one of our new filters outperformed the filters of the competing apps Adobe Scan and Microsoft Lens. You can clearly see how our filter manages to remove strong color casts such as the bluish area in the lower third of the letter, which is something that Adobe Scan in particular struggles with. On the other hand, the filter also succeeds at preserving illustrated areas such as the black region at the bottom of the document. The default Microsoft Lens document filter erases almost every piece of content bigger than usual text, which seems to relate to the previously mentioned downside of focusing on plain text documents.
comparison of a photo-scanned document (left) filtered with Docutain, Adobe Scan and Microsoft Lens (from left to right)
You are invited to try out the scan function of Docutain on your own documents! Just download Docutain for free from the Google Play Store or the Apple App Store.
By the way, our scan functionality, along with a few more features such as data extraction based on the detected text, is also offered as a separate software development kit which you can use to add these functionalities to your own apps.
Check out our Docutain SDK
Integrate high quality document scanning, text recognition and data extraction into your apps. If you would like to learn more about the Docutain SDK, have a look at our Developer Documentation or contact us anytime via SDK@Docutain.com.