F&M College Library

Scanning and Digitization Basics

Scanning Basics

Bit depth

For digital images, bit depth refers to the number of bits used to represent the color of each pixel. A higher bit depth allows finer gradations between colors. The terms bit depth and color depth are sometimes used interchangeably. Common bit depths are:

24-bit color: 8 bits per channel (8 bits each for red, green, and blue), for a total of 16,777,216 different colors. Sometimes referred to as True Color.

8-bit grayscale: 8 bits per pixel for 256 different shades of gray, including black and white.

bitonal: 1 bit per pixel, in which each pixel is either black or white. Sometimes called a bitmap or black-and-white image (not to be confused with grayscale).

See also: Color mode.
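The color counts listed above follow from simple arithmetic: n bits can represent 2^n distinct values. A minimal Python sketch of that calculation (the function name `color_count` is illustrative, not part of any scanner software):

```python
def color_count(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel can hold at the given bit depth."""
    return 2 ** bits_per_pixel

# The common bit depths described above, in bits per pixel.
COMMON_DEPTHS = {
    "24-bit color (8 bits x 3 channels)": 8 * 3,
    "8-bit grayscale": 8,
    "bitonal": 1,
}

for name, bits in COMMON_DEPTHS.items():
    # The ":," format spec inserts thousands separators.
    print(f"{name}: {color_count(bits):,} values")
```

Each extra bit doubles the number of possible values, which is why moving from 8-bit grayscale to 24-bit color is such a large jump.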

Color mode


Scanner software sometimes refers to color mode, which generally combines a color model with a bit depth.

See also: Bit depth.

Color: Digital images combine red, green, and blue (RGB) light, displayed through a computer monitor, to represent a wide range of colors. In this additive model, the absence of light is black.

Grayscale: Grayscale is a range of grays between white and black.

Bitonal: Also referred to as black and white or bitmap. Bitonal images use only two colors, usually black and white.

Example images: color, bitonal (black and white), and grayscale.
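To make the three modes concrete, here is a minimal Python sketch that reduces a single RGB pixel to grayscale and then to bitonal. The weights (the ITU-R BT.601 luma coefficients) and the fixed threshold of 128 are assumptions chosen for illustration; scanner software may use other formulas or pick thresholds adaptively. The function names are hypothetical.

```python
# Assumption: grayscale uses the ITU-R BT.601 luma weights, a common choice.
def rgb_to_gray(r: int, g: int, b: int) -> int:
    """Collapse an 8-bit-per-channel RGB pixel to one 8-bit gray value (0-255)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# Assumption: a fixed threshold of 128; in this sketch 1 means white, 0 black.
def gray_to_bitonal(gray: int, threshold: int = 128) -> int:
    """Reduce an 8-bit gray value to a single bit."""
    return 1 if gray >= threshold else 0

pixels = [(0, 0, 0), (255, 255, 255), (200, 30, 30)]
for rgb in pixels:
    gray = rgb_to_gray(*rgb)
    print(rgb, "-> gray", gray, "-> bitonal", gray_to_bitonal(gray))
```

Each step throws information away: the dark red pixel keeps its brightness in grayscale but loses its hue, and in bitonal mode it becomes plain black.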

File formats

A file format is the structure and set of rules that define how a digital object's data is stored, so that software can open and read it. Some file formats support only one kind of image compression, while others support several. Commonly used file formats:

.jpeg or .jpg: Supports only lossy JPEG compression. A good choice for displaying images on the web, for email, or for inserting into documents such as Microsoft Word files or PowerPoint and Keynote presentations.

.tiff: Supports both lossy compression (JPEG) and lossless compression (ZIP and LZW). With lossless compression, TIFF is a good choice for a master file to edit in Photoshop or other image editing software, for an archival file, or for high-resolution printing.

.jp2: The JPEG 2000 format, newer than JPEG or TIFF. Supports either lossy or lossless compression.

.pdf: PDFs can be made up of multiple layers of content, often including a layer of image content. Images in PDFs may be compressed with JPEG, ZIP, JPEG 2000, JBIG2, or CCITT compression.
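Part of a format's structure is a fixed "signature" (magic number) at the very start of the file, which programs can check to recognize the format regardless of the file extension. A minimal Python sketch using the standard signatures for the formats above; the function names are hypothetical, and real tools inspect far more than the first few bytes.

```python
# Standard leading bytes for each format listed above.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (.jp2)"),
    (b"%PDF-", "PDF"),
]

def identify_format(data: bytes) -> str:
    """Guess a file's format from its first bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

For example, `identify_file("scan.tif")` (a hypothetical path) would return one of the TIFF labels for a genuine TIFF even if the file had been renamed with a different extension.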

Image Compression

Digital images often need to contain a lot of information to accurately represent the colors and shapes in the original. Compression techniques reduce the storage space that digital images require. There are two broad types of compression:

Lossy: In lossy compression, some information is permanently discarded. This makes the file smaller, at the cost of losing data. Although you may not notice it at first, if you repeatedly save an image (like a JPEG) with lossy compression, you will eventually get an image with visible distortions.

Lossless: In lossless compression, no information is lost. The decompressed image is always identical, pixel for pixel, to the original.
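The difference between the two types can be shown on a toy row of grayscale pixel values. The sketch below uses Python's standard `zlib` module (DEFLATE, the algorithm behind ZIP compression) for the lossless case. For the lossy case it uses a simple bit-dropping step as an illustrative stand-in; it is not how JPEG actually works, and the `quantize` helper is hypothetical.

```python
import zlib

# A toy "image": one row of 8-bit grayscale values with lots of repetition.
row = bytes([255] * 100 + [0] * 100 + list(range(0, 200, 2)))

# Lossless: DEFLATE shrinks the data and decompresses to exactly the original.
packed = zlib.compress(row)
assert zlib.decompress(packed) == row

# Lossy stand-in: keep only the top 4 bits of each value.
def quantize(data: bytes, keep_bits: int = 4) -> bytes:
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(v & mask for v in data)

lossy = quantize(row)
print(len(row), "bytes raw ->", len(packed), "bytes after lossless compression")
# The dropped low bits cannot be recovered from `lossy`.
print("lossy copy identical to original?", lossy == row)
```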

OCR (Optical Character Recognition)

OCR is the process of converting scanned images of text into machine-readable, searchable text. OCR programs analyze the shapes formed by pixels to identify those shapes and convert them into letters and symbols.
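The core idea of matching pixel shapes to characters can be illustrated with simple template matching on tiny bitonal glyphs. This is a toy sketch under heavy assumptions: the 3x3 templates and function names are hypothetical, and real OCR engines work on much larger images and use far more sophisticated (often learned) models.

```python
# Hypothetical 3x3 bitonal templates (1 = black pixel).
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}

def match_score(glyph, template) -> int:
    """Count the pixels where the scanned glyph and the template agree."""
    return sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )

def recognize(glyph) -> str:
    """Return the character whose template best matches the glyph's pixels."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))

# A slightly noisy "T" with one stray black pixel in the bottom-left corner.
scanned = ("111",
           "010",
           "110")
print(recognize(scanned))
```

Even with the stray pixel, the "T" template agrees on 8 of 9 pixels, more than any other template, so the glyph is still recognized. Noise of this kind is one reason scan quality and resolution affect OCR accuracy.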

Resolution

Resolution typically refers to the number of samples (pixels) captured per inch of the original. Dots per inch (dpi), pixels per inch (ppi), and samples per inch (spi) technically refer to different things but are often used interchangeably. Commonly used resolutions are:

600dpi: Good for scanning images and photographs or materials that will later be printed. Also a good choice if scanning for preservation.

400dpi: Good for scanning text for high-quality preservation, or for text that is small (footnotes or endnotes, for example).

300dpi: Good for scanning text for reading, searching, and general use. If you intend to run OCR on your scanned materials, 300dpi is a good choice.
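Resolution and bit depth together determine how large a scan is before compression. A minimal Python sketch, assuming a US Letter page (8.5 x 11 inches) as the example original; the function names are hypothetical, and the byte count ignores file-format overhead and any compression.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw image data size before compression (no file-format overhead)."""
    return width_px * height_px * bits_per_pixel // 8

# Assumption: an 8.5 x 11 inch page scanned in 24-bit color.
for dpi in (300, 400, 600):
    w, h = pixel_dimensions(8.5, 11, dpi)
    size_mb = uncompressed_bytes(w, h, 24) / 1_000_000
    print(f"{dpi} dpi: {w} x {h} px, about {size_mb:.1f} MB uncompressed")
```

Because the pixel count grows with the square of the resolution, doubling from 300 dpi to 600 dpi quadruples the uncompressed size, which is one reason compression and file format choices matter for preservation-quality scans.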