Mass Digitization – Requirements and Specifications

Requirements and Specifications: Microfilm

Source Material Specifications

• 16mm or 35mm microfilm
• Maximum reduction ratio: 27x (35mm film) or 48x (16mm film)
• Film must not be duplexed (i.e. it must not contain two or more rows of images)
• All microfilms in a single batch must be of the same film type (16mm/35mm) and reduction ratio; the correct reduction ratio must be indicated for each batch
• Each batch must be accompanied by an electronic manifest, prepared using the Canadiana-supplied manifest template, that identifies all of the materials to be scanned, and/or by electronic metadata records supplied in MARC, in MARCXML or in the Canadiana-supplied metadata template
• Each film must be labeled with a unique identifier that can be matched to the corresponding identifier on the supplied manifest or metadata
• Films which cannot be matched to a manifest entry or metadata record will be rejected for scanning
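
The identifier-matching requirement above can be checked before shipping a batch. The following is a minimal sketch only: it assumes the manifest has been exported as CSV with a column named "identifier", which is a hypothetical layout and not necessarily that of the Canadiana-supplied template. The function names are likewise illustrative.

```python
import csv
import io
from collections import Counter


def unmatched_labels(manifest_csv, labels, id_column="identifier"):
    """Return the labels that have no matching manifest entry.

    manifest_csv is the manifest as CSV text; the column name is an
    assumption made for this sketch, not part of the specification.
    """
    reader = csv.DictReader(io.StringIO(manifest_csv))
    known = {row[id_column].strip() for row in reader}
    return sorted({label.strip() for label in labels} - known)


def duplicate_labels(labels):
    """Return labels used more than once (identifiers must be unique)."""
    counts = Counter(label.strip() for label in labels)
    return sorted(label for label, n in counts.items() if n > 1)
```

Any label returned by either function points to an item that would need to be corrected before submission, since unmatched items are rejected for scanning.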
 
Output Specifications
• One image per frame
• Images are cropped to minimize the appearance of black borders without cropping out any of the image itself; the amount of visible border depends on the amount of skew in the source frame
• 2-up frames are output as a single image (not split into two images)
• Images are rotated, if necessary, so that the majority of the images in the reel are in standard reading orientation (images are not individually rotated)
• Images are 300 dpi greyscale, saved as JPEG files (.jpg) with high quality (80%) compression
• Images are numbered sequentially using 4-digit numbers starting at 0001 (0001.jpg, 0002.jpg, 0003.jpg, etc.)
• One ALTO 3.0 XML file per image containing the positionally-encoded OCR output for that image
• ALTO files have the same base name as the corresponding image file, with the .xml extension (e.g. 0001.jpg => 0001.xml)
• One UTF-8 text file per image containing the plain text OCR output for that image
• Text files have the same base name as the corresponding image file, with the .txt extension (e.g. 0001.jpg => 0001.txt)
• One PDF/A document created from all of the image files and the corresponding OCR data to enable full text searching
• The PDF/A file is named after the film’s supplied object identifier
• All image, PDF and OCR files are placed in a directory/folder named after the film’s supplied object identifier
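
The naming rules above imply a predictable folder layout for each delivered object, which can be verified on receipt. The sketch below is illustrative only: it assumes the default JPEG output, assumes the PDF/A file uses a .pdf extension, and uses a hypothetical function name.

```python
from pathlib import Path


def check_object_dir(object_dir):
    """Return a list of problems found in one delivered object folder.

    The folder name is taken to be the supplied object identifier.
    """
    object_dir = Path(object_dir)
    problems = []

    # Images must be 0001.jpg, 0002.jpg, ... with no gaps.
    images = sorted(p.name for p in object_dir.glob("*.jpg"))
    if not images:
        problems.append("no image files found")
    expected = [f"{n:04d}.jpg" for n in range(1, len(images) + 1)]
    if images != expected:
        problems.append("image files are not numbered sequentially from 0001.jpg")

    # Each image needs an ALTO (.xml) and a plain text (.txt) companion.
    for name in images:
        stem = Path(name).stem
        for ext in (".xml", ".txt"):
            if not (object_dir / f"{stem}{ext}").is_file():
                problems.append(f"missing {stem}{ext}")

    # One PDF named after the object identifier (extension is an assumption).
    pdf = object_dir / f"{object_dir.name}.pdf"
    if not pdf.is_file():
        problems.append(f"missing {pdf.name}")
    return problems
```

An empty list means the folder matches the layout described above; the check covers file names only and does not validate file contents.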
 
Pricing
• $3.00 per reel + $0.05 per image (e.g. a reel with 1,000 images costs $53.00).
• JPEG2000 instead of JPEG file format: add $0.03 per image.
• TIFF uncompressed instead of JPEG file format: add $0.05 per image.
• An administration fee of $500 per batch applies to cover fixed setup, management and delivery costs. This fee is waived for batches larger than 50,000 images.
 

Requirements and Specifications: Unbound or Disbound Documents

Source Material Specifications
• Minimum page size is A8 (2.0 x 2.9”)
• Maximum page size is A3 (11.7 x 16.5”)
• Mixed page sizes are permitted within each document
• The pages of each document must be kept together (by means of an envelope, folder or single removable clip) and in the correct order
• Pages must not be torn or fragile
• The pages of disbound documents must be completely separated, with all excess glue removed
• All staples, paperclips and any other attachments must be removed (other than a single removable clip used to keep the document together)
• All pages must be unfolded
• Creased pages can be scanned, but any fold or crease lines in the original may appear in the scanned image
• Documents can be scanned either single sided (front of the page only) or double sided (both sides scanned) but all documents in a batch must be scanned the same way
• Each batch must be accompanied by an electronic manifest, prepared using the Canadiana-supplied manifest template, that identifies all of the materials to be scanned, and/or by electronic metadata records supplied in MARC, in MARCXML or in the Canadiana-supplied metadata template
• Each document must be labeled with a unique identifier that can be matched to the corresponding identifier on the supplied manifest or metadata
• Documents which cannot be matched to a manifest entry or metadata record will be rejected for scanning
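
The page size limits above (A8 minimum, A3 maximum) can be screened with a simple check. This is a sketch under one stated assumption: a page is compared against the limits regardless of orientation, so its shorter side is checked against the shorter limit and its longer side against the longer limit. The function name is illustrative.

```python
# Limits in inches, as listed in the source material specifications.
MIN_PAGE_IN = (2.0, 2.9)    # A8
MAX_PAGE_IN = (11.7, 16.5)  # A3


def page_size_ok(width_in, height_in):
    """Return True if a page falls within the A8 to A3 range."""
    short_side, long_side = sorted((width_in, height_in))
    return (MIN_PAGE_IN[0] <= short_side <= MAX_PAGE_IN[0]
            and MIN_PAGE_IN[1] <= long_side <= MAX_PAGE_IN[1])
```

For example, a letter-size page (8.5 x 11”) passes in either orientation, while a tabloid page (11 x 17”) fails because its long side exceeds 16.5”.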
 
Output Specifications
• One image per page
• Images are automatically cropped to the edge of the page, or a black border can be added (one option per batch); irregularly-shaped pages will contain some black background
• Images are rotated, if necessary, so that the majority of the images in a document are in standard reading orientation (images are not individually rotated)
• Images are 300 dpi colour, saved as JPEG files (.jpg) with high quality (80%) compression
• Images are numbered sequentially using 4-digit numbers starting at 0001 (0001.jpg, 0002.jpg, 0003.jpg, etc.)
• One ALTO 3.0 XML file per image containing the positionally-encoded OCR output for that image
• ALTO files have the same base name as the corresponding image file, with the .xml extension (e.g. 0001.jpg => 0001.xml)
• One UTF-8 text file per image containing the plain text OCR output for that image
• Text files have the same base name as the corresponding image file, with the .txt extension (e.g. 0001.jpg => 0001.txt)
• One PDF/A document created from all of the image files and the corresponding OCR data to enable full text searching
• The PDF/A file is named after the document’s supplied object identifier
• All image, PDF and OCR files are placed in a directory/folder named after the document’s supplied object identifier
 
Pricing
• $3.00 per document + $0.05 per image (e.g. a document with 200 images costs $13.00).
• JPEG2000 instead of JPEG file format: add $0.03 per image.
• TIFF uncompressed instead of JPEG file format: add $0.05 per image.
• An administration fee of $500 per batch applies to cover fixed setup, management and delivery costs. This fee is waived for batches larger than 50,000 images.

 

Requirements and Specifications: Books and Bound Documents

Source Material Specifications
• Maximum page size is A2 (16.5 x 23.4”)
• Maximum book thickness is 3”
• Pages must not be torn or fragile
• Book must not contain fold-outs
• Pages must have minimum margins and gutters of 0.5”
• Document must be capable of being opened at least 110° without damage
• Each batch must be accompanied by an electronic manifest, prepared using the Canadiana-supplied manifest template, that identifies all of the materials to be scanned, and/or by electronic metadata records supplied in MARC, in MARCXML or in the Canadiana-supplied metadata template
• Each document must be labeled with a unique identifier that can be matched to the corresponding identifier on the supplied manifest or metadata
• Documents which cannot be matched to a manifest entry or metadata record will be rejected for scanning
 
Output Specifications
• One image per page
• All pages, including front and back covers, are scanned
• Pages are cropped to remove black border without removing any text or other content from the page itself
• All pages are scanned in the normal reading orientation of the book (images are not individually rotated)
• Images are 300 dpi colour, saved as JPEG files (.jpg) with high quality (80%) compression
• Images are numbered sequentially using 4-digit numbers starting at 0001 (0001.jpg, 0002.jpg, 0003.jpg, etc.)
• One ALTO 3.0 XML file per image containing the positionally-encoded OCR output for that image
• ALTO files have the same base name as the corresponding image file, with the .xml extension (e.g. 0001.jpg => 0001.xml)
• One UTF-8 text file per image containing the plain text OCR output for that image
• Text files have the same base name as the corresponding image file, with the .txt extension (e.g. 0001.jpg => 0001.txt)
• One PDF/A document created from all of the image files and the corresponding OCR data to enable full text searching
• The PDF/A file is named after the document’s supplied object identifier
• All image, PDF and OCR files are placed in a directory/folder named after the document’s supplied object identifier
 
Pricing
• $3.00 per document + $0.15 per image (e.g. a book with 300 images costs $48.00).
• JPEG2000 instead of JPEG file format: add $0.03 per image.
• TIFF uncompressed instead of JPEG file format: add $0.05 per image.
• An administration fee of $500 per batch applies to cover fixed setup, management and delivery costs. This fee is waived for batches larger than 15,000 images.

Requirements and Specifications: Microfiche

Source Material Specifications
• Maximum reduction ratio: 25x
• No jacketed or COM fiche
• All microfiche in a single batch must be of the same reduction ratio; the correct reduction ratio must be indicated for each batch
• Each batch must be accompanied by an electronic manifest, prepared using the Canadiana-supplied manifest template, that identifies all of the materials to be scanned, and/or by electronic metadata records supplied in MARC, in MARCXML or in the Canadiana-supplied metadata template
• Each fiche must be labeled with a unique identifier that can be matched to the corresponding identifier on the supplied manifest or metadata
• Fiche which cannot be matched to a manifest entry or metadata record will be rejected for scanning
 
Output Specifications
• One image per frame
• Images are cropped to minimize the appearance of black borders without cropping out any of the image itself; the amount of visible border depends on the amount of skew in the source frame
• 2-up frames are output as a single image (not split into two images)
• Images are rotated, if necessary, so that the majority of the images in the fiche are in standard reading orientation (images are not individually rotated)
• Images are 300 dpi greyscale, saved as JPEG files (.jpg) with high quality (80%) compression
• Images are numbered sequentially using 4-digit numbers starting at 0001 (0001.jpg, 0002.jpg, 0003.jpg, etc.)
• One ALTO 3.0 XML file per image containing the positionally-encoded OCR output for that image
• ALTO files have the same base name as the corresponding image file, with the .xml extension (e.g. 0001.jpg => 0001.xml)
• One UTF-8 text file per image containing the plain text OCR output for that image
• Text files have the same base name as the corresponding image file, with the .txt extension (e.g. 0001.jpg => 0001.txt)
• One PDF/A document created from all of the image files and the corresponding OCR data to enable full text searching
• The PDF/A file is named after the fiche’s supplied object identifier
• All image, PDF and OCR files are placed in a directory/folder named after the fiche’s supplied object identifier
 
Pricing
• $3.00 per fiche + $0.15 per image (e.g. a fiche with 100 images costs $18.00).
• JPEG2000 instead of JPEG file format: add $0.03 per image.
• TIFF uncompressed instead of JPEG file format: add $0.05 per image.
• An administration fee of $500 per batch applies to cover fixed setup, management and delivery costs. This fee is waived for batches larger than 15,000 images.

Requirements and Specifications: 35mm Slides

Source Material Specifications
• Each batch of slides must be sorted in numbered slide holders, so that their order is clearly discernible.
• Each slide holder must be labeled with a unique identifier that can be matched to the corresponding identifier on the supplied manifest or metadata
• Each batch of slides must be accompanied by an electronic manifest, prepared using the Canadiana-supplied manifest template, that identifies each slide holder, and/or by electronic metadata records supplied in MARC, in MARCXML or in the Canadiana-supplied metadata template.
• Slide holders which cannot be matched to a manifest entry or metadata record will be rejected for scanning.
 
Output Specifications
• One image per slide.
• Slides are treated with compressed air to remove most dust particles.
• Images from slides are cropped inside the slide mount with minimal loss of image area. 
• Images are rotated so that the images are in their correct viewing orientation.
• Images have digital dust removal and sharpening filters applied.
• Images are 2000 dpi colour, saved as JPEG files (.jpg) with high quality (~80%) compression.
• Images are numbered sequentially using 4-digit numbers starting at 0001 (0001.jpg, 0002.jpg, 0003.jpg, etc.).
• All image files are placed in a directory/folder named after the slide holder’s supplied object identifier.
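
To give a sense of the image size that results from 2000 dpi scanning, the pixel dimensions can be estimated from the frame size. The sketch below assumes the nominal 36 x 24 mm frame of a 35mm slide; since images are cropped inside the slide mount, delivered images can be expected to be slightly smaller. The function name is illustrative.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm, height_mm, dpi):
    """Convert a physical size in millimetres to whole pixels at a given dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


# Nominal full 35mm frame (an assumption; the mount opening is smaller).
width_px, height_px = pixel_dimensions(36, 24, 2000)
```

Under these assumptions an uncropped frame works out to roughly 2835 x 1890 pixels.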
 
Pricing
• $0.70 per slide (e.g. a slide holder of 30 slides costs $21.00)
• TIFF uncompressed instead of JPEG file format: add $0.05 per image.
• An administration fee of $500 per batch applies to cover fixed setup, management and delivery costs. This fee is waived for batches larger than 3,500 images.
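
The pricing rules in each section follow the same pattern: a per-item charge, a per-image charge, an optional file format surcharge, and a per-batch administration fee that is waived above a threshold. The sketch below puts the listed rates into one estimator. It is illustrative rather than an official quote: the media keys and function name are hypothetical, "larger than" is read as strictly greater than the threshold, and amounts are held in cents to avoid rounding errors.

```python
ADMIN_FEE_CENTS = 500_00

# media: (per-item cents, per-image cents, admin fee waiver threshold in images)
RATES = {
    "microfilm": (300, 5, 50_000),    # item = reel
    "unbound": (300, 5, 50_000),      # item = document
    "bound": (300, 15, 15_000),       # item = book or bound document
    "microfiche": (300, 15, 15_000),  # item = fiche
    "slides": (0, 70, 3_500),         # item = slide holder (no per-item charge)
}

# Per-image surcharge in cents for non-default output formats.
SURCHARGE_CENTS = {"jpeg": 0, "jpeg2000": 3, "tiff": 5}


def batch_cost_cents(media, images_per_item, file_format="jpeg",
                     include_admin_fee=True):
    """Estimate the cost of a batch in cents.

    images_per_item lists the image count of each reel, document,
    fiche or slide holder in the batch.
    """
    per_item, per_image, threshold = RATES[media]
    if media == "slides" and file_format == "jpeg2000":
        raise ValueError("JPEG2000 is not listed as an option for slides")
    total_images = sum(images_per_item)
    cost = per_item * len(images_per_item)
    cost += (per_image + SURCHARGE_CENTS[file_format]) * total_images
    # The fee is waived only for batches larger than the threshold.
    if include_admin_fee and total_images <= threshold:
        cost += ADMIN_FEE_CENTS
    return cost
```

With the administration fee excluded, batch_cost_cents("microfilm", [1000], include_admin_fee=False) returns 5300, matching the $53.00 reel example, and batch_cost_cents("slides", [30], include_admin_fee=False) returns 2100, matching the $21.00 slide holder example. With the fee included, the same single reel comes to 55300 cents ($553.00).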