{"id":33,"date":"2016-02-16T22:13:57","date_gmt":"2016-02-16T22:13:57","guid":{"rendered":"http:\/\/elizabethgrab.com\/?p=33"},"modified":"2018-03-20T19:34:15","modified_gmt":"2018-03-20T23:34:15","slug":"a-reaction-to-digitization-standards","status":"publish","type":"post","link":"https:\/\/elizabethgrab.com\/digital-humanities\/a-reaction-to-digitization-standards\/","title":{"rendered":"A Reaction to Image Digitization Standards"},"content":{"rendered":"

After reading through suggested standards for scanning processes based on each type of material or object, I can thoroughly understand why institutions, despite enthusiasm for access and digital humanities, might shy away from long-term and collection-wide scanning projects.

The Harry Ransom Center recently launched a digitization project entitled Project REVEAL (Read and View English and American Literature). As an institution with astounding resources available to it, the Center had the luxury of approaching the project as a model of ideal digitization workflows for future scanning endeavors both in-house and in the wider community. Part of Project REVEAL’s objective was to scan twenty-five of the HRC’s English & American author collections in a way that respects original order and disregards the traditional approach of only digitizing collection highlights. Going box by box, folder by folder, and item by item, the project digitized and provided detailed metadata at the most granular level.

Most organizations, however, do not have the time, finances, or support to accomplish such a dedicated attempt at digitizing their collections. Collections may not be processed to the item level. Oversized or delicate items may require more specialized handling and scanning than the facility can accommodate. The money supporting the digitization may only stretch far enough to allow the scanning of collection highlights.

To experiment with just how long a high-quality scan can take, I digitized a proofing press run of a block from Wellesley College’s Book Arts Lab collection on my HP Photosmart C4700 flatbed scanner. The 3×4 inch scrap of card, scanned at 4800 dpi, took 1 minute to scan and save. While my flatbed is aged and amateurish compared to newer, professional-grade scanners, scanning items at a dpi high enough to withstand resizing and to accommodate zooming features takes a marked amount of time, which increases many times over when digitizing larger items. Then there’s the concern of file size and type. I used a .jpeg format, since my purposes don’t require an uncompressed, high-quality .tiff. But any archival purpose would require that caliber of format. Opening and downloading higher-quality files is its own time commitment and can severely slow down the loading of the image or the webpage in which it’s embedded, even though the excellence of the scanned image is worth the wait.
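To put a rough number on the file-size concern, here is a minimal sketch of the arithmetic for the 3×4 inch card at 4800 dpi. The 24-bit color depth is my assumption (the bit depth of the scan isn't recorded above), and the figure ignores file headers and any compression, so treat it as a ballpark for an uncompressed file rather than a measurement.

```python
def uncompressed_scan_size(width_in, height_in, dpi, bits_per_pixel=24):
    """Return (pixel_width, pixel_height, size_in_bytes) for an uncompressed scan.

    bits_per_pixel=24 is an assumed default (8 bits each for red, green, blue).
    """
    px_w = round(width_in * dpi)   # pixels across
    px_h = round(height_in * dpi)  # pixels down
    size_bytes = px_w * px_h * bits_per_pixel // 8
    return px_w, px_h, size_bytes


# The 3x4 inch card scanned at 4800 dpi
px_w, px_h, size_bytes = uncompressed_scan_size(3, 4, 4800)
print(f"{px_w} x {px_h} pixels, about {size_bytes / 1024**2:.0f} MiB uncompressed")
```

Under those assumptions the card comes to 14,400 by 19,200 pixels, or roughly 791 MiB before compression, which gives a sense of why larger items and archival formats add up so quickly.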

[Image] This cut illustrates Wellesley seniors’ hoop rolling tradition. For more on how scanning settings influence image quality, see “The Image” section of Besser’s Introduction to Imaging (linked at beginning of post).
While these considerations might discourage most organizations from keeping up with the standards the HRC laid out in Project REVEAL, the project’s imaging and metadata standards are worth striving for, and they don’t differ much from the standards laid out by the Library of Congress and FADGI or from Besser’s recommendations. Without a quality image available for easy, detailed viewing (and potentially for download), a digitization project loses all significance.
These quality images don’t mean much without their metadata, though. Current technology remains incapable of auto-populating metadata fields better than a human viewer. [1] Consequently, to compare or search image files, one needs sufficiently descriptive metadata. Project REVEAL, under ‘Object Description,’ includes the fields Title, Creator, Date, Description, Subject (includes medium and format details), Subject, Language, Format, Extent (e.g., number of manuscript pages), Digital Object Type, Physical Collection, Collection Area, Digital Collection, Repository, Rights, Call Number, Series, Identifier, Finding Aid and File Name. For description of the image, rather than the object, Project REVEAL includes the categories of Title and File Name. The metadata sections also provide menus for Tags and Comments, which allow for user participation in making the object even more relevant to viewers. As an archival institution, the HRC emphasizes metadata representative of its storage structure, such as collection and series identifiers. While the HRC modified its metadata structure for its own purposes, the Library of Congress lays out general guidelines for digital material metadata that follow similar lines:
[Image: metadata] For more, click on the Library of Congress link at the top of this post.
More information on metadata standards will come in a later post.
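For anyone planning their own project, the Project REVEAL ‘Object Description’ fields listed earlier can be sketched as a simple record. This is only an illustration under my own assumptions: the class name, the snake_case field names, the `missing_fields` helper and the sample values are hypothetical, not the HRC’s actual schema or software.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ObjectDescription:
    """Hypothetical record mirroring the Object Description field names."""
    title: str = ""
    creator: str = ""
    date: str = ""
    description: str = ""
    subject: str = ""
    language: str = ""
    format: str = ""
    extent: str = ""  # e.g., number of manuscript pages
    digital_object_type: str = ""
    physical_collection: str = ""
    collection_area: str = ""
    digital_collection: str = ""
    repository: str = ""
    rights: str = ""
    call_number: str = ""
    series: str = ""
    identifier: str = ""
    finding_aid: str = ""
    file_name: str = ""
    # User-contributed menus
    tags: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of descriptive fields that are still empty."""
        user_fields = ("tags", "comments")
        return [
            f.name
            for f in fields(self)
            if f.name not in user_fields and not getattr(self, f.name)
        ]


# Sample values are invented for illustration only
record = ObjectDescription(title="Sample manuscript page", creator="Unknown")
print(f"{len(record.missing_fields())} fields still need a human describer")
```

A checklist like this makes the labor visible: every empty field is a piece of description that someone has to supply by hand.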
Digitization standards hold relevance beyond the institutional level, too. Individuals establishing an archive, such as visual artists, writers or genealogists, also require workflows for the digitization of their materials. While both *Introduction to Imaging* and “Becoming Digital” are oriented towards organizations, their suggestions can be instructive for project planning at an at-home level. When looking to digitize personal materials, the highest quality images might not be necessary (especially since materials at the highest resolution take quite a while on at-home scanners, as illustrated by my own experimentation). But *Introduction to Imaging*’s chapter on “The Image” is supremely instructive in identifying what an individual might require for their digitized images to serve their needs. The subsections on Bit Depth, Resolution and File Format illustrate what one will get out of using various levels of depth, size and compression. By literally illustrating the limits and advantages of each aspect of the scanning process, Besser saves individuals time and storage by allowing them to see what those images can do at each level. One need not reinvent the wheel by taking the time to experiment with every combination of settings personally.
For those working with text-heavy digitized objects, OCR (Optical Character Recognition) might be an avenue to consider before setting off on a digitization project. Unfortunately, even neatly handwritten materials don’t respond well to OCR programs, nor do items with variable formatting and fonts (like newspapers or mathematical treatises). But if your materials are typewritten, such as manuscripts produced on a typewriter, OCR might be worth exploring. “Becoming Digital” explains the OCR evaluation process quite well, including a breakdown of programs’ efficacy and price point, in its chapter on “How to Make Text Digital.” Impact’s Tools & Resources page on Tools for Text Digitization provides an even more in-depth breakdown of the options out there, though it does not provide pricing information.
While I can see the appeal of OCR for a larger typewritten collection that needs to be searchable down to the phrase, it seems less useful in virtually any other context, since the price of software, the time for processing and the effort for verification/correction result in an imbalance between energy input and product output. An answer to the instinct to OCR materials might instead be to be thoroughly descriptive in the metadata fields of Description, Subject and Tags.
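As a rough illustration of that alternative, here is a minimal sketch of keyword searching over the Description, Subject and Tags fields instead of over OCR’d text. The `search_records` function, the dictionary layout and the second sample record are hypothetical; the first record simply reuses details of my own test scan from earlier in this post.

```python
def _as_text(value):
    """Flatten a field value (string or list of strings) into one string."""
    if isinstance(value, (list, tuple)):
        return " ".join(value)
    return str(value)


def search_records(records, query, fields=("description", "subject", "tags")):
    """Return the records whose chosen metadata fields contain every query term."""
    terms = query.lower().split()
    matches = []
    for record in records:
        haystack = " ".join(_as_text(record.get(name, "")) for name in fields).lower()
        if all(term in haystack for term in terms):
            matches.append(record)
    return matches


records = [
    {
        "title": "Proofing press run of a block",
        "description": "Cut illustrating Wellesley seniors' hoop rolling tradition",
        "subject": "Printing blocks",
        "tags": ["Wellesley College", "Book Arts Lab"],
    },
    {
        # Invented example record
        "title": "Typewritten manuscript page",
        "description": "Draft page produced on a typewriter",
        "subject": "Manuscripts",
        "tags": ["typescript"],
    },
]

for hit in search_records(records, "hoop rolling"):
    print(hit["title"])
```

The search is only as good as the descriptions, which is the point: a few well-chosen words in these fields make an item findable without any OCR pass at all.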
*For more examples of digitization projects, take a look at sites like the Digitization Projects Registry to see how various organizations are implementing these technical suggestions.*
[1] MIT recently developed a program that could recognize if a painting was in the style of cubism, but it’s far less expensive to have a practiced eye tell you that in the same amount of time than to ask a computer to run the program, which isn’t equipped to identify every object, style or variation therein.
Instructional Resources for Digitization:
Howard Besser, ed. Sally Hubbard with Deborah Lenert. *Introduction to Imaging.* Getty Research Institute Publications, Revised 2003. http://www.getty.edu/research/publications/electronic_publications/introimages/index.html
Daniel J. Cohen and Roy Rosenzweig. “Becoming Digital.” Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web, 2006. http://chnm.gmu.edu/digitalhistory/digitizing/
FADGI (Federal Agencies Digitization Guidelines Initiative). http://www.digitizationguidelines.gov/
IMPACT (Improving Access to Text). http://www.digitisation.eu/training/recommendations-for-digitisation-projects/
Library of Congress, Recommended Format Specifications. http://www.loc.gov/preservation/resources/rfs/TOC.html","protected":false},"excerpt":{"rendered":"
After reading through suggested standards for scanning processes based on each type of material or object, I can thoroughly understand why institutions\u2014despite enthusiasm for access and digital humanities\u2014might shy away from long term and collection-wide scanning projects. The Harry Ransom Center recently launched a digitization project entitled<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[10,9],"_links":{"self":[{"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/posts\/33"}],"collection":[{"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/comments?post=33"}],"version-history":[{"count":12,"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/posts\/33\/revisions"}],"predecessor-version":[{"id":388,"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/posts\/33\/revisions\/388"}],"wp:attachment":[{"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/media?parent=33"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/categories?post=33"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/elizabethgrab.com\/wp-json\/wp\/v2\/tags?post=33"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}