How Materials are Made Available in the Virtual Vietnam Archive

The Virtual Vietnam Archive project began in 2001 as an effort to digitize the complete holdings of the Vietnam Archive. The staff of the Center and Archive have extensive experience with archives, digitization, and digital projects. Drawing on this experience, we have established a set of guidelines and standards suited to our project and our goals. We continually review our procedures, as well as industry standards, and adjust our practices as necessary.

Much of the digitization and metadata creation is done by students at Texas Tech University. These students come from a wide variety of backgrounds, but all have an interest in history, and many have a connection to the Vietnam War. Each student completes an extensive training program, and their work is monitored as part of the quality control process. In most cases, a single student is responsible for digitizing all of the documents in a given collection. We also employ students who focus exclusively on photographs, slides, or other special projects. Many students continue working with us until they graduate. Since the Virtual Archive project began, the Vietnam Center and Archive has employed more than 100 students, both undergraduate and graduate.

The Virtual Vietnam Archive is a continually evolving project. New materials are added daily, and we are always striving to make our resources more accessible and easier for researchers to use. If you have any comments or suggestions about the project, please use our online contact form or call us at 806-742-9010.

Digitization Workflow

The Vietnam Center and Archive has an extensive collection of digitization hardware, and all work is conducted in-house by students or by full-time faculty and staff. This section describes the digitization workflow for our most common types of materials. The metadata we collect for each item is described in the Metadata section below.

The Database

The Virtual Vietnam Archive is powered by the Cuadra Star database system. This flexible, customizable database has allowed us to tailor the Virtual Archive to the needs of our collections and our researchers, and to provide a variety of ways to access the digital materials. All items in a collection, copyrighted and non-copyrighted, are digitized, but only non-copyrighted items are available online to researchers. Personal information, such as addresses, phone numbers, and Social Security Numbers, is removed from all digital copies.
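The paragraph above does not say how personal information is located. As a minimal sketch, assuming the search runs over a document's OCR text and that Social Security Numbers and U.S. phone numbers follow their usual printed layouts, pattern matching can flag and mask likely matches. The patterns and the redact_personal_info function are hypothetical. Masking the OCR text does not alter the scanned image, and handwritten or badly scanned numbers will slip past, so a human review would still be required.

```python
import re

# Hypothetical patterns for the usual printed layouts of U.S. identifiers.
# They will miss handwriting, OCR errors, and unusual formats.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b")


def redact_personal_info(text: str, mask: str = "[REDACTED]") -> tuple[str, int]:
    """Mask SSN-like and phone-like strings in OCR text.

    Returns the redacted text and the number of substitutions made.
    This touches only the text layer, not the scanned page image.
    """
    text, ssn_count = SSN_PATTERN.subn(mask, text)
    text, phone_count = PHONE_PATTERN.subn(mask, text)
    return text, ssn_count + phone_count
```

For example, "SSN 123-45-6789, call 806-742-9010." becomes "SSN [REDACTED], call [REDACTED]." with two substitutions reported, which a reviewer could use to decide whether the page image also needs redaction.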

Document/Manuscript Digitization

The majority of the materials in the Virtual Vietnam Archive are printed documents. Our student staff digitize these materials and create their metadata records, and a full-time staff member quality-checks the digital files and database records before they are made available to researchers. Optical Character Recognition (OCR) is run on each document. The OCR text is added to the metadata record and is also embedded in the PDF file, so that words or phrases can be found within the file itself.

  • Equipment
    • Dell desktop computers w/ Microsoft Windows OS
    • Adobe Acrobat 9
    • Fujitsu fi-4220c scanners with both flatbed and automatic document feeders for items up to 8.5"x17"
    • Epson Expression 10000XL for items up to 11"x17"
  • Specifications
    • Master Copy - 300 dpi PDF
    • Access Copy - compressed PDF
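Cuadra Star's own indexing is not described here. Under that caveat, the general idea behind making OCR text searchable can be sketched as a simple inverted index that maps each word to the items whose OCR text contains it. The function names and item identifiers below are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware in Python 3, so precomposed accented letters stay inside words.
    return re.findall(r"\w+", text.lower())


def build_index(ocr_by_item: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of item identifiers whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for item_id, text in ocr_by_item.items():
        for term in tokenize(text):
            index[term].add(item_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the items containing every term in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

A query for "rolling thunder" returns only items containing both words. A production index would also handle phrase searches, stemming, and OCR errors, none of which this sketch attempts.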

Still Images (photographs, slides, and negatives)

The Virtual Vietnam Archive contains over 100,000 still images. Our student staff digitize these materials and create their metadata records, and a full-time staff member quality-checks the digital files and database records before they are made available to researchers.

  • Equipment
    • Dell desktop computers w/ Microsoft Windows OS
    • Adobe Photoshop Elements
    • Epson Perfection V700 for photographs and for negatives larger than 35mm
    • Epson Expression 10000XL for large images, up to 11"x17"
    • Nikon Super Coolscan 5000 for slides and for negatives 35mm or smaller
  • Specifications
    • Master Copy - 300 dpi TIFF
    • Access Copy - 72 dpi JPG
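The master and access specifications differ mainly in pixel count. As a worked example, assuming a hypothetical 8"x10" print scanned in 8-bit RGB, the arithmetic below compares the two copies. Note that a 72 dpi access copy is only smaller if the image is actually resampled to fewer pixels; changing the dpi tag alone leaves the pixel data unchanged.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a scan of a width_in x height_in inch original at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size, assuming 8-bit RGB (3 bytes per pixel) by default."""
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    master = pixel_dimensions(8, 10, 300)
    access = pixel_dimensions(8, 10, 72)
    print(master, raw_size_bytes(*master))  # (2400, 3000) 21600000
    print(access, raw_size_bytes(*access))  # (576, 720) 1244160
```

The raw pixel data shrinks by a factor of (300/72)^2, about 17, before JPEG compression reduces the access copy further.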

Large format items (documents or images)

The Vietnam Center and Archive recently purchased a CopiBook ONYX large-format scanner, capable of scanning items up to 17"x24" at 400 dpi. The scanner also features a self-balancing book cradle and can output a variety of file formats.

Audio and Moving Images

The holdings of the Vietnam Archive include a wide variety of audio/visual formats. These materials are digitized by a full-time staff member, who also creates the metadata record, including an abstract of the item.

  • Equipment
    • Dell desktop and workstation computers w/ Microsoft Windows OS
    • Otari Digital Archive System (DAS) for audio
    • DV8 Sniper HD Telecine and Sniper 16 HD Telecine for moving images (available from MovieStuff)
    • Elmo 8mm, Super 8mm, and 16mm projectors with S-Video output for moving images
    • Sony DV player for DV and miniDV
    • Various DVD, VCR, SVCR, and Betacam machines
    • Adobe Premiere
    • Adobe Audition
    • Pinnacle Studio
  • Specifications
    • Master Copy, Moving Image - AVI
    • Master Copy, Audio - WAV
    • Access Copy, Moving Image - WMV, MP4, OGV (Theora), WEBM
    • Access Copy, Audio - MP3
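Keeping WAV masters and MP3 access copies is largely a storage and bandwidth trade-off. The document does not state sample rates or bitrates, so the sketch below assumes CD-quality masters (44.1 kHz, 16-bit, stereo) and 128 kbps MP3 access copies purely for illustration.

```python
def pcm_bytes(seconds: float, sample_rate: int = 44_100, bit_depth: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio data, excluding the small WAV header."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def cbr_bytes(seconds: float, kbps: int = 128) -> int:
    """Approximate size of a constant-bitrate stream such as an MP3 access copy."""
    return int(seconds * kbps * 1000 // 8)


if __name__ == "__main__":
    hour = 3600
    print(pcm_bytes(hour))  # 635040000
    print(cbr_bytes(hour))  # 57600000
```

Under these assumptions, an hour of master audio takes roughly 635 MB, about eleven times the size of its MP3 access copy.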

Oral Histories

Oral history interviews are currently recorded on flash media, although early interviews were recorded on cassette or MiniDisc. Interviews are usually conducted by phone by Vietnam Archive faculty, who also create the metadata records. Student employees transcribe each interview, and the transcript is then reviewed by the interviewee. Another student conducts a final round of edits, and the faculty member who conducted the interview reviews the transcript one last time before it is made available to the public. More than 800 interviews are available online, most including both the audio and a full transcript in PDF format.

  • Equipment
    • Dell Desktop w/ Microsoft Windows OS
    • Marantz recorders
    • Adobe Audition
  • Specifications
    • Master Copy, Audio - WAV
    • Access Copy, Audio - MP3
    • Master Copy, Transcript - MS Word
    • Access Copy, Transcript - PDF
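The transcript review sequence described above is strictly ordered, so it can be modeled as a simple linear workflow. The sketch below is hypothetical: the Stage names and helper functions are not part of any system the Archive describes. It only encodes the order of the steps and the rule that nothing is published before the interviewing faculty member's final review.

```python
from enum import Enum


class Stage(Enum):
    RECORDED = "recorded"
    TRANSCRIBED = "transcribed by a student"
    INTERVIEWEE_REVIEWED = "reviewed by the interviewee"
    STUDENT_EDITED = "final edits by a second student"
    FACULTY_REVIEWED = "last review by the interviewing faculty member"
    PUBLISHED = "available to the public"


# Enum iteration follows definition order, which matches the workflow order.
ORDER = list(Stage)


def next_stage(current: Stage) -> Stage:
    """Advance a transcript one step; a published transcript stays published."""
    i = ORDER.index(current)
    return ORDER[min(i + 1, len(ORDER) - 1)]


def can_publish(current: Stage) -> bool:
    # Only a transcript that has passed the faculty member's last review may go public.
    return current is Stage.FACULTY_REVIEWED
```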

Artifacts

The Vietnam Archive houses a collection of more than 4,000 artifacts. Student staff members digitize the artifacts and create their metadata records.

  • Equipment
    • Nikon D80 Digital Camera
    • Fujitsu fi-4220c flatbed scanner
    • Adobe Photoshop Elements
  • Specifications
    • 300 dpi JPG

Microfilm

The Vietnam Archive has digitized more than 5 million pages of microfilm, but because of the processing time involved, only a small portion has been made available online. Student staff members create the metadata records.

  • Equipment
    • NextScan Eclipse 300 Microfilm Scanner (300 ppm) with Microsoft Windows XP
  • Specifications
    • Master Copy - 300 dpi TIFF of each page
    • Access Copy - PDF
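Scanner speed alone does not explain the gap between pages scanned and pages online. Assuming the "300 ppm" rating means a sustained 300 pages per minute, the arithmetic below estimates raw capture time for 5 million pages; it ignores reel changes, setup, and image review.

```python
def scan_hours(pages: int, pages_per_minute: float) -> float:
    """Hours of continuous scanning at a sustained rated speed."""
    return pages / pages_per_minute / 60


if __name__ == "__main__":
    print(f"{scan_hours(5_000_000, 300):.0f} hours")  # 278 hours
```

Roughly 278 hours of scanner time is modest next to the work of describing and checking millions of pages, which is consistent with processing time, rather than capture, being the limiting factor.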

Maps, Posters, and Other Oversized Items

More than 1,000 maps and other large items have been digitized and made available online. Only non-fragile items are digitized. Digitization and metadata record creation are performed by full-time staff members.

  • Equipment
    • HP DesignJet 815mfp with Windows OS
    • Adobe Photoshop
    • Adobe Acrobat
  • Specifications
    • Master Copy - 300 dpi TIFF or PDF
    • Access Copy - PDF and JPG

Servers and Storage

The Vietnam Center and Archive maintains a number of servers, storage devices, and backup systems. Three primary servers are in use: one for website and file access, and two running redundant copies of the database. Records are created on one server, and new records are transferred to the public access server nightly. Because the servers are redundant, users can be shifted seamlessly to an alternate server if one becomes unavailable or fails, ensuring near-continuous access to the Virtual Vietnam Archive. An additional 60TB of near-line storage is available through a Storage Area Network (SAN). All materials are backed up on magnetic tape using Dell robotic tape libraries. More about our backup system can be found in the Digital Preservation section below.

  • Equipment
    • Dell PowerEdge Servers
    • Dell EMC CX300 and Dell EMC CX4-120 Storage Arrays
    • PowerVault 136T and ML6020 tape libraries
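The nightly transfer of new records from the record-creation server to the public access server can be sketched as a high-water-mark sync: each night, send every record whose timestamp is later than the last successful transfer, then advance the mark. The Record type and its fields are hypothetical; the Archive does not describe how Cuadra Star performs the transfer.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    item_id: str          # hypothetical identifier
    updated_at: datetime  # assumed to be set whenever a record is created or edited


def nightly_batch(records: list[Record], last_transfer: datetime) -> tuple[list[Record], datetime]:
    """Return records changed since last_transfer (oldest first) and the new high-water mark."""
    batch = sorted(
        (r for r in records if r.updated_at > last_transfer),
        key=lambda r: r.updated_at,
    )
    # If nothing changed, keep the old mark so the next run checks the same window.
    new_mark = batch[-1].updated_at if batch else last_transfer
    return batch, new_mark
```

A real sync would also need to propagate deleted or withdrawn records, which this sketch ignores.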

Metadata

The Vietnam Center and Archive records an extensive amount of information about each item in the database. To develop our metadata list, we started with the Dublin Core Metadata Element Set and then added our own fields customized to the types of materials and the subject matter covered. The following list shows the primary metadata fields we collect; not all fields are used for every media type. Additional metadata about some items and material types may be collected and stored in databases that are not accessible to the public, and the files themselves for some items may include embedded metadata.

The Virtual Vietnam Archive index currently contains over 20 million searchable terms.

  • Item Title
  • Author or Creator
  • Collection Name (source)
  • Veterans Association
  • Media Format (Document, Photograph, etc), and specific format (8mm, 16mm, etc)
  • Unique Item Number (identifier)
  • Collection Number (internally assigned)
  • Unique donor ID (internally assigned)
  • Box, Folder, and item within the folder numbers
  • Is item copyrighted?
  • Creation Date
  • Date span covered by the item (coverage)
  • Language(s) of the item
  • Title translation, if the title is in a language other than English
  • DPI of digital file
  • Software used in digital file creation
  • Full text of item
  • Description, caption, or abstract
  • Descriptive subject keywords
  • Is this item a duplicate of another item in the collection?
  • Is this item related to other items in the collection (relation)
  • Donation credit line, or contributor
  • Copyright statement (rights or publisher)
  • Number of pages
  • AV length
  • Condition of physical item
  • Linear Feet of Collection
  • Location or call number of physical item
  • Reason the item will not be available online to researchers, if applicable
  • Whether personal information has been removed from the digital copy
  • Fields specific to Maps:
    • Country
    • Series
    • Scale
    • Edition
    • Contour Interval
    • Geographic Features
    • Latitude/Longitude
    • Military Grid Zones
  • Fields specific to Finding Aids:
    • Linear Feet of Collection
    • Scope and Content
    • Biographical Note or Administrative History
    • Access Level
    • Collection Inventory
    • Accession Numbers
    • Digitized Materials?
    • EAD Version Available?
  • Fields specific to Oral Histories:
    • Interviewer
    • Date transcription completed
    • Transcription software
  • Military information about the donor or interviewee (such as branch, rank, unit, and awards)
  • Digital File Hashes
  • Record creator
  • Record updater
  • History of record updates
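Several fields in the list above carry their Dublin Core element name in parentheses. As a sketch of how a record might be exported, the hypothetical ItemRecord class below holds a small subset of the fields and maps them to Dublin Core elements; where the list gives no Dublin Core name (title, creator, date, language, subject, description), the mapping is an assumption. The copyrighted flag has no Dublin Core counterpart and is used here only to decide online availability.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    # A small, hypothetical subset of the fields listed above.
    title: str
    creator: str
    identifier: str           # Unique Item Number
    collection_name: str      # Collection Name (source)
    creation_date: str
    coverage: str             # date span covered by the item
    languages: list[str] = field(default_factory=list)
    subjects: list[str] = field(default_factory=list)
    description: str = ""
    rights: str = ""          # copyright statement
    copyrighted: bool = False  # archive-specific, no Dublin Core equivalent

    def to_dublin_core(self) -> dict[str, list[str]]:
        """Map fields onto Dublin Core element names, omitting empty elements."""
        dc = {
            "title": [self.title],
            "creator": [self.creator],
            "identifier": [self.identifier],
            "source": [self.collection_name],
            "date": [self.creation_date],
            "coverage": [self.coverage],
            "language": list(self.languages),
            "subject": list(self.subjects),
            "description": [self.description],
            "rights": [self.rights],
        }
        return {k: [v for v in vals if v] for k, vals in dc.items() if any(vals)}

    def available_online(self) -> bool:
        # Copyrighted items are withheld; other withholding reasons are not modeled here.
        return not self.copyrighted
```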

Digital Preservation

The Vietnam Center and Archive has devoted extensive resources to developing a comprehensive digital preservation and disaster recovery plan, consisting of numerous elements.

Disaster Recovery

The first stage of disaster recovery is redundant backup servers. Copies of all access versions of digital files, as well as of the database itself, are maintained on two or more servers. In the event of a hardware failure, the redundant server will take over the research access load.
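The automatic takeover described above amounts to trying servers in a fixed order of preference. A minimal sketch, with hypothetical server names and a caller-supplied health check standing in for whatever monitoring the Archive actually uses:

```python
from typing import Callable, Sequence


def pick_server(servers: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first server, in order of preference, that passes the health check."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no database server is available")


if __name__ == "__main__":
    down = {"db-primary"}  # hypothetical outage
    print(pick_server(["db-primary", "db-replica"], lambda s: s not in down))  # db-replica
```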

The next stage of protection is backups on magnetic tape. Nightly backups are run for all new and changed files, along with weekly backups of all data. One complete set of backups is stored offsite at all times.
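The nightly-plus-weekly schedule can be illustrated with a small file selector: the weekly run copies every file, and other nights copy only files modified since the last backup. This is a sketch under stated assumptions (file modification times are reliable, and the full backup runs on Sunday); the Archive's tape backup software is not described here and may track changes differently.

```python
import os
from pathlib import Path


def files_to_back_up(root: Path, last_backup: float, full: bool) -> list[Path]:
    """Files under root for a full (weekly) or incremental (nightly) backup.

    last_backup is a POSIX timestamp; an incremental run keeps only files
    whose modification time is later than it.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if full or path.stat().st_mtime > last_backup:
                selected.append(path)
    return sorted(selected)


def is_full_backup_day(weekday: int, full_day: int = 6) -> bool:
    """weekday uses datetime.weekday() numbering (Monday is 0, Sunday is 6)."""
    return weekday == full_day
```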

Many digital files are also burned to gold-based CDs or DVDs and stored in our climate controlled stacks.
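The metadata list includes digital file hashes, which make it possible to confirm that a tape or optical-disc copy is still identical to the original. The hash algorithm is not specified, so the sketch below assumes SHA-256 from Python's hashlib, with hypothetical helpers that compare a copy against the hash stored in its record.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters (TIFF, WAV, AVI) need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(copy_path: Path, recorded_hash: str) -> bool:
    """True if a backup or disc copy still matches the hex hash stored in its record."""
    return sha256_of(copy_path) == recorded_hash.lower()
```

Running such a check periodically against the disc and tape copies would detect silent corruption before the copy is needed for recovery.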

Digital Preservation

Although we are confident that the file formats we have chosen for master copies will remain viable standards for many years to come, the Vietnam Center and Archive is committed to migrating to newer formats as necessary.

Future Goals

The Vietnam Center and Archive is currently exploring the possibility of utilizing cloud storage for both digital preservation and disaster recovery. Ideally, all digital files on our servers, including website files, access and master copies of digital materials, database files (including the installation programs for the database), and disk images of all servers would be stored in a data storage location outside of Texas, ensuring that if something catastrophic occurred to our physical location, the Virtual Archive could continue to exist on the internet.

Physical Materials

All physical materials remain open and accessible to researchers in our reading room. The primary purpose of our digitization effort is to provide access to materials, not to preserve them. The digital copies do, however, reduce the need to handle the physical originals, which helps extend their life span.