Texas Tech University

How Materials are Made Available in the Virtual Vietnam Archive

The Virtual Vietnam Archive project began in 2001 as an effort to digitize the complete holdings of the Vietnam Archive. The staff of the Center and Archive have extensive experience with archives, digitization, and digital projects. Using this experience, we have established a set of guidelines and standards that best fit our project and our goals. We constantly review our procedures, as well as industry standards, and adjust our practices as necessary.

Much of the digitization and metadata creation is done by students of Texas Tech University. These students come from a wide variety of backgrounds, but all have an interest in history, and many have a connection to the Vietnam War. Each student undergoes an extensive training program, and their work is monitored as part of the quality control process. In most cases, the same student will be responsible for digitizing all of the documents from a single collection. We also employ students who focus exclusively on photographs or slides, or other special projects. Many of the students continue working until they graduate from the university. Since the beginning of the Virtual Archive project, the Vietnam Center and Sam Johnson Vietnam Archive has employed over 100 students, a mixture of undergraduate and graduate students.

The Virtual Vietnam Archive is a continually evolving project. New materials are added daily, and we are always striving to make our resources more available and easier to access for researchers. If you have any comments or suggestions about the project, please feel free to use our online contact form, or call us at 806-742-9010.

Digitization Workflow

The Vietnam Center and Sam Johnson Vietnam Archive has an extensive collection of hardware available for digitization, and all work is conducted in-house by students or by full-time faculty or staff. This section describes the digitization workflow for our most common types of materials. The metadata we collect for each item is described in the next section.

The Database

The Virtual Vietnam Archive is powered by the Cuadra Star database system. This flexible database program has allowed us to tailor the Virtual Archive to the needs of our collections and our researchers, and to provide a variety of ways to access the digital materials. All items in a collection, copyrighted and non-copyrighted, are digitized, but only non-copyrighted items are available online to researchers. Personal information, such as addresses, phone numbers, and Social Security Numbers, is removed from all digital copies.

Document/Manuscript Digitization

The majority of the materials in the Virtual Vietnam Archive are printed documents. These materials are digitized, and metadata records are created, by our student staff. Digital files and database records are quality controlled by a full-time staff member before they are made available to researchers. Optical Character Recognition (OCR) is run for each document. This OCR text is added to the metadata record, and is also embedded into the PDF file so that words or phrases can be searched for within the file itself.

Still Images (photographs, slides, and negatives)

The Virtual Vietnam Archive contains over 100,000 still images. These materials are digitized, and metadata records are created, by our student staff. Digital files and database records are quality controlled by a full-time staff member before they are made available to researchers.

Large format items (documents or images)

The Vietnam Center and Sam Johnson Vietnam Archive recently purchased a CopiBook ONYX large format scanner, capable of scanning 17"x24" at 400dpi. The scanner also features a self-balancing book cradle and can output a variety of formats.
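As a rough worked example of what a full 17"x24" scan at 400dpi means in pixels, the short Python sketch below does the arithmetic. The function names are illustrative, and the file size estimate assumes an uncompressed 24-bit RGB image (3 bytes per pixel), which is an assumption rather than a description of the files the scanner actually produces.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a scan of the given size and resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_mb(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> float:
    """Approximate uncompressed size in megabytes (10**6 bytes).

    The default of 3 bytes per pixel assumes 24-bit RGB (an assumption).
    """
    return width_px * height_px * bytes_per_pixel / 1_000_000


width_px, height_px = scan_pixels(17, 24, 400)
print(width_px, height_px)                        # 6800 9600
print(uncompressed_size_mb(width_px, height_px))  # 195.84
```

Under those assumptions, a single full-bed scan is about 65 megapixels and close to 196 MB before any compression.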

Audio and Moving Images

The holdings of the Vietnam Archive contain a wide variety of audio/visual formats. These materials are digitized by a full-time staff member, who also creates the metadata record, including an abstract of the item.

Oral Histories

Oral History interviews are currently recorded on flash media, although early interviews were recorded on cassette or mini-disc. Interviews are usually conducted by phone by Vietnam Archive faculty, who also create the metadata records. Student employees transcribe each interview, and the transcript is then reviewed by the interviewee. Another student conducts a final round of edits, and then the faculty member who conducted the interview reviews the transcript one last time before it is made available to the public. Over 800 interviews are available online, most including both the audio and a full transcript in PDF format.

Artifacts

The Vietnam Archive houses a collection of over 4000 artifacts. Artifacts are digitized, and metadata records are created, by student staff members.

Microfilm

The Vietnam Archive has digitized over 5 million pages of microfilm, but due to the processing time involved, only a small portion has been made available online. Metadata record creation is performed by student staff members.

Maps, Posters, and Other Oversized Items

Over 1000 maps and other large items have been digitized and made available online. Only non-fragile items are digitized. Digitization and metadata record creation are performed by full-time staff members.

Servers and Storage

The Vietnam Center and Sam Johnson Vietnam Archive maintains a number of servers, storage devices, and backup systems. Three primary servers are in use: one for website and file access, and two running redundant copies of the database. Record creation is conducted on one server, and new records are transferred to the public access server nightly. Redundancy of the servers allows for seamless transition of users to alternate servers in case of unavailability or failure of a server, ensuring near continuous access to the Virtual Vietnam Archive. 60TB of near-line storage is also available through a Storage Area Network (SAN). All materials are backed up on magnetic tape using Dell Robotic Backup Libraries. More about our backup system can be found in the Digital Preservation section below.
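The nightly transfer of new records can be pictured with a minimal Python sketch. It is only an illustration of the idea, not the Cuadra Star mechanism: the record layout (an "id" and a "created" timestamp), the in-memory stores, and the function names are all hypothetical.

```python
from datetime import datetime


def records_to_transfer(records: list[dict], last_transfer: datetime) -> list[dict]:
    """Select records created after the previous nightly transfer."""
    return [r for r in records if r["created"] > last_transfer]


def nightly_transfer(working: list[dict], public: dict[str, dict],
                     last_transfer: datetime) -> int:
    """Copy new records into the public store, keyed by record ID.

    Returns the number of records copied.
    """
    new_records = records_to_transfer(working, last_transfer)
    for record in new_records:
        public[record["id"]] = record
    return len(new_records)


# Hypothetical sample data: one record predates the last transfer, one does not.
working_server = [
    {"id": "rec-001", "created": datetime(2024, 1, 1, 9, 0)},
    {"id": "rec-002", "created": datetime(2024, 1, 2, 9, 0)},
]
public_server: dict[str, dict] = {}
copied = nightly_transfer(working_server, public_server, datetime(2024, 1, 1, 23, 0))
print(copied, sorted(public_server))  # 1 ['rec-002']
```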

Metadata

The Vietnam Center and Sam Johnson Vietnam Archive includes an extensive amount of information about each item in the database record for that item. To develop our metadata list, we started with the Dublin Core Metadata Element Set. We then added our own metadata fields customized to the types of materials and the subject matter covered. The following list shows the primary metadata fields we collect. Note that not all fields are used for every media type. Additional metadata about some items and material types may be collected and stored in databases that are not accessible to the public. Additionally, the files themselves for some items may include embedded metadata.
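One way to picture a record that starts from Dublin Core and adds local fields is the Python sketch below. The fifteen element names are those of the Dublin Core Metadata Element Set; the class, its methods, and the example local field name are hypothetical and do not reflect the actual field names used in our database.

```python
from dataclasses import dataclass, field

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


@dataclass
class ItemRecord:
    """One item's metadata: Dublin Core elements plus locally defined fields."""
    dublin_core: dict[str, str] = field(default_factory=dict)
    local_fields: dict[str, str] = field(default_factory=dict)

    def set_core(self, element: str, value: str) -> None:
        """Set a Dublin Core element, rejecting names outside the element set."""
        if element not in DUBLIN_CORE_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        self.dublin_core[element] = value

    def missing_core(self) -> list[str]:
        """Dublin Core elements not yet filled in, in standard order."""
        return [e for e in DUBLIN_CORE_ELEMENTS if e not in self.dublin_core]


record = ItemRecord()
record.set_core("title", "Example report")        # placeholder value
record.set_core("format", "application/pdf")
record.local_fields["ocr_text"] = "..."            # hypothetical local field name
print(len(record.missing_core()))                  # 13
```

Keeping the local fields separate from the Dublin Core elements is a design choice made for this sketch; it makes it easy to see which fields come from the standard and which are additions.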

The Virtual Vietnam Archive index currently contains over 20 million searchable terms.

Digital Preservation

The Vietnam Center and Sam Johnson Vietnam Archive has devoted extensive resources to developing a comprehensive digital preservation and disaster recovery plan, consisting of numerous elements.

Disaster Recovery

The first stage of disaster recovery is redundant backup servers. Copies of all access versions of digital files, as well as of the database itself, are maintained on two or more servers. In the event of a hardware failure, the redundant server will take over the research access load.
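The failover idea can be sketched in a few lines of Python. This is a simplified illustration rather than our actual configuration: the server names and the health check are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence


def pick_server(servers: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first server that passes the health check, or None if all are down."""
    for server in servers:
        if is_up(server):
            return server
    return None


# Hypothetical server names, listed in order of preference.
servers = ["primary", "redundant"]

# Stand-in health check: pretend the primary server has failed.
down = {"primary"}
print(pick_server(servers, lambda s: s not in down))  # redundant
```

Because the servers are tried in order, research traffic returns to the preferred server as soon as it passes the health check again.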

The next stage of protection is backups on magnetic tape. Nightly backups are run for all new and changed files, along with weekly backups of all data. One complete set of backups is stored offsite at all times.
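The difference between the nightly and weekly backups can be sketched as follows. This Python example is an illustration only: the function names are hypothetical, modification times are plain numbers, and the choice of Sunday for the weekly full backup is an assumption made for the example.

```python
from datetime import date


def backup_kind(day: date, full_weekday: int = 6) -> str:
    """Return 'full' on the weekly full-backup day, otherwise 'incremental'.

    full_weekday uses date.weekday() numbering (Monday=0 ... Sunday=6);
    Sunday is an assumed default.
    """
    return "full" if day.weekday() == full_weekday else "incremental"


def files_to_back_up(mtimes: dict[str, float], last_backup: float, kind: str) -> list[str]:
    """All files for a full backup; only files modified since last_backup otherwise."""
    if kind == "full":
        return sorted(mtimes)
    return sorted(path for path, mtime in mtimes.items() if mtime > last_backup)


# Hypothetical file names mapped to modification times.
mtimes = {"a.pdf": 100.0, "b.tif": 250.0, "c.mp3": 300.0}
print(files_to_back_up(mtimes, last_backup=200.0, kind="incremental"))  # ['b.tif', 'c.mp3']
print(files_to_back_up(mtimes, last_backup=200.0, kind="full"))         # ['a.pdf', 'b.tif', 'c.mp3']
```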

Many digital files are also burned to gold-based CDs or DVDs and stored in our climate controlled stacks.

Digital Preservation

Although we are confident that the file formats we have chosen for master copies will remain available standards for many years to come, the Vietnam Center and Sam Johnson Vietnam Archive is committed to migrating to newer formats as necessary.

Future Goals

The Vietnam Center and Sam Johnson Vietnam Archive is currently exploring the possibility of utilizing cloud storage for both digital preservation and disaster recovery. Ideally, all digital files on our servers, including website files, access and master copies of digital materials, database files (including the installation programs for the database), and disk images of all servers would be stored in a data storage location outside of Texas, ensuring that if something catastrophic occurred to our physical location, the Virtual Archive could continue to exist on the internet.

Physical Materials

All physical materials remain open and accessible to researchers in our reading room. The primary focus of our digitization effort is to provide access to materials, not to serve as a preservation method. The digital copies will, however, reduce the need for handling or accessing the physical originals, helping extend their life span.