These policies describe the file formats preserved by Duke University Libraries (DUL). Use them to recommend the best formats to our campus and library communities.
File formats are categorized into three preservation levels: recommended, acceptable, and not recommended.
We are reasonably confident that recommended formats can be preserved for long-term use. University offices and depositors should use these formats when transferring or depositing information into DUL systems, as these formats have the best chance for future use. We list these in alphabetical order, as a file's intended use likely governs the best format.
Long-term preservation of acceptable formats may require additional resources from DUL. It is less certain, though still likely, that these formats can be preserved over time. University offices and depositors may transfer or deposit materials into DUL systems in these formats if needed, particularly if there is no feasible way to migrate material to a recommended format prior to transfer.
Not recommended formats cannot be permanently preserved without significant additional resources from DUL. Many of these formats also require proprietary software to render and preserve material. University offices and depositors should avoid transferring or depositing materials into DUL systems in these formats if at all possible. For these formats, we can ensure bit-level preservation, but we cannot guarantee future use.
|Format Type||Recommended||Acceptable||Not Recommended|
|Text||ANSI X3.4/ECMA-6/ US-ASCII (7-bit); EPUB (unencrypted) (*.epub); Open Office (*.sxw/*.odt); PDF/UA; PDF/A-1; PDF/A-1a; PDF/A-1b; PDF/A-2; PDF/A; Plain text: UTF-8, UTF-16 with BOM; SGML; XML and XML-based markup formats||PDF; Postscript; Rich Text Format 1.x (*.rtf); SGML (*.sgml); HTML; Cascading Style Sheets (*.css); DTD (*.dtd); LaTeX; TeX; OOXML (ISO/IEC DIS 29500) (*.docx)||PDF with embedded fonts or encryption; PDF/A-3; DVI (*.dvi); Microsoft Word (*.doc); WordPerfect (*.wpd); Google Docs|
|Image||JPEG; PNG; SVG; TIFF||Bitmap or BMP (*.bmp); Computer Graphic Metafile (*.cgm); Digital Negative DNG (*.dng); GIF (*.gif); Google WebP (*.webp); JPEG 2000; JPEG/JFIF (*.jpg); JPEG2000 (lossy) (*.jp2)||Encapsulated Postscript (*.eps, *.epsf, *.ps); FlashPix (*.fpx); GIF; JPEG 2000 Part 2 (*.jpf, *.jpx); Macromedia Flash; MrSID (*.sid); Photo CD (*.pcd); Photoshop (*.psd, *.psb [Large doc format], *.acv, *.atf [Photoshop Curve File]); RAW; TIFF (Planar Format)|
|Audio||AES3 (*.aes); AIFF (*.aif, *.aiff); Broadcast WAV (BWF); WAV||Advanced Audio Coding (*.mp4, *.m4a, *.aac); Free Lossless Audio Codec (*.flac); MP3 (MPEG-1/2, Layer 3) (*.mp3); Ogg Vorbis (*.ogg); Standard MIDI (*.mid, *.midi); SUN Audio/Basic||AIFC (compressed) (*.aifc); NeXT SND (*.snd); Protected AAC (*.m4p); RealNetworks 'Real Audio' (*.ra, *.rm, *.ram); WAV; Windows Media Audio (*.wma)|
|Video||AVI (Audio Video Interleave) (*.avi); Motion JPEG 2000 (ISO/IEC 15444-4); Motion JPEG 2000 (*.mj2); QuickTime Movie (*.mov); MPEG-4; FFV1||DV; MPEG-1; MPEG-2; Matroska Multimedia Container (*.mkv)||Ogg Theora (*.ogg); RealNetworks 'Real Video' (*.rv, *.rm); Windows Media Video (*.wmv); Flash Video (*.flv); Macromedia Flash (*.swf); Advanced Authoring Format Object (*.aaf); Material Exchange Format (*.mxf); Digital Cinema Initiative Distribution Master (DCDM); Apple ProRes; Cinepak; DivX; Digital Moving Picture Exchange (DPX)|
|Presentation||Dependent upon file contents||PDF/A; Microsoft PowerPoint OOXML (.pptx); Open Office (.sxi and .odp)||Microsoft PowerPoint (.ppt)|
|Structured data||CSV; TXT; SQL DDL||Microsoft Excel (.xlsx); OOXML; Open Office; DBF; JSON; R||Microsoft Excel (.xls); SPSS (.por and .sav); SAS; Stata; MATLAB|
|Email||MBOX; Internet Message Format (EML)||Email Account XML Schema||Microsoft Outlook Item (*.msg); Personal Storage Table (*.pst); Outlook for Mac (*.olm)|
|Disk-Image||Expert Witness forensically-packaged disk image; Raw (*.000, *.img, *.dd, *.dmg) disk image||ISO9660||Advanced Forensics Format forensically packaged disk image (*.aff)|
|Web archive format||Mozilla Archive Format (MAFF) (*.mar); Web Archive File (WARC) (*.warc)||--||--|
|Archive file format||ZIP (*.zip); Tape Archive (*.tar)||Unix Archiver (*.a, *.ar); CPIO (*.cpio); gzip (*.gz); lzip (*.lz); lzma (*.lzma); 7z (*.7z)||Shell archive (*.shar); bzip2 (*.bz2); lzop (*.lzo); 7zX (*.s7z); RAR (*.rar); FreeArc (*.arc); Cabinet (*.cab); eXtensible Archive format (*.xar)|
|Other||XD3 (unknown or unassigned file extension)||VRML; U3D; Computer Pgm Source Code (*.c, *.c++, *.java, *.js, *.jsp, *.php, *.pl, etc)||Compiled or Executable Files (EXE, *.class, COM, DLL, BIN, DRV, OVL, SYS, PIF)|
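As a rough illustration of how the table might be applied when screening files before deposit, the following Python sketch maps a file extension to a preservation level. It is not a DUL tool: the `PRESERVATION_LEVELS` dictionary and the `preservation_level` function are hypothetical names, and the dictionary holds only a small subset of the table. An extension alone cannot distinguish some entries (for example, PDF versus PDF/A, or formats that appear in more than one column), so a result from a lookup like this is a first pass, not a determination.

```python
from pathlib import PurePath

# Hypothetical, partial mapping drawn from the table above.
# Extensions listed under more than one level (e.g. *.ogg, *.rm)
# are deliberately left out because the extension alone is ambiguous.
PRESERVATION_LEVELS = {
    ".epub": "recommended",
    ".odt": "recommended",
    ".zip": "recommended",
    ".tar": "recommended",
    ".warc": "recommended",
    ".rtf": "acceptable",
    ".docx": "acceptable",
    ".flac": "acceptable",
    ".mp3": "acceptable",
    ".mkv": "acceptable",
    ".gz": "acceptable",
    ".doc": "not recommended",
    ".psd": "not recommended",
    ".wma": "not recommended",
    ".pst": "not recommended",
    ".rar": "not recommended",
}


def preservation_level(filename: str) -> str:
    """Return the preservation level for a filename, or "unknown".

    Only the final extension is checked, case-insensitively.
    """
    suffix = PurePath(filename).suffix.lower()
    return PRESERVATION_LEVELS.get(suffix, "unknown")


if __name__ == "__main__":
    for name in ["Report.DOCX", "thesis.doc", "interview.flac", "notes.xyz"]:
        print(f"{name}: {preservation_level(name)}")
```

Files that come back as "unknown" or "not recommended" are candidates for migration to a recommended format before transfer, where that is feasible.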
About these policies
The Libraries work with faculty and staff throughout the University to identify and permanently preserve records, data, documents, and other information assets. This document was created by the Digital Preservation Working Group (DPWG) to identify the types of formats that are preserved by Duke University Libraries and provide format recommendations for preservation for both the campus and library communities.
To prepare this document, DPWG members reviewed best practices regarding preservation formats at a number of institutions, including the National Archives and Records Administration, the Library of Congress, and fellow academic institutions. For these preservation formats, we can ensure bit-level preservation, but we cannot guarantee future use. After reviewing this list and considering the specific requirements of the DUL systems, the DPWG categorized the file formats into the three preservation levels described above.
How did we determine preservation levels for a format?
We considered many factors to determine the best preservation level for a given format, including:
- Whether the format employs lossless or lossy compression
- Whether the format relies on open source rather than proprietary software
- How widespread the format is
- Whether files in the format can contain embedded files (e.g., a spreadsheet in a PowerPoint presentation that depends on Excel to open; a PDF containing a QuickTime movie)
What is the scope of these policies?
These policies apply to the following DUL systems:
- Duke Digital Repository
- Research Data Repository
Product and functional managers of these spaces can modify these recommendations to best suit the needs of their systems and user communities.
What is our plan for sustaining these policies?
The Digital Preservation Working Group (DPWG) will review these policies every two years. Changes will then be submitted to the Digital Preservation and Publication Program (DP3) for approval. If you have recommendations for this document, please email your suggestions to firstname.lastname@example.org.
For information about preservation formats within specific applications, contact Duke Digital Repository (email@example.com), Research Data Repository (firstname.lastname@example.org), or DukeSpace (email@example.com).