
These policies describe the file formats preserved by Duke University Libraries (DUL) and recommend the best formats for our campus and library communities to use. To learn more about digital preservation at DUL, please view our Digital Preservation Guide and the Digital Preservation Policy.

Preservation Levels

File formats are categorized into three preservation levels: Level One, Level Two, and Level Three.

Level One

We are reasonably confident that these Level One formats can be preserved for long-term use. University offices and depositors should use these formats when transferring or depositing information into DUL systems, as these formats have the best chance for future use.

Level Two

Long-term preservation of Level Two formats may require additional resources from DUL. It is less certain, though still likely, that these formats can be preserved over time. University offices and depositors may transfer or deposit materials into DUL systems in these formats if needed, particularly if there is no feasible way to migrate material to a Level One format prior to transfer.

Level Three

These formats cannot be permanently preserved without significant additional resources from DUL and thus are not recommended for long-term preservation. Many of these formats also require proprietary software to render and preserve materials. For these formats, DUL can ensure bit-level preservation, but we cannot guarantee future use.
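
Bit-level preservation means that the stored bytes of a file stay exactly as deposited, even if no software can later interpret them. One common way to check this is a fixity check that compares a checksum recorded at deposit time with a freshly computed one. The following is a minimal sketch of that idea and not a description of DUL's actual workflow; the function names `sha256_of` and `is_unchanged`, and the choice of SHA-256, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Hypothetical fixity check: compare the current digest with a recorded one."""
    return sha256_of(path) == recorded_digest
```

A matching digest shows only that the bits are intact. It says nothing about whether the format can still be opened, which is why bit-level preservation does not guarantee future use.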

How Did We Determine Preservation Levels for a Format?

We considered many factors to determine the best preservation level for a given format, including:

  • Whether the format employs lossless or lossy compression
  • Whether open-source, rather than proprietary, software is used for creation or access
  • How widespread the format is
  • Whether the format allows for embedded files (e.g., a spreadsheet in a PowerPoint presentation that depends on Excel to open; a PDF containing a QuickTime movie)
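
The embedded-file factor can be checked before deposit for some formats. As a rough sketch, and assuming the common Office Open XML layout in which a .pptx, .docx, or .xlsx file is a ZIP package that stores embedded objects under an `embeddings/` folder, the snippet below lists such entries. The function name `list_embedded_parts` is hypothetical, and the check does not cover other formats such as PDF.

```python
import zipfile


def list_embedded_parts(path_or_file):
    """List ZIP entries stored under an 'embeddings/' folder of an OOXML package.

    Assumes the file is a ZIP-based Office Open XML document
    (for example .pptx, .docx, or .xlsx).
    """
    with zipfile.ZipFile(path_or_file) as package:
        return [
            name
            for name in package.namelist()
            # Skip directory entries; keep only files inside an embeddings folder.
            if "/embeddings/" in name and not name.endswith("/")
        ]
```

An empty result suggests that the package has no embedded objects of this kind. A non-empty result points to content that may depend on other software to open.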

 

File Format Recommendations

Documents

  Level One Level Two  Level Three
Word Processing
  • PDF/A (.pdf)
  • EPUB (.epub)
  • Open Office (.sxw; .odt)
  • PDF (.pdf)
  • Rich Text Format (.rtf)
  • Microsoft Word (.docx)
  • Microsoft Word (.doc)
  • Google Docs (.gdoc)
Text
  • Plain Text (.txt)
   
Structured Text
  • XML (.xml)
  • HTML (.html)
  • Cascading Style Sheets (.css)
  • DTD (.dtd)
  • LaTeX (.tex)
  • TeX (.tex)
  • Markdown (.md)
 
Presentation
  • PDF (.pdf)
  • PowerPoint (.pptx)
  • OpenOffice (.sxi/.odp)
 

 

Structured Data

  Level One Level Two Level Three
Tabular Data
  • Comma Separated Values (.csv)
  • Delimited Text (.txt)
  • Microsoft Excel  (.xlsx)
  • OpenOffice (.sxc; .ods)
  • Microsoft Excel (.xls)
Databases
  • SQL DDL (.sql)
  • SQLite version 3 (.sqlite; various)

  • DBF (.dbf)
 
Statistical Data
  • Comma Separated Values (.csv)
  • Delimited Text (.txt)
  • Delimited text with command file for statistical software
  • R (.rdata)
  • SPSS (.por, .sav)
  • SAS (.sas7bcat)
  • Stata (.dta)
  • MATLAB (.mat)
Geospatial
  • Geography Markup Language (.gml)
  • GeoTIFF (.tiff)
  • GeoPackage Encoding Standard (OGC) Family (.gpkg)
  • ESRI Shapefiles (.shp; .shx; .dbf; various)
  • GeoJSON (.geojson)
  • Keyhole Markup Language (.kml, .kmz)
  • LiDAR (.las, .laz)
  • AutoCAD Drawing Interchange Format (.dxf)
  • ESRI/ArcGIS Geodatabase (.gdb)
  • ESRI Interchange File Format (.e00)
  • CAD data (.dwg)
Other
  • NetCDF (various)
  • HDF (various)
  • JSON (.json)
  • CDF (various)

 

Audio-Visual Materials

  Level One Level Two  Level Three
Image
  • TIFF (.tiff; .tif)
  • PNG (.png)
  • Scalable Vector Graphics (.svg)
  • JPG (.jpeg; .jpg; .jfif; .pjpeg; .pjp)
  • Bitmap or BMP (.bmp)
  • GIF (.gif)
  • Google WebP (.webp)
  • JPEG 2000 (.jp2)
  • Encapsulated Postscript (.eps; .epsf; .ps)
  • GIF (.gif)
  • Macromedia Flash (.swf)
  • Photoshop (.psd; .psb; .acv; .atf)
  • RAW (various)
Audio
  • WAVE (.wav)
  • Broadcast WAVE (.bwf)
  • AIFF (.aif; .aiff)
  • MPEG Audio Layer III (.mp3)
  • Advanced Audio Coding (.mp4; .m4a; .aac)
  • Windows Media Audio (.wma)
Video
  • FFV1
  • Matroska Multimedia Container (.mkv)
  • AVI (Audio Video Interleave) (.avi)
  • Digital Moving Picture Exchange (.dpx)
  • QuickTime Movie (.mov)
  • Apple ProRes (.mov)
  • MPEG-2 (.mpg; .mpeg)
  • MPEG-4 (.mp4)
  • Windows Media Video (.wmv)
  • High Efficiency Video Coding (.hevc)

 

Archive File Formats

  Level One Level Two Level Three
Email
  • MBox (.mbox)
  • Internet Message Format (.eml)
  • Personal Storage Table (.pst)
  • OLM (.olm) 
  • Microsoft Outlook Item (.msg)
  • PDF (.pdf)
Archive
  • ZIP (.zip)
  • Tape Archive (.tar)
  • CPIO (.cpio)
  • gzip (.gz)
  • 7z (.7z)
  • bzip2 (.bz2)
  • RAR (.rar)

 

3D

  Level One Level Two Level Three
Embedded Texture
  • Extensible 3D (.x3d)
  • glTF (.gltf; .glb)
  • Universal 3D (.u3d)
  • Filmbox (.fbx)
  • Universal Scene Description (USD) (.usd; .usda; .usdc; .usdz)
No-Embedded Texture  
  • Stereo Lithography (.stl)
  • Reflectance Transformation Imaging (.rti)
  • Polygon File Format (.ply)
  • Wavefront (.obj)
  • COLLADA Digital Asset Exchange (.dae)
  • Blender 3D (.blend)
  • 3D Studio (.3ds)

 

Software/Code*

Level One Level Two Level Three
 
  • Computer Program Source Code (Various)
  • Compiled or Executable Files (various)

 

 

*Preserving software is a complex topic because many dependencies can affect whether the original software or code files can be rendered or can generate the correct environment. A full analysis of the considerations for software preservation is beyond the scope of these guidelines. For more information on software preservation, visit Software Heritage.

Preservation vs. Use: The information in the tables above is based upon recommendations for the long-term access and use of files over time. For current use, it may in some cases be more appropriate to use a proprietary file format that supports analytic functionalities or tools commonly used by a community. You may then want to consider submitting both an active-use copy and a preservation copy to the repository.

 

About These Policies

The Libraries work with faculty and staff throughout the University to identify and permanently preserve records, data, documents, and other information assets. This document was created by the Digital Preservation Working Group (DPWG) to identify the types of formats that are preserved by Duke University Libraries and provide format recommendations for preservation for both the campus and library communities.  

To prepare this document, DPWG members reviewed best practices regarding preservation formats at a number of institutions, including the National Archives and Records Administration, the Library of Congress, and fellow academic institutions. For these preservation formats, we can ensure bit-level preservation, but we cannot guarantee future use. After reviewing this list and considering the specific requirements of the DUL systems, the DPWG categorized the file formats into the three preservation levels described above.


What is the Scope of These Policies?

These policies apply to the following DUL systems:

  • Duke Digital Repository
  • Duke Research Data Repository
  • DukeSpace

Product and functional managers of these spaces can modify these recommendations to best suit the needs of their systems and user communities.

What is Our Plan for Sustaining These Policies?

The Digital Preservation Working Group (DPWG) will review these policies every two years. Changes will then be submitted to the Digital Preservation and Publication Program (DP3) for approval. If you have recommendations for this document, please email your suggestions to digital-preservation@duke.edu.

For information about preservation formats within specific applications, contact Duke Digital Repository (repositoryhelp@duke.edu), the Duke Research Data Repository (datamanagement@duke.edu), or DukeSpace (lib-dspace-admin@duke.edu).