Duke Libraries

University Archives

Data Accessioner

About

The Duke DataAccessioner was built out of the need for a simple graphical interface that gives technical services staff an easy way to migrate data off disks and onto a file server for basic preservation, further appraisal, arrangement, and description. It also provides a way to integrate common metadata tools at the time of migration rather than after the fact. With its simplified interface and Java implementation, it is intended to be easily adopted by smaller institutions with little or no IT staff support. Existing tools fulfill different parts of this idea, but none covers all of it. (The National Library of Australia's system is another option that recently came to light and appears to be very good, although it has higher infrastructure requirements.)

The very first version of the tool was written in the course of a week in early 2008 and, although usable, was more of a proof of concept. For nearly a year there was no active development except to tune the metadata output and fix some bugs. In January 2009 the Data Accessioner was revisited and given a revised architecture. The metadata tool adapters and the custom metadata manager were also extracted so that they can be used as plugins.

If you have any questions please contact the Electronic Records Archivist.

Accessioner

The accessioner consists of four main components: the migrator, the metadata manager, the graphical interface, and optional tool adapters.

The migrator recursively navigates a file tree and creates a copy of the tree in a given destination, with the option of skipping specified files and directories (nothing novel here). The main difference from an ordinary copy is that it creates an MD5 checksum of each file before copying it, then creates another from the new copy and compares the two. If they differ, it records an error and notifies the user. Modern systems typically do error checking on transfer, but this application is intended to be as exact as possible, to the point of paranoia. It also sets the last modified date of the copy to that of the original. Finally, the migrator instructs the adapter plugins to do their work and sends the resulting metadata to the metadata manager.
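The copy-and-verify behavior described above can be sketched in a few lines of Java. This is a minimal illustration using the standard java.nio.file API, not the DataAccessioner's own code; the class name VerifiedCopy, the method names, and the idea of skipping entries by file name are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

// Hypothetical sketch; not the DataAccessioner's actual migrator.
final class VerifiedCopy {

    /** Returns the MD5 digest of a file as a lowercase hex string. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading the stream is what feeds the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Copies source into dest, skipping any file or directory whose name is in skipNames. */
    static void copyTree(Path source, Path dest, Set<String> skipNames) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                if (!dir.equals(source) && skipNames.contains(dir.getFileName().toString())) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                Files.createDirectories(dest.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                if (skipNames.contains(file.getFileName().toString())) {
                    return FileVisitResult.CONTINUE;
                }
                Path target = dest.resolve(source.relativize(file));
                String before = md5Hex(file);   // checksum of the original
                Files.copy(file, target);       // fails if the target already exists
                String after = md5Hex(target);  // checksum of the new copy
                if (!before.equals(after)) {
                    throw new IOException("Checksum mismatch for " + file);
                }
                // Carry the original's last modified date over to the copy.
                Files.setLastModifiedTime(target, attrs.lastModifiedTime());
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

In this sketch a mismatch aborts the whole run with an exception. A tool meant for staff use would more likely log the error, notify the user, and continue with the remaining files, as the description above suggests.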

The metadata manager is in charge of constructing the metadata structure, filling it out, and handling the metadata passed to it by the adapter plugins. Two metadata managers are built in: default and none. The default is a simple file/folder structure inside a collection/accession root that will take whatever metadata is passed to it. Anyone can also write their own metadata manager plugin to use instead. The one created for Duke RBMSCL/UA is available in the Plugins table below. The DataAccessioner could also be used as an ingest interface to repository systems such as Fedora via a custom metadata manager.
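To make the idea of a file/folder structure inside a collection/accession root more concrete, here is a small Java sketch. Everything in it is hypothetical: the MetadataManager interface, its method names, and the XML element and attribute names are invented for illustration and are not taken from the DataAccessioner source or its actual schema.

```java
/** Hypothetical plugin contract; the real DataAccessioner interface may differ. */
interface MetadataManager {
    void startFolder(String name);
    void endFolder();
    void addFile(String name, String md5, String adapterXml);
    String finish();
}

/** Records a file/folder tree inside a collection/accession root (invented element names). */
final class SimpleXmlMetadataManager implements MetadataManager {
    private final StringBuilder xml = new StringBuilder();
    private int openFolders = 0;

    SimpleXmlMetadataManager(String collection, String accession) {
        xml.append("<collection name=\"").append(escape(collection)).append("\">\n");
        xml.append("  <accession number=\"").append(escape(accession)).append("\">\n");
    }

    @Override
    public void startFolder(String name) {
        indent().append("<folder name=\"").append(escape(name)).append("\">\n");
        openFolders++;
    }

    @Override
    public void endFolder() {
        if (openFolders == 0) {
            throw new IllegalStateException("No open folder");
        }
        openFolders--;
        indent().append("</folder>\n");
    }

    @Override
    public void addFile(String name, String md5, String adapterXml) {
        indent().append("<file name=\"").append(escape(name))
                .append("\" md5=\"").append(escape(md5)).append("\">");
        // Adapter output is stored as given; it is assumed to be well-formed XML.
        if (adapterXml != null) {
            xml.append(adapterXml);
        }
        xml.append("</file>\n");
    }

    @Override
    public String finish() {
        if (openFolders != 0) {
            throw new IllegalStateException("Unclosed folder");
        }
        return xml + "  </accession>\n</collection>\n";
    }

    /** Indents two levels for the root elements plus one per open folder. */
    private StringBuilder indent() {
        for (int i = 0; i < openFolders + 2; i++) {
            xml.append("  ");
        }
        return xml;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

A migrator would call startFolder and endFolder as it enters and leaves each directory and addFile once per copied file, passing along whatever the adapters returned. A custom manager, for example one that writes PREMIS or METS, would implement the same calls but emit a different structure.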

The graphical interface is a single window. It is not actually required in order to run the migrator, which can also be started from the command line, although with limited options.

Plugins

There are two types of plugins: adapters and metadata managers. Adapters can either wrap existing tools such as Droid and Jhove or introduce completely new (Java) code. The existing Jhove and Droid plugins run on each file (the new copy) individually and return their results. Metadata managers override the built-in metadata managers to modify the default schema or to use a completely different one, such as METS.

Plugins are loaded via the Java Plugin Framework (JPF). To use a plugin, place it in a "plugins" directory that resides next to the DataAccessioner jar. New plugins should conform to the JPF specification and must require the "core" plugin, which defines the extension points. Future plugins will likely include a virus checker (probably next) and a handle assigner. (The handle assigner could be built into the metadata manager as well.)

NOTE: JPF plugins do not work on Apple's OS X (although they should). The errors are coming from the internals of JPF, and I have not been able to find a fix.

Downloads

Below you can find the Data Accessioner, plugins, and a very short guide for in-house use. The source code is now hosted on GitHub.

Data Accessioner

Version 0.3.0 (Stable): last updated 2011-09-02

Description/Link                                   Size (KB)  MD5
Jar, Plugins, & Guide                              5,040      32bf15d4f1883387c46e617c4f2e34b8
Jar                                                941        4afae54b2b1251db6622d430ba98cd05
Guide for Duke RBMSCL/UA Technical Services staff  167        fa79106a5173fea08dbafd9eca3234c0

Version 0.4.0 (Development):

Description/Link       Size (KB)  MD5
Jar, Plugins, & Guide  5,980      68de157761d4a3b00c9ee6238b2cae5a
Jar                    2,338      d780d06678b5f8fc26c9899ed7ff98dc
Source                 127        2172df2e55573d3d820ce32827ba041b

Plugins

Description/Link                                  Size (KB)  MD5
Core (required for all other plugins)             3          39d45cec6111a51f42915852e458791c
Jhove Adapter (see license statement within zip)  864        be6d5a3b9eea02c9f9466d8d7a466023
Droid Adapter (see license statement within zip)  2,910      838d03c1ccb84dd29b1cf1c0e76d38fd
Duke PREMIS Metadata Manager (for version 0.3.0)  312        88639da629d17cc548374993aa70c052
Duke PREMIS Metadata Manager (for version 0.4.0)  7          18837f4f7c53c3936b7fd560a4f7363a
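
The MD5 values listed in the tables above can be used to check that a download arrived intact. The following is one possible way to do that in Java; the class name ChecksumCheck and its command-line arguments are assumptions for this example and are not part of the DataAccessioner distribution.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not shipped with the DataAccessioner.
final class ChecksumCheck {

    /** Computes the MD5 of a file as 32 lowercase hex digits (reads the whole file into memory). */
    static String md5Hex(Path file) throws IOException {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
            // BigInteger drops leading zero bytes, so pad back to 32 digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Compares a file's MD5 with a published value, ignoring case and surrounding spaces. */
    static boolean matches(Path file, String publishedMd5) throws IOException {
        return md5Hex(file).equalsIgnoreCase(publishedMd5.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChecksumCheck <file> <published-md5>");
            System.exit(2);
        }
        boolean ok = matches(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "OK" : "MISMATCH");
        System.exit(ok ? 0 : 1);
    }
}
```

After compiling, pass the path of the downloaded file and the MD5 value from its table row. The program prints OK when the two agree and MISMATCH otherwise. Reading the whole file into memory is acceptable here because the downloads are only a few megabytes.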

License

Copyright © 2009 by Duke University. All rights reserved. 

Permission to copy, use, and modify this software and accompanying documentation for only non-commercial educational and research purposes is hereby granted without fee and without a signed licensing agreement, provided that the above copyright notice, this paragraph and the following two paragraphs appear in all copies including derivatives of the software. The recipient is free to make upgraded or improved versions of the software, provided that they are made readily available to others on these same terms without fee or any other charge. The recipient shall be responsible for compliance with these obligations, which the recipient accepts as fair consideration for the software provided.  Contact the copyright holder, Duke University (Henry Berger - 919-684-3311) for commercial licensing opportunities. 

IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OF ANY KIND WHATSOEVER, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF HE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 

THE COPYRIGHT HOLDER SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION IS PROVIDED "AS IS". THE COPYRIGHT HOLDER HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATION.  

 


Creative Commons License

Unless otherwise specified on this page, this work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.


Last modified October 24, 2011 10:30:43 AM EDT