Cloud Analytics

Storage & Compute Environments Comparison for Duke University

The following matrix provides a comparison of storage and analytic options available to the Duke University community and beyond.* The report was generated by John Little in Duke University Libraries Data and Visualization Services to advise researchers searching for "Cloud-based" data platforms supporting research at Duke. 

The table presents two main compute/storage categories:

  • services available via Duke University entities (e.g., Duke University Libraries, Duke OIT, College of Arts & Sciences)

  • services provided by third-party providers (e.g., IBM, Amazon, Google)

The list of providers is not meant to be comprehensive; rather, it is an at-a-glance comparison of various "ecosystems" for collaborative computing. The guide focuses on using analysis tools and/or working with large datasets.

The major assumptions are:

  • the amount of data is potentially too large to easily push across the network
  • the need to collaborate across platforms renders flash drives and stand-alone workstation hard drives sub-optimal for the analysis
  • data portability, data synchronization and multi-author collaboration are highly desired
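
To make the first assumption concrete, the following is a rough sketch of how long a dataset might take to move across a network. The function name `transfer_hours`, the example dataset sizes, the link speeds and the 80% efficiency factor are all illustrative assumptions, not measurements of any Duke or third-party network.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved
    (hypothetical default of 0.8; real throughput varies widely).
    """
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    size_megabits = size_gb * 1000 * 8           # decimal gigabytes -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative sizes (GB) and link speeds (Mbps); not taken from the comparison table.
    for size_gb in (1, 100, 5000):
        for link_mbps in (100, 1000):
            hours = transfer_hours(size_gb, link_mbps)
            print(f"{size_gb:>5} GB over {link_mbps:>4} Mbps: about {hours:.2f} hours")
```

Under these assumptions, small files move in seconds or minutes, while multi-terabyte datasets can take many hours even on a fast link. That is the situation in which running the analysis where the data already lives becomes attractive.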

See the bottom of this page for further explanation of column headers, or view the full spreadsheet.

Column Header Descriptions

Service Provider: A non-comprehensive list of companies or entities that make storage, compute or Cloud ecosystems available. Most services charge fees. However, local Duke providers may make special arrangements (which are not part of this comparison).

Service Description: Each service is categorized as "storage," "compute" or "ecosystem" to indicate the type of service available. For example, a workgroup may need only to collaborate on small files (e.g., to share a large spreadsheet or documentation files). In such cases, it may be sufficient to use synchronization services such as Dropbox or Box.com. In this use case, the "compute" architecture is provided by the local workstation using common office productivity applications. However, if the data are larger or need to be accessed at the "compute" source, then a user may need a Cloud ecosystem or a larger high-performance computing environment, with less emphasis on synchronizing files across multiple devices.

Free Storage: At the time of data-gathering, these numbers accurately reflected the storage being "given away" by third-party providers. This information changes rapidly, and therefore the table may not be timely enough for a final decision. However, the table does show that many third-party providers are willing to give away a sizable amount of data storage for free.

Analytics Tools: In the case of Cloud computing, some compute environments are free. An increase in compute sophistication may require an increase in costs, but such fees are not mandatory and may depend on previous agreements. For example, Duke University is a member of the IBM Academic Initiative, and members of the Duke University community can acquire highly sophisticated analysis tools free of charge under a predefined agreement of acceptable use.

* Comparisons include the storage-and-compute environments available via OIT and, increasingly, third-party storage and compute environments such as Amazon Web Services (AWS) and Amazon's Elastic Compute Cloud (EC2). While there are several large ecosystem Cloud platforms available, as noted, this is not a comprehensive comparison. For example, the table makes reference to the Amazon and Google products but does not attempt to include all (or many other) service providers.