The HDF Group's history spans twenty years and intertwines with satellites, the first web browser, a dog's knee, "The Lord of the Rings", and neutron scattering. We're thrilled to share our past, present, and future.
In 1987, the Graphics Foundations Task Force (GFTF) at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign set out to create an architecture-independent software library and file format to address the need to move scientific data among the many different computing platforms in use at NCSA at that time. Additional goals for the format and library included the ability to store and access large objects efficiently, the ability to store many objects of different types together in one container, the ability to grow the format to accommodate new types of objects and object metadata, and the ability to access the stored data with both C and Fortran programs.
Originally dubbed AEHOO (All Encompassing Hierarchical Object Oriented format), the new software and file format was ultimately called Hierarchical Data Format (HDF), and was developed as an open source product and distributed free of charge under a University of Illinois license. The design of HDF combined ideas from a number of different formats, including TIFF, CGM, FITS, and the Macintosh PICT format.
At the time HDF was being created, NCSA produced a suite of desktop scientific visualization tools for viewing and presenting data in HDF. The popularity of these tools led to wide exposure for HDF, and its use spread throughout the United States and beyond. HDF users quickly realized that it could be used effectively to address many other data management requirements, beyond simply sharing and viewing images. A hospital used HDF for managing large collections of X-ray images. A petroleum company used HDF for fast storage and access to large 3D seismic soundings. A veterinary researcher used HDF to manage the complex data associated with modeling a dog's knee. A European consortium adopted HDF to share software for non-destructive testing in ships, aircraft, and nuclear power plants.
In acknowledgement of the increasing reliance on HDF, we applied for and received a "software capitalization" grant from the National Science Foundation in 1990, which provided resources to improve the documentation, user support, testing, and overall quality of HDF. This small grant was absolutely critical to the future growth of HDF, as it funded the transition of HDF from useful academic software into a product that data producers and users could rely on.
An NSF grant in 1992 enabled the group to add netCDF format support to the HDF software. Much like HDF, netCDF is a file format and suite of software that serves a broad community. Being able to access netCDF and HDF data through the same interface made it possible for applications to work with data in either format.
In the early 1990s, the National Aeronautics and Space Administration (NASA) began planning its data management strategy for the Earth Observing System (EOS), which was to include a system of satellites to gather data for the study of global climate change. After investigating 15 different formats over a two-year period, NASA selected HDF as the standard format for the EOS data and information system, a system that would produce three terabytes of data per day in dozens of data products, and ultimately would collect more than 15 petabytes of information.
Meanwhile NCSA, an early and enthusiastic follower of the nascent World Wide Web, was creating more tools for scientific visualization and collaboration, and in the process created Mosaic, the killer app that spawned today's web browsers. The Mosaic developers didn't forget about HDF, and in those early days Mosaic could view two types of files: HTML and HDF. Mosaic support led to a number of applications that provide remote viewing of HDF files, such as the DIAL server and OPeNDAP.
In 1996 the HDF group began a successful collaboration with the Department of Energy's (DOE) Advanced Simulation and Computing Program (ASC). The goal of ASC was to increase the computing power of DOE systems at Lawrence Livermore, Los Alamos, and Sandia National Laboratories by several orders of magnitude. This boost in computational power was clearly going to necessitate an associated scale-up in data management capabilities - data files would be very large, data would be written in parallel on massively parallel systems, and the data itself would be more complex than ever.
A Data Modeling and Formats (DMF) group, drawn from the three Labs and NCSA, set about to find a format that would address ASC's needs. HDF had many of the required features, but HDF didn't scale sufficiently well, and could not easily support parallel I/O. Like the GFTF before it, the DMF borrowed ideas and lessons learned from other formats, in particular from HDF and a format from Livermore called AIO (Array I/O), to define a new format for the project. Initially called "Big HDF", the resulting format was finally named "HDF5" because the latest release of the original HDF was HDF Version 4, now called HDF4.
With additional support from NASA, the tri-labs and NCSA jointly developed the first version of HDF5, releasing it in 1998. HDF5 continues to be distinguished as the primary scientific data format and library designed to handle terabyte-sized datasets and parallel file processing, and our successful collaborations with NASA and the DOE continue to the present day.
Meanwhile, NASA was gearing up to launch Terra, the first of three major satellites that would form the core of EOS. In 1999 Terra launched, and HDF4 became a critical part of the EOS Data and Information System (EOSDIS) for gathering, creating, storing, and distributing more than half a petabyte of data products per year. NASA developed a special set of "earth science datatypes" based on HDF, called HDF-EOS, and created an HDF-EOS library, which reads and writes these datatypes in HDF4 or HDF5, hiding specific details of HDF itself from EOS data users.
Another noteworthy milestone in the life of HDF occurred in 2003, when R & D Magazine selected HDF5 as "one of the 100 most technologically significant new products of the year 2002." The award was given jointly to NCSA, NASA, and Lawrence Livermore, Los Alamos, and Sandia National Laboratories.
The mission-critical demands of EOS and ASC had a profound influence on the nature and scope of the HDF group. Whereas scientific research had driven the initial technological requirements for HDF, these new applications demanded a heightened level of attention to quality and performance. The HDF group evolved to meet these needs, paying more and more attention to producing robust, well-tested code, with greater emphasis on assuring our users that HDF could meet their performance and quality requirements.
In turn, organizations in government, academia, and industry responded by adopting HDF for an increasing number of applications demanding high performance and quality. By 2004, HDF was being used by at least 600 organizations and countless individuals. The Earth Observing System alone counted 1.6 million users of HDF. The variety of data products stored in HDF was also expanding, with at least 200 different types of applications relying on HDF. Filmmakers used HDF5 to generate special effects for "Lord of the Rings." The Department of Defense Aberdeen Test Center adopted HDF5 as the backend object store for test runs, creating more than 800,000 files in the first five years of operation. Boeing adopted HDF4 for an image mining database, and began using HDF5 for flight test data collection, analysis, and preservation. The European Union selected HDF5 as a binary format for product data, proposing it as an ISO standard. The National Polar Orbiting Environmental Satellite System (NPOESS) adopted HDF5 as the distribution format for a unified weather gathering system that will deliver 6 TB per day of weather and climate data products to the Army, Air Force, Navy, US Weather Service, and myriad other applications worldwide.
Tools have been a key to the success of HDF from the beginning. As HDF gained in popularity, numerous commercial and non-commercial tools were extended to read and write HDF data. One of the first to do this was IDL, a powerful data visualization and analysis platform that was adapted to understand HDF-EOS as well as HDF4 and HDF5. MATLAB, a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation, also added support for reading and writing both HDF4 and HDF5. In recognition of HDF5's capabilities, MATLAB began using an HDF5-based format to save data over 2 gigabytes in size in 2006, and announced it will make the HDF5-based format the default MAT-file format in a future release.
A particularly exciting development began in 2005, when NASA agreed to fund a unification between the netCDF and HDF5 formats. Over the years, netCDF had grown much the same as HDF, with overlapping features in some areas and complementary benefits in others. To be completed in the second half of 2007 with the release of netCDF 4 and HDF5 1.8, this project uses HDF5 as the underlying format for netCDF, and offers the best of both worlds to hundreds of data producers and millions of data users.
From early on, questions were raised about long term preservation of HDF data. If important data, such as the fundamental data used to study climate change, was being stored in HDF, what guarantees were there that the data would be accessible far into the future, when it would be important not just for historical reasons, but for long-term longitudinal studies?
By the year 2004, the HDF group recognized that the best path to ensuring long-term access to HDF data would be to institutionalize the HDF project. The group recognized that its organizational affiliation with NCSA was a significant factor in its success. At the same time, being part of a large academic organization made it difficult to develop the business infrastructure needed to guarantee financial solvency over the long term. The idea was born to spin off from the University a company dedicated to ensuring the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
With support and encouragement from the University and NCSA, The HDF Group incorporated as a non-profit corporation in the State of Illinois in December 2004. In March 2005, The HDF Group began providing HDF support and development services on a part-time basis. The HDF Group officially began full operations independent of the University in July 2006 as a non-profit 501(c)(3) company.
Since becoming independent in 2006, The HDF Group has begun to implement its business plan in earnest. EOS, the ASC project, and others remain our anchors, but we are also finding new and exciting directions. We have executed a number of maintenance, consulting, and tutorial agreements, and received new government contracts for performance enhancements, improvements in our Java products, strengthening OPeNDAP support, and investigating ways to ensure library-independent access to EOS data in HDF4. Funds from the private sector have also enabled The HDF Group to add .NET support to HDF5, to add major new features to HDF5 1.8, to add and improve a number of HDF command-line tools, and to port HDF4 and HDF5 to new platforms.
The HDF Group is also seeing an acceleration of international interest in HDF, as exemplified by PyTables, a highly successful package developed in Spain for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
We have recently begun to explore new domains in which HDF can make major contributions. A handful of bioinformatics projects have shown that HDF has the capability to solve problems of data complexity and size in areas as diverse as DNA sequencing, electron microscopy, brain mapping, and neutron scattering. The successful combination of HDF5 with databases, as exemplified in the Aberdeen project and others, makes it clear that a suite of technologies to harmonize HDF with database applications would prove extremely effective.
HDF has truly become a worldwide phenomenon, changing the way people manage their data and making new uses of data possible that simply could not be done without the capabilities that HDF provides. Looking to the future, we are excited by new opportunities on the horizon and ready to tackle the data challenges that await our user community. We are confident that HDF's next twenty years will be as remarkable as our first twenty.
Last modified: 16 May 2011