The growing volume of data collected, driven in particular by new acquisition methods such as nodal networks and distributed acoustic sensing (DAS), is exposing the limits of current capacities for long-term storage, transport and processing. Data centers are called upon not only to host this data, but also to provide the resources needed to exploit it. Meeting these needs requires changes in data management practices, more computing resources, suitable formats and smaller derived products.
In the spring of 2020, IRIS, Résif and GEOFON surveyed their user communities to identify their needs and look for solutions together, while taking environmental impact into account. The results of this survey have just been published in the journal Seismological Research Letters.
The 37 survey respondents estimated their needs for the next 3 to 5 years. Eleven of them expect volumes of 10 to 50 TB, and five expect more than 50 TB. The responses show that DAS experiments generate the largest volumes, which are beyond the current means of academic data centers. However, the volumes of data from traditional seismic stations that researchers wish to work on are also steadily increasing, especially for studies based on cross-correlation or machine learning techniques.
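To give a sense of why DAS volumes stand apart, the raw size of a continuous recording can be estimated as channels × sampling rate × bytes per sample × duration. The sketch below applies this arithmetic; the channel counts, sampling rates and sample size are illustrative assumptions, not figures from the survey or the article, and the function name `daily_volume_tb` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_tb(channels: int, sampling_rate_hz: float, bytes_per_sample: int = 4) -> float:
    """Uncompressed volume, in terabytes (10**12 bytes), of one day of continuous recording."""
    total_bytes = channels * sampling_rate_hz * bytes_per_sample * SECONDS_PER_DAY
    return total_bytes / 1e12


# Assumed DAS cable: 10,000 channels sampled at 1,000 Hz, 4-byte samples.
das = daily_volume_tb(channels=10_000, sampling_rate_hz=1_000)

# Assumed conventional station: 3 channels sampled at 100 Hz, 4-byte samples.
station = daily_volume_tb(channels=3, sampling_rate_hz=100)

print(f"DAS cable:        {das:.3f} TB/day")
print(f"Seismic station:  {station:.6f} TB/day")
print(f"Ratio:            {das / station:,.0f}x")
```

Under these assumptions, a single DAS cable produces a few terabytes per day before compression, so an experiment lasting only a few weeks would already pass the 50 TB mark mentioned by respondents.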
Thanks to collaboration within the international scientific community, coordinated by the FDSN, standards exist for data formats as well as for the associated services and metadata, in line with the FAIR principles (findable, accessible, interoperable, reusable). However, as data volumes grow, current standards are becoming obsolete, and problems are emerging in the integration, archiving, distribution and use of data, as well as in metadata development.
The article takes stock of existing data formats and evaluates them against storage, transport and access criteria, in order to identify those best suited to large volumes of data and to the needs expressed by survey respondents. It also provides an overview of the new issues facing data centers in storage, transport and access services. Many aspects of data center operations need to be rethought to meet these challenges. The article also looks at metadata issues: the standard metadata format (StationXML) cannot describe the information relevant to DAS experiments.
The article suggests ways forward based on broad international cooperation around DAS data and on extending the discussion to other data centers, in order to develop new standards and services suited to the thousands of terabytes of data to come.
Find out more
- Reference: Javier Quinteros, Jerry A. Carter, Jonathan Schaeffer, Chad Trabant, Helle A. Pedersen; Exploring Approaches for Large Data in Seismology: User and Data Repository Perspectives. Seismological Research Letters 2021; doi: https://doi.org/10.1785/0220200390
- Consult the survey questionnaire
- This article is freely accessible to the French scientific community via BibCNRS (Inist): https://doi-org.insu.bib.cnrs.fr/10.1785/0220200390
System administrator performing maintenance in a server room © Cyril Fresillon / Loria / CNRS Photothèque