RT Guide: Data Documentation

What’s New

  • May 14, 2021

    • You can now be notified when new or updated documentation is added to the data lake

  • April 28, 2021

    • Markdown documentation is published to the Data Provider's GitLab wiki

Raw Data Documentation in the Data Lake

This document summarizes the procedures for uploading metadata packages and data issue logs to the SDC. Data providers should follow these guidelines when sharing these documents with the SDC platform. The guidance is organized into two parts: one for data providers, and one for data analysts who use the documentation within the SDC.

Data analysts can find the data documentation and metadata packages uploaded by data providers in the data lake, at one of the following locations:

s3://prod-dot-sdc-raw-submissions-0123456789-us-east-1/<data-provider>/documentation
s3://prod.sdc.dot.gov.data-lake.standardized-data/<data-provider>/documentation

These files will be stored under the “documentation” prefix within the specific data provider prefix to which the documentation files apply. For example, the Acme Data Provider documentation will be stored in the ...standardized-data... bucket, under the prefix acme-data-provider/documentation.

The documentation and metadata will be stored in a folder structure indicating the year, month, and day on which the documentation was uploaded. Researchers can therefore find the most recent version of the documentation in the most recent date folder.
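The lookup described above can be sketched in Python. This is a minimal sketch, not an SDC-provided tool: it assumes the date folders appear in object keys as <data-provider>/documentation/<YYYY>/<MM>/<DD>/<file>, which this guide does not confirm, and the function name and sample keys are hypothetical. In practice the list of keys would come from a listing of one of the buckets above.

```python
import re
from datetime import date

# ASSUMPTION: date folders look like <YYYY>/<MM>/<DD>/ directly under
# <data-provider>/documentation/. Adjust the pattern if the real layout differs.
DATE_FOLDER = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})/")


def latest_documentation_keys(keys, provider):
    """Return (upload_date, sorted_keys) for the newest date folder, or None."""
    prefix = f"{provider}/documentation/"
    by_date = {}
    for key in keys:
        if not key.startswith(prefix):
            continue  # belongs to another provider or another prefix
        match = DATE_FOLDER.match(key[len(prefix):])
        if match is None:
            continue  # not inside a date folder
        try:
            uploaded = date(*map(int, match.groups()))
        except ValueError:
            continue  # digits present but not a valid calendar date
        by_date.setdefault(uploaded, []).append(key)
    if not by_date:
        return None
    newest = max(by_date)
    return newest, sorted(by_date[newest])


# Hypothetical keys, for illustration only.
example_keys = [
    "acme-data-provider/documentation/2021/04/28/data-dictionary.md",
    "acme-data-provider/documentation/2021/05/14/data-dictionary.md",
    "acme-data-provider/documentation/2021/05/14/issue-log.csv",
    "other-provider/documentation/2021/06/01/readme.md",
]

if __name__ == "__main__":
    print(latest_documentation_keys(example_keys, "acme-data-provider"))
```

Comparing parsed dates rather than raw folder names keeps the result correct even if month or day folders are not zero-padded.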

Raw Data Documentation in GitLab

From your SDC workstation, navigate to https://gitlab.prod.sdc.dot.gov/Data-Providers.

This group lists all of the data providers. Each GitLab project holds information about the data provider and its project, as well as the raw data documentation, which is kept in the project’s wiki. For example: https://gitlab.prod.sdc.dot.gov/Data-Providers/oceanic-data-provider/-/wikis/home