RT Guide: Data Documentation
What’s New
May 14, 2021
You can now be notified when new or updated documentation is added to the data lake
April 28, 2021
Markdown documentation is published to the Data Provider's GitLab wiki
Raw Data Documentation in the Data Lake
This document summarizes the procedures for uploading metadata packages and data issue logs to the SDC. Data providers should follow these guidelines when sharing these documents with the SDC platform. The guidance is organized into two parts: one for data providers, and one for data analysts using the documentation within the SDC.
Data analysts can find data documentation and metadata packages uploaded by data providers in the data lake, under one of the following locations:
s3://prod-dot-sdc-raw-submissions-0123456789-us-east-1/<data-provider>/documentation
s3://prod.sdc.dot.gov.data-lake.standardized-data/<data-provider>/documentation
These files will be stored under the “documentation” prefix within the specific data provider prefix to which the documentation files apply. For example, the Acme Data Provider documentation will be stored in the ...standardized-data... bucket, under the prefix acme-data-provider/documentation.
The documentation and metadata will be stored in a folder structure indicating the year, month, and day on which the documentation was uploaded, so researchers can find the most recent version of the documentation in the most recent date folder.
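As a rough illustration of picking out the most recent date folder, the Python sketch below filters a list of object keys down to those uploaded on the latest date. It assumes the keys look like <data-provider>/documentation/YYYY/MM/DD/<file>; this guide does not spell out the exact folder naming, so adjust the pattern to match what you actually see in the bucket. The function name and the file names in the usage example are hypothetical.

```python
import re
from datetime import date
from typing import Iterable, List

# Assumed layout (not confirmed by this guide):
#   <data-provider>/documentation/YYYY/MM/DD/<file>
DATE_FOLDER = re.compile(r"/documentation/(\d{4})/(\d{1,2})/(\d{1,2})/")


def latest_documentation_keys(keys: Iterable[str]) -> List[str]:
    """Return the keys that sit in the most recent year/month/day folder."""
    dated = []
    for key in keys:
        match = DATE_FOLDER.search(key)
        if match is None:
            continue  # key is not inside a year/month/day folder
        year, month, day = (int(part) for part in match.groups())
        try:
            uploaded = date(year, month, day)
        except ValueError:
            continue  # folder names do not form a valid calendar date
        dated.append((uploaded, key))
    if not dated:
        return []
    newest = max(uploaded for uploaded, _ in dated)
    return sorted(key for uploaded, key in dated if uploaded == newest)


# Hypothetical keys, for illustration only.
example_keys = [
    "acme-data-provider/documentation/2021/04/28/data-dictionary.md",
    "acme-data-provider/documentation/2021/05/14/data-dictionary.md",
    "acme-data-provider/documentation/2021/05/14/issue-log.csv",
]
print(latest_documentation_keys(example_keys))
```

The list of keys could come from any listing of the documentation prefix, for instance a boto3 list_objects_v2 call, provided your workstation has read access to the bucket. Comparing parsed dates rather than raw strings keeps the ordering correct even if month or day folders are not zero-padded.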
Raw Data Documentation in GitLab
From your SDC workstation, navigate to https://gitlab.prod.sdc.dot.gov/Data-Providers.
This group lists all the data providers. Each of these GitLab projects holds information about the data provider and their project, as well as the raw data documentation. The raw data documentation is held in the project’s wiki, e.g., https://gitlab.prod.sdc.dot.gov/Data-Providers/oceanic-data-provider/-/wikis/home
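If you need to link to several providers' wikis, the URL pattern in the example above can be sketched as a small Python helper. It assumes every data provider project sits directly under the Data-Providers group and uses the same wiki home path as the example; the function name is hypothetical.

```python
from urllib.parse import quote

# Group URL taken from this guide.
GITLAB_GROUP_URL = "https://gitlab.prod.sdc.dot.gov/Data-Providers"


def wiki_home_url(project_slug: str) -> str:
    """Build the wiki home URL for a data provider's GitLab project.

    Assumes the project lives directly under the Data-Providers group.
    """
    return f"{GITLAB_GROUP_URL}/{quote(project_slug, safe='')}/-/wikis/home"


print(wiki_home_url("oceanic-data-provider"))
```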