
Instructions

This document summarizes the procedures for uploading metadata packages and data issue logs to the SDC. Data providers should follow these guidelines when sharing these documents with the SDC platform.

Data providers should use their normal data upload steps to upload data documentation and metadata to the SDC. For specific instructions on how to upload data to the SDC as a data provider, see the SDC data provider user guide.

Data providers should upload documentation to a specific “prefix” folder in their normal ingest bucket. The name of this prefix must be “documentation”. As a reminder, data providers can use the following command to upload files to the “documentation” folder:

aws --profile sdc s3 cp <local files or folders to upload> s3://<Data Provider's Ingest Bucket>/documentation/
  • Replace <local files or folders to upload> in the above command with the path to the local files or folders to upload. When the path is a folder, aws s3 cp also needs the --recursive option.

  • Replace <Data Provider's Ingest Bucket> in the above command with the data provider’s specific ingest bucket, for example prod-dot-sdc-cvp-nyc-ingest...

Remember to run the credential generation script before uploading to your ingest bucket, and use the “sdc” profile to reference the temporary credentials it generates. For other troubleshooting issues, please refer to the data provider user guide or contact the SDC Service Desk via the Secure Data Commons Help Center.
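For providers who script their uploads, the command above can be assembled programmatically. The following is a minimal sketch, not an official SDC tool: the function name build_upload_command and the bucket name example-ingest-bucket are hypothetical, and the sketch only builds the argument list without running it. It assumes the AWS CLI is installed and that the “sdc” profile already holds valid temporary credentials.

```python
from pathlib import Path


def build_upload_command(local_path, ingest_bucket, profile="sdc", is_folder=None):
    """Build the `aws s3 cp` argument list targeting the documentation/ prefix.

    `build_upload_command` is a hypothetical helper for illustration only.
    """
    path = Path(local_path)
    if is_folder is None:
        # Detect folders on disk when the caller does not say explicitly.
        is_folder = path.is_dir()
    destination = f"s3://{ingest_bucket}/documentation/"
    command = ["aws", "--profile", profile, "s3", "cp", str(path), destination]
    if is_folder:
        # `aws s3 cp` copies a folder only when --recursive is given.
        command.append("--recursive")
    return command


# "example-ingest-bucket" is a placeholder, not a real SDC bucket name.
print(" ".join(build_upload_command("DOCUMENTATION.md", "example-ingest-bucket", is_folder=False)))
```

The returned list can be passed to subprocess.run(command, check=True) once the bucket name has been replaced with the provider's actual ingest bucket.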

Documentation Template

Markdown is the preferred format for data documentation. Other file types are supported, but only as uploaded files that are linked from the GitLab data documentation wiki.

The following template should serve as a starting point and be adapted as needed. The only exception is the file-level metadata: if file metadata is included, it must follow the metadata requirements.

This template can also be found on GitHub in the USDOT-SDC/Public repository (https://github.com/USDOT-SDC/Public): https://github.com/USDOT-SDC/Public/blob/master/DOCUMENTATION-template.md


<template>

<Project Name> Data Documentation

  • Project Acronym: <PROJ-ACRNM>

1.0 Points of Contact (POCs)

Fill in the following, as appropriate and known

2.0 Dataset Overview:

  • Introduction or abstract

  • Time period covered by the data

  • Physical location (description, polygon or lat/lon/elev) of observations in the dataset

  • Data source if applicable

  • Any website references (e.g. additional documentation such as the project website)

3.0 Instrument/Collection Device Description:

  • Brief text (i.e. 1-2 paragraphs) describing the instrument(s) with references

  • Figures (or links), if applicable

  • Table of specifications (e.g. accuracy, precision, frequency, resolution, time zone, etc.)

4.0 Data Collection and Processing:

  • Description of data collection

  • Description of derived parameters and processing techniques used

  • Description of quality control procedures

  • Data inter-comparisons, if applicable

5.0 Data Files:

  • File names: Data file structure and file naming conventions (e.g. column delimited ASCII, NetCDF, GIF, JPEG, etc.)

  • Field names: Data format and layout (i.e. description of header/data records, sample records)

  • Data type (e.g. string, int, ISO DATE/TIME, etc.)

  • Description:

    • List of parameters with units, sampling intervals, frequency, range

    • Data version number and date

    • Description of flags, codes used in the data, and definitions (e.g. good, questionable, missing, estimated, etc.)

5.1 Data Elements

  • File name

    • Field name [datatype] Description

5.2 Data Format Example

  • agci_events_20210301.csv

    • event_datetime [ISO datetime] Date and time of event capture

    • event_id [int] Unique event identifier

    • event_type [string] Category of event

    • event_speed [decimal] Speed of vehicle in MPH

6.0 File Level Metadata

  • If possible, file-level metadata should be included at the beginning of ASCII text files

  • If included, metadata must be enclosed in tags (<metadata></metadata>)

  • The contents of the metadata block must be in YAML format

6.1 Metadata Elements

If metadata is included, it should use the following elements.

  • Required

    • PROJECT [string]

    • SUBSET, NAME [string]

    • SUBSET, FORMAT VERSION [decimal]

    • COUNT [integer]

  • Optional

    • SUBSET, SOURCE [string]

    • COVERAGE, BEGIN [ISO DATE/TIME]

    • COVERAGE, END [ISO DATE/TIME]

    • REMARKS [string]

    • NOTIFY [email as string]

Please note that PROJECT and SUBSET, NAME values must be consistent throughout the SDC.

See the next section for an example.

6.3 Example

<metadata>
PROJECT: FRA-ARDS
SUBSET:
  NAME: AGCI
  FORMAT VERSION: 1.0
  SOURCE: RSIS [schema].[table_name]
COVERAGE:
  BEGIN: 2021-01-01T00:00:00.00-06:00
  END: 2021-01-31T00:00:00.00-06:00
COUNT: 142
REMARKS:
  - Office of Safety/Knowledge Management Division
  - Observation DATE/TIME provided in local timezone
  - No data available for 2021-01-16
NOTIFY:
  - KRogers@stevens.edu
  - DCrocetti@dot.gov
  - SDavis@dot.gov
</metadata>
DATE/TIME               LAT     LONG     ID        TYPE    TRESPASSER  FATALITY
                        Deg     Deg      NUMBER                        COUNT
2021-01-08T14:35-06:00  33.087  102.116  AI.20.1  Public   True        0
2021-01-15T05:48-06:00  33.090  102.120  AI.20.2  Private  False       0
2021-01-28T07:52-06:00  33.087  102.116  AI.20.3  Public   True        1
2021-01-30T18:16-06:00  33.090  102.120  AI.20.4  Private  False       0

7.0 Data Remarks:

  • PI's assessment of the data (e.g. disclaimers, instrument problems, quality issues, etc.)

  • Missing data periods

  • Software compatibility (e.g. list of existing software to view/manipulate the data)

8.0 References:

  • List of documents cited in this documentation


</template>
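The file-level metadata rules in section 6.0 of the template can be checked before upload. The sketch below is one possible approach, not an SDC-provided validator: the names split_metadata, missing_required, and REQUIRED are hypothetical. It separates the tagged block from the data records and reports which required elements are absent from an already-parsed mapping. Parsing the YAML text itself is left out; a third-party parser such as PyYAML (yaml.safe_load) could produce the mapping, which is an assumption about the provider's tooling.

```python
import re

# Required elements from section 6.1, written as key paths into the YAML mapping.
REQUIRED = [("PROJECT",), ("SUBSET", "NAME"), ("SUBSET", "FORMAT VERSION"), ("COUNT",)]

# Matches a <metadata>...</metadata> block and captures the text between the tags.
METADATA_BLOCK = re.compile(
    r"<metadata>[ \t]*\n(.*?)\n[ \t]*</metadata>[ \t]*\n?", re.DOTALL
)


def split_metadata(text):
    """Return (yaml_text, records); yaml_text is None when no block is found."""
    match = METADATA_BLOCK.search(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]


def missing_required(metadata):
    """List the required elements that are absent from a parsed metadata mapping."""
    missing = []
    for path in REQUIRED:
        node = metadata
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(", ".join(path))
                break
            node = node[key]
    return missing


# Shortened sample based on the example in section 6.3.
sample = """<metadata>
PROJECT: FRA-ARDS
SUBSET:
  NAME: AGCI
  FORMAT VERSION: 1.0
COUNT: 142
</metadata>
DATE/TIME               LAT     LONG
2021-01-08T14:35-06:00  33.087  102.116
"""

yaml_text, records = split_metadata(sample)
# Hand-written mapping standing in for the parsed YAML; FORMAT VERSION is omitted on purpose.
parsed = {"PROJECT": "FRA-ARDS", "SUBSET": {"NAME": "AGCI"}, "COUNT": 142}
print(missing_required(parsed))
```

Keeping the block extraction separate from the element check means the same check can be reused regardless of which YAML parser produces the mapping.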
