Background processes move your data from the Raw data bucket to the Standardized data bucket, placing it in a folder named after the date it was uploaded. As a data provider, you have a folder in the Data Lake that contains all of the data you upload.
(Figure: data flow from the raw-data bucket to the standardized-data bucket.)
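The date-based folder layout can be illustrated with a small shell helper that builds the S3 prefix for one upload date. This is a sketch under assumptions: the helper name standardized_prefix is hypothetical, and the YYYY-MM-DD folder naming and the use of the UTC date are guesses, so list your folder first to confirm the real folder names.

```shell
# Standardized data bucket name, taken from the listing command in this guide.
STANDARDIZED_BUCKET="prod.sdc.dot.gov.data-lake.standardized-data"

# Hypothetical helper: print the S3 prefix for one data provider and upload date.
# ASSUMPTION: date folders are named YYYY-MM-DD; DATE defaults to today's UTC date.
# Usage: standardized_prefix DATA_PROVIDER [DATE]
standardized_prefix() {
  provider=$1
  day=${2:-$(date -u +%Y-%m-%d)}
  printf 's3://%s/%s/%s/\n' "$STANDARDIZED_BUCKET" "$provider" "$day"
}

# Example (my-provider is a placeholder): list one day's uploads.
# aws s3 ls "$(standardized_prefix my-provider 2024-01-15)" --profile sdc
```

The trailing slash in the printed prefix makes `aws s3 ls` list the folder's contents rather than just the folder name.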
You can verify your uploads by running the following AWS CLI command, which lists the objects in your folder of the Standardized data bucket:
aws s3 ls s3://prod.sdc.dot.gov.data-lake.standardized-data/<data-provider>/ --profile sdc
Replace <data-provider> with your data provider name. The Standardized data bucket name for each data provider is listed in the table below, and your project name and data provider name are provided in your welcome email.
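To check for one specific file instead of reading the whole listing by eye, you can pipe a recursive listing into a small filter. This is a sketch: file_landed is a hypothetical helper name, and it relies on each line of `aws s3 ls --recursive` output ending with the object key.

```shell
# Hypothetical helper: exit 0 if a file named NAME appears in an
# `aws s3 ls --recursive` listing read from stdin. Each listing line is
# "DATE TIME SIZE KEY", so the check compares the end of each line with "/NAME".
# Usage: ... | file_landed NAME
file_landed() {
  awk -v name="$1" '
    {
      tail = "/" name
      n = length(tail)
      if (length($0) >= n && substr($0, length($0) - n + 1) == tail)
        found = 1
    }
    END { exit !found }
  '
}

# Example (my-provider and mydata.csv are placeholders):
# aws s3 ls "s3://prod.sdc.dot.gov.data-lake.standardized-data/my-provider/" \
#   --recursive --profile sdc | file_landed mydata.csv && echo "mydata.csv has landed"
```

Matching on the trailing "/NAME" rather than on any substring avoids reporting data.csv as present when only mydata.csv was uploaded.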
Configuration Files
When uploaded data reaches the Standardized data bucket, a validation Lambda function checks it. The function confirms that every message contains all of the required fields and that the value in each field is reasonable (e.g., a variable speed limit between 0 and 254 MPH). These field and value checks are defined in a file called config.ini.
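The kind of range check described above can be sketched in shell. This is a minimal illustration, not the validation Lambda itself: the config.ini layout shown (one section per field with min and max keys) and the helper names ini_get and check_range are hypothetical, since the actual structure of config.ini is not documented here.

```shell
# Hypothetical config.ini layout (the real file's structure may differ):
#
#   [speed_limit]
#   min = 0
#   max = 254

# Print the value of KEY in [SECTION] of an INI file (empty if absent).
# Usage: ini_get FILE SECTION KEY
ini_get() {
  awk -F '=' -v section="$2" -v key="$3" '
    /^[ \t]*\[/ {                       # section header such as [speed_limit]
      name = $0
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      in_section = (substr(name, 2, length(name) - 2) == section)
      next
    }
    in_section {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
        exit
      }
    }
  ' "$1"
}

# Exit 0 if VALUE is numeric and within [min, max] for SECTION,
# 1 if it fails the check, 2 if the bounds are missing.
# Usage: check_range FILE SECTION VALUE
check_range() {
  local min max
  min=$(ini_get "$1" "$2" min) || return 2
  max=$(ini_get "$1" "$2" max) || return 2
  if [ -z "$min" ] || [ -z "$max" ]; then
    return 2
  fi
  awk -v v="$3" -v lo="$min" -v hi="$max" \
    'BEGIN { exit !(v ~ /^-?[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Example: check_range config.ini speed_limit 65 && echo "speed limit OK"
```

Rejecting non-numeric values explicitly matters here, because awk would otherwise treat a string such as "abc" as 0 and let it pass a 0 to 254 range.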