DP Guide: Chapter 7, Data Quality Checks and Processes

Data Quality Checks and Processes

Validating Uploaded Data

For security reasons, and to prevent tampering with uploaded data, data files ingested through S3 buckets are moved to a different location immediately after upload. Once data files have been successfully uploaded to the ingest bucket using the upload step described earlier, the data is moved to the standardized data bucket, or “Data Lake.”

Background processes move the data from the raw data bucket to the standardized data bucket, into a folder labeled with the date of upload. As a data provider, you have a folder in the Data Lake that contains all of the data you upload.

Data flow: drop-zone → raw-data → standardized-data

To verify an upload, run the AWS CLI command below against the standardized data bucket to list the objects stored there.

aws s3 ls s3://prod.sdc.dot.gov.data-lake.standardized-data/<data-provider> --profile sdc
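To check for one specific file instead of reading the whole listing by eye, the listing can be piped through a small filter. The following is a minimal sketch, not an official SDC script. The helper names `provider_uri` and `listing_has_file` are hypothetical, and `example-provider` and `my-upload.csv` are placeholders for your own folder and file name. It assumes the standard `--recursive` option of `aws s3 ls`, so that objects inside the dated folders are included in the listing.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and example values are illustrative assumptions.

# Bucket name taken from the listing command in this chapter.
BUCKET="prod.sdc.dot.gov.data-lake.standardized-data"

# Build the S3 URI for a data provider's folder in the standardized bucket.
provider_uri() {
  printf 's3://%s/%s/\n' "$BUCKET" "$1"
}

# Read an `aws s3 ls` listing on stdin and exit 0 if any line contains
# the given file name as a literal string, non-zero otherwise.
listing_has_file() {
  grep -F -q -- "$1"
}

# Example usage (requires AWS credentials configured under the "sdc" profile):
#   aws s3 ls "$(provider_uri example-provider)" --recursive --profile sdc \
#     | listing_has_file "my-upload.csv" \
#     && echo "upload found" || echo "upload not found yet"
```

Because the move to the standardized bucket is done by background processes, a file may not appear in the listing right away. If the check reports that the upload was not found, run it again after a short wait before assuming the upload failed.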
