Securely sync CSV or JSON data from any S3 bucket
Need to join, transform, or filter your S3 data? Consider using the AWS Athena source to directly query your S3 data using SQL.
Overview
Hightouch lets you pull data from CSV and JSON files stored in an Amazon S3 bucket and push them to downstream destinations. To get started, you need an S3 bucket and AWS credentials.
AWS credential setup
See the guide for configuring AWS credentials to learn how you can set up an IAM principal and use its credentials. The IAM principal whose credentials you use must have programmatic access enabled and permission to read from the S3 path you want to use.
Hightouch needs the following IAM actions to retrieve items from your bucket:
Action | Details |
---|---|
s3:GetObject | Grants permission to retrieve objects from Amazon S3 |
s3:ListBucket | Grants permission to list some or all of the objects in an Amazon S3 bucket (up to 1000) |
You can use the following JSON sample to create your IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::${bucketName}/*",
        "arn:aws:s3:::${bucketName}"
      ]
    }
  ]
}
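If you generate policies for several buckets, substituting the bucket name by hand is error-prone. The following is a minimal Python sketch that fills in the sample policy above for a given bucket. The function name `build_read_policy` and the bucket name `my-example-bucket` are illustrative assumptions, not part of Hightouch or AWS tooling.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Return the sample read-only policy with the bucket name filled in.

    Hypothetical helper for illustration; it only builds the JSON document
    and does not call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    # Object-level ARN: s3:GetObject applies to objects.
                    f"arn:aws:s3:::{bucket_name}/*",
                    # Bucket-level ARN: s3:ListBucket applies to the bucket.
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder; replace it with your bucket name.
    print(json.dumps(build_read_policy("my-example-bucket"), indent=2))
```

Both resource entries are needed because the two actions apply to different resource types: objects for s3:GetObject and the bucket itself for s3:ListBucket.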
Connection configuration
To get started, go to the Sources overview page and click the Add source button. Select Amazon S3 and follow the steps below.
Configure your credentials
Select the credentials you previously created or click Create new to set them up now.
Configure your source
Enter your S3 bucket's Region and the Bucket name. The bucket name should be just the name of the bucket, not a URL.
Test your connection
When setting up Amazon S3 as a source for the first time, Hightouch validates your AWS credentials and access to your S3 bucket. Once the test passes, click Continue to finish setup.
Next steps
Once your source configuration has passed the necessary validation, your source setup is complete. Next, you can set up models to define which data you want to pull from your S3 bucket. The file must have either a .csv or .json extension.
CSV requirements
CSV files must meet the following requirements:
- They must have a header row. The values in the header row are automatically available as column names when you set up a sync.
- They must use comma-separated values; tabs and other delimiters aren't supported.
- They must use double quotes (") for quoted values.
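To catch formatting problems before uploading a file, you can run a local check along these lines. This is a rough sketch using Python's standard `csv` module, not Hightouch's own validation. The function name `check_csv` is hypothetical, and the consistent-field-count check is an extra assumption that goes beyond the requirements listed above.

```python
import csv
import io


def check_csv(text: str) -> list[str]:
    """Parse comma-delimited, double-quoted CSV text and return its header.

    Hypothetical pre-upload check; raises ValueError on the first problem.
    """
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    header = next(reader, None)
    if not header or any(not name.strip() for name in header):
        raise ValueError("CSV needs a header row with non-empty column names")
    for row_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        # Assumption: every data row should have one field per header column.
        # A tab-delimited file typically fails here or yields a single column.
        if len(row) != len(header):
            raise ValueError(
                f"row {row_no} has {len(row)} fields, expected {len(header)}"
            )
    return header
```

For example, `check_csv('id,name\n1,"Smith, Jane"\n')` returns the two column names, because the comma inside the double-quoted value is not treated as a delimiter.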
JSON requirements
JSON files must meet the following requirements:
- The input file must contain an array at the top level.
- Each element in the input file must be an object with the same keys.
The keys of the array elements are automatically available as column names when you set up a sync.
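The same kind of local check can be sketched for JSON files. The example below uses only Python's standard `json` module. The function name `json_columns` is hypothetical, and treating an empty array as valid with no columns is an assumption rather than documented Hightouch behavior.

```python
import json


def json_columns(text: str) -> list[str]:
    """Return the column names implied by a JSON array of uniform objects.

    Hypothetical pre-upload check; raises ValueError if the structure
    doesn't match the requirements above.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("top-level value must be an array")
    if not data:
        return []  # assumption: an empty array has no columns
    first = data[0]
    if not isinstance(first, dict):
        raise ValueError("element 0 is not an object")
    expected = set(first)
    for index, item in enumerate(data):
        # Key order may differ between elements; only the key set must match.
        if not isinstance(item, dict) or set(item) != expected:
            raise ValueError(f"element {index} doesn't have the same keys")
    return list(first)
```

A file containing `[{"id": 1, "email": "a@example.com"}, {"email": "b@example.com", "id": 2}]` passes this check, while a top-level object such as `{"id": 1}` does not.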
Model setup
Hightouch supports syncing data from a specific file in your S3 bucket as well as from the last modified file that matches a given prefix.
- In Hightouch, go to the Models overview page.
- Click Add model.
- Select the Amazon S3 source you previously created.
- Enter the relative path to the CSV or JSON file that you want to sync data from, like path/prefix/file.csv. The path shouldn't contain the bucket name.
- To sync the last modified file with a given prefix, check the Use last modified box and enter a prefix like path/prefix in the path field. See Amazon's S3 docs on prefixes for examples of how prefixes match objects. To sync the last modified file in the entire bucket, check the Use last modified box and leave the path field blank.
- Preview your model's query results.
- Click Continue.
- Name your model and select its primary key. Hightouch uses the primary key to determine which rows have been added, changed, or removed since the last sync.
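To illustrate how the Use last modified option relates to prefixes, here is a small Python sketch of the selection logic. It is a conceptual model only and not Hightouch's implementation: the `S3Object` type, the `pick_last_modified` function, and the sample keys are all hypothetical. It relies on the fact that an S3 prefix is a plain string match against the start of the object key.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional


class S3Object(NamedTuple):
    """Hypothetical stand-in for an S3 object listing entry."""
    key: str
    last_modified: datetime


def pick_last_modified(
    objects: Iterable[S3Object], prefix: str = ""
) -> Optional[S3Object]:
    """Return the most recently modified object whose key starts with prefix.

    An empty prefix matches every key, which mirrors leaving the path
    field blank. Returns None when nothing matches.
    """
    matches = [obj for obj in objects if obj.key.startswith(prefix)]
    return max(matches, key=lambda obj: obj.last_modified, default=None)


if __name__ == "__main__":
    listing = [
        S3Object("exports/users-01.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
        S3Object("exports/users-02.csv", datetime(2024, 2, 1, tzinfo=timezone.utc)),
        S3Object("archive/old.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    ]
    print(pick_last_modified(listing, "exports/"))
    print(pick_last_modified(listing))
```

In this sample listing, the prefix `exports/` selects `exports/users-02.csv`, while a blank prefix selects `archive/old.csv` because it is the newest object overall.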
Tips and troubleshooting
To date, our customers haven't experienced any errors while using this source. If you run into any issues, please don't hesitate to reach out. We're here to help.