Amazon S3 stores objects in buckets. MantrixFlow can read files from an S3 bucket and sync them as tabular data. Supported formats are CSV, JSON, JSONL, Parquet, and Avro.

Prerequisites

You need an S3 bucket and an IAM user with s3:GetObject and s3:ListBucket permissions on that bucket. Scope the permissions to the specific bucket rather than granting full S3 access.

Connection Setup

1. Bucket Name

Enter the name of the S3 bucket. You can find it in the AWS S3 console.

2. Region

Enter the AWS region where the bucket is located (e.g. us-east-1, ap-south-1). The value must match the bucket's actual region.

3. Access Key ID and Secret Access Key

Create an IAM user with programmatic access. Attach a policy that grants only s3:GetObject and s3:ListBucket on the specific bucket. Then enter that user's access key ID and secret access key.

4. Path Pattern

Enter a path pattern to select which files to sync (e.g. data/*.csv or exports/2024/**/*.parquet). Use * to match within a single directory level and ** to match across multiple levels.
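
To get a feel for how the two wildcards differ, the following Python sketch translates a path pattern into a regular expression. It is an illustration only, not MantrixFlow's implementation: the function names pattern_to_regex and matches are hypothetical, and the sketch assumes that * never crosses a "/" and that **/ may also match zero directory levels, which the connector may or may not do.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern into a compiled regex.

    Assumed semantics (not confirmed for MantrixFlow):
      *    matches any characters except "/"
      **   matches any characters, including "/"
      **/  matches zero or more whole directory levels
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directory levels
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, across levels
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one level
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, key: str) -> bool:
    """Return True if the whole object key matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(key) is not None
```

Under these assumptions, matches("data/*.csv", "data/sub/a.csv") is False because * stops at the "/", while matches("exports/2024/**/*.parquet", "exports/2024/01/15/x.parquet") is True.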

IAM Policy Example

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
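
If you manage several buckets, it can help to generate this policy instead of editing the bucket name by hand. The Python sketch below builds the same document for a given bucket name; the helper name build_read_only_policy is hypothetical and not part of MantrixFlow or any AWS SDK.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Build the read-only policy shown above for one bucket.

    The bucket ARN covers s3:ListBucket and the "/*" object ARN
    covers s3:GetObject.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(build_read_only_policy("your-bucket-name"), indent=2))
```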

Supported File Formats

  • CSV
  • JSON
  • JSONL (newline-delimited JSON)
  • Parquet
  • Avro

Available Streams

Each file or file pattern can map to a stream. MantrixFlow discovers files matching the path pattern and infers the schema from the file format.
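
As a rough illustration of what schema inference can look like for a format without embedded types, the Python sketch below samples a CSV file and picks the narrowest type that fits each column. This is an assumption-laden example, not MantrixFlow's actual logic: the function names, the type labels ("integer", "number", "string"), and the sampling limit are all hypothetical.

```python
import csv
import io


def _all_parse(values, cast) -> bool:
    """Return True if every value can be converted with cast."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_type(values) -> str:
    """Pick the narrowest type that fits all non-empty values."""
    if not values:
        return "string"  # nothing to go on, so fall back to string
    if _all_parse(values, int):
        return "integer"
    if _all_parse(values, float):
        return "number"
    return "string"


def infer_csv_schema(text: str, sample_rows: int = 100) -> dict:
    """Infer a column-name-to-type mapping from the first rows of a CSV."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames or []}
    for index, row in enumerate(reader):
        if index >= sample_rows:
            break
        for name in columns:
            value = row.get(name)
            if value not in (None, ""):  # skip missing and empty cells
                columns[name].append(value)
    return {name: infer_type(values) for name, values in columns.items()}
```

Formats such as Parquet and Avro carry their own schema, so a step like this would mainly matter for CSV and JSON files.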

Supported Sync Modes

  • Full sync — Reads all matching files on every run. New files are included; deleted files are excluded.