Data Stream Connector

To support clients' data warehousing and analytics needs, Carriyo offers a feature that streams shipment data in near real-time to an AWS S3 location.

The Carriyo Data Stream connector continuously streams every change to shipment data to an AWS S3 bucket managed by Carriyo, enabling clients to use this data for analytics and data warehousing purposes.

Note: This connector is available as an add-on subscription for enterprise accounts. It covers the following components:

  1. Data Streaming Pipeline : Your data is streamed every 5 minutes to a staging area in S3 through a dedicated data pipeline. This pipeline continuously tracks and processes every change to your data, ensuring near real-time updates. The subscription covers the compute resources required to maintain this data flow.
  2. Data Storage : The streamed data is securely stored in Carriyo’s S3 bucket and retained for one year. This subscription includes reading, writing, and storing the data to ensure seamless access and availability.
  3. Security & Maintenance : The subscription also covers ongoing system maintenance to ensure consistent performance, along with security protocols to safeguard your data.

Please contact your Carriyo account manager to learn more or enable this feature for your account.

Diagram: Carriyo pushes data to the AWS S3 bucket every 5 minutes; the client pulls data from the bucket.

Changes are pushed to the S3 bucket at approximately 5-minute intervals. Carriyo provides clients with either an AWS role or an AWS user (or both), which can be used to access the S3 bucket.

Folder Structure

The output files in S3 will be named using the format:
<table name>-<version>-YYYY-MM-DD-hh-mm-ss-<uuid v4>
and follow a folder structure in the format:
/YYYY/MM/DD/hh.

For example, shipment data streamed at 10 a.m. (UTC) on April 1, 2024, will be available in the following structure:

- 2024
    - 04
        - 01
            - 10
                - PROD_<TENANT>_Shipment-2-2024-04-01-10-00-00-d963b748-56e9-4b75-a334-29e2cc3a43a1
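For programmatic access, the hourly folder prefix can be derived from any UTC timestamp. The following Python sketch illustrates this; the helper name is ours for illustration, not part of the Carriyo API:

    from datetime import datetime, timezone

    def hourly_prefix(ts: datetime) -> str:
        """Return the YYYY/MM/DD/hh folder prefix for a UTC timestamp."""
        return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H")

    # Shipment data streamed at 10 a.m. UTC on April 1, 2024:
    print(hourly_prefix(datetime(2024, 4, 1, 10, tzinfo=timezone.utc)))  # 2024/04/01/10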

File Format

Each file may contain multiple change records. Each record is a single line of JSON, consisting of the following key components:

  • Keys : Contains the Shipment ID to which this change record relates.
  • OldImage : A snapshot of the shipment data before the change.
  • NewImage : A snapshot of the shipment data after the change.
  • Other Metadata : Additional details related to the change record, such as timestamps, byte size, etc.

Both the OldImage and NewImage follow the Shipment Object Model.

You can download a sample file here.

Please note that the sample file above is formatted for readability and contains only a single change record. In practice, files contain multiple records, each formatted as a single line of JSON without extra line breaks or spacing.
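As an illustration, a minimal reader for a downloaded stream file might look like the Python sketch below. The field names (Keys, OldImage, NewImage) follow the description above; treating OldImage as optional (for example, when a shipment is first created) is an assumption, not documented behavior.

    import json

    def read_change_records(path):
        """Yield (keys, old_image, new_image) from a stream file, one JSON record per line."""
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                yield (
                    record.get("Keys"),      # shipment ID this change relates to
                    record.get("OldImage"),  # snapshot before the change (assumed optional)
                    record.get("NewImage"),  # snapshot after the change
                )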

Accessing the S3 Bucket

Using an AWS Role

If you already have an AWS account, Carriyo will set up a stream for your AWS account ID and provide you with the S3 bucket name, an ExternalId (used to secure cross-account role access), and an AWS role ARN. You can pull data from the S3 bucket using API calls, the AWS SDK, or a data warehouse tool, supplying the bucket name, role ARN, and ExternalId.
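For example, with Python and boto3 you can assume the provided role and list the files for a given hour. The bucket name, role ARN, and ExternalId below are placeholders for the values Carriyo shares with you:

    import boto3

    BUCKET = "carriyo-data-stream-example"                             # placeholder
    ROLE_ARN = "arn:aws:iam::123456789012:role/CarriyoDataStream"      # placeholder
    EXTERNAL_ID = "your-external-id"                                   # placeholder

    # Assume the Carriyo-provided role; the ExternalId guards against
    # the confused-deputy problem in cross-account access.
    creds = boto3.client("sts").assume_role(
        RoleArn=ROLE_ARN,
        RoleSessionName="carriyo-data-stream",
        ExternalId=EXTERNAL_ID,
    )["Credentials"]

    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # List the files streamed during a given hour (UTC); pagination omitted for brevity.
    for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix="2024/04/01/10").get("Contents", []):
        print(obj["Key"])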

Using an AWS User

If you do not have an AWS account, or prefer not to connect your own, Carriyo will grant you direct access to the S3 bucket. Carriyo will create a stream and provide you with the S3 bucket name along with AWS user credentials (an access key ID and secret access key). You can pull data from the S3 bucket using API calls, the AWS SDK, or a data warehouse tool by supplying the bucket name and the provided user credentials.
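In this case, access amounts to creating an S3 client with the provided credentials. A minimal boto3 sketch, again with placeholder values:

    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIA...",        # access key ID provided by Carriyo
        aws_secret_access_key="...",        # secret access key provided by Carriyo
    )

    # Download every file streamed during a given hour (UTC); pagination omitted for brevity.
    bucket = "carriyo-data-stream-example"  # placeholder for the provided bucket name
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix="2024/04/01/10").get("Contents", []):
        key = obj["Key"]
        s3.download_file(bucket, key, key.rsplit("/", 1)[-1])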