Kinesis Data Streams Integration
Imagine you run a booming e-commerce platform. When a user places an order, DynamoDB handles the transaction perfectly. But now, your business needs to update the Elasticsearch catalog in near real time, trigger fraud detection models, and ingest the data into an enterprise Data Lake for long-term analytics.
While DynamoDB Streams is excellent for simple, single-purpose triggers (like updating a cache or sending an email), it has strict limitations at scale—specifically, it only supports a maximum of two concurrent consumers per shard. If the Search, Fraud, and Analytics teams all try to read the same DynamoDB Stream, they will compete for throughput and get throttled.
For high-throughput applications, long-term retention, and multi-consumer fan-out, you need the heavy lifter: Amazon Kinesis Data Streams.
1. DynamoDB Streams vs. Kinesis Data Streams
Both services provide ordered, sharded streams of data changes, but they serve different purposes.
| Feature | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Retention | Fixed at 24 hours. | 24 hours to 365 days. |
| Consumers | Max 2 consumers per shard. | Up to 5 consumers (standard) or 20 (enhanced fan-out). |
| Cost | Charged per streams read request unit (GetRecords calls). | Charged per shard-hour plus PUT payload units (provisioned mode). |
| Ordering | Strict ordering per Item Key. | Ordered within a shard; DynamoDB change records can arrive out of order, so compare record timestamps. |
| Integration | Tightly coupled with the table. | Decoupled; many producers can write to one stream. |
[!TIP] Use Kinesis When: You need to fan out data to multiple teams (Search Team, Fraud Team, Analytics Team) without them competing for read throughput on the DynamoDB stream shards.
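To see why shared consumers compete, here is a back-of-the-envelope sketch. It assumes the Kinesis read limit of 2 MB/s per shard, shared between standard consumers, and a dedicated 2 MB/s per consumer per shard with enhanced fan-out. The function name is illustrative and the result is an approximation, not a measured figure.

```python
SHARD_READ_MBPS = 2.0  # read throughput available from one Kinesis shard


def per_consumer_read_mbps(consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate read throughput each consumer gets from a single shard."""
    if consumers < 1:
        raise ValueError("consumers must be at least 1")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated pipe per shard.
        return SHARD_READ_MBPS
    # Standard consumers split the shard's read throughput between them.
    return SHARD_READ_MBPS / consumers


# Three teams (Search, Fraud, Analytics) reading the same shard:
print(per_consumer_read_mbps(3, enhanced_fan_out=False))
print(per_consumer_read_mbps(3, enhanced_fan_out=True))
```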
2. Kinesis Data Streams for DynamoDB
AWS offers a feature called Kinesis Data Streams for DynamoDB. This allows you to replicate item-level changes from your table to a Kinesis stream without writing any code.
- Zero Impact on Table Performance: The replication happens asynchronously in the background and does not consume your table’s RCU/WCU.
- Precision: You can choose whether to replicate the entire item (NEW_IMAGE) or just keys.
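Although no application code is needed, the feature still has to be switched on once, from the console or through the API. The sketch below wraps the DynamoDB `EnableKinesisStreamingDestination` call as exposed by boto3. The table name and stream ARN in the comment are hypothetical placeholders, and the client is passed in so the function can be exercised without AWS credentials.

```python
def enable_kinesis_destination(dynamodb_client, table_name: str, stream_arn: str) -> dict:
    """Ask DynamoDB to replicate item-level changes to an existing Kinesis stream.

    `dynamodb_client` is expected to behave like boto3.client("dynamodb").
    """
    return dynamodb_client.enable_kinesis_streaming_destination(
        TableName=table_name,
        StreamArn=stream_arn,
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   enable_kinesis_destination(
#       boto3.client("dynamodb"),
#       "Orders",
#       "arn:aws:kinesis:us-east-1:123456789012:stream/orders-changes",
#   )
```

The Kinesis stream must already exist. The response reports a destination status that starts as ENABLING while replication is being set up.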
3. The Analytics Pipeline Pattern
A common pattern in modern data architectures is to use DynamoDB for online transactions (OLTP) and S3/Athena for analytics (OLAP). Kinesis acts as the bridge.
- DynamoDB: Handles user requests (single-digit millisecond latency).
- Kinesis Data Stream: Receives change events.
- Amazon Data Firehose (formerly Kinesis Data Firehose): Buffers records (e.g., 128 MB or 5 minutes) and writes them to S3.
- Amazon S3: Stores the raw JSON/Parquet data.
- Amazon Athena: Runs SQL queries on the S3 data for reporting.
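Change records arrive in DynamoDB's typed JSON (for example `{"S": "o-1"}`), which is awkward to query from Athena. One common step between the stream and S3 is to flatten each record into a plain JSON line. The sketch below shows that step in isolation. It assumes each Kinesis payload is base64-encoded JSON with `eventName` and `dynamodb.NewImage` fields; the function names and the `_event_name` column are illustrative, and only a subset of DynamoDB types is handled.

```python
import base64
import json


def from_dynamodb_json(value: dict):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    type_tag, inner = next(iter(value.items()))
    if type_tag in ("S", "BOOL"):
        return inner
    if type_tag == "N":
        try:
            return int(inner)
        except ValueError:
            return float(inner)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in inner.items()}
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in inner]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def to_analytics_row(kinesis_data_b64: str) -> str:
    """Decode one base64 Kinesis payload and return a flat, newline-terminated JSON row."""
    event = json.loads(base64.b64decode(kinesis_data_b64))
    image = event["dynamodb"].get("NewImage", {})
    row = {name: from_dynamodb_json(attr) for name, attr in image.items()}
    row["_event_name"] = event["eventName"]  # INSERT, MODIFY, or REMOVE
    return json.dumps(row, sort_keys=True) + "\n"
```

Newline-delimited JSON is used here because Athena can read it directly. Converting to Parquet is a separate step this sketch does not cover.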
4. Considerations & Costs
Shard Management
Unlike DynamoDB Streams (where shards are managed for you invisibly behind the scenes), Kinesis gives you control over your shards.
- Provisioned Mode: You manually specify the number of shards. 1 Shard = 1MB/s or 1,000 records/s write, and 2MB/s read. This requires monitoring and manual scaling during traffic spikes.
- On-Demand Mode: AWS automatically scales the shards based on throughput. This is easier but costs more per GB and per stream-hour.
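For Provisioned Mode, the per-shard limits above translate into a simple sizing calculation: take the largest of the three requirements. The sketch below is an estimate only. The function name is illustrative, and real sizing should leave headroom for traffic spikes and uneven partition keys.

```python
import math

WRITE_MB_PER_SHARD = 1.0       # write bandwidth per shard
WRITE_RECORDS_PER_SHARD = 1000  # write records per second per shard
READ_MB_PER_SHARD = 2.0        # read bandwidth per shard


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Estimate the minimum shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


# 3.5 MB/s of writes, 2,500 records/s, and 10 MB/s of total reads:
print(shards_needed(3.5, 2500, 10))  # -> 5 (reads are the bottleneck)
```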
Ordering Guarantees and Partition Keys
Kinesis guarantees order within a shard. When writing to Kinesis, a Partition Key determines which shard the data lands in. When using Kinesis Data Streams for DynamoDB, the DynamoDB item’s Partition Key is automatically used as the Kinesis Partition Key, so all updates to a single item go to the same shard. Unlike DynamoDB Streams, however, this integration does not guarantee that change records arrive in the order the writes occurred. Consumers that depend on per-item ordering should compare the ApproximateCreationDateTime attribute on each record.
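The key-to-shard mapping can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch assumes the shards split the hash space evenly, which holds for a freshly created stream but not necessarily after resharding; `shard_for_key` is an illustrative name, not an AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard an evenly split stream would route this key to."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, shard_count) using equal-width hash key ranges.
    return hash_value * shard_count // HASH_SPACE


# The same key always maps to the same shard.
print(shard_for_key("customer-42", 4) == shard_for_key("customer-42", 4))  # -> True
```

Because the mapping is deterministic, a single very hot key cannot be spread across shards, which is worth keeping in mind when a few items receive most of the writes.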
Duplicate Records (Idempotency)
Kinesis has an “at least once” delivery guarantee. Due to network retries or consumer application crashes, your consumer might process the same record twice. Your downstream consumer must be idempotent (e.g., performing UPSERT operations instead of blind INSERT operations).
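A minimal sketch of an idempotent consumer is shown below. It makes several assumptions: an in-memory dict and set stand in for the downstream database, each decoded record carries `eventID`, `dynamodb.Keys`, and `dynamodb.ApproximateCreationDateTime`, and `apply_change` is a hypothetical helper name. A production consumer would persist this state instead of holding it in memory.

```python
import json


def apply_change(store: dict, seen_event_ids: set, event: dict) -> bool:
    """Upsert one change event into `store`; return True only if the store changed."""
    event_id = event["eventID"]
    if event_id in seen_event_ids:
        return False  # duplicate delivery: already processed
    seen_event_ids.add(event_id)

    change = event["dynamodb"]
    item_key = json.dumps(change["Keys"], sort_keys=True)
    timestamp = change["ApproximateCreationDateTime"]

    current = store.get(item_key)
    if current is not None and current["timestamp"] >= timestamp:
        return False  # stale record: a newer version is already stored

    # UPSERT: overwrite by key instead of appending a new row.
    store[item_key] = {"timestamp": timestamp, "image": change.get("NewImage")}
    return True
```

Processing the same record twice leaves the store unchanged, and a late-arriving older record cannot overwrite a newer one.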
5. Summary
- Kinesis Data Streams is the enterprise-grade sibling of DynamoDB Streams.
- Use it for long retention (replaying history) or high fan-out (many consumers).
- The Analytics Pipeline (DynamoDB → Kinesis → Firehose → S3) is the standard pattern for getting data out of DynamoDB for complex querying.
Next, review your knowledge with the Module Review.