Apache Kafka Event Streaming
Build an event-driven architecture with Kafka producers, consumers, and stream processing.
Prompt
Build a production-ready event-driven architecture with Apache Kafka producers, consumers, and stream processing that satisfies the following requirements:
System Overview
Application Architecture
- System Name: [e.g., E-commerce Platform, IoT Data Pipeline, Microservices]
- Architecture Type: [Event-driven microservices / CQRS / Event sourcing / Stream processing]
- Number of Services: [3-5 / 10+ / Specify]
- Expected Throughput: [100 events/sec / 10k events/sec / 1M events/sec]
- Data Retention: [7 days / 30 days / 90 days / Indefinite]
Infrastructure
- Kafka Deployment: [Self-hosted / Confluent Cloud / AWS MSK / Azure Event Hubs]
- Cluster Size: [Single broker / 3 brokers / 5+ brokers]
- Replication Factor: [1 / 3 / 5]
- Environment: [Development / Staging / Production / All]
- Cloud Provider: [AWS / GCP / Azure / On-premise]
Event Schema Design
Event 1: [EventName, e.g., UserCreated]
- Topic: [user-events / users.created / domain.entity.action]
- Event Type: [user.created / order.placed / etc.]
- Schema Format: [JSON / Avro / Protobuf / Plain text]
- Key: [userId / orderId / customerId]
- Payload Fields:
{ "eventId": "uuid", "eventType": "user.created", "timestamp": "ISO8601", "userId": "string", "email": "string", "metadata": {} } - Partitioning Strategy: [By userId / By region / Round-robin]
- Expected Volume: [Events per day/hour]
Event 2: [EventName, e.g., OrderPlaced]
- Topic: [Topic name]
- Event Type: [Type]
- Schema Format: [Format]
- Key: [Partition key]
- Payload Fields: [JSON schema]
- Partitioning Strategy: [Strategy]
- Expected Volume: [Volume]
Event 3: [EventName]
- Topic: [Topic]
- Event Type: [Type]
- Schema Format: [Format]
- Key: [Key]
- Payload Fields: [Schema]
- Partitioning Strategy: [Strategy]
- Expected Volume: [Volume]
[Define 5-15 event types]
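To make the envelope concrete, here is a minimal sketch of the example payload above as a Java record, assuming Jackson for JSON serialization; the class and field names are illustrative, not prescribed.

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;

import com.fasterxml.jackson.databind.ObjectMapper;

// Minimal event envelope mirroring the example payload above.
// Names are illustrative; adapt them to your own event catalog.
public record UserCreatedEvent(
        String eventId,      // UUID, unique per event (used for deduplication)
        String eventType,    // e.g. "user.created"
        String timestamp,    // ISO-8601
        String userId,       // also used as the partition key
        String email,
        Map<String, String> metadata) {

    public static UserCreatedEvent of(String userId, String email) {
        return new UserCreatedEvent(
                UUID.randomUUID().toString(),
                "user.created",
                Instant.now().toString(),
                userId,
                email,
                Map.of());
    }

    public static void main(String[] args) throws Exception {
        // Serializes to the JSON shape shown in the payload example
        // (record support requires jackson-databind 2.12+).
        System.out.println(new ObjectMapper().writeValueAsString(of("u-123", "a@b.com")));
    }
}
```

Carrying `eventId` and `eventType` in every envelope keeps deduplication and routing uniform across event types.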
Topics Configuration
Topic 1: [TopicName]
- Purpose: [What events this topic contains]
- Partitions: [3 / 6 / 12 / Custom]
- Replication Factor: [1 / 3 / 5]
- Retention: [7d / 30d / Unlimited (compacted)]
- Cleanup Policy: [delete / compact / compact,delete]
- Min In-Sync Replicas: [1 / 2 / 3]
- Compression: [none / gzip / snappy / lz4 / zstd]
Topic 2: [TopicName]
- Purpose: [Purpose]
- Partitions: [Count]
- Replication Factor: [Factor]
- Retention: [Retention]
- Cleanup Policy: [Policy]
- Min In-Sync Replicas: [Count]
- Compression: [Type]
[Define 3-10 topics]
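As a sketch of how the per-topic settings above map onto code, the following uses Kafka's Java `AdminClient` to create one topic; the topic name and config values are placeholders matching the template fields.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3; per-topic overrides below.
            NewTopic userEvents = new NewTopic("user-events", 6, (short) 3)
                    .configs(Map.of(
                            TopicConfig.RETENTION_MS_CONFIG, "604800000", // 7 days
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE,
                            TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2",
                            TopicConfig.COMPRESSION_TYPE_CONFIG, "lz4"));

            admin.createTopics(List.of(userEvents)).all().get();
        }
    }
}
```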
Producers
Producer 1: [ServiceName]
- Language: [Node.js / Python / Java / Go / .NET]
- Events Produced: [List event types]
- Production Rate: [Events per second]
- Batching: [Enabled with X ms linger / Disabled]
- Compression: [gzip / snappy / lz4 / none]
- Acknowledgment: [all / 1 / 0]
- Idempotence: [Enabled / Disabled]
- Retry Policy: [Max retries, backoff strategy]
- Error Handling: [Dead letter queue / Logging / Alert]
Producer 2: [ServiceName]
- Language: [Language]
- Events Produced: [Event types]
- Production Rate: [Rate]
- Batching: [Config]
- Compression: [Type]
- Acknowledgment: [Level]
- Idempotence: [Status]
- Retry Policy: [Policy]
- Error Handling: [Strategy]
[Define 2-8 producers]
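A minimal Java producer wiring together the knobs listed above (idempotence, acks, batching, compression); the topic, key, and payload are illustrative.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Reliability: idempotence implies acks=all and retries > 0,
        // preventing duplicates caused by broker-side retries.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // Throughput: batch up to 10 ms / 64 KB and compress each batch.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "10");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024));
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by userId keeps all of a user's events on one partition (ordered).
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("user-events", "u-123", "{\"eventType\":\"user.created\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Route to a DLQ or raise an alert here instead of just logging.
                    exception.printStackTrace();
                }
            });
            producer.flush();
        }
    }
}
```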
Consumers
Consumer 1: [ServiceName / Consumer Group]
- Language: [Node.js / Python / Java / Go / .NET]
- Topics Subscribed: [List topics]
- Consumer Group: [Group ID]
- Processing Type: [Real-time / Batch / Stream processing]
- Offset Management: [Auto-commit / Manual commit]
- Auto-commit Interval: [5s / 10s / Custom]
- Max Poll Records: [500 / 1000 / Custom]
- Processing Logic: [What this consumer does with events]
- Idempotency: [How to handle duplicate events]
- Error Handling: [Retry / DLQ / Skip / Alert]
- Scaling: [Number of instances]
Consumer 2: [ServiceName / Consumer Group]
- Language: [Language]
- Topics Subscribed: [Topics]
- Consumer Group: [Group]
- Processing Type: [Type]
- Offset Management: [Strategy]
- Auto-commit Interval: [Interval]
- Max Poll Records: [Count]
- Processing Logic: [Logic]
- Idempotency: [Strategy]
- Error Handling: [Strategy]
- Scaling: [Instances]
[Define 2-10 consumers]
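A matching Java consumer sketch with manual offset commits and a graceful-shutdown hook; the group ID and topic are placeholders, and the per-record handler is a stub.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.kafka.common.serialization.StringDeserializer;

public class UserEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "user-analytics");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual commit = at-least-once
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Graceful shutdown: wakeup() is the only thread-safe consumer call and
        // makes poll() throw so the loop below can exit cleanly.
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

        try {
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // must be idempotent: redelivery is possible
                }
                consumer.commitSync(); // commit only after the whole batch succeeded
            }
        } catch (WakeupException e) {
            // expected on shutdown
        } finally {
            consumer.close();
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d key=%s%n",
                record.partition(), record.offset(), record.key());
    }
}
```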
Stream Processing
Stream Processor 1: [Name, e.g., User Analytics]
- Framework: [Kafka Streams / Flink / Spark Streaming / ksqlDB]
- Input Topics: [Source topics]
- Output Topics: [Destination topics]
- Processing Logic:
- Transformations: [Map, filter, aggregate, join, etc.]
- Windowing: [Tumbling / Hopping / Session / None]
- Window Size: [1 minute / 5 minutes / 1 hour]
- Aggregations: [Count, sum, average, etc.]
- Joins: [Stream-stream / Stream-table / Table-table]
- State Store: [RocksDB / In-memory / External]
- Exactly-Once Semantics: [Enabled / Disabled]
Stream Processor 2: [Name]
- Framework: [Framework]
- Input Topics: [Topics]
- Output Topics: [Topics]
- Processing Logic: [Describe transformations]
- Windowing: [Type and size]
- Aggregations: [Aggregations]
- Joins: [Join types]
- State Store: [Store type]
- Exactly-Once Semantics: [Status]
[Define 1-5 stream processors]
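For the Kafka Streams option, a compact topology sketch: a tumbling five-minute count per key with exactly-once (EOS v2) enabled. Topic names and the application ID are illustrative.

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class UserActivityCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-activity-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        // Exactly-once processing (EOS v2; requires brokers >= 2.5).
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("user-events"); // keyed by userId

        // Tumbling 5-minute window: count events per user, write counts downstream.
        events.groupByKey()
              .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
              .count() // state lives in a local RocksDB store, backed by a changelog topic
              .toStream()
              .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count.toString()))
              .to("user-activity-counts", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```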
Schema Management
Schema Registry
- Use Schema Registry: [Yes with Confluent / Yes with Apicurio / No]
- Schema Format: [Avro / Protobuf / JSON Schema]
- Schema Evolution: [Backward / Forward / Full / None]
- Schema Validation: [Strict / Lenient]
Schema Definitions
For each event type, provide:
- Schema Version: [v1, v2, etc.]
- Required Fields: [List required fields]
- Optional Fields: [List optional fields]
- Default Values: [Defaults for optional fields]
- Evolution Strategy: [How schema changes are handled]
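A sketch of producing Avro records against a schema registry, assuming Confluent's `KafkaAvroSerializer` and a registry at an example URL; the schema illustrates the evolution-friendly pattern of optional fields with defaults.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    // Illustrative v1 schema; optional fields carry defaults so new optional
    // fields can be added later without breaking old consumers (backward compat).
    static final String SCHEMA_JSON = """
            {"type":"record","name":"UserCreated","namespace":"com.example.events",
             "fields":[
               {"name":"eventId","type":"string"},
               {"name":"userId","type":"string"},
               {"name":"email","type":["null","string"],"default":null}]}""";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's serializer registers the schema and embeds its ID in each record.
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // example URL

        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord event = new GenericData.Record(schema);
        event.put("eventId", "e-1");
        event.put("userId", "u-123");
        event.put("email", "a@b.com");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user-events", "u-123", event));
        }
    }
}
```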
Error Handling & Reliability
Dead Letter Queues
- DLQ Topics: [List DLQ topics for failed events]
- DLQ Strategy: [After X retries / Immediate / Custom]
- DLQ Monitoring: [Alerts on DLQ growth]
- DLQ Processing: [Manual review / Auto-retry / Archive]
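One common DLQ shape, sketched below: retry the handler a bounded number of times, then publish the failed record to a hypothetical `<topic>.dlq` topic with the error captured in headers. The naming convention and header keys are assumptions, not a Kafka standard.

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DlqHandler {
    private final KafkaProducer<String, String> dlqProducer;
    private final int maxRetries;

    public DlqHandler(KafkaProducer<String, String> dlqProducer, int maxRetries) {
        this.dlqProducer = dlqProducer;
        this.maxRetries = maxRetries;
    }

    /** Retries the handler, then parks the record on "<topic>.dlq" with error context. */
    public void processWithDlq(ConsumerRecord<String, String> record, RecordHandler handler) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                handler.handle(record);
                return;
            } catch (Exception e) {
                if (attempt == maxRetries) {
                    ProducerRecord<String, String> dead = new ProducerRecord<>(
                            record.topic() + ".dlq", record.key(), record.value());
                    // Preserve failure context in headers for later manual review.
                    dead.headers().add("x-error",
                            e.getClass().getName().getBytes(StandardCharsets.UTF_8));
                    dead.headers().add("x-source-partition",
                            String.valueOf(record.partition()).getBytes(StandardCharsets.UTF_8));
                    dead.headers().add("x-source-offset",
                            String.valueOf(record.offset()).getBytes(StandardCharsets.UTF_8));
                    dlqProducer.send(dead);
                }
                // Optional: back off exponentially between attempts here.
            }
        }
    }

    @FunctionalInterface
    public interface RecordHandler {
        void handle(ConsumerRecord<String, String> record) throws Exception;
    }
}
```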
Retry Policies
- Producer Retries: [Max retries, backoff]
- Consumer Retries: [Retry count, delay]
- Exponential Backoff: [Enabled / Disabled]
- Circuit Breaker: [Enabled / Disabled]
Idempotency
- Producer Idempotence: [Enabled for all / Specific producers]
- Consumer Deduplication: [Event ID tracking / None]
- Deduplication Window: [1 hour / 24 hours / Custom]
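A single-instance sketch of event-ID deduplication over a bounded window; in a real consumer group the seen-ID set would live in a shared store (Redis, a compacted topic) rather than process memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory deduplication keyed by eventId with a bounded time window.
 * Single-instance only: replicas of the consumer would each keep their own set.
 */
public class EventDeduplicator {
    private final Map<String, Instant> seen = new ConcurrentHashMap<>();
    private final Duration window;

    public EventDeduplicator(Duration window) {
        this.window = window;
    }

    /** Returns true the first time an eventId is seen inside the window. */
    public boolean markIfNew(String eventId) {
        evictExpired();
        return seen.putIfAbsent(eventId, Instant.now()) == null;
    }

    private void evictExpired() {
        Instant cutoff = Instant.now().minus(window);
        for (Iterator<Map.Entry<String, Instant>> it = seen.entrySet().iterator(); it.hasNext();) {
            if (it.next().getValue().isBefore(cutoff)) {
                it.remove();
            }
        }
    }
}
```

Usage is a one-line guard in the consumer loop: `if (dedup.markIfNew(event.eventId())) process(event);`.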
Monitoring & Observability
Metrics to Track
- Producer Metrics:
- Record send rate
- Request latency
- Error rate
- Batch size
- Compression ratio
- Consumer Metrics:
- Consumer lag
- Records consumed per second
- Commit latency
- Rebalance rate
- Processing time
- Broker Metrics:
- Request rate
- Bytes in/out
- Active connections
- Under-replicated partitions
- Leader election rate
Alerting
- Critical Alerts:
- Consumer lag > X messages
- Producer errors > Y%
- Broker down
- Under-replicated partitions
- Warning Alerts:
- High consumer lag
- Slow processing
- Rebalancing frequently
Monitoring Tools
- Metrics Collection: [Prometheus / Datadog / CloudWatch / Confluent Control Center]
- Dashboards: [Grafana / Kibana / Custom]
- Log Aggregation: [ELK / Loki / CloudWatch Logs]
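Consumer lag can also be computed directly with the Java `AdminClient`, which is handy for a lightweight exporter; the group ID below is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the group, per partition.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("user-analytics")
                    .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions.
            Map<TopicPartition, OffsetSpec> latest = new HashMap<>();
            committed.keySet().forEach(tp -> latest.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(latest).all().get();

            // Lag = end offset - committed offset, per partition.
            committed.forEach((tp, offset) -> {
                long lag = ends.get(tp).offset() - offset.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}
```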
Security
Authentication
- SASL Mechanism: [PLAIN / SCRAM / GSSAPI / OAUTHBEARER / None]
- SSL/TLS: [Enabled / Disabled]
- Certificate Management: [Self-signed / CA-signed / Cloud-managed]
Authorization
- ACLs: [Enabled / Disabled]
- Access Control:
- Producer permissions per topic
- Consumer permissions per topic
- Admin permissions
Encryption
- In-Transit: [TLS encryption]
- At-Rest: [Broker encryption / None]
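A client-side sketch of the SASL/SCRAM-over-TLS settings above; hostnames, file paths, and the environment-variable names for secrets are all assumptions for illustration.

```java
import java.util.Properties;

import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.config.SslConfigs;

public class SecureClientConfig {
    public static Properties secureProps() {
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "broker:9093"); // example host

        // TLS in transit + SASL/SCRAM authentication.
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"svc-user\" password=\"" + System.getenv("KAFKA_PASSWORD") + "\";");

        // Trust the cluster's CA; paths and passwords come from your secret store.
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/kafka/ssl/truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, System.getenv("TRUSTSTORE_PASSWORD"));
        return props;
    }
}
```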
Performance Optimization
Producer Optimization
- Batching: [Batch size, linger time]
- Compression: [Algorithm and level]
- Buffer Memory: [Size in MB]
- Max In-Flight Requests: [1 / 5 / Custom]
Consumer Optimization
- Fetch Size: [Min/max fetch size]
- Prefetch: [Enable prefetching]
- Parallel Processing: [Number of threads]
- Commit Strategy: [Sync / Async / Manual]
Broker Optimization
- Log Segment Size: [1GB / Custom]
- Log Flush Interval: [Messages / Time]
- Replica Fetcher Threads: [Count]
- Network Threads: [Count]
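Illustrative starting points for the producer and consumer knobs above, expressed as Java client properties; these are assumptions to benchmark against your own workload, not recommended defaults.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TuningDefaults {
    public static Properties producerTuning() {
        Properties p = new Properties();
        p.put(ProducerConfig.LINGER_MS_CONFIG, "10");                 // wait up to 10 ms to fill batches
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024));
        p.put(ProducerConfig.BUFFER_MEMORY_CONFIG, String.valueOf(64L * 1024 * 1024));
        p.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // safe with idempotence on
        return p;
    }

    public static Properties consumerTuning() {
        Properties p = new Properties();
        p.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, String.valueOf(64 * 1024)); // batch broker fetches
        p.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");
        p.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, String.valueOf(1024 * 1024));
        return p;
    }
}
```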
Deployment Configuration
Kafka Cluster
- Broker Count: [3 / 5 / 7+]
- ZooKeeper Ensemble: [3 / 5 nodes / None (KRaft mode)]
- Broker Resources:
- CPU: [Cores]
- Memory: [GB]
- Disk: [Type and size]
- Network: [Bandwidth]
High Availability
- Multi-AZ Deployment: [Yes / No]
- Rack Awareness: [Enabled / Disabled]
- Min In-Sync Replicas: [2 / 3]
- Unclean Leader Election: [Disabled / Enabled]
Code Generation Requirements
Generate a complete Kafka event-driven system including:
- Kafka Configuration:
- Broker configuration files (server.properties)
- Topic creation scripts with all configurations
- ZooKeeper or KRaft configuration (if self-hosted)
- Docker Compose for local development
- Kubernetes manifests for production
- Producer Implementations:
- Producer code for each service in specified language
- Event serialization logic
- Batching and compression configuration
- Error handling and retry logic
- Idempotent producer setup
- Monitoring instrumentation
- Consumer Implementations:
- Consumer code for each service in specified language
- Event deserialization logic
- Offset management (auto/manual commit)
- Error handling and DLQ logic
- Idempotency handling
- Graceful shutdown handling
- Monitoring instrumentation
- Stream Processing Applications:
- Stream processing topology code
- Windowing and aggregation logic
- Join operations
- State store configuration
- Exactly-once semantics setup
- Schema Definitions:
- Avro/Protobuf schema files for all events
- Schema registry configuration
- Schema evolution examples
- Serializer/deserializer code
- Error Handling:
- Dead letter queue setup
- Retry logic implementation
- Circuit breaker patterns
- Error logging and alerting
- Monitoring Setup:
- Prometheus exporters for Kafka metrics
- Grafana dashboards for producers, consumers, brokers
- Alert rules for critical metrics
- Consumer lag monitoring
- Security Configuration:
- SASL/SSL configuration
- ACL definitions
- Certificate generation scripts
- Credential management
- Testing:
- Unit tests for producers/consumers
- Integration tests with embedded Kafka
- Load testing scripts
- Chaos testing scenarios
- Documentation:
- Architecture diagram
- Event catalog (all events documented)
- Topic configuration reference
- Runbooks for common issues
- Deployment guide
- Troubleshooting guide
Output production-ready Kafka infrastructure following best practices with:
- Proper partitioning strategy for scalability
- Idempotent producers and consumers
- Exactly-once semantics where needed
- Comprehensive error handling with DLQs
- Schema evolution support
- High availability configuration
- Security with SASL/SSL
- Monitoring and alerting
- Graceful shutdown handling
- Consumer lag management
- Proper offset management
- Clear event naming conventions
- Comprehensive documentation