Apache Kafka Event Streaming
Build an event-driven architecture with Kafka producers, consumers, and stream processing.
Prompt
Build a production-ready event-driven architecture with Apache Kafka producers, consumers, and stream processing that satisfies the following requirements:
System Overview
Application Architecture
- System Name: [e.g., E-commerce Platform, IoT Data Pipeline, Microservices]
- Architecture Type: [Event-driven microservices / CQRS / Event sourcing / Stream processing]
- Number of Services: [3-5 / 10+ / Specify]
- Expected Throughput: [100 events/sec / 10k events/sec / 1M events/sec]
- Data Retention: [7 days / 30 days / 90 days / Indefinite]
Infrastructure
- Kafka Deployment: [Self-hosted / Confluent Cloud / AWS MSK / Azure Event Hubs]
- Cluster Size: [Single broker / 3 brokers / 5+ brokers]
- Replication Factor: [1 / 3 / 5]
- Environment: [Development / Staging / Production / All]
- Cloud Provider: [AWS / GCP / Azure / On-premise]
Event Schema Design
Event 1: [EventName, e.g., UserCreated]
- Topic: [user-events / users.created / domain.entity.action]
- Event Type: [user.created / order.placed / etc.]
- Schema Format: [JSON / Avro / Protobuf / Plain text]
- Key: [userId / orderId / customerId]
- Payload Fields:
{ "eventId": "uuid", "eventType": "user.created", "timestamp": "ISO8601", "userId": "string", "email": "string", "metadata": {} } - Partitioning Strategy: [By userId / By region / Round-robin]
- Expected Volume: [Events per day/hour]
Event 2: [EventName, e.g., OrderPlaced]
- Topic: [Topic name]
- Event Type: [Type]
- Schema Format: [Format]
- Key: [Partition key]
- Payload Fields: [JSON schema]
- Partitioning Strategy: [Strategy]
- Expected Volume: [Volume]
Event 3: [EventName]
- Topic: [Topic]
- Event Type: [Type]
- Schema Format: [Format]
- Key: [Key]
- Payload Fields: [Schema]
- Partitioning Strategy: [Strategy]
- Expected Volume: [Volume]
[Define 5-15 event types]
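To make the envelope concrete, here is a minimal sketch of the example payload above as a Java record, assuming Jackson for JSON serialization; the class and field names are illustrative, not prescribed.

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;

import com.fasterxml.jackson.databind.ObjectMapper;

// Minimal event envelope mirroring the example payload above.
// Names are illustrative; adapt them to your own event catalog.
public record UserCreatedEvent(
        String eventId,      // UUID, unique per event (used for deduplication)
        String eventType,    // e.g. "user.created"
        String timestamp,    // ISO-8601
        String userId,       // also used as the partition key
        String email,
        Map<String, String> metadata) {

    public static UserCreatedEvent of(String userId, String email) {
        return new UserCreatedEvent(
                UUID.randomUUID().toString(),
                "user.created",
                Instant.now().toString(),
                userId,
                email,
                Map.of());
    }

    public static void main(String[] args) throws Exception {
        // Serializes to the JSON shape shown in the payload example
        // (record support requires jackson-databind 2.12+).
        System.out.println(new ObjectMapper().writeValueAsString(of("u-123", "a@b.com")));
    }
}
```

Carrying `eventId` and `eventType` in every envelope keeps deduplication and routing uniform across event types.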
Topics Configuration
Topic 1: [TopicName]
- Purpose: [What events this topic contains]
- Partitions: [3 / 6 / 12 / Custom]
- Replication Factor: [1 / 3 / 5]
- Retention: [7d / 30d / Unlimited (compacted)]
- Cleanup Policy: [delete / compact / compact,delete]
- Min In-Sync Replicas: [1 / 2 / 3]
- Compression: [none / gzip / snappy / lz4 / zstd]
Topic 2: [TopicName]
- Purpose: [Purpose]
- Partitions: [Count]
- Replication Factor: [Factor]
- Retention: [Retention]
- Cleanup Policy: [Policy]
- Min In-Sync Replicas: [Count]
- Compression: [Type]
[Define 3-10 topics]
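As a sketch of how the per-topic settings above map onto code, the following uses Kafka's Java `AdminClient` to create one topic; the topic name and config values are placeholders matching the template fields.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3; per-topic overrides below.
            NewTopic userEvents = new NewTopic("user-events", 6, (short) 3)
                    .configs(Map.of(
                            TopicConfig.RETENTION_MS_CONFIG, "604800000", // 7 days
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE,
                            TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2",
                            TopicConfig.COMPRESSION_TYPE_CONFIG, "lz4"));

            admin.createTopics(List.of(userEvents)).all().get();
        }
    }
}
```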
Producers
Producer 1: [ServiceName]
- Language: [Node.js / Python / Java / Go / .NET]
- Events Produced: [List event types]
- Production Rate: [Events per second]
- Batching: [Enabled with X ms linger / Disabled]
- Compression: [gzip / snappy / lz4 / none]
- Acknowledgment: [all / 1 / 0]
- Idempotence: [Enabled / Disabled]
- Retry Policy: [Max retries, backoff strategy]
- Error Handling: [Dead letter queue / Logging / Alert]
Producer 2: [ServiceName]
- Language: [Language]
- Events Produced: [Event types]
- Production Rate: [Rate]
- Batching: [Config]
- Compression: [Type]
- Acknowledgment: [Level]
- Idempotence: [Status]
- Retry Policy: [Policy]
- Error Handling: [Strategy]
[Define 2-8 producers]
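A minimal Java producer wiring together the knobs listed above (idempotence, acks, batching, compression); the topic, key, and payload are illustrative.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Reliability: idempotence implies acks=all and retries > 0,
        // preventing duplicates caused by broker-side retries.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // Throughput: batch up to 10 ms / 64 KB and compress each batch.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "10");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024));
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by userId keeps all of a user's events on one partition (ordered).
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("user-events", "u-123", "{\"eventType\":\"user.created\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Route to a DLQ or raise an alert here instead of just logging.
                    exception.printStackTrace();
                }
            });
            producer.flush();
        }
    }
}
```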
Consumers
Consumer 1: [ServiceName / Consumer Group]
- Language: [Node.js / Python / Java / Go / .NET]
- Topics Subscribed: [List topics]
- Consumer Group: [Group ID]
- Processing Type: [Real-time / Batch / Stream processing]
- Offset Management: [Auto-commit / Manual commit]
- Auto-commit Interval: [5s / 10s / Custom]
- Max Poll Records: [500 / 1000 / Custom]
- Processing Logic: [What this consumer does with events]
- Idempotency: [How to handle duplicate events]
- Error Handling: [Retry / DLQ / Skip / Alert]
- Scaling: [Number of instances]
Consumer 2: [ServiceName / Consumer Group]
- Language: [Language]
- Topics Subscribed: [Topics]
- Consumer Group: [Group]
- Processing Type: [Type]
- Offset Management: [Strategy]
- Auto-commit Interval: [Interval]
- Max Poll Records: [Count]
- Processing Logic: [Logic]
- Idempotency: [Strategy]
- Error Handling: [Strategy]
- Scaling: [Instances]
[Define 2-10 consumers]
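A matching Java consumer sketch with manual offset commits and a graceful-shutdown hook; the group ID and topic are placeholders, and the per-record handler is a stub.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.kafka.common.serialization.StringDeserializer;

public class UserEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "user-analytics");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual commit = at-least-once
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Graceful shutdown: wakeup() is the only thread-safe consumer call and
        // makes poll() throw so the loop below can exit cleanly.
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

        try {
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // must be idempotent: redelivery is possible
                }
                consumer.commitSync(); // commit only after the whole batch succeeded
            }
        } catch (WakeupException e) {
            // expected on shutdown
        } finally {
            consumer.close();
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d key=%s%n",
                record.partition(), record.offset(), record.key());
    }
}
```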
Stream Processing
Stream Processor 1: [Name, e.g., User Analytics]
- Framework: [Kafka Streams / Flink / Spark Streaming / ksqlDB]
- Input Topics: [Source topics]
- Output Topics: [Destination topics]
- Processing Logic:
- Transformations: [Map, filter, aggregate, join, etc.]
- Windowing: [Tumbling / Hopping / Session / None]
- Window Size: [1 minute / 5 minutes / 1 hour]
- Aggregations: [Count, sum, average, etc.]
- Joins: [Stream-stream / Stream-table / Table-table]
- State Store: [RocksDB / In-memory / External]
- Exactly-Once Semantics: [Enabled / Disabled]
Stream Processor 2: [Name]
- Framework: [Framework]
- Input Topics: [Topics]
- Output Topics: [Topics]
- Processing Logic: [Describe transformations]
- Windowing: [Type and size]
- Aggregations: [Aggregations]
- Joins: [Join types]
- State Store: [Store type]
- Exactly-Once Semantics: [Status]
[Define 1-5 stream processors]
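For the Kafka Streams option, a compact topology sketch: a tumbling five-minute count per key with exactly-once (EOS v2) enabled. Topic names and the application ID are illustrative.

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class UserActivityCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-activity-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        // Exactly-once processing (EOS v2; requires brokers >= 2.5).
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("user-events"); // keyed by userId

        // Tumbling 5-minute window: count events per user, write counts downstream.
        events.groupByKey()
              .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
              .count() // state lives in a local RocksDB store, backed by a changelog topic
              .toStream()
              .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count.toString()))
              .to("user-activity-counts", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```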
Schema Management
Schema Registry
- Use Schema Registry: [Yes with Confluent / Yes with Apicurio / No]
- Schema Format: [Avro / Protobuf / JSON Schema]
- Schema Evolution: [Backward / Forward / Full / None]
- Schema Validation: [Strict / Lenient]
Schema Definitions
For each event type, provide:
- Schema Version: [v1, v2, etc.]
- Required Fields: [List required fields]
- Optional Fields: [List optional fields]
- Default Values: [Defaults for optional fields]
- Evolution Strategy: [How schema changes are handled]
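A sketch of producing Avro records against a schema registry, assuming Confluent's `KafkaAvroSerializer` and a registry at an example URL; the schema illustrates the evolution-friendly pattern of optional fields with defaults.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    // Illustrative v1 schema; optional fields carry defaults so new optional
    // fields can be added later without breaking old consumers (backward compat).
    static final String SCHEMA_JSON = """
            {"type":"record","name":"UserCreated","namespace":"com.example.events",
             "fields":[
               {"name":"eventId","type":"string"},
               {"name":"userId","type":"string"},
               {"name":"email","type":["null","string"],"default":null}]}""";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's serializer registers the schema and embeds its ID in each record.
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // example URL

        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord event = new GenericData.Record(schema);
        event.put("eventId", "e-1");
        event.put("userId", "u-123");
        event.put("email", "a@b.com");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user-events", "u-123", event));
        }
    }
}
```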
Error Handling & Reliability
Dead Letter Queues
- DLQ Topics: [List DLQ topics for failed events]
- DLQ Strategy: [After X retries / Immediate / Custom]
- DLQ Monitoring: [Alerts on DLQ growth]
- DLQ Processing: [Manual review / Auto-retry / Archive]
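One common DLQ shape, sketched below: retry the handler a bounded number of times, then publish the failed record to a hypothetical `<topic>.dlq` topic with the error captured in headers. The naming convention and header keys are assumptions, not a Kafka standard.

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DlqHandler {
    private final KafkaProducer<String, String> dlqProducer;
    private final int maxRetries;

    public DlqHandler(KafkaProducer<String, String> dlqProducer, int maxRetries) {
        this.dlqProducer = dlqProducer;
        this.maxRetries = maxRetries;
    }

    /** Retries the handler, then parks the record on "<topic>.dlq" with error context. */
    public void processWithDlq(ConsumerRecord<String, String> record, RecordHandler handler) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                handler.handle(record);
                return;
            } catch (Exception e) {
                if (attempt == maxRetries) {
                    ProducerRecord<String, String> dead = new ProducerRecord<>(
                            record.topic() + ".dlq", record.key(), record.value());
                    // Preserve failure context in headers for later manual review.
                    dead.headers().add("x-error",
                            e.getClass().getName().getBytes(StandardCharsets.UTF_8));
                    dead.headers().add("x-source-partition",
                            String.valueOf(record.partition()).getBytes(StandardCharsets.UTF_8));
                    dead.headers().add("x-source-offset",
                            String.valueOf(record.offset()).getBytes(StandardCharsets.UTF_8));
                    dlqProducer.send(dead);
                }
                // Optional: back off exponentially between attempts here.
            }
        }
    }

    @FunctionalInterface
    public interface RecordHandler {
        void handle(ConsumerRecord<String, String> record) throws Exception;
    }
}
```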
Retry Policies
- Producer Retries: [Max retries, backoff]
- Consumer Retries: [Retry count, delay]
- Exponential Backoff: [Enabled / Disabled]
- Circuit Breaker: [Enabled / Disabled]
Idempotency
- Producer Idempotence: [Enabled for all / Specific producers]
- Consumer Deduplication: [Event ID tracking / None]
- Deduplication Window: [1 hour / 24 hours / Custom]
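A single-instance sketch of event-ID deduplication over a bounded window; in a real consumer group the seen-ID set would live in a shared store (Redis, a compacted topic) rather than process memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory deduplication keyed by eventId with a bounded time window.
 * Single-instance only: replicas of the consumer would each keep their own set.
 */
public class EventDeduplicator {
    private final Map<String, Instant> seen = new ConcurrentHashMap<>();
    private final Duration window;

    public EventDeduplicator(Duration window) {
        this.window = window;
    }

    /** Returns true the first time an eventId is seen inside the window. */
    public boolean markIfNew(String eventId) {
        evictExpired();
        return seen.putIfAbsent(eventId, Instant.now()) == null;
    }

    private void evictExpired() {
        Instant cutoff = Instant.now().minus(window);
        for (Iterator<Map.Entry<String, Instant>> it = seen.entrySet().iterator(); it.hasNext();) {
            if (it.next().getValue().isBefore(cutoff)) {
                it.remove();
            }
        }
    }
}
```

Usage is a one-line guard in the consumer loop: `if (dedup.markIfNew(event.eventId())) process(event);`.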
Monitoring & Observability
Metrics to Track
- Producer Metrics:
- Record send rate
- Request latency
- Error rate
- Batch size
- Compression ratio
- Consumer Metrics:
- Consumer lag
- Records consumed per second
- Commit latency
- Rebalance rate
- Processing time
- Broker Metrics:
- Request rate
- Bytes in/out
- Active connections
- Under-replicated partitions
- Leader election rate
Alerting
- Critical Alerts:
- Consumer lag > X messages
- Producer errors > Y%
- Broker down
- Under-replicated partitions
- Warning Alerts:
- High consumer lag
- Slow processing
- Rebalancing frequently
Monitoring Tools
- Metrics Collection: [Prometheus / Datadog / CloudWatch / Confluent Control Center]
- Dashboards: [Grafana / Kibana / Custom]
- Log Aggregation: [ELK / Loki / CloudWatch Logs]
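Consumer lag can also be computed directly with the Java `AdminClient`, which is handy for a lightweight exporter; the group ID below is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the group, per partition.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("user-analytics")
                    .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions.
            Map<TopicPartition, OffsetSpec> latest = new HashMap<>();
            committed.keySet().forEach(tp -> latest.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(latest).all().get();

            // Lag = end offset - committed offset, per partition.
            committed.forEach((tp, offset) -> {
                long lag = ends.get(tp).offset() - offset.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}
```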
Security
Authentication
- SASL Mechanism: [PLAIN / SCRAM / GSSAPI / OAUTHBEARER / None]
- SSL/TLS: [Enabled / Disabled]
- Certificate Management: [Self-signed / CA-signed / Cloud-managed]
Authorization
- ACLs: [Enabled / Disabled]
- Access Control:
- Producer permissions per topic
- Consumer permissions per topic
- Admin permissions
Encryption
- In-Transit: [TLS encryption]
- At-Rest: [Broker encryption / None]
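A client-side sketch of the SASL/SCRAM-over-TLS settings above; hostnames, file paths, and the environment-variable names for secrets are all assumptions for illustration.

```java
import java.util.Properties;

import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.config.SslConfigs;

public class SecureClientConfig {
    public static Properties secureProps() {
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "broker:9093"); // example host

        // TLS in transit + SASL/SCRAM authentication.
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"svc-user\" password=\"" + System.getenv("KAFKA_PASSWORD") + "\";");

        // Trust the cluster's CA; paths and passwords come from your secret store.
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/kafka/ssl/truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, System.getenv("TRUSTSTORE_PASSWORD"));
        return props;
    }
}
```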
Performance Optimization
Producer Optimization
- Batching: [Batch size, linger time]
- Compression: [Algorithm and level]
- Buffer Memory: [Size in MB]
- Max In-Flight Requests: [1 / 5 / Custom]
Consumer Optimization
- Fetch Size: [Min/max fetch size]
- Prefetch: [Enable prefetching]
- Parallel Processing: [Number of threads]
- Commit Strategy: [Sync / Async / Manual]
Broker Optimization
- Log Segment Size: [1GB / Custom]
- Log Flush Interval: [Messages / Time]
- Replica Fetcher Threads: [Count]
- Network Threads: [Count]
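Illustrative starting points for the producer and consumer knobs above, expressed as Java client properties; these are assumptions to benchmark against your own workload, not recommended defaults.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TuningDefaults {
    public static Properties producerTuning() {
        Properties p = new Properties();
        p.put(ProducerConfig.LINGER_MS_CONFIG, "10");                 // wait up to 10 ms to fill batches
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024));
        p.put(ProducerConfig.BUFFER_MEMORY_CONFIG, String.valueOf(64L * 1024 * 1024));
        p.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // safe with idempotence on
        return p;
    }

    public static Properties consumerTuning() {
        Properties p = new Properties();
        p.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, String.valueOf(64 * 1024)); // batch broker fetches
        p.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");
        p.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, String.valueOf(1024 * 1024));
        return p;
    }
}
```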
Deployment Configuration
Kafka Cluster
- Broker Count: [3 / 5 / 7+]
- ZooKeeper Ensemble: [3 / 5 nodes / None (KRaft mode)]
- Broker Resources:
- CPU: [Cores]
- Memory: [GB]
- Disk: [Type and size]
- Network: [Bandwidth]
High Availability
- Multi-AZ Deployment: [Yes / No]
- Rack Awareness: [Enabled / Disabled]
- Min In-Sync Replicas: [2 / 3]
- Unclean Leader Election: [Disabled / Enabled]
Code Generation Requirements
Generate a complete Kafka event-driven system including:
- Kafka Configuration:
- Broker configuration files (server.properties)
- Topic creation scripts with all configurations
- ZooKeeper or KRaft configuration (if self-hosted)
- Docker Compose for local development
- Kubernetes manifests for production
- Producer Implementations:
- Producer code for each service in specified language
- Event serialization logic
- Batching and compression configuration
- Error handling and retry logic
- Idempotent producer setup
- Monitoring instrumentation
- Consumer Implementations:
- Consumer code for each service in specified language
- Event deserialization logic
- Offset management (auto/manual commit)
- Error handling and DLQ logic
- Idempotency handling
- Graceful shutdown handling
- Monitoring instrumentation
- Stream Processing Applications:
- Stream processing topology code
- Windowing and aggregation logic
- Join operations
- State store configuration
- Exactly-once semantics setup
- Schema Definitions:
- Avro/Protobuf schema files for all events
- Schema registry configuration
- Schema evolution examples
- Serializer/deserializer code
- Error Handling:
- Dead letter queue setup
- Retry logic implementation
- Circuit breaker patterns
- Error logging and alerting
- Monitoring Setup:
- Prometheus exporters for Kafka metrics
- Grafana dashboards for producers, consumers, brokers
- Alert rules for critical metrics
- Consumer lag monitoring
- Security Configuration:
- SASL/SSL configuration
- ACL definitions
- Certificate generation scripts
- Credential management
- Testing:
- Unit tests for producers/consumers
- Integration tests with embedded Kafka
- Load testing scripts
- Chaos testing scenarios
- Documentation:
- Architecture diagram
- Event catalog (all events documented)
- Topic configuration reference
- Runbooks for common issues
- Deployment guide
- Troubleshooting guide
Output production-ready Kafka infrastructure following best practices with:
- Proper partitioning strategy for scalability
- Idempotent producers and consumers
- Exactly-once semantics where needed
- Comprehensive error handling with DLQs
- Schema evolution support
- High availability configuration
- Security with SASL/SSL
- Monitoring and alerting
- Graceful shutdown handling
- Consumer lag management
- Proper offset management
- Clear event naming conventions
- Comprehensive documentation