Overview
This guide provides production-ready recommendations for sizing and scaling your PostgreSQL database for Conduktor Console across AWS, GCP or Azure, to ensure a performant and consistent user experience.

Considerations

Key considerations for sizing your database:

1. Match database size to production needs: select database specifications that match your production workload - avoid using proof-of-concept configurations in production environments.
2. Monitor from day one: set up alerts for CPU, memory, IOPS and query latency before going live.
3. Ensure sufficient IOPS throughput: provision a minimum of 3,000 IOPS (Input/Output Operations Per Second) - background sync processes are write-heavy.
4. Balance cost and performance: use these recommendations as a baseline, then optimize based on actual usage.
Usage level
To get started, choose the level that matches your expected initial usage:

| Level | Concurrent users | Kafka scale | Estimated maximum DB size |
|---|---|---|---|
| Standard | Up to 500 | 1-5 clusters, up to 1,000 topics / 10,000 partitions, ~500 consumer groups | up to 50 GB |
| Mid scale | 500-1,000 | 5-10 clusters, up to 5,000 topics / 50,000 partitions, ~1,000 consumer groups | up to 100 GB |
| Fully scaled | 1,000-5,000 | 10+ clusters, 5,000+ topics / 50,000+ partitions, 1,000+ consumer groups | up to 250 GB |
The estimated maximum DB size assumes an installation making full use of all Conduktor features over a year and accounts for accumulated data such as audit logs.
Target performance
Our recommendations aim to provide the optimal Conduktor experience while avoiding over-provisioning of the database. They are based on:

- keeping P95 query latency low enough to provide the best user experience
- handling concurrent queries based on your expected user base size
- supporting background Conduktor metadata updates for your expected Kafka platform size
- avoiding IOPS throttling or similar cloud provider limitations during normal operations
Conduktor Console continuously syncs Kafka metadata to the database via a background task, which requires sufficient IOPS (Input/Output Operations Per Second) as specified in the recommendations below.
Recommended specifications
Recommendations are provided for AWS, Azure and GCP; the specifications below cover AWS.

We recommend that you use AWS RDS PostgreSQL. The PostgreSQL version must be 14.8+ or 15.3+.
The following specifications are for the Standard level.

Instance type:

- Minimum: db.t4g.large (2 vCPU, 8 GB RAM)
- Recommended: db.m6g.large (2 vCPU, 8 GB RAM) for consistent performance without CPU credits
- Storage type: general purpose SSD (gp3)
- Storage size: 50 GB minimum (allows for growth)
- IOPS: 3,000 IOPS baseline (included free with gp3)
- Throughput: 125 MB/s (included free)
- PostgreSQL version: 14.8+ or 15.3+. See version requirements.
- Multi-AZ: recommended for production
- Automated backups: enable with 7-day retention minimum
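
As an illustration, these Standard-level settings could be provisioned with a call such as the boto3 sketch below. The instance identifier, region and credentials are assumptions, not values from this guide; adapt them to your environment or translate the same settings into your infrastructure-as-code tooling.

```python
# Hedged sketch: provisioning an RDS PostgreSQL instance matching the Standard
# level recommendation via boto3. Identifiers, region and credentials are
# placeholders -- adjust to your environment.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")  # assumed region

rds.create_db_instance(
    DBInstanceIdentifier="conduktor-console-db",   # hypothetical name
    Engine="postgres",
    EngineVersion="15.3",                          # 14.8+ or 15.3+ required
    DBInstanceClass="db.m6g.large",                # recommended: 2 vCPU, 8 GB RAM
    AllocatedStorage=50,                           # GB, minimum recommended
    StorageType="gp3",                             # 3,000 IOPS / 125 MB/s baseline included
    MultiAZ=True,                                  # recommended for production
    BackupRetentionPeriod=7,                       # days, minimum recommended
    MasterUsername="conduktor",                    # placeholder
    MasterUserPassword="change-me",                # placeholder -- use a secret store
)
```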
Monitoring and observability
Regardless of your level of usage, we recommend that you implement these monitoring practices for the database.

Critical metrics
| Metric | Warning threshold | Critical threshold | Action |
|---|---|---|---|
| CPU utilization | >70% sustained | >85% sustained | Scale up compute |
| Memory (Freeable) | <25% free | <15% free | Scale up memory |
| IOPS (read/write) | >80% of limit | >95% of limit | Increase provisioned IOPS or storage size |
| Disk space | <20% free | <10% free | Increase storage size |
| Replication lag (if HA) | >30 seconds | >60 seconds | Check network, investigate load |
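
As an example of checking these thresholds programmatically, the sketch below pulls the last hour of WriteIOPS from CloudWatch and compares it against the gp3 baseline; the instance identifier, region and the 3,000 IOPS limit are assumptions.

```python
# Hedged sketch: compare recent RDS WriteIOPS against a provisioned limit.
# The instance identifier, region and IOPS limit are assumptions.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # assumed region
PROVISIONED_IOPS = 3000  # gp3 baseline from the recommendations above

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="WriteIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "conduktor-console-db"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                 # 5-minute buckets
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    usage = point["Average"] / PROVISIONED_IOPS
    flag = "CRITICAL" if usage > 0.95 else "WARNING" if usage > 0.80 else "ok"
    print(f"{point['Timestamp']:%H:%M} write IOPS {point['Average']:.0f} ({usage:.0%}) {flag}")
```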
Cloud-specific tools
Tooling is available for AWS RDS, GCP Cloud SQL and Azure Database for PostgreSQL. For AWS RDS:

- Enable Performance Insights (provides query-level analysis)
- Enable Enhanced Monitoring (OS-level metrics)
- Create CloudWatch alarms for critical metrics (an example sketch follows this list)
- Monitor ReadIOPS, WriteIOPS, CPUUtilization, FreeableMemory and DatabaseConnections (CloudWatch Metrics for RDS)
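
For instance, a CloudWatch alarm matching the CPU warning threshold from the table above could be created as in this sketch; the alarm name, region, instance identifier and SNS topic are placeholders.

```python
# Hedged sketch: CloudWatch alarm for RDS CPU sustained above the 70% warning
# threshold. Alarm name, region, instance identifier and SNS topic are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # assumed region

cloudwatch.put_metric_alarm(
    AlarmName="conduktor-console-db-cpu-warning",          # hypothetical name
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "conduktor-console-db"}],
    Statistic="Average",
    Period=300,                  # 5-minute datapoints
    EvaluationPeriods=12,        # 12 x 5 min = 1 hour sustained
    Threshold=70.0,              # warning threshold from the table above
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],  # placeholder topic
)
```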
Scaling and performance
When to scale up
- CPU consistently >70% for more than 1 hour during business hours
- Memory (freeable) <25% sustained, indicating the indexes and working set don't fit in RAM
- IOPS at >80% of limit for more than 30 minutes, causing query slowdowns
Connection pooling
Conduktor Console includes built-in connection pooling. The default is 15 connections per instance, but you can change this using the CDK_DATABASE_CONNECTION_POOL_SIZE parameter.
Cloud providers have default connection limits based on the provisioned database instance size.
Verify that your instance type supports your required connection count. As a general rule, allow for ~10 additional connections on top of the total pooled connections.
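
As a worked example, the sketch below estimates the connection count to verify against your database's limit; the number of Console instances is an assumption.

```python
# Hedged sketch: estimate the PostgreSQL connections Conduktor Console will need.
# Instance count is an assumption -- match it to your deployment.
console_instances = 3        # hypothetical number of Console replicas
pool_size = 15               # default CDK_DATABASE_CONNECTION_POOL_SIZE
admin_headroom = 10          # extra connections for admin/maintenance access

required_connections = console_instances * pool_size + admin_headroom
print(f"Database must allow at least {required_connections} connections")  # 55 here
```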
Database maintenance
Backup and recovery
Our suggested backup requirements are:

- Automated backups: enabled with 7-day retention minimum (14-30 days for production)
- Backup window: during low-usage periods (e.g., 2-4 AM local time)
- Point-in-time recovery: enabled (available on all cloud providers)
- Cross-region backups: for disaster recovery (if required by compliance)
- Manual snapshots: take a manual backup before upgrades of Conduktor Console
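
For example, a manual RDS snapshot before a Console upgrade could be taken as in the sketch below; the region, instance and snapshot identifiers are placeholders.

```python
# Hedged sketch: take a manual RDS snapshot before upgrading Conduktor Console.
# Region, instance and snapshot identifiers are placeholders.
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds", region_name="eu-west-1")  # assumed region

snapshot_id = f"conduktor-console-pre-upgrade-{datetime.now(timezone.utc):%Y%m%d-%H%M}"
rds.create_db_snapshot(
    DBSnapshotIdentifier=snapshot_id,
    DBInstanceIdentifier="conduktor-console-db",     # hypothetical name
)

# Wait until the snapshot is available before starting the upgrade.
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)
```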
Upgrade paths
Scaling compute (vertical scaling): all providers support instance size changes with brief downtime (typically 5-15 minutes). Check your cloud provider documentation for full information.

Scaling storage:

- AWS: storage can be scaled up without downtime (gp3 volumes support online resizing); see the sketch after this list
- GCP: storage automatically scales up; can be manually increased without downtime
- Azure: storage can be scaled up without downtime

Scaling IOPS:

- AWS: modify gp3 IOPS or switch to Provisioned IOPS (io1/io2) during a maintenance window
- GCP: IOPS scale automatically with storage size
- Azure: Premium SSD v2 allows online IOPS adjustment; Premium SSD requires a storage tier change
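
On AWS, for example, an online gp3 storage increase could look like the following sketch; the region, instance identifier and target size are assumptions.

```python
# Hedged sketch: scale up storage on an existing RDS instance without downtime.
# The region, instance identifier and target size are assumptions; the Iops
# parameter of modify_db_instance can be used similarly when changing IOPS.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")  # assumed region

rds.modify_db_instance(
    DBInstanceIdentifier="conduktor-console-db",  # hypothetical name
    AllocatedStorage=100,                         # grow gp3 storage to 100 GB
    ApplyImmediately=True,                        # apply now rather than in the maintenance window
)
```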
Cost optimization
- Use reserved instances/committed use discounts: save 30-60% for predictable workloads
- Right-size early: starting oversized and scaling down is difficult; start with recommendations and scale up as needed
- Use gp3 storage on AWS: 20% cheaper than gp2 with better baseline performance, especially for IOPS
- Enable multi-AZ only for production: dev/test environments can use single-AZ to save 50% on instance costs
- Monitor idle connections: ensure connection pooling is working correctly to avoid over-provisioning
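
To check that pooling is behaving as expected, a query against pg_stat_activity can show how many connections sit idle; the sketch below is a minimal example with placeholder connection parameters.

```python
# Hedged sketch: count connections by state to spot idle-connection buildup.
# Connection parameters are placeholders -- point them at your Console database.
import psycopg2

conn = psycopg2.connect(
    host="conduktor-console-db.example.internal",  # placeholder endpoint
    dbname="conduktor",                            # placeholder database name
    user="conduktor",
    password="change-me",                          # use a secret store in practice
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC"
    )
    for state, count in cur.fetchall():
        print(f"{state or 'background worker'}: {count}")
conn.close()
```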
Troubleshoot
Can I use Aurora PostgreSQL instead of RDS PostgreSQL?

Yes, Aurora PostgreSQL is compatible with Conduktor Console and is a good option for fully scaled deployments. Aurora provides better scalability, automatic failover and read replicas. Version requirements still apply (14.8+ / 15.3+).
What happens if I run out of IOPS?

IOPS throttling causes slow queries, timeouts and potential user-facing errors. The background metadata sync process is especially sensitive to IOPS limits. Monitor ReadIOPS and WriteIOPS metrics and scale up before hitting limits.

Can I use Burstable tier (T-series) instances for production?

T-series (AWS) or Burstable tier (Azure) instances can work for Standard level installs with low, consistent load. However, once CPU credits are exhausted, performance degrades significantly. For production, we recommend general purpose instances (M-series on AWS, General Purpose on Azure/GCP) for predictable performance.
How do I estimate my database size growth?

Database growth typically depends on:

- the number of Kafka topics, partitions, subjects (schemas), jobs (Kafka Connect) and consumer groups
- the number of users, the level of RBAC and the activity level of these users
Should I use read replicas?
Read replicas can help with read-heavy workloads but add complexity. They are typically not needed for Conduktor.
What if my deployment doesn't match these usage levels exactly?

These levels are guidelines. If you have 150 users but 100,000 topics, use Mid scale or Fully scaled sizing. The Kafka scale (topics, consumer groups) drives database size more than user count. When in doubt, start with the next level up and scale down if over-provisioned.