Architecture Overview
Module: Tuner
Platform: CloudKeeper
Version: 1.0.0-RELEASE
Last Updated: October 26, 2025
Document Type: Solution Architecture (Technical Overview)
Table of Contents
- Introduction
- High-Level Architecture
- Core Components
- Technology Stack
- Data Flow Architecture
- Integration Architecture
- Deployment Architecture
- Security Architecture
Introduction
Purpose of This Document
This document provides a comprehensive architectural overview of AWS Tuner, focusing on:
- System components and their responsibilities
- Technology choices and rationale
- Data flow and processing pipelines
- Integration points and dependencies
- Deployment and operational architecture
Related Documents
- Key Features - Feature capabilities and screenshots
- Business Value - Business value and ROI
- Recommendation Engine - Deep dive on Drools engine
- Security & Compliance - Security architecture details
High-Level Architecture
System Context Diagram
Key Architectural Principles
- Event-Driven: Asynchronous processing via RabbitMQ for scalability
- Microservices: 11 specialized modules (tuner-core, tuner-aws-utils, etc.)
- Multi-Database: Right database for the right data (MongoDB for recommendations, MySQL for accounts)
- Rule-Based Intelligence: Drools engine separates business rules from code
- Read-Only AWS Access: Zero write permissions to customer AWS accounts (security)
- Multi-Cloud Ready: Supports AWS and GCP with unified recommendation model
Core Components
1. tuner-core (Main Application)
Purpose: Core recommendation engine and Spring Boot application
Technology:
- Spring Boot 2.7.4
- Spring Cloud Config 2021.0.8
- Spring Data JPA (MySQL)
- Spring Data MongoDB
- Drools 7.73.0.Final
Key Responsibilities:
- Recommendation Generation:
  - Orchestrates 42+ recommendation job types
  - Executes Drools rules for intelligent analysis
  - Calculates cost savings and prioritization
- Resource Synchronization:
  - Scheduled jobs sync AWS resource metadata
  - CloudWatch metrics collection
  - Pricing data updates
- API Services:
  - REST endpoints for frontend
  - Recommendation CRUD operations
  - Account management
Key Packages:
com.ttn.ck.tuner
├── recommendation/
│   ├── job/aws/      # 42+ AWS recommendation jobs
│   ├── job/gcp/      # 9+ GCP recommendation jobs
│   ├── engine/       # Drools engine integration
│   ├── api/          # REST controllers
│   └── processor/    # Data processing logic
├── core/
│   ├── processor/    # Scheduler job processors
│   └── service/      # Business services
└── api/
    ├── controller/   # API controllers
    └── dto/          # Data transfer objects
Recommendation Jobs (42+ types):
AWS Jobs:
- OverProvisionedEc2RecommendationJob
- SnapshotRecommendationJob
- IdleNatGatewayRecommendation
- S3IncompleteMultipartRecommendationJob
- OverProvisionedRedshiftClusterRecommendationJob
- IdleVpcEndpointRecommendationJob
- DynamoDbIdleTableRecommendationJob
- RdsExtendedSupportRecommendation
- EksExtendedSupportRecommendation
- ModerniseElasticacheRecommendation
- LoadBalancerRecommendationJob
- LambdaErrorRateRecommendationJob
- And 30+ more...
GCP Jobs:
- IdleComputeInstanceRecommendationJob
- IdleCloudSqlRecommendationJob
- IdleStaticIPsRecommendationJob
- IdlePersistentDiskRecommendationJob
- And 5+ more...
2. Drools Rule Engine
Purpose: Business rules management system for recommendation logic
Why Drools?
- ✅ Declarative rule definition (readable by non-developers)
- ✅ Business rules separate from code
- ✅ Easy to add new recommendation types
- ✅ Complex decision logic support
- ✅ No redeployment needed for rule changes
Rule Files (40+ rules):
src/main/resources/rules/
├── oper_provisioned_ec2_instance_rules.drl
├── snapshot-rules.drl
├── natgateway-rules.drl
├── idle-vpc-endpoint-recommendation-rules.drl
├── s3-incomplete-multipart-rules.drl
├── over-provisioned-redshift-cluster-recommendation-rules.drl
├── idle-dynamodb-table-rules.drl
├── rds-extended-support-rules.drl
├── eks-rules.drl
├── modernise-elasticache-rules.drl
└── 30+ more rule files...
Example Rule Anatomy (OverProvisioned EC2):
rule "Generate OverProvisioned EC2 Recommendations"
when
    $instance: Ec2InstanceInfo()
then
    // 1. Validate recommendation criteria
    if ($instance.getRecommendedInstanceType() == null) return;
    if ($instance.getOdCostPerHour() <= $instance.getRecommendedInstanceCostPerHour()) return;
    if ($instance.getInstanceType().equals($instance.getRecommendedInstanceType())) return;
    // 2. Calculate savings
    // Note: despite its name, recommendedCostPerHour holds the hourly savings
    // (on-demand cost minus recommended cost), floored at zero.
    double recommendedCostPerHour = Math.max(0,
        $instance.getOdCostPerHour()
            .subtract(BigDecimal.valueOf($instance.getRecommendedInstanceCostPerHour()))
            .doubleValue()
    );
    // 3. Minimum savings threshold ($0.005/month, at 720 hours per month)
    if (recommendedCostPerHour * 720 <= 0.005) return;
    // 4. Create recommendation
    // (instanceId, description, action, message, currentCost, status and metadata
    // are not defined in this excerpt.)
    RecommendationInfo recommendation = new RecommendationInfo(
        $instance.getAccountId(),
        instanceId,
        $instance.getInstanceName(),
        $instance.getRegion(),
        description,
        action,
        message,
        currentCost,
        recommendedCostPerHour,
        status,
        metadata
    );
    recommendationList.add(recommendation);
end
Rule Parameters (Configurable):
- Lookback period: 30 days (default)
- CPU threshold: 30% (default)
- CloudWatch metric period: 3600 seconds
- Minimum savings: $0.005/month
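The savings check in the rule above can be read as a small pure function. The following is a minimal Java sketch, assuming the on-demand cost is a `BigDecimal` and the recommended cost is a `double`, as the `subtract(BigDecimal.valueOf(...))` call in the rule suggests. The class name `SavingsThreshold` and its method names are hypothetical and are not part of the Tuner codebase.

```java
import java.math.BigDecimal;

/** Illustrative helper only; names are assumptions, not Tuner code. */
final class SavingsThreshold {
    static final int HOURS_PER_MONTH = 720;           // multiplier used in the rule
    static final double MIN_MONTHLY_SAVINGS = 0.005;  // USD per month

    /** Hourly savings (current minus recommended), floored at zero. */
    static double hourlySavings(BigDecimal currentCostPerHour, double recommendedCostPerHour) {
        return Math.max(0, currentCostPerHour
                .subtract(BigDecimal.valueOf(recommendedCostPerHour))
                .doubleValue());
    }

    /** True when projected monthly savings exceed the minimum threshold. */
    static boolean meetsThreshold(BigDecimal currentCostPerHour, double recommendedCostPerHour) {
        return hourlySavings(currentCostPerHour, recommendedCostPerHour) * HOURS_PER_MONTH
                > MIN_MONTHLY_SAVINGS;
    }
}
```

Keeping the threshold and the hours-per-month multiplier as named constants makes it easier to see which values would need to change if the configurable defaults listed above were overridden.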
3. Multi-Database Architecture
Why Multiple Databases?
Each database is optimized for specific data patterns:
MongoDB (Document Store)
Purpose: Recommendations, schedules, user preferences
Why MongoDB?
- ✅ Flexible schema for varied recommendation types
- ✅ Fast writes for high-volume recommendations
- ✅ JSON-like documents match API responses
- ✅ Horizontal scalability
Collections:
tuner_db
├── recommendations # Generated recommendations
├── scheduler_configs # Schedule definitions
├── tag_scheduler_configs # Tag-based schedules
├── ec2_resources # EC2 metadata cache
├── rds_resources # RDS metadata cache
└── user_preferences # User settings
Example Recommendation Document:
{
  "_id": "rec_abcd1234",
  "accountId": "123456789012",
  "resourceId": "i-0abcd1234efgh5678",
  "resourceName": "api-server-prod-3",
  "region": "us-east-1",
  "category": "OverProvisioned",
  "service": "EC2",
  "description": "Maximum CPU utilisation of EC2 is 12%...",
  "action": "Downsize EC2 instance type from m5.2xlarge to m5.large",
  "currentCost": 280.32,
  "recommendedCost": 70.08,
  "monthlySavings": 210.24,
  "annualSavings": 2522.88,
  "metadata": {
    "instanceType": "m5.2xlarge",
    "cpu": 8,
    "memory": 32768,
    "cpuUtilization": 12,
    "recommendedInstanceType": "m5.large"
  },
  "status": "GENERATED",
  "createdAt": "2025-10-26T12:00:00Z",
  "updatedAt": "2025-10-26T12:00:00Z"
}
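The cost fields in the example document are arithmetically related: `monthlySavings` equals `currentCost` minus `recommendedCost`, and `annualSavings` is twelve times `monthlySavings`. The sketch below shows that relationship in Java using `BigDecimal` to avoid floating-point drift. It assumes this is how the fields are derived, which the document does not state explicitly; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

/** Illustrative only; not the Tuner implementation. */
final class RecommendationSavings {
    /** Monthly savings, never negative. */
    static BigDecimal monthlySavings(BigDecimal currentCost, BigDecimal recommendedCost) {
        return currentCost.subtract(recommendedCost).max(BigDecimal.ZERO);
    }

    /** Annual savings as twelve months of the monthly figure. */
    static BigDecimal annualSavings(BigDecimal monthlySavings) {
        return monthlySavings.multiply(BigDecimal.valueOf(12));
    }
}
```

With the values from the example document, `monthlySavings(280.32, 70.08)` gives 210.24 and `annualSavings(210.24)` gives 2522.88.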
MySQL (Relational Database)
Purpose: Account metadata, user management, transactional data
Why MySQL?
- ✅ ACID compliance for critical data
- ✅ Strong referential integrity
- ✅ Mature tooling and ecosystem
- ✅ CloudKeeper platform standard
Schema (Key Tables):
tuner_schema
├── accounts # AWS account information
├── users # User authentication and roles
├── permissions # RBAC permissions
├── audit_logs # Change tracking
├── iam_roles # AWS IAM role configurations
└── account_regions # Account-region mappings
Snowflake (Analytics Data Warehouse)
Purpose: Cost & Usage Report (CUR) data, historical analytics
Why Snowflake?
- ✅ Massive scale (petabytes of cost data)
- ✅ Columnar storage for fast aggregations
- ✅ Shared with AWS Lens (data reuse)
- ✅ Time-series optimized queries
Key Queries:
- Historical cost trends for savings calculations
- Scheduler savings validation
- ROI tracking and reporting
- Multi-account cost attribution
Redis (Cache)
Purpose: API response caching, session management
Why Redis?
- ✅ Sub-millisecond latency
- ✅ Reduces database load
- ✅ TTL support for cache expiration
Cached Data:
- Recommendation lists (TTL: 1 hour)
- AWS pricing data (TTL: 24 hours)
- User sessions (TTL: session timeout)
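To make the TTL behaviour concrete, here is a small in-memory stand-in written in Java. It is not a Redis client and not Tuner code; it only illustrates the expiry semantics that the TTLs above rely on. The class name `TtlCacheSketch` and the key format in the usage note are assumptions, and the current time is passed in explicitly so the behaviour is easy to reason about.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** In-memory illustration of TTL expiry; a real deployment would use Redis. */
final class TtlCacheSketch<V> {
    private record Entry<T>(T value, Instant expiresAt) {}

    private final Map<String, Entry<V>> entries = new HashMap<>();

    /** Stores a value that expires once `ttl` has elapsed after `now`. */
    void put(String key, V value, Duration ttl, Instant now) {
        entries.put(key, new Entry<>(value, now.plus(ttl)));
    }

    /** Returns the value if present and not yet expired; evicts expired entries. */
    Optional<V> get(String key, Instant now) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (!now.isBefore(entry.expiresAt())) {
            entries.remove(key);
            return Optional.empty();
        }
        return Optional.of(entry.value());
    }
}
```

For example, a recommendation list stored with `Duration.ofHours(1)` is still returned 59 minutes later and is treated as a miss from the one-hour mark onward, at which point the caller would reload it from the database.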
4. Event Processing Architecture
RabbitMQ (Message Queue)
Purpose: Asynchronous event processing and job distribution
Message Flows:
Events:
- SYNC_RECOMMENDATION_SUCCESS - Recommendation generated
- SYNC_RECOMMENDATION_FAILURE - Recommendation job failed
- SCHEDULER_EVENT_START - Resource started by scheduler
- SCHEDULER_EVENT_STOP - Resource stopped by scheduler
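One way to consume these four events is to map each event name to a handler. The Java sketch below shows that pattern without any RabbitMQ client code; the enum constants are the event names listed above, while `TunerEventType`, `EventDispatcherSketch`, and the string payload are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

/** Event names as listed in this document. */
enum TunerEventType {
    SYNC_RECOMMENDATION_SUCCESS,
    SYNC_RECOMMENDATION_FAILURE,
    SCHEDULER_EVENT_START,
    SCHEDULER_EVENT_STOP
}

/** Hypothetical dispatcher: routes an event name to a registered handler. */
final class EventDispatcherSketch {
    private final Map<TunerEventType, Consumer<String>> handlers =
            new EnumMap<>(TunerEventType.class);

    void register(TunerEventType type, Consumer<String> handler) {
        handlers.put(type, handler);
    }

    /** Returns true if a handler ran; false for unknown or unhandled events. */
    boolean dispatch(String eventName, String payload) {
        TunerEventType type;
        try {
            type = TunerEventType.valueOf(eventName);
        } catch (IllegalArgumentException e) {
            return false; // not one of the known event names
        }
        Consumer<String> handler = handlers.get(type);
        if (handler == null) {
            return false;
        }
        handler.accept(payload);
        return true;
    }
}
```

Returning `false` for unknown names, rather than throwing, lets a message listener decide separately whether to discard or dead-letter the message.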
Quartz Scheduler
Purpose: Job scheduling framework for recurring tasks
Scheduled Jobs:
| Job | Frequency | Purpose |
|---|---|---|
| SyncEc2ResourcesJob | Every 6 hours | Sync EC2 metadata |
| SyncRdsResourcesJob | Every 6 hours | Sync RDS metadata |
| OverProvisionedEc2RecommendationJob | Daily | Generate EC2 rightsizing recommendations |
| SnapshotRecommendationJob | Weekly | Identify orphaned snapshots |
| AccountSchedulerJobProcessor | Cron-based | Execute start/stop schedules |
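The fixed frequencies in the table can be expressed as intervals to estimate when a job will next run. The following Java sketch does this with `java.time` only; it does not use the Quartz API, and the actual trigger definitions are not shown in this document. `JobFrequencySketch` is a hypothetical name, and the cron-based `AccountSchedulerJobProcessor` is deliberately left out because it has no fixed interval.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

/** Illustrative mapping of job names to the frequencies in the table. */
final class JobFrequencySketch {
    static final Map<String, Duration> FIXED_INTERVALS = Map.of(
            "SyncEc2ResourcesJob", Duration.ofHours(6),
            "SyncRdsResourcesJob", Duration.ofHours(6),
            "OverProvisionedEc2RecommendationJob", Duration.ofDays(1),
            "SnapshotRecommendationJob", Duration.ofDays(7));

    /** Next expected run, assuming runs are spaced exactly one interval apart. */
    static Instant nextRun(String jobName, Instant lastRun) {
        Duration interval = FIXED_INTERVALS.get(jobName);
        if (interval == null) {
            throw new IllegalArgumentException("No fixed interval for " + jobName);
        }
        return lastRun.plus(interval);
    }
}
```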
Technology Stack
Backend
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Application Framework | Spring Boot | 2.7.4 | Core application framework |
| Configuration | Spring Cloud Config | 2021.0.8 | Centralized configuration |
| Rule Engine | Drools | 7.73.0.Final | Business rules management |
| Job Scheduler | Quartz | (via quartz-utils) | Scheduled job execution |
| Message Queue | RabbitMQ | Latest | Async event processing |
| Document Database | MongoDB | 5.0+ | Recommendations storage |
| Relational Database | MySQL | 8.0.33 | Account/user management |
| Analytics Database | Snowflake | Latest | Cost analytics |
| Cache | Redis | 6.0+ | API caching |
| Language | Java | 17 | Primary language |
| Build Tool | Gradle | 7.x | Build automation |
Frontend
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 18.x | UI framework |
| Language | TypeScript | 4.x | Type-safe development |
| State Management | Redux Toolkit | (TBD) | Global state |
| HTTP Client | Axios | Latest | API communication |
| Charts | Recharts | Latest | Data visualization |
Browser Extension
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 18 | Extension UI |
| Language | TypeScript | 4.x | Type safety |
| Bundler | Webpack | 5 | Build tool |
| Manifest | V3 (Chrome), V2 (Firefox) | Latest | Extension config |
| Authentication | JWT | - | Token-based auth |
Data Flow Architecture
Recommendation Generation Flow
Timeline:
- Hour 0: Resource sync begins
- Hour 6: Second sync (incremental updates)
- Hour 24: First recommendation job runs
- Hour 25: Recommendations available in UI
Integration Architecture
External Integrations
AWS Services
Read-Only API Calls:
EC2:
- DescribeInstances
- DescribeVolumes
- DescribeSnapshots
- DescribeNatGateways
- DescribeVpcEndpoints
RDS:
- DescribeDBInstances
- DescribeDBClusters
CloudWatch:
- GetMetricStatistics
- ListMetrics
Pricing:
- GetProducts
Cost Explorer:
- GetCostAndUsage
Cross-Account Access:
AWS Account (Customer)
│
├─ IAM Role: CloudKeeperTunerRole
│   ├─ Trust Policy: Allow AssumeRole from CloudKeeper account
│   ├─ External ID: Unique per customer
│   └─ Permissions: Read-only (ec2:Describe*, rds:Describe*, etc.)
│
└─ AssumeRole call from Tuner
    └─ Temporary credentials (1 hour TTL)
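The read-only guarantee can also be checked on the client side before a call is made. The Java sketch below tests an IAM action name against an allowlist built from the API calls and permission patterns listed in this section. The allowlist is partial (the role's permissions end in "etc."), the class name `ReadOnlyActionCheck` is hypothetical, and the matching is a simplification that supports only a trailing `*` wildcard and is case-sensitive.

```java
import java.util.List;

/** Illustrative allowlist check; not the actual IAM policy or Tuner code. */
final class ReadOnlyActionCheck {
    // Patterns taken from this section; the real policy may contain more.
    static final List<String> ALLOWED_PATTERNS = List.of(
            "ec2:Describe*",
            "rds:Describe*",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics",
            "pricing:GetProducts",
            "ce:GetCostAndUsage");

    /** True if the action matches an allowed pattern (trailing '*' = prefix match). */
    static boolean isAllowed(String action) {
        for (String pattern : ALLOWED_PATTERNS) {
            if (pattern.endsWith("*")) {
                String prefix = pattern.substring(0, pattern.length() - 1);
                if (action.startsWith(prefix)) {
                    return true;
                }
            } else if (pattern.equals(action)) {
                return true;
            }
        }
        return false;
    }
}
```

For instance, `isAllowed("ec2:DescribeInstances")` is true while `isAllowed("ec2:TerminateInstances")` is false. A check like this is only a safeguard; the IAM role's policy in the customer account remains the authoritative control.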
CloudKeeper Platform
Internal Services:
- AuthX: Authentication and authorization
- AWS Lens: Shared Snowflake CUR data
- Config Server: Centralized configuration
Deployment Architecture
Production Environment
Scalability
Horizontal Scaling:
- tuner-core: 3-10 instances (auto-scaling based on CPU/memory)
- MongoDB: Sharded cluster for large customers
- RabbitMQ: Clustered for high throughput
Vertical Scaling:
- tuner-core: 4-8 CPU, 8-16 GB RAM per instance
- MongoDB: 8-16 CPU, 32-64 GB RAM
- MySQL: 4-8 CPU, 16-32 GB RAM
Security Architecture
Defense in Depth
Compliance:
- SOC 2 Type II (TBD)
- GDPR compliant
- Data residency controls
Next Steps
- Recommendation Engine - Deep dive on Drools rules
- Security & Compliance - Security architecture details
- API Reference - API documentation
- Data Architecture - Database schemas and models