Recommendation Engine Deep Dive
Module: Tuner
Component: Recommendation Engine
Version: 1.0.0-RELEASE
Last Updated: October 26, 2025
Document Type: Technical Architecture (Architect Reference)
Table of Contents
- Introduction
- Drools Rule Engine Architecture
- Recommendation Job Framework
- Rule Definitions & Business Logic
- Recommendation Types Catalog
- Data Collection & Analysis
- Cost Calculation Engine
- Performance & Scalability
- Extending the Engine
Introduction
Purpose
The AWS Tuner Recommendation Engine is a rule-based decision system that analyzes cloud infrastructure to identify cost optimization opportunities. Unlike machine learning approaches, it uses explicit business rules defined in Drools (a Business Rules Management System) to ensure recommendations are:
- Transparent: Business logic visible and auditable
- Predictable: Same inputs always produce same outputs
- Explainable: Clear reasoning for each recommendation
- Configurable: Thresholds adjustable without code changes
- Extensible: New recommendation types added without redeployment
Architecture Philosophy
Rule-Based vs. ML-Based:
| Aspect | Rule-Based (Drools) | ML-Based |
|---|---|---|
| Transparency | ✅ Fully transparent | ❌ Black box |
| Explainability | ✅ Clear decision path | ⚠️ Difficult to explain |
| Consistency | ✅ Deterministic | ⚠️ Probabilistic |
| Maintenance | ✅ Rules updated easily | ❌ Requires retraining |
| Regulatory Compliance | ✅ Auditable | ⚠️ Challenging |
| Edge Cases | ✅ Explicit handling | ⚠️ May fail unexpectedly |
Why Drools?
Tuner uses Drools because cost optimization requires explainable, auditable decisions that can be validated by finance and engineering teams. When recommending a $10K/month change, stakeholders need to understand why the recommendation was made.
Drools Rule Engine Architecture
Component Overview
Drools Workflow
Step 1: Rule Definition (Business Rules in .drl files)
package com.ttn.ck.tuner.recommendation.rules;
import java.math.BigDecimal;
import com.ttn.ck.tuner.utils.aws.Ec2InstanceInfo;
import com.ttn.ck.tuner.utils.dtos.recommendation.RecommendationInfo;
global org.slf4j.Logger log;
global java.util.Map priceMap;
global java.lang.Integer restParams;
global java.util.List recommendationList;
rule "Generate OverProvisioned EC2 Recommendations"
when
$instance: Ec2InstanceInfo()
then
log.info("Generating recommendations for EC2 instance: " + $instance.getResourceId());
String instanceId = $instance.getResourceId();
// Validation 1: Ensure recommended instance type exists
if($instance.getRecommendedInstanceType() == null){
log.info("Skipping recommendation for instance {} as no recommendedInstanceType found.", instanceId);
return;
}
// Validation 2: Ensure savings exist
if($instance.getOdCostPerHour().doubleValue() <= $instance.getRecommendedInstanceCostPerHour()){
log.debug("Skipping recommendation for instance {} as current cost per hour {} is less than or equal to recommended cost per hour {}",
instanceId, $instance.getOdCostPerHour(), $instance.getRecommendedInstanceCostPerHour());
return;
}
// Validation 3: Ensure instance type is different
if($instance.getInstanceType().equals($instance.getRecommendedInstanceType())){
log.info("Skipping recommendation because the current instanceType: {} is already the recommended instanceType: {} for instance: {}",
$instance.getInstanceType(), $instance.getRecommendedInstanceType(), instanceId);
return;
}
// Calculate hourly savings (note: despite its name, this variable holds the
// saving per hour, not the recommended instance's cost per hour)
double recommendedCostPerHour = Math.max(
0,
$instance.getOdCostPerHour()
.subtract(BigDecimal.valueOf($instance.getRecommendedInstanceCostPerHour()))
.doubleValue()
);
// Validation 4: Minimum savings threshold (~720 hours per month)
if(recommendedCostPerHour*720 <= 0.005){
log.warn("Recommendation not generated as potential saving for EC2 instance: {} is not above the $0.005/month threshold in account: {}, region: {}",
instanceId, $instance.getAccountId(), $instance.getRegion());
return;
}
// Generate recommendation
String status = "GENERATED";
String description = String.format("Maximum CPU utilization of EC2 is %d%%\nThe instance has been overprovisioned for the past %d days",
$instance.getCpuUtilization(), restParams);
String action = String.format("Downsize EC2 instance type from %s to %s",
$instance.getInstanceType(), $instance.getRecommendedInstanceType());
String message = String.format("EC2 instance [%s] is overprovisioned. Current cost per hour: %.3f USD, Recommended cost per hour: %.3f USD. Status: %s",
instanceId, $instance.getOdCostPerHour(), $instance.getRecommendedInstanceCostPerHour(), status);
// Build metadata JSON
String metadata = String.format(
"{\"instanceType\": \"%s\", \"cpu\": %d, " +
"\"memory\": %d, \"cpuUtilization\": %d, " +
"\"tenancy\": \"%s\", \"architecture\": \"%s\", " +
"\"platformDetails\": \"%s\", \"recommendedInstanceType\": \"%s\"}",
$instance.getInstanceType(),
$instance.getCpu(),
(int) $instance.getMemory(),
$instance.getCpuUtilization(),
$instance.getTenancy(),
$instance.getArchitecture(),
$instance.getPlatformDetails(),
$instance.getRecommendedInstanceType()
);
// Create recommendation object
RecommendationInfo recommendation = new RecommendationInfo(
$instance.getAccountId(),
instanceId,
$instance.getInstanceName(),
$instance.getRegion(),
description,
action,
message,
Double.parseDouble(String.valueOf($instance.getOdCostPerHour())),
recommendedCostPerHour,
status,
metadata
);
// Add to results list
recommendationList.add(recommendation);
log.info("Recommendation added for OverProvisioned EC2 instance: {} recommended:{} and status: {}",
instanceId, $instance.getRecommendedInstanceType(), status);
end
Step 2: Rule Compilation (Automatic at runtime)
Drools compiles .drl files to Java bytecode for fast execution:
KieServices kieServices = KieServices.Factory.get();
KieContainer kieContainer = kieServices.getKieClasspathContainer();
KieBase kieBase = kieContainer.getKieBase("tuner-recommendation-rules");
Step 3: Fact Insertion (Input data to rule engine)
KieSession kieSession = kieBase.newKieSession();
// Set global variables
kieSession.setGlobal("log", logger);
kieSession.setGlobal("recommendationList", new ArrayList<RecommendationInfo>());
kieSession.setGlobal("priceMap", pricingData);
kieSession.setGlobal("restParams", lookbackPeriod);
// Insert facts (EC2 instances to analyze)
for (Ec2InstanceInfo instance : instanceList) {
kieSession.insert(instance);
}
// Fire all rules
kieSession.fireAllRules();
// Retrieve results
List<RecommendationInfo> recommendations = (List<RecommendationInfo>) kieSession.getGlobal("recommendationList");
Step 4: Rule Execution (Pattern matching and inference)
Drools uses the RETE algorithm for efficient pattern matching:
For each fact (Ec2InstanceInfo):
1. Match against rule conditions (when clause)
2. If matched, execute action (then clause)
3. Continue to next fact
Drools optimizes by:
- Indexing facts for fast lookup
- Caching intermediate results
- Parallel evaluation (where possible)
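The match-then-act loop above can be sketched in plain Java. This is a simplified stand-in for Drools' evaluation model (a predicate for the when clause, an action for the then clause), not the RETE implementation itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class MiniRuleEngine {
    // A toy rule: a "when" predicate paired with a "then" action.
    public record Rule<T>(Predicate<T> when, Consumer<T> then) {}

    // Evaluate every rule against every fact, firing actions on matches.
    // Returns the number of rule activations, like KieSession.fireAllRules().
    public static <T> int fireAllRules(List<Rule<T>> rules, List<T> facts) {
        int fired = 0;
        for (T fact : facts) {
            for (Rule<T> rule : rules) {
                if (rule.when().test(fact)) {
                    rule.then().accept(fact);
                    fired++;
                }
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        List<String> matched = new ArrayList<>();
        // Hypothetical rule: flag CPU readings under the 30% threshold.
        Rule<Integer> lowCpu = new Rule<>(cpu -> cpu < 30,
                cpu -> matched.add("underutilized@" + cpu + "%"));
        int fired = fireAllRules(List.of(lowCpu), List.of(12, 85, 25));
        System.out.println(fired + " " + matched); // two of the three facts match
    }
}
```

RETE improves on this naive double loop by indexing facts and caching partial matches, which is why rule evaluation scales to ~10,000 facts/second.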
Recommendation Job Framework
Job Processor Interface
All recommendation jobs implement JobProcessor:
public interface JobProcessor {
void process(Map<String, Object> dataMap);
}
Standard Job Implementation Pattern
@Service("OVER_PROVISIONED_EC2_RECOMMENDATION_JOB_PROCESSOR")
@Slf4j
@RequiredArgsConstructor
public class OverProvisionedEc2RecommendationJob implements JobProcessor {
// Configuration (from application.yml)
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.overProvisionedLookBackPeriod:30}")
private int overProvisionedLookBackPeriod;
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.overProvisionedThreshold:30}")
private int overProvisionedThreshold;
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.cloudWatch.metricPeriod:3600}")
private Integer cloudWatchMetricPeriod;
// Dependencies
private final TunerEventService tunerEventService;
private final DroolsEngine droolsEngine;
private final OverProvisionedEc2Service operProvisionedEc2Service;
private final Ec2ResourceRepository ec2ResourceRepository;
private final OverProvisionedEc2PricingService operProvisionedEc2PricingService;
@Override
public void process(Map<String, Object> dataMap) {
String accountId = (String) dataMap.get(ACCOUNT_ID);
String factorKey = (String) dataMap.get(FACTOR_KEY);
try {
log.info("Processing {} recommendations with data: {}", factorKey, dataMap);
// 1. Generate recommendations
List<RecommendationInfo> recommendationInfos = generateRecommendation(dataMap);
// 2. Publish results
dataMap.put(RECOMMENDATION, recommendationInfos);
tunerEventService.sendEvent(TunerEvent.SYNC_RECOMMENDATION_SUCCESS, accountId, dataMap);
log.info("{} Recommendation job completed successfully for account: {}", factorKey, accountId);
} catch (Exception e) {
log.error("Error processing {} Recommendation job for account: {}, error: {}",
factorKey, accountId, e.getMessage(), e);
}
}
private List<RecommendationInfo> generateRecommendation(Map<String, Object> jobDataMap) {
String accountId = (String) jobDataMap.get(ACCOUNT_ID);
String region = (String) jobDataMap.get(REGION);
String factorKey = (String) jobDataMap.get(FACTOR_KEY);
// awsAccountDto and factor (used below) are likewise resolved from the job context
// 1. Fetch EC2 resources from MongoDB cache
List<Ec2InstanceInfo> ec2InstancesList =
ec2ResourceRepository.findByAccountIdInAndRegion(List.of(accountId), region).stream()
.map(Ec2ResourceDocument::getResourceInfo)
.toList();
// 2. Fetch CloudWatch metrics and analyze utilization
AccountEc2InstanceInfo accountOperProvisionedEc2InstanceInfo =
operProvisionedEc2Service.getAllOperProvisionedEc2Instances(
awsAccountDto,
region,
ec2InstancesList,
overProvisionedLookBackPeriod, // 30 days
overProvisionedThreshold, // 30% CPU
cloudWatchMetricPeriod // 1 hour granularity
);
if (accountOperProvisionedEc2InstanceInfo.getInstances().isEmpty()) {
log.info("No {} resource for account: {}, region: {}", factorKey, accountId, region);
return List.of();
}
// 3. Find optimal instance types with pricing
List<Ec2InstanceInfo> optimizedEc2Instances =
operProvisionedEc2PricingService.getOptimizedEc2InstanceTypes(
accountOperProvisionedEc2InstanceInfo.getInstances(),
accountId,
region
);
// 4. Execute Drools rules
List<RecommendationInfo> recommendationInfos =
droolsEngine.fireRules(
accountId,
factor.getDroolRuleFilePath(), // "rules/oper_provisioned_ec2_instance_rules.drl"
optimizedEc2Instances,
null,
overProvisionedLookBackPeriod
);
log.info("Generated {} resource recommendations for account: {}, region: {}, total recommendations: {}",
factorKey, accountId, region, recommendationInfos.size());
return recommendationInfos;
}
}
Job Scheduling
Jobs are scheduled using Quartz:
// application.yml
tuner:
recommendation:
sync:
schedule:
overProvisionedEc2: "0 0 2 * * ?" # Daily at 2 AM
snapshot: "0 0 3 * * SUN" # Weekly Sunday 3 AM
idleNatGateway: "0 0 4 * * ?" # Daily at 4 AM
Quartz triggers invoke job processors:
@Component
@Slf4j
@RequiredArgsConstructor
public class RecommendationJobScheduler {
private final AccountService accountService;
private final JobProcessor jobProcessor;
@Scheduled(cron = "${tuner.recommendation.sync.schedule.overProvisionedEc2}")
public void scheduleOverProvisionedEc2Recommendations() {
// For each account and region
for (Account account : accountService.getAllAccounts()) {
for (String region : account.getRegions()) {
Map<String, Object> jobData = Map.of(
"accountId", account.getAccountId(),
"region", region,
"factorKey", "OVER_PROVISIONED_EC2"
);
jobProcessor.process(jobData);
}
}
}
}
Rule Definitions & Business Logic
Rule Categories
Tuner uses 40+ rule files organized by recommendation type:
1. Cleaner Rules
Purpose: Identify unused resources to delete
| Rule File | Target Service | Logic |
|---|---|---|
| snapshot-rules.drl | EBS Snapshots | Age > 90 days AND no attached volumes |
| natgateway-rules.drl | NAT Gateway | Bytes transferred = 0 for 30 days |
| idle-vpc-endpoint-recommendation-rules.drl | VPC Endpoint | No active connections for 30 days |
| s3-incomplete-multipart-rules.drl | S3 Multipart | Upload initiated > 7 days ago, not completed |
| idle-dynamodb-table-rules.drl | DynamoDB | Read/Write capacity units = 0 for 30 days |
| unused-ebs-snapshot-rules.drl | EBS Snapshots | No AMI reference, volume deleted |
| unused-ami-recommendation-rules.drl | AMI | No running instances, age > 180 days |
| idle-emr-cluster-rules.drl | EMR | Cluster in WAITING state > 24 hours |
Example: Snapshot Cleanup Rule
rule "Generate EBS Snapshot Cleanup Recommendations"
when
$snapshot: EbsSnapshotInfo(
age > 90,
volumeId == null || volumeStatus == "deleted",
used_in_ami == false
)
then
double monthlyStorage = $snapshot.getSize() * 0.05; // $0.05 per GB-month
String description = String.format(
"EBS snapshot is %d days old and the volume has been deleted. " +
"Snapshot is not used in any AMI.",
$snapshot.getAge()
);
String action = "Delete EBS snapshot " + $snapshot.getSnapshotId();
RecommendationInfo recommendation = new RecommendationInfo(
$snapshot.getAccountId(),
$snapshot.getSnapshotId(),
$snapshot.getDescription(),
$snapshot.getRegion(),
description,
action,
"Snapshot cleanup opportunity",
monthlyStorage,
0.0, // Cost after deletion
"GENERATED",
buildMetadata($snapshot)
);
recommendationList.add(recommendation);
end
2. OverProvisioned Rules
Purpose: Identify resources with excess capacity
| Rule File | Target Service | Logic |
|---|---|---|
| oper_provisioned_ec2_instance_rules.drl | EC2 | CPU < 30% AND memory < 40% for 30 days |
| over-provisioned-redshift-cluster-recommendation-rules.drl | Redshift | CPU < 30% for 30 days |
| over-provisioned-ecs-fargate-recommendation-rules.drl | ECS Fargate | CPU < 30% AND memory < 40% for 30 days |
Thresholds (Configurable):
tuner:
recommendation:
config:
ec2Instance:
overProvisionedLookBackPeriod: 30 # Days to analyze
overProvisionedThreshold: 30 # CPU % threshold
rdsInstance:
overProvisionedLookBackPeriod: 30
overProvisionedCpuThreshold: 30
overProvisionedMemoryThreshold: 40
redshift:
overProvisionedLookBackPeriod: 30
overProvisionedCpuThreshold: 30
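A minimal sketch of how the configured CPU threshold might be applied to the collected metrics. The method name and inputs are illustrative, not the actual service API:

```java
import java.util.List;

public class OverProvisionedCheck {
    // Returns true when average CPU over the lookback window stays below the
    // threshold. Assumes one datapoint per hour, matching the 3600s metricPeriod.
    public static boolean isOverProvisioned(List<Double> hourlyCpuAverages,
                                            int cpuThresholdPercent) {
        if (hourlyCpuAverages.isEmpty()) {
            return false; // no data: do not recommend
        }
        double avg = hourlyCpuAverages.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0);
        return avg < cpuThresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(isOverProvisioned(List.of(12.0, 18.0, 25.0), 30)); // true
        System.out.println(isOverProvisioned(List.of(60.0, 70.0), 30));       // false
    }
}
```

Because the thresholds are plain configuration values, tightening the CPU threshold from 30% to 20% changes which instances qualify without touching any rule file.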
3. Modernization Rules
Purpose: Identify upgrade opportunities to avoid surcharges or use better alternatives
| Rule File | Target Service | Logic |
|---|---|---|
| rds-extended-support-rules.drl | RDS | MySQL/PostgreSQL version end-of-support within 6 months |
| eks-rules.drl | EKS | Kubernetes version end-of-support within 3 months |
| modernise-elasticache-rules.drl | ElastiCache | Redis version < 7.0, recommend Valkey migration |
| modernise_opensearch-rules.drl | OpenSearch | Using gp2 volumes, recommend gp3 upgrade |
| elasticache-rules.drl | ElastiCache | Redis end-of-support, recommend Valkey |
| ms-sql-server-licence-cost-recommendation-rules.drl | RDS SQL Server | License Included, recommend BYOL |
Example: RDS Extended Support Rule
rule "Generate RDS Extended Support Recommendations"
when
$rds: RdsInstanceInfo(
engine == "mysql" || engine == "postgres",
engineVersion in ("5.7", "10.x", "11.x"), // End-of-support versions
daysUntilExtendedSupport < 180 // Within 6 months
)
then
// Extended support adds 50% surcharge
double currentCost = $rds.getMonthlyCost();
double extendedSupportCost = currentCost * 1.5;
double recommendedCost = currentCost; // Upgrade to supported version
String description = String.format(
"RDS %s version %s is approaching end-of-standard-support. " +
"Extended support will add 50%% surcharge ($%.2f/month additional). " +
"Upgrade to version %s to avoid surcharge.",
$rds.getEngine(),
$rds.getEngineVersion(),
currentCost * 0.5,
$rds.getRecommendedVersion()
);
String action = String.format(
"Upgrade RDS instance from %s %s to %s",
$rds.getEngine(),
$rds.getEngineVersion(),
$rds.getRecommendedVersion()
);
double monthlySavings = extendedSupportCost - recommendedCost;
RecommendationInfo recommendation = new RecommendationInfo(
$rds.getAccountId(),
$rds.getDbInstanceIdentifier(),
$rds.getDbName(),
$rds.getRegion(),
description,
action,
"Modernization opportunity",
extendedSupportCost, // Future cost if not upgraded
recommendedCost, // Cost if upgraded
"GENERATED",
buildMetadata($rds)
);
recommendationList.add(recommendation);
end
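The 50% surcharge arithmetic in the rule above, worked as a standalone sketch (the dollar figures are illustrative, not quoted AWS prices):

```java
public class ExtendedSupportMath {
    // Monthly savings from upgrading before extended support begins:
    // future cost (current * 1.5) minus the cost after upgrading (unchanged).
    public static double monthlySavings(double currentMonthlyCost) {
        double extendedSupportCost = currentMonthlyCost * 1.5;
        return extendedSupportCost - currentMonthlyCost; // equals the 50% surcharge
    }

    public static void main(String[] args) {
        // e.g. a $500/month MySQL 5.7 instance faces a $250/month surcharge
        System.out.println(monthlySavings(500.0));
    }
}
```

Note the rule reports the avoided future cost as the saving: the instance's own cost does not drop after the upgrade, only the surcharge is avoided.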
4. Idle Resource Rules (GCP Support)
Purpose: Multi-cloud optimization for Google Cloud Platform
| Rule File | Target Service | Logic |
|---|---|---|
| idle-cloud-vm-recommendation-rules.drl | GCP Compute | CPU < 10% for 30 days |
| idle-cloud-sql-recommendation-rules.drl | Cloud SQL | Connections = 0 for 30 days |
| idle-static-ip-recommendation-rules.drl | Static IP | Not attached to instance for 30 days |
| idle-persistent-disk-recommendation-rules.drl | Persistent Disk | Not attached, age > 30 days |
| idle-load-balancer-gcp-recommendation-rules.drl | Load Balancer | Request count = 0 for 30 days |
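Most idle checks in this table share one shape: a usage metric that stays at zero for the whole lookback window. A hedged sketch of that predicate (names are illustrative, not the project's actual API):

```java
import java.util.List;

public class IdleCheck {
    // A resource is "idle" when every datapoint in the lookback window is zero,
    // e.g. Cloud SQL connection counts or load-balancer request counts.
    public static boolean isIdle(List<Long> datapoints) {
        return !datapoints.isEmpty() && datapoints.stream().allMatch(v -> v == 0L);
    }

    public static void main(String[] args) {
        System.out.println(isIdle(List.of(0L, 0L, 0L))); // true: no activity at all
        System.out.println(isIdle(List.of(0L, 3L, 0L))); // false: some traffic observed
    }
}
```

Treating an empty metric series as "not idle" is a deliberately conservative choice here: missing data should not trigger a deletion recommendation.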
Recommendation Types Catalog
Complete List (44 Types)
AWS Compute (8 types)
- OverProvisioned EC2 - Downsize underutilized instances
- Idle EC2 - Stop or terminate instances with no traffic
- EC2 Reserved Instances - Purchase commitment discounts
- EC2 Savings Plans - Flexible commitment discounts
- OverProvisioned ECS Fargate - Rightsize Fargate tasks
- Lambda Error Rate - Fix functions with high error rates
- Lambda Timeout - Optimize timeout configurations
- Idle EMR Cluster - Terminate idle EMR clusters
AWS Storage (7 types)
- EBS Snapshot Cleanup - Delete orphaned snapshots
- EBS Volume Upgrade - Migrate gp2 → gp3
- S3 Lifecycle Policy - Implement intelligent tiering
- S3 Incomplete Multipart - Clean up failed uploads
- AMI Cleanup - Delete unused AMIs
- ECR Lifecycle Policy - Delete old container images
- Unused EBS Snapshot - Delete snapshots with no volume
AWS Database (7 types)
- OverProvisioned RDS - Downsize underutilized databases
- Idle RDS - Stop unused databases
- RDS Extended Support - Upgrade to avoid surcharges
- RDS Reserved Instances - Purchase commitment discounts
- OverProvisioned Redshift - Rightsize Redshift clusters
- Idle Redshift - Pause idle clusters
- DynamoDB Idle Table - Delete or archive unused tables
AWS Networking (4 types)
- Idle NAT Gateway - Delete NAT gateways with no traffic
- Idle VPC Endpoint - Delete unused VPC endpoints
- Idle Load Balancer - Delete load balancers with no targets
- Idle Network Firewall - Delete unused firewalls
AWS Modernization (6 types)
- ElastiCache Redis → Valkey - Migrate to open-source alternative
- Modernize OpenSearch - Upgrade to gp3 volumes
- EKS Extended Support - Upgrade Kubernetes version
- MS SQL Server License - Migrate to BYOL
- Database Migration Service - Optimize DMS instances
- Route53 - Optimize hosted zone costs
AWS Other (3 types)
- EC2 Attach Volume - Attach unattached EBS volumes
- Compute Savings Plans - Flexible compute discounts
- Spot Instance Recommendations - Workload suitability analysis
GCP Compute (3 types)
- Idle Compute Instance (GCP) - Stop idle VMs
- Idle Machine Image (GCP) - Delete unused images
- Idle NAT Gateway (GCP) - Delete idle Cloud NAT
GCP Storage (2 types)
- Idle Persistent Disk (GCP) - Delete unattached disks
- Idle PD Snapshots (GCP) - Delete old snapshots
GCP Database (2 types)
- Idle Cloud SQL (GCP) - Stop idle databases
- Idle Memorystore (GCP) - Delete idle Redis instances
GCP Networking (2 types)
- Idle Load Balancer (GCP) - Delete unused load balancers
- Idle Static IP (GCP) - Release unused static IPs
Data Collection & Analysis
CloudWatch Metrics Collection
Metrics Gathered:
// EC2 Metrics (30-day period, 1-hour granularity)
CloudWatch.getMetricStatistics(
namespace: "AWS/EC2",
metricName: "CPUUtilization",
dimensions: [{"Name": "InstanceId", "Value": "i-xxxxx"}],
startTime: now() - 30.days,
endTime: now(),
period: 3600, // 1 hour
statistics: ["Average", "Maximum"]
)
// Additional EC2 Metrics
- NetworkIn (bytes)
- NetworkOut (bytes)
- DiskReadBytes
- DiskWriteBytes
// RDS Metrics
- CPUUtilization
- DatabaseConnections
- ReadIOPS
- WriteIOPS
- FreeableMemory
// NAT Gateway Metrics
- BytesInFromDestination
- BytesInFromSource
- BytesOutToDestination
- BytesOutToSource
Utilization Calculation:
public int calculateCpuUtilization(List<Datapoint> datapoints) {
if (datapoints.isEmpty()) return 0;
// Calculate average of all hourly averages over 30 days
double sum = datapoints.stream()
.mapToDouble(Datapoint::getAverage)
.sum();
int avgUtilization = (int) Math.ceil(sum / datapoints.size());
log.debug("CPU Utilization: {} datapoints, average: {}%",
datapoints.size(), avgUtilization);
return avgUtilization;
}
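For instance, three hourly averages of 10%, 20%, and 31% yield ceil(61 / 3) = ceil(20.33) = 21%. The same calculation with only the standard library (substituting plain doubles for the CloudWatch Datapoint type):

```java
import java.util.List;

public class CpuUtilization {
    // Mirrors calculateCpuUtilization above: average of hourly averages, rounded up.
    public static int average(List<Double> hourlyAverages) {
        if (hourlyAverages.isEmpty()) return 0;
        double sum = hourlyAverages.stream()
                .mapToDouble(Double::doubleValue)
                .sum();
        return (int) Math.ceil(sum / hourlyAverages.size());
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(10.0, 20.0, 31.0))); // 21
    }
}
```

Rounding up is the safer direction for a downsizing decision: it slightly overstates utilization, so borderline instances are less likely to be flagged.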
AWS Pricing Data Collection
Pricing API Integration:
public class OverProvisionedEc2PricingService {
public List<Ec2InstanceInfo> getOptimizedEc2InstanceTypes(
List<Ec2InstanceInfo> instances,
String accountId,
String region
) {
List<Ec2InstanceInfo> optimizedList = new ArrayList<>();
for (Ec2InstanceInfo instance : instances) {
// Get pricing for current instance type
double currentPrice = getPricing(
instance.getInstanceType(),
region,
instance.getPlatformDetails(),
instance.getTenancy()
);
// Find optimal alternative based on utilization
String optimalType = findOptimalInstanceType(
instance.getCpuUtilization(),
instance.getMemoryUtilization(),
instance.getInstanceType(),
instance.getArchitecture()
);
// Get pricing for recommended type
double recommendedPrice = getPricing(
optimalType,
region,
instance.getPlatformDetails(),
instance.getTenancy()
);
// Enrich instance info
instance.setRecommendedInstanceType(optimalType);
instance.setOdCostPerHour(BigDecimal.valueOf(currentPrice));
instance.setRecommendedInstanceCostPerHour(recommendedPrice);
optimizedList.add(instance);
}
return optimizedList;
}
private String findOptimalInstanceType(
int cpuUtil,
int memoryUtil,
String currentType,
String architecture
) {
// Logic to find right-sized instance
// Example: m5.2xlarge (8 vCPU, 32 GB) with 12% CPU → m5.large (2 vCPU, 8 GB)
InstanceSpec currentSpec = parseInstanceType(currentType);
// Target: 70-80% utilization on recommended instance
int targetCpu = (int) Math.ceil(currentSpec.vCpus * cpuUtil / 75.0);
int targetMemory = (int) Math.ceil(currentSpec.memoryGb * memoryUtil / 75.0);
// Find smallest instance that meets target
return findSmallestInstanceMatchingSpec(
targetCpu,
targetMemory,
currentSpec.family, // Prefer same family (m5)
architecture // x86_64 or arm64
);
}
}
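The target-sizing formula in findOptimalInstanceType, worked standalone: an m5.2xlarge (8 vCPU) running at 12% CPU needs ceil(8 × 12 / 75) = 2 vCPUs to land near the 75% utilization target. A sketch of just that arithmetic (InstanceSpec parsing and catalog lookup omitted):

```java
public class RightSizing {
    // Smallest capacity that keeps the current workload at ~75% utilization
    // on the recommended instance. Works for vCPUs and for GB of memory alike.
    public static int targetCapacity(int currentCapacity, int utilizationPercent) {
        return (int) Math.ceil(currentCapacity * utilizationPercent / 75.0);
    }

    public static void main(String[] args) {
        System.out.println(targetCapacity(8, 12));  // 2 vCPUs -> m5.large territory
        System.out.println(targetCapacity(32, 12)); // 6 GB of the 32 GB is enough
    }
}
```

The 75% target leaves headroom for bursts; targeting 100% would recommend instances that saturate under their historical peak load.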
Pricing Cache:
Pricing data cached in Redis (24-hour TTL):
@Cacheable(value = "ec2-pricing", key = "#instanceType + '-' + #region + '-' + #os + '-' + #tenancy")
public double getPricing(String instanceType, String region, String os, String tenancy) {
// Query AWS Pricing API or cache
}
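The caching semantics can be illustrated with a small in-memory TTL cache. This is only a sketch of the expire-then-reload behavior, not the Redis-backed implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TtlCache<V> {
    private record Entry<V>(V value, long expiresAtMillis) {}

    private final Map<String, Entry<V>> store = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Return the cached value, or invoke the loader when missing or expired.
    public V get(String key, Supplier<V> loader) {
        Entry<V> e = store.get(key);
        long now = System.currentTimeMillis();
        if (e == null || e.expiresAtMillis() <= now) {
            e = new Entry<>(loader.get(), now + ttlMillis);
            store.put(key, e);
        }
        return e.value();
    }

    public static void main(String[] args) {
        TtlCache<Double> pricing = new TtlCache<>(24 * 60 * 60 * 1000L); // 24h TTL
        double first = pricing.get("m5.large-us-east-1", () -> 0.096);   // loads
        double second = pricing.get("m5.large-us-east-1", () -> 0.999);  // cache hit
        System.out.println(first + " " + second); // both return the cached 0.096
    }
}
```

The 24-hour TTL is a reasonable trade-off because AWS list prices change rarely, while the Pricing API is slow and rate-limited.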
Cost Calculation Engine
Savings Calculation
public class CostCalculator {
public double calculateMonthlySavings(
double currentCostPerHour,
double recommendedCostPerHour
) {
double hourlySavings = Math.max(0, currentCostPerHour - recommendedCostPerHour);
return hourlySavings * 730; // Average hours per month
}
public double calculateSchedulerSavings(
double instanceCostPerHour,
int hoursOffPerWeek
) {
int hoursOffPerMonth = (int) (hoursOffPerWeek * 4.3); // 4.3 weeks per month
return instanceCostPerHour * hoursOffPerMonth;
}
public double calculateSnapshotSavings(
int sizeGB,
String region
) {
double pricePerGbMonth = getEbsSnapshotPricing(region); // $0.05/GB-month
return sizeGB * pricePerGbMonth;
}
}
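Worked against calculateMonthlySavings above: downsizing from an instance at $0.384/hour to one at $0.096/hour saves (0.384 − 0.096) × 730 ≈ $210.24/month (illustrative on-demand prices). A self-contained version of the same arithmetic:

```java
public class SavingsMath {
    // Same arithmetic as CostCalculator.calculateMonthlySavings above.
    public static double monthlySavings(double currentPerHour, double recommendedPerHour) {
        return Math.max(0, currentPerHour - recommendedPerHour) * 730; // avg hours/month
    }

    public static void main(String[] args) {
        System.out.println(monthlySavings(0.384, 0.096)); // ~210.24
        System.out.println(monthlySavings(0.096, 0.384)); // 0.0 (savings never negative)
    }
}
```

Flooring at zero matters: a rule that reaches the cost calculation with a more expensive "recommendation" should report zero savings, not a negative number.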
ROI Tracking
Projected vs. Realized Savings:
public class SavingsTracker {
public SavingsReport calculateRealizedSavings(
String accountId,
String month
) {
// 1. Get all implemented recommendations for month
List<Recommendation> implemented =
recommendationRepository.findByAccountIdAndStatusAndMonth(
accountId,
"IMPLEMENTED",
month
);
// 2. Query actual cost from Snowflake CUR data
Map<String, Double> actualCosts = snowflakeService.getActualCosts(
accountId,
month,
implemented.stream().map(Recommendation::getResourceId).toList()
);
// 3. Compare projected vs. actual
double totalProjected = 0;
double totalRealized = 0;
for (Recommendation rec : implemented) {
double projected = rec.getProjectedMonthlySavings();
// Guard against resources missing from the CUR data (treat as zero savings)
double actual = rec.getCurrentCost()
- actualCosts.getOrDefault(rec.getResourceId(), rec.getCurrentCost());
totalProjected += projected;
totalRealized += actual;
}
// Avoid division by zero when no recommendations were implemented
double accuracyPercent = totalProjected == 0 ? 0 : (totalRealized / totalProjected) * 100;
return new SavingsReport(
totalProjected,
totalRealized,
accuracyPercent // % accuracy
);
}
}
Performance & Scalability
Performance Characteristics
Recommendation Generation:
| Metric | Value | Notes |
|---|---|---|
| EC2 Analysis | ~100 instances/second | CloudWatch API bottleneck |
| Rule Evaluation | ~10,000 facts/second | Drools RETE algorithm |
| Pricing Lookup | ~500 lookups/second | Redis cache hit: 95% |
| Database Write | ~1,000 recommendations/second | MongoDB bulk insert |
Scalability Limits:
| Accounts | Instances | Daily Recommendation Time | Notes |
|---|---|---|---|
| 10 | 1,000 | 5 minutes | Single tuner-core instance |
| 100 | 10,000 | 45 minutes | 3 tuner-core instances |
| 1,000 | 100,000 | 6 hours | 10 tuner-core instances + sharded MongoDB |
Optimization Techniques
1. Caching Strategy:
// L1: Redis cache (24-hour TTL)
@Cacheable(value = "pricing-data")
public PricingInfo getPricing(String instanceType, String region);
// L2: MongoDB resource cache (6-hour TTL; Spring's @Cacheable has no ttl
// attribute, so expiry is configured on the cache manager)
@Cacheable(value = "ec2-resources")
public List<Ec2InstanceInfo> getEc2Resources(String accountId, String region);
// L3: JVM cache (1-hour TTL, likewise set on the cache manager)
@Cacheable(value = "drools-rules")
public KieBase loadRules(String ruleFile);
2. Batch Processing:
// Process instances in batches of 100
List<List<Ec2InstanceInfo>> batches = Lists.partition(instances, 100);
for (List<Ec2InstanceInfo> batch : batches) {
List<RecommendationInfo> batchRecommendations = droolsEngine.fireRules(
accountId,
ruleFile,
batch,
priceMap,
lookbackPeriod
);
// Bulk insert to MongoDB
recommendationRepository.insertAll(batchRecommendations);
}
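Lists.partition above comes from Guava; the same batching can be done with the JDK alone (a sketch of the partitioning step only):

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Split a list into consecutive batches of at most batchSize elements.
    // The final batch holds whatever remains (it may be smaller).
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5);
        System.out.println(partition(ids, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

Batches of 100 keep each Drools session small (bounded working memory) while still amortizing the MongoDB bulk-insert overhead.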
3. Async Processing:
@Async
public CompletableFuture<List<RecommendationInfo>> generateRecommendationsAsync(
String accountId,
String region
) {
List<RecommendationInfo> recommendations = generateRecommendation(accountId, region);
return CompletableFuture.completedFuture(recommendations);
}
// Parallel execution for multiple regions
List<CompletableFuture<List<RecommendationInfo>>> futures = regions.stream()
.map(region -> generateRecommendationsAsync(accountId, region))
.toList();
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
Extending the Engine
Adding a New Recommendation Type
Step 1: Create Drools Rule File
src/main/resources/rules/my-new-recommendation-rules.drl:
package com.ttn.ck.tuner.recommendation.rules;
import com.ttn.ck.tuner.utils.aws.MyResourceInfo;
import com.ttn.ck.tuner.utils.dtos.recommendation.RecommendationInfo;
global org.slf4j.Logger log;
global java.util.List recommendationList;
rule "Generate My New Recommendations"
when
$resource: MyResourceInfo()
then
// Your business logic here
if ($resource.shouldRecommend()) {
RecommendationInfo recommendation = new RecommendationInfo(...);
recommendationList.add(recommendation);
}
end
Step 2: Create Job Processor
@Service("MY_NEW_RECOMMENDATION_JOB_PROCESSOR")
@Slf4j
@RequiredArgsConstructor
public class MyNewRecommendationJob implements JobProcessor {
private final DroolsEngine droolsEngine;
private final MyResourceService myResourceService;
@Override
public void process(Map<String, Object> dataMap) {
String accountId = (String) dataMap.get("accountId");
String region = (String) dataMap.get("region");
// 1. Collect data
List<MyResourceInfo> resources = myResourceService.getResources(accountId, region);
// 2. Execute Drools rules
List<RecommendationInfo> recommendations = droolsEngine.fireRules(
accountId,
"rules/my-new-recommendation-rules.drl",
resources,
null,
30
);
// 3. Publish results
tunerEventService.sendEvent(
TunerEvent.SYNC_RECOMMENDATION_SUCCESS,
accountId,
Map.of("recommendations", recommendations)
);
}
}
Step 3: Register in Configuration
public enum TunerRecommendationFactor {
// ... existing factors
MY_NEW_RECOMMENDATION(
"MY_NEW_RECOMMENDATION",
"MY_NEW_RECOMMENDATION_JOB_PROCESSOR",
"rules/my-new-recommendation-rules.drl"
);
// ...
}
Step 4: Schedule Job
# application.yml
tuner:
recommendation:
sync:
schedule:
myNewRecommendation: "0 0 3 * * ?" # Daily at 3 AM
Testing New Rules
@SpringBootTest
class MyNewRecommendationJobTest {
@Autowired
private DroolsEngine droolsEngine;
@Test
void testMyNewRecommendationRule() {
// 1. Prepare test data
MyResourceInfo resource = MyResourceInfo.builder()
.resourceId("test-resource-123")
.accountId("123456789012")
.region("us-east-1")
.shouldRecommend(true)
.build();
// 2. Execute rules
List<RecommendationInfo> recommendations = droolsEngine.fireRules(
"123456789012",
"rules/my-new-recommendation-rules.drl",
List.of(resource),
null,
30
);
// 3. Verify results
assertThat(recommendations).hasSize(1);
assertThat(recommendations.get(0).getResourceId()).isEqualTo("test-resource-123");
assertThat(recommendations.get(0).getMonthlySavings()).isGreaterThan(0);
}
}
Next Steps
- Security & Compliance - Security architecture and compliance
- API Reference - REST API documentation
- Data Architecture - Database schemas and models