
Recommendation Engine Deep Dive


Module: Tuner | Component: Recommendation Engine | Version: 1.0.0-RELEASE | Last Updated: October 26, 2025 | Document Type: Technical Architecture (Architect Reference)


Table of Contents

  1. Introduction
  2. Drools Rule Engine Architecture
  3. Recommendation Job Framework
  4. Rule Definitions & Business Logic
  5. Recommendation Types Catalog
  6. Data Collection & Analysis
  7. Cost Calculation Engine
  8. Performance & Scalability
  9. Extending the Engine

Introduction

Purpose

The AWS Tuner Recommendation Engine is a rule-based decision system that analyzes cloud infrastructure to identify cost optimization opportunities. Unlike machine learning approaches, it uses explicit business rules defined in Drools (Business Rules Management System) to ensure recommendations are:

  • Transparent: Business logic visible and auditable
  • Predictable: Same inputs always produce same outputs
  • Explainable: Clear reasoning for each recommendation
  • Configurable: Thresholds adjustable without code changes
  • Extensible: New recommendation types added without redeployment

Architecture Philosophy

Rule-Based vs. ML-Based:

| Aspect | Rule-Based (Drools) | ML-Based |
| --- | --- | --- |
| Transparency | ✅ Fully transparent | ❌ Black box |
| Explainability | ✅ Clear decision path | ⚠️ Difficult to explain |
| Consistency | ✅ Deterministic | ⚠️ Probabilistic |
| Maintenance | ✅ Rules updated easily | ❌ Requires retraining |
| Regulatory Compliance | ✅ Auditable | ⚠️ Challenging |
| Edge Cases | ✅ Explicit handling | ⚠️ May fail unexpectedly |

Why Drools?

Tuner uses Drools because cost optimization requires explainable, auditable decisions that can be validated by finance and engineering teams. When recommending a $10K/month change, stakeholders need to understand why the recommendation was made.


Drools Rule Engine Architecture

Component Overview

(Diagram: Drools Rule Engine component overview.)

Drools Workflow

Step 1: Rule Definition (Business Rules in .drl files)

package com.ttn.ck.tuner.recommendation.rules;

import com.ttn.ck.tuner.utils.aws.Ec2InstanceInfo;
import com.ttn.ck.tuner.utils.dtos.recommendation.RecommendationInfo;
import java.math.BigDecimal;

global org.slf4j.Logger log;
global java.util.Map priceMap;
global java.lang.Integer restParams;
global java.util.List recommendationList;

rule "Generate OverProvisioned EC2 Recommendations"
when
    $instance: Ec2InstanceInfo()
then
    log.info("Generating recommendations for EC2 instance: " + $instance.getResourceId());

    String instanceId = $instance.getResourceId();

    // Validation 1: Ensure recommended instance type exists
    if ($instance.getRecommendedInstanceType() == null) {
        log.info("Skipping recommendation for instance {} as no recommendedInstanceType found.", instanceId);
        return;
    }

    // Validation 2: Ensure savings exist
    if ($instance.getOdCostPerHour().doubleValue() <= $instance.getRecommendedInstanceCostPerHour()) {
        log.debug("Skipping recommendation for instance {} as current cost per hour {} is less than or equal to recommended cost per hour {}",
            instanceId, $instance.getOdCostPerHour(), $instance.getRecommendedInstanceCostPerHour());
        return;
    }

    // Validation 3: Ensure instance type is different
    if ($instance.getInstanceType().equals($instance.getRecommendedInstanceType())) {
        log.info("Skipping recommendation because the current instanceType: {} is already the recommended instanceType: {} for instance: {}",
            $instance.getInstanceType(), $instance.getRecommendedInstanceType(), instanceId);
        return;
    }

    // Calculate hourly savings (current on-demand cost minus recommended cost, floored at zero)
    double recommendedCostPerHour = Math.max(
        0,
        $instance.getOdCostPerHour()
            .subtract(BigDecimal.valueOf($instance.getRecommendedInstanceCostPerHour()))
            .doubleValue()
    );

    // Validation 4: Minimum savings threshold (projected monthly savings must exceed $0.005)
    if (recommendedCostPerHour * 720 <= 0.005) {
        log.warn("Recommendation not generated as Potential Saving for EC2 instance: {} is not greater than: ${} in account: {}, region: {}",
            instanceId, $instance.getRecommendedInstanceCostPerHour(), $instance.getAccountId(), $instance.getRegion());
        return;
    }

    // Generate recommendation
    String status = "GENERATED";
    String description = String.format("Maximum CPU utilisation of EC2 is %d%%\nThe instance has been overprovisioned for the past %d days",
        $instance.getCpuUtilization(), restParams);
    String action = String.format("Downsize EC2 instance type from %s to %s",
        $instance.getInstanceType(), $instance.getRecommendedInstanceType());
    String message = String.format("EC2 instance [%s] is overprovisioned. Current cost per hour: %.3f USD, Recommended cost per hour: %.3f USD. Status: %s",
        instanceId, $instance.getOdCostPerHour(), $instance.getRecommendedInstanceCostPerHour(), status);

    // Build metadata JSON
    String metadata = String.format(
        "{\"instanceType\": \"%s\", \"cpu\": %d, " +
        "\"memory\": %d, \"cpuUtilization\": %d, " +
        "\"tenancy\": \"%s\", \"architecture\": \"%s\", " +
        "\"platformDetails\": \"%s\", \"recommendedInstanceType\": \"%s\"}",
        $instance.getInstanceType(),
        $instance.getCpu(),
        (int) $instance.getMemory(),
        $instance.getCpuUtilization(),
        $instance.getTenancy(),
        $instance.getArchitecture(),
        $instance.getPlatformDetails(),
        $instance.getRecommendedInstanceType()
    );

    // Create recommendation object
    RecommendationInfo recommendation = new RecommendationInfo(
        $instance.getAccountId(),
        instanceId,
        $instance.getInstanceName(),
        $instance.getRegion(),
        description,
        action,
        message,
        Double.parseDouble(String.valueOf($instance.getOdCostPerHour())),
        recommendedCostPerHour,
        status,
        metadata
    );

    // Add to results list
    recommendationList.add(recommendation);
    log.info("Recommendation added for OverProvisioned EC2 instance: {} recommended:{} and status: {}",
        instanceId, $instance.getRecommendedInstanceType(), status);
end

Step 2: Rule Compilation (Automatic at runtime)

Drools compiles .drl files to Java bytecode for fast execution:

KieServices kieServices = KieServices.Factory.get();
KieContainer kieContainer = kieServices.getKieClasspathContainer();
KieBase kieBase = kieContainer.getKieBase("tuner-recommendation-rules");
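The named KieBase ("tuner-recommendation-rules") is typically declared in a META-INF/kmodule.xml on the classpath; the exact packaging is not shown in this document. As an illustrative sketch (not taken from the Tuner codebase), rule compilation errors can be surfaced at startup by verifying the container before any session is created:

import org.kie.api.KieServices;
import org.kie.api.builder.Message;
import org.kie.api.builder.Results;
import org.kie.api.runtime.KieContainer;

public class RuleCompilationCheck {

    // Fail fast if any .drl file in the classpath container does not compile.
    public static void verifyRules() {
        KieServices kieServices = KieServices.Factory.get();
        KieContainer kieContainer = kieServices.getKieClasspathContainer();

        Results results = kieContainer.verify();
        if (results.hasMessages(Message.Level.ERROR)) {
            throw new IllegalStateException("Rule compilation errors: " + results.getMessages());
        }
    }
}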

Step 3: Fact Insertion (Input data to rule engine)

KieSession kieSession = kieBase.newKieSession();

// Set global variables
kieSession.setGlobal("log", logger);
kieSession.setGlobal("recommendationList", new ArrayList<RecommendationInfo>());
kieSession.setGlobal("priceMap", pricingData);
kieSession.setGlobal("restParams", lookbackPeriod);

// Insert facts (EC2 instances to analyze)
for (Ec2InstanceInfo instance : instanceList) {
    kieSession.insert(instance);
}

// Fire all rules
kieSession.fireAllRules();

// Retrieve results
List<RecommendationInfo> recommendations = (List<RecommendationInfo>) kieSession.getGlobal("recommendationList");

Step 4: Rule Execution (Pattern matching and inference)

Drools uses the RETE algorithm for efficient pattern matching:

For each fact (Ec2InstanceInfo):
1. Match against rule conditions (when clause)
2. If matched, execute action (then clause)
3. Continue to next fact

Drools optimizes by:
- Indexing facts for fast lookup
- Caching intermediate results
- Parallel evaluation (where possible)
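One practical consequence: constraints written in the when clause are evaluated (and indexed) by the RETE network, whereas checks performed inside the then block run only after the rule has already matched. As an illustrative sketch (not the shipped rule), the validations from the EC2 rule above could be pushed into the pattern:

rule "Generate OverProvisioned EC2 Recommendations (constraint-style sketch)"
when
    // Filtering happens in the RETE network, so non-candidates never reach the consequence
    $instance: Ec2InstanceInfo(
        recommendedInstanceType != null,
        recommendedInstanceType != instanceType,
        odCostPerHour.doubleValue() > recommendedInstanceCostPerHour
    )
then
    log.info("Downsizing candidate: {}", $instance.getResourceId());
    // ...build and add the RecommendationInfo as in the full rule above
end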

Recommendation Job Framework

Job Processor Interface

All recommendation jobs implement JobProcessor:

public interface JobProcessor {
void process(Map<String, Object> dataMap);
}
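Processors are registered as named Spring beans (see the @Service values below), which suggests they are resolved by factor key at dispatch time. A minimal sketch of that wiring, assuming the <FACTOR_KEY>_JOB_PROCESSOR naming convention (the actual dispatcher is not shown in this document):

// Hypothetical dispatcher: Spring injects all JobProcessor beans keyed by bean name.
@Component
@RequiredArgsConstructor
public class RecommendationJobDispatcher {

    private final Map<String, JobProcessor> processorsByBeanName;

    public void dispatch(String factorKey, Map<String, Object> dataMap) {
        JobProcessor processor = processorsByBeanName.get(factorKey + "_JOB_PROCESSOR");
        if (processor == null) {
            throw new IllegalArgumentException("No job processor registered for factor: " + factorKey);
        }
        processor.process(dataMap);
    }
}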

Standard Job Implementation Pattern

@Service("OVER_PROVISIONED_EC2_RECOMMENDATION_JOB_PROCESSOR")
@Slf4j
@RequiredArgsConstructor
public class OverProvisionedEc2RecommendationJob implements JobProcessor {

// Configuration (from application.yml)
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.overProvisionedLookBackPeriod:30}")
private int overProvisionedLookBackPeriod;

@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.overProvisionedThreshold:30}")
private int overProvisionedThreshold;

@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.cloudWatch.metricPeriod:3600}")
private Integer cloudWatchMetricPeriod;

// Dependencies
private final TunerEventService tunerEventService;
private final DroolsEngine droolsEngine;
private final OverProvisionedEc2Service operProvisionedEc2Service;
private final Ec2ResourceRepository ec2ResourceRepository;
private final OverProvisionedEc2PricingService operProvisionedEc2PricingService;

@Override
public void process(Map<String, Object> dataMap) {
String accountId = (String) dataMap.get(ACCOUNT_ID);
String factorKey = (String) dataMap.get(FACTOR_KEY);

try {
log.info("Processing {} recommendations with data: {}", factorKey, dataMap);

// 1. Generate recommendations
List<RecommendationInfo> recommendationInfos = generateRecommendation(dataMap);

// 2. Publish results
dataMap.put(RECOMMENDATION, recommendationInfos);
tunerEventService.sendEvent(TunerEvent.SYNC_RECOMMENDATION_SUCCESS, accountId, dataMap);

log.info("{} Recommendation job completed successfully for account: {}", factorKey, dataMap);
} catch (Exception e) {
log.error("Error processing {} Recommendation job for account: {}, error: {}",
factorKey, dataMap, e.getMessage(), e);
}
}

private List<RecommendationInfo> generateRecommendation(Map<String, Object> jobDataMap) {
String accountId = (String) jobDataMap.get(ACCOUNT_ID);
String region = (String) jobDataMap.get(REGION);

// 1. Fetch EC2 resources from MongoDB cache
List<Ec2InstanceInfo> ec2InstancesList =
ec2ResourceRepository.findByAccountIdInAndRegion(List.of(accountId), region).stream()
.map(Ec2ResourceDocument::getResourceInfo)
.toList();

// 2. Fetch CloudWatch metrics and analyze utilization
AccountEc2InstanceInfo accountOperProvisionedEc2InstanceInfo =
operProvisionedEc2Service.getAllOperProvisionedEc2Instances(
awsAccountDto,
region,
ec2InstancesList,
overProvisionedLookBackPeriod, // 30 days
overProvisionedThreshold, // 30% CPU
cloudWatchMetricPeriod // 1 hour granularity
);

if (accountOperProvisionedEc2InstanceInfo.getInstances().isEmpty()) {
log.info("No {} resource for account: {}, region: {}", factorKey, accountId, region);
return List.of();
}

// 3. Find optimal instance types with pricing
List<Ec2InstanceInfo> optimizedEc2Instances =
operProvisionedEc2PricingService.getOptimizedEc2InstanceTypes(
accountOperProvisionedEc2InstanceInfo.getInstances(),
accountId,
region
);

// 4. Execute Drools rules
List<RecommendationInfo> recommendationInfos =
droolsEngine.fireRules(
accountId,
factor.getDroolRuleFilePath(), // "rules/oper_provisioned_ec2_instance_rules.drl"
optimizedEc2Instances,
null,
overProvisionedLookBackPeriod
);

log.info("Generated {} resource recommendations for account: {}, region: {}, total recommendations: {}",
factorKey, accountId, region, recommendationInfos.size());

return recommendationInfos;
}
}

Job Scheduling

Jobs are scheduled using Quartz:

# application.yml
tuner:
  recommendation:
    sync:
      schedule:
        overProvisionedEc2: "0 0 2 * * ?"   # Daily at 2 AM
        snapshot: "0 0 3 * * SUN"           # Weekly Sunday 3 AM
        idleNatGateway: "0 0 4 * * ?"       # Daily at 4 AM

Scheduled triggers invoke the job processors:

@Component
@Slf4j
public class RecommendationJobScheduler {

    @Scheduled(cron = "${tuner.recommendation.sync.schedule.overProvisionedEc2}")
    public void scheduleOverProvisionedEc2Recommendations() {
        // For each account and region
        for (Account account : accountService.getAllAccounts()) {
            for (String region : account.getRegions()) {
                Map<String, Object> jobData = Map.of(
                    "accountId", account.getAccountId(),
                    "region", region,
                    "factorKey", "OVER_PROVISIONED_EC2"
                );

                jobProcessor.process(jobData);
            }
        }
    }
}

Rule Definitions & Business Logic

Rule Categories

Tuner uses 40+ rule files organized by recommendation type:

1. Cleaner Rules

Purpose: Identify unused resources to delete

| Rule File | Target Service | Logic |
| --- | --- | --- |
| snapshot-rules.drl | EBS Snapshots | Age > 90 days AND no attached volumes |
| natgateway-rules.drl | NAT Gateway | Bytes transferred = 0 for 30 days |
| idle-vpc-endpoint-recommendation-rules.drl | VPC Endpoint | No active connections for 30 days |
| s3-incomplete-multipart-rules.drl | S3 Multipart | Upload initiated > 7 days ago, not completed |
| idle-dynamodb-table-rules.drl | DynamoDB | Read/Write capacity units = 0 for 30 days |
| unused-ebs-snapshot-rules.drl | EBS Snapshots | No AMI reference, volume deleted |
| unused-ami-recommendation-rules.drl | AMI | No running instances, age > 180 days |
| idle-emr-cluster-rules.drl | EMR | Cluster in WAITING state > 24 hours |

Example: Snapshot Cleanup Rule

rule "Generate EBS Snapshot Cleanup Recommendations"
when
$snapshot: EbsSnapshotInfo(
age > 90,
volumeId == null || volumeStatus == "deleted",
used_in_ami == false
)
then
double monthlyStorage = $snapshot.getSize() * 0.05; // $0.05 per GB-month

String description = String.format(
"EBS snapshot is %d days old and the volume has been deleted. " +
"Snapshot is not used in any AMI.",
$snapshot.getAge()
);

String action = "Delete EBS snapshot " + $snapshot.getSnapshotId();

RecommendationInfo recommendation = new RecommendationInfo(
$snapshot.getAccountId(),
$snapshot.getSnapshotId(),
$snapshot.getDescription(),
$snapshot.getRegion(),
description,
action,
"Snapshot cleanup opportunity",
monthlyStorage,
0.0, // Cost after deletion
"GENERATED",
buildMetadata($snapshot)
);

recommendationList.add(recommendation);
end

2. OverProvisioned Rules

Purpose: Identify resources with excess capacity

| Rule File | Target Service | Logic |
| --- | --- | --- |
| oper_provisioned_ec2_instance_rules.drl | EC2 | CPU < 30% AND memory < 40% for 30 days |
| over-provisioned-redshift-cluster-recommendation-rules.drl | Redshift | CPU < 30% for 30 days |
| over-provisioned-ecs-fargate-recommendation-rules.drl | ECS Fargate | CPU < 30% AND memory < 40% for 30 days |

Thresholds (Configurable):

tuner:
  recommendation:
    config:
      ec2Instance:
        overProvisionedLookBackPeriod: 30   # Days to analyze
        overProvisionedThreshold: 30        # CPU % threshold
      rdsInstance:
        overProvisionedLookBackPeriod: 30
        overProvisionedCpuThreshold: 30
        overProvisionedMemoryThreshold: 40
      redshift:
        overProvisionedLookBackPeriod: 30
        overProvisionedCpuThreshold: 30

3. Modernization Rules

Purpose: Identify upgrade opportunities to avoid surcharges or use better alternatives

| Rule File | Target Service | Logic |
| --- | --- | --- |
| rds-extended-support-rules.drl | RDS | MySQL/PostgreSQL version end-of-support within 6 months |
| eks-rules.drl | EKS | Kubernetes version end-of-support within 3 months |
| modernise-elasticache-rules.drl | ElastiCache | Redis version < 7.0, recommend Valkey migration |
| modernise_opensearch-rules.drl | OpenSearch | Using gp2 volumes, recommend gp3 upgrade |
| elasticache-rules.drl | ElastiCache | Redis end-of-support, recommend Valkey |
| ms-sql-server-licence-cost-recommendation-rules.drl | RDS SQL Server | License Included, recommend BYOL |

Example: RDS Extended Support Rule

rule "Generate RDS Extended Support Recommendations"
when
$rds: RdsInstanceInfo(
engine == "mysql" || engine == "postgres",
engineVersion in ("5.7", "10.x", "11.x"), // End-of-support versions
daysUntilExtendedSupport < 180 // Within 6 months
)
then
// Extended support adds 50% surcharge
double currentCost = $rds.getMonthlyCost();
double extendedSupportCost = currentCost * 1.5;
double recommendedCost = currentCost; // Upgrade to supported version

String description = String.format(
"RDS %s version %s is approaching end-of-standard-support. " +
"Extended support will add 50%% surcharge ($%.2f/month additional). " +
"Upgrade to version %s to avoid surcharge.",
$rds.getEngine(),
$rds.getEngineVersion(),
currentCost * 0.5,
$rds.getRecommendedVersion()
);

String action = String.format(
"Upgrade RDS instance from %s %s to %s",
$rds.getEngine(),
$rds.getEngineVersion(),
$rds.getRecommendedVersion()
);

double monthlySavings = extendedSupportCost - recommendedCost;

RecommendationInfo recommendation = new RecommendationInfo(
$rds.getAccountId(),
$rds.getDbInstanceIdentifier(),
$rds.getDbName(),
$rds.getRegion(),
description,
action,
"Modernization opportunity",
extendedSupportCost, // Future cost if not upgraded
recommendedCost, // Cost if upgraded
"GENERATED",
buildMetadata($rds)
);

recommendationList.add(recommendation);
end

4. Idle Resource Rules (GCP Support)

Purpose: Multi-cloud optimization for Google Cloud Platform

| Rule File | Target Service | Logic |
| --- | --- | --- |
| idle-cloud-vm-recommendation-rules.drl | GCP Compute | CPU < 10% for 30 days |
| idle-cloud-sql-recommendation-rules.drl | Cloud SQL | Connections = 0 for 30 days |
| idle-static-ip-recommendation-rules.drl | Static IP | Not attached to instance for 30 days |
| idle-persistent-disk-recommendation-rules.drl | Persistent Disk | Not attached, age > 30 days |
| idle-load-balancer-gcp-recommendation-rules.drl | Load Balancer | Request count = 0 for 30 days |
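The GCP rules follow the same structure as the AWS rules. A minimal sketch of the idle-VM case is shown below; GcpComputeInstanceInfo and its fields are hypothetical stand-ins for the real GCP DTO:

rule "Generate Idle GCP Compute Instance Recommendations"
when
    // Hypothetical fact type and field names, mirroring the AWS Ec2InstanceInfo pattern
    $vm: GcpComputeInstanceInfo(
        cpuUtilization < 10   // average CPU below 10% over the lookback window
    )
then
    log.info("Idle GCP VM detected: {}", $vm.getResourceId());
    // ...build a RecommendationInfo (stop or delete the VM) as in the AWS rules above
end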

Recommendation Types Catalog

Complete List (42+ Types)

AWS Compute (8 types)

  1. OverProvisioned EC2 - Downsize underutilized instances
  2. Idle EC2 - Stop or terminate instances with no traffic
  3. EC2 Reserved Instances - Purchase commitment discounts
  4. EC2 Savings Plans - Flexible commitment discounts
  5. OverProvisioned ECS Fargate - Rightsize Fargate tasks
  6. Lambda Error Rate - Fix functions with high error rates
  7. Lambda Timeout - Optimize timeout configurations
  8. Idle EMR Cluster - Terminate idle EMR clusters

AWS Storage (7 types)

  1. EBS Snapshot Cleanup - Delete orphaned snapshots
  2. EBS Volume Upgrade - Migrate gp2 → gp3
  3. S3 Lifecycle Policy - Implement intelligent tiering
  4. S3 Incomplete Multipart - Clean up failed uploads
  5. AMI Cleanup - Delete unused AMIs
  6. ECR Lifecycle Policy - Delete old container images
  7. Unused EBS Snapshot - Delete snapshots with no volume

AWS Database (7 types)

  1. OverProvisioned RDS - Downsize underutilized databases
  2. Idle RDS - Stop unused databases
  3. RDS Extended Support - Upgrade to avoid surcharges
  4. RDS Reserved Instances - Purchase commitment discounts
  5. OverProvisioned Redshift - Rightsize Redshift clusters
  6. Idle Redshift - Pause idle clusters
  7. DynamoDB Idle Table - Delete or archive unused tables

AWS Networking (4 types)

  1. Idle NAT Gateway - Delete NAT gateways with no traffic
  2. Idle VPC Endpoint - Delete unused VPC endpoints
  3. Idle Load Balancer - Delete load balancers with no targets
  4. Idle Network Firewall - Delete unused firewalls

AWS Modernization (6 types)

  1. ElastiCache Redis → Valkey - Migrate to open-source alternative
  2. Modernize OpenSearch - Upgrade to gp3 volumes
  3. EKS Extended Support - Upgrade Kubernetes version
  4. MS SQL Server License - Migrate to BYOL
  5. Database Migration Service - Optimize DMS instances
  6. Route53 - Optimize hosted zone costs

AWS Other (3 types)

  1. EC2 Attach Volume - Attach unattached EBS volumes
  2. Compute Savings Plans - Flexible compute discounts
  3. Spot Instance Recommendations - Workload suitability analysis

GCP Compute (3 types)

  1. Idle Compute Instance (GCP) - Stop idle VMs
  2. Idle Machine Image (GCP) - Delete unused images
  3. Idle NAT Gateway (GCP) - Delete idle Cloud NAT

GCP Storage (2 types)

  1. Idle Persistent Disk (GCP) - Delete unattached disks
  2. Idle PD Snapshots (GCP) - Delete old snapshots

GCP Database (2 types)

  1. Idle Cloud SQL (GCP) - Stop idle databases
  2. Idle Memorystore (GCP) - Delete idle Redis instances

GCP Networking (2 types)

  1. Idle Load Balancer (GCP) - Delete unused load balancers
  2. Idle Static IP (GCP) - Release unused static IPs

Data Collection & Analysis

CloudWatch Metrics Collection

Metrics Gathered:

// EC2 Metrics (30-day period, 1-hour granularity)
CloudWatch.getMetricStatistics(
    namespace:  "AWS/EC2",
    metricName: "CPUUtilization",
    dimensions: [{"Name": "InstanceId", "Value": "i-xxxxx"}],
    startTime:  now() - 30.days,
    endTime:    now(),
    period:     3600,   // 1 hour
    statistics: ["Average", "Maximum"]
)

// Additional EC2 Metrics
- NetworkIn (bytes)
- NetworkOut (bytes)
- DiskReadBytes
- DiskWriteBytes

// RDS Metrics
- CPUUtilization
- DatabaseConnections
- ReadIOPS
- WriteIOPS
- FreeableMemory

// NAT Gateway Metrics
- BytesInFromDestination
- BytesInFromSource
- BytesOutToDestination
- BytesOutToSource
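The pseudo-call above maps onto the CloudWatch GetMetricStatistics API. A hedged sketch using the AWS SDK for Java v1 (assumed here because it matches the Datapoint.getAverage() getter style used below; client setup and error handling omitted):

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;
import java.util.List;

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Datapoint;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;

public class Ec2MetricsFetcher {

    // Fetch 30 days of hourly CPU statistics for one instance (window and period as configured above).
    public List<Datapoint> fetchCpuUtilization(String instanceId) {
        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();

        GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
            .withNamespace("AWS/EC2")
            .withMetricName("CPUUtilization")
            .withDimensions(new Dimension().withName("InstanceId").withValue(instanceId))
            .withStartTime(Date.from(Instant.now().minus(30, ChronoUnit.DAYS)))
            .withEndTime(Date.from(Instant.now()))
            .withPeriod(3600)                          // 1-hour granularity
            .withStatistics("Average", "Maximum");

        return cloudWatch.getMetricStatistics(request).getDatapoints();
    }
}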

Utilization Calculation:

public int calculateCpuUtilization(List<Datapoint> datapoints) {
    if (datapoints.isEmpty()) return 0;

    // Calculate average of all hourly averages over 30 days
    double sum = datapoints.stream()
        .mapToDouble(Datapoint::getAverage)
        .sum();

    int avgUtilization = (int) Math.ceil(sum / datapoints.size());

    log.debug("CPU Utilization: {} datapoints, average: {}%",
        datapoints.size(), avgUtilization);

    return avgUtilization;
}
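Worked example with illustrative numbers: a 30-day lookback at 1-hour granularity yields up to 720 datapoints; if the hourly averages sum to 8,930, the method returns ceil(8930 / 720) = ceil(12.4) = 13% average utilization, which is then compared against the 30% overProvisionedThreshold.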

AWS Pricing Data Collection

Pricing API Integration:

public class OverProvisionedEc2PricingService {

    public List<Ec2InstanceInfo> getOptimizedEc2InstanceTypes(
            List<Ec2InstanceInfo> instances,
            String accountId,
            String region
    ) {
        List<Ec2InstanceInfo> optimizedList = new ArrayList<>();

        for (Ec2InstanceInfo instance : instances) {
            // Get pricing for current instance type
            double currentPrice = getPricing(
                instance.getInstanceType(),
                region,
                instance.getPlatformDetails(),
                instance.getTenancy()
            );

            // Find optimal alternative based on utilization
            String optimalType = findOptimalInstanceType(
                instance.getCpuUtilization(),
                instance.getMemoryUtilization(),
                instance.getInstanceType(),
                instance.getArchitecture()
            );

            // Get pricing for recommended type
            double recommendedPrice = getPricing(
                optimalType,
                region,
                instance.getPlatformDetails(),
                instance.getTenancy()
            );

            // Enrich instance info
            instance.setRecommendedInstanceType(optimalType);
            instance.setOdCostPerHour(BigDecimal.valueOf(currentPrice));
            instance.setRecommendedInstanceCostPerHour(recommendedPrice);

            optimizedList.add(instance);
        }

        return optimizedList;
    }

    private String findOptimalInstanceType(
            int cpuUtil,
            int memoryUtil,
            String currentType,
            String architecture
    ) {
        // Logic to find right-sized instance
        // Example: m5.2xlarge (8 vCPU, 32 GB) with 12% CPU → m5.large (2 vCPU, 8 GB)

        InstanceSpec currentSpec = parseInstanceType(currentType);

        // Target: 70-80% utilization on recommended instance
        int targetCpu = (int) Math.ceil(currentSpec.vCpus * cpuUtil / 75.0);
        int targetMemory = (int) Math.ceil(currentSpec.memoryGb * memoryUtil / 75.0);

        // Find smallest instance that meets target
        return findSmallestInstanceMatchingSpec(
            targetCpu,
            targetMemory,
            currentSpec.family,   // Prefer same family (m5)
            architecture          // x86_64 or arm64
        );
    }
}
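Worked example with illustrative utilization figures: for an m5.2xlarge (8 vCPU, 32 GB) at 12% CPU and 15% memory utilization, targetCpu = ceil(8 × 12 / 75) = 2 vCPU and targetMemory = ceil(32 × 15 / 75) = 7 GB, so the smallest same-family match is m5.large (2 vCPU, 8 GB), mirroring the example in the comment above.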

Pricing Cache:

Pricing data is cached in Redis with a 24-hour TTL:

@Cacheable(value = "ec2-pricing", key = "#instanceType + '-' + #region")
public double getPricing(String instanceType, String region, String os, String tenancy) {
// Query AWS Pricing API or cache
}

Cost Calculation Engine

Savings Calculation

public class CostCalculator {

    public double calculateMonthlySavings(
            double currentCostPerHour,
            double recommendedCostPerHour
    ) {
        double hourlySavings = Math.max(0, currentCostPerHour - recommendedCostPerHour);
        return hourlySavings * 730;   // Average hours per month
    }

    public double calculateSchedulerSavings(
            double instanceCostPerHour,
            int hoursOffPerWeek
    ) {
        int hoursOffPerMonth = (int) (hoursOffPerWeek * 4.3);   // 4.3 weeks per month
        return instanceCostPerHour * hoursOffPerMonth;
    }

    public double calculateSnapshotSavings(
            int sizeGB,
            String region
    ) {
        double pricePerGbMonth = getEbsSnapshotPricing(region);   // $0.05/GB-month
        return sizeGB * pricePerGbMonth;
    }
}
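A quick usage sketch with illustrative (not actual AWS) hourly prices:

CostCalculator calc = new CostCalculator();

// Rightsizing: ($0.384 - $0.096) per hour * 730 hours ≈ $210.24/month
double rightsizingSavings = calc.calculateMonthlySavings(0.384, 0.096);

// Scheduling: instance off ~76 hours/week at $0.10/hour → 76 * 4.3 ≈ 326 hours/month ≈ $32.60/month
double schedulerSavings = calc.calculateSchedulerSavings(0.10, 76);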

ROI Tracking

Projected vs. Realized Savings:

public class SavingsTracker {

    public SavingsReport calculateRealizedSavings(
            String accountId,
            String month
    ) {
        // 1. Get all implemented recommendations for month
        List<Recommendation> implemented =
            recommendationRepository.findByAccountIdAndStatusAndMonth(
                accountId,
                "IMPLEMENTED",
                month
            );

        // 2. Query actual cost from Snowflake CUR data
        Map<String, Double> actualCosts = snowflakeService.getActualCosts(
            accountId,
            month,
            implemented.stream().map(Recommendation::getResourceId).toList()
        );

        // 3. Compare projected vs. actual
        double totalProjected = 0;
        double totalRealized = 0;

        for (Recommendation rec : implemented) {
            double projected = rec.getProjectedMonthlySavings();
            double actual = rec.getCurrentCost() - actualCosts.get(rec.getResourceId());

            totalProjected += projected;
            totalRealized += actual;
        }

        return new SavingsReport(
            totalProjected,
            totalRealized,
            (totalRealized / totalProjected) * 100   // % accuracy
        );
    }
}

Performance & Scalability

Performance Characteristics

Recommendation Generation:

| Metric | Value | Notes |
| --- | --- | --- |
| EC2 Analysis | ~100 instances/second | CloudWatch API bottleneck |
| Rule Evaluation | ~10,000 facts/second | Drools RETE algorithm |
| Pricing Lookup | ~500 lookups/second | Redis cache hit: 95% |
| Database Write | ~1,000 recommendations/second | MongoDB bulk insert |

Scalability Limits:

| Accounts | Instances | Daily Recommendation Time | Notes |
| --- | --- | --- | --- |
| 10 | 1,000 | 5 minutes | Single tuner-core instance |
| 100 | 10,000 | 45 minutes | 3 tuner-core instances |
| 1,000 | 100,000 | 6 hours | 10 tuner-core instances + sharded MongoDB |

Optimization Techniques

1. Caching Strategy:

// L1: Redis cache (24-hour TTL, configured on the cache manager)
@Cacheable(value = "pricing-data")
public PricingInfo getPricing(String instanceType, String region);

// L2: MongoDB resource cache (6-hour refresh; Spring's @Cacheable has no TTL attribute, so expiry is set on the cache manager)
@Cacheable(value = "ec2-resources")
public List<Ec2InstanceInfo> getEc2Resources(String accountId, String region);

// L3: JVM cache of compiled rule bases (1-hour TTL)
@Cacheable(value = "drools-rules")
public KieBase loadRules(String ruleFile);
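Because @Cacheable itself carries no TTL, the expirations listed above are configured on the cache manager. A minimal sketch for the Redis-backed caches, assuming Spring Data Redis (the Drools rule cache would typically live in an in-process cache instead):

import java.time.Duration;

import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        // Per-cache TTLs matching the comments above
        return RedisCacheManager.builder(connectionFactory)
            .withCacheConfiguration("pricing-data",
                RedisCacheConfiguration.defaultCacheConfig().entryTtl(Duration.ofHours(24)))
            .withCacheConfiguration("ec2-resources",
                RedisCacheConfiguration.defaultCacheConfig().entryTtl(Duration.ofHours(6)))
            .build();
    }
}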

2. Batch Processing:

// Process instances in batches of 100
List<List<Ec2InstanceInfo>> batches = Lists.partition(instances, 100);

for (List<Ec2InstanceInfo> batch : batches) {
    List<RecommendationInfo> batchRecommendations = droolsEngine.fireRules(
        accountId,
        ruleFile,
        batch,
        priceMap,
        lookbackPeriod
    );

    // Bulk insert to MongoDB
    recommendationRepository.insertAll(batchRecommendations);
}

3. Async Processing:

@Async
public CompletableFuture<List<RecommendationInfo>> generateRecommendationsAsync(
        String accountId,
        String region
) {
    List<RecommendationInfo> recommendations = generateRecommendation(accountId, region);
    return CompletableFuture.completedFuture(recommendations);
}

// Parallel execution for multiple regions
List<CompletableFuture<List<RecommendationInfo>>> futures = regions.stream()
    .map(region -> generateRecommendationsAsync(accountId, region))
    .toList();

CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

Extending the Engine

Adding a New Recommendation Type

Step 1: Create Drools Rule File

src/main/resources/rules/my-new-recommendation-rules.drl:

package com.ttn.ck.tuner.recommendation.rules;

import com.ttn.ck.tuner.utils.aws.MyResourceInfo;
import com.ttn.ck.tuner.utils.dtos.recommendation.RecommendationInfo;

global org.slf4j.Logger log;
global java.util.List recommendationList;

rule "Generate My New Recommendations"
when
    $resource: MyResourceInfo()
then
    // Your business logic here
    if ($resource.shouldRecommend()) {
        RecommendationInfo recommendation = new RecommendationInfo(...);
        recommendationList.add(recommendation);
    }
end

Step 2: Create Job Processor

@Service("MY_NEW_RECOMMENDATION_JOB_PROCESSOR")
@Slf4j
@RequiredArgsConstructor
public class MyNewRecommendationJob implements JobProcessor {

private final DroolsEngine droolsEngine;
private final MyResourceService myResourceService;

@Override
public void process(Map<String, Object> dataMap) {
String accountId = (String) dataMap.get("accountId");
String region = (String) dataMap.get("region");

// 1. Collect data
List<MyResourceInfo> resources = myResourceService.getResources(accountId, region);

// 2. Execute Drools rules
List<RecommendationInfo> recommendations = droolsEngine.fireRules(
accountId,
"rules/my-new-recommendation-rules.drl",
resources,
null,
30
);

// 3. Publish results
tunerEventService.sendEvent(
TunerEvent.SYNC_RECOMMENDATION_SUCCESS,
accountId,
Map.of("recommendations", recommendations)
);
}
}

Step 3: Register in Configuration

public enum TunerRecommendationFactor {
    // ... existing factors
    MY_NEW_RECOMMENDATION(
        "MY_NEW_RECOMMENDATION",
        "MY_NEW_RECOMMENDATION_JOB_PROCESSOR",
        "rules/my-new-recommendation-rules.drl"
    );

    // ...
}

Step 4: Schedule Job

# application.yml
tuner:
  recommendation:
    sync:
      schedule:
        myNewRecommendation: "0 0 3 * * ?"   # Daily at 3 AM
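A scheduled trigger can then pick up the new cron property, following the same pattern as the over-provisioned EC2 scheduler shown earlier (the method below is a sketch, not the shipped code):

@Scheduled(cron = "${tuner.recommendation.sync.schedule.myNewRecommendation}")
public void scheduleMyNewRecommendations() {
    // Fan out per account and region, as in RecommendationJobScheduler above
    for (Account account : accountService.getAllAccounts()) {
        for (String region : account.getRegions()) {
            jobProcessor.process(Map.of(
                "accountId", account.getAccountId(),
                "region", region,
                "factorKey", "MY_NEW_RECOMMENDATION"
            ));
        }
    }
}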

Testing New Rules

@SpringBootTest
class MyNewRecommendationJobTest {

    @Autowired
    private DroolsEngine droolsEngine;

    @Test
    void testMyNewRecommendationRule() {
        // 1. Prepare test data
        MyResourceInfo resource = MyResourceInfo.builder()
            .resourceId("test-resource-123")
            .accountId("123456789012")
            .region("us-east-1")
            .shouldRecommend(true)
            .build();

        // 2. Execute rules
        List<RecommendationInfo> recommendations = droolsEngine.fireRules(
            "123456789012",
            "rules/my-new-recommendation-rules.drl",
            List.of(resource),
            null,
            30
        );

        // 3. Verify results
        assertThat(recommendations).hasSize(1);
        assertThat(recommendations.get(0).getResourceId()).isEqualTo("test-resource-123");
        assertThat(recommendations.get(0).getMonthlySavings()).isGreaterThan(0);
    }
}

Next Steps