Recommendation Engine Deep Dive
Module: Tuner
Component: Recommendation Engine
Version: 1.0.0-RELEASE
Last Updated: October 26, 2025
Document Type: Technical Architecture (Architect Reference)
Table of Contents
- Introduction
- Drools Rule Engine Architecture
- Recommendation Job Framework
- Rule Definitions & Business Logic
- Recommendation Types Catalog
- Data Collection & Analysis
- Cost Calculation Engine
- Performance & Scalability
- Extending the Engine
Introduction
Purpose
The AWS Tuner Recommendation Engine is a rule-based decision system that analyzes cloud infrastructure to identify cost optimization opportunities. Unlike machine learning approaches, it uses explicit business rules defined in Drools (a Business Rules Management System) to ensure recommendations are:
- Transparent: Business logic visible and auditable
- Predictable: Same inputs always produce same outputs
- Explainable: Clear reasoning for each recommendation
- Configurable: Thresholds adjustable without code changes
- Extensible: New recommendation types added without redeployment
Architecture Philosophy
Rule-Based vs. ML-Based:
| Aspect | Rule-Based (Drools) | ML-Based |
|---|---|---|
| Transparency | ✅ Fully transparent | ❌ Black box |
| Explainability | ✅ Clear decision path | ⚠️ Difficult to explain |
| Consistency | ✅ Deterministic | ⚠️ Probabilistic |
| Maintenance | ✅ Rules updated easily | ❌ Requires retraining |
| Regulatory Compliance | ✅ Auditable | ⚠️ Challenging |
| Edge Cases | ✅ Explicit handling | ⚠️ May fail unexpectedly |
Why Drools?
Tuner uses Drools because cost optimization requires explainable, auditable decisions that can be validated by finance and engineering teams. When recommending a $10K/month change, stakeholders need to understand why the recommendation was made.
Drools Rule Engine Architecture
Component Overview
Drools Workflow
Step 1: Rule Definition (Business Rules in .drl files)
package com.ttn.ck.tuner.recommendation.rules;
import java.math.BigDecimal;
import com.ttn.ck.tuner.utils.aws.Ec2InstanceInfo;
import com.ttn.ck.tuner.utils.dtos.recommendation.RecommendationInfo;
global org.slf4j.Logger log;
global java.util.Map priceMap;
global java.lang.Integer restParams;
global java.util.List recommendationList;
rule "Generate OverProvisioned EC2 Recommendations"
when
$instance: Ec2InstanceInfo()
then
log.info("Generating recommendations for EC2 instance: " + $instance.getResourceId());
String instanceId = $instance.getResourceId();
// Validation 1: Ensure recommended instance type exists
if($instance.getRecommendedInstanceType() == null){
log.info("Skipping recommendation for instance {} as no recommendedInstanceType found.", instanceId);
return;
}
// Validation 2: Ensure savings exist
if($instance.getOdCostPerHour().doubleValue() <= $instance.getRecommendedInstanceCostPerHour()){
log.debug("Skipping recommendation for instance {} as current cost per hour {} is less than or equal to recommended cost per hour {}",
instanceId, $instance.getOdCostPerHour(), $instance.getRecommendedInstanceCostPerHour());
return;
}
// Validation 3: Ensure instance type is different
if($instance.getInstanceType().equals($instance.getRecommendedInstanceType())){
log.info("Skipping recommendation because the current instanceType: {} is already the recommended instanceType: {} for instance: {}",
$instance.getInstanceType(), $instance.getRecommendedInstanceType(), instanceId);
return;
}
// Calculate hourly savings (note: despite its name, this variable holds the
// saving per hour, not the recommended instance's cost per hour)
double recommendedCostPerHour = Math.max(
0,
$instance.getOdCostPerHour()
.subtract(BigDecimal.valueOf($instance.getRecommendedInstanceCostPerHour()))
.doubleValue()
);
// Validation 4: Minimum savings threshold (~720 hours per month)
if(recommendedCostPerHour*720 <= 0.005){
log.warn("Recommendation not generated as potential saving for EC2 instance: {} is not above the $0.005/month threshold in account: {}, region: {}",
instanceId, $instance.getAccountId(), $instance.getRegion());
return;
}
// Generate recommendation
String status = "GENERATED";
String description = String.format("Maximum CPU utilization of EC2 is %d%%\nThe instance has been overprovisioned for the past %d days",
$instance.getCpuUtilization(), restParams);
String action = String.format("Downsize EC2 instance type from %s to %s",
$instance.getInstanceType(), $instance.getRecommendedInstanceType());
String message = String.format("EC2 instance [%s] is overprovisioned. Current cost per hour: %.3f USD, Recommended cost per hour: %.3f USD. Status: %s",
instanceId, $instance.getOdCostPerHour(), $instance.getRecommendedInstanceCostPerHour(), status);
// Build metadata JSON
String metadata = String.format(
"{\"instanceType\": \"%s\", \"cpu\": %d, " +
"\"memory\": %d, \"cpuUtilization\": %d, " +
"\"tenancy\": \"%s\", \"architecture\": \"%s\", " +
"\"platformDetails\": \"%s\", \"recommendedInstanceType\": \"%s\"}",
$instance.getInstanceType(),
$instance.getCpu(),
(int) $instance.getMemory(),
$instance.getCpuUtilization(),
$instance.getTenancy(),
$instance.getArchitecture(),
$instance.getPlatformDetails(),
$instance.getRecommendedInstanceType()
);
// Create recommendation object
RecommendationInfo recommendation = new RecommendationInfo(
$instance.getAccountId(),
instanceId,
$instance.getInstanceName(),
$instance.getRegion(),
description,
action,
message,
Double.parseDouble(String.valueOf($instance.getOdCostPerHour())),
recommendedCostPerHour,
status,
metadata
);
// Add to results list
recommendationList.add(recommendation);
log.info("Recommendation added for OverProvisioned EC2 instance: {} recommended:{} and status: {}",
instanceId, $instance.getRecommendedInstanceType(), status);
end
Step 2: Rule Compilation (Automatic at runtime)
Drools compiles .drl files to Java bytecode for fast execution:
KieServices kieServices = KieServices.Factory.get();
KieContainer kieContainer = kieServices.getKieClasspathContainer();
KieBase kieBase = kieContainer.getKieBase("tuner-recommendation-rules");
Step 3: Fact Insertion (Input data to rule engine)
KieSession kieSession = kieBase.newKieSession();
// Set global variables
kieSession.setGlobal("log", logger);
kieSession.setGlobal("recommendationList", new ArrayList<RecommendationInfo>());
kieSession.setGlobal("priceMap", pricingData);
kieSession.setGlobal("restParams", lookbackPeriod);
// Insert facts (EC2 instances to analyze)
for (Ec2InstanceInfo instance : instanceList) {
kieSession.insert(instance);
}
// Fire all rules
kieSession.fireAllRules();
// Retrieve results
List<RecommendationInfo> recommendations = (List<RecommendationInfo>) kieSession.getGlobal("recommendationList");
Step 4: Rule Execution (Pattern matching and inference)
Drools uses the RETE algorithm for efficient pattern matching:
For each fact (Ec2InstanceInfo):
1. Match against rule conditions (when clause)
2. If matched, execute action (then clause)
3. Continue to next fact
Drools optimizes by:
- Indexing facts for fast lookup
- Caching intermediate results
- Parallel evaluation (where possible)
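The match-then-act loop above can be sketched in plain Java. This is a simplified stand-in for Drools' evaluation model (a predicate for the when clause, an action for the then clause), not the RETE implementation itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class MiniRuleEngine {
    // A toy rule: a "when" predicate paired with a "then" action.
    public record Rule<T>(Predicate<T> when, Consumer<T> then) {}

    // Evaluate every rule against every fact, firing actions on matches.
    // Returns the number of rule activations, like KieSession.fireAllRules().
    public static <T> int fireAllRules(List<Rule<T>> rules, List<T> facts) {
        int fired = 0;
        for (T fact : facts) {
            for (Rule<T> rule : rules) {
                if (rule.when().test(fact)) {
                    rule.then().accept(fact);
                    fired++;
                }
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        List<String> matched = new ArrayList<>();
        // Hypothetical rule: flag CPU readings under the 30% threshold.
        Rule<Integer> lowCpu = new Rule<>(cpu -> cpu < 30,
                cpu -> matched.add("underutilized@" + cpu + "%"));
        int fired = fireAllRules(List.of(lowCpu), List.of(12, 85, 25));
        System.out.println(fired + " " + matched); // two of the three facts match
    }
}
```

RETE improves on this naive double loop by indexing facts and caching partial matches, which is why rule evaluation scales to ~10,000 facts/second.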
Recommendation Job Framework
Job Processor Interface
All recommendation jobs implement JobProcessor:
public interface JobProcessor {
void process(Map<String, Object> dataMap);
}
Standard Job Implementation Pattern
@Service("OVER_PROVISIONED_EC2_RECOMMENDATION_JOB_PROCESSOR")
@Slf4j
@RequiredArgsConstructor
public class OverProvisionedEc2RecommendationJob implements JobProcessor {
// Configuration (from application.yml)
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.overProvisionedLookBackPeriod:30}")
private int overProvisionedLookBackPeriod;
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.overProvisionedThreshold:30}")
private int overProvisionedThreshold;
@Value("${tuner.recommendation.sync.recommendation.config.ec2Instance.cloudWatch.metricPeriod:3600}")
private Integer cloudWatchMetricPeriod;
// Dependencies
private final TunerEventService tunerEventService;
private final DroolsEngine droolsEngine;
private final OverProvisionedEc2Service operProvisionedEc2Service;
private final Ec2ResourceRepository ec2ResourceRepository;
private final OverProvisionedEc2PricingService operProvisionedEc2PricingService;
@Override
public void process(Map<String, Object> dataMap) {
String accountId = (String) dataMap.get(ACCOUNT_ID);
String factorKey = (String) dataMap.get(FACTOR_KEY);
try {
log.info("Processing {} recommendations with data: {}", factorKey, dataMap);
// 1. Generate recommendations
List<RecommendationInfo> recommendationInfos = generateRecommendation(dataMap);
// 2. Publish results
dataMap.put(RECOMMENDATION, recommendationInfos);
tunerEventService.sendEvent(TunerEvent.SYNC_RECOMMENDATION_SUCCESS, accountId, dataMap);
log.info("{} Recommendation job completed successfully for account: {}", factorKey, accountId);
} catch (Exception e) {
log.error("Error processing {} Recommendation job for account: {}, error: {}",
factorKey, accountId, e.getMessage(), e);
}
}
private List<RecommendationInfo> generateRecommendation(Map<String, Object> jobDataMap) {
String accountId = (String) jobDataMap.get(ACCOUNT_ID);
String region = (String) jobDataMap.get(REGION);
String factorKey = (String) jobDataMap.get(FACTOR_KEY);
// awsAccountDto and factor (used below) are likewise resolved from the job context
// 1. Fetch EC2 resources from MongoDB cache
List<Ec2InstanceInfo> ec2InstancesList =
ec2ResourceRepository.findByAccountIdInAndRegion(List.of(accountId), region).stream()
.map(Ec2ResourceDocument::getResourceInfo)
.toList();
// 2. Fetch CloudWatch metrics and analyze utilization
AccountEc2InstanceInfo accountOperProvisionedEc2InstanceInfo =
operProvisionedEc2Service.getAllOperProvisionedEc2Instances(
awsAccountDto,
region,
ec2InstancesList,
overProvisionedLookBackPeriod, // 30 days
overProvisionedThreshold, // 30% CPU
cloudWatchMetricPeriod // 1 hour granularity
);
if (accountOperProvisionedEc2InstanceInfo.getInstances().isEmpty()) {
log.info("No {} resource for account: {}, region: {}", factorKey, accountId, region);
return List.of();
}
// 3. Find optimal instance types with pricing
List<Ec2InstanceInfo> optimizedEc2Instances =
operProvisionedEc2PricingService.getOptimizedEc2InstanceTypes(
accountOperProvisionedEc2InstanceInfo.getInstances(),
accountId,
region
);
// 4. Execute Drools rules
List<RecommendationInfo> recommendationInfos =
droolsEngine.fireRules(
accountId,
factor.getDroolRuleFilePath(), // "rules/oper_provisioned_ec2_instance_rules.drl"
optimizedEc2Instances,
null,
overProvisionedLookBackPeriod
);
log.info("Generated {} resource recommendations for account: {}, region: {}, total recommendations: {}",
factorKey, accountId, region, recommendationInfos.size());
return recommendationInfos;
}
}
Job Scheduling
Jobs are scheduled using Quartz:
// application.yml
tuner:
recommendation:
sync:
schedule:
overProvisionedEc2: "0 0 2 * * ?" # Daily at 2 AM
snapshot: "0 0 3 * * SUN" # Weekly Sunday 3 AM
idleNatGateway: "0 0 4 * * ?" # Daily at 4 AM
Quartz triggers invoke job processors:
@Component
@Slf4j
@RequiredArgsConstructor
public class RecommendationJobScheduler {
private final AccountService accountService;
private final JobProcessor jobProcessor;
@Scheduled(cron = "${tuner.recommendation.sync.schedule.overProvisionedEc2}")
public void scheduleOverProvisionedEc2Recommendations() {
// For each account and region
for (Account account : accountService.getAllAccounts()) {
for (String region : account.getRegions()) {
Map<String, Object> jobData = Map.of(
"accountId", account.getAccountId(),
"region", region,
"factorKey", "OVER_PROVISIONED_EC2"
);
jobProcessor.process(jobData);
}
}
}
}
Rule Definitions & Business Logic
Rule Categories
Tuner uses 40+ rule files organized by recommendation type:
1. Cleaner Rules
Purpose: Identify unused resources to delete
| Rule File | Target Service | Logic |
|---|---|---|
| snapshot-rules.drl | EBS Snapshots | Age > 90 days AND no attached volumes |
| natgateway-rules.drl | NAT Gateway | Bytes transferred = 0 for 30 days |
| idle-vpc-endpoint-recommendation-rules.drl | VPC Endpoint | No active connections for 30 days |
| s3-incomplete-multipart-rules.drl | S3 Multipart | Upload initiated > 7 days ago, not completed |
| idle-dynamodb-table-rules.drl | DynamoDB | Read/Write capacity units = 0 for 30 days |
| unused-ebs-snapshot-rules.drl | EBS Snapshots | No AMI reference, volume deleted |
| unused-ami-recommendation-rules.drl | AMI | No running instances, age > 180 days |
| idle-emr-cluster-rules.drl | EMR | Cluster in WAITING state > 24 hours |
Example: Snapshot Cleanup Rule
rule "Generate EBS Snapshot Cleanup Recommendations"
when
$snapshot: EbsSnapshotInfo(
age > 90,
volumeId == null || volumeStatus == "deleted",
used_in_ami == false
)
then
double monthlyStorage = $snapshot.getSize() * 0.05; // $0.05 per GB-month
String description = String.format(
"EBS snapshot is %d days old and the volume has been deleted. " +
"Snapshot is not used in any AMI.",
$snapshot.getAge()
);
String action = "Delete EBS snapshot " + $snapshot.getSnapshotId();
RecommendationInfo recommendation = new RecommendationInfo(
$snapshot.getAccountId(),
$snapshot.getSnapshotId(),
$snapshot.getDescription(),
$snapshot.getRegion(),
description,
action,
"Snapshot cleanup opportunity",
monthlyStorage,
0.0, // Cost after deletion
"GENERATED",
buildMetadata($snapshot)
);
recommendationList.add(recommendation);
end
2. OverProvisioned Rules
Purpose: Identify resources with excess capacity
| Rule File | Target Service | Logic |
|---|---|---|
| oper_provisioned_ec2_instance_rules.drl | EC2 | CPU < 30% AND memory < 40% for 30 days |
| over-provisioned-redshift-cluster-recommendation-rules.drl | Redshift | CPU < 30% for 30 days |
| over-provisioned-ecs-fargate-recommendation-rules.drl | ECS Fargate | CPU < 30% AND memory < 40% for 30 days |
Thresholds (Configurable):
tuner:
recommendation:
config:
ec2Instance:
overProvisionedLookBackPeriod: 30 # Days to analyze
overProvisionedThreshold: 30 # CPU % threshold
rdsInstance:
overProvisionedLookBackPeriod: 30
overProvisionedCpuThreshold: 30
overProvisionedMemoryThreshold: 40
redshift:
overProvisionedLookBackPeriod: 30
overProvisionedCpuThreshold: 30
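A minimal sketch of how the configured CPU threshold might be applied to the collected metrics. The method name and inputs are illustrative, not the actual service API:

```java
import java.util.List;

public class OverProvisionedCheck {
    // Returns true when average CPU over the lookback window stays below the
    // threshold. Assumes one datapoint per hour, matching the 3600s metricPeriod.
    public static boolean isOverProvisioned(List<Double> hourlyCpuAverages,
                                            int cpuThresholdPercent) {
        if (hourlyCpuAverages.isEmpty()) {
            return false; // no data: do not recommend
        }
        double avg = hourlyCpuAverages.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0);
        return avg < cpuThresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(isOverProvisioned(List.of(12.0, 18.0, 25.0), 30)); // true
        System.out.println(isOverProvisioned(List.of(60.0, 70.0), 30));       // false
    }
}
```

Because the thresholds are plain configuration values, tightening the CPU threshold from 30% to 20% changes which instances qualify without touching any rule file.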
3. Modernization Rules
Purpose: Identify upgrade opportunities to avoid surcharges or use better alternatives
| Rule File | Target Service | Logic |
|---|---|---|
| rds-extended-support-rules.drl | RDS | MySQL/PostgreSQL version end-of-support within 6 months |
| eks-rules.drl | EKS | Kubernetes version end-of-support within 3 months |
| modernise-elasticache-rules.drl | ElastiCache | Redis version < 7.0, recommend Valkey migration |
| modernise_opensearch-rules.drl | OpenSearch | Using gp2 volumes, recommend gp3 upgrade |
| elasticache-rules.drl | ElastiCache | Redis end-of-support, recommend Valkey |
| ms-sql-server-licence-cost-recommendation-rules.drl | RDS SQL Server | License Included, recommend BYOL |
Example: RDS Extended Support Rule
rule "Generate RDS Extended Support Recommendations"
when
$rds: RdsInstanceInfo(
engine == "mysql" || engine == "postgres",
engineVersion in ("5.7", "10.x", "11.x"), // End-of-support versions
daysUntilExtendedSupport < 180 // Within 6 months
)
then
// Extended support adds 50% surcharge
double currentCost = $rds.getMonthlyCost();
double extendedSupportCost = currentCost * 1.5;
double recommendedCost = currentCost; // Upgrade to supported version
String description = String.format(
"RDS %s version %s is approaching end-of-standard-support. " +
"Extended support will add 50%% surcharge ($%.2f/month additional). " +
"Upgrade to version %s to avoid surcharge.",
$rds.getEngine(),
$rds.getEngineVersion(),
currentCost * 0.5,
$rds.getRecommendedVersion()
);
String action = String.format(
"Upgrade RDS instance from %s %s to %s",
$rds.getEngine(),
$rds.getEngineVersion(),
$rds.getRecommendedVersion()
);
double monthlySavings = extendedSupportCost - recommendedCost;
RecommendationInfo recommendation = new RecommendationInfo(
$rds.getAccountId(),
$rds.getDbInstanceIdentifier(),
$rds.getDbName(),
$rds.getRegion(),
description,
action,
"Modernization opportunity",
extendedSupportCost, // Future cost if not upgraded
recommendedCost, // Cost if upgraded
"GENERATED",
buildMetadata($rds)
);
recommendationList.add(recommendation);
end
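The 50% surcharge arithmetic in the rule above, worked as a standalone sketch (the dollar figures are illustrative, not quoted AWS prices):

```java
public class ExtendedSupportMath {
    // Monthly savings from upgrading before extended support begins:
    // future cost (current * 1.5) minus the cost after upgrading (unchanged).
    public static double monthlySavings(double currentMonthlyCost) {
        double extendedSupportCost = currentMonthlyCost * 1.5;
        return extendedSupportCost - currentMonthlyCost; // equals the 50% surcharge
    }

    public static void main(String[] args) {
        // e.g. a $500/month MySQL 5.7 instance faces a $250/month surcharge
        System.out.println(monthlySavings(500.0));
    }
}
```

Note the rule reports the avoided future cost as the saving: the instance's own cost does not drop after the upgrade, only the surcharge is avoided.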
4. Idle Resource Rules (GCP Support)
Purpose: Multi-cloud optimization for Google Cloud Platform
| Rule File | Target Service | Logic |
|---|---|---|
| idle-cloud-vm-recommendation-rules.drl | GCP Compute | CPU < 10% for 30 days |
| idle-cloud-sql-recommendation-rules.drl | Cloud SQL | Connections = 0 for 30 days |
| idle-static-ip-recommendation-rules.drl | Static IP | Not attached to instance for 30 days |
| idle-persistent-disk-recommendation-rules.drl | Persistent Disk | Not attached, age > 30 days |
| idle-load-balancer-gcp-recommendation-rules.drl | Load Balancer | Request count = 0 for 30 days |
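Most idle checks in this table share one shape: a usage metric that stays at zero for the whole lookback window. A hedged sketch of that predicate (names are illustrative, not the project's actual API):

```java
import java.util.List;

public class IdleCheck {
    // A resource is "idle" when every datapoint in the lookback window is zero,
    // e.g. Cloud SQL connection counts or load-balancer request counts.
    public static boolean isIdle(List<Long> datapoints) {
        return !datapoints.isEmpty() && datapoints.stream().allMatch(v -> v == 0L);
    }

    public static void main(String[] args) {
        System.out.println(isIdle(List.of(0L, 0L, 0L))); // true: no activity at all
        System.out.println(isIdle(List.of(0L, 3L, 0L))); // false: some traffic observed
    }
}
```

Treating an empty metric series as "not idle" is a deliberately conservative choice here: missing data should not trigger a deletion recommendation.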
Recommendation Types Catalog
Complete List (44 Types)
AWS Compute (8 types)
- OverProvisioned EC2 - Downsize underutilized instances
- Idle EC2 - Stop or terminate instances with no traffic
- EC2 Reserved Instances - Purchase commitment discounts
- EC2 Savings Plans - Flexible commitment discounts
- OverProvisioned ECS Fargate - Rightsize Fargate tasks
- Lambda Error Rate - Fix functions with high error rates
- Lambda Timeout - Optimize timeout configurations
- Idle EMR Cluster - Terminate idle EMR clusters
AWS Storage (7 types)
- EBS Snapshot Cleanup - Delete orphaned snapshots
- EBS Volume Upgrade - Migrate gp2 → gp3
- S3 Lifecycle Policy - Implement intelligent tiering
- S3 Incomplete Multipart - Clean up failed uploads
- AMI Cleanup - Delete unused AMIs
- ECR Lifecycle Policy - Delete old container images
- Unused EBS Snapshot - Delete snapshots with no volume
AWS Database (7 types)
- OverProvisioned RDS - Downsize underutilized databases
- Idle RDS - Stop unused databases
- RDS Extended Support - Upgrade to avoid surcharges
- RDS Reserved Instances - Purchase commitment discounts
- OverProvisioned Redshift - Rightsize Redshift clusters
- Idle Redshift - Pause idle clusters
- DynamoDB Idle Table - Delete or archive unused tables
AWS Networking (4 types)
- Idle NAT Gateway - Delete NAT gateways with no traffic
- Idle VPC Endpoint - Delete unused VPC endpoints
- Idle Load Balancer - Delete load balancers with no targets
- Idle Network Firewall - Delete unused firewalls
AWS Modernization (6 types)
- ElastiCache Redis → Valkey - Migrate to open-source alternative
- Modernize OpenSearch - Upgrade to gp3 volumes
- EKS Extended Support - Upgrade Kubernetes version
- MS SQL Server License - Migrate to BYOL
- Database Migration Service - Optimize DMS instances
- Route53 - Optimize hosted zone costs
AWS Other (3 types)
- EC2 Attach Volume - Attach unattached EBS volumes
- Compute Savings Plans - Flexible compute discounts
- Spot Instance Recommendations - Workload suitability analysis
GCP Compute (3 types)
- Idle Compute Instance (GCP) - Stop idle VMs
- Idle Machine Image (GCP) - Delete unused images
- Idle NAT Gateway (GCP) - Delete idle Cloud NAT
GCP Storage (2 types)
- Idle Persistent Disk (GCP) - Delete unattached disks
- Idle PD Snapshots (GCP) - Delete old snapshots
GCP Database (2 types)
- Idle Cloud SQL (GCP) - Stop idle databases
- Idle Memorystore (GCP) - Delete idle Redis instances
GCP Networking (2 types)
- Idle Load Balancer (GCP) - Delete unused load balancers
- Idle Static IP (GCP) - Release unused static IPs
Data Collection & Analysis
CloudWatch Metrics Collection
Metrics Gathered:
// EC2 Metrics (30-day period, 1-hour granularity)
CloudWatch.getMetricStatistics(
namespace: "AWS/EC2",
metricName: "CPUUtilization",
dimensions: [{"Name": "InstanceId", "Value": "i-xxxxx"}],
startTime: now() - 30.days,
endTime: now(),
period: 3600, // 1 hour
statistics: ["Average", "Maximum"]
)
// Additional EC2 Metrics
- NetworkIn (bytes)
- NetworkOut (bytes)
- DiskReadBytes
- DiskWriteBytes
// RDS Metrics
- CPUUtilization
- DatabaseConnections
- ReadIOPS
- WriteIOPS
- FreeableMemory
// NAT Gateway Metrics
- BytesInFromDestination
- BytesInFromSource
- BytesOutToDestination
- BytesOutToSource
Utilization Calculation:
public int calculateCpuUtilization(List<Datapoint> datapoints) {
if (datapoints.isEmpty()) return 0;
// Calculate average of all hourly averages over 30 days
double sum = datapoints.stream()
.mapToDouble(Datapoint::getAverage)
.sum();
int avgUtilization = (int) Math.ceil(sum / datapoints.size());
log.debug("CPU Utilization: {} datapoints, average: {}%",
datapoints.size(), avgUtilization);
return avgUtilization;
}
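For instance, three hourly averages of 10%, 20%, and 31% yield ceil(61 / 3) = ceil(20.33) = 21%. The same calculation with only the standard library (substituting plain doubles for the CloudWatch Datapoint type):

```java
import java.util.List;

public class CpuUtilization {
    // Mirrors calculateCpuUtilization above: average of hourly averages, rounded up.
    public static int average(List<Double> hourlyAverages) {
        if (hourlyAverages.isEmpty()) return 0;
        double sum = hourlyAverages.stream()
                .mapToDouble(Double::doubleValue)
                .sum();
        return (int) Math.ceil(sum / hourlyAverages.size());
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(10.0, 20.0, 31.0))); // 21
    }
}
```

Rounding up is the safer direction for a downsizing decision: it slightly overstates utilization, so borderline instances are less likely to be flagged.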
AWS Pricing Data Collection
Pricing API Integration:
public class OverProvisionedEc2PricingService {
public List<Ec2InstanceInfo> getOptimizedEc2InstanceTypes(
List<Ec2InstanceInfo> instances,
String accountId,
String region
) {
List<Ec2InstanceInfo> optimizedList = new ArrayList<>();
for (Ec2InstanceInfo instance : instances) {
// Get pricing for current instance type
double currentPrice = getPricing(
instance.getInstanceType(),
region,
instance.getPlatformDetails(),
instance.getTenancy()
);
// Find optimal alternative based on utilization
String optimalType = findOptimalInstanceType(
instance.getCpuUtilization(),
instance.getMemoryUtilization(),
instance.getInstanceType(),
instance.getArchitecture()
);
// Get pricing for recommended type
double recommendedPrice = getPricing(
optimalType,
region,
instance.getPlatformDetails(),
instance.getTenancy()
);
// Enrich instance info
instance.setRecommendedInstanceType(optimalType);
instance.setOdCostPerHour(BigDecimal.valueOf(currentPrice));
instance.setRecommendedInstanceCostPerHour(recommendedPrice);
optimizedList.add(instance);
}
return optimizedList;
}
private String findOptimalInstanceType(
int cpuUtil,
int memoryUtil,
String currentType,
String architecture
) {
// Logic to find right-sized instance
// Example: m5.2xlarge (8 vCPU, 32 GB) with 12% CPU → m5.large (2 vCPU, 8 GB)
InstanceSpec currentSpec = parseInstanceType(currentType);
// Target: 70-80% utilization on recommended instance
int targetCpu = (int) Math.ceil(currentSpec.vCpus * cpuUtil / 75.0);
int targetMemory = (int) Math.ceil(currentSpec.memoryGb * memoryUtil / 75.0);
// Find smallest instance that meets target
return findSmallestInstanceMatchingSpec(
targetCpu,
targetMemory,
currentSpec.family, // Prefer same family (m5)
architecture // x86_64 or arm64
);
}
}
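The target-sizing formula in findOptimalInstanceType, worked standalone: an m5.2xlarge (8 vCPU) running at 12% CPU needs ceil(8 × 12 / 75) = 2 vCPUs to land near the 75% utilization target. A sketch of just that arithmetic (InstanceSpec parsing and catalog lookup omitted):

```java
public class RightSizing {
    // Smallest capacity that keeps the current workload at ~75% utilization
    // on the recommended instance. Works for vCPUs and for GB of memory alike.
    public static int targetCapacity(int currentCapacity, int utilizationPercent) {
        return (int) Math.ceil(currentCapacity * utilizationPercent / 75.0);
    }

    public static void main(String[] args) {
        System.out.println(targetCapacity(8, 12));  // 2 vCPUs -> m5.large territory
        System.out.println(targetCapacity(32, 12)); // 6 GB of the 32 GB is enough
    }
}
```

The 75% target leaves headroom for bursts; targeting 100% would recommend instances that saturate under their historical peak load.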
Pricing Cache:
Pricing data cached in Redis (24-hour TTL):
@Cacheable(value = "ec2-pricing", key = "#instanceType + '-' + #region + '-' + #os + '-' + #tenancy")
public double getPricing(String instanceType, String region, String os, String tenancy) {
// Query AWS Pricing API or cache
}
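The caching semantics can be illustrated with a small in-memory TTL cache. This is only a sketch of the expire-then-reload behavior, not the Redis-backed implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TtlCache<V> {
    private record Entry<V>(V value, long expiresAtMillis) {}

    private final Map<String, Entry<V>> store = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Return the cached value, or invoke the loader when missing or expired.
    public V get(String key, Supplier<V> loader) {
        Entry<V> e = store.get(key);
        long now = System.currentTimeMillis();
        if (e == null || e.expiresAtMillis() <= now) {
            e = new Entry<>(loader.get(), now + ttlMillis);
            store.put(key, e);
        }
        return e.value();
    }

    public static void main(String[] args) {
        TtlCache<Double> pricing = new TtlCache<>(24 * 60 * 60 * 1000L); // 24h TTL
        double first = pricing.get("m5.large-us-east-1", () -> 0.096);   // loads
        double second = pricing.get("m5.large-us-east-1", () -> 0.999);  // cache hit
        System.out.println(first + " " + second); // both return the cached 0.096
    }
}
```

The 24-hour TTL is a reasonable trade-off because AWS list prices change rarely, while the Pricing API is slow and rate-limited.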
Cost Calculation Engine
Savings Calculation
public class CostCalculator {
public double calculateMonthlySavings(
double currentCostPerHour,
double recommendedCostPerHour
) {
double hourlySavings = Math.max(0, currentCostPerHour - recommendedCostPerHour);
return hourlySavings * 730; // Average hours per month
}
public double calculateSchedulerSavings(
double instanceCostPerHour,
int hoursOffPerWeek
) {
int hoursOffPerMonth = (int) (hoursOffPerWeek * 4.3); // 4.3 weeks per month
return instanceCostPerHour * hoursOffPerMonth;
}
public double calculateSnapshotSavings(
int sizeGB,
String region
) {
double pricePerGbMonth = getEbsSnapshotPricing(region); // $0.05/GB-month
return sizeGB * pricePerGbMonth;
}
}
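Worked against calculateMonthlySavings above: downsizing from an instance at $0.384/hour to one at $0.096/hour saves (0.384 − 0.096) × 730 ≈ $210.24/month (illustrative on-demand prices). A self-contained version of the same arithmetic:

```java
public class SavingsMath {
    // Same arithmetic as CostCalculator.calculateMonthlySavings above.
    public static double monthlySavings(double currentPerHour, double recommendedPerHour) {
        return Math.max(0, currentPerHour - recommendedPerHour) * 730; // avg hours/month
    }

    public static void main(String[] args) {
        System.out.println(monthlySavings(0.384, 0.096)); // ~210.24
        System.out.println(monthlySavings(0.096, 0.384)); // 0.0 (savings never negative)
    }
}
```

Flooring at zero matters: a rule that reaches the cost calculation with a more expensive "recommendation" should report zero savings, not a negative number.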
ROI Tracking
Projected vs. Realized Savings:
public class SavingsTracker {
public SavingsReport calculateRealizedSavings(
String accountId,
String month
) {
// 1. Get all implemented recommendations for month
List<Recommendation> implemented =
recommendationRepository.findByAccountIdAndStatusAndMonth(
accountId,
"IMPLEMENTED",
month
);
// 2. Query actual cost from Snowflake CUR data
Map<String, Double> actualCosts = snowflakeService.getActualCosts(
accountId,
month,
implemented.stream().map(Recommendation::getResourceId).toList()
);
// 3. Compare projected vs. actual
double totalProjected = 0;
double totalRealized = 0;
for (Recommendation rec : implemented) {
double projected = rec.getProjectedMonthlySavings();
// Guard against resources missing from the CUR data (treat as zero savings)
double actual = rec.getCurrentCost()
- actualCosts.getOrDefault(rec.getResourceId(), rec.getCurrentCost());
totalProjected += projected;
totalRealized += actual;
}
// Avoid division by zero when no recommendations were implemented
double accuracyPercent = totalProjected == 0 ? 0 : (totalRealized / totalProjected) * 100;
return new SavingsReport(
totalProjected,
totalRealized,
accuracyPercent // % accuracy
);
}
}
Performance & Scalability
Performance Characteristics
Recommendation Generation:
| Metric | Value | Notes |
|---|---|---|
| EC2 Analysis | ~100 instances/second | CloudWatch API bottleneck |
| Rule Evaluation | ~10,000 facts/second | Drools RETE algorithm |
| Pricing Lookup | ~500 lookups/second | Redis cache hit: 95% |
| Database Write | ~1,000 recommendations/second | MongoDB bulk insert |
Scalability Limits:
| Accounts | Instances | Daily Recommendation Time | Notes |
|---|---|---|---|
| 10 | 1,000 | 5 minutes | Single tuner-core instance |
| 100 | 10,000 | 45 minutes | 3 tuner-core instances |
| 1,000 | 100,000 | 6 hours | 10 tuner-core instances + sharded MongoDB |
Optimization Techniques
1. Caching Strategy:
// L1: Redis cache (24-hour TTL)
@Cacheable(value = "pricing-data")
public PricingInfo getPricing(String instanceType, String region);
// L2: MongoDB resource cache (6-hour TTL; Spring's @Cacheable has no ttl
// attribute, so expiry is configured on the cache manager)
@Cacheable(value = "ec2-resources")
public List<Ec2InstanceInfo> getEc2Resources(String accountId, String region);
// L3: JVM cache (1-hour TTL, likewise set on the cache manager)
@Cacheable(value = "drools-rules")
public KieBase loadRules(String ruleFile);
2. Batch Processing:
// Process instances in batches of 100
List<List<Ec2InstanceInfo>> batches = Lists.partition(instances, 100);
for (List<Ec2InstanceInfo> batch : batches) {
List<RecommendationInfo> batchRecommendations = droolsEngine.fireRules(
accountId,
ruleFile,
batch,
priceMap,
lookbackPeriod
);
// Bulk insert to MongoDB
recommendationRepository.insertAll(batchRecommendations);
}
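Lists.partition above comes from Guava; the same batching can be done with the JDK alone (a sketch of the partitioning step only):

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Split a list into consecutive batches of at most batchSize elements.
    // The final batch holds whatever remains (it may be smaller).
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5);
        System.out.println(partition(ids, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

Batches of 100 keep each Drools session small (bounded working memory) while still amortizing the MongoDB bulk-insert overhead.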
3. Async Processing:
@Async
public CompletableFuture<List<RecommendationInfo>> generateRecommendationsAsync(
String accountId,
String region
) {
List<RecommendationInfo> recommendations = generateRecommendation(accountId, region);
return CompletableFuture.completedFuture(recommendations);
}
// Parallel execution for multiple regions
List<CompletableFuture<List<RecommendationInfo>>> futures = regions.stream()
.map(region -> generateRecommendationsAsync(accountId, region))
.toList();
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
Extending the Engine
Adding a New Recommendation Type
Step 1: Create Drools Rule File
src/main/resources/rules/my-new-recommendation-rules.drl:
package com.ttn.ck.tuner.recommendation.rules;
import com.ttn.ck.tuner.utils.aws.MyResourceInfo;
import com.ttn.ck.tuner.utils.dtos.recommendation.RecommendationInfo;
global org.slf4j.Logger log;
global java.util.List recommendationList;
rule "Generate My New Recommendations"
when
$resource: MyResourceInfo()
then
// Your business logic here
if ($resource.shouldRecommend()) {
RecommendationInfo recommendation = new RecommendationInfo(...);
recommendationList.add(recommendation);
}
end
Step 2: Create Job Processor
@Service("MY_NEW_RECOMMENDATION_JOB_PROCESSOR")
@Slf4j
@RequiredArgsConstructor
public class MyNewRecommendationJob implements JobProcessor {
private final DroolsEngine droolsEngine;
private final MyResourceService myResourceService;
@Override
public void process(Map<String, Object> dataMap) {
String accountId = (String) dataMap.get("accountId");
String region = (String) dataMap.get("region");
// 1. Collect data
List<MyResourceInfo> resources = myResourceService.getResources(accountId, region);
// 2. Execute Drools rules
List<RecommendationInfo> recommendations = droolsEngine.fireRules(
accountId,
"rules/my-new-recommendation-rules.drl",
resources,
null,
30
);
// 3. Publish results
tunerEventService.sendEvent(
TunerEvent.SYNC_RECOMMENDATION_SUCCESS,
accountId,
Map.of("recommendations", recommendations)
);
}
}
Step 3: Register in Configuration
public enum TunerRecommendationFactor {
// ... existing factors
MY_NEW_RECOMMENDATION(
"MY_NEW_RECOMMENDATION",
"MY_NEW_RECOMMENDATION_JOB_PROCESSOR",
"rules/my-new-recommendation-rules.drl"
);
// ...
}
Step 4: Schedule Job
# application.yml
tuner:
recommendation:
sync:
schedule:
myNewRecommendation: "0 0 3 * * ?" # Daily at 3 AM
Testing New Rules
@SpringBootTest
class MyNewRecommendationJobTest {
@Autowired
private DroolsEngine droolsEngine;
@Test
void testMyNewRecommendationRule() {
// 1. Prepare test data
MyResourceInfo resource = MyResourceInfo.builder()
.resourceId("test-resource-123")
.accountId("123456789012")
.region("us-east-1")
.shouldRecommend(true)
.build();
// 2. Execute rules
List<RecommendationInfo> recommendations = droolsEngine.fireRules(
"123456789012",
"rules/my-new-recommendation-rules.drl",
List.of(resource),
null,
30
);
// 3. Verify results
assertThat(recommendations).hasSize(1);
assertThat(recommendations.get(0).getResourceId()).isEqualTo("test-resource-123");
assertThat(recommendations.get(0).getMonthlySavings()).isGreaterThan(0);
}
}
Next Steps
- Security & Compliance - Security architecture and compliance
- API Reference - REST API documentation
- Data Architecture - Database schemas and models