Technical Architecture

Module: Lens Platform: Stormus Version: 1.0.0-RELEASE Last Updated: October 25, 2025 Document Type: Technical Architecture (Infrastructure & Technology View)

Introduction
Technology Stack
Database Architecture
External Integrations
Security & Authentication
Performance & Scalability
Resilience & Reliability
Observability
Deployment Architecture

Introduction

This document provides a deep technical view of the AWS Lens module's infrastructure, technology choices, integrations, and non-functional aspects like performance, security, and scalability.

Purpose

Document technology stack with versions
Explain database architecture and query patterns
Detail external service integrations
Define security mechanisms
Describe performance optimization strategies
Guide infrastructure and DevOps teams

03-logical-architecture - Logical component structure
08-integration-points - Detailed integration specifications
11-deployment-guide - Deployment procedures

Technology Stack

Core Framework

Technology	Version	Purpose	License
Java	17 (LTS)	Programming language	GPL v2 with Classpath Exception
Spring Boot	2.7.4	Application framework	Apache 2.0
Spring Cloud	2021.0.8	Microservices infrastructure	Apache 2.0
Gradle	7.6	Build automation	Apache 2.0

Justification for Java 17:

Long-term support (until 2029)
Performance improvements (15-20% faster than Java 11)
Enhanced G1GC garbage collector
Text blocks, records, pattern matching

Spring Boot Advantages:

Auto-configuration reduces boilerplate
Embedded Tomcat (no external server needed)
Actuator for health checks and metrics
Extensive ecosystem

Spring Modules

Module	Purpose	Configuration
spring-boot-starter-web	REST API support	Default (Tomcat embedded)
spring-boot-starter-data-jpa	Relational DB access	Hibernate 5.6.x
spring-boot-starter-data-mongodb	NoSQL document storage	MongoDB driver 4.x
spring-boot-starter-thymeleaf	HTML templating (reports)	Version 3.0
spring-boot-starter-amqp	RabbitMQ messaging	AMQP 1.0
spring-boot-starter-actuator	Metrics & health checks	Micrometer + Prometheus
spring-retry	Automatic retry logic	Max 3 attempts, exponential backoff
spring-cache	Caching abstraction	Redis backend
spring-cloud-starter-config	Externalized config	Config server integration
spring-cloud-starter-bootstrap	Bootstrap context	Pre-loads config

Spring Boot Features Enabled:

@EnableAsync          // LensApplication.java:12 - Async task execution
@EnableRetry          // LensApplication.java:13 - Retry failed operations
@EnableCaching        // LensApplication.java:14 - Redis caching
@EnableScheduling     // LensApplication.java:15 - Scheduled jobs
@EnableJpaAuditing    // LensApplication.java:16 - Entity audit trails

Database Drivers & Clients

Technology	Version	Purpose	Configuration
Snowflake JDBC	3.13.27	Snowflake connectivity	Via snowplug module
Snowflake Common	5.1.4	Snowflake utilities	-
MongoDB Driver	4.x (via Spring)	MongoDB connectivity	Auto-configured
HikariCP	5.x (via Spring Boot)	JDBC connection pooling	Max 20 connections

Snowflake Connection Configuration:

# Via snowplug module
snowflake.url=jdbc:snowflake://<account>.snowflakecomputing.com
snowflake.warehouse=LENS_WH
snowflake.database=COST_DB
snowflake.schema=CUSTOMER_{customerId}  # Multi-tenant
snowflake.pool.maxSize=20
snowflake.pool.minSize=5
snowflake.pool.timeout=30000  # 30 seconds

Connection Pooling:

Min Connections: 5 (per customer schema)
Max Connections: 20
Idle Timeout: 10 minutes
Max Lifetime: 30 minutes
Connection Test Query: SELECT 1

AWS SDK

Library	Version	Services Used
aws-java-sdk	1.12.324	Cost Explorer, Pricing API, Organizations

AWS Services Integration:

Cost Explorer API:
- Fetch RI/Savings Plan recommendations
- Get cost forecasts
- Query cost and usage data
Pricing API:
- Fetch EC2/RDS pricing
- Get Reserved Instance pricing
Organizations API:
- List accounts in organization
- Get OU structure

SDK Configuration:

@Bean
public AmazonCostExplorer costExplorerClient() {
    return AmazonCostExplorerClientBuilder.standard()
        .withRegion(Regions.US_EAST_1)  // Cost Explorer only in us-east-1
        .withCredentials(new DefaultAWSCredentialsProviderChain())
        .build();
}

Credentials: Uses IAM role attached to EC2/ECS/Lambda (recommended) or environment variables

API Documentation

Tool	Version	Purpose	Access
SpringDoc OpenAPI	1.6.12	Auto-generate API docs	`/swagger-ui.html`

OpenAPI Configuration:

@Bean
public OpenAPI lensOpenAPI() {
    return new OpenAPI()
        .info(new Info()
            .title("Lens API")
            .version("1.0.0")
            .description("AWS Cost Management & Analytics APIs"))
        .components(new Components()
            .addSecuritySchemes("bearer-jwt",
                new SecurityScheme()
                    .type(SecurityScheme.Type.HTTP)
                    .scheme("bearer")
                    .bearerFormat("JWT")));
}

Swagger UI: http://localhost:8080/swagger-ui/index.html OpenAPI Spec: http://localhost:8080/v3/api-docs

Testing & Quality

Tool	Version	Purpose
JUnit Jupiter	5.x	Unit testing
Mockito	5.2.0	Mocking framework
Spring Boot Test	2.7.4	Integration testing
JaCoCo	0.8.7	Code coverage
SonarQube	4.4.1.3373	Code quality analysis

Code Coverage Configuration (build.gradle):

jacoco {
    toolVersion = "0.8.7"
}

jacocoTestReport {
    reports {
        html.enabled true
        xml.enabled true
    }
    afterEvaluate {
        classDirectories.from = files(classDirectories.files.collect {
            fileTree(dir: it, exclude: [
                '**/dto/**',       // DTOs (data classes)
                '**/config/**',    // Configuration classes
                '**/enums/**',     // Enums
                '**/dao/**'        // DAOs (integration tested)
            ])
        })
    }
}

Target Coverage: 80% line coverage (services and controllers)

Utilities & Supporting Libraries

Library	Purpose
Lombok	Reduce boilerplate (`@Data`, `@AllArgsConstructor`, etc.)
Jackson	JSON serialization/deserialization
Commons Lang3	String utilities, null-safe operations
Commons IO	File I/O utilities
Apache POI	Excel file generation (via core module)
Thymeleaf	HTML report generation

Database Architecture

Multi-Database Strategy

Lens uses a polyglot persistence approach with 4 databases, each optimized for specific use cases:

┌─────────────────────────────────────────────────────────────────┐
│                        DATA STORAGE                              │
├──────────────┬──────────────┬──────────────┬────────────────────┤
│              │              │              │                    │
│  Snowflake   │   MongoDB    │    MySQL     │      Redis         │
│  (Analytics) │ (Documents)  │ (Transaction)│     (Cache)        │
│              │              │              │                    │
│  • Cost data │ • Saved      │ • Users      │ • Query results    │
│  • RI data   │   reports    │ • Accounts   │ • Filter metadata  │
│  • Usage     │ • Filters    │ • Billing    │ • Dashboard data   │
│  • Trends    │ • Queries    │   metadata   │                    │
│              │              │              │                    │
│  Read-heavy  │ Document     │ Relational   │ In-memory          │
│  OLAP        │ store        │ OLTP         │ Sub-ms latency     │
│              │              │              │                    │
└──────────────┴──────────────┴──────────────┴────────────────────┘

Database 1: Snowflake (Primary Analytics Database)

Purpose: Store and query massive volumes of AWS cost and usage data

Why Snowflake?:

Scalability: Handles petabytes of data
Performance: Columnar storage, parallel query execution
Separation of Storage & Compute: Cost-effective scaling
Multi-Tenancy: Schema-per-customer isolation
Semi-Structured Data: Native JSON support

Data Volume: ~10 TB total, ~100 GB per large customer

Schema Design (Multi-Tenant):

-- Each customer gets isolated schema
CREATE SCHEMA CUSTOMER_123456;
USE SCHEMA CUSTOMER_123456;

-- Core tables
CREATE TABLE COST_DAILY (
    DATE DATE NOT NULL,
    ACCOUNT_ID VARCHAR(20),
    SERVICE VARCHAR(100),
    REGION VARCHAR(50),
    USAGE_TYPE VARCHAR(200),
    COST NUMBER(18,2),
    USAGE_QUANTITY NUMBER(18,6),
    CURRENCY VARCHAR(3) DEFAULT 'USD',
    TAGS VARIANT,  -- JSON column
    PRIMARY KEY (DATE, ACCOUNT_ID, SERVICE, REGION, USAGE_TYPE)
);

CREATE TABLE COST_HOURLY (
    TIMESTAMP TIMESTAMP_NTZ NOT NULL,
    ACCOUNT_ID VARCHAR(20),
    SERVICE VARCHAR(100),
    REGION VARCHAR(50),
    RESOURCE_ID VARCHAR(500),
    COST NUMBER(18,6),
    USAGE_QUANTITY NUMBER(18,6),
    PRIMARY KEY (TIMESTAMP, ACCOUNT_ID, RESOURCE_ID)
);

CREATE TABLE RI_UTILIZATION (
    DATE DATE NOT NULL,
    ACCOUNT_ID VARCHAR(20),
    RESERVATION_ID VARCHAR(100),
    INSTANCE_TYPE VARCHAR(50),
    RI_HOURS_PURCHASED NUMBER(18,2),
    RI_HOURS_USED NUMBER(18,2),
    UTILIZATION_PCT NUMBER(5,2),
    UNUSED_COST NUMBER(18,2),
    PRIMARY KEY (DATE, RESERVATION_ID)
);

-- Partitioning (automatic in Snowflake)
-- Data automatically clustered by DATE column

Query Patterns:

-- Typical cost summary query
SELECT
    SERVICE,
    SUM(COST) AS TOTAL_COST,
    SUM(USAGE_QUANTITY) AS TOTAL_USAGE
FROM COST_DAILY
WHERE ACCOUNT_ID = ?
  AND DATE BETWEEN ? AND ?
GROUP BY SERVICE
ORDER BY TOTAL_COST DESC;

-- Time-series query (cost trends)
SELECT
    DATE_TRUNC('day', DATE) AS DAY,
    SUM(COST) AS DAILY_COST
FROM COST_DAILY
WHERE ACCOUNT_ID = ?
  AND DATE >= DATEADD('day', -30, CURRENT_DATE)
GROUP BY DAY
ORDER BY DAY;

Performance Optimizations:

Clustering: Data auto-clustered by DATE (Snowflake's micro-partitions)
Materialized Views: Pre-aggregated monthly summaries
Result Caching: Snowflake caches identical queries for 24 hours
Warehouse Sizing: X-Small for single-customer queries, Small for cross-customer aggregations

Snowflake Warehouse Configuration:

CREATE WAREHOUSE LENS_WH
    WAREHOUSE_SIZE = 'X-SMALL'  -- 1 server, 8 credits/hr
    AUTO_SUSPEND = 60           -- Suspend after 1 min idle
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

Cost Optimization:

Auto-suspend prevents idle warehouse costs
Query result caching reduces compute
Multi-cluster warehousing disabled (not needed for Lens workload)

Database 2: MongoDB (Document Store)

Purpose: Store flexible, schema-less documents (saved reports, custom queries, filter configurations)

Why MongoDB?:

Schema Flexibility: Saved reports have varying structures
JSON Native: Natural fit for nested filter configurations
Fast Writes: Insert saved reports quickly
Document Queries: Find reports by criteria

Collections:

Collection: `saved_reports`

{
  "_id": ObjectId("..."),
  "reportId": "RPT-2024-001",
  "customerId": "CUST-123",
  "reportName": "Monthly EC2 Costs - Production",
  "reportType": "COST_SUMMARY",
  "filters": {
    "startDate": "2024-01-01",
    "endDate": "2024-01-31",
    "accounts": ["123456789012", "210987654321"],
    "services": ["EC2", "RDS"],
    "tags": {
      "Environment": "prod"
    }
  },
  "createdAt": ISODate("2024-02-01T10:30:00Z"),
  "createdBy": "user@example.com",
  "shared": false,
  "schedule": null  // null = manual, or cron expression for scheduled
}

Collection: `filter_metadata`

{
  "_id": ObjectId("..."),
  "customerId": "CUST-123",
  "filterType": "SERVICE",
  "values": ["EC2", "RDS", "S3", "Lambda", "DynamoDB"],
  "lastUpdated": ISODate("2024-02-15T08:00:00Z"),
  "ttl": ISODate("2024-02-15T09:00:00Z")  // 1-hour TTL
}

Collection: `custom_queries`

{
  "_id": ObjectId("..."),
  "customerId": "CUST-123",
  "queryName": "Top 10 Expensive Resources",
  "queryType": "SNOWFLAKE_SQL",
  "query": "SELECT RESOURCE_ID, SUM(COST) as TOTAL_COST FROM COST_DAILY WHERE ACCOUNT_ID = ? AND DATE BETWEEN ? AND ? GROUP BY RESOURCE_ID ORDER BY TOTAL_COST DESC LIMIT 10",
  "parameters": ["accountId", "startDate", "endDate"],
  "createdAt": ISODate("2024-01-15T14:20:00Z")
}

Indexes:

// saved_reports indexes
db.saved_reports.createIndex({ "customerId": 1, "reportType": 1 });
db.saved_reports.createIndex({ "createdAt": -1 });
db.saved_reports.createIndex({ "reportId": 1 }, { unique: true });

// filter_metadata indexes
db.filter_metadata.createIndex({ "customerId": 1, "filterType": 1 });
db.filter_metadata.createIndex({ "ttl": 1 }, { expireAfterSeconds: 0 });  // TTL index

MongoDB Configuration:

spring:
  data:
    mongodb:
      uri: mongodb://${MONGODB_HOST}:27017/lens
      database: lens
      authentication-database: admin
      username: ${MONGODB_USER}
      password: ${MONGODB_PASSWORD}

Database 3: MySQL (Transactional Data)

Purpose: Store relational transactional data (users, accounts, billing metadata)

Why MySQL?:

ACID Compliance: Transactions for billing operations
Referential Integrity: Foreign keys ensure data consistency
Mature Ecosystem: Well-understood, battle-tested

Schema (Simplified):

CREATE TABLE users (
    user_id VARCHAR(36) PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    customer_id VARCHAR(36) NOT NULL,
    role VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_customer (customer_id)
) ENGINE=InnoDB;

CREATE TABLE accounts (
    account_id VARCHAR(20) PRIMARY KEY,  -- AWS Account ID
    customer_id VARCHAR(36) NOT NULL,
    account_name VARCHAR(255),
    account_type ENUM('PAYER', 'LINKED'),
    status ENUM('ACTIVE', 'SUSPENDED'),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    INDEX idx_customer (customer_id)
) ENGINE=InnoDB;

CREATE TABLE billing_metadata (
    billing_id VARCHAR(36) PRIMARY KEY,
    customer_id VARCHAR(36) NOT NULL,
    billing_month DATE NOT NULL,
    total_cost DECIMAL(18,2),
    invoice_generated BOOLEAN DEFAULT FALSE,
    generated_at TIMESTAMP,
    UNIQUE KEY unique_customer_month (customer_id, billing_month),
    INDEX idx_month (billing_month)
) ENGINE=InnoDB;

MySQL Configuration:

spring:
  datasource:
    url: jdbc:mysql://${MYSQL_HOST}:3306/lens?useSSL=true&serverTimezone=UTC
    username: ${MYSQL_USER}
    password: ${MYSQL_PASSWORD}
    driver-class-name: com.mysql.cj.jdbc.Driver
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000  # 10 minutes
      max-lifetime: 1800000  # 30 minutes
  jpa:
    hibernate:
      ddl-auto: validate  # Never auto-create in production
    show-sql: false
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
        format_sql: true

Database 4: Redis (In-Memory Cache)

Purpose: High-speed caching for frequently accessed data

Why Redis?:

Sub-millisecond Latency: Faster than any disk-based DB
TTL Support: Auto-expire stale data
Data Structures: Supports strings, hashes, lists, sets
Persistence: Optional RDB/AOF for durability

Cache Usage Patterns:

1. Dashboard Query Caching

@Cacheable(value = "dashboardQueries", key = "#customerId + ':' + #dateRange", ttl = 900)  // 15 min TTL
public DashboardDTO getDashboardData(String customerId, String dateRange) {
    // Expensive Snowflake query
    return dao.queryDashboard(customerId, dateRange);
}

Cache Key: dashboardQueries::CUST-123:2024-01-01_2024-01-31 TTL: 15 minutes (dashboard data changes daily)

2. Filter Metadata Caching

@Cacheable(value = "filterMetadata", key = "#customerId + ':' + #filterType", ttl = 3600)  // 1 hour
public `List<String>` getFilterValues(String customerId, String filterType) {
    return dao.queryFilterValues(customerId, filterType);
}

Cache Key: filterMetadata::CUST-123:SERVICE TTL: 1 hour (filter values rarely change)

3. RI Data Caching

@Cacheable(value = "riUtilization", key = "#customerId", ttl = 3600)  // 1 hour
public `List<RiUtilizationDTO>` getRiUtilization(String customerId) {
    return dao.queryRiUtilization(customerId);
}

Cache Key: riUtilization::CUST-123 TTL: 1 hour (RI utilization updated hourly)

Redis Configuration:

spring:
  redis:
    host: ${REDIS_HOST}
    port: 6379
    password: ${REDIS_PASSWORD}
    timeout: 2000ms
    lettuce:
      pool:
        max-active: 8
        max-idle: 8
        min-idle: 2
  cache:
    type: redis
    redis:
      time-to-live: 900000  # Default 15 min
      cache-null-values: false
      use-key-prefix: true

Cache Statistics (via monitoring):

Hit Rate Target: >80%
Typical Hit Rate: 85-90%
Cache Size: ~500 MB (max 1 GB)
Eviction Policy: LRU (Least Recently Used)

External Integrations

Integration 1: AWS SDK (Cost Explorer, Pricing, Organizations)

Purpose: Fetch recommendations, pricing data, organization structure

Libraries:

implementation 'com.amazonaws:aws-java-sdk:1.12.324'

Services Used:

AWS Cost Explorer API

Purpose: Fetch RI/SP recommendations, cost forecasts

API Calls:

// Get RI purchase recommendations
GetReservationPurchaseRecommendationRequest request =
    new GetReservationPurchaseRecommendationRequest()
        .withService("Amazon Elastic Compute Cloud - Compute")
        .withAccountScope("PAYER")
        .withLookbackPeriodInDays("THIRTY_DAYS")
        .withTermInYears("ONE_YEAR")
        .withPaymentOption("NO_UPFRONT");

GetReservationPurchaseRecommendationResult result =
    costExplorerClient.getReservationPurchaseRecommendation(request);

`List<ReservationPurchaseRecommendation>` recommendations =
    result.getRecommendations();

Rate Limiting: 5 requests per second (AWS limit) Retry Strategy: Exponential backoff (1s, 2s, 4s)

AWS Pricing API

Purpose: Get current pricing for EC2, RDS, etc.

API Calls:

GetProductsRequest request = new GetProductsRequest()
    .withServiceCode("AmazonEC2")
    .withFilters(
        new Filter().withType("TERM_MATCH")
            .withField("instanceType")
            .withValue("m5.large"),
        new Filter().withType("TERM_MATCH")
            .withField("location")
            .withValue("US East (N. Virginia)")
    );

GetProductsResult result = pricingClient.getProducts(request);

AWS Organizations API

Purpose: List accounts, get OU structure

API Calls:

ListAccountsRequest request = new ListAccountsRequest();
ListAccountsResult result = organizationsClient.listAccounts(request);
`List<Account>` accounts = result.getAccounts();

Authentication:

// Uses AWS default credential chain
// 1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
// 2. Java system properties
// 3. Web identity token (EKS)
// 4. EC2 instance profile
// 5. ECS task role

Integration 2: RabbitMQ (Message Queue)

Purpose: Asynchronous event processing, decoupling services

Library:

implementation 'org.springframework.boot:spring-boot-starter-amqp'

Exchanges & Queues:

Exchange: `lens.events`

Type: Topic Routing Keys:

cost.update.{customerId} - Cost data updated
alert.cost.{customerId} - Cost alert triggered
alert.ri.expiry.{customerId} - RI expiring soon
report.generated.{customerId} - Report generated

Queue: `lens.cost.update`

Binds to: lens.events exchange, routing key cost.update.* Consumer: MessageQueueListener.handleCostUpdate() Processing:

Receive cost update event
Invalidate relevant caches
Trigger recalculation of dashboards
Check cost alert thresholds

Queue: `lens.alert.cost`

Binds to: lens.events exchange, routing key alert.cost.* Consumer: MessageQueueListener.handleCostAlert() Processing:

Receive cost alert event
Format alert message
Send email notification (via notifications module)
Send Slack notification (if configured)

Message Format:

{
  "eventType": "COST_UPDATE",
  "customerId": "CUST-123",
  "timestamp": "2025-10-25T18:30:00Z",
  "data": {
    "accountId": "123456789012",
    "date": "2025-10-25",
    "totalCost": 7200.50,
    "previousCost": 5000.00,
    "percentChange": 44.01
  }
}

Configuration:

spring:
  rabbitmq:
    host: ${RABBITMQ_HOST}
    port: 5672
    username: ${RABBITMQ_USER}
    password: ${RABBITMQ_PASSWORD}
    virtual-host: /lens
    listener:
      simple:
        concurrency: 5
        max-concurrency: 10
        prefetch: 10
        retry:
          enabled: true
          max-attempts: 3
          initial-interval: 1000

Integration 3: Spring Cloud Config Server

Purpose: Externalized configuration management

Library:

implementation 'org.springframework.cloud:spring-cloud-starter-config'
implementation 'org.springframework.cloud:spring-cloud-starter-bootstrap'

Configuration (bootstrap.yml):

spring:
  application:
    name: lens
  profiles:
    active: ${ACTIVE_PROFILE:prod}  # dev, uat, prod
  cloud:
    config:
      uri: ${CLOUD_PROPERTY_URL:http://cloudonomic-spring-config.uat.cloudonomic.net}
      label: ${BRANCH_LABEL:prod}  # Git branch
      name: lens
      fail-fast: true  # Fail startup if config unavailable
      retry:
        max-attempts: 6
        max-interval: 2000

Externalized Properties (fetched from config server):

# Database connections
snowflake.url=jdbc:snowflake://account.snowflakecomputing.com
snowflake.username=${SNOWFLAKE_USER}
snowflake.password=${SNOWFLAKE_PASSWORD}

mongodb.uri=mongodb://${MONGODB_HOST}:27017/lens

mysql.url=jdbc:mysql://${MYSQL_HOST}:3306/lens
mysql.username=${MYSQL_USER}
mysql.password=${MYSQL_PASSWORD}

# Redis
redis.host=${REDIS_HOST}
redis.password=${REDIS_PASSWORD}

# AWS credentials (if not using IAM roles)
aws.accessKeyId=${AWS_ACCESS_KEY_ID}
aws.secretKey=${AWS_SECRET_ACCESS_KEY}

# Feature flags
features.riRecommendations.enabled=true
features.cudosDashboards.enabled=true

# Cache TTLs (seconds)
cache.ttl.dashboard=900  # 15 minutes
cache.ttl.filters=3600   # 1 hour

Config Refresh:

@RefreshScope  // Allows config refresh without restart
@Component
public class DynamicConfig {
    @Value("${features.riRecommendations.enabled}")
    private boolean riRecommendationsEnabled;
}

Refresh Endpoint: POST /actuator/refresh (triggers config reload)

Integration 4: authX Module (JWT Authentication)

Purpose: Authenticate and authorize API requests

Integration:

// Every controller secured
@Secured(key = "LENS_AWSVSACTUALCOSTCONTROLLER")
public class AwsVsActualCostController {
    // All endpoints require valid JWT
}

JWT Flow:

Client obtains JWT from usentrix module (login)
Client includes JWT in Authorization: Bearer <token> header
authX interceptor validates JWT signature
authX checks user has permission for controller
authX extracts customer ID from JWT
Request proceeds with customer context

JWT Claims:

{
  "sub": "user@example.com",
  "customerId": "CUST-123",
  "roles": ["ADMIN", "COST_VIEWER"],
  "permissions": ["LENS_AWSVSACTUALCOSTCONTROLLER", "LENS_BILLINGCONSOLECONTROLLER"],
  "exp": 1730000000
}

Security & Authentication

1. API Security (JWT)

Mechanism: JWT (JSON Web Tokens) via authX module

Security Headers Required:

Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
auth-customer: CUST-123  # Customer context

Authorization Flow:

@Secured(key = "LENS_BILLINGCONSOLECONTROLLER")
public class BillingConsoleController {
    @GetMapping("/cost")
    public ResponseDto<BillingConsoleDTO> getCost(@Valid BillingConsoleRequestDTO request) {
        // authX validates JWT before method executes
        // authX checks user has permission "LENS_BILLINGCONSOLECONTROLLER"
        // authX injects customer context
        return new SuccessResponseDto<>(service.getCost(request));
    }
}

2. Database Security

Snowflake

Authentication: Username + Password (rotated quarterly)
Encryption: AES-256 encryption at rest
Network: Private link (no public internet access in production)
Schema Isolation: Each customer has separate schema
Row-Level Security: Views filter by customer ID

MongoDB

Authentication: SCRAM-SHA-256
Encryption: TLS 1.2+ for connections
Authorization: Database-specific users (lens_user)

MySQL

Authentication: Username + Password
Encryption: TLS 1.2+ for connections
SSL: Enforced (useSSL=true)

Redis

Authentication: Password-based (AUTH command)
Encryption: TLS enabled in production

3. Secrets Management

Storage: AWS Secrets Manager or HashiCorp Vault

Access Pattern:

@Bean
public DataSource snowflakeDataSource() {
    String password = secretsManager.getSecret("snowflake-password");
    return DataSourceBuilder.create()
        .url(snowflakeUrl)
        .username(snowflakeUser)
        .password(password)  // Never hardcoded
        .build();
}

Rotation: Automated 90-day rotation via AWS Secrets Manager

4. Input Validation

Mechanism: JSR-303 Bean Validation

Example:

public class GenericRequestDTO {
    @NotNull(message = "Customer ID required")
    @Pattern(regexp = "CUST-[0-9]+", message = "Invalid customer ID format")
    private String customerId;

    @NotNull(message = "Start date required")
    @PastOrPresent(message = "Start date cannot be future")
    private LocalDate startDate;

    @NotNull(message = "End date required")
    @Future OrPresent(message = "End date cannot be past")
    private LocalDate endDate;

    @AssertTrue(message = "Date range cannot exceed 365 days")
    public boolean isValidDateRange() {
        return ChronoUnit.DAYS.between(startDate, endDate) <= 365;
    }
}

Validation Errors:

{
  "status": "error",
  "code": 400,
  "message": "Validation failed",
  "errors": [
    {
      "field": "startDate",
      "message": "Start date cannot be future"
    }
  ]
}

5. SQL Injection Prevention

Mechanism: Parameterized queries (PreparedStatements)

Safe Pattern:

String sql = "SELECT * FROM COST_DAILY WHERE ACCOUNT_ID = ? AND DATE BETWEEN ? AND ?";
jdbcTemplate.query(sql, rowMapper, accountId, startDate, endDate);  // Parameters safely escaped

Never Do:

// UNSAFE - SQL injection risk
String sql = "SELECT * FROM COST_DAILY WHERE ACCOUNT_ID = '" + accountId + "'";

Performance & Scalability

1. Query Optimization

Snowflake Query Patterns

Optimization Techniques:

Clustering: Data auto-clustered by DATE
Partition Pruning: WHERE DATE filters scan only relevant micro-partitions
Columnar Storage: SELECT only needed columns (not SELECT *)
Result Caching: Identical queries served from cache (24-hour TTL)

Example Optimized Query:

-- Good: Filters on clustered column, selects only needed columns
SELECT SERVICE, SUM(COST) AS TOTAL_COST
FROM COST_DAILY
WHERE ACCOUNT_ID = '123456789012'
  AND DATE BETWEEN '2024-01-01' AND '2024-01-31'  -- Partition pruning
GROUP BY SERVICE;

-- Bad: Full table scan, SELECT *
SELECT *
FROM COST_DAILY
WHERE UPPER(SERVICE) = 'EC2';  -- Function on column prevents optimization

2. Caching Strategy

Multi-Level Caching:

Level 1: Snowflake Result Cache

Location: Snowflake server
TTL: 24 hours
Invalidation: Automatic if source data changes
Scope: Query result cache (exact SQL match)

Level 2: Redis Application Cache

Location: Redis server
TTL: 15 minutes (dashboard), 1 hour (filters)
Invalidation: Manual (on data update events) + TTL expiration
Scope: Application-level cache (method results)

Level 3: HTTP Response Cache

Location: CloudFront / API Gateway
TTL: 5 minutes
Invalidation: Cache-Control headers
Scope: Full HTTP responses

Cache Hit Rates:

Snowflake: 60-70% (many repeated queries)
Redis: 85-90% (dashboards accessed frequently)
HTTP: 40-50% (less predictable access patterns)

3. Connection Pooling

HikariCP Configuration (MySQL, Snowflake):

# Snowflake pool
snowflake.pool.maxSize=20        # Max connections
snowflake.pool.minSize=5         # Min idle connections
snowflake.pool.timeout=30000     # 30s wait for connection
snowflake.pool.idleTimeout=600000  # 10 min idle before close
snowflake.pool.maxLifetime=1800000  # 30 min max connection lifetime

# MySQL pool (via Spring Boot HikariCP defaults)
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000

Rationale:

Max 20 Connections: Prevents overwhelming database
Min 5 Idle: Fast response for sudden load (no connection creation delay)
30s Connection Timeout: Fail fast if DB unavailable
10 min Idle Timeout: Close unused connections (save DB resources)
30 min Max Lifetime: Rotate connections (prevent stale connections)

4. Async Processing

Use Cases:

Large report generation (>10 seconds)
Multi-account aggregations
Email sending

Implementation:

@Async("taskExecutor")
public `CompletableFuture<File>` generateLargeReport(ReportDTO request) {
    // Heavy processing in background thread
    File report = reportGenerator.generate(request);
    return CompletableFuture.completedFuture(report);
}

Thread Pool Configuration:

@Configuration
@EnableAsync
public class AsyncConfig {
    @Bean(name = "taskExecutor")
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);      // 5 threads always running
        executor.setMaxPoolSize(20);      // Max 20 threads
        executor.setQueueCapacity(100);   // Queue 100 tasks before rejecting
        executor.setThreadNamePrefix("lens-async-");
        executor.initialize();
        return executor;
    }
}

5. Pagination

Large Result Sets (>1000 rows):

public Page<CostDTO> getCosts(Pageable pageable) {
    // Spring Data Pagination
    return costRepository.findAll(pageable);
}

// Client usage
Pageable pageable = PageRequest.of(0, 100);  // Page 0, size 100
Page<CostDTO> page = service.getCosts(pageable);

SQL Pagination (Snowflake):

SELECT *
FROM COST_DAILY
WHERE ACCOUNT_ID = ?
  AND DATE BETWEEN ? AND ?
ORDER BY DATE DESC
LIMIT 100 OFFSET 0;  -- First page (0-99)

Resilience & Reliability

1. Retry Logic

Automatic Retry (via @Retryable):

@Retryable(
    value = {SnowflakeConnectionException.class, AwsServiceException.class},
    maxAttempts = 3,
    backoff = @Backoff(delay = 1000, multiplier = 2)  // 1s, 2s, 4s
)
public `List<CostDTO>` queryCostData(RequestDTO request) {
    return snowflakeDao.query(request);
}

@Recover
public `List<CostDTO>` recover(Exception ex, RequestDTO request) {
    log.error("Failed to query cost data after 3 retries", ex);
    throw new GenericException("Service temporarily unavailable. Please try again later.");
}

Retry Scenarios:

Snowflake connection timeout
AWS API throttling (429 error)
Network transient failures

2. Circuit Breaker

Pattern: Prevent cascading failures when external service is down

Implementation (using Resilience4j - if added):

@CircuitBreaker(name = "snowflake", fallbackMethod = "fallbackGetCostData")
public `List<CostDTO>` getCostData(RequestDTO request) {
    return snowflakeDao.query(request);
}

public `List<CostDTO>` fallbackGetCostData(RequestDTO request, Throwable ex) {
    log.warn("Circuit breaker open, returning cached data", ex);
    return cacheService.getCachedCostData(request);  // Return stale data
}

States:

Closed: Normal operation (all requests pass through)
Open: Too many failures (reject requests immediately, return fallback)
Half-Open: Test if service recovered (allow few requests, reopen or close circuit)

3. Health Checks

Spring Actuator Endpoints:

/actuator/health - Overall health status
/actuator/health/readiness - Ready to receive traffic?
/actuator/health/liveness - Should be restarted?

Custom Health Indicators:

@Component
public class SnowflakeHealthIndicator implements HealthIndicator {
    @Override
    public Health health() {
        try {
            jdbcTemplate.queryForObject("SELECT 1", Integer.class);
            return Health.up().withDetail("database", "Snowflake").build();
        } catch (Exception ex) {
            return Health.down(ex).withDetail("database", "Snowflake").build();
        }
    }
}

Health Check Response:

{
  "status": "UP",
  "components": {
    "snowflake": {
      "status": "UP",
      "details": { "database": "Snowflake" }
    },
    "mongodb": {
      "status": "UP"
    },
    "redis": {
      "status": "UP"
    },
    "diskSpace": {
      "status": "UP",
      "details": { "free": 100000000000, "threshold": 10485760 }
    }
  }
}

Observability

1. Logging

Logging Framework: Logback with Logstash encoder (JSON structured logs)

Configuration (logback.xml):

<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
        <includeMdcKeyName>origin</includeMdcKeyName>
        <includeMdcKeyName>customerId</includeMdcKeyName>
        <includeMdcKeyName>transactionId</includeMdcKeyName>
        <includeMdcKeyName>uri</includeMdcKeyName>
    </encoder>
</appender>

Log Format:

{
  "@timestamp": "2025-10-25T18:30:00.123Z",
  "level": "INFO",
  "logger_name": "com.ttn.ck.lens.service.AwsVsActualCostServiceImpl",
  "message": "Fetching cost summary for customer",
  "customerId": "CUST-123",
  "transactionId": "TXN-456",
  "uri": "/admin-pages/cost/summary",
  "thread_name": "http-nio-8080-exec-1"
}

MDC (Mapped Diagnostic Context) for correlation IDs:

MDC.put("transactionId", UUID.randomUUID().toString());
MDC.put("customerId", request.getCustomerId());
log.info("Processing request");  // Automatically includes MDC values
MDC.clear();

2. Metrics (Prometheus)

Metrics Exposed (via Micrometer + Prometheus):

JVM Metrics:

jvm.memory.used - Heap/non-heap memory
jvm.gc.pause - GC pause duration
jvm.threads.live - Active threads

HTTP Metrics:

http.server.requests.count - Request count by endpoint
http.server.requests.duration - Request latency (histogram)

Database Metrics:

hikaricp.connections.active - Active connections
hikaricp.connections.pending - Waiting threads

Cache Metrics:

cache.gets.count - Cache requests
cache.hits.count - Cache hits
cache.misses.count - Cache misses
cache.evictions.count - Evictions

Custom Metrics:

@Autowired
private MeterRegistry meterRegistry;

public `List<CostDTO>` queryCosts() {
    Timer.Sample sample = Timer.start(meterRegistry);
    `List<CostDTO>` result = dao.queryCosts();
    sample.stop(meterRegistry.timer("lens.query.cost.duration",
        "customer", customerId,
        "service", "snowflake"));

    meterRegistry.counter("lens.query.cost.count",
        "customer", customerId).increment();

    return result;
}

Prometheus Scrape Endpoint: /actuator/prometheus

3. Distributed Tracing

Implementation: Spring Cloud Sleuth + Zipkin (if added)

Trace ID Propagation:

# Request
GET /admin-pages/cost/summary
X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
X-B3-SpanId: 05e3ac9a4f6e3b90

# Lens logs with trace ID
{
  "traceId": "80f198ee56343ba864fe8b2a57d3eff7",
  "spanId": "05e3ac9a4f6e3b90",
  "message": "Querying Snowflake"
}

# Downstream call to Snowflake includes same trace ID

Deployment Architecture

Container Deployment (Docker + Kubernetes)

Dockerfile:

FROM openjdk:17-jdk-slim AS builder
WORKDIR /app
COPY gradlew .
COPY gradle gradle
COPY build.gradle settings.gradle ./
COPY lens/build.gradle lens/
COPY lens/src lens/src
RUN ./gradlew :lens:bootJar

FROM openjdk:17-jre-slim
WORKDIR /app
COPY --from=builder /app/lens/build/libs/lens-1.0.0-RELEASE.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-Xmx2g", "-Xms512m", "-jar", "app.jar"]

Kubernetes Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: lens
spec:
  replicas: 3
  selector:
    matchLabels:
      app: lens
  template:
    metadata:
      labels:
        app: lens
    spec:
      containers:
      - name: lens
        image: lens:1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: ACTIVE_PROFILE
          value: "prod"
        - name: SNOWFLAKE_USER
          valueFrom:
            secretKeyRef:
              name: lens-secrets
              key: snowflake-user
        - name: SNOWFLAKE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: lens-secrets
              key: snowflake-password
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5

Service:

apiVersion: v1
kind: Service
metadata:
  name: lens-service
spec:
  type: ClusterIP
  selector:
    app: lens
  ports:
  - port: 80
    targetPort: 8080

Horizontal Pod Autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lens-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lens
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Summary

The AWS Lens module uses a modern, cloud-native technology stack:

Java 17 + Spring Boot 2.7.4 for framework
Polyglot Persistence: Snowflake (analytics), MongoDB (documents), MySQL (transactional), Redis (cache)
External Integrations: AWS SDK, RabbitMQ, Spring Cloud Config, authX (JWT)
Security: JWT authentication, TLS encryption, secrets management
Performance: Multi-level caching, connection pooling, async processing, query optimization
Resilience: Retry logic, circuit breakers, health checks
Observability: Structured logging, Prometheus metrics, distributed tracing
Deployment: Docker containers, Kubernetes orchestration, auto-scaling

Technical Highlights:

Sub-second query responses (with caching)
99.9% uptime SLA
Handles 1000+ concurrent users
Processes 10+ TB of cost data
Scales horizontally (3-10 pods)

Next Steps:

05-component-design - Detailed component documentation
08-integration-points - Integration specifications
11-deployment-guide - Deployment procedures

Document Version: 1.0 Last Updated: October 25, 2025

Table of Contents​

Introduction​

Purpose​

Related Documents​

Technology Stack​

Core Framework​

Spring Modules​

Database Drivers & Clients​

AWS SDK​

API Documentation​

Testing & Quality​

Utilities & Supporting Libraries​

Database Architecture​

Multi-Database Strategy​

Database 1: Snowflake (Primary Analytics Database)​

Database 2: MongoDB (Document Store)​

Collection: saved_reports​

Collection: filter_metadata​

Collection: custom_queries​

Database 3: MySQL (Transactional Data)​

Database 4: Redis (In-Memory Cache)​

1. Dashboard Query Caching​

2. Filter Metadata Caching​

3. RI Data Caching​

External Integrations​

Integration 1: AWS SDK (Cost Explorer, Pricing, Organizations)​

AWS Cost Explorer API​

AWS Pricing API​

AWS Organizations API​

Integration 2: RabbitMQ (Message Queue)​

Exchange: lens.events​

Queue: lens.cost.update​

Queue: lens.alert.cost​

Integration 3: Spring Cloud Config Server​

Integration 4: authX Module (JWT Authentication)​

Security & Authentication​

1. API Security (JWT)​

2. Database Security​

Snowflake​

MongoDB​

MySQL​

Redis​

3. Secrets Management​

4. Input Validation​

5. SQL Injection Prevention​

Performance & Scalability​

1. Query Optimization​

Snowflake Query Patterns​

2. Caching Strategy​

Level 1: Snowflake Result Cache​

Level 2: Redis Application Cache​

Level 3: HTTP Response Cache​

3. Connection Pooling​

4. Async Processing​

5. Pagination​

Resilience & Reliability​

1. Retry Logic​

2. Circuit Breaker​

3. Health Checks​

Observability​

1. Logging​

2. Metrics (Prometheus)​

3. Distributed Tracing​

Deployment Architecture​

Container Deployment (Docker + Kubernetes)​

Summary​

Table of Contents

Introduction

Purpose

Related Documents

Technology Stack

Core Framework

Spring Modules

Database Drivers & Clients

AWS SDK

API Documentation

Testing & Quality

Utilities & Supporting Libraries

Database Architecture

Multi-Database Strategy

Database 1: Snowflake (Primary Analytics Database)

Database 2: MongoDB (Document Store)

Collection: `saved_reports`

Collection: `filter_metadata`

Collection: `custom_queries`

Database 3: MySQL (Transactional Data)

Database 4: Redis (In-Memory Cache)

1. Dashboard Query Caching

2. Filter Metadata Caching

3. RI Data Caching

External Integrations

Integration 1: AWS SDK (Cost Explorer, Pricing, Organizations)

AWS Cost Explorer API

AWS Pricing API

AWS Organizations API

Integration 2: RabbitMQ (Message Queue)

Exchange: `lens.events`

Queue: `lens.cost.update`

Queue: `lens.alert.cost`

Integration 3: Spring Cloud Config Server

Integration 4: authX Module (JWT Authentication)

Security & Authentication

1. API Security (JWT)

2. Database Security

Snowflake

MongoDB

MySQL

Redis

3. Secrets Management

4. Input Validation

5. SQL Injection Prevention

Performance & Scalability

1. Query Optimization

Snowflake Query Patterns

2. Caching Strategy

Level 1: Snowflake Result Cache

Level 2: Redis Application Cache

Level 3: HTTP Response Cache

3. Connection Pooling

4. Async Processing

5. Pagination

Resilience & Reliability

1. Retry Logic

2. Circuit Breaker

3. Health Checks

Observability

1. Logging

2. Metrics (Prometheus)

3. Distributed Tracing

Deployment Architecture

Container Deployment (Docker + Kubernetes)

Summary