Solution Architecture

Overview

AWS Lens is a cloud-native FinOps platform designed to provide comprehensive cost intelligence for AWS environments. This document describes the solution architecture from a high-level perspective, focusing on how the system delivers value to different stakeholders.


Architecture Principles

1. Read-Only Access

  • AWS Lens never has write permissions to customer AWS accounts
  • All data collection uses cross-account AWS IAM roles with read-only policies
  • Because no write actions are granted, AWS Lens cannot modify customer infrastructure
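
As a rough illustration of the read-only principle, the sketch below flags any allowed IAM action whose verb does not start with `Get`, `List`, or `Describe`. The helper name `non_read_only_actions` and the example policy are hypothetical, and the prefix check is a simplifying heuristic, not the policy AWS Lens actually ships.

```python
# Hypothetical helper: flags IAM actions that are not obviously read-only.
# The prefix list is a simplifying assumption, not an official AWS classification.
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def non_read_only_actions(policy: dict) -> list[str]:
    """Return Allow-ed actions whose verb does not start with a read-only prefix."""
    flagged = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        for action in actions:
            # "s3:GetObject" -> "GetObject"; wildcards such as "s3:*" are flagged
            verb = action.split(":", 1)[-1]
            if not verb.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


# Illustrative policy only; the real role's action list is not given in this document.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": "*"},
        {"Effect": "Allow", "Action": "cloudwatch:GetMetricData", "Resource": "*"},
    ],
}
```

A check like this could run before a customer role is accepted during onboarding, rejecting any policy for which the returned list is not empty.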

2. Cloud-Native Design

  • Built for AWS, runs on AWS
  • Leverages AWS-native services for scalability and reliability
  • Serverless-first approach for cost efficiency

3. Multi-Tenancy

  • Secure data isolation between customers
  • Shared infrastructure for cost efficiency
  • Customer-specific encryption keys

4. Real-Time Processing

  • Near real-time cost data at hourly granularity, available within hours of each CUR delivery
  • Event-driven architecture that starts processing as soon as new data arrives
  • Asynchronous processing for scalability

High-Level Architecture


System Components

Data Collection Layer

AWS Cost & Usage Reports (CUR)

Purpose: Primary data source for AWS billing data

How it Works:

  1. Customer enables CUR in their AWS account
  2. AWS generates detailed cost reports at hourly or daily granularity and refreshes them at least once a day
  3. Reports deposited in customer's S3 bucket
  4. AWS Lens reads reports via cross-account IAM role

Data Captured:

  • Line-item billing details (every resource, every hour)
  • Resource tags and metadata
  • Pricing information
  • Usage quantities
  • Account/region/service information

CloudWatch Metrics (Optional)

Purpose: Performance metrics for right-sizing recommendations

Data Captured:

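  • EC2 CPU utilization, plus memory utilization where the CloudWatch agent is installed (EC2 does not publish memory metrics by default)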
  • RDS connection counts
  • S3 request patterns
  • Lambda invocation counts

Data Ingestion Layer

Components:

  1. Change Detection: Monitors customer S3 buckets for new CUR files
  2. CUR Parser: Parses compressed CSV/Parquet CUR files
  3. Data Normalizer: Standardizes data format across AWS billing versions
  4. Enrichment Service: Adds metadata (account names, tags, business context)
  5. Validation: Ensures data quality and completeness
  6. Processing Queue: Manages async processing pipeline
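
The parse, normalize, and validate stages listed above can be sketched as follows for a CSV-format CUR file. The function names and the output field names are hypothetical; the input column names follow the CUR CSV convention (`lineItem/...`, `product/...`), and real reports carry many more columns than this sample.

```python
import csv
import io
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation


def parse_cur_csv(text: str) -> list[dict]:
    """Parse decompressed CUR CSV text into one dict per line item."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize(row: dict) -> dict:
    """Map raw CUR columns onto a small internal record (field names are assumptions)."""
    return {
        "account_id": row["lineItem/UsageAccountId"],
        "service": row["product/ProductName"],
        "usage_start": datetime.strptime(
            row["lineItem/UsageStartDate"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc),
        # Decimal avoids float rounding drift when costs are summed later
        "cost": Decimal(row["lineItem/UnblendedCost"]),
    }


def ingest(text: str) -> tuple[list[dict], list[dict]]:
    """Return (valid records, rejected raw rows) so bad rows are kept for review."""
    valid, rejected = [], []
    for row in parse_cur_csv(text):
        try:
            valid.append(normalize(row))
        except (KeyError, TypeError, ValueError, InvalidOperation):
            rejected.append(row)
    return valid, rejected


SAMPLE = (
    "lineItem/UsageAccountId,product/ProductName,"
    "lineItem/UsageStartDate,lineItem/UnblendedCost\n"
    "111111111111,Amazon Elastic Compute Cloud,2024-01-01T00:00:00Z,0.0416\n"
    "111111111111,Amazon Simple Storage Service,2024-01-01T00:00:00Z,not-a-number\n"
)
```

Keeping rejected rows alongside the valid ones, rather than dropping them silently, gives the validation stage something to report on when it checks completeness.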

Processing Layer

Cost Processing Engine

Functions:

  1. Aggregation: Rolls up detailed line items to multiple time granularities
  2. Dimension Building: Creates cost views by service, account, region, tags
  3. Trend Analysis: Identifies spending patterns and anomalies
  4. Forecasting: ML-based projection of future costs
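
A minimal sketch of the aggregation and dimension-building steps, assuming line items have already been normalized into dicts with `usage_start`, `cost`, and one key per dimension. These field names and the `roll_up` function are illustrative, not the engine's actual interface.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# strftime patterns that truncate a timestamp to the requested granularity
BUCKET_FORMATS = {
    "hourly": "%Y-%m-%dT%H:00",
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
}


def roll_up(items: list[dict], dimension: str, granularity: str = "daily") -> dict:
    """Sum cost per (time bucket, dimension value)."""
    fmt = BUCKET_FORMATS[granularity]
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for item in items:
        key = (item["usage_start"].strftime(fmt), item[dimension])
        totals[key] += item["cost"]
    return dict(totals)


# Illustrative hourly line items
items = [
    {"usage_start": datetime(2024, 1, 1, 0), "service": "EC2", "cost": Decimal("0.10")},
    {"usage_start": datetime(2024, 1, 1, 1), "service": "EC2", "cost": Decimal("0.25")},
    {"usage_start": datetime(2024, 1, 1, 1), "service": "S3", "cost": Decimal("0.05")},
]
daily_by_service = roll_up(items, "service", "daily")
```

The same function yields views by account, region, or tag by passing a different `dimension`, which is the idea behind building several cost views from one set of line items.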

Recommendation Engine

Recommendation Types:

  1. Right-Sizing: Identify over-provisioned instances
  2. RI/SP: Recommend reservation purchases based on steady-state usage
  3. Storage Optimization: Lifecycle policies, storage class changes
  4. Idle Resources: Identify unused resources (stopped instances, unattached EBS volumes)
  5. Architecture: Multi-region, serverless alternatives, modern services

Scoring Factors:

  • Potential savings amount
  • Implementation complexity
  • Business risk
  • Confidence level
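
One way the four scoring factors could be combined into a single ranking is shown below. The multiplicative formula, the 0 to 1 scales, and all names are assumptions made for illustration; this document does not specify how the engine weights the factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    name: str
    monthly_savings_usd: float
    complexity: float   # 0 (trivial) to 1 (very hard); assumed scale
    risk: float         # 0 (safe) to 1 (high business risk); assumed scale
    confidence: float   # 0 to 1; assumed scale


def score(rec: Recommendation, max_savings: float) -> float:
    """Hypothetical score: savings share, discounted by complexity, risk, and doubt."""
    savings_share = rec.monthly_savings_usd / max_savings if max_savings > 0 else 0.0
    return savings_share * rec.confidence * (1 - rec.complexity) * (1 - rec.risk)


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations from highest to lowest score."""
    max_savings = max((r.monthly_savings_usd for r in recs), default=0.0)
    return sorted(recs, key=lambda r: score(r, max_savings), reverse=True)


recs = [
    Recommendation("re-architect to serverless", 500.0, complexity=0.8, risk=0.5, confidence=0.5),
    Recommendation("downsize idle instance", 1000.0, complexity=0.2, risk=0.1, confidence=0.9),
]
ranked = rank(recs)
```

A multiplicative form means a recommendation with very high risk or very low confidence sinks in the ranking regardless of its savings; a weighted sum would be the obvious alternative if factors should trade off more gently.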

Storage Layer

AWS Lens uses polyglot persistence - multiple databases optimized for different use cases.

Snowflake (Analytics)

Purpose: Primary cost data warehouse

Stored Data:

  • Historical cost & usage data (all CUR line items)
  • Aggregated cost views (hourly, daily, monthly)
  • Trend data and forecasts
  • Recommendation results

Why Snowflake:

  • Columnar storage optimized for analytics
  • Automatic scaling for query performance
  • Data sharing capabilities (for MSP use cases)
  • Cost-effective for large-scale time-series data

MySQL (Transactional)

Purpose: OLTP database for operational data

Stored Data:

  • User accounts and permissions
  • Dashboard configurations
  • Report schedules
  • Alert rules
  • Audit logs

Why MySQL:

  • ACID compliance for critical data
  • Well-understood relational model
  • Strong consistency guarantees

MongoDB (Documents)

Purpose: Flexible schema for metadata

Stored Data:

  • Customer account metadata
  • AWS resource metadata
  • Tag mappings
  • Custom dimension definitions
  • Saved reports/filters

Why MongoDB:

  • Flexible schema for varied metadata
  • Fast document retrieval
  • Good for hierarchical data (account organizations)

Redis (Cache)

Purpose: High-performance caching layer

Cached Data:

  • Recent dashboard queries
  • Frequently accessed cost summaries
  • User session data
  • Rate limiting counters

Why Redis:

  • Sub-millisecond response times
  • Reduces load on primary databases
  • TTL-based automatic expiration

API Layer

Components:

  1. API Gateway: Entry point, rate limiting, request routing
  2. Authentication Service: JWT-based auth, SSO integration
  3. Authorization Service: RBAC enforcement
  4. REST API: Business logic, data access
  5. Rate Limiter: Prevent abuse, ensure fair usage
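
The document does not say which rate-limiting algorithm is used. As one common option, the sketch below implements a token bucket; the class name, parameters, and injectable clock are illustrative choices.

```python
import time


class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock            # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the request may proceed."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

In a multi-instance deployment the bucket state would need to live in shared storage (the Redis rate limiting counters mentioned under the Storage Layer are a natural fit) rather than in process memory as it does here.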

API Capabilities:

  • Cost data queries (flexible filtering, grouping, aggregation)
  • Recommendation retrieval
  • Report generation
  • Alert configuration
  • User/account management
  • Data export

Presentation Layer

Web Application:

  • React-based single-page application
  • Responsive design (desktop, tablet, mobile)
  • Real-time updates (WebSocket for alerts)
  • Offline capability (cached dashboards)

Dashboard Service:

  • Pre-built dashboard templates
  • Custom dashboard builder
  • Widget library (charts, tables, metrics cards)
  • Drill-down navigation

Reporting Service:

  • Scheduled report generation
  • On-demand exports
  • Multiple formats (PDF, CSV, Excel)
  • Delivery via email, Slack, S3

Data Flow

Daily Cost Data Flow

Timeline:

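  • T+0h: AWS delivers a CUR update to the customer's S3 bucket (hourly granularity enabled)
  • T+1h: AWS Lens detects new CUR file
  • T+2h: Data parsed, validated, loaded to warehouse
  • T+3h: Aggregations and recommendations ready
  • Result: about 3 hours from CUR delivery to visibility (AWS refreshes CUR at least once a day rather than continuously, so the delay from actual usage is longer)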

Real-Time Anomaly Detection


Deployment Architecture

Multi-Tenant SaaS Deployment

Deployment Characteristics:

  • High Availability: Multi-AZ deployment, auto-scaling
  • Disaster Recovery: Cross-region replication, RTO < 4 hours, RPO < 1 hour
  • Scalability: Horizontal scaling (ECS Fargate), vertical scaling (RDS)
  • Security: VPC isolation, security groups, NACLs, encryption in transit and at rest

Integration Points

Inbound Integrations (Data Sources)

Outbound Integrations (Notifications & Exports)

Authentication Integrations


Security Architecture

Defense in Depth

Security Features:

  1. Network Security:

    • WAF rules for common attacks (SQL injection, XSS)
    • DDoS protection (AWS Shield Standard)
    • VPC isolation with private subnets
  2. Application Security:

    • JWT-based authentication
    • Role-based access control (RBAC)
    • Input validation and sanitization
    • OWASP Top 10 mitigation
  3. Data Security:

    • TLS 1.3 for all data in transit
    • AES-256 encryption at rest
    • Customer-managed encryption keys (optional)
    • Data retention policies
  4. Access Control:

    • Read-only IAM roles for AWS access
    • Multi-factor authentication support
    • Audit logging (all API calls logged)
    • Principle of least privilege

Scalability & Performance

Horizontal Scaling

Scaling Triggers:

  • CPU utilization > 70%
  • Memory utilization > 80%
  • Request count > 1000 req/sec
  • Response time > 500ms (p95)
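
The triggers above can be expressed as a simple threshold check. This sketch assumes that breaching any single trigger is enough to scale out, which the document does not state; the type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceMetrics:
    cpu_percent: float
    memory_percent: float
    requests_per_sec: float
    p95_latency_ms: float


def breached_triggers(m: ServiceMetrics) -> list[str]:
    """Return the names of scaling triggers the current metrics exceed."""
    checks = {
        "cpu": m.cpu_percent > 70,
        "memory": m.memory_percent > 80,
        "request_rate": m.requests_per_sec > 1000,
        "p95_latency": m.p95_latency_ms > 500,
    }
    return [name for name, hit in checks.items() if hit]


def should_scale_out(m: ServiceMetrics) -> bool:
    # Assumption: one breached trigger is sufficient to add capacity
    return bool(breached_triggers(m))
```

In practice these rules would more likely be declared as auto-scaling policies on the compute service than evaluated in application code; the function form just makes the thresholds explicit.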

Caching Strategy

Caching TTLs:

  • Dashboard queries: 5 minutes
  • Cost summaries: 1 hour
  • Recommendations: 24 hours
  • User metadata: 15 minutes
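
To make the per-category TTLs concrete, here is an in-memory stand-in for the cache. It is not a Redis client: in Redis the same effect comes from setting an expiry on each key. The category keys and the `TTLCache` class are illustrative names.

```python
import time

# TTLs from the list above, in seconds
TTL_SECONDS = {
    "dashboard_query": 5 * 60,
    "cost_summary": 60 * 60,
    "recommendations": 24 * 60 * 60,
    "user_metadata": 15 * 60,
}


class TTLCache:
    """In-memory sketch of category-based expiry (stand-in for Redis key expiry)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._entries = {}

    def set(self, category: str, key: str, value) -> None:
        expires_at = self._clock() + TTL_SECONDS[category]
        self._entries[(category, key)] = (expires_at, value)

    def get(self, category: str, key: str):
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get((category, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[(category, key)]  # lazy eviction on read
            return None
        return value
```

The short dashboard TTL trades a little staleness for fewer warehouse queries, while the 24-hour recommendation TTL reflects that recommendations only change when new cost data has been processed.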

Monitoring & Observability

Key Metrics

Alerting

Critical Alerts (PagerDuty, immediate response):

  • API error rate > 5%
  • Database connection pool exhausted
  • Data ingestion pipeline failure
  • Security incident detected

Warning Alerts (Email, review within 1 hour):

  • API latency p95 > 1 second
  • Cache hit rate < 70%
  • Data processing lag > 6 hours
  • Disk utilization > 80%
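
The two alert tiers can be written as a small rule table. The snapshot fields, rule names, and `evaluate` function below are hypothetical; routing to PagerDuty or email is left out, and the real monitoring setup presumably defines these rules in the observability tooling rather than in application code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HealthSnapshot:
    api_error_rate: float        # fraction of requests, 0.05 == 5%
    api_p95_latency_s: float
    cache_hit_rate: float        # fraction, 0.70 == 70%
    processing_lag_hours: float
    disk_utilization: float      # fraction, 0.80 == 80%
    db_pool_exhausted: bool = False
    ingestion_failed: bool = False
    security_incident: bool = False


# (severity, rule name, predicate) using the thresholds listed above
RULES: list[tuple[str, str, Callable[[HealthSnapshot], bool]]] = [
    ("critical", "api_error_rate", lambda s: s.api_error_rate > 0.05),
    ("critical", "db_pool_exhausted", lambda s: s.db_pool_exhausted),
    ("critical", "ingestion_failure", lambda s: s.ingestion_failed),
    ("critical", "security_incident", lambda s: s.security_incident),
    ("warning", "api_latency_p95", lambda s: s.api_p95_latency_s > 1.0),
    ("warning", "cache_hit_rate", lambda s: s.cache_hit_rate < 0.70),
    ("warning", "processing_lag", lambda s: s.processing_lag_hours > 6),
    ("warning", "disk_utilization", lambda s: s.disk_utilization > 0.80),
]


def evaluate(snapshot: HealthSnapshot) -> dict[str, list[str]]:
    """Return the names of fired rules, grouped by severity."""
    fired: dict[str, list[str]] = {"critical": [], "warning": []}
    for severity, name, predicate in RULES:
        if predicate(snapshot):
            fired[severity].append(name)
    return fired
```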

Disaster Recovery

RTO & RPO Targets

| Scenario | RTO | RPO | Recovery Strategy |
|---|---|---|---|
| Application Failure | 5 minutes | 0 | Multi-AZ auto-recovery |
| Database Failure | 15 minutes | 0 | Multi-AZ automatic failover |
| Regional Outage | 4 hours | 1 hour | Cross-region failover |
| Data Corruption | 24 hours | 24 hours | Point-in-time restore |

Backup Strategy


Cost Optimization (The Platform Itself)

AWS Lens practices what it preaches:

  1. Compute:

    • Fargate Spot for non-critical, interruption-tolerant workloads (up to 70% savings)
    • Compute Savings Plans for baseline capacity (20% savings)
  2. Storage:

    • S3 Intelligent-Tiering for backups
    • Snowflake auto-suspend for idle warehouses
  3. Networking:

    • CloudFront CDN for static assets
    • VPC endpoints to keep AWS service traffic off the NAT Gateway and reduce its data-processing charges
  4. Right-Sizing:

    • Continuous monitoring and adjustment
    • AWS Lens analyzes its own costs!

Result: Platform infrastructure costs < 5% of customer savings generated


Technology Stack Summary

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 19.0, TypeScript | Web application |
| API | Spring Boot 2.7.4, Java 17 | REST API |
| Processing | Apache Spark, Airflow | Data processing pipelines |
| Analytics | Snowflake | Cost data warehouse |
| Transactional | MySQL 8.0 | User/config data |
| Document Store | MongoDB | Metadata |
| Cache | Redis | High-speed caching |
| Compute | ECS Fargate, Lambda | Container orchestration, serverless |
| Storage | S3, EBS | Object and block storage |
| Networking | ALB, CloudFront, VPC | Load balancing, CDN, isolation |
| Security | WAF, Shield, KMS | Security and encryption |
| Monitoring | CloudWatch, Datadog | Observability |

Next Steps

For Architects

For Developers

For Executives


This solution architecture provides a high-level understanding of how AWS Lens delivers cost intelligence. For technical implementation details, refer to the architecture documents listed above.