Solution Architecture
Overview
AWS Lens is a cloud-native FinOps platform designed to provide comprehensive cost intelligence for AWS environments. This document describes the solution architecture from a high-level perspective, focusing on how the system delivers value to different stakeholders.
Architecture Principles
1. Read-Only Access
- AWS Lens never has write permissions to customer AWS accounts
- All data collection uses AWS IAM roles with read-only policies
- Because access is read-only, AWS Lens cannot modify customer infrastructure
2. Cloud-Native Design
- Built for AWS, runs on AWS
- Leverages AWS-native services for scalability and reliability
- Serverless-first approach for cost efficiency
3. Multi-Tenancy
- Secure data isolation between customers
- Shared infrastructure for cost efficiency
- Customer-specific encryption keys
4. Real-Time Processing
- Near real-time cost data (hourly granularity)
- Event-driven architecture for immediate insights
- Asynchronous processing for scalability
High-Level Architecture
System Components
Data Collection Layer
AWS Cost & Usage Reports (CUR)
Purpose: Primary data source for AWS billing data
How it Works:
- Customer enables CUR in their AWS account
- AWS generates detailed cost reports at hourly or daily line-item granularity and refreshes them at least once a day
- Reports deposited in customer's S3 bucket
- AWS Lens reads reports via cross-account IAM role
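As a rough illustration of the read-only guarantee behind the cross-account role, the sketch below checks that an IAM policy document only allows read-style actions. The prefix allow-list, the bucket name, and the `is_read_only` helper are assumptions made for this example, not the actual AWS Lens policy or tooling.

```python
# Hypothetical allow-list of read-style action name prefixes (an assumption,
# not an AWS-defined category). Tighten or extend it to match your standards.
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def is_read_only(policy: dict) -> bool:
    """Return True if every action granted by an Allow statement looks like a read."""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements cannot grant write access
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # "s3:GetObject" -> "GetObject"; a bare "*" yields an empty name
            _, _, name = action.partition(":")
            if not name.startswith(READ_ONLY_PREFIXES):
                return False
    return True


# Example policy for reading CUR files; the bucket name is a placeholder.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-cur-bucket",
                "arn:aws:s3:::example-cur-bucket/*",
            ],
        },
        {"Effect": "Allow", "Action": "cloudwatch:GetMetricData", "Resource": "*"},
    ],
}

WRITE_POLICY = {"Statement": [{"Effect": "Allow", "Action": "s3:PutObject", "Resource": "*"}]}

print(is_read_only(EXAMPLE_POLICY))
print(is_read_only(WRITE_POLICY))
```

A prefix check like this is deliberately conservative: wildcard grants such as `s3:*` or `*` are rejected because they could include write actions.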
Data Captured:
- Line-item billing details (every resource, every hour)
- Resource tags and metadata
- Pricing information
- Usage quantities
- Account/region/service information
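To make the captured fields concrete, here is a minimal sketch that reads a gzip-compressed CUR CSV and keeps a handful of columns. The column names follow the legacy CUR CSV header style (`lineItem/...`, `product/...`, `resourceTags/...`); which columns are actually present depends on how the report is configured, and the `parse_cur_csv` helper and sample row are assumptions for illustration.

```python
import csv
import gzip
import io
from decimal import Decimal


def parse_cur_csv(gz_bytes: bytes) -> list[dict]:
    """Parse a gzip-compressed CUR CSV into simplified line-item dicts."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    items = []
    for row in csv.DictReader(io.StringIO(text)):
        items.append({
            "account": row["lineItem/UsageAccountId"],
            "service": row["lineItem/ProductCode"],
            "region": row.get("product/region", ""),
            "start": row["lineItem/UsageStartDate"],
            # Decimal avoids float rounding drift when summing many small costs
            "usage": Decimal(row["lineItem/UsageAmount"] or "0"),
            "cost": Decimal(row["lineItem/UnblendedCost"] or "0"),
            # Keep only tag columns that carry a value for this line item
            "tags": {
                key.split("/", 1)[1]: value
                for key, value in row.items()
                if key and key.startswith("resourceTags/") and value
            },
        })
    return items


# Made-up sample row; the account ID and amounts are placeholders.
SAMPLE = (
    "lineItem/UsageAccountId,lineItem/ProductCode,product/region,"
    "lineItem/UsageStartDate,lineItem/UsageAmount,lineItem/UnblendedCost,"
    "resourceTags/user:team\n"
    "111122223333,AmazonEC2,us-east-1,2024-01-01T00:00:00Z,1.0,0.0416,platform\n"
)

items = parse_cur_csv(gzip.compress(SAMPLE.encode("utf-8")))
print(items[0]["service"], items[0]["cost"], items[0]["tags"])
```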
CloudWatch Metrics (Optional)
Purpose: Performance metrics for right-sizing recommendations
Data Captured:
- EC2 CPU utilization, plus memory utilization where the CloudWatch agent is installed (memory is not a default EC2 metric)
- RDS connection counts
- S3 request patterns
- Lambda invocation counts
Data Ingestion Layer
Components:
- Change Detection: Monitors customer S3 buckets for new CUR files
- CUR Parser: Parses compressed CSV/Parquet CUR files
- Data Normalizer: Standardizes data format across AWS billing versions
- Enrichment Service: Adds metadata (account names, tags, business context)
- Validation: Ensures data quality and completeness
- Processing Queue: Manages async processing pipeline
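One simple way to think about the Change Detection component is as a comparison between the current bucket listing and what was seen on the previous poll. The sketch below assumes listing entries shaped like the `Key`/`ETag` pairs returned by an S3 object listing; the `detect_new_objects` helper and the object keys are hypothetical, and the real component may rely on S3 event notifications instead of polling.

```python
def detect_new_objects(listing: list[dict], seen_etags: dict[str, str]) -> list[str]:
    """Return keys that are new or whose ETag changed since the last poll.

    `seen_etags` maps object key -> last known ETag and is updated in place.
    """
    changed = []
    for obj in listing:
        key, etag = obj["Key"], obj["ETag"]
        if seen_etags.get(key) != etag:
            changed.append(key)
            seen_etags[key] = etag
    return changed


seen: dict[str, str] = {}

# First poll: everything is new.
first = detect_new_objects(
    [
        {"Key": "cur/2024-01/part-1.csv.gz", "ETag": "aaa"},
        {"Key": "cur/2024-01/part-2.csv.gz", "ETag": "bbb"},
    ],
    seen,
)

# Second poll: AWS rewrote part-2, so only that key needs reprocessing.
second = detect_new_objects(
    [
        {"Key": "cur/2024-01/part-1.csv.gz", "ETag": "aaa"},
        {"Key": "cur/2024-01/part-2.csv.gz", "ETag": "ccc"},
    ],
    seen,
)

print(first)
print(second)
```

Comparing ETags rather than keys alone matters because CUR files for the current month can be rewritten as AWS refines the bill.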
Processing Layer
Cost Processing Engine
Functions:
- Aggregation: Rolls up detailed line items to multiple time granularities
- Dimension Building: Creates cost views by service, account, region, tags
- Trend Analysis: Identifies spending patterns and anomalies
- Forecasting: ML-based projection of future costs
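The Aggregation and Dimension Building functions can be pictured as a group-by over line items. The following is a minimal in-memory sketch, assuming line items are dicts with an ISO 8601 `start` timestamp, a `cost`, and one key per dimension; in the platform described here this work would more plausibly run in Spark or the warehouse, and `roll_up` is a hypothetical name.

```python
from collections import defaultdict
from decimal import Decimal

# Number of leading characters of an ISO 8601 timestamp that identify a bucket:
# "2024-01-01T00" (hour), "2024-01-01" (day), "2024-01" (month).
BUCKET_WIDTH = {"hourly": 13, "daily": 10, "monthly": 7}


def roll_up(items: list[dict], dimension: str, granularity: str = "daily") -> dict:
    """Sum cost per (time bucket, dimension value)."""
    width = BUCKET_WIDTH[granularity]
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for item in items:
        bucket = item["start"][:width]
        totals[(bucket, item[dimension])] += item["cost"]
    return dict(totals)


line_items = [
    {"start": "2024-01-01T00:00:00Z", "service": "AmazonEC2", "cost": Decimal("0.10")},
    {"start": "2024-01-01T01:00:00Z", "service": "AmazonEC2", "cost": Decimal("0.20")},
    {"start": "2024-01-02T00:00:00Z", "service": "AmazonS3", "cost": Decimal("0.05")},
]

daily = roll_up(line_items, "service", "daily")
monthly = roll_up(line_items, "service", "monthly")
print(daily)
print(monthly)
```

The same function serves every dimension listed above (service, account, region, tags) by changing the `dimension` argument.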
Recommendation Engine
Recommendation Types:
- Right-Sizing: Identify over-provisioned instances
- RI/SP: Recommend reservation purchases based on steady-state usage
- Storage Optimization: Lifecycle policies, storage class changes
- Idle Resources: Identify unused resources (stopped instances, unattached EBS volumes)
- Architecture: Multi-region, serverless alternatives, modern services
Scoring Factors:
- Potential savings amount
- Implementation complexity
- Business risk
- Confidence level
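The four scoring factors could be combined into a single priority score in many ways. The sketch below uses a weighted sum purely as an illustration; the weights, the 0 to 1 scales, and the `Recommendation`, `score`, and `rank` names are all assumptions, not the scoring model AWS Lens actually uses.

```python
from dataclasses import dataclass

# Illustrative weights only; they sum to 1.0 so the score lands in 0..100.
WEIGHTS = {"savings": 0.4, "complexity": 0.2, "risk": 0.2, "confidence": 0.2}


@dataclass
class Recommendation:
    name: str
    monthly_savings: float  # USD per month
    complexity: float       # 0 (trivial) to 1 (hard to implement)
    risk: float             # 0 (no business risk) to 1 (high risk)
    confidence: float       # 0 to 1


def score(rec: Recommendation, max_savings: float) -> float:
    """Higher is better: large savings, low complexity, low risk, high confidence."""
    savings_norm = min(rec.monthly_savings / max_savings, 1.0) if max_savings > 0 else 0.0
    raw = (
        WEIGHTS["savings"] * savings_norm
        + WEIGHTS["complexity"] * (1 - rec.complexity)
        + WEIGHTS["risk"] * (1 - rec.risk)
        + WEIGHTS["confidence"] * rec.confidence
    )
    return round(100 * raw, 1)


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations by descending score, normalizing savings to the largest one."""
    top = max((r.monthly_savings for r in recs), default=0.0)
    return sorted(recs, key=lambda r: score(r, top), reverse=True)


recs = [
    Recommendation("migrate-to-serverless", 200.0, complexity=0.8, risk=0.6, confidence=0.5),
    Recommendation("rightsize-web-tier", 1000.0, complexity=0.2, risk=0.1, confidence=0.9),
]
ranked = rank(recs)
print([r.name for r in ranked])
```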
Storage Layer
AWS Lens uses polyglot persistence: multiple databases, each optimized for a different use case.
Snowflake (Analytics)
Purpose: Primary cost data warehouse
Stored Data:
- Historical cost & usage data (all CUR line items)
- Aggregated cost views (hourly, daily, monthly)
- Trend data and forecasts
- Recommendation results
Why Snowflake:
- Columnar storage optimized for analytics
- Automatic scaling for query performance
- Data sharing capabilities (for MSP use cases)
- Cost-effective for large-scale time-series data
MySQL (Transactional)
Purpose: OLTP database for operational data
Stored Data:
- User accounts and permissions
- Dashboard configurations
- Report schedules
- Alert rules
- Audit logs
Why MySQL:
- ACID compliance for critical data
- Well-understood relational model
- Strong consistency guarantees
MongoDB (Documents)
Purpose: Flexible schema for metadata
Stored Data:
- Customer account metadata
- AWS resource metadata
- Tag mappings
- Custom dimension definitions
- Saved reports/filters
Why MongoDB:
- Flexible schema for varied metadata
- Fast document retrieval
- Good for hierarchical data (account organizations)
Redis (Cache)
Purpose: High-performance caching layer
Cached Data:
- Recent dashboard queries
- Frequently accessed cost summaries
- User session data
- Rate limiting counters
Why Redis:
- Sub-millisecond response times
- Reduces load on primary databases
- TTL-based automatic expiration
API Layer
Components:
- API Gateway: Entry point, rate limiting, request routing
- Authentication Service: JWT-based auth, SSO integration
- Authorization Service: RBAC enforcement
- REST API: Business logic, data access
- Rate Limiter: Prevent abuse, ensure fair usage
API Capabilities:
- Cost data queries (flexible filtering, grouping, aggregation)
- Recommendation retrieval
- Report generation
- Alert configuration
- User/account management
- Data export
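The document does not say which algorithm the rate limiting components use. A token bucket is one common choice, sketched here under that assumption: each client gets a bucket that refills at a steady rate and allows short bursts up to its capacity. The `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock      # injectable clock keeps the class testable
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Demonstration with a fake clock so the outcome does not depend on wall time.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: fake_now[0])

burst = [bucket.allow(), bucket.allow(), bucket.allow()]  # third call exceeds the burst
fake_now[0] = 1.0                                         # one second later, one token is back
after_wait = bucket.allow()
print(burst, after_wait)
```

In a multi-instance deployment the bucket state would need to live in shared storage such as the Redis counters mentioned in the Storage Layer, rather than in process memory as it does here.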
Presentation Layer
Web Application:
- React-based single-page application
- Responsive design (desktop, tablet, mobile)
- Real-time updates (WebSocket for alerts)
- Offline capability (cached dashboards)
Dashboard Service:
- Pre-built dashboard templates
- Custom dashboard builder
- Widget library (charts, tables, metrics cards)
- Drill-down navigation
Reporting Service:
- Scheduled report generation
- On-demand exports
- Multiple formats (PDF, CSV, Excel)
- Delivery via email, Slack, S3
Data Flow
Daily Cost Data Flow
Timeline:
- T+0h: AWS delivers a new or updated CUR file (hourly granularity enabled)
- T+1h: AWS Lens detects new CUR file
- T+2h: Data parsed, validated, loaded to warehouse
- T+3h: Aggregations and recommendations ready
- Result: roughly 3 hours from CUR delivery to visibility in AWS Lens
Real-Time Anomaly Detection
Deployment Architecture
Multi-Tenant SaaS Deployment
Deployment Characteristics:
- High Availability: Multi-AZ deployment, auto-scaling
- Disaster Recovery: Cross-region replication, RTO < 4 hours, RPO < 1 hour
- Scalability: Horizontal scaling (ECS Fargate), vertical scaling (RDS)
- Security: VPC isolation, security groups, NACLs, encryption everywhere
Integration Points
Inbound Integrations (Data Sources)
Outbound Integrations (Notifications & Exports)
Authentication Integrations
Security Architecture
Defense in Depth
Security Features:
Network Security:
- WAF rules for common attacks (SQL injection, XSS)
- DDoS protection (AWS Shield Standard)
- VPC isolation with private subnets
Application Security:
- JWT-based authentication
- Role-based access control (RBAC)
- Input validation and sanitization
- OWASP Top 10 mitigation
Data Security:
- TLS 1.3 for all data in transit
- AES-256 encryption at rest
- Customer-managed encryption keys (optional)
- Data retention policies
Access Control:
- Read-only IAM roles for AWS access
- Multi-factor authentication support
- Audit logging (all API calls logged)
- Principle of least privilege
Scalability & Performance
Horizontal Scaling
Scaling Triggers:
- CPU utilization > 70%
- Memory utilization > 80%
- Request rate > 1,000 requests/sec
- Response time > 500ms (p95)
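The triggers above amount to a set of independent threshold checks, any one of which can justify scaling out. A minimal sketch follows; the `ServiceMetrics` type and `scale_out_reasons` function are hypothetical, and in practice these conditions would more likely be expressed as auto-scaling policies or alarms than as application code.

```python
from dataclasses import dataclass


@dataclass
class ServiceMetrics:
    cpu_percent: float
    memory_percent: float
    requests_per_sec: float
    p95_latency_ms: float


def scale_out_reasons(m: ServiceMetrics) -> list[str]:
    """Return the names of every scaling trigger the metrics currently exceed."""
    reasons = []
    if m.cpu_percent > 70:
        reasons.append("cpu")
    if m.memory_percent > 80:
        reasons.append("memory")
    if m.requests_per_sec > 1000:
        reasons.append("request_rate")
    if m.p95_latency_ms > 500:
        reasons.append("latency")
    return reasons


calm = ServiceMetrics(cpu_percent=35, memory_percent=50, requests_per_sec=200, p95_latency_ms=120)
busy = ServiceMetrics(cpu_percent=85, memory_percent=60, requests_per_sec=1500, p95_latency_ms=480)

print(scale_out_reasons(calm))
print(scale_out_reasons(busy))
```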
Caching Strategy
Caching TTLs:
- Dashboard queries: 5 minutes
- Cost summaries: 1 hour
- Recommendations: 24 hours
- User metadata: 15 minutes
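To show how per-category TTLs behave, the sketch below implements a small in-memory cache that stands in for Redis. The TTL values come from the list above; the `TtlCache` class, the category names, and the cache keys are assumptions for illustration. With Redis itself, expiry would normally be delegated to the server by setting a TTL on each key.

```python
import time

# TTLs from the caching strategy, in seconds. Category names are hypothetical.
TTL_SECONDS = {
    "dashboard_query": 5 * 60,
    "cost_summary": 60 * 60,
    "recommendations": 24 * 60 * 60,
    "user_metadata": 15 * 60,
}


class TtlCache:
    """In-memory stand-in for a TTL cache, keyed by (category, key)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock keeps expiry testable
        self._store = {}

    def set(self, kind: str, key: str, value) -> None:
        expires_at = self._clock() + TTL_SECONDS[kind]
        self._store[(kind, key)] = (value, expires_at)

    def get(self, kind: str, key: str):
        entry = self._store.get((kind, key))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[(kind, key)]  # lazily evict expired entries on read
            return None
        return value


fake_now = [0.0]
cache = TtlCache(clock=lambda: fake_now[0])
cache.set("dashboard_query", "acct-1:last-30-days", {"total": 1234})
cache.set("cost_summary", "acct-1:2024-01", {"total": 5678})

fake_now[0] = 301.0  # just past the 5-minute dashboard TTL
dashboard_hit = cache.get("dashboard_query", "acct-1:last-30-days")
summary_hit = cache.get("cost_summary", "acct-1:2024-01")
print(dashboard_hit, summary_hit)
```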
Monitoring & Observability
Key Metrics
Alerting
Critical Alerts (PagerDuty, immediate response):
- API error rate > 5%
- Database connection pool exhausted
- Data ingestion pipeline failure
- Security incident detected
Warning Alerts (Email, review within 1 hour):
- API latency p95 > 1 second
- Cache hit rate < 70%
- Data processing lag > 6 hours
- Disk utilization > 80%
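The metric-based alerts above can be expressed as a small rule table that maps each threshold to a severity and a delivery channel. This is only a sketch: the metric names, the `RULES` table, and the `evaluate` function are hypothetical, and the event-style critical alerts (connection pool exhaustion, pipeline failure, security incidents) are not modelled here because they are not simple thresholds.

```python
import operator

# (metric name, comparison, threshold, severity); metric names are placeholders.
RULES = [
    ("api_error_rate", operator.gt, 0.05, "critical"),
    ("api_latency_p95_seconds", operator.gt, 1.0, "warning"),
    ("cache_hit_rate", operator.lt, 0.70, "warning"),
    ("processing_lag_hours", operator.gt, 6, "warning"),
    ("disk_utilization", operator.gt, 0.80, "warning"),
]

# Delivery channel per severity, as described in the alert lists above.
CHANNEL = {"critical": "pagerduty", "warning": "email"}


def evaluate(metrics: dict) -> list[tuple[str, str, str]]:
    """Return (severity, channel, metric) for every rule the metrics violate."""
    alerts = []
    for name, compare, threshold, severity in RULES:
        value = metrics.get(name)
        if value is not None and compare(value, threshold):
            alerts.append((severity, CHANNEL[severity], name))
    return alerts


alerts = evaluate({
    "api_error_rate": 0.08,
    "cache_hit_rate": 0.65,
    "processing_lag_hours": 2,
})
print(alerts)
```

Keeping thresholds in data rather than in branching code makes it easier to review them alongside the alert lists in this document.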
Disaster Recovery
RTO & RPO Targets
| Scenario | RTO | RPO | Recovery Strategy |
|---|---|---|---|
| Application Failure | 5 minutes | 0 | Multi-AZ auto-recovery |
| Database Failure | 15 minutes | 0 | Multi-AZ automatic failover |
| Regional Outage | 4 hours | 1 hour | Cross-region failover |
| Data Corruption | 24 hours | 24 hours | Point-in-time restore |
Backup Strategy
Cost Optimization (The Platform Itself)
AWS Lens practices what it preaches:
Compute:
- Fargate Spot for non-critical workloads (up to 70% savings)
- Compute Savings Plans for baseline capacity (20% savings)
Storage:
- S3 Intelligent-Tiering for backups
- Snowflake auto-suspend for idle warehouses
Networking:
- CloudFront CDN for static assets
- VPC endpoints to avoid NAT Gateway costs
Right-Sizing:
- Continuous monitoring and adjustment
- AWS Lens analyzes its own costs!
Result: Platform infrastructure costs < 5% of customer savings generated
Technology Stack Summary
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 19.0, TypeScript | Web application |
| API | Spring Boot 2.7.4, Java 17 | REST API |
| Processing | Apache Spark, Airflow | Data processing pipelines |
| Analytics | Snowflake | Cost data warehouse |
| Transactional | MySQL 8.0 | User/config data |
| Document Store | MongoDB | Metadata |
| Cache | Redis | High-speed caching |
| Compute | ECS Fargate, Lambda | Container orchestration, serverless |
| Storage | S3, EBS | Object and block storage |
| Networking | ALB, CloudFront, VPC | Load balancing, CDN, isolation |
| Security | WAF, Shield, KMS | Security and encryption |
| Monitoring | CloudWatch, Datadog | Observability |
Next Steps
For Architects
- Physical Architecture - Infrastructure details
- Logical Architecture - Component interactions
- Security Architecture - Security deep dive
- Integration Points - Integration patterns
For Developers
- Developer Quickstart - Get started quickly
- API Reference - API documentation
- Data Models - Database schemas
For Executives
- Executive Overview - Business value
- Key Features - Feature walkthrough
- Use Cases - Real-world examples
This solution architecture provides a high-level understanding of how AWS Lens delivers cost intelligence. For technical implementation details, refer to the architecture documents listed above.