foo.com Operational Dashboard

Service Value Chain - Last 6 hours - US-EAST-2
STEP 1
Customer Discovery
Customers discover foo.com through search engines, ads, or referrals. They land on our homepage or category pages and begin exploring our camping gear offerings.
Landing Page Traffic
4,127 /min
[Chart: Landing Page Traffic (/min), 00:00 to now; y-axis 3,000 to 5,400; reference lines at 3,200 and 3,600; legend: Current, Warn, Alert, Deployment]
Architecture: Customer traffic enters through Route53 DNS, hitting CloudFront CDN for static assets. Requests flow through Application Load Balancers to a fleet of EC2 web servers, which query RDS PostgreSQL read replicas for dynamic content. Ad network integration tracks attribution for paid traffic sources.
Components: Browser, DNS (Route53), CDN (CloudFront), ALB, Web Tier (EC2), Database (RDS), Ad Network (API)
Recent Events:
10/19, 2:15a - 2:47a - CloudFront cache hit ratio degraded to 72% due to edge location capacity, RCA-201487
10/16, 3:00a - 3:25a - RDS read replica lag increased to 8s during backup window, RCA-201461
10/15, 9:30p - 10:05p - Route53 health checks intermittent in us-east-2b, no customer impact, RCA-201455
DNS Resolution Time
Route53 DNS query resolution time from authoritative nameservers in us-east-2, P99 latency, alert threshold 22ms
12ms P99
[Sparkline: y-axis 8 to 28 ms; reference lines at 22 ms and 18 ms]
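Every P99 panel on this dashboard reduces a window of samples to one value and compares it with thresholds. The sketch below shows that step. It assumes the nearest-rank percentile definition, because the dashboard does not say which interpolation its source uses. It also treats the 18 ms line as a warning threshold, which is an assumption, since only the 22 ms alert threshold is stated. The sample data is hypothetical.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the nearest-rank percentile (pct in (0, 100]) of a non-empty sample list."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def classify(value: float, warn: float, alert: float) -> str:
    """Classify a latency reading where higher is worse."""
    if value >= alert:
        return "ALERT"
    if value >= warn:
        return "WARN"
    return "OK"


# Hypothetical DNS resolution samples: 0.1 ms to 20.0 ms (not real data).
dns_ms = [x / 10 for x in range(1, 201)]
p99 = percentile_nearest_rank(dns_ms, 99)
# warn=18.0 is an assumed threshold; alert=22.0 comes from the panel description.
print(p99, classify(p99, warn=18.0, alert=22.0))  # 19.8 WARN
```

The same two functions apply to every "P99" panel below, with that panel's thresholds substituted.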
Load Balancer Health
Percentage of ALB target group responses returning HTTP 200 (vs. non-200 status codes), aggregated across us-east-2a/b/c, target >99.6%
99.98%
[Sparkline: y-axis 99.0% to 100%; reference lines at 99.3% and 99.6%; per-AZ series 2a, 2b, 2c]
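The 99.98% figure aggregates three AZs. The sketch below shows one way to compute that aggregate, assuming per-AZ counts of HTTP 200 and total responses are available. The counts are hypothetical. Summing counts before dividing weights each AZ by its traffic. Averaging the per-AZ percentages instead would give a lightly loaded AZ the same weight as a busy one.

```python
from dataclasses import dataclass


@dataclass
class AzCounts:
    az: str
    ok_200: int  # responses with HTTP 200
    total: int   # all responses, any status code


def aggregate_success_pct(counts: list[AzCounts]) -> float:
    """Traffic-weighted success percentage: sum of 200s over sum of all responses."""
    total = sum(c.total for c in counts)
    if total == 0:
        raise ValueError("no requests in window")
    return 100.0 * sum(c.ok_200 for c in counts) / total


# Hypothetical per-AZ counts for one window (not real data).
window = [
    AzCounts("us-east-2a", ok_200=99_990, total=100_000),
    AzCounts("us-east-2b", ok_200=99_980, total=100_000),
    AzCounts("us-east-2c", ok_200=9_900, total=10_000),
]
pct = aggregate_success_pct(window)
# For comparison: an unweighted mean of per-AZ percentages.
naive = sum(100.0 * c.ok_200 / c.total for c in window) / len(window)
print(f"weighted={pct:.3f}% naive={naive:.3f}% meets_target={pct > 99.6}")
```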
CDN Cache Hit Rate
CloudFront cache hit ratio (hits/[hits+misses]) for CSS/JS/images across 40+ edge locations, target >90%
94.2%
[Sparkline: y-axis 80% to 100%; reference lines at 85% and 90%]
Homepage Load Time
P99 time-to-interactive from CloudWatch RUM JavaScript beacon, measures DOMContentLoaded + JS execution, target <1.7s
1.2s P99
[Sparkline: y-axis 0.8 s to 2.2 s; reference lines at 2.0 s and 1.7 s]
Web Server CPU
CPU utilization % averaged across 48 EC2 c5.2xlarge instances in Auto Scaling Group, scales out at 75% threshold
42%
[Sparkline: y-axis 20% to 100%; reference lines at 90% and 75%; per-AZ series 2a, 2b, 2c]
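The panel averages CPU over the 48 instances, and the group scales out at 75%. The sketch below models that decision as a simple "average above threshold" rule. The proportional capacity estimate follows target-tracking-style reasoning and assumes load spreads evenly. Both are assumptions, not the group's confirmed scaling policy.

```python
import math


def average_cpu(per_instance_pct: list[float]) -> float:
    return sum(per_instance_pct) / len(per_instance_pct)


def desired_capacity(current: int, avg_cpu: float, threshold: float = 75.0) -> int:
    """Instance count needed to bring average CPU back down to the threshold.

    Assumes total CPU demand is roughly current * avg_cpu and spreads evenly.
    This sketch never scales in; scale-in rules are out of scope.
    """
    if avg_cpu <= threshold:
        return current
    return math.ceil(current * avg_cpu / threshold)


fleet = [42.0] * 48  # the current dashboard reading, uniform for simplicity
print(desired_capacity(48, average_cpu(fleet)))  # 48: below 75%, no scale-out
print(desired_capacity(48, 90.0))                # 58: hypothetical spike to 90%
```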
Database Connections
Active PostgreSQL connections to 3 RDS read replicas (max_connections=500 per instance), PgBouncer connection pooling enabled
167
[Sparkline: y-axis 100 to 350 connections; reference lines at 320 and 280]
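With max_connections=500 on each of the 3 replicas, 167 active connections leave a lot of headroom. The arithmetic is shown below. It assumes 167 is the total across all three replicas and that connections are spread evenly, since the panel shows only the aggregate.

```python
REPLICAS = 3
MAX_CONNECTIONS_PER_REPLICA = 500  # max_connections per instance, from the panel
ACTIVE_TOTAL = 167                 # assumed: sum across all replicas

capacity = REPLICAS * MAX_CONNECTIONS_PER_REPLICA
utilization_pct = 100.0 * ACTIVE_TOTAL / capacity
per_replica_avg = ACTIVE_TOTAL / REPLICAS  # assumes an even spread

print(f"capacity={capacity} utilization={utilization_pct:.1f}% avg_per_replica={per_replica_avg:.1f}")
# capacity=1500 utilization=11.1% avg_per_replica=55.7
```

Because PgBouncer multiplexes many client connections onto fewer server connections, this count reflects server-side PostgreSQL connections, not the number of application clients.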
Ad Network Response
P99 API response time from Google Ads/Facebook Pixel attribution endpoints, timeout configured at 2000ms
187ms P99
[Sparkline: y-axis 120 to 320 ms; reference lines at 280 ms and 230 ms]
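The ad attribution call is configured with a 2000 ms timeout. The asyncio sketch below enforces such a timeout so a slow ad network cannot stall the page. `call_ad_network` is a hypothetical stand-in for the real attribution request. Returning `None` to skip attribution on timeout is an assumed fallback, not documented behavior.

```python
import asyncio
from typing import Optional

AD_TIMEOUT_S = 2.0  # 2000 ms, as configured on the panel


async def call_ad_network(delay_s: float) -> dict:
    """Hypothetical attribution call; sleeps to simulate network latency."""
    await asyncio.sleep(delay_s)
    return {"attributed": True}


async def attribute_with_timeout(delay_s: float, timeout_s: float = AD_TIMEOUT_S) -> Optional[dict]:
    try:
        return await asyncio.wait_for(call_ad_network(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Assumed fallback: skip attribution rather than delay the customer's page.
        return None


print(asyncio.run(attribute_with_timeout(0.187)))               # typical P99 case
print(asyncio.run(attribute_with_timeout(0.5, timeout_s=0.1)))  # forced timeout
```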
SSL Handshake Time
P99 TLS 1.3 handshake duration at ALB using ACM-managed certificates, includes session resumption optimization
34ms P99
[Sparkline: y-axis 20 to 80 ms; reference lines at 70 ms and 55 ms]
STEP 2
Browse and Find Equipment
Customers navigate through product categories, use search functionality, and view product detail pages to find the camping equipment they need.
Product Page Views
2,847 /min
[Chart: Product Page Views (/min), 00:00 to now; y-axis 2,000 to 3,600; reference lines at 2,200 and 2,500; legend: Current, Warn, Alert, Deployment]
Architecture: Product browsing adds Elasticsearch clusters for search, S3-backed image service with CloudFront distribution, and a recommendation engine (SageMaker endpoint). Product catalog heavily cached in Redis to minimize database load. Review service provides ratings and customer feedback.
Components: Web Tier (EC2), Search (Elasticsearch), Product DB (RDS), Cache (ElastiCache), Image Svc (S3), Review Svc (DynamoDB), Recommend (SageMaker), CDN (CloudFront)
Recent Events:
10/22, 4:20p - 4:55p - Elasticsearch cluster in us-east-2c experienced elevated search latency (500ms+ P99), RCA-201501
10/18, 11:45a - 12:30p - Redis eviction rate spike due to memory pressure, added 2 cache nodes, RCA-201472
10/17, 8:00p - 8:15p - Deployment d-2024101700 caused briefly increased latency across product and cart services, RCA-201470
10/16, 3:00a - 3:25a - RDS read replica lag increased to 8s during backup window, RCA-201461
CDN Response Time
P99 CloudFront response time for S3-backed product images (JPEG/WebP) across edge locations, origin shield enabled
18ms P99
[Sparkline: y-axis 10 to 40 ms; reference lines at 35 ms and 25 ms]
Search API Latency
P99 Elasticsearch search query latency across 6-node r5.xlarge cluster (18 shards, 2 replicas), includes faceting and highlighting
42ms P99
[Sparkline: y-axis 20 to 100 ms; reference lines at 80 ms and 60 ms; per-AZ series 2a, 2b, 2c]
Database Query Time
P99 PostgreSQL SELECT query time on product_catalog table (22M rows) against read replicas, PgBouncer connection pooling
23ms P99
[Sparkline: y-axis 15 to 45 ms; reference lines at 40 ms and 30 ms]
Image Service Avail
S3 GET request success rate for product-images bucket (Standard-IA storage class), versioning enabled, target 99.99%
99.97%
[Sparkline: y-axis 99.0% to 100%; reference lines at 99.3% and 99.6%]
Product DB Read Ops
Read IOPS on RDS db.r5.2xlarge against provisioned IOPS = 5000, includes index scans on product_catalog
2,847 /sec
[Sparkline: y-axis 2,000 to 4,400 read IOPS; reference lines at 4,000 and 3,500]
Cache Hit Ratio
ElastiCache Redis 6.2 cache hit rate (hits/[hits+misses]) on 3-node r6g.large cluster, TTL=3600s, target >95%
87.3%
[Sparkline: y-axis 70% to 100%; reference lines at 75% and 80%]
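The product catalog is "heavily cached in Redis", and this panel tracks hits/(hits+misses) with TTL=3600s. The sketch below shows the cache-aside pattern that usually sits behind such a hit ratio. It uses an in-memory dict as a stand-in for Redis and a hypothetical `load_from_db` loader. The real service's caching code is not documented here.

```python
import time
from typing import Callable


class TtlCache:
    """Cache-aside sketch: read the cache, fall back to the loader on a miss, store with a TTL."""

    def __init__(self, loader: Callable[[str], dict], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: dict[str, tuple[float, dict]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> dict:
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._loader(key)  # e.g. a product_catalog SELECT
        self._store[key] = (now + self._ttl_s, value)
        return value

    def hit_ratio(self) -> float:
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0


def load_from_db(sku: str) -> dict:  # hypothetical database loader
    return {"sku": sku}


cache = TtlCache(load_from_db)
for sku in ["tent-2p", "tent-2p", "stove", "tent-2p"]:
    cache.get(sku)
print(cache.hits, cache.misses, cache.hit_ratio())  # 2 2 0.5
```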
Review Service API
P99 API response time for review service (ECS Fargate) querying DynamoDB reviews table (on-demand capacity), includes pagination
52ms P99
[Sparkline: y-axis 30 to 130 ms; reference lines at 110 ms and 90 ms]
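The review service paginates DynamoDB results. A DynamoDB Query returns one page per call. When more data remains, the response includes a `LastEvaluatedKey`, which the next call passes back as `ExclusiveStartKey`. The sketch below follows those keys to the end. `query_page` is assumed to be any callable with the same request/response shape as boto3's `Table.query`, and the hypothetical `fake_query` lets the sketch run without AWS. A public API might instead return one page per request and hand the key back to the client as a cursor.

```python
from typing import Any, Callable, Iterator, Optional

QueryFn = Callable[..., dict[str, Any]]


def iter_reviews(query_page: QueryFn, **query_kwargs: Any) -> Iterator[dict[str, Any]]:
    """Yield every item across pages, following LastEvaluatedKey until it is absent."""
    kwargs = dict(query_kwargs)
    while True:
        response = query_page(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return
        kwargs["ExclusiveStartKey"] = last_key


def fake_query(Limit: int, ExclusiveStartKey: Optional[dict] = None, **_: Any) -> dict[str, Any]:
    """Hypothetical stand-in for Table.query over 5 reviews keyed by review_id."""
    items = [{"review_id": i, "rating": 5 - i % 2} for i in range(5)]
    start = ExclusiveStartKey["review_id"] + 1 if ExclusiveStartKey else 0
    page = items[start:start + Limit]
    response: dict[str, Any] = {"Items": page}
    if start + Limit < len(items):
        response["LastEvaluatedKey"] = {"review_id": page[-1]["review_id"]}
    return response


print([r["review_id"] for r in iter_reviews(fake_query, Limit=2)])  # [0, 1, 2, 3, 4]
```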
Recommendation Engine
P99 SageMaker real-time inference latency for collaborative-filtering recommendation model (ml.m5.xlarge endpoint), batch size=1
127ms P99
[Sparkline: y-axis 80 to 220 ms; reference lines at 200 ms and 170 ms]
STEP 3
Add Equipment to Cart
Customers select product options (size, color, quantity) and add items to their shopping cart. The cart service validates inventory and maintains session state.
Add to Cart Events
487 /min
[Chart: Add to Cart Events (/min), 00:00 to now; y-axis 350 to 650; reference lines at 380 and 420; legend: Current, Warn, Alert, Deployment]
Architecture: Cart service uses Redis/ElastiCache for session storage (sub-10ms latency) with RDS for cart persistence. Each cart operation validates against the inventory service (separate microservice with its own database), calls the pricing service for real-time price calculations including promotions, and authenticates via the auth service before modifying cart state.
Components: Web Tier (EC2), Cart Svc (ECS), Session (ElastiCache), Cart DB (RDS), Auth Svc (Lambda), Inventory Svc (ECS), Pricing Svc (Lambda), Inv DB (RDS), Price (ElastiCache)
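The architecture describes a fixed sequence for each cart operation: authenticate, validate inventory, price, and only then modify cart state. The sketch below shows that ordering. The Auth, Inventory, and Pricing services are replaced with hypothetical in-process stand-ins (`authenticate`, `check_inventory`, `price`) and made-up data. The real services are separate Lambda and ECS deployments reached over the network.

```python
from dataclasses import dataclass, field


class CartError(Exception):
    pass


@dataclass
class CartLine:
    sku: str
    qty: int
    unit_price_cents: int


@dataclass
class Cart:
    lines: list[CartLine] = field(default_factory=list)


# Hypothetical stand-in data for the downstream services.
VALID_TOKENS = {"token-abc": "user-1"}
STOCK = {"tent-2p": 3}
PRICES_CENTS = {"tent-2p": 19_999}


def authenticate(token: str) -> str:  # stands in for the Auth Svc
    user = VALID_TOKENS.get(token)
    if user is None:
        raise CartError("unauthenticated")
    return user


def check_inventory(sku: str, qty: int) -> None:  # stands in for the Inventory Svc
    if STOCK.get(sku, 0) < qty:
        raise CartError(f"insufficient stock for {sku}")


def price(sku: str) -> int:  # stands in for the Pricing Svc (promotions omitted)
    return PRICES_CENTS[sku]


def add_to_cart(cart: Cart, token: str, sku: str, qty: int) -> Cart:
    authenticate(token)        # 1. authenticate before touching cart state
    check_inventory(sku, qty)  # 2. validate inventory
    unit = price(sku)          # 3. real-time price
    cart.lines.append(CartLine(sku, qty, unit))  # 4. only now modify state
    return cart


cart = add_to_cart(Cart(), "token-abc", "tent-2p", 2)
print(cart.lines)
```

Because every check runs before the append, a failure at any step leaves the cart unchanged.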
Recent Events:
10/20, 6:10p - 6:35p - Cart service connection pool exhausted during flash sale, increased pool size, RCA-201489
10/17, 8:00p - 8:15p - Deployment d-2024101700 caused briefly increased latency across product and cart services, RCA-201470
10/17, 1:20a - 1:55a - Auth service Lambda cold starts elevated (800ms+) after deployment, RCA-201468
10/16, 3:00a - 3:25a - RDS read replica lag increased to 8s during backup window, RCA-201461
Cart API Latency
P99 REST API latency for cart operations (POST/PUT/DELETE) on ECS Fargate tasks, includes Redis and RDS round trips
67ms P99
[Sparkline: y-axis 40 to 120 ms; reference lines at 100 ms and 80 ms]
Session Store Latency
P99 ElastiCache Redis GET/SET command latency on 3-node r6g.xlarge cluster with cluster mode enabled, cross-AZ replication
8ms P99
[Sparkline: y-axis 5 to 20 ms; reference lines at 15 ms and 12 ms; per-AZ series 2a, 2b, 2c]
Database Query Time
P99 PostgreSQL INSERT/UPDATE latency on the cart_items table at the RDS primary (db.r5.xlarge), with synchronous replication to the Multi-AZ standby
24ms P99
[Sparkline: y-axis 15 to 45 ms; reference lines at 40 ms and 30 ms]
Inventory Service Avail
Availability (success rate) of the inventory check API (ECS Fargate) querying the warehouse_stock table (RDS), includes reservation locking mechanism
99.94%
[Sparkline: y-axis 99.0% to 100%; reference lines at 99.3% and 99.6%]
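The inventory check includes a "reservation locking mechanism", but the mechanism itself is not documented. In PostgreSQL, a common approach is to lock the row (`SELECT ... FOR UPDATE`) inside a transaction. The sketch below shows the same check-and-decrement invariant in-process with a lock, purely as an illustration. `StockReserver` is hypothetical.

```python
import threading


class StockReserver:
    """Check-then-reserve made atomic with a lock, so concurrent requests cannot oversell."""

    def __init__(self, stock: dict[str, int]):
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, sku: str, qty: int) -> bool:
        with self._lock:  # plays the role of a row lock on warehouse_stock
            available = self._stock.get(sku, 0)
            if available < qty:
                return False
            self._stock[sku] = available - qty
            return True

    def available(self, sku: str) -> int:
        with self._lock:
            return self._stock.get(sku, 0)


reserver = StockReserver({"stove": 10})
results: list[bool] = []
results_lock = threading.Lock()


def worker() -> None:
    ok = reserver.reserve("stove", 1)
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=worker) for _ in range(25)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results), reserver.available("stove"))  # 10 0
```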
Pricing Service
P99 Lambda function duration for pricing calculation including promotional rules engine, 1GB memory, Python 3.11 runtime
34ms P99
[Sparkline: y-axis 20 to 80 ms; reference lines at 70 ms and 55 ms]
Product Options Cache
ElastiCache Redis cache hit rate for product variant data (size/color/SKU mappings), TTL=7200s, eviction policy=allkeys-lru
92.1%
[Sparkline: y-axis 75% to 100%; reference lines at 80% and 87%]
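With `allkeys-lru`, Redis can evict any key, not just keys that have a TTL, when memory is full, and it prefers the least recently used. Redis approximates LRU by sampling keys (tuned by `maxmemory-samples`). The exact-LRU sketch below is therefore a simplified model, not Redis's algorithm. It caps entries by count rather than memory and adds the 7200 s TTL. The keys and SKUs are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class LruTtlCache:
    """Exact LRU with a per-entry TTL; evicts the least recently used entry when full."""

    def __init__(self, max_entries: int, ttl_s: float = 7200.0,
                 clock: Callable[[], float] = time.monotonic):
        self._max = max_entries
        self._ttl_s = ttl_s
        self._clock = clock
        self._data: OrderedDict[str, tuple[float, str]] = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self._clock():
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (self._clock() + self._ttl_s, value)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = LruTtlCache(max_entries=2)
cache.set("tent-2p:green", "SKU-1001")
cache.set("tent-2p:blue", "SKU-1002")
cache.get("tent-2p:green")          # green becomes most recently used
cache.set("stove:std", "SKU-2001")  # over capacity: evicts blue
print(cache.get("tent-2p:blue"), cache.get("tent-2p:green"))  # None SKU-1001
```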
Auth Service Latency
P99 Lambda JWT token validation latency (RS256 algorithm), public key cached in ElastiCache, 512MB memory allocation
15ms P99
[Sparkline: y-axis 10 to 40 ms; reference lines at 35 ms and 27 ms]
STEP 4
Complete Purchase Flow
Customers review their cart, enter shipping and payment information, and complete the checkout process. Orders are created and payment is processed.
Completed Purchases
201 /min
[Chart: Completed Purchases (/min), 00:00 to now; y-axis 150 to 350; reference lines at 180 and 200; legend: Current, Warn, Alert, Deployment]
Architecture: Checkout orchestrates multiple services. The Order service coordinates the transaction, calling the external payment gateway (Stripe/Braintree), the tax calculation API (Avalara), the shipping rate calculator, the fraud detection ML service, address validation, email confirmation (SES), and promo code validation. Payment gateway degradation impacts overall checkout health.
Components: Order Svc (Lambda), Payment (Stripe API), Tax (Avalara API), Shipping (Lambda), Fraud (SageMaker), Address (USPS API), Email (SES), Promo Svc (DynamoDB), Order DB (RDS)
Recent Events:
11/06, 8:45a - ONGOING - Stripe API elevated error rate (4.2% vs baseline 0.8%), order completion at 96.1%, monitoring
10/23, 11:05a - 2:32p - Avalara outage reduced order completion by 5%, RCA-201510
10/21, 5:15p - 7:40p - Fraud detection SageMaker endpoint timeout spike (3s+), orders delayed, RCA-201495
10/14, 2:00a - 2:45a - Order service DynamoDB throttling during deployment, RCA-201448
Payment Gateway Success
Stripe Payment Intents API success rate (HTTP 2xx / total requests), currently showing an elevated 4xx error rate, baseline target >99.2%
98.7%
[Sparkline: y-axis 97.0% to 100%; reference lines at 98.0% and 99.0%]
Order Service Latency
P99 Lambda order processing function duration (Node.js 18.x, 2GB memory), orchestrates payment/tax/fraud/shipping calls with circuit breakers
156ms P99
[Sparkline: y-axis 100 to 250 ms; reference lines at 220 ms and 180 ms; per-AZ series 2a, 2b, 2c]
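The order function calls its dependencies "with circuit breakers", but the breaker implementation and its thresholds are not documented. The sketch below shows the usual closed/open/half-open state machine. The failure threshold of 3, the 30 s cool-down, and `flaky_tax_api` are hypothetical choices for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """Closed -> open after N consecutive failures; one trial call allowed after the cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self._clock() - self.opened_at >= self.reset_after_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("failing fast; dependency marked unhealthy")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self._clock()  # (re)open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result


def flaky_tax_api() -> float:  # hypothetical failing dependency
    raise TimeoutError("tax API timed out")


now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    try:
        breaker.call(flaky_tax_api)
    except TimeoutError:
        pass
print(breaker.state)  # open
now[0] = 31.0
print(breaker.state)  # half-open
print(breaker.call(lambda: 8.25), breaker.state)  # 8.25 closed
```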
Database Query Time
P99 PostgreSQL INSERT latency on orders table (RDS primary db.r5.xlarge), includes transaction commit with ACID guarantees, Multi-AZ
26ms P99
[Sparkline: y-axis 15 to 45 ms; reference lines at 40 ms and 30 ms]
Email Service Avail
SES email delivery success rate for order confirmations, SQS queue buffering, includes bounce/complaint tracking, target >99.5%
99.98%
[Sparkline: y-axis 99.0% to 100%; reference lines at 99.3% and 99.6%]
Tax Calculation API
P99 Avalara AvaTax REST API response time for tax calculation requests, includes jurisdiction lookup and rate calculation
89ms P99
[Sparkline: y-axis 60 to 180 ms; reference lines at 160 ms and 130 ms]
Shipping Calculator
P99 Lambda shipping calculator duration calling UPS/FedEx/USPS rate APIs in parallel, returns lowest rate, 1.5GB memory
112ms P99
[Sparkline: y-axis 70 to 210 ms; reference lines at 180 ms and 150 ms]
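The shipping calculator queries the UPS, FedEx, and USPS rate APIs in parallel and returns the lowest rate. The sketch below does this with `asyncio.gather`, so total latency tracks the slowest call rather than the sum of all three. `quote` is a hypothetical stand-in and no real carrier API is called. Skipping carriers that fail is an assumption about how errors are handled.

```python
import asyncio
from decimal import Decimal
from typing import Awaitable


async def quote(carrier: str, price: str, delay_s: float, fail: bool = False) -> tuple[str, Decimal]:
    """Hypothetical carrier rate call; sleeps to simulate API latency."""
    await asyncio.sleep(delay_s)
    if fail:
        raise ConnectionError(f"{carrier} rate API unavailable")
    return carrier, Decimal(price)


async def lowest_rate(calls: list[Awaitable[tuple[str, Decimal]]]) -> tuple[str, Decimal]:
    # Run all carrier calls concurrently; failed calls come back as exception objects.
    results = await asyncio.gather(*calls, return_exceptions=True)
    quotes = [r for r in results if not isinstance(r, BaseException)]
    if not quotes:
        raise RuntimeError("no carrier returned a rate")
    return min(quotes, key=lambda q: q[1])


best = asyncio.run(lowest_rate([
    quote("UPS", "12.40", 0.05),
    quote("FedEx", "11.95", 0.08),
    quote("USPS", "9.80", 0.03, fail=True),
]))
print(best)  # ('FedEx', Decimal('11.95'))
```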
Fraud Detection
P99 SageMaker fraud detection model inference latency (XGBoost classifier, ml.c5.xlarge endpoint), risk score 0-1000
234ms P99
[Sparkline: y-axis 150 to 450 ms; reference lines at 400 ms and 330 ms]
Address Validation
P99 USPS Address Validation API response time for address standardization and ZIP+4 lookup, timeout=1500ms
67ms P99
[Sparkline: y-axis 40 to 120 ms; reference lines at 105 ms and 85 ms]
Promo Code Service
P99 promo code validation latency querying DynamoDB promo_codes table (GSI on code field), Redis cache for active codes
45ms P99
[Sparkline: y-axis 25 to 95 ms; reference lines at 80 ms and 65 ms]