// System Design Documentation
Platform Architecture Overview
Service Topology · Domains · Lifecycle · Kubernetes Infrastructure
01 — Client Layer
📱 MobileApp
iOS / Android
🖥 AdminWeb
Browser
HTTPS /api/learning/app
HTTPS /api/auth
02 — NGINX Ingress Controller
🔀 NGINX Ingress
TLS Termination · Load Balancing · Routing
Route to Services
03 — API Gateway
🔀 Gateway
Request Router + Rate Limiter
⚡ Redis
Rate Limit Cache
JWT Validate
Forward Request
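The gateway's Redis-backed rate limiting can be sketched as a fixed-window counter. This is illustrative only: a ConcurrentHashMap stands in for Redis (which would typically use INCR + EXPIRE per key), and the limit and window values are example numbers, not the platform's actual configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Fixed-window rate limiter. In production the counter lives in Redis
// (INCR + EXPIRE, key like "rl:{clientId}:{window}"); a map stands in here.
class RateLimiter {
    private final int limit;          // max requests per window
    private final long windowMillis;  // window length in milliseconds
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    RateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Key combines client id and the current window index, mimicking a
    // Redis key that expires after one window.
    boolean allow(String clientId, long nowMillis) {
        String key = clientId + ":" + (nowMillis / windowMillis);
        int count = counters.computeIfAbsent(key, k -> new AtomicInteger())
                            .incrementAndGet();
        return count <= limit;
    }
}
```

Requests over the limit would be answered with HTTP 429 before they reach any application service.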
04 — Application Services
🔐 AuthService
Authentication
🎓 LearningService
Core Business Logic
⚙️ AdminService
Admin Panel APIs
Read / Write
Read / Write
Cache / Lock
Push
Objects
05 — Infrastructure & External Services
🗄 AuthDB
PostgreSQL
🗄 LearningDB
PostgreSQL HA
⚡ Redis HA
Cache / Lock / Idempotency
💾 Longhorn
Distributed Storage
🔔 FCM
Push Notifications
🗂 S3 Storage
Object Store
Legend: NGINX Ingress · Application Service · Infrastructure · Client / Storage · External Service
📱 App APIs
ExamFlow
Rewards
Leaderboard
Content
Billing
⚙️ Admin APIs
Ranking
Rewards
Jobs
LearningService
CORE BUSINESS LOGIC
Central service handling all learning platform operations — exams, rankings, rewards, notifications, content delivery, and billing.
🎮 Game / Exam Flow
🏆 Ranking + Publish
🎁 Rewards + Claims
🔔 Notifications
📊 Leaderboard
📚 Content / Progress
💳 Billing / Subscriptions
⚡ Cache / Lock Layer
Redis Usage
Submit Idempotency
Ranking Lock
Throttling
Leaderboard Cache
Master Data Cache
🗄 Data Stores
LearningDB (PG)
Redis
S3 Object Store
🔌 External
FCM / APNS
⏱ Scheduling
Rank Exam @ 2AM
Admin Manual Trigger
Broadcast Optimization
Mobile App
1. Start attempt → API
2. Receive attemptToken + question stream
3. Submit response per question (choiceId, elapsedMs)
4. Complete attempt → API
5. Open push notification → deeplink API call
6. Receive final screen data
Learning Service
1. Receive start → idempotency lock in Redis
2. Create IN_PROGRESS attempt in DB
3. Return attemptToken + stream questions
4. Receive complete → throttle + idempotency checks
5. Persist completion + PENDING result in DB
6. Rank exam (2AM scheduled or admin trigger)
7. Compute rank + economics + prize awards
8. Publish results → send FCM notifications
Learning DB
1. Write: Create IN_PROGRESS attempt
2. Write: Persist response + timing per question
3. Write: Persist attempt completion + PENDING result
4. Write: Compute rank + economics + prize awards
5. Write: Publish results
6. Read: Fetch target data on deeplink action
Redis
1. Start idempotency lock on attempt start
2. Submit idempotency check on complete
3. Throttle check on attempt complete
4. Ranking lock during rank computation
5. Rate-limit checks at Gateway layer
6. Leaderboard + question-pool + master data cache
FCM / APNS
1. Receive push payload from LearningService
2. Dispatch winner/result notifications
3. Route to APNS for iOS devices
4. Deliver notification to client device
5. Broadcast: no per-user persistence
6. Dispatch audit maintained separately
Redis Usage
Submit/start idempotency locks
Ranking lock during computation
Request throttling at API level
Leaderboard real-time cache
Question pool + master data caching
Gateway rate-limit checks
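The submit/start idempotency locks above follow Redis "SET key NX EX ttl" semantics: the first caller acquires the key and proceeds, duplicates inside the TTL are rejected. A minimal in-memory sketch (the map and TTL handling stand in for Redis; the key format is illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Idempotency guard mirroring Redis SETNX-with-TTL semantics for
// attempt start / complete requests.
class IdempotencyGuard {
    private final Map<String, Long> locks = new ConcurrentHashMap<>();
    private final long ttlMillis;

    IdempotencyGuard(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns true only for the first call with this key inside the TTL;
    // retries and duplicate submits see false and are ignored.
    synchronized boolean tryAcquire(String key, long nowMillis) {
        Long expiresAt = locks.get(key);
        if (expiresAt != null && expiresAt > nowMillis) return false; // duplicate
        locks.put(key, nowMillis + ttlMillis);
        return true;
    }
}
```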
Reward Status Flow
ASSIGNED → REQUESTED → APPROVED / REJECTED → FULFILLED
CANCELLED: for invalidated / withdrawn flows
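The status flow can be encoded as an explicit transition table, so invalid jumps (e.g. ASSIGNED straight to FULFILLED) are rejected at the service layer. Which states may move to CANCELLED is an assumption here; the sketch allows it from any pre-fulfilment state.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Reward status flow: ASSIGNED → REQUESTED → APPROVED / REJECTED → FULFILLED,
// with CANCELLED covering the invalidated / withdrawn path.
enum RewardStatus { ASSIGNED, REQUESTED, APPROVED, REJECTED, FULFILLED, CANCELLED }

class RewardTransitions {
    private static final Map<RewardStatus, Set<RewardStatus>> ALLOWED =
        new EnumMap<>(RewardStatus.class);
    static {
        ALLOWED.put(RewardStatus.ASSIGNED,
            EnumSet.of(RewardStatus.REQUESTED, RewardStatus.CANCELLED));
        ALLOWED.put(RewardStatus.REQUESTED,
            EnumSet.of(RewardStatus.APPROVED, RewardStatus.REJECTED, RewardStatus.CANCELLED));
        ALLOWED.put(RewardStatus.APPROVED,
            EnumSet.of(RewardStatus.FULFILLED, RewardStatus.CANCELLED));
        // REJECTED, FULFILLED, CANCELLED are terminal: no outgoing transitions.
    }

    static boolean canTransition(RewardStatus from, RewardStatus to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(RewardStatus.class))
                      .contains(to);
    }
}
```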
Coupon Security Model
Coupon code not exposed in result metadata
Coupon resolved only at claim flow
Surfaced only to eligible learner
Resolved server-side — never in API response until claimed
Broadcast Notification Optimization
Per-user persistence avoided for global broadcasts
Dispatch audit maintained separately
Reduces DB write amplification at scale
FCM handles fan-out to devices directly
Ranking Computation
Scheduled daily at 2AM (automatic)
Admin manual trigger available outside schedule
Redis ranking lock prevents concurrent computation
Computes rank + economics + prize awards atomically
Publishes results → triggers FCM notifications
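The rank step itself can be sketched as a sort by score with competition ranking (equal scores share a rank; the next distinct score skips ahead). The tie-handling rule is an assumption, and the economics/prize steps are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assigns competition ranks from a map of learnerId → score.
class Ranker {
    static Map<String, Integer> rank(Map<String, Integer> scores) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(scores.entrySet());
        sorted.sort((a, b) -> b.getValue() - a.getValue()); // highest score first
        Map<String, Integer> ranks = new LinkedHashMap<>();
        int position = 0, rank = 0;
        Integer prev = null;
        for (Map.Entry<String, Integer> e : sorted) {
            position++;
            if (!e.getValue().equals(prev)) { rank = position; prev = e.getValue(); }
            ranks.put(e.getKey(), rank); // ties keep the earlier rank
        }
        return ranks;
    }
}
```

In the real service this runs only while the Redis ranking lock is held, so the 2AM job and an admin trigger can never compute concurrently.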
Request Security Flow
All requests carry JWT token
Gateway forwards to AuthService for introspection if needed
Redis rate-limit check at Gateway layer
Validated requests forwarded to LearningService
Idempotency enforced at service layer via Redis
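The ordering of those checks at the gateway can be made explicit: authentication is rejected before rate limiting, and only fully validated traffic is forwarded. Both checks are stubbed with predicates here; real implementations call AuthService and Redis.

```java
import java.util.function.Predicate;

// Gateway check order per the flow above: JWT first, then rate limit,
// then forward downstream.
class GatewayPipeline {
    private final Predicate<String> jwtValid;     // token → valid?
    private final Predicate<String> withinLimit;  // clientId → under limit?

    GatewayPipeline(Predicate<String> jwtValid, Predicate<String> withinLimit) {
        this.jwtValid = jwtValid;
        this.withinLimit = withinLimit;
    }

    // Returns the HTTP status the gateway would respond with,
    // or 0 to signal "forward to LearningService".
    int handle(String token, String clientId) {
        if (!jwtValid.test(token)) return 401;    // unauthenticated: reject first
        if (!withinLimit.test(clientId)) return 429; // over the Redis rate limit
        return 0;                                  // forward downstream
    }
}
```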
⎈ Deployment Scenarios
🔷 Single Worker Node
Dev / Staging
EKS Cluster
⚙️ Control Plane (Managed AWS)
kube-apiserver
etcd
scheduler
controller-manager
↕ kubelet
🖥 Worker Node × 1
nginx-ingress
gateway-svc
auth-svc
learning-svc
admin-svc
redis
postgres
longhorn-mgr
cert-manager
All pods run on the same node — no redundancy
Node failure = complete service outage
Longhorn stores PVs locally on single disk
Redis / PG single instance, no replication
Use case: dev, testing, cost-optimized staging
🟢 Multiple Worker Nodes
Production HA
EKS Cluster — HA
⚙️ Control Plane (Managed AWS Multi-AZ)
kube-apiserver
etcd ×3
scheduler
controller-manager
↕ kubelet per node
🖥 Worker Node 1
nginx-ingress
gateway-svc
auth-svc
redis-primary
longhorn-node
🖥 Worker Node 2
learning-svc
admin-svc
postgres-primary
redis-replica
longhorn-node
🖥 Worker Node 3
postgres-standby
redis-sentinel
learning-svc
longhorn-node
cert-manager
Pod anti-affinity spreads replicas across nodes
Node failure — workloads reschedule to healthy nodes
Longhorn replicates data across all 3 nodes (replica=3)
Redis Sentinel auto-failover — promotes replica on primary failure
Postgres HA — streaming replication + automatic failover
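The anti-affinity rule above could look like the following Deployment fragment; the image name is illustrative, and `requiredDuringScheduling` is one possible strictness choice (a `preferred` rule would allow co-location under node pressure).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: learning-svc
  namespace: learneezi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: learning-svc
  template:
    metadata:
      labels:
        app: learning-svc
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: no two learning-svc pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: learning-svc
              topologyKey: kubernetes.io/hostname
      containers:
        - name: learning-svc
          image: learneezi/learning-svc:latest  # illustrative image name
```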
🔧 Infrastructure Components
🔀 NGINX Ingress Controller
TLS termination (cert-manager + Let's Encrypt)
Routes external HTTPS → internal ClusterIP services
AWS ELB → NGINX → Services
SSL redirect + force HTTPS enforced
Namespace: ingress-nginx
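A sketch of the Ingress resource that wires this together; the hostname, issuer name, and TLS secret name are assumptions, not actual values from this deployment.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: learneezi
  namespace: learneezi
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed issuer name
    nginx.ingress.kubernetes.io/ssl-redirect: "true"   # force HTTPS
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.learneezi.ai          # illustrative hostname
      secretName: learneezi-tls     # cert-manager fills this secret
  rules:
    - host: api.learneezi.ai
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: gateway-svc   # routes API traffic to the gateway
                port:
                  number: 80
```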
💾 Longhorn Storage
Distributed block storage for Kubernetes PVs
Replication factor: 3 (across worker nodes)
Used by: PostgreSQL, Redis persistent volumes
Automatic volume snapshots + S3 backup
Web UI available via ingress
⚡ Redis HA (Sentinel)
1 Primary + 2 Replicas + 3 Sentinels
Sentinel monitors primary health continuously
Auto-promotes replica on primary failure
Spring Boot / apps connect via Sentinel endpoint
PV backed by Longhorn
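Applications reach the Sentinel endpoint through client configuration rather than the primary's address. A Spring Boot `application.yml` sketch (Spring Boot 3 property names; the master name and service addresses are illustrative):

```yaml
spring:
  data:
    redis:
      sentinel:
        master: mymaster            # monitored master set name
        nodes:                      # Sentinel endpoints, not the primary itself
          - redis-sentinel-0.redis:26379
          - redis-sentinel-1.redis:26379
          - redis-sentinel-2.redis:26379
```

The client asks Sentinel for the current primary on connect, so a failover needs no application config change.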
🗄 PostgreSQL HA
Primary + Standby with streaming replication
Automatic failover via Patroni / operator
Separate instances: AuthDB + LearningDB
Automated S3 backups (scheduled)
PV backed by Longhorn (EBS on AWS)
🎓 Learning Module
LearningService — core exam + ranking + rewards
Deployment: 2+ replicas for HA
HPA: scales on CPU/memory thresholds
Connects to LearningDB + Redis + FCM + S3
Namespace: learneezi
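The HPA behavior described above might be declared as follows; the replica ceiling and 70% CPU target are example values, not the platform's actual thresholds.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: learning-svc
  namespace: learneezi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: learning-svc
  minReplicas: 2        # floor matches the HA baseline of 2+ replicas
  maxReplicas: 6        # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```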
🔀 Gateway Module
Request routing + JWT pass-through validation
Rate limiting via Redis
Forwards to AuthService / LearningService
Deployment: 2+ replicas
Sits behind NGINX Ingress
🔐 Authentication Module
AuthService — JWT issuance + validation
Reads/writes to AuthDB (PostgreSQL)
Stateless — horizontally scalable
Deployment: 2+ replicas
Token introspection on demand
⚙️ Admin Module
AdminService — admin panel APIs
Reward approval / rejection / fulfillment
Manual ranking trigger + exam publishing
Restricted access — internal routes only
Deployment: 1–2 replicas
✅ High Availability Summary
💾 Storage HA
Longhorn replicates every volume across 3 nodes. No single disk failure can cause data loss. Automatic failover for PVs.
⚡ Redis HA
Sentinel-based failover. Primary failure detected within seconds. Replica promoted automatically. Zero manual intervention.
🗄 PostgreSQL HA
Streaming replication to standby. Automated failover promotes standby to primary. S3 backups for point-in-time recovery.
🔀 Ingress HA
NGINX controller runs as DaemonSet or multiple replicas. AWS ELB distributes traffic. TLS auto-renewed by cert-manager.
🎓 Service HA
All application services deploy with 2+ replicas. Pod anti-affinity rules prevent co-location. HPA scales on load automatically.
☁️ Node HA (Multi-AZ)
Worker nodes spread across AWS Availability Zones. AZ failure only takes down a subset. Kubernetes reschedules within minutes.