Architecture

A technical deep-dive into the NeMo AI Platform architecture: 12 distributed services, 37 specialized ML engines, and a full Kubernetes deployment with an Istio service mesh, all running on a single consumer GPU.

How It Works

A layered architecture designed for GPU efficiency and real-time processing.

Clients

  • Web Frontend (20+ pages)
  • Flutter Mobile (cross-platform)

Gateway

  • NGINX :443 (TLS Termination)
  • API Gateway :8000 (Auth / Limits)

Services

  • Gemma :8001 (LLM AI)
  • Transcription :8003 (ASR)
  • RAG :8004 (Context)
  • Emotion :8005 (Analysis)
  • ML Service :8006 (37 Engines)
  • Insights :8010 (Analytics)
  • Fiserv :8015 (Banking)
  • N8N :8011 (Automation)

Infra

  • GPU Coordinator :8002 (Queue Service)
  • Redis :6379 (Cache/Locks)
  • PostgreSQL :5432 (Persistence)

12 Microservices

10 application services + 2 infrastructure services working in concert, all Dockerized.

api-gateway :8000

Central entry point handling JWT auth, request routing, rate limiting, and WebSocket connections for real-time updates.
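
The source does not say how the gateway enforces its rate limits. One common approach is a token bucket per client, sketched below under that assumption. The class names, the `rate` and `capacity` parameters, and the idea of keying buckets by a JWT subject claim are all illustrative, not taken from the platform's code.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """One bucket per client key (hypothetically, the `sub` claim of a verified JWT)."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

A gateway handler would call `limiter.allow(client_id)` after authentication and reject the request (typically with HTTP 429) when it returns `False`. The injectable `clock` exists only to make the behavior easy to test.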

gemma-service :8001

Conversational AI powered by Gemma 3 4B (Q4 quantized). Integrates RAG context, emotional awareness, and business analysis.

queue-service (GPU Coord) :8002

VRAM orchestration via Redis semaphores. Manages model loading, memory allocation, and preemptive pausing between Gemma and ASR.
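
To illustrate the kind of coordination described here, the sketch below models the GPU as a single lease that either Gemma or ASR can hold at a time. It is an assumption-laden simplification: the key name `gpu:lease`, the `GpuLease` class, and the binary (one-holder) semaphore are hypothetical, and `InMemoryRedis` is a stand-in that mimics only the `set(..., nx=True, ex=...)`, `get`, and `delete` calls of a Redis client so the example runs without a server.

```python
import uuid


class InMemoryRedis:
    """Minimal stand-in for a Redis client (assumption: only SET NX, GET, DELETE are needed).

    The `ex` expiry argument is accepted but not enforced here.
    """

    def __init__(self):
        self.data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.data:
            return None  # key already held, mirrors SET ... NX failing
        self.data[name] = value
        return True

    def get(self, name):
        return self.data.get(name)

    def delete(self, name):
        return 1 if self.data.pop(name, None) is not None else 0


class GpuLease:
    KEY = "gpu:lease"  # hypothetical key name

    def __init__(self, client, owner: str, ttl_seconds: int = 60):
        self.client = client
        self.owner = owner
        self.ttl = ttl_seconds
        self.token = None

    def acquire(self) -> bool:
        # A unique token lets release() verify that we still hold the lease.
        token = f"{self.owner}:{uuid.uuid4().hex}"
        if self.client.set(self.KEY, token, nx=True, ex=self.ttl):
            self.token = token
            return True
        return False

    def release(self) -> bool:
        # Only the current holder may release. Against a real Redis server this
        # check-and-delete should be made atomic (for example with a Lua script),
        # and GET returns bytes unless the client decodes responses.
        if self.token is not None and self.client.get(self.KEY) == self.token:
            self.client.delete(self.KEY)
            self.token = None
            return True
        return False
```

In this model, a service that fails to acquire the lease would wait or ask the holder to unload its model first; how the real coordinator performs preemptive pausing is not described in the source.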

transcription-service :8003

Real-time ASR using Parakeet RNNT 0.6B. Includes speaker diarization (Sortformer) and streaming audio processing.

rag-service :8004

Retrieval-augmented generation with FAISS vector storage. Handles conversation memory and knowledge bases.
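
The retrieval step can be sketched as a nearest-neighbor search over embedding vectors followed by prompt assembly. The example below uses exact brute-force cosine similarity in plain Python rather than FAISS, so it shows the idea and not the service's actual index; the toy two-dimensional vectors, function names, and prompt layout are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, docs, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine(query, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question, passages):
    """Prepend retrieved passages to the question as context for the LLM."""
    context = "\n".join(f"- {text}" for text, _ in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy knowledge base: (text, embedding) pairs with made-up vectors.
docs = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.0, 1.0]),
    ("returns", [0.9, 0.1]),
]
hits = top_k([1.0, 0.0], docs, k=2)
prompt = build_prompt("How do refunds work?", hits)
```

In a real deployment the embeddings would come from an embedding model and the search from a vector index; the brute-force loop here scales linearly with the number of documents.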

emotion-service :8005

Multi-dimensional emotion analysis using fine-tuned DistilRoBERTa. Provides sentiment scoring and temporal tracking.
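
The source does not specify how temporal tracking is computed. One simple possibility is exponential smoothing of per-utterance sentiment scores, shown here as a sketch; the function name and the `alpha` parameter are assumptions.

```python
def track_sentiment(scores, alpha=0.5):
    """Exponentially smooth a sequence of per-utterance sentiment scores.

    A higher alpha weights recent utterances more heavily.
    """
    smoothed = []
    current = None
    for score in scores:
        current = score if current is None else alpha * score + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


trend = track_sentiment([1.0, 0.0, 1.0], alpha=0.5)
```

The smoothed series damps single-utterance swings, which makes a sustained shift in mood easier to spot than in the raw scores.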

ml-service :8006

"System 2" thinking with 37 specialized engines. Titan AutoML, Oracle Causal, Newton Symbolic, Salesforce CRM analytics, and more.

insights-service :8010

Analytics and AutoML experiments. Revenue forecasting, anomaly detection, and business intelligence visualization.

nginx :443

HTTPS reverse proxy with TLS 1.3 termination. Handles SSL certificates, security headers, and load balancing.

n8n-service :8011

Voice command integration and smart home automation. Connects to Voice Monkey for Alexa control.

fiserv-service :8015

Banking hub integrating with Fiserv API for account data, transactions, and financial analytics.

redis + postgres Infra

Persistence layer. Redis for high-speed caching and locks; PostgreSQL for durable data and task queues.
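
For reference, the ports listed above can be collected into one lookup table. The sketch below assumes each service is reachable under its own name as a hostname (as is typical with Docker Compose networks or Kubernetes ClusterIP services); the `/health` path is hypothetical.

```python
# Ports as listed in the service descriptions above.
HTTP_SERVICES = {
    "nginx": 443,
    "api-gateway": 8000,
    "gemma-service": 8001,
    "queue-service": 8002,
    "transcription-service": 8003,
    "rag-service": 8004,
    "emotion-service": 8005,
    "ml-service": 8006,
    "insights-service": 8010,
    "n8n-service": 8011,
    "fiserv-service": 8015,
}

# Data stores speak their own protocols, so they are kept separate.
DATA_STORES = {"redis": 6379, "postgres": 5432}


def base_url(service: str) -> str:
    """Build a base URL, assuming the service name doubles as its hostname."""
    port = HTTP_SERVICES[service]
    scheme = "https" if port == 443 else "http"
    return f"{scheme}://{service}:{port}"


def health_urls(path: str = "/health") -> dict:
    """Map each HTTP service to a health-check URL (the path is an assumption)."""
    return {name: base_url(name) + path for name in HTTP_SERVICES}
```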

37 Specialized ML Engines

A comprehensive suite of analytical engines powered by the ML Service.

Core AI Models

Gemma 3 4B
Parakeet ASR
TitaNet Speaker
DistilRoBERTa (Emotion)
MiniLM Embeddings
Sortformer Diarization

AutoML & Predictive (Titan Series)

Titan AutoML
Oracle Causal
Newton Symbolic
Chronos Temporal
Galileo Geometric
Scout Discovery
Predictive Engine
Statistical Engine
Trend Engine

Financial Analysis

Revenue Forecast
Cash Flow
Budget Variance
Profit Margin
Cost Optimization
Pricing Strategy
ROI Prediction
Inventory Optimization

Advanced Analytics

Chaos Non-Linear
Anomaly Detector
Clustering Engine
Customer LTV
Market Basket
Spend Pattern
Deep Feature
Universal Graph
Flash Inference
Mirror Synthetic
RAG Evaluation
Quality Insights

Salesforce CRM Analytics

Churn Prediction
Next Best Action
Deal Velocity
Competitive Intel
Customer 360
Lead Scoring
Opportunity Score

Kubernetes Ready

Production-ready K8s manifests with NVIDIA GPU passthrough.

Manifests

  • namespace.yaml
  • secrets.yaml (12 keys)
  • services.yaml (ClusterIP)
  • deployments.yaml (842 lines)
  • ingress.yaml (NGINX)
  • nvidia-device-plugin.yaml

GPU Passthrough

  • NVIDIA Device Plugin DaemonSet
  • Direct device mounts (/dev/nvidia*)
  • CUDA runtime in containers
  • Resource limits: nvidia.com/gpu: 1
  • Shared GPU across pods
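
As an illustration of the `nvidia.com/gpu: 1` resource limit, the sketch below builds a Deployment manifest as a Python dictionary and serializes it to JSON, which `kubectl apply -f` accepts alongside YAML. The helper name, image name, and labels are hypothetical and do not reproduce the project's deployments.yaml.

```python
import json


def gpu_deployment(name: str, image: str, port: int, gpus: int = 1) -> dict:
    """Return a minimal Deployment that requests GPUs via the NVIDIA device plugin."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical image name
                            "ports": [{"containerPort": port}],
                            # Extended resource exposed by the NVIDIA device plugin.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ]
                },
            },
        },
    }


manifest = gpu_deployment("gemma-service", "example/gemma-service:latest", 8001)
manifest_json = json.dumps(manifest, indent=2)
```

How several pods share the one physical GPU is not spelled out in the source; the device plugin's time-slicing configuration is one option, and the direct `/dev/nvidia*` mounts listed above are another.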

Kustomize Overlays

  • base/ (common resources)
  • overlays/local/ (dev)
  • overlays/install/ (prod)
  • Health check probes
  • Init containers for setup

Istio Service Mesh

  • VirtualService routing
  • DestinationRule policies
  • Gateway TLS termination
  • mTLS between services
  • Traffic management
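
Two of the mesh features above, VirtualService routing and mTLS between services, might look roughly like the following resources. This is a sketch built as Python dictionaries; the resource names, the gateway name, the catch-all host, and the namespace are assumptions rather than the project's actual manifests.

```python
import json


def virtual_service(name: str, host: str, port: int, gateway: str) -> dict:
    """Route all HTTP traffic arriving at `gateway` to one backend service."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": ["*"],
            "gateways": [gateway],
            "http": [
                {
                    "match": [{"uri": {"prefix": "/"}}],
                    "route": [
                        {"destination": {"host": host, "port": {"number": port}}}
                    ],
                }
            ],
        },
    }


def strict_mtls(namespace: str) -> dict:
    """Require mutual TLS for all workloads in a namespace."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


# Hypothetical names: route the mesh gateway to the API gateway on :8000.
vs = virtual_service("api-routes", "api-gateway", 8000, "nemo-gateway")
pa = strict_mtls("nemo")
resources_json = json.dumps([vs, pa], indent=2)
```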