🎯 Task 9 objectives:
📥 Inputs from previous tasks:
API Deployment is the step that deploys the containerized prediction service to Kubernetes (EKS). It ensures the application runs as a microservice, scales automatically, and stays highly available.
EKS Deployment Architecture:
Client → ALB → EKS Service → API Pods → S3 Models
↓
Auto-scaling (HPA)
Components:
Five main files need to be created:
namespace.yaml - creates the mlops namespace
serviceaccount.yaml - IRSA service account
deployment.yaml - API application with SageMaker Registry
service.yaml - LoadBalancer service
hpa.yaml - auto-scaling
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mlops
  labels:
    app.kubernetes.io/name: retail-api
---
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: retail-api-sa
  namespace: mlops
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::842676018087:role/eks-sagemaker-access-role
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: retail-api
  namespace: mlops
  labels:
    app: retail-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: retail-api
  template:
    metadata:
      labels:
        app: retail-api
    spec:
      serviceAccountName: retail-api-sa
      containers:
        - name: retail-api
          image: 842676018087.dkr.ecr.ap-southeast-1.amazonaws.com/mlops/retail-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: PORT
              value: "8000"
            - name: AWS_DEFAULT_REGION
              value: "ap-southeast-1"
            - name: MODEL_PACKAGE_GROUP
              value: "retail-price-sensitivity-models"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: retail-api-service
  namespace: mlops
  labels:
    app: retail-api
spec:
  selector:
    app: retail-api
  ports:
    - name: http
      port: 80
      targetPort: 8000
  type: LoadBalancer
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: retail-api-hpa
  namespace: mlops
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: retail-api
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
# Deploy all manifests in order
kubectl apply -f namespace.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml
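`kubectl apply` returns as soon as the API server accepts each object, not when the pods are actually running. A minimal sketch, assuming the five manifest files sit in the current directory, wraps the same sequence in a function and then blocks on the rollout. The function name `deploy_all`, the `KUBECTL` override variable, and the 180-second timeout are illustrative choices, not part of the original workshop.

```shell
# Apply the manifests in dependency order, then wait for the rollout.
# deploy_all and KUBECTL are hypothetical helpers; KUBECTL=echo gives a dry run.
deploy_all() {
  kubectl_bin="${KUBECTL:-kubectl}"
  for manifest in namespace.yaml serviceaccount.yaml deployment.yaml service.yaml hpa.yaml; do
    "$kubectl_bin" apply -f "$manifest" || return 1
  done
  # Block until every replica of the Deployment is ready (or the timeout expires).
  "$kubectl_bin" rollout status deployment/retail-api -n mlops --timeout=180s
}
```

Running `KUBECTL=echo deploy_all` prints the arguments that would be passed to kubectl without touching the cluster.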
# Check pod status
kubectl get pods -n mlops
# Check the service and load balancer
kubectl get svc -n mlops
# Check the horizontal pod autoscaler
kubectl get hpa -n mlops
# Check pod logs
kubectl logs -l app=retail-api -n mlops --tail=50
# Get the LoadBalancer URL
kubectl get svc retail-api-service -n mlops -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
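The hostname field stays empty for a short while after the Service is created, so the jsonpath query can return nothing on the first try. The following sketch polls the same query until a hostname appears. `wait_for_lb`, the `KUBECTL` override, and the default of 30 attempts at 10-second intervals are assumptions made for illustration.

```shell
# Poll the Service until AWS has assigned a hostname to the load balancer.
# wait_for_lb and KUBECTL are illustrative names, not part of the original guide.
wait_for_lb() {
  kubectl_bin="${KUBECTL:-kubectl}"
  attempts="${1:-30}"   # number of polls before giving up
  delay="${2:-10}"      # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    host="$("$kubectl_bin" get svc retail-api-service -n mlops \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)"
    if [ -n "$host" ]; then
      printf '%s\n' "$host"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A typical use would be `LB_HOST="$(wait_for_lb)"`, after which `$LB_HOST` can stand in for `[LOAD_BALANCER_URL]` in the curl commands.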
# Test health check endpoint
curl http://[LOAD_BALANCER_URL]/health
# Test API documentation
curl http://[LOAD_BALANCER_URL]/docs
# Test the prediction endpoint with the real data format
curl -X POST http://[LOAD_BALANCER_URL]/predict \
  -H "Content-Type: application/json" \
  -d '{
    "BASKET_SIZE": "M",
    "BASKET_TYPE": "MIXED",
    "STORE_REGION": "LONDON",
    "STORE_FORMAT": "LS",
    "SPEND": 125.50,
    "QUANTITY": 3,
    "PROD_CODE_20": "FOOD",
    "PROD_CODE_30": "FRESH"
  }'
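For repeated testing it can be handier to keep the request body in a file and look only at the HTTP status code. The sketch below assumes the same eight fields as the request above; `write_payload`, `predict_status`, and the default file name `payload.json` are hypothetical helpers, and the base URL is passed in by the caller.

```shell
# Write the sample request body to a file so it can be reused with `curl -d @file`.
# write_payload and predict_status are illustrative helper names.
write_payload() {
  cat > "${1:-payload.json}" <<'EOF'
{
  "BASKET_SIZE": "M",
  "BASKET_TYPE": "MIXED",
  "STORE_REGION": "LONDON",
  "STORE_FORMAT": "LS",
  "SPEND": 125.50,
  "QUANTITY": 3,
  "PROD_CODE_20": "FOOD",
  "PROD_CODE_30": "FRESH"
}
EOF
}

# Print only the HTTP status code of a prediction request.
# $1 is the base URL (for example http://<load-balancer-hostname>), $2 the payload file.
predict_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST "$1/predict" \
    -H "Content-Type: application/json" \
    -d @"${2:-payload.json}"
}
```

After `write_payload`, a call such as `predict_status "http://$LB_HOST"` should print `200` when the endpoint accepts the payload; any other code points to a validation or routing problem.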
AWS Console → EKS → Clusters → mlops-retail-cluster

mlops-retail-cluster → Resources → All namespaces → Filter: mlops

Resources → Deployments → retail-api


If Pods are stuck in Pending:
If the LoadBalancer times out or refuses connections:
Check Events in the EKS Console:
Resources → Events → Filter namespace: mlops
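The same checks can be run from the command line. This is a first-pass sketch rather than a full troubleshooting procedure; `diagnose_mlops` and the `KUBECTL` override are illustrative names, while the resource names come from the manifests in this task.

```shell
# Collect first-pass diagnostics for the mlops namespace.
# diagnose_mlops and KUBECTL are hypothetical helpers.
diagnose_mlops() {
  kubectl_bin="${KUBECTL:-kubectl}"
  # Pods stuck in Pending: list them, then show scheduling details and per-pod events.
  "$kubectl_bin" get pods -n mlops --field-selector=status.phase=Pending
  "$kubectl_bin" describe pods -n mlops -l app=retail-api
  # Load balancer timeouts: an empty endpoints list means no pod has passed its readiness probe.
  "$kubectl_bin" get endpoints retail-api-service -n mlops
  # Namespace events, oldest first.
  "$kubectl_bin" get events -n mlops --sort-by=.lastTimestamp
}
```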
# Port-forward the service to localhost (if the LoadBalancer is not ready yet)
kubectl port-forward service/retail-api-service 8080:80 -n mlops
# Test through the port forward
curl http://localhost:8080/health
# Check the model info endpoint
curl http://[LOAD_BALANCER_URL]/model/info
# Check model metrics from the SageMaker Registry
curl http://[LOAD_BALANCER_URL]/model/metrics
# Expected response: Accuracy 84.7%, F1-Score 83.2% from the Registry
# Load test with the correct data format
for i in {1..100}; do
  curl -X POST http://[LOAD_BALANCER_URL]/predict \
    -H "Content-Type: application/json" \
    -d '{"BASKET_SIZE":"M","BASKET_TYPE":"MIXED","STORE_REGION":"LONDON","STORE_FORMAT":"LS","SPEND":125.50,"QUANTITY":3,"PROD_CODE_20":"FOOD","PROD_CODE_30":"FRESH"}' &
done
wait # block until all background requests have finished
# Watch HPA scaling
kubectl get hpa retail-api-hpa -n mlops -w
# Watch pods being scaled up (from 2 to a maximum of 5)
kubectl get pods -n mlops -w
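To make sense of the replica counts seen while watching, it helps to know the core HPA rule: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. The sketch below applies that rule with this task's values (target 60%, 2 to 5 replicas). It deliberately ignores the controller's tolerance band and stabilization windows, and `hpa_desired` is a hypothetical helper name.

```shell
# desired = ceil(current_replicas * current_cpu_percent / target_percent),
# clamped to [min, max]. Defaults mirror hpa.yaml: target 60, min 2, max 5.
hpa_desired() {
  current="$1"; cpu="$2"
  target="${3:-60}"; min="${4:-2}"; max="${5:-5}"
  # Integer ceiling division.
  desired=$(( (current * cpu + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired="$min"
  [ "$desired" -gt "$max" ] && desired="$max"
  echo "$desired"
}
```

For example, `hpa_desired 2 90` prints `3`: two pods averaging 90% CPU against a 60% target call for a third pod.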
| Component | Estimate | Notes |
|---|---|---|
| EKS Pods (2 replicas on a Spot node) | ~0.012 USD/h | Compute cost |
| ALB/NLB (public) | ~0.02 USD/h | Enabled only during the demo |
| Total (1 h demo) | ≈ 0.03–0.04 USD | Very low if shut down right after the demo |
The estimate assumes t3.medium Spot instances and an NLB in the ap-southeast-1 region. Actual cost may vary with configuration and usage time.
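The total is simply the hourly rates added together and multiplied by the demo duration. A small sketch of that arithmetic, using the two estimated rates from the table as defaults; `demo_cost` is a hypothetical helper and the rates are the table's estimates, not quoted AWS prices.

```shell
# Demo cost in USD = hours * (compute rate + load balancer rate).
# Defaults are the per-hour estimates from the table above.
demo_cost() {
  awk -v h="$1" -v compute="${2:-0.012}" -v lb="${3:-0.02}" \
    'BEGIN { printf "%.3f\n", h * (compute + lb) }'
}
```

`demo_cost 1` prints `0.032`, which falls inside the 0.03–0.04 USD range quoted for a one-hour demo.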
🎯 Task 9 Complete - API Deployment on EKS
# Delete all resources in the mlops namespace
kubectl delete namespace mlops
# Or delete each resource individually
kubectl delete deployment retail-api -n mlops
kubectl delete service retail-api-service -n mlops
kubectl delete hpa retail-api-hpa -n mlops
kubectl delete serviceaccount retail-api-sa -n mlops
# Check that the LoadBalancer has been deleted
aws elbv2 describe-load-balancers --query 'LoadBalancers[?contains(LoadBalancerName, `k8s-mlops`)].LoadBalancerArn'
# List images in the repository
aws ecr describe-images --repository-name mlops/retail-api --region ap-southeast-1
# Delete a specific image tag
aws ecr batch-delete-image \
  --repository-name mlops/retail-api \
  --image-ids imageTag=v3 \
  --region ap-southeast-1
# Delete all images
# The command substitution is left unquoted on purpose so that each
# imageDigest=... entry is passed as a separate argument.
aws ecr batch-delete-image \
  --repository-name mlops/retail-api \
  --image-ids $(aws ecr describe-images --repository-name mlops/retail-api --region ap-southeast-1 --query 'imageDetails[].imageDigest' --output text | tr '\t' '\n' | sed 's/.*/imageDigest=&/') \
  --region ap-southeast-1
# Check that no pods remain
kubectl get pods -n mlops
# Check that no services remain
kubectl get svc -n mlops
# Check that the LoadBalancer has been terminated
aws elbv2 describe-load-balancers --query 'LoadBalancers[?contains(LoadBalancerName, `k8s-mlops`)]'
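Namespace deletion is asynchronous, so the checks above may still list resources for a minute or two. A minimal sketch that reports whether the namespace object itself is gone, relying only on the exit code of `kubectl get namespace`; `namespace_gone` and the `KUBECTL` override are illustrative names.

```shell
# Report whether the mlops namespace still exists.
# kubectl exits non-zero when the namespace is not found.
namespace_gone() {
  kubectl_bin="${KUBECTL:-kubectl}"
  if "$kubectl_bin" get namespace mlops >/dev/null 2>&1; then
    echo "namespace mlops still exists (it may be terminating)"
    return 1
  fi
  echo "namespace mlops is gone"
}
```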
| Resource Type | Request | Limit | Cost Impact |
|---|---|---|---|
| CPU | 250m | 500m | ~25% of node CPU |
| Memory | 512Mi | 1Gi | ~25% of node memory |
| Storage (EBS) | - | - | From EBS pricing |
With a t2.micro node (1 vCPU, 1 GB RAM):
| Load Balancer Type | Price (USD/hour) | Price (USD/month) | Data Processing |
|---|---|---|---|
| Classic LB | $0.025 | $18.25 | $0.008/GB |
| Application LB | $0.0225 | $16.43 | $0.008/LCU-hour |
| Network LB | $0.0225 | $16.43 | $0.006/NLCU-hour |
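The monthly column follows from the hourly price multiplied by roughly 730 hours per month, with capacity-unit charges added on top for ALB and NLB. The sketch below reproduces that arithmetic; `lb_monthly` is a hypothetical helper, the 730-hour month is an assumption, and the average LCU count is something you would have to measure for your own traffic.

```shell
# Monthly load balancer cost in USD:
#   (hourly base rate + capacity-unit rate * average capacity units) * hours per month
# $1 base rate/hour, $2 LCU or NLCU rate/hour, $3 average capacity units, $4 hours (default 730)
lb_monthly() {
  awk -v rate="${1:-0.0225}" -v unit="${2:-0.008}" -v avg="${3:-0}" -v hours="${4:-730}" \
    'BEGIN { printf "%.3f\n", (rate + unit * avg) * hours }'
}
```

`lb_monthly 0.0225 0.008 0` prints `16.425` (the table's $16.43 after rounding), and an ALB averaging one LCU, `lb_monthly 0.0225 0.008 1`, comes to `22.265`.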
| Service Type | AWS Resource | Monthly Cost | Use Case |
|---|---|---|---|
| ClusterIP | None | $0 | Internal communication |
| NodePort | EC2 Security Groups | $0 | Development testing |
| LoadBalancer | ELB/ALB/NLB | $16.43+ | Production external access |
| ExternalName | None | $0 | External service mapping |
Horizontal Pod Autoscaler (HPA):
Cluster Autoscaler:
Basic Deployment (2 replicas):
| Component | Quantity | Resource Usage | Monthly Cost |
|---|---|---|---|
| API Pods | 2 replicas | 500m CPU, 1Gi RAM | Included in node cost |
| LoadBalancer Service | 1 ALB | Base + LCU usage | $16.43 + usage |
| HPA | 1 autoscaler | Controller only | $0 |
| Ingress | Optional | Same ALB | $0 additional |
| Total | | | ~$16.43 + LCU |
With Auto-scaling (2-5 replicas):
| Scenario | Pods | Node Requirements | Additional Cost |
|---|---|---|---|
| Low load | 2 pods | 2x t2.micro (free) | $0 |
| Medium load | 3-4 pods | 1x t3.small | $15.18 |
| High load | 5 pods | 1x t3.medium | $30.37 |
| Transfer Type | Cost | Use Case |
|---|---|---|
| Pod-to-Pod (same AZ) | Free | Internal communication |
| Pod-to-Pod (cross-AZ) | $0.01/GB | Multi-AZ deployment |
| LoadBalancer to Internet | $0.12/GB | API responses to clients |
| VPC Endpoints | Free for the S3 gateway endpoint; interface endpoints (such as ECR) are billed per hour and per GB | S3/ECR access |
# Example PVC for model storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
  namespace: mlops
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3
Storage pricing:
Resource Right-sizing:
resources:
  requests:
    memory: "256Mi" # Start smaller
    cpu: "100m" # Minimal CPU request
  limits:
    memory: "512Mi" # Reasonable limit
    cpu: "250m" # Allow bursting
Efficient Pod Scheduling:
# Node affinity for cost optimization
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["t3.micro", "t3.small"] # Prefer cheaper instances
LoadBalancer Optimization:
# Use a single ALB for multiple services
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-alb
  annotations:
    kubernetes.io/ingress.class: alb
spec:
  rules:
    - http:
        paths:
          - path: /api/v1/*
            pathType: ImplementationSpecific # pathType is required in networking.k8s.io/v1
            backend:
              service:
                name: retail-api-service
                port:
                  number: 80
          - path: /admin/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: admin-service
                port:
                  number: 80
# Monitor pod resource usage
kubectl top pods -n mlops
# Check actual vs requested resources
kubectl describe pod <pod-name> -n mlops
# Monitor HPA behavior
kubectl get hpa -w -n mlops
# Check LoadBalancer usage
aws elbv2 describe-load-balancers --names <alb-name>
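Comparing actual usage with the requested amount by eye gets tedious across several pods. The sketch below turns `kubectl top pods` output into a percentage of the CPU request. It assumes metrics-server is installed, that CPU is reported in millicores (values like `120m`), and that every pod requests the same amount; `cpu_vs_request` is a hypothetical helper name.

```shell
# Reads `kubectl top pods --no-headers` output on stdin (columns: NAME CPU MEMORY)
# and prints each pod's CPU usage as a percentage of the request in millicores ($1).
cpu_vs_request() {
  awk -v req="${1:-250}" '{
    cpu = $2
    sub(/m$/, "", cpu)   # "120m" -> "120"
    printf "%s %d%%\n", $1, (cpu * 100) / req
  }'
}
```

It would be used as `kubectl top pods -n mlops --no-headers | cpu_vs_request 250`. Pods that sit far below 100% over time are candidates for a smaller request.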
Cost tracking commands:
# ELB costs (the End date is exclusive, so this range covers all of January 2024)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Load Balancing"]}}'
# EC2 costs for nodes
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE
💰 Cost summary for Task 8:
Next Step: Task 09: Elastic Load Balancing