
# Ollama with WebUI on AWS ECS with GPU Support
This project deploys Ollama and Open WebUI on AWS ECS with GPU support, letting you run any large language model supported by Ollama and serve it both as an API endpoint and through a powerful web UI.

## Architecture Overview
The infrastructure consists of:
- ECS cluster running on EC2 instances with GPU support (g4dn.xlarge)
- Ollama service for running LLMs
- Open WebUI service for the user interface
- Application Load Balancer for routing traffic
- Service Connect for service discovery
- CloudWatch for logging
```mermaid
graph TD
    Internet[Internet] --> ALB[Application Load Balancer]
    subgraph "Public Subnets"
        ALB
    end
    subgraph "Private Subnets"
        subgraph "ECS Cluster"
            EC2[EC2 with GPU]
            subgraph "Ollama Service"
                OC[Ollama Container]
            end
            subgraph "WebUI Service"
                WC[WebUI Container]
            end
        end
    end
    ALB -- "Default Route" --> WC
    ALB -- "Host Header: api.*, ollama.*" --> OC
    WC -- "Service Connect: ollama.internal:11434" --> OC
    style EC2 fill:#f9d,stroke:#333,stroke-width:2px
    style OC fill:#bbf,stroke:#333,stroke-width:1px
    style WC fill:#bfb,stroke:#333,stroke-width:1px
```
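
The ALB sends requests whose Host header matches `api.*` or `ollama.*` to the Ollama container and everything else to the WebUI. A quick way to verify this routing once the stack is up, using placeholder values for the ALB DNS name and hostname:

```bash
# Placeholders: substitute your ALB DNS name and a hostname matching the ollama.* rule.
ALB_DNS="<your-alb-dns-name>"

# Default route -> Open WebUI
curl -I "http://${ALB_DNS}/"

# Host-header route -> Ollama API (lists the models currently available)
curl -H "Host: ollama.example.com" "http://${ALB_DNS}/api/tags"
```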

## Features
- GPU Acceleration: Uses g4dn.xlarge instances with NVIDIA T4 GPUs
- Secure Architecture: Services run in private subnets with public access through ALB
- Service Discovery: Uses ECS Service Connect for internal communication
- Logging: CloudWatch logs with 1-day retention

## Prerequisites
- AWS Account
- VPC with public and private subnets
- NAT Gateway for outbound internet access from private subnets
- Terraform installed

## Deployment

- Clone this repository
- Update `terraform.tfvars` with your configuration (see the example after this list)
- Initialize Terraform: `terraform init`
- Apply: `terraform apply`
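
A minimal `terraform.tfvars` could look like the sketch below. The variable names are illustrative assumptions; match them to the variables this repository actually declares (e.g. in `variables.tf`).

```bash
# Illustrative sketch only: variable names are assumptions, adjust to this repo's variables.
cat > terraform.tfvars <<'EOF'
aws_region         = "us-east-1"
vpc_id             = "vpc-0123456789abcdef0"
public_subnet_ids  = ["subnet-aaaa1111", "subnet-bbbb2222"]
private_subnet_ids = ["subnet-cccc3333", "subnet-dddd4444"]
instance_type      = "g4dn.xlarge"
EOF
```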

## Accessing the WebUI
After deployment, you can access the WebUI using the URL provided in the Terraform outputs:
```bash
terraform output webui_url
```
Click the link to open the WebUI, where you can register as an admin and start using the app.
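
As a quick sanity check that the ALB is answering before opening a browser (this assumes the output is named `webui_url`, as above):

```bash
# Fetch the output value and probe the WebUI through the ALB.
curl -I "$(terraform output -raw webui_url)"
```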

## Loading Models
To load a model, go to `<alb-url>/admin/settings` and open the Models tab on the left. From the Models page, click the download button and select the model you'd like to download.
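
Alternatively, models can be pulled through the Ollama API behind the ALB's host-header rule. The hostname and ALB DNS name below are placeholders, and the model name is only an example:

```bash
# Pull a model via the Ollama REST API (hostname, ALB DNS name, and model are placeholders).
curl -H "Host: ollama.example.com" \
  "http://<your-alb-dns-name>/api/pull" \
  -d '{"model": "llama3.2"}'
```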

## Scaling
To adjust the number of instances:
```bash
# Update min, max, and desired capacity
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-gpu-asg \
  --min-size 2 --max-size 4 --desired-capacity 2
```
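
Note that scaling the Auto Scaling group only changes EC2 capacity; the number of running tasks is set on the ECS services. The cluster name below is a placeholder and the service name is an assumption based on the log group names used in this project:

```bash
# Placeholder cluster name; service name assumed from the /ecs/ollama-service log group.
aws ecs update-service \
  --cluster <your-ecs-cluster> \
  --service ollama-service \
  --desired-count 2
```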

## Troubleshooting

### Common Issues
- GPU not detected: Check the NVIDIA driver installation with `nvidia-smi` (see the command sketches after this list)
- Services not starting: Check the ECS service events in the AWS Console
- Cannot connect to WebUI: Verify security groups and load balancer health checks
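
Two command sketches for the first two issues. The first assumes ECS Exec (`enableExecuteCommand`) is turned on for the service and that the container is named `ollama`; cluster, service, and task IDs are placeholders:

```bash
# Run nvidia-smi inside the Ollama container (requires ECS Exec to be enabled).
aws ecs execute-command \
  --cluster <your-ecs-cluster> \
  --task <task-id> \
  --container ollama \
  --interactive \
  --command "nvidia-smi"

# Inspect recent service events when tasks fail to start.
aws ecs describe-services \
  --cluster <your-ecs-cluster> \
  --services ollama-service \
  --query 'services[0].events[:10]'
```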

### Viewing Logs
```bash
# View WebUI logs
aws logs get-log-events --log-group-name /ecs/webui-service --log-stream-name <LOG_STREAM>

# View Ollama logs
aws logs get-log-events --log-group-name /ecs/ollama-service --log-stream-name <LOG_STREAM>
```
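
If you don't know the log stream name, list the most recently written stream, or simply tail the whole group (`aws logs tail` requires AWS CLI v2):

```bash
# Find the most recent log stream for the Ollama service.
aws logs describe-log-streams \
  --log-group-name /ecs/ollama-service \
  --order-by LastEventTime \
  --descending \
  --limit 1

# Or follow the log group live.
aws logs tail /ecs/ollama-service --follow
```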

## Cost Optimization
- g4dn.xlarge instances cost approximately $0.526/hour (on-demand; pricing varies by region)
- CloudWatch logs are configured with 1-day retention to minimize storage costs
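
If the stack is only needed intermittently, the largest saving is not running the GPU instance at all. Assuming the same Auto Scaling group name used in the Scaling section, you can scale to zero while idle and back up when needed:

```bash
# Scale GPU capacity to zero while idle.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-gpu-asg \
  --min-size 0 --max-size 4 --desired-capacity 0

# Scale back up when you need the stack again.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-gpu-asg \
  --min-size 1 --max-size 4 --desired-capacity 1
```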