
# Ollama with WebUI on AWS ECS with GPU Support
This project deploys Ollama and Open WebUI on AWS ECS with GPU support, letting you run any large language model supported by Ollama and serve it both as an API endpoint and through a powerful web UI.

## Architecture Overview
The infrastructure consists of:
- ECS cluster running on EC2 instances with GPU support (g4dn.xlarge)
- Ollama service for running LLMs
- Open WebUI service for the user interface
- Application Load Balancer for routing traffic
- Service Connect for service discovery
- CloudWatch for logging
```mermaid
graph TD
    Internet[Internet] --> ALB[Application Load Balancer]
    subgraph "Public Subnets"
        ALB
    end
    subgraph "Private Subnets"
        subgraph "ECS Cluster"
            EC2[EC2 with GPU]
            subgraph "Ollama Service"
                OC[Ollama Container]
            end
            subgraph "WebUI Service"
                WC[WebUI Container]
            end
        end
    end
    ALB -- "Default Route" --> WC
    ALB -- "Host Header: api.*, ollama.*" --> OC
    WC -- "Service Connect: ollama.internal:11434" --> OC
    style EC2 fill:#f9d,stroke:#333,stroke-width:2px
    style OC fill:#bbf,stroke:#333,stroke-width:1px
    style WC fill:#bfb,stroke:#333,stroke-width:1px
```
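
The ALB sends requests whose Host header matches `api.*` or `ollama.*` to the Ollama container and everything else to the WebUI. A quick way to verify this routing once the stack is up, using placeholder values for the ALB DNS name and hostname:

```bash
# Placeholders: substitute your ALB DNS name and a hostname matching the ollama.* rule.
ALB_DNS="<your-alb-dns-name>"

# Default route -> Open WebUI
curl -I "http://${ALB_DNS}/"

# Host-header route -> Ollama API (lists the models currently available)
curl -H "Host: ollama.example.com" "http://${ALB_DNS}/api/tags"
```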

## Features
- GPU Acceleration: Uses g4dn.xlarge instances with NVIDIA T4 GPUs
- Secure Architecture: Services run in private subnets with public access through ALB
- Service Discovery: Uses ECS Service Connect for internal communication
- Logging: CloudWatch logs with 1-day retention

## Prerequisites
- AWS Account
- VPC with public and private subnets
- NAT Gateway for outbound internet access from private subnets
- Terraform installed

## Deployment

- Clone this repository
- Update `terraform.tfvars` with your configuration (see the example after this list)
- Initialize Terraform: `terraform init`
- Apply: `terraform apply`
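
A minimal `terraform.tfvars` could look like the sketch below. The variable names are illustrative assumptions; match them to the variables this repository actually declares (e.g. in `variables.tf`).

```bash
# Illustrative sketch only: variable names are assumptions, adjust to this repo's variables.
cat > terraform.tfvars <<'EOF'
aws_region         = "us-east-1"
vpc_id             = "vpc-0123456789abcdef0"
public_subnet_ids  = ["subnet-aaaa1111", "subnet-bbbb2222"]
private_subnet_ids = ["subnet-cccc3333", "subnet-dddd4444"]
instance_type      = "g4dn.xlarge"
EOF
```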

## Accessing the WebUI
After deployment, you can access the WebUI using the URL provided in the Terraform outputs:
```bash
terraform output webui_url
```
Click the link to open the WebUI, where you can register as an admin and start using the app.
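
As a quick sanity check that the ALB is answering before opening a browser (this assumes the output is named `webui_url`, as above):

```bash
# Fetch the output value and probe the WebUI through the ALB.
curl -I "$(terraform output -raw webui_url)"
```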

## Loading Models
To load a model, go to `<alb-url>/admin/settings` and open the Models tab on the left. From the Models page, click the download button and select the model you'd like to download.
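
Alternatively, models can be pulled through the Ollama API behind the ALB's host-header rule. The hostname and ALB DNS name below are placeholders, and the model name is only an example:

```bash
# Pull a model via the Ollama REST API (hostname, ALB DNS name, and model are placeholders).
curl -H "Host: ollama.example.com" \
  "http://<your-alb-dns-name>/api/pull" \
  -d '{"model": "llama3.2"}'
```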

## Scaling
To adjust the number of instances:
```bash
# Update min, max, and desired capacity
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-gpu-asg \
  --min-size 2 --max-size 4 --desired-capacity 2
```
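
Note that scaling the Auto Scaling group only changes EC2 capacity; the number of running tasks is set on the ECS services. The cluster name below is a placeholder and the service name is an assumption based on the log group names used in this project:

```bash
# Placeholder cluster name; service name assumed from the /ecs/ollama-service log group.
aws ecs update-service \
  --cluster <your-ecs-cluster> \
  --service ollama-service \
  --desired-count 2
```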

## Troubleshooting

### Common Issues
- GPU not detected: Check the NVIDIA driver installation with `nvidia-smi` (see the command sketches after this list)
- Services not starting: Check the ECS service events in the AWS Console
- Cannot connect to WebUI: Verify security groups and load balancer health checks
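
Two command sketches for the first two issues. The first assumes ECS Exec (`enableExecuteCommand`) is turned on for the service and that the container is named `ollama`; cluster, service, and task IDs are placeholders:

```bash
# Run nvidia-smi inside the Ollama container (requires ECS Exec to be enabled).
aws ecs execute-command \
  --cluster <your-ecs-cluster> \
  --task <task-id> \
  --container ollama \
  --interactive \
  --command "nvidia-smi"

# Inspect recent service events when tasks fail to start.
aws ecs describe-services \
  --cluster <your-ecs-cluster> \
  --services ollama-service \
  --query 'services[0].events[:10]'
```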

### Viewing Logs
```bash
# View WebUI logs
aws logs get-log-events --log-group-name /ecs/webui-service --log-stream-name <LOG_STREAM>

# View Ollama logs
aws logs get-log-events --log-group-name /ecs/ollama-service --log-stream-name <LOG_STREAM>
```
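
If you don't know the log stream name, list the most recently written stream, or simply tail the whole group (`aws logs tail` requires AWS CLI v2):

```bash
# Find the most recent log stream for the Ollama service.
aws logs describe-log-streams \
  --log-group-name /ecs/ollama-service \
  --order-by LastEventTime \
  --descending \
  --limit 1

# Or follow the log group live.
aws logs tail /ecs/ollama-service --follow
```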

## Cost Optimization
- g4dn.xlarge instances cost approximately $0.526/hour (on-demand; pricing varies by region)
- CloudWatch logs are configured with 1-day retention to minimize storage costs
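
If the stack is only needed intermittently, the largest saving is not running the GPU instance at all. Assuming the same Auto Scaling group name used in the Scaling section, you can scale to zero while idle and back up when needed:

```bash
# Scale GPU capacity to zero while idle.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-gpu-asg \
  --min-size 0 --max-size 4 --desired-capacity 0

# Scale back up when you need the stack again.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-gpu-asg \
  --min-size 1 --max-size 4 --desired-capacity 1
```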