Ollama with WebUI on AWS ECS with GPU Support

This project deploys Ollama and Open WebUI on AWS ECS with GPU support, allowing you to run any large language model supported by Ollama and expose it both as an API endpoint and through a full-featured web UI.

Architecture Overview

The infrastructure consists of:

  • ECS cluster running on EC2 instances with GPU support (g4dn.xlarge)
  • Ollama service for running LLMs
  • Open WebUI service for the user interface
  • Application Load Balancer for routing traffic
  • Service Connect for service discovery
  • CloudWatch for logging

The Mermaid diagram below shows how traffic flows through the stack:

graph TD
	    Internet[Internet] --> ALB[Application Load Balancer]
	
	    subgraph "Public Subnets"
	        ALB
	    end
	
	    subgraph "Private Subnets"
	        subgraph "ECS Cluster"
	            EC2[EC2 with GPU]
	
	            subgraph "Ollama Service"
	                OC[Ollama Container]
	            end
	
	            subgraph "WebUI Service"
	                WC[WebUI Container]
	            end
	        end
	    end
	
	    ALB -- "Default Route" --> WC
	    ALB -- "Host Header: api.*, ollama.*" --> OC
	    WC -- "Service Connect: ollama.internal:11434" --> OC
	
	    style EC2 fill:#f9d,stroke:#333,stroke-width:2px
	    style OC fill:#bbf,stroke:#333,stroke-width:1px
	    style WC fill:#bfb,stroke:#333,stroke-width:1px
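
Because routing is host based, you can confirm which service a request lands on by setting the Host header explicitly. The hostnames below are placeholders; substitute your ALB DNS name and whatever api.* / ollama.* host names you configured for the listener rule.

# Default rule -> Open WebUI (returns the WebUI HTML)
curl -s http://<alb-dns-name>/ | head

# Host header matching the api.*/ollama.* rule -> Ollama API (lists pulled models)
curl -s -H "Host: api.example.com" http://<alb-dns-name>/api/tags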

Features

  • GPU Acceleration: Uses g4dn.xlarge instances with NVIDIA T4 GPUs
  • Secure Architecture: Services run in private subnets with public access through ALB
  • Service Discovery: Uses ECS Service Connect for internal communication (a quick verification is sketched after this list)
  • Logging: CloudWatch logs with 1-day retention
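
Service Connect is what lets the WebUI container reach Ollama at ollama.internal:11434 without going back out through the ALB. One way to verify it, assuming ECS Exec is enabled on the WebUI service and curl is available in the image (the cluster, task, and container names below are placeholders):

# Run a one-off curl inside the WebUI container against the internal Ollama endpoint
aws ecs execute-command --cluster <cluster-name> --task <task-id> --container webui --interactive --command "curl -s http://ollama.internal:11434/api/version"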

Prerequisites

  • AWS Account
  • VPC with public and private subnets
  • NAT Gateway for outbound internet access from private subnets
  • Terraform installed (see the sanity checks below)
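
A quick sanity check of these prerequisites before deploying, assuming the AWS CLI is configured for the target account (the VPC ID is a placeholder):

# Confirm Terraform is installed and which AWS account the CLI will deploy into
terraform version
aws sts get-caller-identity

# List the subnets in your VPC and whether they auto-assign public IPs
aws ec2 describe-subnets --filters "Name=vpc-id,Values=<vpc-id>" --query "Subnets[].{Id:SubnetId,Az:AvailabilityZone,Public:MapPublicIpOnLaunch}"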

Deployment

  1. Clone this repository
  2. Update terraform.tfvars with your configuration
  3. Initialize Terraform with terraform init
  4. Apply the configuration with terraform apply (see the command sequence below)
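
The full command sequence is roughly the following; terraform plan is optional but worth running to review the changes before they are applied:

# Download providers and modules
terraform init

# Preview the resources that will be created with your terraform.tfvars
terraform plan

# Create the cluster, services, and load balancer
terraform apply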

Accessing the WebUI

After deployment, you can access the WebUI using the URL provided in the Terraform outputs:

terraform output webui_url

Click the link to open the WebUI, where you can register as an admin and start using the app.
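
If you prefer to check from the command line first, something like the following should show the WebUI responding (assuming webui_url is a plain string output):

# Print the URL and request it, showing only the response headers
terraform output -raw webui_url
curl -sI "$(terraform output -raw webui_url)"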

Loading Models

To load a model, go to <alb-url>/admin/settings and open the Models tab on the left. From the models page, click the download button and choose the model you'd like to pull. Alternatively, you can pull a model through the Ollama API directly, as sketched below.
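
This assumes the Ollama API is reachable through the api.*/ollama.* host rule; the hostname and model tag are examples only:

# Ask Ollama to pull a model (progress is streamed back as JSON lines)
curl http://<ollama-api-hostname>/api/pull -d '{"model": "llama3.2"}'

# Confirm the model is now available
curl http://<ollama-api-hostname>/api/tags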

Scaling

To adjust the number of instances:

# Update min, max, and desired capacity
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ecs-gpu-asg --min-size 2 --max-size 4 --desired-capacity 2
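
The ASG only controls how many GPU instances join the cluster. If you also want more Ollama or WebUI tasks running, raise the ECS service desired count as well (the cluster and service names below are assumptions based on the log group names):

# Run two Ollama tasks instead of one
aws ecs update-service --cluster <cluster-name> --service ollama-service --desired-count 2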

Troubleshooting

Common Issues

  1. GPU not detected: Check the NVIDIA driver installation with nvidia-smi
  2. Services not starting: Check ECS service events in the AWS Console (or via the CLI sketch below)
  3. Cannot connect to WebUI: Verify security groups and load balancer health checks
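
CLI equivalents for the first two checks, with placeholder cluster, service, and instance names:

# Recent ECS service events (placement failures, failed health checks, etc.)
aws ecs describe-services --cluster <cluster-name> --services ollama-service --query "services[0].events[:5]"

# Check the GPU on the container instance itself via SSM Session Manager
aws ssm start-session --target <instance-id>
nvidia-smi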

Viewing Logs

# View WebUI logs
aws logs get-log-events --log-group-name /ecs/webui-service --log-stream-name <LOG_STREAM>

# View Ollama logs
aws logs get-log-events --log-group-name /ecs/ollama-service --log-stream-name <LOG_STREAM>
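
Log stream names are tedious to copy by hand; the first command below looks up the most recent Ollama stream, and the second tails the whole group live (AWS CLI v2 only):

# Find the most recent Ollama log stream
aws logs describe-log-streams --log-group-name /ecs/ollama-service --order-by LastEventTime --descending --max-items 1 --query "logStreams[0].logStreamName"

# Follow new log events as they arrive
aws logs tail /ecs/ollama-service --follow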

Cost Optimization

  • g4dn.xlarge instances cost approximately $0.526/hour
  • CloudWatch logs are configured with 1-day retention to minimize storage costs
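
Because the GPU instance dominates the bill, the simplest saving is to scale the ASG to zero whenever you are not using the stack:

# Stop paying for the GPU instance while idle
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ecs-gpu-asg --min-size 0 --desired-capacity 0

# Bring it back when needed
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ecs-gpu-asg --min-size 1 --desired-capacity 1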