Building a Hybrid AI, Plex, and Virtualization Home Server (Looking for GPU Advice)

Overview

I’m currently building a multi-purpose home lab server designed to run several workloads simultaneously, with the goal of consolidating media, virtualization, storage, local AI, and home automation into a single always-on platform.

Planned workloads:

• Plex Media Server
• Virtual machines and cybersecurity labs
• Network File Server (NAS)
• Local AI stack using OpenClaw paired with Llama models
• Home automation and camera integration

The idea is to have one system act as a media hub, development environment, and private AI infrastructure without relying on cloud services.

I’d like feedback on the build overall, but especially GPU recommendations toward the end.


Design Goals

The system is being built around a few priorities:

  • Run multiple services concurrently without bottlenecks
  • Support local LLM inference
  • Reliable Plex streaming and hardware transcoding
  • Dedicated environment for cybersecurity labs and testing
  • Easy future GPU expansion
  • Enterprise-level stability for 24/7 uptime

Rather than maintaining multiple smaller machines, I decided to consolidate everything onto a high-core enterprise platform.


Hardware Specifications

System Platform

Fully Built & Tested (ASRock Platform)


CPU

AMD EPYC 7452
32 Cores / 64 Threads

Chosen primarily for virtualization density and parallel workloads. The high core count allows containers, VMs, and AI services to run concurrently without stepping on each other.

Primary Responsibilities

  • VM hosting
  • Container orchestration
  • AI services
  • Background media processing

Memory

128 GB RAM (2 × 64 GB DIMMs)

Large RAM capacity is intended to support both virtualization and local LLM workloads.

Planned Allocation

  • AI workloads: 32–64 GB
  • Virtual machines: 32–48 GB
  • Plex + system services: 8–12 GB
  • Remaining memory reserved for caching
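As a quick sanity check on that budget, the arithmetic below sums the planned ranges from the list above (the only numbers that are mine are the derived totals):

```python
# Planned memory allocation sanity check for the 128 GB pool.
# The per-workload ranges (in GB) come straight from the build plan.
TOTAL_GB = 128

allocations = {
    "AI workloads": (32, 64),
    "Virtual machines": (32, 48),
    "Plex + system services": (8, 12),
}

low = sum(lo for lo, hi in allocations.values())   # everything at minimum
high = sum(hi for lo, hi in allocations.values())  # everything at maximum

print(f"Committed: {low}-{high} GB")
print(f"Left for caching: {TOTAL_GB - high}-{TOTAL_GB - low} GB")
```

Worth noting: at the top of those ranges only about 4 GB remains for caching, which matters if the NAS layer ends up wanting a file-system cache like ZFS ARC.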

Power Supply

Corsair HX1500i (1500W Platinum)

Oversized intentionally to allow future GPU expansion without power constraints.


High-Speed Storage

Samsung 990 Pro NVMe (1TB ×2)

Workloads separated to prevent contention:

Drive | Role | Purpose
NVMe 1 | Cache Pool A | Docker containers + appdata
NVMe 2 | Cache Pool B | VM storage + AI models

This layout should keep responsiveness consistent even when Plex, VMs, and AI services are active simultaneously.


Workload Architecture

Plex Media Server

Role: Media Processing Node

  • Streaming
  • Metadata processing
  • Hardware transcoding (GPU planned)

Virtualization Layer

Role: VM Host Environment

Used for:

  • Cybersecurity labs
  • Testing environments
  • Windows/Linux VMs
  • Development workloads

The EPYC platform shines here due to thread availability.


File Server (NAS)

Role: Primary Storage Services

Provides:

  • Network shares
  • Media storage
  • Backups
  • Archive/ISO storage

NVMe cache accelerates frequently accessed data while bulk storage lives on the array.


Local AI Stack (OpenClaw + Llama)

Role: Local AI Inference Node

Components:

  • OpenClaw gateway
  • Ollama / Llama models
  • Local vector memory

Running everything locally keeps data private and eliminates API costs while allowing experimentation with agents and automation workflows.
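For reference, talking to a stack like this from scripts is simple because Ollama exposes a local REST API. A minimal sketch, assuming Ollama on its default port 11434 and a model already pulled (the model name "llama3" here is just an example):

```python
# Minimal sketch: query a local Ollama instance over its REST API.
# Assumes Ollama is running on the default port; "llama3" is an
# example model name, substitute whatever you have pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Send the request and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with the server running):
# reply = ask_local_llm("Summarize why local inference keeps data private.")
```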


Logical Architecture

AI Services | Plex | Virtual Machines | File Server
↓
AMD EPYC Compute Layer
↓
128 GB Memory Pool
↓
NVMe Cache Tier
↓
Storage Array

The goal is workload isolation while maximizing hardware utilization.


GPU Recommendations? (Main Question)

I’m currently trying to decide what GPU setup makes the most sense for this system and would really appreciate input from others running similar homelab or local AI environments.

Primary GPU Goals

  • Accelerate local Llama inference (OpenClaw + Ollama)
  • Handle Plex hardware transcoding
  • Maintain good power efficiency for 24/7 operation
  • Work reliably inside a server chassis
  • Leave room for future multi-GPU expansion

Questions I’m Trying to Answer

  • Is a single GPU enough for combined Plex + AI workloads?
  • Does separating AI and Plex onto different GPUs actually help in practice?
  • Is VRAM capacity more important than raw GPU speed for local LLMs?
  • Enterprise cards vs consumer GPUs for long-term homelab reliability?
  • What GPUs have worked best specifically with Ollama or local LLM setups?

Curious what others in the homelab or local AI space are running and what has held up well long term.

My 2¢

For the NAS functionality, you do not mention what OS(s) you plan to use. You do not seem to allocate any RAM for a file system like ZFS.

With the EPYC chips you have access to 8-channel memory, so using only two RAM sticks will severely throttle potential performance. For 128 GB you are much better off using eight 16 GB sticks than two 64 GB sticks. But, as I said above, you probably need to add RAM for ZFS ARC on top of that.

Whether you can combine Plex and AI really depends on the concurrent workloads you plan to run. If this system is just for you and you are a single tasker, either experimenting with AI only, or watching Plex only, at a time that is fine. If this system is for a family (or even yourself) where you expect to be able to do both concurrently (stream a video about AI to your phone while running models being referenced in the video on the machine), that won’t work very smoothly unless you let the CPU do the transcode rather than the GPU.

VRAM is more important than speed up to the point where the models you want to run will fit in VRAM, because in order to run a model it effectively all has to fit in VRAM. The more intelligent the models you want to run, the more VRAM you need, but beyond that you want a faster generation of GPU chip. So, for example, if you know the types of models you want to run will fit in 16 GB, you want the fastest 16 GB card you can get; getting a 24 or 32 GB card won’t make things faster. Whereas, if you want to run a 24 GB model, getting a 16 GB card isn’t just slower, it is useless.
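The "does it fit" question above is easy to estimate up front. A back-of-envelope sketch (the 20% overhead factor for KV cache and buffers is my rough assumption, not a measured number):

```python
# Back-of-envelope VRAM estimate for a quantized model. Illustrates the
# point that capacity gates which models you can run at all: the weights
# effectively must fit in VRAM. Overhead factor (~20% for KV cache and
# runtime buffers) is an assumed rule of thumb.
def vram_estimate_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Approximate VRAM (GB) needed to load a model at a given quantization."""
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb * overhead, 1)

for size in (8, 13, 24, 70):
    need = vram_estimate_gb(size)
    print(f"{size:>3}B @ 4-bit ≈ {need:>5} GB  "
          f"fits 16GB: {need <= 16}  fits 24GB: {need <= 24}")
```

By this estimate a 4-bit 70B model needs roughly 42 GB, so it is out of reach for a 16 GB or 24 GB card no matter how fast that card is, which is exactly the "slower isn't the problem, useless is" point above.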

I am most familiar with Unraid and will most likely use that as the OS. I will definitely populate all 8 RAM slots; thanks for catching that mistake. I would like to have the option to stream and use AI concurrently without worrying about performance when I do. At maximum, the Plex server will have about 5 to 6 connections: 1 local stream and 4 to 5 remote users. I appreciate the feedback and your time as well 🙂

Although I do not use it, Unraid may be a good choice here as (assuming you use XFS for the main pool) its file system/OS RAM requirements are minimal. And I believe Unraid has an easy button for the Nvidia Container Toolkit that allows you to share a single GPU across multiple Docker containers. So, as long as your AI and Plex are in containers not VMs that would allow you to share a single NVIDIA GPU across those workloads. That could also be done with AMD or Intel GPUs, but the mechanics would be a bit different and not have an easy button.
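For what it's worth, outside Unraid's easy button the same single-GPU sharing looks roughly like this Compose fragment (image names are the real official images, but the overall layout is an illustrative sketch assuming the NVIDIA Container Toolkit is installed and registered as the `nvidia` runtime):

```yaml
# Illustrative docker-compose fragment: one NVIDIA GPU shared by two containers.
services:
  plex:
    image: plexinc/pms-docker
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,video   # "video" enables NVENC/NVDEC
  ollama:
    image: ollama/ollama
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    ports:
      - "11434:11434"
```

The key detail is `NVIDIA_DRIVER_CAPABILITIES`: the Plex container needs the `video` capability for hardware transcoding, while a compute workload like Ollama only needs `compute`.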

You didn’t say (perhaps intentionally) whether you are looking at the HL15 Least or Beast. With the HL15 2.0 you may be looking at something like an A4000 or A5000 (with a 180-degree power plug adapter), whereas with the Beast you have a lot more GPU options. I think the questions are: do you want more than 16 GB of VRAM, and are you comfortable buying a GPU second hand? Unfortunately, GPUs that fit in the HL15 2.0 are generally limited to 16 GB, and even then those are older-generation cards like the dual-fan 4070 Ti Super, unless you go for some sort of passively cooled server GPU. That’s not necessarily a bad thing technically (certain generation-over-generation improvements have been minimal), but consumer GPUs have been getting physically bigger with less VRAM, so it’s hard or impossible to find a 16 GB 50-series card that will fit in an HL15 2.0.

My thoughts on the AI and GPU side of things…

This is something I am working on right now. I am trying to convert my HL15 into an actual home lab rather than just using it as a NAS. It’s got the room to be much more, and I am actively building a system much like yours.

I will be stuffing my HL15 with as much flash storage as I can fit, running a similar series EPYC, and rocking a few GPUs. I talk more in the above linked post about the modified 4090 I will be installing.

No! Split them up! Use your big-VRAM GPU for AI workloads and get a different GPU for Plex or any other smaller workloads! Use the NVIDIA Encode/Decode Support Matrix to find an inexpensive GPU with unrestricted encode/decode sessions. I have been using a Quadro P2200 for a long time and have been able to run way more transcodes than I’ll ever need.

Ollama is ruthless and will eat as much VRAM as it can get when running a workload, and it does not care about other processes using the card. If you use both at the same time, it will hurt Plex performance even though they use different parts of the GPU; it’s a bandwidth issue. However, Plex will not hurt Ollama performance.
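If you want to see that contention for yourself, a rough sketch using the real NVML Python bindings (`pip install nvidia-ml-py`); device index 0 is an assumption, adjust on multi-GPU boxes:

```python
# Sketch: check GPU memory while Ollama and Plex share a card, via NVML.
# Requires an NVIDIA driver and the nvidia-ml-py package (module "pynvml").
def gib(nbytes: int) -> float:
    """Convert bytes to GiB, rounded for display."""
    return round(nbytes / 2**30, 2)

def report_vram(device_index: int = 0) -> None:
    import pynvml  # NVIDIA's NVML bindings
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"used {gib(mem.used)} GiB / total {gib(mem.total)} GiB")
    pynvml.nvmlShutdown()

# report_vram()  # run while a model is loaded to see what Ollama grabbed
```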

Yes and no. What is your priority? Do you want to run GPT OSS 120b all the time, and have no issue waiting 5-10 min for your response while running some Tesla P40s for a fraction of the cost? Then more VRAM might be more important to you.

If you are expecting Chat Gippity-speed results, then you’ll need at least one or two 3090s. From my direct testing you can run GPT OSS 120b on one 3090, BUT INFERENCE ONLY! You cannot run a model that large and also use additional features like web search, image generation, etc. But buy fast: they are dramatically increasing in price as they get scooped up! If you’re a baller, get a modified 4090 with 48GB of VRAM, or better yet a first-party NVIDIA Pro 5000 or 6000 card. Then you can start telling people: “I’m running models you’ve never heard of, on GPUs you can’t afford.”

Buy what you can afford. As long as you have good cooling, they will all last a long time. There is hardly a difference between commercial-focused cards and consumer cards these days.

r/LocalLLaMA

Anything you build now will be obsolete in 2-5 years. Plan for obsolescence. The whole reason I am re-building my labs right now is that I started with dual 3090s to get as much VRAM as possible, and between speed and model complexity (especially now that you can create skills for agents) I am already hitting the limit of 48GB of VRAM.
