Building a Hybrid AI, Plex, and Virtualization Home Server (Looking for GPU Advice)

My thoughts on the AI and GPU side of things…

This is something I am working on right now: converting my HL15 into an actual home lab rather than just using it as a NAS. It has the room to be much more, and I am actively building a system much like yours.

I will be stuffing my HL15 with as much flash storage as I can fit, running a similar-series EPYC, and rocking a few GPUs. I talk more about the modified 4090 I will be installing in the post linked above.

No! Split them up! Use your big-VRAM GPU for AI workloads and get a different GPU for Plex or any other smaller workloads! Use the NVIDIA Video Encode and Decode Support Matrix to find an inexpensive GPU that gives you unrestricted encode/decode sessions. I have been using a Quadro P2200 for a long time and it handles way more simultaneous transcodes than I'll ever need.

Ollama is ruthless and will eat as much VRAM as it can get when running a workload, and it does not care about other processes using the card. If you run both on the same card at the same time, it will hurt Plex performance even though they use different parts of the GPU (the NVENC/NVDEC blocks vs. the compute cores); it's a memory-bandwidth issue. Plex, however, will not hurt Ollama performance.
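If you want to sanity-check that Ollama and Plex are actually landing on the cards you intended (Ollama respects CUDA_VISIBLE_DEVICES, and container runtimes let you pass specific devices), here is a minimal monitoring sketch. It only assumes nvidia-smi is on your PATH; the polling interval and field list are just my choices.

```python
import subprocess
import time

# Standard nvidia-smi query-gpu fields: index, name, VRAM used/total, GPU utilization.
FIELDS = "index,name,memory.used,memory.total,utilization.gpu"

def gpu_snapshot():
    """Return one tuple per GPU: (index, name, used MiB, total MiB, util %)."""
    out = subprocess.check_output(
        ["nvidia-smi",
         f"--query-gpu={FIELDS}",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    rows = []
    for line in out.strip().splitlines():
        idx, name, used, total, util = [f.strip() for f in line.split(",")]
        rows.append((int(idx), name, int(used), int(total), int(util)))
    return rows

if __name__ == "__main__":
    # Poll while you kick off a Plex transcode and an Ollama run at the same time.
    while True:
        for idx, name, used, total, util in gpu_snapshot():
            print(f"GPU{idx} {name}: {used}/{total} MiB VRAM, {util}% util")
        print("-" * 40)
        time.sleep(5)
```

Run it in one terminal while you start a transcode and a prompt; you should see the small card's utilization climb while the big card's VRAM fills, which confirms the split is doing what you think it is.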

Yes and no. What is your priority? Do you want to run GPT-OSS 120B all the time and have no issue waiting 5-10 minutes for a response on some Tesla P40s, for a fraction of the cost? Then more VRAM might be more important to you than speed.

If you are expecting Chat Gippity-speed results, then you'll need at least 1-2 3090s going. From my direct testing you can run GPT-OSS 120B on one 3090, BUT INFERENCE ONLY! You cannot run a model that large and still have room for additional features like web search, image generation, etc. But buy fast; they are dramatically increasing in price as they get scooped up! If you're a baller, then get a modified 4090 with 48GB of VRAM, or better yet a first-party NVIDIA Pro 5000 or 6000 card. Then you can start telling people: “I’m running models you’ve never heard of, on GPUs you can’t afford.”
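If you want to put a number on "fast enough" before you spend money, you can time a generation through Ollama's REST API and compute tokens per second from the stats it returns. A minimal sketch, assuming Ollama is on its default port 11434; swap in whatever model tag you actually pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint
MODEL = "gpt-oss:120b"  # substitute the model tag you actually pulled

payload = json.dumps({
    "model": MODEL,
    "prompt": "Explain PCIe bifurcation in two sentences.",
    "stream": False,  # wait for the full response so the final stats are included
}).encode()

req = urllib.request.Request(
    OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Ollama reports eval_count (tokens generated) and eval_duration (nanoseconds).
tokens = result["eval_count"]
seconds = result["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```

Run the same prompt on whatever cards you're comparing and the speed gap you're asking about shows up immediately in the tok/s number.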

Buy what you can afford. As long as you have good cooling, they will all last a long time. There is hardly any difference between commercial-focused cards and consumer cards these days.

r/LocalLLaMA

Anything you build now will be obsolete in 2-5 years, so plan for obsolescence. The whole reason I am rebuilding my lab right now is that I started with dual 3090s to get as much VRAM as possible, and between speed and model complexity, especially now that you can create skills for agents, I am already hitting the limit of 48GB of VRAM.
