  • I’ve been testing Ollama in Docker/WSL, with the idea that if I like it I’ll eventually move my GPU into my home server and upgrade my gaming PC. When you run a model, Ollama has to load the whole thing into VRAM. I use the 8 GB models, so it takes 20-40 seconds to load, and after that each response is really fast and the GPU hit is pretty small. By default it unloads the model after about five minutes to free up VRAM.

    Basically this means you either wait a bit for the model to warm up, or you extend that timeout so it stays warm longer (there’s a sketch of the latter below). It also means I can’t really use my GPU for anything else while the LLM is loaded.

    I haven’t tracked power usage, but aside from the VRAM requirement it doesn’t seem too resource-intensive. Then again, maybe I just haven’t done anything complex enough yet.
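
    For extending that timeout: Ollama’s generate endpoint accepts a keep_alive field that overrides the five-minute default (you can also set the OLLAMA_KEEP_ALIVE environment variable on the server). Here’s a minimal sketch, assuming the default port 11434; the llama3:8b tag is just a stand-in for whatever model you actually run:

    ```python
    import requests

    # Ask Ollama to keep the model resident in VRAM for 30 minutes instead
    # of the default ~5, so follow-up prompts skip the 20-40 s load time.
    # Model tag and port are assumptions; substitute your own setup.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",
            "prompt": "Say hello.",
            "stream": False,
            "keep_alive": "30m",  # duration string; -1 keeps it loaded indefinitely
        },
        timeout=120,
    )
    print(resp.json()["response"])
    ```

    The flip side is that a longer keep_alive pins that VRAM for the whole window, which is exactly the GPU-sharing tradeoff above.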

  • The last few years have been really bizarre. In 2019 it genuinely felt like my org was moving away from Microsoft. I’d just retired Skype and we were moving over to this new Microsoft Teams thing, but the executive team was asking me about moving to Google Apps and dropping Outlook/Exchange/SharePoint entirely, maybe expanding our Slack usage too. Then Covid happened and Teams turned into essential infrastructure overnight.

    Fast forward a few years and the entire Microsoft experience is now basically built around a Teams-first strategy. It’s the main thing my users care about and use on a daily basis; they want more things integrated with it and treat it as a pathway into the other Office products. Microsoft is making a real mess of things, but it’s kind of crazy how fast they pivoted to meet the new needs of their users and keep them locked in.