

What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting the num_ctx
and num_predict
using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
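For reference, a minimal Modelfile might look something like this (the base model name and parameter values here are just examples, adjust for your model):

```
FROM llama3
PARAMETER num_ctx 8192
PARAMETER num_predict 512
```

Then build it with something like `ollama create my-llama3 -f Modelfile` and run it with `ollama run my-llama3`.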
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vllm does not use quantized models by default, so that is likely the difference. Tiny models are impacted more by quantization
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input into the next layer in the sequence) so more GPUs do not offer any kind of performance benefit
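A toy sketch of why this is, with stand-in "layers" instead of real transformer layers: each layer's output is the next layer's input, so with a single request the GPU holding each split never runs in parallel with the others.

```python
# Stand-ins for transformer layers: each one consumes the previous output.
layers = [lambda x, k=k: x + k for k in range(4)]

def forward(x):
    # Strictly one layer after another: while the GPU holding this
    # layer works, the GPUs holding the other layers sit idle.
    for layer in layers:
        x = layer(x)
    return x

result = forward(0)  # 0 + 0 + 1 + 2 + 3
```

Splitting layers across GPUs (pipeline-style) mainly buys you memory capacity, not single-request speed; throughput gains need batching so multiple requests keep different stages busy.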
Ummm… did you try /set parameter num_ctx #
and /set parameter num_predict #
? Are you using a model that actually supports the context length that you desire…?
That’s great! Hopefully it shows up on F-Droid sometime soon
Lmao “I didn’t install the Google Play Store so my phone is now a minimalist dumbphone” ???
There is a jellyfin app for common smart TV platforms
Meanwhile the GOP is regularly losing its shit about “woke” corporations. Good try though
Also, not if P1 is a fascist, no, absolutely not.
Make maintainable changes to the services you use and your behavior/habits related to privacy. Go at a gradual pace that won’t interrupt your daily life.
Worry about things that you can control. You can only do your best. It doesn’t have to be perfect. You might not be 100% secure and private, but that doesn’t mean you have to make it easy to be tracked.
Start with low hanging fruit and easy changes.
Like switching web browsers; installing adblocking and privacy-oriented addons (jShelter is a good one); switching to a more private search engine; setting privacy settings in apps and services; using strong, unique passwords and a password manager; replacing more and more of the software you use with FOSS alternatives; and using a good VPN.
If you’re ready for it, get or build a NAS and self host instead of using cloud-based services. Set up a pihole server for network-level protection from trackers and ads.
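If it helps, a Pi-hole can be sketched as a small docker-compose service like this (the official `pihole/pihole` image; the timezone and password are placeholders, and environment variable names may differ between Pi-hole versions, so check the image docs):

```yaml
services:
  pihole:
    image: pihole/pihole:latest
    ports:
      - "53:53/tcp"   # DNS
      - "53:53/udp"   # DNS
      - "80:80/tcp"   # web admin UI
    environment:
      TZ: "UTC"                # placeholder timezone
      WEBPASSWORD: "changeme"  # placeholder admin password
    volumes:
      - ./etc-pihole:/etc/pihole   # persist config/blocklists
    restart: unless-stopped
```

Point your router’s DNS at the Pi-hole’s address and every device on the network gets the filtering without per-device setup.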
Well shit, I was about to make the switch to Proton for email… glad I’ve been lazy about it, now. I’ll figure something else out, mail-in-a-box looks like what I really want anyway
Weird, with just jShelter alone, I get “Your browser has a randomized fingerprint” on both desktop and mobile. Firefox browser
My guess is an x86 32-bit machine
4690k was solid! Mine is retired, though. Now I selfhost on ARM
You can overwrite the model by reusing the same name instead of creating one with a new name, if it bothers you. Either way there is no duplication of the LLM model file