Whoops, yeah, should have linked the blog.
I didn’t want to link the individual models because I’m not sure hybrid or pure transformers is better?
It’s horrendously worse, just look at the TPS on the same save.
But specifically, I used Dubs Performance Analyzer’s frametime graphs. It’s nice since it separates out rendering and simulation.
One note, I am on Nvidia. It’s possible AMD (or Intel?) cards would behave differently.
Tested with Rocketman, Performance Fish, and Performance Optimizer. And modded in general, on a big colony save.
It wasn’t super recent though, not 1.5. But should still be applicable, I suspect.
Depends on the game.
Linux-native RimWorld and Stellaris are (by my measurements) 1.5x-2x slower than on Windows. Not by pure FPS, but by simulation speed, which is much more detrimental. The frametime spikes are awful, too.
Running them through Proton seems fine, but they still aren’t any faster.
Modded Minecraft and Starsector are the opposite. Old Java games freaking love Linux, apparently.
For reference, I’m running CachyOS (a distro focused on optimization) and used game-native measurement tools.
Gemini 1.5 used to be the best long context model around, by far.
Gemini Flash Thinking from earlier this year was very good for its speed/price, but it regressed a ton.
Gemini 1.5 Pro is literally better than the new 2.0 Pro in some of my tests, especially long-context ones. I dunno what happened there, but yes, they probably overtuned it or something.
For local LLMs, this is an issue because it breaks your prompt cache and slows things down, without a specific tiny model to “categorize” text… which few have really worked on.
I don’t think the corporate APIs or UIs even do this. You are not wrong, but it’s just not done for some reason.
It could be that the trainers don’t realize it’s an issue. For instance, “0.5-0.7” is the recommended range for Deepseek R1, but I find much lower or slightly higher is far better, depending on the category and other sampling parameters.
Lemmy is understandably sympathetic to self-hosted AI, but I get chewed out or even banned literally anywhere else.
In one fandom (the Avatar fandom), there used to be enthusiasm for a “community enhancement” of the original show, since the official DVD/Blu-ray looks awful. Years later in a new thread, I didn’t even mention the word “AI,” just the idea of restoration, and I got bombed and thread-locked for the mere tangential implication.
Temperature isn’t even “creativity” per se, it’s more a band-aid to patch looping and dryness in long responses.
Lower temperature is much better with modern sampling algorithms, e.g., MinP, DRY, maybe dynamic temperature like Mirostat and such. Ideally, structured output, too. Unfortunately, corporate APIs usually don’t offer this.
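For what it’s worth, here’s a minimal sketch of what I mean, assuming a local llama.cpp-style server. The endpoint and field names (min_p, dry_multiplier, etc.) vary by backend and build, so treat the exact values and keys as illustrative rather than a drop-in:

```python
# Minimal sketch: low temperature plus MinP/DRY sampling against a local
# llama.cpp-style server. Field names follow llama.cpp's /completion API,
# but support (especially for DRY) depends on the backend version.
import requests

payload = {
    "prompt": "Summarize the following article:\n...",
    "temperature": 0.3,    # low temp instead of cranking "creativity"
    "min_p": 0.05,         # MinP culls unlikely tokens so low temp stays coherent
    "repeat_penalty": 1.05,
    # DRY anti-repetition, if the backend build supports it:
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "n_predict": 512,
}

resp = requests.post("http://localhost:8080/completion", json=payload, timeout=120)
print(resp.json()["content"])
```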
It can be mitigated with finetuning against looping/repetition/slop, but most models are the opposite, massively overtuning on their own output which “inbreeds” the model.
And yes, domain-specific queries are best. Basically the user needs separate prompt boxes for coding, summaries, creative suggestions, and such, each with its own tuned settings (and ideally tuned models). You are right, this is a much better idea than offering a temperature knob to the user, but… most UIs don’t even do this for some reason?
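To be concrete, something like this is all I’m asking for. The category names, numbers, and the classify() helper are all hypothetical; in practice the categorizer could be a tiny model, a heuristic, or just separate prompt boxes in the UI:

```python
# Rough sketch of per-category sampling presets instead of one temperature knob.
# All presets and the classify() stand-in are made up for illustration.
PRESETS = {
    "coding":   {"temperature": 0.1, "min_p": 0.10, "system": "You are a precise coding assistant."},
    "summary":  {"temperature": 0.3, "min_p": 0.05, "system": "Summarize faithfully and concisely."},
    "creative": {"temperature": 0.8, "min_p": 0.05, "system": "Brainstorm freely."},
}

def classify(query: str) -> str:
    """Hypothetical stand-in for a tiny categorizer model or an explicit UI choice."""
    if "def " in query or "```" in query:
        return "coding"
    if query.lower().startswith(("summarize", "tl;dr")):
        return "summary"
    return "creative"

def build_request(query: str) -> dict:
    """Assemble a request with the preset that matches the query's category."""
    preset = PRESETS[classify(query)]
    return {
        "prompt": f"{preset['system']}\n\n{query}",
        "temperature": preset["temperature"],
        "min_p": preset["min_p"],
    }
```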
What I am getting at is this is not a problem companies seem interested in solving. They want to treat the users as idiots without the attention span to even categorize their question.
Zonos just came out, seems sick:
There are also some “native” TTS LLMs like GLM 9B, which “capture” more information in the output than pure text input.
What temperature and sampling settings? Which models?
I’ve noticed that the AI giants seem to be encouraging “AI ignorance,” as they just want you to use their stupid subscription app without questioning it, instead of understanding how the tools work under the hood. They also default to bad, cheap models.
I find my local thinking models (FuseAI, Arcee, or Deepseek 32B 5bpw at the moment) are quite good at summarization at a low temperature, which is not what these UIs default to, and I get to use better sampling algorithms than any of the corporate APIs. Same with “affordable” flagship API models (like base Deepseek, not R1). But small Gemini/OpenAI API models are crap, especially with default sampling, and Gemini 2.0 in particular seems to have regressed.
My point is that LLMs as locally hosted tools you understand the mechanics/limitations of are neat, but how corporations present them as magic cloud oracles is like everything wrong with tech enshittification and crypto-bro type hype in one package.
Ugh, you know what I mean, just like nod or something if it feels appropriate. It doesn’t have to be “hi” in the form of “let’s start a conversation.”
Uh, I feel like you are missing a ton of context.
Relentless heckling is a thing, so it’s understandable that this is a touchy subject.
Appearance is also more tied to a person’s perception in society. It’s like telling someone “Hey, you look wealthy today! Good job making money!” Not like commenting on a casual hobby.
Even taking the violin or sports example, wording it like “good on you for putting in the effort” would still sound very condescending.
Uh, I wouldn’t comment on passing strangers like that, especially not wording it like “so good for you putting in the effort.” The issue of randomly bringing up their appearance aside, it sounds condescending.
Like… just say hi.
Yeah.
I mean, people aren’t perfect, they’re hypocritical, but I draw the line at loudly preaching hypocrisy, and I would want a chewing out if I were this person.
You joke, but the pro version won’t be far behind. The pro 4090 (aka the RTX 6000 Ada) was already $7,000 MSRP, and the pro 5090 is rumored to have far more VRAM.
You know what I meant, by “no one” I mean “a large majority of users.”
The bigger problem is AI “ignorance,” and it’s not just Facebook. I’ve reported more than one Lemmy post that the user naively sourced from ChatGPT or Gemini and took as fact.
No one understands how LLMs work, not even on a basic level. Can’t blame them, seeing how they’re shoved down everyone’s throats as opaque products, or straight up social experiments like Facebook.
…Are we all screwed? Is the future a trippy information wasteland? All this seems to be getting worse and worse, and everyone in charge is pouring gasoline on it.
Yeah.
People turn their nose up at this, but devs have to develop for Windows. If they can give their users a better experience targeting Proton, in less time, with more refinement and better support than a native port, that’s A-okay with me.
A hilarious situation would be Linux superseding Windows for desktop gaming… and Proton still being the standard target. I would love that future.