Have we been doing LLM inference wrong the whole time?!?!

Tunadorable

1 month ago

6,258 views

Comments:

@davidfeldman2553 - 04.02.2025 06:30

Big fan of the subway surfers. Wouldn't have watched through the whole video without it.

@NeoKailthas - 04.02.2025 07:48

Does this produce more hallucinations or fewer?

@tylerdancey6085 - 04.02.2025 07:51

The subway surfers repeats so now I've memorized the paper as well as the sequence of movements of the subway surfer

@bot. - 04.02.2025 08:34

The Subway Surfers thing is a novel idea for a science-oriented channel like this, although based on the vocal audience I think we can tell that it is not the best idea. Maybe replace it with some kind of slower, more relaxing visuals? I have a theory that, because viewers' attention should be focused on listening in order to best take in this kind of content, the visuals need to be less stimulating than the audio itself. If stimulus A is more stimulating than stimulus B, then people will have a harder time paying attention to B, even if they want to. So yes, some slow visuals might be your best path forward to give viewers something unintrusive to look at.

That aside, I would like to point out that this hyperfitting seems to be a rather promising candidate to combine with reasoning models. I can see a potentially human-level reasoning LLM emerging simply by combining this paper with the recent publications from DeepSeek, especially if the reasoning focuses more on depth than on breadth. (Perchance even a hyperfitted 8B Llama model could achieve such results once more optimisations get figured out? Who knows!)

As a final note, I would just like to point out that this seems like what "grokking" promised to deliver but never really did: overfitting to the point of ludicrous validation-set performance increases.

All in all, while I am new here, you definitely earned a subscriber. Thank you for this good video! (By the way, kudos to you for engaging with the community to such an extent, glad to see you legitimately care for and talk with viewers!)

@Teth47 - 04.02.2025 08:49

This is the zoomerest video I have ever seen. Thank you. I'm 32 and you've made me feel like a Victorian peasant transported into the modern day.

@Koroistro - 04.02.2025 09:40

Dude, as somebody with ADHD, with the Subway Surfers thing I cannot look at the video. I can listen, but the video is way too distracting for me to focus on the actual text you're reading.

@Koroistro - 04.02.2025 09:44

Overall this makes sense; as humans, we definitely are hyperfitted on the information we experience.
The fact that there's no excessive repetition is, however, fascinating and surprising. I wonder what mechanism is at play there.

@Bokbind - 04.02.2025 10:47

I'm genuinely struggling to follow the main content because my attention keeps drifting to the mobile game...

@DorthLous - 04.02.2025 11:12

FFS, never do the subway surfer again...

@yannickpezeu3419 - 04.02.2025 12:58

This game has nothing to do here

@SuperLazyCat - 04.02.2025 13:46

Can the guy stop talking? I'm trying to watch Subway Surfers

@MultiYlin - 04.02.2025 13:46

I tested one interesting question on ChatGPT, Gemini, DeepSeek, and Grok:
How would you interpret this law based on logic/semantics/syntax?

The statute allows for a child to be adopted without the written consent of the parent if the nonconsenting mother or father:
(a) has been adjudged guilty by a court of competent jurisdiction of cruelty abuse or mistreatment of the child; or
(b) has been judicially deprived of parental rights and had parental rights terminated with respect to the child; or
(c) who has willfully abandoned such child;
(d) if it is proven to the satisfaction of the court that said father or mother if able has not contributed to the support of said child during a period of one year immediately prior to the filing of the petition for adoption
Taken from Antonin Scalia and Bryan Garner's book on interpreting laws (Reading Law: The Interpretation of Legal Texts).

Somehow they ALL "hallucinate" the same answer, agreeing on the logical representation a+b+c+d (that is, all four conditions connected by OR), even though (c) and (d) are clearly not connected by OR. The most likely explanation is that they were all trained on scientific writing, in which polysyndeton is discouraged and asyndeton is used very often.

@HenryLoenwind - 04.02.2025 14:00

I think what's happening is that the model gets trained to produce a string of coherent tokens instead of a cloud of possible tokens with no clear path. So it picks a token that provides a better path ahead instead of just listing everything that would be possible. In the latter case, the selection code has no idea what makes a 2.4%-probability token better than a 2.3%-probability one, or whether it really is better. Shifting the responsibility to the AI, making it select which token is best, gives a better result.
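A toy sketch of the near-tie problem described above: greedy decoding always takes the argmax, while sampling treats two near-tied tokens almost interchangeably. Lowering the temperature shrinks the tail but does not tell the sampler which of the two top candidates is actually better. The vocabulary and logits are invented for illustration; this is generic decoding, not the method from the paper.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a more peaked distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def greedy(tokens, probs):
    """Greedy decoding: always pick the single most likely token."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return tokens[best]

def sample(tokens, probs, rng):
    """Sampling: draw a token in proportion to its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical vocabulary with two near-tied candidates at the top.
tokens = ["river", "bank", "road", "sky"]
logits = [2.00, 1.96, 0.50, -1.00]

for t in (1.0, 0.25):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: {[round(p, 3) for p in probs]}  entropy={entropy(probs):.3f}")

rng = random.Random(0)
print("greedy: ", greedy(tokens, softmax(logits)))
print("sampled:", [sample(tokens, softmax(logits), rng) for _ in range(8)])
```

Even at T=0.25 the two top tokens remain roughly evenly split while "road" and "sky" all but vanish, which is the gap the comment suggests a model trained to commit to one continuation would close.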

@hanyanglee9018 - 04.02.2025 15:49

Isn't the game the sponsor? Or is it?

@brisonmondry712 - 04.02.2025 15:59

Subway Surfers: an icon of lazy, meaningless content and an icon of brainrot. Keep it if you think it's fitting, I guess, but I'm not here to be insulted.

@AllanSavolainen - 04.02.2025 16:09

Yeah, the subway thing is very annoying. It forced me to just listen to the video, since trying to read/see the paper while the game was playing on the side was distracting. The only way it could work is to have the game fullscreen with transparent text on top of it; otherwise my eyes keep getting pulled to the side.

@ravimohankhanna4317 - 04.02.2025 16:32

Why didn't they test on some math or coding benchmarks? I think they are trying to distract us from DeepSeek R1. If the results are so good, then let's see the coding and math benchmark results.

@MeinDeutschkurs - 04.02.2025 21:54

I cannot watch longer because of the game. My autistic mind is distracted from the content. A pity.

@quantumspark343 - 04.02.2025 21:55

Subway Surfers is OK. By the way, the way you associated loss with certainty of output gave me the idea of using some loss-based method to give models the ability to assess how certain they are about their output, basically solving hallucinations. Not sure how much sense it makes.
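One simple, generic version of this idea is to average the per-token negative log-likelihood (the same quantity as the cross-entropy loss) of the tokens the model actually generated, and flag high-loss answers as less trustworthy. This is a sketch under stated assumptions, not the paper's method: the per-token probabilities and the `threshold` value below are hypothetical.

```python
import math

def mean_nll(token_probs):
    """Average negative log-likelihood of the tokens the model emitted.

    Equal to the per-token cross-entropy loss on the generated text,
    so lower values mean the model was more certain.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """exp(mean NLL): roughly the effective number of choices per token."""
    return math.exp(mean_nll(token_probs))

def flag_uncertain(token_probs, threshold=1.0):
    """Flag an answer whose mean NLL exceeds a hypothetical threshold."""
    return mean_nll(token_probs) > threshold

# Hypothetical probabilities the model assigned to each token it generated.
confident_answer = [0.95, 0.90, 0.98, 0.92]
shaky_answer = [0.40, 0.15, 0.30, 0.25]

for name, probs in [("confident", confident_answer), ("shaky", shaky_answer)]:
    print(name, round(mean_nll(probs), 3), round(perplexity(probs), 2), flag_uncertain(probs))
```

A known caveat is that a model can be confidently wrong, so a low loss is a weak signal of correctness rather than a guarantee against hallucination.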

@Diamcreeper - 04.02.2025 22:15

I put tape on the subway surfers part of my screen to be able to watch this properly xddd

@Diamcreeper - 04.02.2025 22:17

I wonder if this video will perform much better than average just because of all the people commenting about subway surfers

@monkemode8128 - 04.02.2025 23:19

Personally, I'm a big fan of the fact that your videos are audio-heavy, because I like to listen to them while driving. Anyway, regarding the video, it seems like a mechanism like this would allow the model to better plan what it's going to say in the future.

@alexjensen990 - 05.02.2025 00:50

It's called grokking... pushing through overfitting into intuitive knowledge.

@ATH42069 - 05.02.2025 02:01

we can use AI tools to render diagrams of what you're talking about in your video instead of playing subway surfer

@kimcosmos - 05.02.2025 04:39

Dude. Your ADD viewers who can handle it watch at double speed. Subway Surfers is for boredom. So no, please. You can't have it both ways. Slow ADD loves the surfers; fast ADD only likes it when you are quoting statistics.

@timmygilbert4102 - 05.02.2025 04:54

With very low-bit quantization, it pushes toward a bag-of-words-like hypothesis, i.e., it operates on matched sets.

@lukekhh - 05.02.2025 05:55

Please for the love of god get rid of subway surfer

@p1ugged - 05.02.2025 06:37

Loved the Subway Surfers stuff, but I get why most wouldn't like it, though.

@CMonkeyRun - 05.02.2025 06:45

Wonder how this will work on nGPT, padding random tokens, etc

@Ginto_O - 05.02.2025 13:09

Bro, I can't read the paper while something is moving at the side. This is just disrespectful to the viewer. This seems like a really interesting topic and I can't watch the video. Disliked, unsubscribed.

@loicgregoire3058 - 05.02.2025 13:57

Subway surfer is too distracting when you try to read the highlighted text

@graham8316 - 05.02.2025 18:37

Subway surfer rips

@alexxxcanz - 05.02.2025 18:45

Do not put the game in again. It makes it hard to listen and understand. It distracts me.

@The9thDoctor - 05.02.2025 21:02

please please please please do not do the subway surfers thing

@GNARGNARHEAD - 05.02.2025 21:25

Okay, the Subway Surfers was way too nice 👌

also congrats on the video blowing up

@wild1000022 - 05.02.2025 23:58

No subways, but really interesting paper!

@acasualviewer5861 - 06.02.2025 00:52

Ok... I hate that distracting animation. The objective of your videos is showing the paper, not distracting us w/videos. This isn't the second date update.

@federicolois3344 - 06.02.2025 04:03

Very distracting... I had to listen to you and do something else instead of reading the paper.

@eldoprano - 06.02.2025 05:59

Boooo, booooooo 🍅🍅 boooooooooooo 🍅

@overloader7900 - 06.02.2025 06:47

What if, instead of hyperfitting, we just directly add loss to all but the top-ranked predictions of the model during backpropagation?
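Assuming "add loss to all but the top-ranked prediction" means penalizing whatever probability mass the model puts outside its own argmax token, one hypothetical version is standard cross-entropy plus λ·(1 − p_top). This is a sketch of the commenter's idea, not anything tested in the paper: the weight `lam`, the toy logits, and holding the argmax index fixed (the argmax itself is not differentiable) are all assumptions. The gradient is written out by hand so the example needs only the standard library.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def loss_and_grad(logits, target, lam=0.5):
    """Cross-entropy plus a hypothetical penalty on mass outside the top-1 token.

    loss = -log p[target] + lam * (1 - p[k]), with k = argmax(p) held fixed.
    Returns (loss, gradient of the loss w.r.t. the logits).
    """
    p = softmax(logits)
    k = max(range(len(p)), key=lambda i: p[i])  # top-ranked token, treated as a constant
    loss = -math.log(p[target]) + lam * (1.0 - p[k])
    grad = []
    for j, pj in enumerate(p):
        ce = pj - (1.0 if j == target else 0.0)               # d(-log p_target)/dz_j
        pen = -lam * p[k] * ((1.0 if j == k else 0.0) - pj)   # d(lam * (1 - p_k))/dz_j
        grad.append(ce + pen)
    return loss, grad

# Toy example: one gradient-descent step on the logits themselves.
logits = [1.0, 0.8, 0.2, -0.5]
target = 0
lr = 0.5
before, g = loss_and_grad(logits, target)
logits_after = [z - lr * gz for z, gz in zip(logits, g)]
after, _ = loss_and_grad(logits_after, target)
print(f"loss {before:.4f} -> {after:.4f}")
print(f"p_top {softmax(logits)[0]:.4f} -> {softmax(logits_after)[0]:.4f}")
```

Because the penalty rewards confidence in whatever the model already ranks first, it is loosely related to entropy minimization; when the target token is not the current argmax, the cross-entropy term and the penalty pull the top logit in opposite directions.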

@tornyu - 06.02.2025 08:56

Isn't this still "just" overfitting? I'd expect it to score highly for human preference because it looks like normal text (or images). I'd also expect the output to be uninteresting. ... Then again, maybe the test time input (e.g. prompt) is all you need to differentiate the output? So you get entropy from the world, not from the model itself 🤔

@josephvictory9536 - 06.02.2025 10:11

Incredible paper, and what a wildly unpredictable result.

There is so much weirdness here. There is obviously some principle behind WHY these are so unintuitive, and once we formalize that it might start making some sense.

@gustavheinrich5565 - 06.02.2025 10:16

I don't have TikTok brain, so... no subwaysurfer for me.
Moved that part of the window outside my monitor so I don't have to see it. Is that what the internet without Ad blocker looks like?

@sayedammanakhtar1225 - 06.02.2025 10:52

The subway surfers thing is quite distracting.

@luke.perkin.online - 06.02.2025 12:02

My vote is that the crappy game screen is an awful distraction.

@allurbase - 06.02.2025 13:51

The animation is a bit too distracting. Maybe something moving slower, like abstract colorful patterns slowly changing.

@cornevanzyl5880 - 06.02.2025 14:00

I almost cried laughing at the subway surfer addition. I guess it can work for some, but if you draw our attention to visual things in the paper, then you might not have to add it

@seriosersimon3347 - 11.02.2025 23:32

Pls no Subway Surfers. That's not a TikTok video for poorly concentrating 10-year-olds.

@hatacoyama1246 - 03.02.2025 20:59

Please make subway surfers the entire screen 😎🍦🤞🔥🔥🔥🔥