Sigma-GPTs: A New Approach to Autoregressive Models

Tunadorable

2,888 views

https://arxiv.org/abs/2404.09562

Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
https://patreon.com/Tunadorable
https://account.venmo.com/u/tunadorable

Discuss this stuff with other Tunadorks on Discord
https://discord.gg/64fWcSDGsJ

All my other links
https://linktr.ee/tunadorable

Comments:

@The9thDoctor - 24.07.2024 14:07

I love seeing papers that take these out-there ideas and just see what there is to find.

@narutouzumaki2157 - 24.07.2024 14:25

Noice😊

@kevon217 - 24.07.2024 14:57

Love the Pac-Man-esque figure. Cool method here.

@technokicksyourass - 24.07.2024 21:28

The reason for aircraft altitude rate prediction is that aircraft must file a flight plan, which is a trajectory they must fly. In addition, aircraft equipped with ADS will send a 3-point trajectory segment to the ground station: their current position, the next point, and the next + 1. So infilling is relevant to this use case, I guess.

@thorvaldspear - 24.07.2024 22:45

They are messing with us at this point with names like that

@islandfireballkill - 25.07.2024 03:29

If you look at "What Algorithms Can Transformers Learn" by Hattie Zhou, you will find that some tasks, like addition, generalize vastly better just by generating the output tokens in reverse order (because standard addition with carry is actually a right-to-left algorithm).
This could have implications for reasoning and code generation skills.
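
A minimal Python sketch of the carry argument above. It is illustrative only and is not code from Zhou et al. or the Sigma-GPT paper. `add_reversed` and `add_forward` are hypothetical names. When the sum is emitted least-significant digit first, each output digit needs only the two input digits at that position plus the carry from the digit just emitted, so a left-to-right generator never has to look ahead. Emitting the most-significant digit first would instead require knowing carries from positions not yet generated.

```python
def add_reversed(a: str, b: str) -> str:
    """Add two non-negative decimal strings, emitting the sum's digits
    least-significant first (the "reversed" output order)."""
    ra, rb = a[::-1], b[::-1]  # read both inputs right to left
    out = []
    carry = 0
    for i in range(max(len(ra), len(rb))):
        da = int(ra[i]) if i < len(ra) else 0
        db = int(rb[i]) if i < len(rb) else 0
        total = da + db + carry
        out.append(str(total % 10))  # depends only on local digits + carry so far
        carry = total // 10
    if carry:
        out.append(str(carry))
    return "".join(out)


def add_forward(a: str, b: str) -> str:
    # The conventional most-significant-first answer is the reversed stream flipped.
    return add_reversed(a, b)[::-1]


print(add_reversed("987", "45"))  # 2301
print(add_forward("987", "45"))   # 1032
```

In the reversed order, every digit is a function of information already produced. That is the sense in which the commenter calls carry addition a right-to-left algorithm.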

@GNARGNARHEAD - 25.07.2024 05:56

I'm really struggling to appreciate how obfuscating the information makes it more effective at modeling the global view 🤔 Why not just train against a hidden A* path?

oh and the code videos sound fun

@nomadicsynth - 25.07.2024 07:10

PURPLE!!!

@Dedjkeorrn42 - 25.07.2024 08:15

What the sigma?

@jesperlauridsen4538 - 25.07.2024 13:35

Cool stuff, and I like the channel. I've been reading this paper on pi-PrimeNovo. It's in the context of mass spec proteomics, so the data can be difficult to understand. But their method is promising, and they get dramatic improvements over existing methods. Would love to hear your take on it!

@revimfadli4666 - 28.07.2024 01:18

Reminds me of the transformer/attention permutation invariance that David Ha used for reinforcement learning
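
A minimal Python sketch of the property this comment refers to. It assumes a single attention-pooling step with a fixed query, identity key/value projections, and no positional encoding. That is a simplification for illustration and is not the architecture from David Ha's work. `attention_pool` is a hypothetical name. Each softmax weight stays paired with its own value before the weighted values are summed, so shuffling the input tokens leaves the output unchanged up to floating-point rounding.

```python
import math
import random


def attention_pool(query, keys, values):
    """Single-query dot-product attention with no positional encoding.
    Returns sum_i softmax(q . k_i / sqrt(d)) * v_i."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]


random.seed(0)
query = [random.gauss(0, 1) for _ in range(4)]
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]

# Identity projections: the tokens serve as both keys and values (simplification).
out = attention_pool(query, tokens, tokens)

shuffled = tokens[:]
random.shuffle(shuffled)
out_shuffled = attention_pool(query, shuffled, shuffled)

# Same pooled vector regardless of input order.
print(all(abs(a - b) < 1e-9 for a, b in zip(out, out_shuffled)))  # True
```

Adding positional encodings to the keys would break this invariance. Sigma-GPT-style models need explicit position information to handle arbitrary generation orders for the same reason.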

@tornyu - 29.07.2024 14:10

Did they mention whether their technique reduces information squashing, like you covered in that other paper this week?

@youngman5890 - 02.08.2024 19:41

what the sigma
