Attention Residuals: Rethinking depth-wise aggregation [pdf] (github.com/moonshotai)
12 points by salkahfi 4 days ago | 1 comment



In [1], I think a commenter actually speculated about a design much like this one, where later layers can directly access the outputs of earlier layers instead of having to carry them forward in the residual stream.

[1] https://news.ycombinator.com/item?id=46362579
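To make the contrast concrete, here is a toy sketch in plain Python. It is not Moonshot's method and not the design from [1]. It only illustrates the general idea under stated assumptions. In a standard residual stream, every layer adds its output into a single running sum. In the hypothetical alternative, every earlier output is kept separately, and each layer reads a softmax-weighted mix of them using its own query vector. The names `residual_forward`, `depthwise_attention_forward`, and the per-layer `queries`, along with the choice of dot-product scoring and an extra readout query, are all my own assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def residual_forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Standard residual stream: each layer adds its output into one running sum."""
    h = list(x)
    for f in layers:
        out = f(h)
        h = [a + b for a, b in zip(h, out)]
    return h


def depthwise_attention_forward(
    x: Vector, layers: Sequence[Layer], queries: Sequence[Vector]
) -> Vector:
    """Hypothetical alternative: keep every earlier output separately and let each
    layer read a softmax-weighted mix of them, scored against its own query."""
    # One query per layer plus one for the final readout (an assumption of this sketch).
    assert len(queries) == len(layers) + 1
    outputs: List[Vector] = [list(x)]  # outputs[0] is the embedding itself
    for f, q in zip(layers, queries):
        weights = softmax([dot(q, v) for v in outputs])
        outputs.append(f(weighted_sum(weights, outputs)))
    # Final readout attends over the embedding and all layer outputs.
    final_weights = softmax([dot(queries[-1], v) for v in outputs])
    return weighted_sum(final_weights, outputs)
```

With all-zero queries, each mix is a uniform average. A sharply aligned query, by contrast, lets a layer (or the readout) pull one specific earlier output almost directly, rather than recovering it from the accumulated sum.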



