woadwarrior01 | 5 days ago | on: Show HN: Duplicate 3 layers in a 24B LLM, logical ...
Reminds me of SOLAR 10.7B, which was a very good model for its size ~2 years ago, and the "Depth Up-Scaling" technique behind it. Although that involved continued training after repeating the layers.
https://arxiv.org/abs/2312.15166
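For reference, the Depth Up-Scaling layer arrangement from that paper (duplicate the base model, drop the last m layers from one copy and the first m from the other, then concatenate) can be sketched like this. A toy illustration with layer indices standing in for transformer blocks; not a real model:

```python
def depth_up_scale(layers, m):
    """Depth Up-Scaling (SOLAR 10.7B, arXiv:2312.15166):
    concatenate a copy missing its final m layers with a
    copy missing its initial m layers, giving 2n - 2m layers."""
    return layers[:-m] + layers[m:]

# Base model with n = 32 layers, overlap m = 8, as in the paper.
base = list(range(32))
scaled = depth_up_scale(base, 8)
print(len(scaled))  # 48 layers, matching SOLAR's depth
```

The duplicated middle layers (8..23 here) appear twice in the scaled stack, which is why continued pretraining afterwards is needed to recover performance.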