
For in-editor Copilot-like completion, you can try this locally - https://github.com/smallcloudai/refact

This works well for me, except the 15B+ models don't run fast enough on a 4090. Hopefully exllama will support non-llama models, or maybe it supports CodeLlama already, I'm not sure.

For general chat testing/usage, this works pretty well and has lots of options - https://github.com/oobabooga/text-generation-webui/
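For anyone curious what these tools wrap, here's a minimal local-completion sketch with Hugging Face transformers. The model id and prompt are just examples; the editor plugins layer context gathering on top of a loop like this:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Any local causal code model works; this id is just an example.
  model_id = "TheBloke/CodeLlama-13B-fp16"

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id, torch_dtype=torch.float16, device_map="auto"
  )

  # Complete code from a prefix - the same shape of request an editor plugin sends.
  prompt = "def fibonacci(n):"
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
  print(tokenizer.decode(out[0], skip_special_tokens=True))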

> This works well for me, except the 15B+ models don't run fast enough on a 4090

I assume quantized models will run a lot faster. TheBloke already seems to be on it:

https://huggingface.co/TheBloke/CodeLlama-13B-fp16
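Until the GPTQ uploads land, you can also quantize the fp16 checkpoint on the fly with bitsandbytes. A minimal sketch, assuming transformers with bitsandbytes installed (the 4-bit settings are common defaults, not anything specific to this repo):

  import torch
  from transformers import AutoModelForCausalLM, BitsAndBytesConfig

  # Quantize the fp16 weights to 4-bit at load time to fit a 24 GB card.
  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_compute_dtype=torch.float16,
  )
  model = AutoModelForCausalLM.from_pretrained(
      "TheBloke/CodeLlama-13B-fp16",
      quantization_config=bnb_config,
      device_map="auto",
  )

13B at 4 bits is roughly 6.5 GB of weights, so it fits a 4090 with plenty of headroom.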

Unfortunately, what I tested was StarCoder at 4-bit. We really need exllama support, which should make even 30B models viable from what I can tell.

Since CodeLlama is llama-based, it may just work already?
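Back-of-envelope on why 4-bit is what makes 30B viable on a 24 GB card (weights only, not counting the KV cache):

  # Approximate VRAM for model weights alone at different precisions.
  params = 30e9
  print(f"fp16:  {params * 2.0 / 1e9:.0f} GB")  # ~60 GB - no chance on a 4090
  print(f"4-bit: {params * 0.5 / 1e9:.0f} GB")  # ~15 GB - fits in 24 GB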
