Hi!
This project is very useful for working with llama.cpp from Python. However, it would be great to have some features available even before upstream llama.cpp fully supports them, specifically TurboQuant/RotorQuant and dFlash speculative decoding. That would make this library AMAZING to use.