Post-transformer inference: 224× compression of Llama-70B with improved accuracy

Article URL: https://zenodo.org/records/17873275

Comments URL: https://news.ycombinator.com/item?id=46212969

Points: 59

Comments: 16