Hacker News


Channel's geo and language: not specified, English
Category: Technologies


Top stories from https://news.ycombinator.com (with 100+ score)
Contribute to the development here: https://github.com/phil-r/hackernewsbot
Also check https://t.me/designer_news
Contacts: @philr


Cyc: History's Forgotten AI Project (Score: 156+ in 8 hours)

Link: https://readhacker.news/s/66bdc
Comments: https://readhacker.news/c/66bdc


Solving the minimum cut problem for undirected graphs (Score: 152+ in 11 hours)

Link: https://readhacker.news/s/669x5
Comments: https://readhacker.news/c/669x5


Show HN: Speeding up LLM inference 2x times (possibly) (Score: 156+ in 4 hours)

Link: https://readhacker.news/s/66aGf
Comments: https://readhacker.news/c/66aGf

Here's a project I've been working on for the last few months.
It's a new (I think) algorithm that lets you adjust smoothly, and in real time, how many calculations you'd like to do during inference of an LLM.
It seems possible to do just 20-25% of the weight multiplications instead of all of them and still get good inference results.
I implemented it to run on M1/M2/M3 GPUs. The matrix-multiplication approximation itself can be pushed to run 2x faster before the quality of the output collapses.
The inference speed is only a bit faster than Llama.cpp's, because the rest of the implementation could be better, but with further development I think it can become a new method for speeding up inference, in addition to quantization.
You could call it ad-hoc model distillation :)
You can change a model's speed/accuracy trade-off at will, in real time.
Oh, and as a side effect, the data format also lets you choose how much of the model to load into memory: you can decide to skip, say, 10-20-40% of the least important weights.
It's implemented for Mistral, and it was also lightly tested on Mixtral and Llama. It's FP16-only for now, but Q8 is in the works.
The algorithm is described here, and the implementation is open source:
https://kolinko.github.io/effort/
I know these are bold claims, but I hope they survive the scrutiny :)
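The core idea (performing only a fraction of the weight multiplications) can be sketched with a crude static variant: keep only the largest-magnitude fraction of the weights and zero the rest before a matrix-vector multiply. This is a simplification for illustration only; the actual Effort algorithm chooses which multiplications to perform dynamically, per inference step, on Apple-silicon GPUs.

```python
import numpy as np

def approx_matvec(W, x, effort=0.25):
    """Approximate W @ x using only the top `effort` fraction of weights
    by magnitude. A static CPU-side toy, not the project's method."""
    k = max(1, int(round(W.size * effort)))              # weights to keep
    flat = np.abs(W).ravel()
    thresh = np.partition(flat, W.size - k)[W.size - k]  # k-th largest magnitude
    mask = np.abs(W) >= thresh                           # keep only the big ones
    return (W * mask) @ x

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
full = W @ x
approx = approx_matvec(W, x, effort=0.25)
```

At effort=1.0 the result is exact; as effort shrinks, the approximation error grows, which is the speed/accuracy dial the post describes.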


Future of Humanity Institute shuts down (Score: 153+ in 5 hours)

Link: https://readhacker.news/s/66a7Q
Comments: https://readhacker.news/c/66a7Q


Show HN: Render audio waveforms to HTML canvas using WebGPU (Score: 150+ in 1 day)

Comments: https://readhacker.news/c/6642Y

Hey HN. I built this quick-and-dirty component to render audio waveforms using WebGPU, and I just published it to NPM.
It's the first time I've used WebGPU, and it's been a while since I've written shaders. Feedback is very welcome!
GitHub: https://github.com/mrkev/webgpu-waveform
Examples: https://aykev.dev/webgpu-waveform
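Independent of the GPU side, waveform renderers typically first reduce the raw samples to one (min, max) peak pair per pixel column; the shader or canvas then draws a vertical bar per column. A minimal sketch of that preprocessing step (the function name is hypothetical and not part of the webgpu-waveform API):

```python
import numpy as np

def waveform_peaks(samples, width):
    # Split the samples into `width` buckets (one per pixel column)
    # and record each bucket's min and max, the values a renderer
    # draws as a vertical bar. Illustrative helper only.
    buckets = np.array_split(np.asarray(samples, dtype=np.float32), width)
    return [(float(b.min()), float(b.max())) for b in buckets]
```

For example, 4 samples rendered at a width of 2 pixels yield 2 peak pairs, each covering half the audio.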


20 last posts shown.