I wanted to quantify the endless "which knife should I buy" debates on r/chefknives, so I built a data analysis pipeline to get some real answers.
The project is a 5-phase system built with Node.js. It first uses Fuse.js for fast, typo-tolerant fuzzy matching against ~450 known brands and ~8,700 models. The remaining text is then sent to an LLM (via OpenRouter), which discovers new, previously unknown entities and performs sentiment analysis on every mention. I ran it on over 1,000 threads, totaling more than 25,000 comments.
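To make the "match known names first" phase concrete, here is a minimal, dependency-free sketch. It does not use Fuse.js (which has its own Bitap-based scorer); instead it swaps in plain normalized Levenshtein distance over word windows. The function names `levenshtein` and `findKnownMentions` and the `maxRatio` threshold are hypothetical, chosen only for illustration.

```javascript
// Hypothetical sketch: typo-tolerant dictionary matching using normalized
// Levenshtein distance. This is NOT Fuse.js; it only illustrates the idea
// of matching known brand/model names before any LLM call.

// Classic dynamic-programming edit distance (insert/delete/substitute = 1),
// using a single row of the DP table.
function levenshtein(a, b) {
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0]; // D[i-1][j-1] for the current cell
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j]; // D[i-1][j], saved before overwriting
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(prev[j] + 1, prev[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return prev[b.length];
}

// Compare each known name against every window of the comment that has the
// same number of words; accept it if distance / name length <= maxRatio.
function findKnownMentions(text, knownNames, maxRatio = 0.25) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const mentions = [];
  for (const name of knownNames) {
    const target = name.toLowerCase();
    const n = target.split(/\s+/).length;
    for (let i = 0; i + n <= words.length; i++) {
      const span = words.slice(i, i + n).join(" ");
      const dist = levenshtein(span, target);
      if (dist / Math.max(target.length, 1) <= maxRatio) {
        mentions.push({ name, matched: span, index: i });
        break; // one hit per name per comment is enough for this sketch
      }
    }
  }
  return mentions;
}

const found = findKnownMentions("I love my Tojro DP and my shun classic", [
  "Tojiro",
  "Shun",
  "Dalstrong",
]);
console.log(found.map((m) => m.name));
```

With the default threshold, the misspelled "Tojro" still resolves to Tojiro (one edit over six characters). Very short names are the weak spot: "Shun" tolerates one edit, so words like "sun" or "shut" would also match, which is one reason a real pipeline would want stricter rules for short names and an LLM pass for anything the dictionary misses.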
A few interesting findings:
The Underdog: Budget-friendly Tojiro has a massive 27-to-1 positive-to-negative mention ratio.
The Controversy King: Shun is by far the most polarizing brand, sparking strong love/hate discussions (59 positive vs. 24 negative mentions).
The Unloved: Dalstrong was one of the few brands to receive more negative mentions than positive ones.
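Numbers like "27-to-1" and "59 positive vs. 24 negative" come from rolling per-mention sentiment labels up into per-brand counts. The sketch below shows one way to do that roll-up. The `{ brand, sentiment }` input shape, the function names, and especially the `polarization` formula (the share of mentions on the minority side) are assumptions for illustration; the post does not say how polarization was actually measured.

```javascript
// Hypothetical sketch: aggregate per-mention sentiment labels into brand
// counts, then derive a positive-to-negative ratio and a simple
// polarization score. Schema and formula are assumptions.

function summarize(c) {
  const polar = c.positive + c.negative;
  return {
    ...c,
    // Infinity when a brand has positives but no negatives; null if neither.
    ratio: c.negative === 0 ? (c.positive > 0 ? Infinity : null) : c.positive / c.negative,
    // Share of the minority side: 0 = unanimous, 0.5 = evenly split.
    polarization: polar === 0 ? 0 : Math.min(c.positive, c.negative) / polar,
  };
}

function aggregateSentiment(mentions) {
  const byBrand = new Map();
  for (const { brand, sentiment } of mentions) {
    // Normalize case and whitespace so "Shun" and "shun " count as one brand.
    const key = brand.trim().toLowerCase();
    if (!byBrand.has(key)) {
      byBrand.set(key, { brand: brand.trim(), positive: 0, negative: 0, neutral: 0 });
    }
    const counts = byBrand.get(key);
    if (sentiment === "positive") counts.positive++;
    else if (sentiment === "negative") counts.negative++;
    else counts.neutral++;
  }
  return [...byBrand.values()].map(summarize);
}

// Shun's counts from the findings above (neutral mentions weren't reported).
const shun = summarize({ brand: "Shun", positive: 59, negative: 24, neutral: 0 });
console.log(shun.ratio.toFixed(2), shun.polarization.toFixed(2));
```

For Shun this gives a ratio of about 2.46 and a polarization of about 0.29, versus a ratio of 27 for any brand in Tojiro's 27-to-1 position. A share-based score ignores volume (one positive and one negative mention scores a "perfect" 0.5), so in practice you would likely require a minimum mention count before ranking brands this way.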
The system isn't perfect; the write-up openly documents a critical entity aggregation bug. The full technical architecture, results, and raw data are available at the links below.
I'm here to answer any questions!
Blog Post (full story & visualizations): https://new.knife.day/blog/we-analyzed-25000-reddit-comments...
GitHub (technical breakdown & raw data): https://github.com/pvijeh/reddit-named-entity-recognition/bl...
Original Reddit Discussion: https://www.reddit.com/r/chefknives/comments/1o2p363/i_analy...
Comments URL: https://news.ycombinator.com/item?id=45629572