March 18, 2026

47M Hacker News Items Free: The Dataset Every Indie Hacker Needs

If you have ever wanted to analyze what the tech community cares about, build a trend tracker, or train an AI model on real developer discussions, you just got a goldmine.

The complete Hacker News archive is now available on Hugging Face, published as open-index/hacker-news: 47,358,587 items spanning October 2006 to today. And it is free.

What You Get

This is not a sample. This is the entire dataset. It includes titles, URLs, scores, authors, timestamps, text content, and parent/child relationships: everything you need to build serious analytics.

Why This Matters for Indie Hackers

Here is what you can actually build with this:

Trend Analysis Tools

See which technologies are rising and falling over time. Track AI, crypto, SaaS, hardware—whatever topic you care about. The data goes back nearly two decades, so you can spot multi-year trends.
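
As a rough sketch of the idea, the snippet below counts how many titles mention a keyword in each month. The field names `title` and `time` (Unix seconds) are assumptions borrowed from the public Hacker News API, not confirmed column names in this dataset, and the sample rows are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_mentions(rows, keyword):
    """Count rows per 'YYYY-MM' whose title contains keyword (case-insensitive)."""
    needle = keyword.lower()
    counts = Counter()
    for row in rows:
        title = row.get("title") or ""  # comment-like rows may have no title
        if needle in title.lower():  # plain substring match, so "rust" also hits "trust"
            stamp = datetime.fromtimestamp(row["time"], tz=timezone.utc)
            counts[stamp.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


# Made-up sample rows; real rows would come from the Parquet files.
sample_rows = [
    {"title": "Show HN: A Rust web framework", "time": 1704067200},  # 2024-01-01
    {"title": "Why we moved to Rust", "time": 1704153600},  # 2024-01-02
    {"title": "Rust 2024 edition notes", "time": 1706745600},  # 2024-02-01
    {"title": None, "time": 1706832000},  # no title, skipped
]

print(monthly_mentions(sample_rows, "rust"))
```

A plain substring match is crude; a real tracker would likely want word-boundary matching and a normalization by total posts per month.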

AI Training Data

Want to fine-tune a model on developer discussions? You have 47 million items to draw from. The quality is relatively high, since Hacker News is actively moderated.

Product Inspiration Engine

Every Show HN is essentially a mini startup launch. Analyze what gets upvoted, what gets rejected, and what patterns lead to product-market fit.
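
One hedged way to start that analysis is to pick out Show HN posts by title prefix and look at how their scores are distributed. The `title` and `score` field names are assumptions modeled on the Hacker News API, and the rows below are invented.

```python
from statistics import median


def show_hn_scores(rows):
    """Return the scores of rows whose title starts with 'Show HN:'."""
    return [
        row["score"]
        for row in rows
        if (row.get("title") or "").startswith("Show HN:")
        and row.get("score") is not None
    ]


# Invented rows for illustration only.
sample_rows = [
    {"title": "Show HN: Tiny static site generator", "score": 120},
    {"title": "Show HN: My weekend CLI tool", "score": 8},
    {"title": "Show HN: Open source invoicing", "score": 45},
    {"title": "Ask HN: How do you price a SaaS?", "score": 300},
]

scores = show_hn_scores(sample_rows)
print(len(scores), median(scores))
```

Scores alone only show what was upvoted; they say nothing about whether a product later found a market, so treat them as a starting signal.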

Personalized News Aggregator

Build your own HN client with custom filters, topic clustering, or AI-summarized daily digests.

How to Use It

No downloading required. DuckDB can query Parquet files directly from Hugging Face:

SELECT count(*) FROM read_parquet('https://huggingface.co/datasets/open-index/hacker-news/resolve/main/2026/2026-03.parquet')
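
The same query can be run from Python through the `duckdb` package. This is a minimal sketch: it assumes `duckdb` is installed, that the `httpfs` extension can be installed for remote reads, and that the file URL above exists as written. The helper names `count_query` and `count_items` are illustrative.

```python
PARQUET_URL = (
    "https://huggingface.co/datasets/open-index/hacker-news"
    "/resolve/main/2026/2026-03.parquet"
)


def count_query(url):
    """Build the row-count query for one remote Parquet file."""
    return f"SELECT count(*) FROM read_parquet('{url}')"


def count_items(url):
    """Run the count against the remote file (needs network access)."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()  # in-memory database
    con.execute("INSTALL httpfs")  # extension for reading over HTTPS
    con.execute("LOAD httpfs")
    return con.execute(count_query(url)).fetchone()[0]


print(count_query(PARQUET_URL))
# count_items(PARQUET_URL)  # uncomment to run the query over the network
```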

Or use the datasets library in Python:

from datasets import load_dataset
ds = load_dataset("open-index/hacker-news", split="train")

The data is organized by month, making it easy to grab just what you need or scale up to the full archive.
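
Because the files are split by month, a small helper can generate the URLs for any range. The `YYYY/YYYY-MM.parquet` layout is inferred from the single example URL above, so treat it as an assumption and check the repository listing before relying on it. The function names are illustrative.

```python
BASE = "https://huggingface.co/datasets/open-index/hacker-news/resolve/main"


def month_url(year, month):
    """URL for one month, following the pattern of the example URL."""
    return f"{BASE}/{year:04d}/{year:04d}-{month:02d}.parquet"


def month_urls(start, end):
    """URLs for every month from start to end inclusive, as (year, month) pairs."""
    year, month = start
    urls = []
    while (year, month) <= end:
        urls.append(month_url(year, month))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls


urls = month_urls((2025, 11), (2026, 2))
for url in urls:
    print(url)
```

DuckDB's `read_parquet` also accepts a list of files, so a list like this can be queried in one statement.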

The Hidden Value

This data has been accumulating in public for 19 years. Now it is accessible in a format that actually works: no scraping and no rate limiting.

For indie hackers, this is the kind of raw material that turns into side projects, MVPs, and sometimes actual businesses. The barrier to entry for data-driven products just dropped significantly.

Get the dataset here: huggingface.co/datasets/open-index/hacker-news

