Hey HN, cofounder of Artie here. I’ve been working on real-time database replication using CDC (Postgres/MongoDB into Snowflake, BigQuery, Redshift) with my wife for the last three years. Last time I posted here, people had to book a call with us to get access, but that’s no longer the case. You can connect your source and destination and start streaming immediately.
I ran into this problem firsthand as a heavy data warehouse user at prior jobs. Our warehouse data always lagged behind production, so our analytics were always stale. The most visceral version of this today: imagine an AI agent making decisions – on pricing, support routing, risk scoring – off a data warehouse that's 3-12 hours behind.
When we started, I thought the hard part was reading the WAL. The real problems:
Schema drift: CDC events carry row data but not column metadata, so when an engineer adds a column in prod, events with that column start arriving at the destination before you've run ALTER TABLE. In this case, you wouldn’t get an error – you would just silently drop data.
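To make the schema drift case concrete, here is a minimal sketch of checking each event against the destination's known columns before merging. It is an illustration, not Artie's implementation: `plan_schema_changes`, `infer_sql_type`, and the type names are hypothetical, and guessing types from values is a crude stand-in for looking them up in the source catalog.

```python
def infer_sql_type(value):
    """Guess a destination column type from a sample value (placeholder mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    return "VARCHAR"  # fallback, also used for None and strings


def plan_schema_changes(table, known_columns, event_row):
    """Return ALTER TABLE statements for event columns the destination lacks.

    Running these before the merge is what prevents the silent drop.
    """
    return [
        f"ALTER TABLE {table} ADD COLUMN {col} {infer_sql_type(val)}"
        for col, val in event_row.items()
        if col not in known_columns
    ]


statements = plan_schema_changes(
    "users", {"id", "email"}, {"id": 1, "email": "a@b.c", "plan": "pro"}
)
```

The point of the design is ordering: the check runs on every batch before the merge, so an unknown column blocks the write instead of being discarded.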
Backfill race conditions: the typical approach (snapshot first, then start CDC) means that by the time your snapshot finishes on a large table, the rows it read are already out of date and the stream has moved well past them. If you stitch the two together wrong, you overwrite newer data with older snapshot rows.
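One common guard against this (a sketch of a general technique, not necessarily how Artie does it) is to tag every write with a log position and refuse to apply anything older than what the destination already holds. The sketch assumes each snapshot row and CDC event carries a comparable position such as a Postgres LSN; `apply_if_newer` and the in-memory `dest` dict are hypothetical.

```python
def apply_if_newer(dest, key, row, lsn):
    """Apply a write only if it is at least as new as the stored version.

    `row` is None for a delete. Deletes are kept as tombstones so that a
    late snapshot row cannot resurrect a row the stream already deleted.
    """
    current = dest.get(key)
    if current is not None and current["lsn"] > lsn:
        return False  # stale write (e.g. an old snapshot row): skip it
    dest[key] = {"row": row, "lsn": lsn}
    return True


dest = {}
apply_if_newer(dest, 1, {"status": "shipped"}, lsn=200)  # CDC event lands first
apply_if_newer(dest, 1, {"status": "pending"}, lsn=100)  # older snapshot row arrives later
apply_if_newer(dest, 2, None, lsn=150)                   # CDC delete
apply_if_newer(dest, 2, {"status": "pending"}, lsn=100)  # snapshot row for the deleted key
```

After these four calls, key 1 still holds the CDC version and key 2 is still a tombstone, regardless of the order in which the snapshot and the stream were applied.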
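Kafka offset commits: this sounds obvious, but it's difficult to get right. You should commit an offset only after a successful merge into the destination. Commit too early and a crash loses the uncommitted work; merge successfully but fail before the commit and the batch is replayed, so you double-write unless the merge is idempotent. Partial failures across a distributed system compound this quickly.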
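The commit ordering can be sketched without a real broker. `InMemoryLog` below is a hypothetical stand-in for a Kafka partition with auto-commit disabled, and `process_batch` is an illustrative name, not a real client API; the only thing the sketch is meant to show is "merge first, commit second, and make the merge idempotent".

```python
class InMemoryLog:
    """Hypothetical stand-in for one Kafka partition with auto-commit off."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to read after a restart

    def read_from_committed(self, max_messages):
        return self.messages[self.committed:self.committed + max_messages]

    def commit(self, next_offset):
        self.committed = next_offset


def process_batch(log, merge, batch_size=100):
    """Merge one batch into the destination, then commit its offsets.

    If `merge` raises, nothing is committed and the same batch is read again
    on the next call. Because a crash between merge and commit also causes a
    replay, `merge` has to be idempotent (for example, a keyed upsert).
    """
    batch = log.read_from_committed(batch_size)
    if not batch:
        return 0
    merge(batch)  # may raise; the commit below is then skipped
    log.commit(log.committed + len(batch))
    return len(batch)


destination = {}


def upsert(batch):
    # Keyed upsert: replaying the same batch leaves the destination unchanged.
    for key, value in batch:
        destination[key] = value


log = InMemoryLog([("a", 1), ("b", 2)])
processed = process_batch(log, upsert)
```

This gives at-least-once delivery from the log plus an idempotent merge, which is the usual way to get effectively-once results at the destination.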
TOAST columns: Postgres omits unchanged TOAST columns (large text/JSON/bytea – think JSONB config fields, long descriptions, binary blobs) from WAL events entirely to save space. A naive pipeline reads ‘missing’ as ‘set to null’ and silently wipes valid data: a customer's entire config blob can disappear because an unrelated column on the same row was updated. The fix is merge logic that treats absent columns as ‘don't touch’ rather than ‘set to null,’ which breaks most off-the-shelf UPSERT patterns.
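The difference between the two merge behaviours fits in a few lines. This is a sketch under the assumption that an unchanged TOAST column shows up as a missing key in the event dict (some decoders emit a placeholder value instead, which would need the same treatment); `naive_upsert` and `toast_aware_merge` are hypothetical names.

```python
def naive_upsert(existing, event, columns):
    """Overwrite every column; a key missing from the event becomes None."""
    return {col: event.get(col) for col in columns}


def toast_aware_merge(existing, event, columns):
    """Keep the stored value for columns the event does not mention.

    An explicit None in the event is still applied, so real NULL updates
    are not confused with omitted TOAST values.
    """
    return {
        col: event[col] if col in event else existing.get(col)
        for col in columns
    }


columns = ["id", "name", "config"]
stored = {"id": 1, "name": "acme", "config": '{"tier": "enterprise"}'}
update = {"id": 1, "name": "acme inc"}  # config unchanged, so absent from the event

wiped = naive_upsert(stored, update, columns)
kept = toast_aware_merge(stored, update, columns)
```

Here `wiped["config"]` is `None` while `kept["config"]` still holds the original JSON, which is the ‘don't touch’ behaviour described above.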
Curious whether others have hit these same walls building in-house, and would love feedback.
Comments URL: https://news.ycombinator.com/item?id=48464686