Airtable is fantastic, until you’re trying to organize 13,000 messy movie titles that are almost the same but just different enough to drive you mad. This is the story of what pushed me to build a tool for a problem that Airtable’s Dedupe extension was never meant to handle. I’ll share the Reddit post that sparked the idea, along with the bigger, messier problems I hope this tool can solve next.
It All Started With a Reddit Post…
One day, I stumbled across a humble Reddit post that totally pulled me in. The user was trying to migrate 13,000 videotape inventory records from a spreadsheet into Airtable. A perfect project for today’s world of no-code tools, right?
Well… not quite.
The problem? The data was messy. I’m talking different spellings, word orders flipped around, and extra info tossed into titles. Think:
- “John Smith’s World”
- “Jon Smith’s World”
- “John Smith World”
- “The World of John Smith”
- “John Smith’s World Final Version”
They all refer to the same movie, but each one is written a little differently. The Redditor wanted all of those entries grouped together so it was clear they were the same program.
The Quest for Perfectly Grouped Titles
Well, it didn’t take long before they realized that manually combing through every single record would be… let’s just say, a soul-crushing experience. So, like any sane person, they turned to software for help—and came across Airtable’s Dedupe extension. Big thanks to the Airtable team for building that—it’s genuinely a handy tool!
But unfortunately, it wasn’t the right fit here. Why not? Two main reasons. First, Dedupe needs the text to be very similar to spot duplicates. Second, it merges duplicates into a single record instead of keeping them all. And in this case, preserving the originals was kind of the whole point.
Sigh. It’s just not the right tool for this job.
Why Did I Decide to Take This On?
That post stuck with me long after I first read it. That’s how compelling it was.
One reason it caught my attention was that it posed a genuinely interesting technical challenge. What kind of algorithm could match “The World of John Smith” with “John Smith’s World Final Version”—but not with something unrelated like “John Doe’s World Final Version”? Could this be a problem that vector embeddings could solve? I couldn’t stop thinking about it. (Spoiler: I gave that approach a shot, but it’s not the one I ended up using.)
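This post doesn't say which algorithm Movie Title Grouper ended up using, so what follows is only a hypothetical sketch of one way to attack the matching question above. Rather than using embeddings, it normalizes each title into a set of words (lowercasing, dropping the possessive "’s", and removing an assumed list of stopwords and "noise" words like "final" and "version"). It then treats two words as equal when they are identical or nearly so, using Python's standard `difflib.SequenceMatcher`, so "Jon" can still match "John". Two titles land in the same group only when most of each title's words are matched in the other. The names `title_tokens`, `title_similarity`, and `group_titles`, the word lists, and both thresholds are illustrative assumptions, not the tool's actual internals.

```python
import re
from difflib import SequenceMatcher

# Assumed word lists for illustration only; a real collection would need its own.
STOPWORDS = {"the", "of", "a", "an", "and"}
NOISE_WORDS = {"final", "version"}


def title_tokens(title: str) -> frozenset[str]:
    """Lowercase, drop possessive 's, and remove stopwords and noise words."""
    text = title.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    text = re.sub(r"'s\b", "", text)  # "smith's" -> "smith"
    words = re.findall(r"[a-z0-9]+", text)
    return frozenset(w for w in words if w not in STOPWORDS and w not in NOISE_WORDS)


def words_match(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Treat two words as equal if identical or nearly so (e.g. "jon" vs "john")."""
    return a == b or SequenceMatcher(None, a, b).ratio() >= min_ratio


def coverage(a: frozenset[str], b: frozenset[str]) -> float:
    """Fraction of words in `a` that have a fuzzy match somewhere in `b`."""
    if not a:
        return 0.0
    return sum(any(words_match(w, v) for v in b) for w in a) / len(a)


def title_similarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Symmetric score: each title must be mostly covered by the other."""
    if not a and not b:
        return 1.0
    return min(coverage(a, b), coverage(b, a))


def group_titles(titles: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy grouping: each title joins the first group whose first title is
    similar enough. Original titles are kept as-is; nothing is merged away."""
    groups: list[list[str]] = []
    keys: list[frozenset[str]] = []
    for title in titles:
        tokens = title_tokens(title)
        for i, key in enumerate(keys):
            if title_similarity(tokens, key) >= threshold:
                groups[i].append(title)
                break
        else:
            groups.append([title])
            keys.append(tokens)
    return groups


titles = [
    "John Smith's World",
    "Jon Smith's World",
    "John Smith World",
    "The World of John Smith",
    "John Smith's World Final Version",
    "John Doe's World Final Version",
]
# Expected: the five John Smith variants in one group, John Doe's title in its own.
for group in group_titles(titles):
    print(group)
```

With these assumptions, "John Doe’s World Final Version" only matches two of its three remaining words ("john" and "world") against the Smith titles, a score of about 0.67, so it stays out of the group. Note that this greedy pass compares each title against the first title of each group, which makes the result depend on input order; that is one of the nuances a more careful tool would have to handle.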
Thankfully, one commenter shared a ChatGPT-generated solution to the problem—which I really appreciated. It was a thoughtful attempt, but seeing it made one thing clear: this wasn’t going to be a simple algorithm. The problem had more nuance than that.
And okay, I’ll admit it—part of what drew me in was that it had to do with movies. I’m a huge movie fan, so that made it extra fun to explore. But beyond that, the solution had the potential to be useful in all kinds of similar situations—not just for the handful of movie collectors out there using Airtable. (If you’re one of them, please say hi to me on X!)
The bigger picture is this: finding near-duplicates in messy data is a problem many Airtable users face. And as one commenter put it, “there’s not much in Airtable that would help with this.” It felt like a clear gap—one I might actually be able to fill.
So I built something to help, starting with movie collectors. And when I finally got it working and saw the cleaned-up results? I was genuinely impressed.
Got Messy Titles?
If you’ve got a pile of similar-but-not-quite-the-same entries and you want help cleaning them up, hit me up at hello@bluepocket.ca. Got a table full of movie titles? Then Movie Title Grouper was literally made for you.
👉 Get started here!