We’ve all been there before. We inherit someone’s sql query, we crack it open, and we see a long winded case when statement that uses the like operator to categorize data. Something ugly like this:
And while this approach works, it’s hard to look at, hard to debug, and hard to maintain as new stuff needs to get added.
Pro Tip - If you don’t know this, the ordering of a case when statement matters; the statement will stop evaluating once it hits a condition that is true
Now, earlier on in my career, my naive side of me would scoff when I’d see queries like the one above. I’d be like “Why would anyone do something dumb like this??”
Fast forward to 2025, and I have developed a better understanding of why this happens and I have empathy. Here are some good reasons why you will see stuff like this:
The person did not have permissions to make tables on the production box - 99% of analysts fall into that category
The analyst inherited the query from a former coworker and was told to maintain it as needed; so they saw the path of least resistance to just tack on more to the case statement as new scenarios popped up
The analyst did not really know how to write sql and spent more of their days in excel and considered the sql an afterthought
I could go on with this list a mile or so, but will leave it for now.
Bottom Line
Have empathy for those you are trying to help. Don’t tear them down because they can’t craft a query as good as you can
Alright, I’m off the soap box. So now, how do we streamline this garble and make it something easier to maintain going forward?
Enter The Pattern Matching Lookup Table
Data engineers know what lookup tables are; you did it all the time with your star schema archs; but did you know, just because you normally join cleanly to a lookup table via something like fact.id = lk.id, it does’t mean you can’t join to it on fuzzy logic. So, how do we take this long winded case statement and turn it into a lookup table?
You will notice in this lookup table, we’ve got 2 columns going on here;
item_pattern - this is where the fuzzy logic matching will kick in; we have added wildcard % signs around each value
item_category - the actual cleaned category that this fuzzy logic corresponds to
Now Let’s Build Our Transaction Table So We Can Join
This is short and sweet; it creates a duckdb table called “txns” with some phony data so we can later join:
But How Does One Join To This?
Simple!
You will notice something in the join predicate. I’m using the “iLike” operator instead of the “like” operator…and why?
the iLike operator is case-insensitive; its truly one of many superpowers of the duck
You might remember that our earlier example at the beginning of this article with the long-winded case statement had to wrap the item description in a LOWER operator. With iLike, we don’t need to do any of those extra conversions to handle this, which is great.
Conclusion
This article demonstrated a strategy I’ve used for years when I encounter a script that has a long-winded case statement doing pattern matching. Is this the approach you should take with everything?
that’s up to you
My current rule is if i’ve got 10 or more case statements, then I will go build a pattern matching lookup table
Thanks for reading,
Matt