There’s been a lot of discussion lately about version control systems for data. Most recently, Ryan Gross wrote a blog post, “The Rise of DataOps,” in which he lays out how data version control is the most obvious next step in moving data pipelines from something that’s “maintained” to something that’s “engineered.” I enjoyed the post and I like many of the analogies that Gross draws, but I’ve encountered this idea of “git for data” in a
…