The nuances of data projects add an extra layer of challenge to project planning and agile project management approaches. Even the most efficient data team must manage the inherent moving-target nature of data work; the unique communication needs that come with it; and the tradeoffs between rigor, curiosity, and urgency.
Nuances of Data Projects
Discovery, exploratory, or otherwise open-ended projects—projects where a data team is uniquely suited to make an impact—don’t always have a clear-cut destination. The goal may be well defined (e.g. ‘What’s the root cause of this trend, and what can we do about it?’), but it’s not clear whether we’ll answer that question by investigating our first hypothesis or our twelfth.
Sometimes, the team’s ability to achieve the outcome depends on the nature of the data itself, which often requires a fair amount of work to investigate. Sometimes, what you thought was the end of the project is really the beginning of another project. Sometimes, you just have to stop, because spending more time won’t yield richer insights.
These moving targets, so common in data work, can be especially difficult to communicate cross-functionally, because most other teams tend to have a pretty straightforward task completion process. A marketing team is likely to plan a campaign, agree on an approach, set up the campaign, and run it. Even in engineering, where there can be large amounts of uncertainty in how long a project might take, a team can count on clearly defined acceptance criteria for completion—the new feature works as intended, or it doesn’t. In a world where anyone can open up a spreadsheet and do ad hoc analysis, data teams have to take special care to explain why a seemingly straightforward question may be unanswerable, or why it’s worth taking a little extra time to make something reusable or repeatable.
To further add to the project management challenge, the attributes that make data teams so valuable—curiosity, rigor, and skepticism—can lead a project astray when they go unchecked. Curiosity combined with open-ended tasks can be a map to some pretty deep rabbit holes. We know or seek out the limitations of our data, and can be hesitant to take a strong stance that we have found “the answer.” That leads us to dig deeper to reassure ourselves that we haven’t missed the edge cases, or that we understand all the limitations we need to caveat.
One way to guide your planning process and to manage expectations is to understand the differences between linear and circular projects. We first came across the distinction in Edwin Thoen’s Agile Data Science with R book (Thoen focuses on data science, but it’s applicable to a wider range of data projects).
A linear project is relatively easy to scope and closed-ended, with clear, objective acceptance criteria. There are lots of examples of linear projects in software engineering, like adding a button to a page or converting a hard-coded field to a configurable variable. Some data work is also very linear, like building a dashboard using a well-understood data set. Perhaps unsurprisingly, a lot of what we consider data engineering and analytics engineering are linear in nature (e.g. add `this_column` to `this_table` via existing ELT tools, or build a mechanism for data freshness alerts). When you talk to stakeholders, most of them tend to have a linear mental model – they expect a project to have a clear beginning and end and to be relatively predictable in scope. Agile approaches work reasonably well with linear projects.
A circular project, on the other hand, is open-ended and iterative, with later phases that depend on outcomes that aren’t known at the outset. These types of projects may have fuzzy acceptance criteria, like understanding why a trend in the business has changed. Most painfully to many of us, some questions just can’t be answered with the data at hand—but because of the ever-changing nature of the datasets we work with, you can’t know that outcome until you’ve put a fair amount of work into trying to answer it. Many Agile processes break down with circular projects.
When you are working on a project like “build a model to predict X,” you have a hypothesis up front about what features are likely to be predictive. However, a lot of work is required to confirm or reject that hypothesis; after a couple of weeks you may find that you’re no closer to the goal than when you started. You can then start over with a new hypothesis or decide the project isn’t viable and stop working on it. Both of those outcomes can feel very unsatisfying to both analysts and stakeholders.
Similarly, what feels to stakeholders like a basic analytical question can follow a circular path. Your leadership team asks, “Why were sales down last month?” After investigating the usual suspects, you don’t have a great lead. While you’ve ruled out a lot of obvious causes, you haven’t found a silver bullet. You go back to the beginning, form a new hypothesis, investigate it, and find nothing conclusive. When working on circular projects, you constantly have to decide – is this critical to the success of this project, is it a potential opportunity, or is it just a rabbit hole?
Forming hypotheses, testing them, and iterating – that sounds an awful lot like the scientific method! Data tasks are circular when we’re actually creating knowledge about the world we operate in – when we’re on the research side of analytics. A reasonably well-understood code base behaves in predictable ways, so engineers can have a high level of confidence in the feasibility and difficulty of a project before embarking. The world outside that code base is significantly less predictable. Trying to sort out the root cause of trend changes, predicting human behavior, leveraging a large and diverse data set for basically anything – these can quickly move into uncharted waters.
One reason planning data projects and managing stakeholder expectations can be difficult is the differing natures of linear and circular projects. Many of us have spent a lot of time communicating how data teams need to operate more like engineering teams, and hiring for more engineering-focused roles. If we’ve been successful, stakeholders have a better understanding of the processes necessary to maintain data quality and reliability, and of how a data team’s skillsets complement each other. However, if we’re too successful, they can start to expect a level of throughput certainty similar to what you would see in software engineering, even though many data tasks are fundamentally different.
In our next post, we’ll dig into how to apply this knowledge to improve your team’s day-to-day work. In the meantime, how does your team break down projects for planning? Do you think in these linear and circular terms? Come have a chat in our Slack channel, or join us August 18 at 3pm EDT for a live conversation!