June 11, 2023
The PPDAC (Problem, Plan, Data, Analysis, Conclusion) problem-solving cycle is a handy framework for formally applying the rigor of the scientific method to your data science problem. Any specific statistical technique can be seen as one small component of this end-to-end problem-solving cycle.
The key thing to observe in the image above is that the entire process is iterative, i.e. we iterate through the PPDAC cycle as a whole as well as within each step of the cycle.
The framework was developed by R. J. MacKay and R. W. Oldford. More recently, it was popularized by David Spiegelhalter in his book "The Art of Statistics: Learning from Data".
As always, the first step, Problem, is to understand and define the problem you are trying to solve.
Here are some questions to help you think about the problem at hand:
The second step is Plan. While it may be tempting to start the analysis as soon as you have the data, a well-thought-out plan for your study can save you plenty of time and rework.
Here are some questions to help you think about your plan:
The Data step deals with the heart of any data science problem: the data itself.
Here are some questions to help you think about your data:
How was the data collected, or how will it be collected?
How can you improve the quality of your data?
How do you plan to apply the following "first mile services"?
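To make the data-quality questions more concrete, here is a minimal Python sketch of a few basic checks: missing values, out-of-range values, and exact duplicate records. The function name `data_quality_report`, the sample records, and the 0 to 120 age range are hypothetical illustrations, not part of the PPDAC framework itself; the checks that matter for your project depend on how your data was collected.

```python
from collections import Counter


def data_quality_report(rows, required_fields, valid_ranges):
    """Summarize basic data-quality issues in a list of dict records.

    Hypothetical helper for illustration only.
    rows: list of dicts, one per record
    required_fields: field names that must be present and non-empty
    valid_ranges: {field: (low, high)} inclusive bounds for numeric fields
    """
    missing = Counter()
    out_of_range = Counter()
    for row in rows:
        # Count absent or empty values in required fields.
        for name in required_fields:
            if row.get(name) in (None, ""):
                missing[name] += 1
        # Count numeric values that fall outside the expected bounds.
        for name, (low, high) in valid_ranges.items():
            value = row.get(name)
            if isinstance(value, (int, float)) and not low <= value <= high:
                out_of_range[name] += 1
    # Exact duplicates: identical records, ignoring field order.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "out_of_range": dict(out_of_range),
        "duplicates": duplicates,
    }


# Hypothetical sample records with one problem of each kind.
records = [
    {"id": 1, "age": 34, "city": "Leeds"},
    {"id": 2, "age": None, "city": "York"},
    {"id": 3, "age": 212, "city": ""},
    {"id": 1, "age": 34, "city": "Leeds"},
]
print(data_quality_report(records, ["age", "city"], {"age": (0, 120)}))
# {'rows': 4, 'missing': {'age': 1, 'city': 1}, 'out_of_range': {'age': 1}, 'duplicates': 1}
```

A report like this does not fix anything by itself; it tells you which questions to take back to whoever collected the data, which is one of the loops within the Data step.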
Analysis is arguably the most interesting phase of the PPDAC cycle.
Here are some questions to help you think about your analysis process:
Did you find and resolve the following classes of errors?
Did you avoid common psychological fallacies? See our blog post for more on this.
Did you use one or more of the following "first mile services"?
Have you labelled, classified and sorted your data appropriately?
Are you using an appropriate representation for your data, such as tables, charts, or graphs?
What patterns do you see?
What hypotheses can you generate?
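As a small illustration of labelling, sorting, and looking for patterns, the Python sketch below groups records by a label, computes the count and mean for each group, and sorts the resulting table. The function `summarize_by_group` and the regional sales figures are hypothetical examples chosen for illustration, not data or code from the original framework.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_group(rows, group_field, value_field):
    """Group records by a label and return (label, count, mean) tuples,
    sorted by mean in descending order. Hypothetical helper for illustration."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_field)
        if value is not None:  # skip records with no measurement
            groups[row[group_field]].append(value)
    summary = [(label, len(values), mean(values)) for label, values in groups.items()]
    # Sorting the summary table makes the largest differences easy to spot.
    return sorted(summary, key=lambda item: item[2], reverse=True)


# Hypothetical sample data.
sales = [
    {"region": "North", "units": 12},
    {"region": "South", "units": 30},
    {"region": "North", "units": 18},
    {"region": "South", "units": 26},
    {"region": "East", "units": 9},
]
for label, count, avg in summarize_by_group(sales, "region", "units"):
    print(f"{label:<6} n={count} mean={avg:.1f}")
```

A table like this is a source of hypotheses, not conclusions: with only one or two records per group, an apparent gap between regions is exactly the kind of pattern that should feed the next iteration of the cycle rather than the final answer.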
The last step of the PPDAC cycle, Conclusion, is to answer the question you set out to answer and communicate the result to your audience and stakeholders.
Here are some broad questions to help you think about your conclusions:
At the end of a single iteration of the PPDAC cycle, you usually end up with a set of conclusions. These naturally give rise to more questions, and so the PPDAC cycle begins again with the next iteration.
For a seasoned data scientist, most of these steps are likely second nature. Even so, it is highly beneficial to use PPDAC as a formal framework for solving your data science problems: it helps you communicate more clearly with your teammates and never lose sight of the bigger picture.
On a final note, you may also find the following related article helpful: "Ten Simple Rules for Effective Statistical Practice".