Data Science Pipeline
"First Mile and Last Mile problems are like the dark matter of Data Science. They get 20% of the attention but are responsible for 80% of the outcome." John Tukey, considered by many to be the father of data science, famously said the following about the importance of solving the "right question": "Far better an approximate answer to the right question, which is often vague, than…
The First Mile Problem
In data science, the highest value is beyond the first mile. Yet most projects and teams spend excessive resources solving "first mile" problems.
First Mile Services
We specialize in offering the following "first mile" services so that you can focus on solving your "real" problem.
Exploratory data analysis (EDA) provides a quick bird's-eye view of a dataset. It enables early discovery of patterns and anomalies while testing hypotheses and checking assumptions. We learn your requirements and share the EDA results so that you can validate early on whether you are asking the right questions.
When it comes to data analysis, “garbage in produces garbage out.” We help you avoid this pitfall by cleaning and curating your messy data so that you get precisely the data that you want.
We transform data into a structure and format that caters to your specific requirements. This includes working with unstructured data and data that is scattered across various platforms, files & data sources.
In the real world, data flow is continuous and prone to change. Managing this requires a robust data pipeline. We manage this data pipeline for you and keep the data flowing with real-time data that is cleaned, organized, and ready for you to use.
We consolidate data silos into one simple source of truth and work with you to build insightful dashboards that track, analyze and visualize data that is critical to you.
Should you allocate your valuable time and resources to pursuing the project at hand? We help you answer this critical question by providing you with sufficient evidence to make an informed and timely decision.
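As a rough illustration of the first two services above (EDA and data cleaning), here is a minimal Python sketch that uses only the standard library. The sample records, the field names `city` and `temp_c`, and the helper functions `clean` and `summarize` are all hypothetical and made up for this example; real projects would typically involve far larger datasets and dedicated tooling.

```python
import statistics

# Hypothetical sample records; field names and values are invented for illustration.
RAW_ROWS = [
    {"city": " Austin ", "temp_c": "31.5"},
    {"city": "austin", "temp_c": "30.0"},
    {"city": "Boston", "temp_c": ""},       # missing reading
    {"city": "Boston", "temp_c": "22.0"},
    {"city": "Boston", "temp_c": "n/a"},    # unparseable reading
    {"city": "Denver", "temp_c": "25.5"},
]


def clean(rows):
    """Normalize city names and drop rows whose temperature cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            temp = float(row["temp_c"])
        except ValueError:
            continue  # skips "" and "n/a" in the sample data
        cleaned.append({"city": row["city"].strip().title(), "temp_c": temp})
    return cleaned


def summarize(rows):
    """Return per-city count, mean, min, and max as a quick bird's-eye view."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["temp_c"])
    return {
        city: {
            "count": len(temps),
            "mean": statistics.mean(temps),
            "min": min(temps),
            "max": max(temps),
        }
        for city, temps in by_city.items()
    }


if __name__ == "__main__":
    rows = clean(RAW_ROWS)
    print(f"kept {len(rows)} of {len(RAW_ROWS)} rows")
    for city, stats in summarize(rows).items():
        print(city, stats)
```

Even on a toy dataset like this, the summary surfaces the kind of early signal EDA is meant to provide: how many rows survive cleaning, and which groups have too few readings to trust.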
Latest stories
15 LLM Challenges
1. Data Privacy
Naturally, one of the biggest concerns that a user of an LLM like "ChatGPT" has is that of Data Privacy. Some common questions users often have are: Is data submitted used to train and improve the model? How long is user data stored on servers? How are the companies behind these chatbots complying with GDPR / CCPA / HIPAA laws? How much "Personal Identifiable…
In this post, we will look at the key results from a paper published by a group of Google researchers titled "Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI". So why is this a big deal and why should you pay attention? Let's look at some examples of real-world data cascades quoted directly from the paper: "Eye disease detection models, trained on noise…
Gartner's 4-Stage Data Maturity Model
This is a simplistic model that is focused solely on an organization's […]. There is a lot to unpack from this simple graph: […] and […] are about the […]. […] and […] are about the […]. Value lies in predicting the future based on past data, but this is also error-prone and hard. Let's take a finance example where one is analyzing the…
Background
Streamlit is a very popular open-source framework that pitches itself as a pure-Python framework to build and share data web apps in minutes, with no front-end experience needed. Snowflake, a popular cloud computing company, acquired Streamlit in March 2022 for $800 million. A closer look at this acquisition gives us some insights about the merits of the framework, the future…
Five Key Ideas About Large Language Models
1. Biomimicry
Biomimicry is the practice of imitating life. It involves looking to nature for inspiration and direction to solve complex human problems. So why does this work? Well, if you think about it, nature has been constantly evolving ever since life first appeared on Earth some 3.8 billion years ago. Can there be a better and proven source of…
Polya Problem Solving Framework
When presented with any problem, it is very natural to go straight into problem-solving mode. However, this is not always the best strategy. Here's why: You may be solving a problem that has already been solved efficiently. You may not be aware of the second-order effects and side effects of your solution. You may be solving the wrong problem. Your…
So why exactly is JSON so popular? JSON (JavaScript Object Notation) has several advantages, as seen below.
JSON Advantages
JSON originated from JavaScript object literals as defined by the ECMAScript Programming Language Standard. The ECMAScript standard facilitated interoperability of web pages across different web browsers. Consequently, JSON quickly became the de facto data interchange format…