Introduction: LCRR Compliance, Predictive Modeling, and Bootstrapping
The Lead and Copper Rule Revisions (LCRR) mandate the identification and replacement of lead service lines in water systems across the United States. In response, some states are permitting the use of predictive models, such as machine learning (ML), to predict service line materials and build service line inventories. Once a model is built, we can measure how well it works by using a confusion matrix to compute metrics like accuracy and recall. It’s crucial to get this right: if we miss lead in our water service lines (a false negative), people could be exposed to harmful lead levels. On the other hand, if we predict lead where there isn’t any (a false positive), we incur additional costs from unnecessary inspections or replacements. This is where a technique called bootstrapping helps.
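As a minimal sketch of the metrics mentioned above, the snippet below tallies a binary confusion matrix and computes accuracy and recall; the labels (1 = lead, 0 = non-lead) and the example data are illustrative, not real inventory records.

```python
# Hypothetical sketch: accuracy and recall from a confusion matrix for a
# binary service-line classifier (1 = lead, 0 = non-lead). Illustrative data.

def confusion_counts(y_true, y_pred):
    """Tally true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # field-verified materials (illustrative)
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]  # model predictions (illustrative)

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)  # fraction of true lead lines the model catches

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Note that recall directly tracks the false-negative risk described above: every missed lead line lowers it.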
Understanding ML Performance with Bootstrapping
Bootstrapping is a way to measure how well our model works by testing it many times, with different data each time. First, some terminology: a test set is the subset of known data (i.e., previous field verifications) that is held out from training a machine learning model and used to evaluate the model’s predictions. Here’s how bootstrapping works: we pick an item at random from our original test set, record it, and then put it back. We repeat this until we have a new data set (called a bootstrap sample) that’s the same size as our original test set. Because we’re picking items at random with replacement, some may be picked more than once, and others may not be picked at all. We repeat this whole process many times (sometimes thousands or even millions of times) to create multiple bootstrap samples. Then, for each bootstrap sample, we calculate metrics like accuracy and recall from our model’s predictions. This gives us many different values that we can use to understand the variability in our model’s performance metrics.
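The resampling loop described above can be sketched in a few lines. This is a simplified illustration assuming the model’s predictions on the held-out test set are already in hand; the function name and seed are our own choices, not a specific library API.

```python
# Sketch of the bootstrap loop: resample (true, predicted) pairs with
# replacement, each sample the same size as the test set, and record
# the accuracy of each bootstrap sample.
import random

def bootstrap_accuracies(y_true, y_pred, n_samples=1000, seed=42):
    """Return one accuracy score per bootstrap sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracies = []
    for _ in range(n_samples):
        # Draw n items with replacement: some pairs repeat, others are absent.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(t == p for t, p in sample) / n)
    return accuracies
```

The same loop works for recall or any other confusion-matrix metric; only the per-sample calculation changes.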
To illustrate, let’s say we have a test data set and we create 1,000 bootstrap samples, as shown in the figure below. The yellow line shows that our predictive model was initially 96.75% accurate on the test data without bootstrapping, i.e., a single-point estimate. But how confident are we that it will be just as accurate on other, unseen data, such as the unknowns in the service line inventory?
Bootstrapping helps us answer this by giving us a range (distribution) of accuracy scores, not just one. This distribution is shown by the blue bars in the figure above, with each bar showing the number of times we got an accuracy score within a certain range. For instance, we got scores between 96.5% and 96.75% about 120 times. (The blue curve is a continuous estimate of the distribution, fitted to the bars for better illustration.) The distribution gives us a better understanding of how our model might perform, and it tells us where the true accuracy is likely to fall; that is, we can evaluate a confidence interval at a chosen confidence level (usually 95%). In the figure, the confidence interval is marked by the two dashed red lines, indicating a lower bound of 94.75% and an upper bound of 98.25%. This means we can expect the true accuracy of the predicted service line materials for unknowns in the water system to fall within this range (94.75% to 98.25%) 95% of the time.
Why is this Important?
Bootstrapping offers us a more reliable way to assess our predictive model. In traditional testing, we use a single test set to evaluate the model. However, that approach might not fully reveal how the model reacts to different kinds of data. Through bootstrapping, we generate numerous data sets, allowing us to examine the model’s performance under various conditions. This helps us anticipate how the model will perform on future, unseen data, increasing the reliability of its predictions for unknowns in the water system.
In addition, bootstrapping aligns with principles of responsible AI, which emphasize ethical, transparent, and accountable use of AI technologies. By using bootstrapping, we openly acknowledge and measure the uncertainty in our model’s predictions. This transparency is crucial for water systems and the communities they serve to understand the limitations of predictive models and make informed decisions based on the outputs.
Moreover, the results from bootstrapping can help in model improvement. By understanding how our model’s performance changes under different conditions, we gain insights into where and how to refine it. Such a well-established iterative process at Trinnex ensures that our predictive model continues to perform reliably and responsibly, even when faced with new or unexpected data.