
Unbiasing your Service Line Material Prediction Model


As water systems work toward Lead & Copper Rule Revisions (LCRR) compliance, they might consider using predictive modeling to reduce unknowns in their inventory and find lead service lines more efficiently. Predictive modeling can be a powerful tool to support water systems on their LCRR compliance journey, but to be reliable, predictive models need to be built on representative data that does not contain bias. For example, a model built with field verifications only from a certain neighborhood, or only from homes built during a certain decade, would be biased. Unbiasing your predictive model begins with getting a data set of field verifications that is representative of the larger water system. But don’t worry: if you already have some field verifications from prior fieldwork, you don’t necessarily need to start from zero.

When field verification data does exist

If your water system already has a data set of field verifications, statistical methods can be used to determine the degree of bias in this information and how representative it is of the unknowns in your water system. Even if your existing data set does contain some bias, it may be valuable to begin predictive modeling with the available field data to support near-term planning and prioritization, while performing additional field verifications to improve the long-term, system-wide reliability of future model iterations. In this case, be aware of the potential bias in the model and keep it in mind when making decisions based on the model’s results.
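One common way to run this kind of representativeness check is a chi-square goodness-of-fit test on a single attribute, such as the decade a home was built. The sketch below is an illustration only and is not necessarily the method used in leadCAST Predict. The function name, the decade labels, and the hard-coded 5% critical values are assumptions made for this example.

```python
from collections import Counter

# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def representativeness_check(verified, unknown):
    """Compare the category mix of verified properties against the unknowns.

    `verified` and `unknown` are lists of category labels (for example,
    build-decade bins), one label per property. Returns the chi-square
    statistic, the 5% critical value, and True if the verified set looks
    biased with respect to this attribute.
    """
    unknown_counts = Counter(unknown)
    verified_counts = Counter(verified)
    extra = set(verified_counts) - set(unknown_counts)
    if extra:
        raise ValueError(f"categories missing from unknowns: {sorted(extra)}")

    n_verified = len(verified)
    n_unknown = len(unknown)
    statistic = 0.0
    for category, count in unknown_counts.items():
        # Expected count if the verified set mirrored the unknowns exactly.
        expected = n_verified * count / n_unknown
        observed = verified_counts[category]
        statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = len(unknown_counts) - 1
    critical = CHI2_CRITICAL_05[degrees_of_freedom]
    return statistic, critical, statistic > critical


# Hypothetical data: verifications concentrated in older homes.
unknown = ["pre-1940"] * 300 + ["1940-1969"] * 400 + ["1970+"] * 300
verified = ["pre-1940"] * 70 + ["1940-1969"] * 20 + ["1970+"] * 10
statistic, critical, looks_biased = representativeness_check(verified, unknown)
print(f"chi-square = {statistic:.2f}, critical = {critical}, biased = {looks_biased}")
```

A check like this covers one attribute at a time. In practice you would repeat it for each attribute associated with service line material, and a very small verified set may need an exact test instead.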

A real example of predictive modeling

Trinnex® is performing predictive modeling for a water system in the northeastern United States, using our leadCAST® Predict solution. The water system has a sizeable number of field verifications from water main and meter replacements carried out over the last five years. As the map below shows, these verifications are mostly located within a couple of areas.  

Although these verifications are not representative of the larger water system, statistical testing confirmed they are representative of the neighborhoods targeted for initial fieldwork (shown in the map below). In other words, we expect the predictive model to be reliable in the neighborhoods targeted for initial fieldwork.

[Map: inspection density]

When no field verifications exist

If your water system does not already have a data set of field verifications to support predictive modeling, statistical methods can be used to identify an initial list of properties to inspect that is representative with respect to attributes that are associated with service line material, such as property age, neighborhood, and demographics. This representative data set is the starting point for estimating the proportion of various types of materials in the system and predicting service line materials for other locations.
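A simple way to build such a list is stratified random sampling with proportional allocation: group properties by the attributes of interest, then draw from each group in proportion to its size. The sketch below assumes hypothetical property records with `neighborhood` and `era` fields. It is not a description of how the Inspection Optimizer works internally.

```python
import random
from collections import defaultdict


def stratified_sample(properties, strata_key, sample_size, seed=0):
    """Draw a sample whose strata mix mirrors the full property list.

    `strata_key` maps a property record to its stratum (any sortable value).
    Each stratum gets a share of `sample_size` proportional to its size,
    with at least one property per stratum, so the total can differ
    slightly from `sample_size` after rounding.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for prop in properties:
        strata[strata_key(prop)].append(prop)

    total = len(properties)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        allocation = max(1, round(sample_size * len(members) / total))
        allocation = min(allocation, len(members))
        sample.extend(rng.sample(members, allocation))
    return sample


# Hypothetical inventory of 1,000 properties in two neighborhoods.
inventory = (
    [{"id": f"A-old-{i}", "neighborhood": "A", "era": "pre-1940"} for i in range(200)]
    + [{"id": f"A-new-{i}", "neighborhood": "A", "era": "1940+"} for i in range(300)]
    + [{"id": f"B-old-{i}", "neighborhood": "B", "era": "pre-1940"} for i in range(100)]
    + [{"id": f"B-new-{i}", "neighborhood": "B", "era": "1940+"} for i in range(400)]
)
to_inspect = stratified_sample(
    inventory, lambda p: (p["neighborhood"], p["era"]), sample_size=100
)
print(len(to_inspect), "properties selected for inspection")
```

The minimum of one property per stratum keeps small groups from being skipped entirely, which is one way hidden bias can creep in.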

For example, another water system in the northeastern United States does not yet have any field verifications with which to build a predictive model. We used the Inspection Optimizer feature in leadCAST Predict to generate a representative list of properties to inspect, which are displayed on an interactive map. This water system has over 100,000 service connections that span several historically acquired systems.

To ensure the predictive model can deal with this nuance and variability, we generated a representative verification set from each of the historically acquired water systems that make up the larger water system of over 100,000 service connections. The data from these field verifications will provide a sound starting point for estimating the proportion of materials in the unknowns, and for building a model that is able to generalize its predictions in a reliable and responsible way across the diverse system.
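Once a representative set has been inspected, the share of lead among the unknowns can be estimated with a confidence interval rather than a single number. The sketch below uses the Wilson score interval, a standard choice for proportions. It assumes the inspections behave like a simple random sample of the unknowns. A stratified design, such as one sample per acquired system, would need per-stratum estimates weighted by stratum size. All counts shown are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denominator = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denominator
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denominator
    )
    return center - half_width, center + half_width


# Hypothetical result: 30 lead lines found in 200 representative inspections,
# with 20,000 service lines still of unknown material.
low, high = wilson_interval(successes=30, n=200)
unknown_lines = 20_000
print(f"Estimated lead share: {low:.1%} to {high:.1%}")
print(f"Estimated lead lines among unknowns: {low * unknown_lines:.0f} to {high * unknown_lines:.0f}")
```

A range like this is what supports budgeting and planning: it shows how much the estimate could move as more inspections come in.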

How to get started with predictive modeling

When you’re ready to get started with predictive modeling to reduce unknowns in your inventory and/or find lead service lines more efficiently, we recommend taking this approach:

  1. Determine if an existing data set of field-verified service line materials exists
  2. Assess statistical quality of inspected service line data. Is it representative of the unknown services in your water system?
  3. Identify additional properties to inspect for a representative data set, to prevent hidden bias
  4. Perform recommended field inspections to obtain a representative data set
  5. Use findings from representative data set to estimate proportion of materials in your unknowns, and evaluate accuracy of historical records
  6. Build initial model to predict service line materials at individual properties
  7. Prioritize field inspections targeting properties with the highest probability of lead (to find and remove lead faster), and properties where the model hasn’t made a strong classification yet (to help reduce unknowns)
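The two priorities in the last step can be sketched as two rankings over the model's output. This example assumes a simplified model that returns a single probability of lead per property. A real model may predict several material classes, in which case "no strong classification" would be measured differently. The property IDs and probabilities are hypothetical.

```python
def prioritize(predictions, n_each):
    """Split inspection candidates into two priority lists.

    `predictions` maps a property ID to the predicted probability of lead.
    Returns the `n_each` most likely lead properties, followed by the
    `n_each` remaining properties whose probability is closest to 0.5,
    where the model is least decided.
    """
    likely_lead = sorted(predictions, key=predictions.get, reverse=True)[:n_each]
    already_chosen = set(likely_lead)
    remaining = [pid for pid in predictions if pid not in already_chosen]
    least_certain = sorted(remaining, key=lambda pid: abs(predictions[pid] - 0.5))[:n_each]
    return likely_lead, least_certain


predicted_lead = {"A": 0.95, "B": 0.52, "C": 0.10, "D": 0.80, "E": 0.45}
likely_lead, least_certain = prioritize(predicted_lead, n_each=2)
print("Inspect to find lead faster:", likely_lead)
print("Inspect to reduce unknowns:", least_certain)
```

Inspecting the first list removes lead sooner, while inspecting the second gives the model new information where it is weakest.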

To summarize, the three main objectives you should achieve with predictive modeling are to support budgeting and planning, to save time and resources, and to protect public health.  

Ready to take the next step?

Whether you aim to use predictive modeling to demonstrate that there is no lead or galvanized requiring replacement in your unknowns, or to find and replace these materials more efficiently, the process starts with a representative set of field-verified service line materials. Trinnex’s predictive modeling and water industry experts can help you devise a strategy that fits your organization. Reach out today!

Written by

Shervin Khazaeli, PhD
Data Analytics Developer | He/him
Shervin has a PhD in artificial intelligence focused on probabilistic decision-making and is the lead data scientist for leadCAST Predict.

Katie Deheer, MS, MBA
Product Leader & Analytics Consultant | She/Her
Katie has over 12 years of experience implementing innovative tech solutions. Outside of work, Katie loves yoga & outdoor family adventures.
