The Use of Predictive Analytics in
Allegheny County’s Child Protection System
(Originally published January 2023)
By Dee Wilson and Toni Sebastian
In 2016, the Pennsylvania Department of Human Services in Allegheny County began using the Allegheny Family Screening Tool (AFST), a predictive risk modelling tool that incorporates hundreds of data elements available in public records to inform screening of General Protective Services (GPS) reports.
In Pennsylvania, child protection reports are divided into two categories:
1) CPS reports of mostly physical and sexual abuse but also (oddly, given that CPS is an abuse statute) reports of severe neglect. These reports are not subject to screen out.
2) GPS reports of less severe neglect and also of risk factors for child maltreatment such as substance abuse, caregivers’ behavioral health problems, domestic violence, children’s behavioral health problems, homelessness, and truancy/educational neglect. County child welfare agencies have a large degree of discretion in screening GPS reports, which outnumber CPS reports by approximately 4 to 1.
Pennsylvania’s division of child protection reports into two categories, i.e., a) abuse/severe neglect and b) less severe neglect/risk factors, is unique in the U.S. and creates an unusual opportunity for prevention/early intervention services, and for child welfare involvement with families absent a specific allegation of child maltreatment. For the past several years, county child welfare agencies in Pennsylvania have screened out approximately 47% of GPS referrals. (Annual Child Protective Services Report, 2016 & 2021)
The use of a hard-to-explain algorithm with hundreds of data elements to inform (or automate) screening decisions led to a furious outcry from many child welfare critics. Social justice-minded critics viewed the use of predictive tools as a means of increasing unwanted intrusion of child welfare agencies into the lives of low-income families, with the potential of increasing racial disparities throughout child welfare systems. One critic, Virginia Eubanks, published Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (2018), in which she argued that algorithms which utilize public benefit data would almost certainly increase the number and percentage of low-income families involved with CPS, and likely increase racial disparities in child welfare systems already characterized by extreme racial disproportionality.
The developers of AFST and Allegheny County child welfare managers have (to their credit) responded rationally to these attacks, which in one instance referred to algorithms as the “nuclear weapon” of child welfare (Wexler, 2018). Child welfare managers answered Eubanks’s criticism of the algorithm’s use of public benefit data with the retort that in 45% of GPS reports in which parents had received public benefits, the AFST score was lowered rather than increased, suggesting that the algorithm reduced the AFST score when families received some types of public benefits.
In addition, Allegheny County allowed an independent study of the effects of AFST on screening accuracy, racial disparities at intake, workload impact, and screener consistency, by two scholars from Stanford. This study of AFST-1 was completed in 2019. Despite its encouraging findings (which we discuss below), the attacks on AFST continued following the announcement of a new screening tool for predicting foster care placement within two years of a GPS report. In addition, the U.S. Department of Justice recently announced an investigation of AFST’s use of child disability data as a possible civil rights violation.
Round 1 of the AFST debate has given way to Round 2, with the same acrimonious tone. In this commentary, we discuss both debates and reach conclusions different in some respects from those of the critics, developers, and scholars who studied AFST-1. We arrive at more positive conclusions regarding AFST-2, which we will discuss in next month’s commentary.
Information re Allegheny County’s child welfare system
Some information regarding Allegheny County’s child welfare system is useful for understanding the debate over use of the AFST screening tool:
Prior to implementation of AFST in 2016, Allegheny County’s child welfare system received national recognition for developing an extraordinary array of family support services. The umbrella human services agency was able to do this because its former director, Marc Cherna, had an exceptional ability to leverage small amounts of county funds to tap federal and state funding streams, and to attract philanthropic support for innovative programs. More than most child welfare offices in the U.S., Allegheny County’s child welfare system has been able to assist families with a broad range of in-home support services.
Allegheny County’s child welfare system has racial disparities at both intake and foster care entry that are much larger than those of Pennsylvania agencies as a whole: 13% of the county’s population is Black, vs. 44% of screened-in CPS/GPS reports and 46% of first entries into foster care. This extreme degree of racial disproportionality was present before implementation of AFST and has not changed since 2016, with the exception of a large increase in CPS reports and entries into care of “Non-Hispanic Two or More Races” (3.6% of first entries in 2016 vs. 15.8% in 2020, according to a recent report, The State of Child Welfare).
Allegheny County has been strongly committed to support of kinship care and has had a kinship care rate much higher than the national average.
The number of first entries-into-care declined by almost 20% from 2016-20 during implementation of AFST, with most of the decline in placements of youth 13 and older.
Allegheny County’s GPS screen-out rate declined from 60% in 2016 to 47% in 2021, which is the average GPS screen-out rate in Pennsylvania. From 2016 to 2021, Allegheny County’s intake system received about 1,000 GPS reports per month.
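The scale of the disproportionality cited above can be made concrete with a simple ratio of a group's share at a given stage of the system to its share of the county population. The sketch below is only an illustration using the three percentages quoted in this commentary; the function and variable names are ours, and the comparison is to total county population (as in the text), not to the child population.

```python
# Percentages quoted in this commentary for Allegheny County.
BLACK_SHARE_OF_POPULATION = 13.0
BLACK_SHARE_OF_SCREENED_IN_REPORTS = 44.0
BLACK_SHARE_OF_FIRST_FOSTER_ENTRIES = 46.0


def disproportionality_ratio(share_in_system: float, share_in_population: float) -> float:
    """Ratio of a group's share at a system stage to its share of the population.

    A value of 1.0 would mean the group appears in proportion to its
    share of the population; larger values mean over-representation.
    """
    if share_in_population <= 0:
        raise ValueError("population share must be positive")
    return share_in_system / share_in_population


intake_ratio = disproportionality_ratio(
    BLACK_SHARE_OF_SCREENED_IN_REPORTS, BLACK_SHARE_OF_POPULATION
)
foster_ratio = disproportionality_ratio(
    BLACK_SHARE_OF_FIRST_FOSTER_ENTRIES, BLACK_SHARE_OF_POPULATION
)

print(f"Screened-in reports: {intake_ratio:.1f} times population share")
print(f"First foster care entries: {foster_ratio:.1f} times population share")
```

On these figures, Black families appear among screened-in reports and first foster care entries at roughly three and a half times their share of the county population.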
Findings from an independent study of AFST-1
In March 2019, two Stanford University researchers, Jeremy Goldhaber-Fiebert and Lea Prince, completed an evaluation of the initial 15-17 months of implementation of AFST-1 in Allegheny County from December 2016 through May 2018.
The evaluation tracked three outcomes: “screening accuracy” for GPS reports, effects on workload, and the consistency of screening decisions. A screening decision was counted as accurate if (1) a screened-in report resulted in further action beyond a CPS investigation, or “no further action taken and a re-referral occurs within the 2-month time frame”; or (2) a screened-out report had no re-referral within 60 days. “Further action” was defined as a case open for services or “connects to an open case…”
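The evaluation's definition of “screening accuracy” is easier to assess when written out as a rule. The following is a minimal sketch of the definition as we have summarized it, not the researchers' code; the class, field, and function names are hypothetical, and we assume the “2-month time frame” and “60 days” refer to the same window.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the evaluation's "2-month time frame" is treated as 60 days.
RE_REFERRAL_WINDOW_DAYS = 60


@dataclass
class ScreenedReport:
    screened_in: bool
    # Case opened for services or connected to an open case.
    further_action: bool = False
    # Days until the family was re-referred; None if no re-referral was observed.
    days_to_re_referral: Optional[int] = None


def re_referred_in_window(report: ScreenedReport) -> bool:
    days = report.days_to_re_referral
    return days is not None and days <= RE_REFERRAL_WINDOW_DAYS


def is_accurate(report: ScreenedReport) -> bool:
    """Apply the evaluation's definition of an 'accurate' screening decision."""
    if report.screened_in:
        # Accurate if further action followed, or if there was no further
        # action but the family was re-referred within the window.
        return report.further_action or re_referred_in_window(report)
    # A screen-out is accurate only if no re-referral followed within the window.
    return not re_referred_in_window(report)


def accuracy_rate(reports: List[ScreenedReport]) -> float:
    if not reports:
        raise ValueError("no reports to evaluate")
    return sum(is_accurate(r) for r in reports) / len(reports)


# Made-up reports for illustration only.
example_reports = [
    ScreenedReport(screened_in=True, further_action=True),      # accurate
    ScreenedReport(screened_in=True, days_to_re_referral=45),   # accurate
    ScreenedReport(screened_in=True),                           # not accurate
    ScreenedReport(screened_in=False),                          # accurate
    ScreenedReport(screened_in=False, days_to_re_referral=30),  # not accurate
]
print(accuracy_rate(example_reports))
```

Note that nothing in this rule refers to whether child maltreatment occurred; “accuracy” depends entirely on later agency actions and on whether someone chose to report the family again.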
This evaluation described how two algorithmic scores (one for re-referral and one for placement) of 1 (lowest risk) to 20 (highest risk) were utilized by screeners: “An ‘auto screen in’ occurs when the AFST score falls above 18 for the placement score.” Almost 25% of GPS reports were mandated screen-ins, while screeners could (and often did) ignore the algorithmic score for other reports with a lower AFST score. Given the amount of screener discretion permitted in AFST-1, it is not surprising that there was no effect on screener consistency. According to child welfare managers in their reply to Eubanks, screeners ignored the AFST-1 score in 30% of reports, i.e., screened out a report with a high-risk score or screened in a report with a low-risk score. Giving screeners a large degree of discretion is not “automating inequality.”
The evaluation found a modest increase in “screening accuracy” for screened-in GPS reports, i.e., reports which resulted in further action, most often families opened for services; but a small decline in screening accuracy for screened-out reports, i.e., the percentage of screened out families re-reported within 60 days. Goldhaber-Fiebert and Prince comment:
“Roughly 24 more children each month screen-in accurately after the AFST, with over half these children in the 7- to 12-year-old age range, and almost all these children in the white race group. Roughly 11 more children who screen out are done so inaccurately each month… with 2/3 of these children falling into the Black/African American race group.” The decline in accuracy of screen-out decisions was not statistically significant. And: “Roughly 53 fewer children … screen in each month (though this result is not statistically significant) with over half of these falling in the 13-17-year age range and ~2/3 of these children in the Black race group.”
To summarize: the evaluation found a modest increase in screening accuracy of 24 GPS reports (from about 1000 reports) per month, with the largest effect on white families, and a small decline of 11 reports per month in screening accuracy for screened-out cases, with the largest effect on Black families. This is (to put it mildly) an underwhelming outcome for use of an algorithm with hundreds of data elements. Furthermore, the improvement in screening accuracy “attenuated,” i.e., decreased, over time, which usually occurs during implementation of practice models as staff interest in the model wanes and managers give less attention to maintaining model integrity.
However, we question whether the modest improvement in screening accuracy found by the Stanford researchers is credible. The AFST score could have motivated caseworkers to open families for services, which might have been a good thing but would not be the result of more accurate screening practices. Similarly, a GPS report that was screened-out and then re-reported within two months might have been due to a persistent reporter determined to cause a CPS investigation, not because of an erroneous screening decision.
These modest results from use of an algorithmic tool suggest a lesson from the history of risk assessment in child protection: the predictive power of algorithmic tools has been undermined by the dubious logical frameworks within which they have been employed. The use of AFST-1 to measure “screening accuracy” is even more questionable than utilizing a risk assessment tool to predict multiple substantiations or CPS re-reports. These outcome measures are poor proxies for what they purport to predict, i.e., child maltreatment. To improve predictive accuracy regarding re-report or substantiation, public agencies would need an algorithm that predicts a mandated reporter’s response to child maltreatment, or a caseworker’s decision to substantiate or not substantiate child maltreatment. No such tool is available.
Effects of AFST-1 on racial equity, low-income families, and workload
Goldhaber-Fiebert and Prince found “no large or consistent differences across racial/ethnic or age specific subgroups in these outcomes,” a conclusion which should be regarded as tentative given the large increase in use of “Non-Hispanic Two or More Races” classification during the early years of AFST implementation. The percentage of screened-in families from the poorest zip codes in the county did not increase, probably because receipt of public benefit data included in the algorithm reduced families’ AFST score in almost half of GPS reports.
The main effect of AFST-1 was arguably “a halt in the downward trend in pre-implementation screen-ins,” a shift that became more pronounced from 2018-21. From 2016-21, the GPS screen-out rate in Allegheny County declined from 60% to 47% (Annual Child Protective Services Report for Pennsylvania), and in so doing brought more Black and low-income families into contact with child protection than would have occurred with a higher screen-out rate.
The debate over AFST-1 has been framed by some journalists as a public policy choice between use of a research-based predictive tool to identify families who need services vs. a social justice perspective that views CPS interventions as damaging to low-income families and racial minorities, especially Black and American Indian families. This is not a cogent framework for understanding AFST-1 outcomes. The algorithmic tool used in AFST-1 had modest predictive powers, possibly due to the questionable way of defining “screening accuracy.” The idea that “less child welfare involvement with families is better than more,” absent information regarding CPS and other child welfare services/interventions in a specific community, is not a compelling social value; it’s a bias.
The public policy choice reflected in AFST is between two visions of child welfare reform: a child protection system that provides a wide array of prevention and early intervention services to families who need them the most vs. narrowing (possibly even eliminating) CPS involvement with families. Prior to AFST-1, Allegheny County was moving gradually toward the “less CPS is better than more” option. Implementation of AFST-1 moved the county’s child welfare agency in the opposite direction. We have not found an evaluation of the support services provided to families during AFST-1, though first entries into foster care declined by about 20% from 2016-20. Therefore, it is uncertain whether a decline in the GPS screen-out rate helped or harmed families with CPS involvement in Allegheny County.
In November 2018, AFST-1 was replaced by a second algorithm that used a LASSO model, a type of regression method that, according to Wikipedia, was “introduced to improve the predictive accuracy and interpretability of regression models.” The use of AFST-2 focused on the highest and lowest risk reports: “For each child associated with a referral, the AFST predicts the risk that, if screened in for investigation, that child will experience a court ordered removal from their home within two years.” (Rittenhouse et al, 2022)
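For readers unfamiliar with LASSO, the sketch below illustrates the general idea on made-up data: a penalty on the size of the coefficients shrinks them and sets the coefficients of uninformative predictors exactly to zero, which is why the method is said to aid interpretability. This is a generic textbook version (coordinate descent for a linear regression), not the AFST-2 model, which predicts a yes/no outcome from county data; the data, the penalty value `alpha`, and all names are assumptions for illustration.

```python
import random


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink a value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * sum of squared errors + alpha * sum of |weights|."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # association of predictor j with the part of y left unexplained
            z = 0.0    # sum of squares of predictor j
            for row, target in zip(X, y):
                partial = sum(row[k] * weights[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return weights


# Synthetic data: only the first and fourth predictors actually matter.
random.seed(0)
TRUE_WEIGHTS = [2.0, 0.0, 0.0, -1.5, 0.0]
X = [[random.gauss(0, 1) for _ in TRUE_WEIGHTS] for _ in range(200)]
y = [
    sum(x * w for x, w in zip(row, TRUE_WEIGHTS)) + random.gauss(0, 0.1)
    for row in X
]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
print([round(w, 2) for w in weights])
```

In this toy example the three uninformative predictors receive a weight of zero and drop out of the model, while the two informative predictors are retained with slightly shrunken weights.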
GPS reports with an AFST score of 18-20 and at least one child 16 or younger are automatically screened in for investigation unless a supervisor approves a screen-out decision. For referrals with a score of 1-10 and no children under the age of 12, “the screener sees a ‘Low Risk’ protocol notification, again with no numeric score.” These reports are recommended to be screened out, but supervisory approval is not required to screen them in. For other reports with scores of 1-17 and children less than 12 years old, the screener is given the numeric score but without a screening recommendation. Further, “the score is not seen outside the screening process,” i.e., by investigators.
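The screening rules just described can be summarized as a short decision procedure. The sketch below is our reading of those rules, not the county's software: the names are hypothetical, we assume that any referral not covered by the high-risk or low-risk protocol falls into the “score shown, no recommendation” group, and we infer from the phrase “again with no numeric score” that the numeric score is also hidden under the high-risk protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Referral:
    score: int             # AFST score, 1 (lowest risk) to 20 (highest risk)
    child_ages: List[int]  # ages of the children associated with the referral


@dataclass
class ScreeningDisplay:
    protocol: str
    numeric_score_shown: bool
    recommendation: str


def screening_protocol(referral: Referral) -> ScreeningDisplay:
    if not 1 <= referral.score <= 20:
        raise ValueError("AFST scores range from 1 to 20")
    if not referral.child_ages:
        raise ValueError("a referral must involve at least one child")

    # High-risk protocol: score of 18-20 and at least one child 16 or younger.
    if referral.score >= 18 and any(age <= 16 for age in referral.child_ages):
        return ScreeningDisplay(
            protocol="high risk",
            numeric_score_shown=False,  # assumption, inferred from "again"
            recommendation="screen in unless a supervisor approves a screen-out",
        )

    # Low-risk protocol: score of 1-10 and no children under the age of 12.
    if referral.score <= 10 and all(age >= 12 for age in referral.child_ages):
        return ScreeningDisplay(
            protocol="low risk",
            numeric_score_shown=False,
            recommendation="screen out (screening in needs no supervisory approval)",
        )

    # Assumption: every other referral shows the score with no recommendation.
    return ScreeningDisplay(
        protocol="none",
        numeric_score_shown=True,
        recommendation="none; left to screener discretion",
    )


print(screening_protocol(Referral(score=19, child_ages=[3, 9])).protocol)
print(screening_protocol(Referral(score=6, child_ages=[14, 15])).protocol)
print(screening_protocol(Referral(score=12, child_ages=[5])).protocol)
```

Written this way, it is apparent that the algorithm binds the screener only in the first branch; in the other two, the decision remains with the screener.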
In other words, the AFST score has a strong influence on screening decisions for approximately 25% of high-risk GPS reports, less influence on GPS reports of low-risk adolescents, and an undetermined influence for a large percentage of GPS reports. The model uses an algorithmic score to predict court ordered out-of-home placement within two years, which is viewed as a proxy indicator of severe maltreatment that often includes injuries and hospitalizations.
Two model developers, Emily Putnam-Hornstein and Rhema Vaithianathan, have recently co-authored a study (Rittenhouse et al., 2022) of the effect of AFST-2 on Black/White disparities at intake. The study applied a retrospective score of 1-20 to many thousands of GPS reports received prior to July 2019 and used the score applied at the time for reports received after July 2019. The sample size was more than 60,000 reports.
This study compares Black/White racial disparities for GPS reports before and after AFST and compares changes in racial disparities for screened in GPS reports with trends in racial disparities for CPS reports, which in Pennsylvania are not subject to screening decisions. The study found that “Among all referrals, …the AFST had no significant effect on disparities in screening decisions. However, for referrals with the highest risk scores (19-20), … the AFST significantly reduced disparities in screening decisions by 9.6 percentage points, (or 98% of the preexisting gap) … Turning to downstream outcomes, … among screened in referrals, AFST reduced the disparity in case opening rates by 5.5 percentage points (87% of the preexisting gap) and reduced the disparity in 3-month removal rates by 2.9 percentage points (76% of the pre-existing gap).” Per the evaluation of AFST 1, the most likely reason for the reduction in racial disparities was an increased rate of screened-in reports and “cases open for services” for white families.
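Because the study reports each reduction both in percentage points and as a share of the preexisting gap, the approximate size of the gap before and after AFST can be backed out with simple arithmetic. The sketch below does only that; the implied gaps are our own rough calculation from the rounded figures quoted above, not numbers reported by the study.

```python
# (reduction in percentage points, reduction as a share of the preexisting gap)
reported_reductions = {
    "screening decisions, scores 19-20": (9.6, 0.98),
    "case opening rate": (5.5, 0.87),
    "3-month removal rate": (2.9, 0.76),
}

implied_gaps = {}
for outcome, (reduction_pp, share_of_gap) in reported_reductions.items():
    gap_before = reduction_pp / share_of_gap  # implied preexisting Black/White gap
    gap_after = gap_before - reduction_pp     # implied remaining gap
    implied_gaps[outcome] = (gap_before, gap_after)
    print(
        f"{outcome}: about {gap_before:.1f} points before, "
        f"about {gap_after:.1f} points after"
    )
```

On these rounded figures, the implied preexisting gaps were roughly 10, 6, and 4 percentage points respectively, with less than one point remaining in each case.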
Neither the study of AFST-1 by Stanford researchers nor the developers’ study of AFST-2 explains why more high-risk white families were screened in following implementation of AFST. One possibility is that screeners had given up on helping a large group of families with long histories of neglect reports. Possibly, AFST-1 and AFST-2 directed them to “try again” with these families.
Next month’s Sounding Board will discuss use of AFST-2 to predict emergency room visits, hospitalizations, and abusive injuries for the 5% of highest risk families in GPS reports, which we view as one of the most promising and potentially valuable uses of risk assessment in recent decades.
“Annual Child Protective Services Report 2016,” & “Child Protective Services 2021 Annual Report,” Pennsylvania Department of Human Services. The 2016 report is available online; the 2021 report can be accessed through Child Welfare Quick Links, Feb. 1, 2023.
Eubanks, V., Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (2018), Macmillan, New York City.
Fitzgerald, R., “Response to Automating Inequality by Virginia Eubanks,” (2018). Available on Allegheny County website.
Goldhaber-Fiebert, J. & Prince, L., “Impact Evaluation of a Predictive Risk Modeling Tool for Allegheny County’s Child Welfare Office,” (2019), available online.
Hurley, D., “Can an Algorithm Tell When Kids Are in Danger?”, New York Times, Jan. 2, 2018.
Rittenhouse, K., Putnam-Hornstein, E. & Vaithianathan, R., “Algorithms, Humans and Racial Disparities in Child Protective Services: Evidence from the Allegheny Family Screening Tool,” (2022) contact Katherine Rittenhouse, firstname.lastname@example.org to request study.
“2021 State of Child Welfare: Navigating the Uncertainty of the Pandemic to Strengthen the System,” (December 2021), Pennsylvania Partnerships for Children; available online.
Wexler, R., “Foster care apologists shouldn’t have nuclear weapons,” NCCPR Blog, June 18, 2018.