The Tyranny of Metrics

(Originally published December 2018)

Child welfare managers in this state sometimes tell me that, in their offices, “it's all about the numbers.” What they mean is that managers above them in the chain of command are mainly concerned with how local child welfare offices rank on performance indicators, i.e., measures of safety, permanency and well-being. In this regard, child welfare agencies are much like schools, law enforcement agencies, hospitals and the military. Performance indicators function much like grades in educational systems: they have an outsized influence on reputation, status and future prospects in the organization.


Jerry Muller's book, The Tyranny of Metrics (2018), is about the misuse of performance measures in both business and government. The book does not contain child welfare examples, but everything Muller has to say about measuring performance applies to child welfare. He writes:


            “… gaming the metrics occurs in every realm: in policing, in primary,
            secondary and higher education; in medicine, in nonprofit organizations
            and, of course, in business. And gaming is only one class of problems that
            inevitably arise when using performance measures as the basis of reward
            or sanction. … what gets measured may have no relationship to what we
            really want to know … The things that get measured may draw effort from
            the things we really care about. And measurement may provide us with
            distorted knowledge – knowledge that seems solid but is actually deceptive.”


Muller goes on to say that “this book is not about the evils of measuring. It is about the unintended negative consequences of trying to substitute standardized measures of performance for personal judgment based on experience. The problem is not measurement, but excessive measurement and inappropriate measurement, not metrics but metric fixation.”


Measuring child safety


One peculiar feature of US child welfare systems is their stubborn refusal to develop adequate measures of child safety, despite much talk about data-driven practice, evidence-based practices and a focus on outcomes. To measure CPS performance with birth parents, child welfare agencies mainly employ a single measure of child maltreatment recurrence, i.e., the percentage of children with a substantiated or indicated CPS report who have a subsequent substantiated allegation of maltreatment within 12 months of the initial report. However, this measure is easy to “game”, for example, by classifying some subsequent reports on open cases as “information only”, or by requiring managerial review of some types of substantiation decisions. Furthermore, caseworkers often have good reason (from their perspective) to refrain from substantiating reports: because parents are cooperative with in-home service plans; because a single substantiated report, regardless of severity, could prevent a parent from employment in certain kinds of jobs; or because they are reluctant to spend hours in a hearing with an administrative law judge who has a reputation for overturning an office's substantiation decisions. On the other hand, caseworkers are more likely to substantiate the allegations in a CPS report (as Brett Drake has argued) when they are considering filing legal action on behalf of a child. In these cases, substantiation is part of a legal strategy; the same factual information combined with a decision to forgo legal action might lead to an unfounded investigative finding.
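To make the arithmetic of the recurrence measure concrete, here is a minimal sketch in Python. It is illustrative only: the `Child` record, the `recurrence_rate` function, the 365-day approximation of “12 months” and the four-child cohort are all hypothetical and are not drawn from any agency's data system or any federal specification. The sketch shows how logging one subsequent report as “information only” lowers the measured rate without changing anything that happened to the child.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Child:
    # Date of the initial substantiated or indicated CPS report.
    initial_report: date
    # Date of the next substantiated report, or None if none is on record.
    next_substantiated: Optional[date] = None


def recurrence_rate(children: List[Child], window_days: int = 365) -> float:
    """Percentage of children with a substantiated report within the window.

    Assumption: "within 12 months" is approximated as 365 days.
    """
    if not children:
        return 0.0
    recurred = sum(
        1
        for c in children
        if c.next_substantiated is not None
        and 0 < (c.next_substantiated - c.initial_report).days <= window_days
    )
    return 100.0 * recurred / len(children)


# Hypothetical cohort of four children (made-up dates).
cohort = [
    Child(date(2017, 1, 10), date(2017, 6, 1)),   # recurrence within 12 months
    Child(date(2017, 2, 1), date(2017, 11, 15)),  # recurrence within 12 months
    Child(date(2017, 3, 1)),                      # no subsequent substantiated report
    Child(date(2017, 3, 1), date(2018, 6, 1)),    # recurrence, but outside the window
]

# Same events, but the second child's subsequent report is logged as
# "information only", so it never counts as a substantiated report.
gamed = [cohort[0], replace(cohort[1], next_substantiated=None)] + cohort[2:]

actual_rate = recurrence_rate(cohort)
reported_rate = recurrence_rate(gamed)
print(f"rate with all reports counted: {actual_rate:.1f}%")
print(f"rate after reclassification:   {reported_rate:.1f}%")
```

In this made-up cohort the measured rate falls from 50% to 25%, even though the children's experiences are identical in both runs.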



For these reasons, an agency's recurrence rate is as much a reflection of the pressures brought to bear on caseworkers, and of caseworkers' beliefs regarding the best use of CPS “findings” and their case plans, as it is of recurrent maltreatment. Experienced child welfare practitioners, scholars and leaders often share these views regarding the unreliability of substantiation decisions, but nevertheless defend current measurement practices because: (1) “it's all we've got” or (2) “who cares, if use of the measure advances our agenda” or (3) “the feds require use of this measure and we can't keep adding measures to satisfy critics.” Child welfare managers, policymakers and most child advocates are not scholars or data analysts. In addition, decision makers have been educated for 16-20 years (or more) in schools which used test scores and grades as indicators of academic merit. The point of academic competition is not to critique tests or other ways of evaluating students, but to achieve the top rank on whatever test or written assignment is used to assign grades. In other words, child welfare managers have been socialized to view evaluation as a competitive game with arbitrary rules. The point of the game is to acquire preeminence or to avoid social embarrassment, not to develop better measures of agency performance.


In previous Sounding Boards, I have argued that (a) no single stand-alone outcome measure of child safety can be sufficient because (b) child safety is a multidimensional concept that includes both severity and chronicity of child maltreatment and (c) compromised child safety involves cumulative developmental and emotional harm, as well as immediate danger. When child welfare agencies lack adequate measures of child safety, it is impossible to know whether these agencies' child protection programs are improving, doing worse or staying pretty much the same over time in fulfilling their mission. It is still possible to have well-informed opinions regarding CPS effectiveness, but it is not possible to base these opinions on widely used measures. Process measures, for example, measures of CPS response time or compliance with procedural requirements, are not a substitute for outcome measures.


Performance measurement can distort practice


It's one thing to use inadequate measures that shed little or no light on the effectiveness of programs; it's another order of dysfunction to use measures that distort and worsen practice. Fans of The Wire, the outstanding five-season TV series about Baltimore police, politicians, drug gangs, schools and newspapers, will remember police officials' reluctance to investigate, or even acknowledge, murders in which numerous corpses were dumped in boarded-up abandoned houses, out of concern for the effect of these investigations on the department's “clearance” rate. High-stakes testing in public schools has led to “gaming” and occasional corruption, and has also undermined education through a narrow focus on standardized test scores.


In child welfare agencies around the country, there have been many examples of a single-minded managerial focus on performance “targets” endangering or harming children. Common performance targets include:


  • Reducing the number of foster care placements by x percent

  • Increasing adoptions by some arbitrary number

  • Reducing or eliminating the number of children/youth in placements that are not legally permanent, regardless of their placement histories

  • Reducing the percentage of children in residential care to x percent of all children in foster care


As a rule, it's a good thing and a worthy goal for child welfare agencies to reduce the number of children in foster care or residential care, the number of children awaiting adoption, and the number of children without legally permanent families. However, when these organizational goals are translated into an arbitrary target number or percentage, and when state agencies, regions and offices that meet these targets are held up as exemplars of “best practice”, child welfare leaders have created incentives for poor decision making that ignores dangers and risks. The single-minded pursuit of a numerical goal greatly increases the likelihood that caseworkers, supervisors and middle managers will throw caution to the winds, dispense with critical thinking and refuse to consider worst-case scenarios.


Why is the single-minded pursuit of numerical targets in child protection or child welfare a formula for trouble in both practice and policy? First and foremost, there are many exceptions to any rule, including the guideline that all children in foster care need legal permanency. For some youth with histories of multiple placements, relational permanency and a stable foster care placement are more important than risking a move to create a legally permanent guardianship or adoption. Experienced decision makers with a child's best interest in mind must be able to balance the goal of legal permanency with a youth's more pressing needs.


Some children need to be in foster care or residential care. No policymaker or scholar knows what the right percentage of residential care is in a foster care system, though in Washington State it's apparent that the answer to this question is not 5%, given current foster care resources.  Child welfare decision making benefits from the permission and capacity to balance conflicting goals, to recognize the need for exceptions to the rule and to refrain from setting arbitrary numerical targets in law or policy that have the potential to create havoc.


How to improve measurement and use of measurement in child welfare


1. Be cautious about creating high-stakes measures for ranking units or offices, or to reward or punish managers. High-stakes measures will always be gamed. Muller writes that:


                 “almost inevitably, many people become adept at manipulating
                  performance indicators, many of which are ultimately
                  dysfunctional for their organizations. They fudge the data or
                  deal only with cases that will improve performance indicators.
                  They fail to report negative instances. In extreme cases, they fabricate
                  the evidence.”


The results of high-stakes measurement should not be taken at face value without checking with experts involved in developing the measures and with professionals who enter data in the organization's computer system. For example, a common strategy used by public child welfare agencies for creating the appearance of a reduction in child maltreatment fatalities is to change the rules for classifying child deaths as “neglect related”, or to delay the classification of marginal cases for lengthy periods of time. Practitioners, middle managers and data analysts usually know whether the data contained in a widely disseminated report is meaningful or mainly due to manipulation of the measure.


As a rule, descriptive measures of agency functioning that are not used to evaluate performance are more useful, because more accurate, than any measure used to reward or punish caseworker, unit or office performance.

2. Combine quantitative measures with qualitative measures that gather the views of children, parents and professionals outside the child welfare agency.

No public welfare agency should depend solely on the rate of substantiated child maltreatment for children in out-of-home care to evaluate the safety of foster care. Periodically (e.g., every two years), school-age foster children and children in unlicensed kinship care should be asked about their experiences and feelings regarding the care they have received.


One of the strengths of studies of differential response systems has been the insistence on asking parents for their views of the services they have received, and whether their family is better off as a result of agency involvement. Most studies of children, parents, foster parents or the child welfare workforce benefit from the addition of qualitative information from the subjects of the study.


3. Child welfare agencies can learn from prevention programs and from scholars regarding proxy measures of child safety. For example, prevention research has used measures of emergency room visits for young children enrolled in home visitation programs, a useful and revealing measure that could be utilized in child protection programs.


Emily Putnam-Hornstein's research on child maltreatment fatalities has circumvented debates on classification of neglect-related child deaths by using measures of injury-related deaths, an imperfect but reasonably reliable proxy.


4. Include caseworkers, supervisors and middle managers in the development of measures designed to increase timely feedback regarding key decisions. Muller is impressed with hospital-based research at the Cleveland Clinic that used teams of physicians and administrators to develop measures focused on improving the quality of patient care. He writes:


                   “The metrics of performance are neither imposed nor evaluated
                   from above by administrators devoid of first hand knowledge.
                   They are based on collaboration and peer review.”


Involving front line service providers in decision making “secured their buy in and made success more likely.” Muller adds that: 


“Measurements are more likely to be meaningful when they are developed from the bottom up...” Measurement, like other management tools, “will work to the extent that the people being measured believe in its worth,” he asserts.


Much of what Muller has to say in The Tyranny of Metrics is addressed to the leaders of organizations. However, he also has advice for practitioners who have little say regarding measures or data input:


         “… you face a choice. If you believe in the goals for which the
         information is being collected, then your challenge is to provide
         accurate information in the most efficient way possible...”
         “If, by contrast, you believe that the goals are dubious and the
         process wasteful, you might try to convince your superiors of
         that … If that fails, then your task is to provide data in a way that
         takes the least time, meets minimal standards of acceptability, and
         won't harm your unit.”


    Muller ends his book with a comment that child welfare leaders and scholars should ponder:


        “Many matters of importance are too subject to judgment and
        interpretation to be solved by standardized metrics. Ultimately,
        the issue is not one of metrics vs. judgment but of metrics informing
        judgment, which includes knowing how much weight to give to
        metrics, … their characteristic distortions and appreciating what can't
        be measured.”


    Dee Wilson



    2018 Sounding Boards


        January – Dissecting US Foster Care Systems

        February – Foster Care for Young Children: Outcomes and Costs

        March – The Torture of Children

        April – The Power of Stories

        May – Models of Excellence in Child Welfare: Three Offices

        June – Collaborations That Work

        July – Bullying and Intimidation in Child Welfare Management

        August – Servant Leadership in Child Welfare

        September – Defining Child Neglect

        October – Effects of Differential Response on Child Protection

        November – Evaluating Differential Response

        December – The Tyranny of Metrics


Past Sounding Board commentaries are available at


© 2020 by Dee Wilson Consulting.