Measuring Outcomes in Childhood Disability: What Do We Want to Know, and How Do We Find Out?

Peter Rosenbaum, M.D., F.R.C.P.(C)

Measures of function are widely used in developmental pediatric practice, and are generally relied upon to perform one or more of three distinct and separate tasks. Thus, the first important consideration for the prospective user of a measure is to be clear about what purpose they have in mind in wanting to quantify function, and to be sure the measure they have chosen can perform that task. This presentation will first address ways of thinking about the measurement process (using examples from the measurement of motor function and movement 2). Second, we will discuss the range of outcomes that should be considered as we try to evaluate the impact of our clinical services for children and youth with disabilities 3.

Measures may discriminate among people with and without a characteristic (usually using screening devices such as the Denver Developmental Screening Test); or among people with varying degrees of a characteristic (such as the progress of motor development as evaluated with the motor subscale of the Bayley Scales of Infant Development, or the level of particular motor skills as assessed with the Bruininks-Oseretsky Test). These measures are usually norm-referenced, meaning they have been developed on "normal" populations, and are designed to assess individual performance against a standard. Results are often reported by expressing the difference of a subject's score from the population mean in units of the "standard deviation".
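As an illustration of norm-referenced reporting, the short sketch below expresses a raw score as a number of standard deviations from a normative mean. The function name and the numbers (a mean of 100 and a standard deviation of 15) are hypothetical and are not taken from any of the instruments named above.

```python
def standard_score(raw_score, norm_mean, norm_sd):
    """Return the distance of raw_score from the normative mean, in SD units."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw_score - norm_mean) / norm_sd


# Hypothetical example: a child scores 70 on a test normed to mean 100, SD 15.
z = standard_score(70, 100, 15)
print(f"Score lies {z:.1f} standard deviations from the population mean")
```

A negative value indicates performance below the normative mean; how far below counts as clinically meaningful is a decision for the test's own guidelines, not for this sketch.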

Measures may be used to predict some aspect of concurrent (or more often future) status, such as ambulatory capability based upon specific features of motor performance at a young age (as exemplified by Bleck's measure). Predictive measures must be validated against a clearly defined "gold standard" of reference, and the performance of such measures is usually described by features such as sensitivity, and positive and negative predictive values.
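The indices mentioned above can be computed from a two-by-two table that cross-classifies the measure's prediction against the gold standard. The following is a minimal sketch; the class name and the counts are invented for illustration, and specificity is included only for completeness.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a predictive measure with a gold standard."""
    true_pos: int   # measure positive, gold standard positive
    false_pos: int  # measure positive, gold standard negative
    false_neg: int  # measure negative, gold standard positive
    true_neg: int   # measure negative, gold standard negative

    def sensitivity(self):
        # Proportion of gold-standard positives that the measure identifies.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self):
        # Proportion of gold-standard negatives that the measure identifies.
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_predictive_value(self):
        # Proportion of positive predictions that are correct.
        return self.true_pos / (self.true_pos + self.false_pos)

    def negative_predictive_value(self):
        # Proportion of negative predictions that are correct.
        return self.true_neg / (self.true_neg + self.false_neg)


# Hypothetical counts for 200 children followed to the outcome of interest.
table = TwoByTwo(true_pos=40, false_pos=10, false_neg=10, true_neg=140)
print(f"Sensitivity: {table.sensitivity():.2f}")
print(f"Specificity: {table.specificity():.2f}")
print(f"PPV: {table.positive_predictive_value():.2f}")
print(f"NPV: {table.negative_predictive_value():.2f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the outcome is in the group studied, so they should be interpreted with that group in mind.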

Evaluative measures are expected to assess and quantify change over time in some aspect of function, such as gross or fine motor skills performance (as illustrated by the Gross Motor Function Measure or the Pediatric Evaluation of Disability Inventory). This type of measure is often a criterion-referenced instrument, meaning that the expected (criterion) performance of each item in the measure is defined, with several gradations of partial performance for each item specified on an ordinal scale. Over time, the reevaluated performance on the same items defines the degree of change in that aspect of function. Such measures are relatively easy to create but difficult to validate, and rely on evidence of "construct" validity collected through testing of hypotheses about how the measure ought to perform if it is measuring what it purports to measure 4.
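The idea of scoring the same ordinal items on two occasions can be sketched as follows. This is an illustration only: the five items, the 0 to 3 scoring range, and the percent-of-maximum summary are assumptions made for the example and do not reproduce the scoring rules of any published instrument.

```python
MAX_ITEM_SCORE = 3  # assumed ordinal range: 0 (does not initiate) to 3 (completes)


def percent_score(item_scores, max_item_score=MAX_ITEM_SCORE):
    """Summarize ordinal item scores as a percentage of the maximum possible."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not 0 <= score <= max_item_score:
            raise ValueError(f"item score {score} is outside 0..{max_item_score}")
    return 100.0 * sum(item_scores) / (max_item_score * len(item_scores))


def change_score(baseline_items, followup_items):
    """Return the change in percent score between two assessments of the same items."""
    if len(baseline_items) != len(followup_items):
        raise ValueError("both assessments must score the same items")
    return percent_score(followup_items) - percent_score(baseline_items)


# Hypothetical scores for one child on five items, assessed twice.
baseline = [0, 1, 1, 2, 3]
followup = [1, 2, 2, 3, 3]
print(f"Baseline: {percent_score(baseline):.1f}%")
print(f"Follow-up: {percent_score(followup):.1f}%")
print(f"Change: {change_score(baseline, followup):.1f} percentage points")
```

Whether a change of this size reflects real improvement rather than measurement error is a separate question that depends on the reliability and responsiveness of the instrument.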

Having found a measure that apparently does the task(s) required (discrimination, prediction or evaluation), the would-be user must then assess the measurement (or "psychometric") properties of the instrument 5. Reliability refers to consistency of scores or results with the measure on repeated uses, when the characteristic being measured has in reality not changed. This is important because the smaller the error in measurement from one time to another (when the results should be identical), the easier it should be to detect a "real" difference if one exists between occasions (and this of course is especially relevant when looking for change over time with an evaluative measure). Other important aspects of reliability include test-retest and inter-rater stability.
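The link between reliability and the ability to detect real change can be made concrete with a small calculation. The sketch below uses two formulas that are common in the measurement literature but are not taken from this paper: the standard error of measurement, SEM = SD x sqrt(1 - r), and a 95% minimal detectable change, MDC95 = 1.96 x sqrt(2) x SEM. The standard deviation and reliability coefficients are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is a reliability coefficient in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(score_sd, reliability, z=1.96):
    """Smallest difference between two scores unlikely to be measurement error alone."""
    # sqrt(2) reflects that error is present in both measurements being compared.
    return z * math.sqrt(2.0) * standard_error_of_measurement(score_sd, reliability)


# Hypothetical instrument with a between-child SD of 10 points.
for r in (0.91, 0.75):
    sem = standard_error_of_measurement(10.0, r)
    mdc = minimal_detectable_change(10.0, r)
    print(f"reliability {r:.2f}: SEM = {sem:.1f}, MDC95 = {mdc:.1f}")
```

Under these assumptions, the more reliable instrument can distinguish a smaller change from measurement error, which is the point made in the paragraph above.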

Validity can be thought of as the extent to which a measure measures what it is supposed to measure. Put another way: what inferences will one be comfortable making from the results of this measurement? It is vital to have evidence that the measure is valid for the purpose for which it is being used. As has been argued elsewhere 6, the fact that a measure is a good discriminator does not automatically qualify it as a predictive or evaluative instrument. While the measure may be capable of performing these other functions, one needs evidence to support the use of the instrument for those other purposes. In other words, validity is specific to a particular function, and is not automatically transferable from one function to another.

The clinical usefulness of measures thus depends upon choosing the right type of instrument, one with appropriate properties, and applying it in the standardized manner described in the guidelines for that instrument. The analogy of clinical measures with tools in a tool box may make the point clear. All hammers are percussive implements that work on the principle of a lever. There are, however, vast differences in the structure, materials, function and capabilities of a reflex hammer, a sledge hammer, and the hammers attached to a piano keyboard. Their applications are clearly not interchangeable, yet they all function in a somewhat similar manner - and all are "hammers".

The most apparent measurement need in developmental medicine is for measures that can validly capture change in function over time. It is essential for us to be able to assess whether our treatments do more good than harm, and we therefore need measures with a demonstrated "responsiveness" to real change. We believe 6 that the failure to detect changes in function after some interventions may have occurred because the measures used were either insensitive to the nature and degree of change actually produced, or were incapable of detecting change at all, despite their other measurement properties.

In parallel with a concern about how to measure is the need to decide what to measure. Using the World Health Organization's model of impairment-disability-handicap 7, we should consider the impact of our treatments and services on various dimensions of function of the individual and the family. The challenge to clinicians and researchers alike is to formulate clear hypotheses about how and where (in the ICIDH model 7) our interventions may be working; how a change in one aspect of the model affects dimensions of function; and, ultimately, what are the relevant outcomes (functional, technical and psychosocial) to target. Reference will be made to ways of measuring health-related quality of life 8,9, to illustrate additional ways of thinking about these important issues. It is hoped that this review of principles regarding structure, function, properties and uses of clinical measures will be helpful to clinicians and researchers alike, by outlining some of the important considerations in choosing, using, interpreting and reporting results of the measurement of clinical function.

Professor of Pediatrics, McMaster University, Hamilton, Ontario; National Health Scientist Investigator, Neurodevelopmental Clinical Research Unit

1. Kirshner B, Guyatt GH. A methodologic framework for assessing health indices. J Chron Dis 1985; 38: 27-36.
2. Rosenbaum P. Clinically based outcome measures in cerebral palsy. In: Sussman M (ed.) The Diplegic Child: Evaluation and Management. Park Ridge, IL: American Academy of Orthopaedic Surgeons, 1992, pp. 125-132.
3. Goldberg MJ. Measuring outcomes in cerebral palsy. J Ped Orthop 1991; 11(5): 682-685.
4. Russell D, Rosenbaum P, Cadman D et al. The Gross Motor Function Measure: a means to evaluate the effects of physical therapy. Dev Med Child Neurol 1989; 31: 341-352.
5. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: what are the necessary properties. J Clin Epi 1992; 1341-1345.
6. Rosenbaum P, Russell D, Cadman D et al. Issues in measuring change in motor function in children with cerebral palsy: a special communication. Phys Ther 1990; 70: 125-131.
7. International Classification of Impairments, Disabilities and Handicaps. Geneva: World Health Organization, 1980.
8. Saigal S, Rosenbaum P, Stoskopf B, Hoult L, Furlong W, Feeny D, Burrows E, Torrance G. Comprehensive assessment of the health status of extremely low birthweight children at eight years of age: comparison with a reference group. Journal of Pediatrics 1994; 125: 411-417.
9. Saigal S, Feeny D, Furlong W, Rosenbaum P, Burrows E, Torrance G. Comparison of the health-related quality of life of extremely low birthweight children and a reference group of children at age eight years. Journal of Pediatrics 1994; 125: 418-425.
10. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford University Press, 1989.