Regular Article
Received December 3, 1999; revised May 15, 2000; accepted June 9, 2000. From the Department of Psychiatry, University of Oslo, Norway. Address correspondence to Dr. Høglend, Department of Psychiatry, University of Oslo, P.O. Box 85, Vinderen, N-0319 Oslo, Norway.
Abstract
Key Words: Rating Scales, Outcome, Brief Psychotherapy
Introduction
Theory-related, or so-called mode-specific, instruments for measuring dynamic change, developed by pioneers such as Karush et al.,7 May and Dixon,8 Kernberg et al.,9 Bellak et al.,10 and Semrad et al.,11 have been criticized as too abstract, cumbersome, or unreliable, or as too highly correlated with symptom measures.
Idiographic (individualized) methods developed by Malan,12 Luborsky,13 Horowitz,14 and Perry15 provide important clinical information about limited areas of psychological functioning. However, individualized measures have weak psychometric properties in group designs. Methods for rating change after treatment have been developed by Sifneos16 and Sandell.17 Change estimates from ratings made after therapy tend to be too highly correlated with post-treatment status,18 and such ratings may also be difficult to compare across cases.4
Later batteries of dynamic scales, such as the Patterns of Individual Change Scales (PICS),19 the Scales of Psychological Capacities (SPC),20,21 and the Karolinska Psychodynamic Profile (KAPP),22 contain scales whose reliability ranges from poor to excellent, and some aspects of their validity have been tested. These batteries are comprehensive and include many scales. However, the scales have only three to seven descriptive levels, which may reduce their sensitivity to change. The PICS scales, which have seven descriptive levels, failed to capture statistically significant changes during brief psychotherapy with ordinary neurotic patients.23
On the basis of 20 years of clinical and research experience with brief dynamic psychotherapy, we have developed a new set of dynamic scales. We have been influenced by the work of several of the research groups mentioned above, and resemblances to their scales are intentional.
Like most other batteries of dynamic scales, our scales do not measure personality traits or typologies. Instead, they describe internal predispositions, psychological resources, capacities, or aptitudes that the individual can mobilize to achieve adaptive functioning and life satisfaction. Unlike most other batteries, our scales cover the entire range of functioning, from superior to extremely poor. Our intention has been to make the scales "fine-grained" enough to capture reliable change during brief dynamic psychotherapy.
The scale format has been modeled after the Global Assessment Scale (GAS),24 with ten descriptive levels and scale points ranging from 1 to 100. The use of a well-known scale format should make the scales easier to learn. The descriptive levels are linked as closely as possible to the way mainstream psychodynamically oriented clinicians interpret and work with clinically observable phenomena.
Value judgments, especially at the higher levels of functioning, are unavoidable with scales of this type. The decision to use five dimensions is based on clinical experience and the literature; psychoanalytic theories offer limited guidance in choosing dimensions. Our aim has been to construct as few scales as possible while maintaining reasonable comprehensiveness. Several related psychological resources have therefore been combined within the same scale.
The content validity and Guttman scale structure have been tested with Q-sort methodology performed by a large number of psychotherapists from Norway, Finland, and Germany.25 A few global scales with many descriptive anchor points are easy to use, and such scales have been demonstrated to be among the most powerful in detecting change.24 Several studies have indicated that global scales rated by experts may be equal and sometimes superior to test batteries with many subscales.26 However, reliability and predictive validity depend on the issue under study.
Current functioning within the last 3 to 4 months should be rated on the basis of a semistructured dynamic interview that includes interpersonal functioning, tolerance for affects, insight, and the capacity to handle both the ordinary vicissitudes of life and more challenging psychosocial stressors (problem-solving capacity). The five scales are described in Appendix A.
The present study tests the interrater reliability of the five scales, the reliability of change ratings, the discriminability of the scales from global functioning (Global Assessment of Functioning [GAF])27 and subjective distress (Global Severity Index [GSI] of the Symptom Checklist-90),28 and the scales' sensitivity to change during brief dynamic psychotherapy.
METHODS
Therapists
The therapists were 6 psychiatrists and 1 clinical psychologist. All had long experience in practicing dynamic psychotherapy (range 10 to 25 years). Each therapist worked in a different institution, and they had received formal education in psychoanalytic psychotherapy from four different training institutes. All of the therapists also served as clinical evaluators. Because the raters were so experienced, no pilot training on the dynamic scales was offered; they received only didactic lessons.
Evaluation
After history-taking and assessment of background variables, each patient was interviewed by one clinician in the presence of two or three other clinicians. When necessary, several of the clinicians posed additional questions after the interview to ensure adequate coverage of the patient's functioning in all areas. The group interviews lasted 60 to 100 minutes and included some trial interpretations. Each clinician rated the five dynamic scales and the GAF independently, before any discussion of the case. Half of the assessments were made by clinicians who had not been present at the dynamic interview; their ratings were based on audiotapes of the interviews. The patients completed the SCL-90-R along with many other self-report measures. At present, 36 patients have been re-evaluated one year after the start of therapy. Most of them had been in treatment for about one year and had recently ended therapy. The therapies were manualized (P. Høglend, unpublished manuscript). Adherence was checked on several occasions for each case (sessions 7 and 16 plus randomly drawn sessions) to ensure treatment integrity; details have been published elsewhere.29 The patient and therapist could jointly agree to end therapy before one year if sufficient progress had been made. The number of treatment sessions ranged from 28 to 40 (median=36).
Data Analysis
The seven raters assessed 50 patients before therapy and 36 of the same patients after therapy. Three of the raters assessed all of the interviews, and the other four assessed varying numbers of interviews. This design allows several versions of intraclass reliability estimates to be calculated.30 Intraclass correlation coefficients (ICC) are derived from analysis of variance components. Because our design is unbalanced (i.e., not all raters assessed all subjects), we used restricted maximum likelihood estimation. Since we did not assume an absence of rater bias, we chose a two-way random-effects analysis of variance (random effect of rater, random effect of subject). Mean pre-treatment scores on each scale were compared with mean post-treatment scores by paired t-tests on the subsample of 36 patients evaluated before and after therapy. Ratings of change are generally less reliable than status ratings.30 Therefore, a repeated-measures analysis of variance with time (pre- and post-treatment) and rater (the three raters with complete data sets) as factors was performed on the same subsample to analyze in greater detail the differences between raters in assessing change. The interrater reliability of raw change and residual gain scores is also reported for a summary measure of the dynamic scales.
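As an illustration of how an ICC is built from variance components, the sketch below computes the two-way random-effects coefficients from ANOVA mean squares: Shrout and Fleiss's ICC(2,1) for a single rater and ICC(2,k) for the mean of k raters. It is a simplification, not the authors' procedure: it assumes a complete, balanced subjects × raters matrix, whereas the unbalanced design described above requires restricted maximum likelihood estimation of the variance components. The function name `icc_two_way_random` is hypothetical.

```python
from statistics import fmean


def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a complete subjects x raters matrix.

    Two-way random-effects ANOVA (Shrout and Fleiss, 1979): subjects and raters
    are both random, and a rater main effect (rater bias) is allowed.
    Assumes a balanced design: every rater rated every subject.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 subjects and 2 raters")

    grand = fmean(x for row in ratings for x in row)
    subject_means = [fmean(row) for row in ratings]
    rater_means = [fmean(row[j] for row in ratings) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single-rater reliability: rater variance (bias) counts against agreement.
    icc_single = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )
    # Reliability of the mean rating across the k raters.
    icc_mean = (ms_subjects - ms_error) / (ms_subjects + (ms_raters - ms_error) / n)
    return icc_single, icc_mean


# Shrout and Fleiss (1979) example: 6 subjects rated by 4 judges (not study data).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
single, mean_k = icc_two_way_random(example)
print(f"ICC(2,1) = {single:.2f}, ICC(2,4) = {mean_k:.2f}")  # ICC(2,1) = 0.29, ICC(2,4) = 0.62
```

The gap between the two coefficients shows why averaging across several raters yields a more reliable score than any single rater.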
After the intercorrelation matrix of average scores on all pre-treatment variables had been examined, a factor analysis of the variables, with maximum likelihood extraction, was computed.31
RESULTS
Table 2 presents mean scores on all measures at pre-treatment and post-treatment for the 36 patients evaluated on both occasions.
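The pre/post comparisons were made with paired t-tests (see Data Analysis). A minimal sketch of that statistic using only the standard library: it works on per-patient differences, so each patient serves as his or her own control. The function name and the scores in the example are hypothetical; in practice `scipy.stats.ttest_rel` returns the same t statistic together with a p-value.

```python
import math
from statistics import fmean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired t-test on the post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    # Standard error of the mean difference: SD of differences / sqrt(n).
    # (Raises ZeroDivisionError if every difference is identical.)
    standard_error = stdev(diffs) / math.sqrt(n)
    return fmean(diffs) / standard_error, n - 1


# Hypothetical scale scores for four patients (not study data).
t, df = paired_t([40, 45, 50, 55], [48, 50, 58, 60])
print(f"t({df}) = {t:.2f}")  # t(3) = 7.51
```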
Repeated-measures analysis of variance, with the dynamic scales as dependent variables and time and rater as factors, showed a significant main effect of time for all five scales and a significant main effect of rater for one scale (Table 3).
For four of the dynamic scales, 21% to 43% of the variance of average scores (r²) was shared with the GAF. There was, however, a very large overlap between problem-solving capacity and the GAF: the two variables shared 64% of the variance. Given the reliabilities of the two variables (>0.80), this finding may indicate that they measure constructs that cannot be discriminated. The two variables shared more than 71% of the reliable variance33 in this study. All other scales shared less than 48% of the reliable variance with the GAF (and the GSI).
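The step from shared observed variance to shared reliable variance can be made concrete. A minimal sketch, assuming the classical Spearman correction for attenuation (the exact formula of the cited source is not reproduced here): dividing r² by the product of the two reliabilities estimates the variance the true scores share. Because reliabilities are at most 1, the corrected value is never smaller than r². The function name and the reliabilities in the example are hypothetical, not the values obtained in this study.

```python
def shared_reliable_variance(r, rel_x, rel_y):
    """Squared correlation corrected for attenuation: r**2 / (rel_x * rel_y).

    Assumes the classical Spearman correction. Sampling error can push the
    result above 1; such values are often capped at 1 when reported.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r ** 2 / (rel_x * rel_y)


# r = 0.80 (64% shared observed variance) with hypothetical reliabilities of 0.90.
print(round(shared_reliable_variance(0.80, 0.90, 0.90), 3))  # 0.79
```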
A factor analysis was computed to evaluate whether the dynamic scales can be differentiated from global functioning and subjective distress. The number of factors was determined by the eigenvalue-greater-than-one rule. Table 4 shows the results of the factor analysis.
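The eigenvalue rule itself is simple to express: count the eigenvalues of the correlation matrix that exceed 1. The sketch below does only that; it does not perform the maximum likelihood extraction used in the study. To stay dependency-free it computes eigenvalues with the cyclic Jacobi method; with NumPy available, `numpy.linalg.eigvalsh` would be the usual choice. The function names and the example matrix are hypothetical.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the update zeroes a[p][q].
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A R: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- R^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def kaiser_factor_count(corr):
    """Number of factors retained by the eigenvalue-greater-than-one rule."""
    return sum(1 for ev in symmetric_eigenvalues(corr) if ev > 1.0)


# Hypothetical correlations: two pairs of related variables, uncorrelated across pairs.
corr = [[1.0, 0.6, 0.0, 0.0],
        [0.6, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.7],
        [0.0, 0.0, 0.7, 1.0]]
print(kaiser_factor_count(corr))  # 2
```

For this matrix the eigenvalues are 0.3, 0.4, 1.6, and 1.7, so the rule retains two factors, one per cluster of correlated variables.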
DISCUSSION
The reliability estimates for individual scales reported in this study tend to be similar to or higher than those reported for individual scales in other studies of new batteries of dynamic scales.3,19,21,22,36 The cited studies report that a number of individual scales had reliability coefficients below 0.50. We believe that our favorable results for individual scales are at least partly due to our scale format, which includes more descriptive levels and considerably more rating options than other dynamic scales. It is also possible that the comprehensive evaluation interviews secured more complete data for reliable assessment.
Our scales are fine-grained enough to capture statistically significant and reliable changes during brief dynamic psychotherapy with ordinary neurotic outpatients. The highest proportions of patients with reliable change were found for insight and tolerance for affects. This pattern of change is consistent with the techniques of exploratory dynamic psychotherapy, which specifically aim to give patients greater insight and heightened awareness of their affects. However, this finding is only a weak indication of specificity, because the design includes no untreated control group and no nondynamic comparison treatment. To our knowledge, this is the first psychotherapy study to present reliability estimates of change ratings with dynamic scales.
Insight was the most difficult scale to rate reliably, especially at pre-treatment. Dynamic insight and tolerance for affect are measures of intrapsychic functioning that are closely connected to psychoanalytic theory, requiring more clinical inference or intuitive judgment in the assessment procedure.37 Personality features of different raters may contribute to rater bias. However, insight is considered a central curative factor in dynamic therapy and should be included in instruments for assessment of dynamic changes.
The raters were not blind to whether the evaluation was pre-treatment or post-treatment, and the time of evaluation is often apparent from the content of the audiotaped interview. This may have influenced the ratings. We did not detect any systematic differences between therapists' ratings of their own patients and other evaluators' ratings.
The group interview setting may have influenced the interview process and also the clinical inferences upon which the ratings were based. We believe that having more than one interviewer for each patient secures more adequate coverage of all relevant aspects of patient functioning. Dynamic interviews, necessarily less structured than diagnostic interviews, may be unduly influenced by idiosyncratic "matches" between a single interviewer and the patient, and this may lead to a less than adequate interview when the aim is reliable assessment of complex human behavior. On the other hand, the independence of the raters may be compromised when several of them interview the patient on the same occasion. However, we detected no significant differences in the reliability estimates of the ratings between raters present at the interview and raters who only listened to audiotapes of the interview.
CONCLUSIONS
Acknowledgments
References