Comparison of Subjective and Objective Intradermal Allergy Test Scoring Methods in Dogs with Atopic Dermatitis
An intradermal allergy test (IDT) is an important diagnostic tool for identifying offending allergens in canine atopic dermatitis. No standardized method of scoring an IDT has been described. The purpose of this study was to determine whether there is a correlation between a conventional, subjective IDT scoring method based on perceived wheal diameter, erythema, and turgor (0–4+) and an objective scoring method based on measuring wheal diameter alone. Thirty-four atopic dogs were skin tested with 68 different allergens. All skin tests were performed according to standard procedures, and any IDT score ≥2+ was considered clinically significant. When the subjective IDT scores were compared with the objective IDT scores in all dogs, there was a moderate level of correlation overall (r=0.457; P <0.0001). The highest level of agreement between subjective and objective scores was noted with the reactions assigned subjective scores of “0” and “2+.” Overall, there was a slight level of agreement between subjective and objective scores based on clinical significance (i.e., subjective scores ≥2+; κ=0.20; P <0.0001). In conclusion, the authors believe that the objective scoring method used in this study may provide a point of reference for inexperienced individuals (dermatology residents, veterinarians, technicians) when learning to grade an IDT.
Introduction
Canine atopic dermatitis (AD) is a genetically predisposed, inflammatory, and pruritic allergic skin disease with specific clinical features that are associated most commonly with immunoglobulin-E antibodies to environmental allergens.1 A diagnosis of AD is made based on the patient's history, clinical signs, and ruling out other causes of pruritic skin disease (e.g., secondary infections, flea allergy dermatitis, scabies, cutaneous adverse food reactions, contact dermatitis). After a clinical diagnosis of AD has been made, an intradermal allergy test (IDT) can be used to identify the offending allergens. The IDT is an important tool in veterinary dermatology as it allows for the development of allergen-specific immunotherapy in atopic patients and helps determine allergen-avoidance measures.
IDT is the practice of introducing small amounts of antigen into the dermis of atopic patients to cause a visible hypersensitivity reaction (wheal).1 Reactions are then compared with a positive (histamine) and negative (saline) control. Most IDT results are read within 15–30 min of the injections to evaluate the presence of a wheal (i.e., an immediate phase reaction).
IDT reactions can be scored subjectively, objectively, or by a combination of both methods. Subjective scoring is based on the perceived diameter, erythema, and turgidity of the wheal, and is a visual and tactile assessment of each reaction.2–6 In contrast, objective scoring is based on the diameter of the wheal alone, measured in millimeters.2–6 By convention, the subjective scores assigned to these skin test reactions range from 0 to 4+ as described in Table 1.3–5,7 The subjective scoring method appears to be used more routinely in clinical veterinary dermatology.3,4,8 In the authors’ practice, a subjective scoring method is used and a score ≥2+ is considered clinically significant.2,3,8 Each individual reaction is compared with the positive and negative controls. Only the clinically significant, positive reactions are considered for inclusion in immunotherapy protocols.
Table 1 note: A reaction was assigned a score of 2+ when the combination of erythema, turgidity, and wheal diameter was judged to be midway between that of the positive and negative controls.
In a 1989 survey by DeBoer, 80% of veterinary respondents (from a total of 255 surveys) reported their IDT scores on a 0–4+ scale (subjective scoring), 13% reported their scores as wheal diameter in millimeters (objective scoring), and the remaining 7% reported their results as positive or negative reactions only.9 A similar survey of human allergists was performed by Oppenheimer in 2006.10 Of the 539 physicians who completed that survey, 53.8% reported their IDT scores on a 0–4+ scale (subjective), 28.3% used wheal diameter in millimeters (objective), and 17.9% reported positive or negative results only.10
To date, there is no standardized method of scoring an IDT in dogs using either the subjective or the objective method.8,9 There is also no standardized method of teaching an inexperienced individual to grade an IDT. Many veterinary dermatologists acquire this skill by observing a more experienced veterinary dermatologist's technique or by measuring the wheal diameter in millimeters and comparing it with the wheal diameters of the positive and negative controls. Reedy et al stated that if two wheals were identical in diameter but one was more erythematous and turgid, the latter would be considered more reactive.7 In contrast, Willemse et al stated that, because of the experience needed to reliably interpret the subjective portion of the IDT, some investigators consider the maximal wheal diameter in millimeters a more accurate indicator of the intensity of an IDT reaction.11 These differing views illustrate that veterinarians disagree about which scoring method is more effective, which provides an argument for grading an IDT using a combination of subjective and objective scoring methods.
The purpose of this study was to evaluate and compare a subjective IDT scoring method performed by an experienced veterinary dermatologist with an objective IDT scoring method performed by a veterinary dermatology resident, and to determine whether there was any correlation and/or agreement between the two methods. At least a moderate level of correlation and/or agreement between the subjective and objective scoring methods would be needed to justify using the objective scoring method as an initial training tool to help an inexperienced dermatology resident learn to grade an IDT. The objective measurements would provide a stepping stone for the inexperienced resident to build the skill of grading IDTs while preventing the resident from exceeding the time allowed (15–20 min) to grade the IDT.
Materials and Methods
Dogs
Between December 2006 and September 2007, dogs diagnosed with AD on the basis of their history and clinical signs were entered into this prospective study. Other causes of pruritic skin disease, such as sarcoptic mange, flea bite hypersensitivity, dermatophytosis, demodicosis, pyoderma, and Malassezia dermatitis, were investigated in each dog by routine skin scrapings, skin cytology, fungal cultures, and flea combing, and were either negative or eliminated. Although dermatophytosis and demodicosis are not traditionally considered pruritic dermatoses, they can cause pruritus in some dogs, particularly when associated with a secondary infection; therefore, they need to be ruled out to diagnose AD. A strict 8 wk elimination diet was completed in all dogs with a history of nonseasonal clinical signs (28/34) to rule out cutaneous adverse food reactions. Malassezia and bacterial infections were treated with antifungals and antibiotics at appropriate dosages for at least 3 wk before allergy testing. All dogs were receiving a monthly topical or oral veterinary-approved flea preventative before entering the study. Glucocorticoids (topical 2–3 wk, oral 4–6 wk, injectable 6–12 wk) and antihistamines (2 wk) were withdrawn before IDT as recommended.9 The dogs had not received any products or diets containing ω-3/ω-6 fatty acids, nonsteroidal anti-inflammatory drugs, or any other immunosuppressive drugs within the 30 days before the IDT. Informed consent was obtained from the owners of all dogs included in the study.
Intradermal Allergy Test and Allergens
Injections of allergens were performed by the dermatology resident. Before testing, dogs were sedated with medetomidine^a at 0.007–0.02 mg/kg and atropine^b at 0.003–0.01 mg/kg intravenously. All allergens used in this study were purchased from Greer Laboratories^c as concentrates in glass vials and stored at 4°C as recommended by the manufacturer. New aliquots of the concentrates were diluted to testing strength every 6 wk. Histamine (1:100,000 w/v) was used as the positive control and saline (0.9% phosphate-buffered saline) as the negative control (both also supplied by Greer Laboratories^c). All other allergens were tested at 1000 protein nitrogen units (PNU)/mL unless otherwise indicated (Table 2). Positive and negative controls were injected at the beginning and the end of each IDT to ensure an appropriate level of reactivity, to assess repeatability in each dog, and to allow each dog to serve as its own control. Each dog received a total of 78 injections: histamine (2), saline (2), and 74 allergen injections representing 68 allergens regionalized to northern Georgia, because some allergens (insects and dust mites) were tested at more than one concentration.
Table 2 note: Allergens listed were tested at 1000 protein nitrogen units (PNU)/mL unless otherwise indicated.
The fur over the left lateral thorax was gently clipped with a number 40 clipper blade, and a waterproof marker was used to mark each injection site. Each intradermal injection of 0.1 mL of histamine, saline, or allergen was delivered using a 0.5 mL syringe with a 27 gauge (0.5 mm) needle. Each skin test site was scored both subjectively and objectively 15–20 min post-injection. Injections were not randomized, and investigators were not blinded to the identity of the allergens; however, the investigators were blinded to each other's scores. Objective scoring was performed first, followed by subjective scoring.
Interpretation of Intradermal Allergy Test
Each skin test reaction was graded subjectively on a scale of 0–4+ (Table 1) and objectively by measuring the length and width of each reaction in millimeters using a two-sided ruler (Figure 1). The same ruler was used to measure each reaction in every dog. The mean wheal diameter of each reaction was calculated as the average of its length and width. The objective mean wheal diameter of the negative control corresponded to a subjective score of 0, and that of the positive control corresponded to a subjective score of 4+. A subjective score of 2+ corresponded to an objective mean wheal diameter midway between the positive and negative control measurements. An objective measurement was considered a clinically significant positive reaction when it was greater than or equal to this midpoint between the positive and negative control measurements.
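The per-reaction arithmetic described above can be sketched in Python. This is a minimal illustration, assuming each reaction is recorded as a length and width in millimeters and that each dog's positive and negative control references are the means of its two histamine (H1, H2) and two saline (S1, S2) wheals, as described under Statistical Analysis. The function names and example values are hypothetical, not study data.

```python
from statistics import mean


def mean_wheal_diameter(length_mm, width_mm):
    """Mean wheal diameter (mm): the average of measured length and width."""
    return (length_mm + width_mm) / 2


def significance_threshold(histamine_mm, saline_mm):
    """Midpoint between one dog's mean positive and mean negative control diameters."""
    return (mean(histamine_mm) + mean(saline_mm)) / 2


def is_clinically_significant(diameter_mm, threshold_mm):
    """Objective criterion: a diameter at or above the dog's control midpoint."""
    return diameter_mm >= threshold_mm


# Hypothetical control wheals for a single dog (length, width in mm).
histamine = [mean_wheal_diameter(16, 15), mean_wheal_diameter(15, 14)]  # 15.5, 14.5
saline = [mean_wheal_diameter(10, 9), mean_wheal_diameter(9, 9)]        # 9.5, 9.0
threshold = significance_threshold(histamine, saline)  # (15.0 + 9.25) / 2

# A hypothetical allergen reaction measuring 13 mm x 12 mm.
allergen_positive = is_clinically_significant(mean_wheal_diameter(13, 12), threshold)
```

Because the threshold is computed from each dog's own controls, the same allergen wheal diameter can be significant in one dog and not in another, which reflects the study's use of each dog as its own control.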



Citation: Journal of the American Animal Hospital Association 47, 6; 10.5326/JAAHA-MS-5638
Statistical Analysis
The objective wheal diameter measurements of the histamine (H1 and H2) and saline (S1 and S2) controls were averaged in each dog, and the repeated objective control measurements (H1 versus H2, and S1 versus S2) were compared using a paired t-test. Objective measurements included both the allergens and the controls. Correlation between subjective scores and objective measurements was assessed using the Spearman rank correlation coefficient, which ranges from −1 (perfect negative correlation) through 0 (no monotonic association) to +1 (perfect positive correlation). Descriptive statistics (mean, standard deviation, minimum, maximum, 5th percentile, and 95th percentile) were calculated for the objective measurements at each subjective scoring level (0–4+). Mean objective measurements for the five subjective scoring groups (0–4+) were tested for differences using analysis of variance, with Tukey's test used to adjust for multiple comparisons. Objective measurements greater than or equal to the mean of the positive and negative control measurements were considered clinically significant, as were subjective scores ≥2+. Agreement between objective measurements and subjective scores on clinical significance was assessed using the κ statistic. All hypothesis tests were two-sided, and the significance level was P <0.05. All analyses were performed using SAS version 9.2^d.
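Spearman's rank correlation is Pearson's correlation computed on ranks rather than raw values. Because the subjective scores (0–4+) contain many ties, tied values must share the average of the ranks they occupy. The following is a minimal standard-library sketch of that calculation, not the authors' SAS analysis; it assumes the grades 0–4+ are encoded as the integers 0–4, and it omits the P value.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)  # undefined (division by zero) if either input is constant


# Hypothetical subjective grades and objective mean wheal diameters (mm).
subjective = [0, 0, 1, 2, 2, 3, 4]
objective = [9.0, 10.5, 11.0, 12.5, 12.0, 14.0, 16.5]
rho = spearman_rho(subjective, objective)
```

Ranking before correlating means only the ordering of the millimeter measurements matters, which suits an ordinal 0–4+ scale where the spacing between grades is not defined.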
Results
Between December 2006 and September 2007, 81 privately owned dogs residing in the southeastern United States with a clinical diagnosis of AD were entered into the study. Of these, 47 were excluded because the data were incomplete (i.e., subjective and objective scores were not both recorded). The remaining 34 dogs were included in the analysis: 4 intact females, 14 spayed females, 2 intact males, and 14 neutered males. Ages ranged from 1 to 13 yr (mean 4.8 yr). There were 12 mixed-breed dogs, 5 Labrador retrievers, 3 cocker spaniels, 2 pugs, and 1 each of basset hound, boxer, English bulldog, shih tzu, beagle, Yorkshire terrier, bichon frise, American Staffordshire terrier, miniature schnauzer, German shepherd, golden retriever, and miniature poodle. Body weights ranged from 2.7 kg to 40.7 kg.
When evaluated objectively in all 34 dogs, wheal diameters of H1 and H2 ranged from 12 mm to 19 mm (mean 16 mm and 15 mm, respectively), and wheal diameters of S1 and S2 ranged from 7.5 mm to 13.5 mm (mean 10.5 mm and 9.75 mm, respectively). Although identical concentrations and volumes of each control were injected, the mean wheal diameters of H1 and S1 were larger than those of H2 and S2. In addition, individual wheal diameters overlapped between histamine and saline. On average, H2 measurements were 0.10 mm smaller in diameter than H1 measurements, a difference that was not statistically significant (P=0.7206), whereas S2 measurements were 0.59 mm smaller in diameter than S1 measurements, a statistically significant difference (P=0.0298).
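Because every dog received both an early (H1, S1) and a late (H2, S2) control injection, the H1 versus H2 and S1 versus S2 comparisons are paired within dogs. A minimal sketch of the paired t statistic, assuming complete pairs for every dog; the example values are hypothetical, not study data. The P value would come from Student's t distribution with n − 1 degrees of freedom (SciPy's `scipy.stats.ttest_rel`, for example, returns both), which this standard-library sketch does not compute.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for the differences first - second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical early (H1) and late (H2) histamine wheals, one value per dog (mm).
h1 = [16.0, 15.0, 17.0, 14.0]
h2 = [15.0, 15.0, 16.0, 14.0]
t_stat, df = paired_t(h1, h2)
```

Pairing removes between-dog differences in overall reactivity, so the test asks only whether the late injection tends to produce a smaller wheal than the early one in the same dog.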
When the subjective IDT scores were compared with the objective measurements, there was a moderate level of correlation (r=0.457; P=0.0001), as described in Table 3. Mean objective wheal diameters differed significantly (P <0.05) among the subjective score grades, and the objective wheal diameters increased as the subjective scores increased.
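The descriptive statistics listed under Statistical Analysis (mean, standard deviation, minimum, maximum, 5th and 95th percentiles of the objective measurements at each subjective grade) can be sketched with the Python standard library. This is illustrative only: the (grade, diameter) input format and example values are hypothetical, percentiles use `statistics.quantiles` with `method="inclusive"`, whose interpolation may differ from the SAS defaults, and the analysis of variance with Tukey's adjustment is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def describe_by_grade(pairs):
    """Summarize objective diameters (mm) grouped by subjective grade (0-4)."""
    groups = defaultdict(list)
    for grade, diameter in pairs:
        groups[grade].append(diameter)
    summary = {}
    for grade in sorted(groups):
        d = groups[grade]
        if len(d) < 2:
            continue  # SD and percentiles need at least two values
        cuts = quantiles(d, n=20, method="inclusive")  # 19 cut points: 5th .. 95th
        summary[grade] = {
            "n": len(d),
            "mean": mean(d),
            "sd": stdev(d),
            "min": min(d),
            "max": max(d),
            "p5": cuts[0],
            "p95": cuts[-1],
        }
    return summary


# Hypothetical (subjective grade, mean wheal diameter in mm) pairs.
pairs = [(0, 9.0), (0, 11.0), (2, 12.0), (2, 14.0), (4, 16.0)]
summary = describe_by_grade(pairs)
```

Comparing the per-grade means in such a summary is what shows whether objective diameters rise with the subjective grade.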
When agreement on clinical significance between the subjective scores and objective IDT measurements was assessed, the κ test revealed only a slight level of agreement (κ=0.20, P <0.0001). As summarized in Table 4, both methods classified 9.46% of reactions as clinically significant and almost 61% as not clinically significant. Subjective scoring indicated clinical significance when objective scoring did not for 29% of reactions, whereas objective scoring indicated clinical significance when subjective scoring did not for only 2%. The clinical significance rate was 37% for subjective scoring and 11% for objective scoring. Overall, the two methods agreed on clinical significance (positive or negative) for approximately 70% of reactions.
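κ compares the observed agreement on clinical significance with the agreement expected by chance from each method's rate of positive calls. A minimal Python sketch for a 2 × 2 table of two raters' binary calls; the cell counts in the usage check are hypothetical, not the Table 4 data. The descriptive labels follow the Landis and Koch (1977) benchmarks, which is an assumption because the authors do not state which interpretive scale they used, although those benchmarks do classify κ=0.20 as slight.

```python
def cohens_kappa(both_pos, subj_only, obj_only, both_neg):
    """Kappa for two methods making a binary call (significant or not) on the same reactions."""
    n = both_pos + subj_only + obj_only + both_neg
    p_observed = (both_pos + both_neg) / n
    subj_pos = (both_pos + subj_only) / n  # subjective positive rate
    obj_pos = (both_pos + obj_only) / n    # objective positive rate
    # Agreement expected by chance if the two methods were independent.
    p_chance = subj_pos * obj_pos + (1 - subj_pos) * (1 - obj_pos)
    return (p_observed - p_chance) / (1 - p_chance)


def landis_koch(kappa):
    """Landis and Koch (1977) descriptive label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical table: 20 both positive, 10 subjective only, 5 objective only, 65 both negative.
kappa = cohens_kappa(20, 10, 5, 65)
label = landis_koch(kappa)
```

This is why raw percent agreement (about 70% in this study) can coexist with a low κ: when one method calls most reactions negative, much of the agreement on negatives is expected by chance alone.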
Discussion
IDT measures the wheal and flare response to intradermally injected allergens. Most experienced veterinary dermatologists can score an IDT visually (subjectively) without any objective measurements, and this is the most commonly used method of scoring an IDT in veterinary medicine.7,9 To date, there is no standardized method of scoring an IDT, which makes it difficult to train inexperienced individuals (e.g., dermatology residents, veterinarians, technicians). Because the IDT is an important diagnostic tool in veterinary dermatology, an effective reference point or training method to help inexperienced individuals acquire the skill of subjectively scoring an IDT would be valuable.
Wheal diameter plays a role in determining whether a reaction is positive or negative, but it is only one of the three components (wheal diameter, erythema, and turgidity) of the visual and tactile assessment used in the subjective scoring method. A recent study by McCann and Ownby asked human allergists to interpret photographs of skin test reactions subjectively (turgidity could not be assessed), and their interpretations varied considerably.12 These findings indicated that substantial variability might occur among individual scorers who rely on visual wheal size and erythema. The same study showed that the most reliable method of recording a skin test reaction was to measure the wheal diameter (objective method).12 In 1982, Willemse et al considered maximum wheal diameter a more accurate indication of the strength of the skin reaction because of the inaccuracy of converting objective measurements in millimeters to subjective scores (0–4+).8,11
In the present study, an objective scoring method was used as a reference point to determine whether there was a correlation between a strictly objective scoring method and a subjective scoring method. There was a moderate level of correlation between the subjective and objective scores, and considerable variability in the wheal diameters of the positive and negative controls. This variability made it impossible to establish a standard range for the objective scores. The dermatology resident did note, however, that throughout the 9 mo duration of the study, the objective assessment of wheal diameter alone provided a point of reference that helped her develop the skill of scoring an IDT.
Over the course of the study, it became clear that the objective assessment of IDT was difficult to standardize. Because individual wheal diameters overlapped between the positive (H1 and H2) and negative (S1 and S2) controls, it was also difficult to differentiate between a positive and a negative reaction using the objective method alone: some negative control wheals were as large as or larger than some positive control wheals. For this reason, it was imperative that each dog serve as its own control. Taken together, these findings suggest that the subjective assessment of erythema and turgidity may provide a more accurate assessment of the IDT and that a combination of subjective and objective scoring methods may provide more precise IDT scores.
The subjective method is used in most clinical settings.9 Previous studies indicated that erythema and turgidity provided additional support for positive reactivity and increased the incidence of positive reactions.7,8,13,14 Had erythema, turgidity, and perceived wheal diameter been scored independently, the authors could have determined how much each parameter contributed to the subjective scores. In 1984, Nesbitt stated that a subjective assessment of reactions increased the number of positive reactions by 20–30%.14 Likewise, in a clinical study conducted by Graham et al, there were significantly more positive reactions with the subjective method than with the objective method of interpretation.8 The present study supports these findings, as more reactions were scored positive with the subjective scoring method than with the objective scoring method. This was probably due to the additional weight given to erythema and turgidity in the subjective scoring method.
As mentioned previously, the H2 measurements were an average of 0.1 mm smaller than the H1 measurements, and the S2 measurements were an average of 0.59 mm smaller than the S1 measurements. These differences in wheal diameter might have resulted from injection technique (e.g., subcutaneous rather than intradermal administration) and from variability in timing, injected volume, and reactivity among individual dogs. Variations in reactivity might be attributable to each dog's response to cutaneous injections (mast cell degranulation and release of inflammatory mediators, dermal mast cell numbers, variations in vasodilation) and/or the location of the injection site on the thorax. Nimmo Wilke et al demonstrated more mast cells in the skin of atopic dogs than in normal dogs in skin samples obtained from the lateral neck, dorsal rump, and abdomen, whereas Asti et al and Auxilia et al both demonstrated more mast cells in the superficial dermis than in the deeper dermis of the pinna, thorax, and thigh regions.15–17 Variations in mast cell distribution might have contributed to the variations in wheal diameter noted among the positive control injections; however, they do not explain the variability in negative control wheal size. There was also an overlap between the largest diameter of the negative controls (13.5 mm) and the smallest diameter of the positive controls (12 mm). These results expose the difficulty of standardizing IDTs based on diameter and emphasize the importance of each dog serving as its own control, relative to its own cutaneous response to the positive (histamine) and negative (saline) controls.
When the two scoring methods were compared, agreement was better for reactions that were not clinically significant. Agreement between subjective scores and objective measurements was also higher for the 0 and 1+ reactions. In 28% of reactions (739/2,652, where 2,652 = 34 dogs × 78 injections), the subjective scoring method indicated clinical significance whereas the objective scoring did not; in only 2% (45/2,652) did the objective scoring indicate clinical significance whereas the subjective scoring did not. These data could indicate either a high level of false-positive results with the subjective scoring method or a high level of false-negative results with the objective scoring method. A combination of the subjective and objective scoring methods could help reduce false-positive and false-negative results by providing more reliable IDT scores and reducing the variability between the scoring methods.
It should be taken into consideration that different individuals scoring an IDT may define or understand the subjective outcomes as different weighted combinations of wheal diameter, erythema, and turgidity. For example, one individual may score an IDT reaction based on 40% wheal diameter, 40% erythema, and 20% turgidity, whereas another individual may grade the exact same reaction based on 20% wheal diameter, 20% erythema, and 60% turgidity. Hence, one individual's subjective score of 2+ may correspond to another individual's score of 3+ for the same reaction, or vice versa. This introduces substantial variability within the subjective scoring method and contributes to the difficulty of standardizing it. The authors also assumed that the experienced dermatologist's results were the standard for most IDTs. It would be interesting to compare the subjective scores of three experienced (≥20 yr) dermatologists in the same dog to see how closely their results agree; poor agreement would call into question the validity of assigning a subjective score (0–4+) to IDT reactions. Perhaps simply indicating whether each reaction is positive or negative, instead of assigning a measurement or overall score, would be more appropriate.
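The weighting argument above can be made concrete with a toy model. It assumes, purely for illustration, that each rater mentally rates wheal diameter, erythema, and turgidity on a 0–4 scale and combines them as a weighted average rounded to the nearest grade; the component ratings, the function name, and the rounding rule are hypothetical and do not come from the study. The two weightings are the ones given in the text.

```python
def weighted_grade(components, weights):
    """Collapse 0-4 component ratings into a single 0-4 grade using one rater's weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    score = sum(components[name] * w for name, w in weights.items())
    return round(score)  # nearest whole grade (Python rounds exact .5 ties to even)


# Hypothetical component ratings for a single reaction.
reaction = {"diameter": 2, "erythema": 2, "turgidity": 4}
rater_a = {"diameter": 0.4, "erythema": 0.4, "turgidity": 0.2}
rater_b = {"diameter": 0.2, "erythema": 0.2, "turgidity": 0.6}

grade_a = weighted_grade(reaction, rater_a)  # weighted score 2.4
grade_b = weighted_grade(reaction, rater_b)  # weighted score 3.2
```

Under these assumptions, the same reaction is graded 2+ by one rater and 3+ by the other, even though both raters perceive identical component ratings.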
The difficulties of standardizing IDTs are associated with defining levels of agreement in categorical scales.18 In this study, the authors selected a method, based on ease of repeatability, to set up what amounts to a litmus test of how closely an objective scoring method based only on wheal diameter agrees with a conventional subjective scoring method.
Ideally, this study would have been designed in a randomized, double-blinded fashion to prevent bias in the investigators' scoring of each IDT. Additionally, the study could have been designed to assign an individual score to each component of the subjective scoring method (wheal diameter, erythema, and turgidity), which would have allowed a direct comparison of the objective wheal diameter with the subjective wheal diameter. This technique was attempted in a few dogs but was quickly discontinued because reading and scoring erythema, turgidity, and wheal diameter individually took far longer than the 15–20 min allowed to read the IDT, especially with the large number of injections (78) used in this study. To achieve this goal, IDTs would have to be performed with fewer injections to avoid time constraints and to ensure more reliable results. A direct comparison may also be possible if the subjective method assigns a score based only on wheal diameter, which is then compared with the objective wheal diameters. Although seemingly straightforward at the outset, this was a difficult experiment to evaluate statistically, which emphasizes why a standardized method of scoring an IDT has not been developed.
Conclusion
Overall, there was a moderate level of correlation between the subjective IDT scoring method, performed by an experienced veterinary dermatologist, and the objective IDT scoring method, performed by a veterinary dermatology resident. Additionally, the objective scoring method used in this study might be partially effective in training a veterinary dermatology resident or other inexperienced individual to score an IDT: it provided a point of reference for the inexperienced investigator, who then learned to evaluate an IDT subjectively over time. Nevertheless, the authors believe that a combination of subjective and objective IDT scoring methods would provide more reliable IDT results than subjective scoring alone, because erythema and turgidity in combination with wheal diameter appeared to provide a more accurate assessment of the IDT. Future controlled studies are needed to standardize and critically evaluate both IDT scoring methods.

Figure 1. The two-sided ruler used to measure each test reaction (in millimeters) for the objective scoring method.