Videotape records of consecutive patients diagnosed between July 2006 and August 2012 with clinically definite PMD involving gait as a primary or associated impairment were edited to include only the segments corresponding to the standing and walking tasks documented during their neurological examination. Patients with pain as a major symptom were excluded to avoid this source of confounding. Similar video material was collected from patients with cerebellar, spinocerebellar, and sensory ataxia between July 2011 and August 2012 (control group). One clinician rated the severity of the gait impairment by combining items 27 (standing) and 28 (gait) of the motor part of the UPDRS with item 16 (turning) of the Gait and Balance Scale (Appendix 1 in the Supporting Information). Three clinicians blinded to subjects' diagnoses and to the study purpose rated these standing and walking video segments for severity, duration, and main effort-associated features: breath holding, vocalizations (moaning or groaning), grimacing, or any other manifestation of disproportionately excessive labor, herein figuratively labeled huffing and puffing (H-P). Severity was rated on a scale from 0 to 4 (0 = none, 1 = minimal, 2 = mild, 3 = moderate, and 4 = severe). Duration was rated on a scale from 0 to 4 (0 = none, 1 = <25% of the time, 2 = 25%–50% of the time, 3 = 50%–75% of the time, and 4 = >75% of the time). The total score was derived as the product of severity and duration (see Appendix 2 in the Supporting Information and Videos 1 and 2).
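As an illustration of the scoring rule described above, the following minimal sketch computes a single rater's total H-P score as the product of the severity and duration ratings. The function and constant names are hypothetical and are not taken from the study materials; only the 0 to 4 scales and the product rule come from the text.

```python
# Illustrative sketch only: names are hypothetical, not from the study.
# Scale labels follow the text (0 to 4 for both severity and duration).
SEVERITY_LABELS = {0: "none", 1: "minimal", 2: "mild", 3: "moderate", 4: "severe"}
DURATION_LABELS = {
    0: "none",
    1: "<25% of the time",
    2: "25%-50% of the time",
    3: "50%-75% of the time",
    4: ">75% of the time",
}


def total_score(severity: int, duration: int) -> int:
    """Return the total H-P score for one rater: severity times duration."""
    if severity not in SEVERITY_LABELS:
        raise ValueError(f"severity must be an integer from 0 to 4, got {severity!r}")
    if duration not in DURATION_LABELS:
        raise ValueError(f"duration must be an integer from 0 to 4, got {duration!r}")
    return severity * duration
```

Under this rule the total score ranges from 0 to 16, and it is 0 whenever either severity or duration is rated as none.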
Analysis Based on Combined Raters
We determined kappa agreement among the three expert clinical raters for both rated cohorts. The median rank of H-P behavior across the three raters was used to define the overall severity, duration, and total score. H-P was considered positive if the median score across the three raters was greater than or equal to 2 (“mild”). H-P scores in the PMD and control groups were compared using Wilcoxon's rank-sum test. We determined the diagnostic performance of H-P presence in distinguishing subjects with PMD from controls. Diagnostic performance was summarized using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and receiver operating characteristic (ROC) area (defined as the average of sensitivity and specificity), each with 95% confidence intervals (CIs).
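The combined-rater classification and the summary measures described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example ratings are invented, the sketch assumes that the threshold of 2 is applied to the median severity rating, and confidence intervals are omitted.

```python
# Illustrative sketch only: names and example data are hypothetical.
# Assumption: the positivity threshold (>= 2, "mild") is applied to the
# median severity rating across the three raters.
from statistics import median


def hp_positive(ratings, threshold=2):
    """Return True if the median rating across raters is at least threshold."""
    return median(ratings) >= threshold


def diagnostic_performance(pmd_positive, control_positive):
    """Summarize H-P presence as a test for PMD versus controls.

    pmd_positive: list of booleans, one per PMD subject (True = H-P positive).
    control_positive: list of booleans, one per control subject.
    """
    tp = sum(pmd_positive)             # PMD subjects who are H-P positive
    fn = len(pmd_positive) - tp        # PMD subjects who are H-P negative
    fp = sum(control_positive)         # controls who are H-P positive
    tn = len(control_positive) - fp    # controls who are H-P negative

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # PPV and NPV are undefined when no subject falls in the relevant row.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    # ROC area as defined in the text: average of sensitivity and specificity.
    roc_area = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "roc_area": roc_area,
    }


# Invented example ratings (three raters per subject), for illustration only.
pmd_ratings = [(3, 2, 2), (1, 1, 2), (0, 2, 3), (4, 3, 3)]
control_ratings = [(0, 0, 1), (1, 2, 2), (0, 1, 0), (1, 1, 0)]

pmd_flags = [hp_positive(r) for r in pmd_ratings]
control_flags = [hp_positive(r) for r in control_ratings]
summary = diagnostic_performance(pmd_flags, control_flags)
```

With three raters the median is simply the middle rating, so one outlying rater cannot by itself change a subject's classification.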