We followed standard systematic review methods as described in the Agency for Healthcare Research and Quality (AHRQ) Methods Reference Guide for Effectiveness and Comparative Effectiveness Reviews. A full technical report that describes these methods in detail, including the literature search strategies, and presents our findings in full (with evidence tables) is available elsewhere.
We searched the MEDLINE and Cochrane Central Register of Controlled Trials databases from database inception to September 2010 for English-language studies of adults (older than 16 years) with OSA. Our search, available in the full technical report, included terms for OSA, sleep apnea treatments and relevant research designs. The full literature search was performed for a range of key questions about OSA diagnosis, treatment with any intervention and predictors of outcomes. Six reviewers independently screened the abstracts. To select abstracts for full-text screening, we used a computerized screening program, abstrackr, which applies an active learning algorithm to identify relevant articles. The program was trained on 1,000 abstracts that had been manually double-screened for relevance. Subsequently, the abstracts selected by the program were screened by one researcher, and the results of this screening were iteratively fed back into the program for further training. This process continued until the only abstracts remaining were those the program had rejected. Using abstrackr, we halved the number of abstracts that needed to be screened manually before starting the subsequent steps of the systematic review. All abstracts rejected by abstrackr were later screened manually for confirmation, and all were ultimately rejected. The same six reviewers rescreened the full-text articles for eligibility.
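The iterative screening loop described above can be sketched as follows. This is a minimal illustration, not abstrackr's implementation: the learner is not specified here, so a toy add-one-smoothed bag-of-words log-odds scorer stands in for it, and the helper names (`train`, `screen`), the `batch_size` parameter and the example abstracts are all hypothetical. The `reviewer` callback plays the role of the single researcher who screens the abstracts the model selects, and the two-abstract seed stands in for the 1,000 double-screened training abstracts.

```python
import math
from collections import Counter


def tokenize(text):
    # Lower-case whitespace tokenization; a real system would do far more.
    return text.lower().split()


def train(labeled, texts):
    """Fit a toy add-one-smoothed bag-of-words log-odds scorer.

    Stand-in learner for illustration only; it is NOT abstrackr's model.
    Returns a function mapping an abstract's text to a relevance score
    (a positive score means "predicted relevant").
    """
    rel, irr = Counter(), Counter()
    for doc_id, is_relevant in labeled.items():
        (rel if is_relevant else irr).update(tokenize(texts[doc_id]))
    vocab = set(rel) | set(irr)
    n_rel, n_irr, v = sum(rel.values()), sum(irr.values()), len(vocab)
    weights = {
        w: math.log((rel[w] + 1) / (n_rel + v)) - math.log((irr[w] + 1) / (n_irr + v))
        for w in vocab
    }
    return lambda text: sum(weights.get(tok, 0.0) for tok in tokenize(text))


def screen(texts, seed_labels, reviewer, batch_size=2):
    """Active-learning screening loop (hypothetical helper).

    seed_labels: manually screened training abstracts {id: bool}.
    reviewer: callback returning the human relevance decision for an id.
    Stops when every unscreened abstract is rejected by the model.
    """
    labeled = dict(seed_labels)
    while True:
        score = train(labeled, texts)  # retrain on all decisions so far
        unlabeled = [d for d in texts if d not in labeled]
        selected = sorted(
            (d for d in unlabeled if score(texts[d]) > 0),
            key=lambda d: score(texts[d]),
            reverse=True,
        )
        if not selected:
            return labeled, unlabeled  # only model-rejected abstracts remain
        for doc_id in selected[:batch_size]:
            labeled[doc_id] = reviewer(doc_id)  # one researcher screens


# Illustrative (made-up) abstracts and "true" relevance decisions.
texts = {
    "a1": "randomized trial apap cpap sleep apnea",
    "a2": "asthma inhaler cohort children",
    "a3": "apap versus cpap randomized crossover apnea",
    "a4": "cpap adherence sleep apnea trial",
    "a5": "inhaler asthma cohort adults",
    "a6": "knee surgery cohort outcomes",
}
truth = {"a1": True, "a2": False, "a3": True, "a4": True, "a5": False, "a6": False}
seed = {"a1": True, "a2": False}

labeled, rejected = screen(texts, seed, truth.__getitem__)
print(sorted(rejected))  # ['a5', 'a6']
```

Abstracts left in `rejected` correspond to those the program rejected; in the review, such abstracts were subsequently screened manually for confirmation.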
We included peer-reviewed randomized controlled trials (RCTs) that compared APAP with fixed CPAP in ≥10 patients per intervention with a confirmed diagnosis of OSA, including a formal sleep study demonstrating an apnea-hypopnea index (AHI) ≥5 events/hour. We included studies of any duration, provided that CPAP was used by patients at home. Outcomes of interest included objective clinical outcomes (death, cardiovascular events, hypertension, non-insulin-dependent diabetes, depression); sleep- and wakefulness-related clinical outcomes (quality of life, sleepiness measures, neurocognitive tests, accidents, productivity); sleep study measures (AHI, arousal index, deep sleep, sleep efficiency, minimum oxygen saturation); intermediate outcomes of comorbidities (hemoglobin A1c, blood pressure); compliance; and adverse events or harms.
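As a minimal sketch, the inclusion criteria above can be expressed as a single predicate. The `Study` record and its field names are hypothetical, and the encoding covers only the criteria stated in this paragraph; study duration is deliberately not checked, since any duration was accepted.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical summary of a candidate study (field names are illustrative)."""
    peer_reviewed: bool
    randomized: bool
    apap_vs_fixed_cpap: bool     # compares APAP with fixed CPAP
    patients_per_arm: list[int]  # number of patients in each intervention arm
    formal_sleep_study: bool     # OSA confirmed by a formal sleep study
    min_ahi: float               # AHI threshold (events/hour) required for diagnosis
    home_use: bool               # devices used by patients at home


def is_eligible(s: Study) -> bool:
    """Apply the stated inclusion criteria; any study duration is accepted."""
    return (
        s.peer_reviewed
        and s.randomized
        and s.apap_vs_fixed_cpap
        and len(s.patients_per_arm) >= 2
        and min(s.patients_per_arm) >= 10
        and s.formal_sleep_study
        and s.min_ahi >= 5
        and s.home_use
    )


example = Study(True, True, True, [24, 22], True, 5, True)
too_small = Study(True, True, True, [8, 9], True, 5, True)
print(is_eligible(example), is_eligible(too_small))  # True False
```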
Data from each study were extracted by one of six reviewers and confirmed by another. Extracted data included information on study and patient characteristics, details of the CPAP devices used, outcomes and study quality. For most outcomes, only data from the last reported time point were included. We assessed the methodological quality of each study against predefined criteria, in accordance with AHRQ's suggested methods for systematic reviews. The primary data extractor graded study quality (A, B or C), and at least one other reviewer confirmed the grade. Quality A studies adhered most closely to the commonly held precepts of high quality, including clear descriptions of the population, setting, interventions, outcomes and design; no obvious reporting omissions or errors; fewer than 20% dropouts; and no obvious source of bias. Quality B studies had some deficiencies in these criteria that were, however, unlikely to introduce a major bias. Quality C studies had inadequate descriptions or substantial flaws in reporting or design, such that a major bias could not be excluded.
We performed random-effects model meta-analyses of differences in selected continuous variables between interventions where there were at least three unique, sufficiently similar studies. Based on the available data and our a priori assessment of the clinical importance of specific outcomes, we performed meta-analyses for AHI, the Epworth Sleepiness Scale (ESS), the arousal index (frequency of arousals from sleep per hour), minimum oxygen saturation (during sleep), the multiple sleep latency test (a measure of how quickly a subject falls asleep during the day), the Functional Outcomes of Sleep Questionnaire (a quality of life measure) and compliance (measured as time per night using the device). When necessary, standard errors of the net change (the difference between the within-arm changes) were calculated from CIs, P values or the standard errors of the within-arm changes. When necessary, standard errors of the within-arm changes were estimated from the standard errors of the baseline and final values, assuming a correlation of 0.5 between the two. Studies that compared two different forms of APAP with CPAP were treated as independent, despite the shared CPAP arm. Because of limitations in the reported data, and for consistency, in cross-over studies we treated the difference in final values as equivalent to the net change, under the assumption that baseline values were equal and would thus cancel out. Heterogeneity among effect sizes was assessed using the I² index and the chi-square test; an I² index ≥50% was taken to indicate medium-to-high heterogeneity.
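The standard-error conversions and random-effects pooling described above can be sketched as follows. This assumes normal-theory 95% CIs and two-sided z-test P values, and it uses the DerSimonian-Laird estimator for the between-study variance; only the use of random-effects models is stated here, so the choice of estimator is an assumption, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper):
    """SE of an effect from its 95% CI (assumes a symmetric, normal-based CI)."""
    return (upper - lower) / (2 * Z975)


def se_from_p(effect, p):
    """SE of an effect from its two-sided P value, assuming a z test."""
    return abs(effect) / NormalDist().inv_cdf(1 - p / 2)


def se_within_change(se_base, se_final, r=0.5):
    """SE of a within-arm change from baseline/final SEs, assuming correlation r."""
    return math.sqrt(se_base**2 + se_final**2 - 2 * r * se_base * se_final)


def se_net_change(se_change_a, se_change_b):
    """SE of the net change (difference of two independent within-arm changes)."""
    return math.hypot(se_change_a, se_change_b)


def random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    w = [1 / se**2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    w_re = [1 / (se**2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, tau2, q, i2


se_a = se_within_change(2.0, 2.0)  # r = 0.5 by default
se_net = se_net_change(se_a, se_a)
pooled, se_pooled, tau2, q, i2 = random_effects([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(round(i2, 2), i2 >= 0.5)  # 0.75 True
```

With r = 0.5 and equal baseline and final SEs, the within-arm change has the same SE as each measurement. In the illustrative three-study example, I² is 0.75, which the ≥50% rule above would flag as medium-to-high heterogeneity.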
To explore sources of between-study heterogeneity, all forest plots were drawn with subgroup meta-analyses of trials stratified by baseline OSA severity (as determined by the minimum AHI threshold each study required for a diagnosis of OSA). Forest plots subdivided by study design are presented in the full technical report. The decision to subgroup studies by minimum AHI and by study design was made a priori; however, the minimum AHI categories were based on the thresholds reported in the studies. We performed separate meta-regressions on AHI threshold and on study design to test for statistically significant differences among subgroups.
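A subgroup meta-regression with a single binary covariate can be sketched as an inverse-variance weighted regression. This is a simplification under stated assumptions: the between-study variance `tau2` is supplied by the caller rather than re-estimated as a residual variance, significance is assessed with a z test on the slope, and the covariate coding (1 for studies with a higher minimum AHI threshold, 0 otherwise) is hypothetical. With a 0/1 covariate, the slope equals the difference between the weighted subgroup means.

```python
import math
from statistics import NormalDist


def meta_regression(effects, ses, x, tau2=0.0):
    """Inverse-variance weighted regression of effect size on one covariate.

    Weights are 1/(se^2 + tau2); tau2 is an assumed, caller-supplied
    between-study variance. Returns the slope, its SE and a two-sided
    z-test P value for the difference between subgroups.
    """
    w = [1 / (se**2 + tau2) for se in ses]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("covariate does not vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    se_slope = math.sqrt(1 / sxx)  # known sampling variances, no residual scaling
    z = slope / se_slope
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, se_slope, p


effects = [1.0, 1.0, 3.0, 3.0]
ses = [1.0, 1.0, 1.0, 1.0]
high_ahi = [0, 0, 1, 1]  # hypothetical coding of the minimum-AHI subgroup
slope, se_slope, p = meta_regression(effects, ses, high_ahi)
print(slope, se_slope, round(p, 4))  # 2.0 1.0 0.0455
```

The same function could be called with a study-design indicator (for example, 1 for cross-over and 0 for parallel-group trials) to run the separate design meta-regression.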
We graded the strength of the body of evidence according to the AHRQ Methods Reference Guide. We took into account overall study quality, consistency across studies, applicability of the studies to the general population of patients treated for OSA, the magnitude and precision of the treatment effects and the relative clinical importance of the different outcomes assessed. The overall strength of evidence was rated as high, moderate or low, each indicating the level of confidence that the evidence reflects the true effect, or as insufficient.