Data extraction for complex meta-analysis (DECiMAL) guide

As more complex meta-analytical techniques such as network and multivariate meta-analyses become increasingly common, further pressures are placed on reviewers to extract data in a systematic and consistent manner. Failing to do this appropriately wastes time and resources and jeopardises accuracy. This guide, data extraction for complex meta-analysis (DECiMAL), suggests a number of points to consider when collecting data and is aimed primarily at systematic reviewers preparing data for meta-analysis. Network meta-analysis (NMA), multiple outcomes analysis and analysis combining different types of data are considered in a manner that can be useful across a range of data collection programmes. The guide has been shown to be both easy to learn and useful in a small pilot study.
Electronic supplementary material: The online version of this article (doi:10.1186/s13643-016-0368-4) contains supplementary material, which is available to authorized users.


Background
Data collection is a vital part of a systematic review. It bridges the gap between a review and a meta-analysis. Making this as easy, understandable and accurate as possible hugely speeds up the process of data cleaning and checking for the data analyst/reviewer. Lack of coordination between reviewers and analysts can lead to errors which may feed through to produce incorrect results and inferences in systematic reviewing.
As more complex techniques such as network and multivariate meta-analyses become increasingly common in systematic reviews, further demands are placed on reviewers to extract data in a systematic and consistent manner. This guide draws on experience of conducting systematic reviews and complex meta-analyses to inform decision-making for the development of UK National Institute for Health and Care Excellence (NICE) guidelines. It was developed after discussions with senior reviewers, with the intention of improving the consistency and accuracy of data collection.
Further development and initial testing of the usefulness of this guide was performed in a pilot study involving reviewers from two UK NICE clinical guideline development teams and centres. Reviewers with a wide range of experience in systematic reviewing from across the centres were invited to participate in the study. Fifteen out of 25 reviewers (60% response rate) completed two mock data extractions (one network meta-analysis (NMA) and one multivariate extraction) and then evaluated the guide using a modified version of the 10-item System Usability Scale [1]. Feedback from reviewers was used to further improve the guide.
An initial review of available data extraction guides in systematic reviewing identified a paucity of tools to guide data collection for complex evidence synthesis. Brown et al. report on a framework for developing a coding scheme for data extraction for meta-analysis, but the authors did not cover the more technical issues that can arise during complex meta-analysis, such as multiple arms and correlated outcomes [2]. We also identified several data extraction templates developed by the Cochrane Collaboration, which provide guidance on topics to be covered in data extraction and quality assessment at a study level but do not suggest methods for organising multiple studies [3].
To fill this gap in the literature, we have developed a guide (data extraction for complex meta-analysis (DECiMAL)) to assist reviewers in extracting data from systematic reviews in a consistent way for use in meta-analyses. The guide is not intended to be exhaustive but to address most of the problems faced when collecting various types of data, such as time-to-event, binary or continuous, for complex analyses such as NMA and multivariate meta-analyses. Since it is much easier to identify and correct data collection issues before all data are collected, this guide aims to raise early awareness of these issues so that they can be discussed and addressed from the outset of the process.
This guide is intended to assist reviewers only with the data extraction aspects of meta-analysis. It does not provide instructions on statistical techniques of meta-analysis in systematic reviews, such as handling of missing data or converting summary statistics, as reviewing them is not the aim of this paper. It also is intended to assist only with data extraction for aggregate data meta-analyses, as methods will differ for individual patient data meta-analyses.
Many different database programmes are available for managing data. Microsoft Excel or Microsoft Access are often used for smaller datasets, whilst more specific statistical software, such as STATA or R, may be used for larger projects which require more complex data manipulation. Some software will have inbuilt functions that restrict input to certain types of data, such as string or numerical, depending on how each variable has been pre-specified. For instance, programmes such as Review Manager already have built-in functions to address many of the issues discussed in this guide, though as a result, the procedures for analysis are more limited.
The points suggested here will be relevant for almost any software that is used for data collection, provided they can be visualised in the format of rows of observations (studies in this case) and columns of variables.
The guide is structured as follows:
The "Background" section contains information on data extraction for different types of analysis:
    Suggestions 1-4 apply mainly to data collection for network meta-analysis.
    Suggestions 5-6 describe issues with data collection involving multiple outcomes which may inform a multivariate meta-analysis.
The "Discussion" section contains information on data extraction for different types of data:
    Suggestions 7-14 describe ways of collecting data of different types, such as time-to-event data or relative effect data.
The "Conclusions" section contains general information on data extraction:
    Suggestions 15-27 make some general points reviewers should be aware of, regardless of the type of data or meta-analysis their data collection will inform.
Additional file 1 is an Excel workbook containing five worksheets:
    One study per row (arm): example data extraction for a meta-analysis of arm-based (absolute) data in the one study per row format.
    One study per row (relative): example data extraction for a meta-analysis of relative data in the one study per row format.
    Rate data: example data extraction for a meta-analysis of rate data in the one study per row format.
    Diagnostic test accuracy: example data extraction for a diagnostic test accuracy meta-analysis.
    Codebook: example of a glossary worksheet to demonstrate the coding of different variables in a data extraction.

DECiMAL guide
Data extraction for different types of analysis
Network meta-analysis
1. When collecting data for a network meta-analysis (NMA), always note in a separate numerical column how many arms the trial had.
1.1. Also (in another column) note the arm number that the observation/row in the database refers to and keep these consistent when collecting data with multiple outcomes or at multiple time points (e.g. keep placebo in arm 1 for all outcomes).
2. Decide on a sensible treatment numbering and classification in advance. This will help with correctly numbering the arms when extracting data. By ensuring that the highest numbered treatment is always compared to the lowest, the effect estimates will be consistent (Additional file 1: Codebook).
3. Different combinations or doses of interventions can be added as separate treatments, with separate numbers/classifications to distinguish between them, depending on how the protocol specifies these should be analysed.
4. A one study per row format can be useful to prevent duplication of study ID, treatments, numbers randomised and other characteristics (e.g. risk of bias), provided the data are not too complex.
4.1. Multiple outcomes and time points can be collected onto the same row in new columns (though this can become cumbersome with many time points and outcomes).
5. It can be easier to collect arm-based (absolute) data on one worksheet and relative data on a different worksheet, since they will require different columns and different analysis approaches (Additional file 1: One study per row (arm) and One study per row (relative)).
5.1. For relative effects, extra columns will be needed to clarify which treatment is being compared to which. Care should be taken to identify which treatment is the "comparator" and which is the "experimental" (see Suggestion 19).
5.2. When extracting relative effects for ratio outcomes, these should be extracted on the natural-logarithm scale (e.g. log-hazard ratios) with their standard errors.
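The NMA suggestions on arm counts, treatment coding and the direction of relative effects can be illustrated with a short Python sketch. It is only one possible layout under stated assumptions: the column names, the treatment codes and the study label "Study A" are hypothetical and do not come from Additional file 1. The `orient` helper assumes the effect is stored on the log scale, so reversing the direction of a comparison negates the estimate and leaves the standard error unchanged.

```python
import math
from dataclasses import dataclass


@dataclass
class RelativeEffectRow:
    """One row of a hypothetical relative-effects worksheet (log scale)."""
    study_id: str
    n_arms: int         # number of arms in the trial
    comparator: int     # treatment code of the reference treatment
    experimental: int   # treatment code of the treatment being compared
    log_effect: float   # e.g. log-hazard ratio, experimental vs comparator
    se: float           # standard error of log_effect


def orient(row: RelativeEffectRow) -> RelativeEffectRow:
    """Return the row with the higher-numbered treatment as 'experimental'."""
    if row.experimental == row.comparator:
        raise ValueError(f"{row.study_id}: a treatment is compared with itself")
    if row.experimental > row.comparator:
        return row
    # Reversing the comparison negates the log effect; the SE is unchanged.
    return RelativeEffectRow(row.study_id, row.n_arms, row.experimental,
                             row.comparator, -row.log_effect, row.se)


# Hypothetical example: a hazard ratio of 0.80 for treatment 1 vs treatment 3,
# extracted on the natural-log scale and then re-oriented as 3 vs 1.
raw = RelativeEffectRow("Study A", n_arms=2, comparator=3, experimental=1,
                        log_effect=math.log(0.80), se=0.11)
fixed = orient(raw)
print(fixed.comparator, fixed.experimental, round(fixed.log_effect, 4))
```

Applying a check like this at extraction time, rather than during analysis, keeps every row in the worksheet pointing in the same direction.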
Multiple outcomes and multivariate meta-analysis
5. These can either be collected with a separate row for each outcome, or (preferably) in the one study per row format, with an additional set of columns for each additional outcome (Additional file 1: One study per row (arm) and One study per row (relative)).
6. Multiple time points can be extracted similarly to multiple outcomes, with each time point from the same study extracted as either a separate row or in the one study per row format.
7. Joint distributions may be reported in some studies: this is where the number of patients with each outcome is reported for each level of another outcome.
7.1. For instance, "gestational age" and "mode of birth" are reported as outcomes. Their joint distribution can be obtained if gestational age is reported separately for each mode of birth (e.g. vaginal: mean = 39.5 weeks, SD = 5 weeks; caesarean: mean = 40.7 weeks, SD = 4.7 weeks).
7.2. If data for joint distributions are reported, then a simple note that this is the case should be written consistently in a notes column, as this information can be used for multivariate meta-analysis or for health economic modelling (Additional file 1: Rate data). The full data can then be extracted more easily at a later date if and when they are needed.
7.3. Diagnostic accuracy studies should be analysed using a multivariate approach to account for the correlation between sensitivity and specificity.
Rate data (e.g. frequency of migraine episodes)
9. When rates are reported, the total number of person-years at risk should also be collected (Additional file 1: Rate data).
10. If this is not available, then the average length of follow-up and the total number of patients at the end of the study should be collected instead, as these can be used to approximate the total person-years (by making some extra assumptions).
11. Sometimes, rate data are reported either as the number of first events or the total number of events, in a given time period. It is important to distinguish between these as they may need to be modelled separately. This can be done by having separate columns to collect each type of data (usually the most appropriate option), or by including a column which states which data type it is.
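The fallback for missing person-years can be sketched as follows. This is a minimal illustration, not a recommended statistical method: the function name and arguments are hypothetical, and the approximation used (mean follow-up multiplied by the number of patients) is one simple assumption, namely that every patient counted contributed the average follow-up time. The returned label records whether the value was reported or approximated, so that the analyst can see which rows rest on that assumption.

```python
from typing import Optional, Tuple


def person_years(reported: Optional[float],
                 mean_follow_up_years: Optional[float],
                 n_patients: Optional[int]) -> Tuple[Optional[float], str]:
    """Return (person-years, source), preferring the reported value."""
    if reported is not None:
        return reported, "reported"
    if mean_follow_up_years is not None and n_patients is not None:
        # Assumption: each patient contributes the mean follow-up time.
        return mean_follow_up_years * n_patients, "approximated"
    return None, "missing"


# Hypothetical study reporting 1.5 years mean follow-up for 200 patients.
print(person_years(None, 1.5, 200))
```

Storing the source label in its own column mirrors the advice to record which type of data each value represents.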
Binary and categorical variables (Additional file 1: One study per row (arm))
12. If you are dealing with binary responses, it is normally easier to use numbers than letters or text (Additional file 1: One study per row (arm)).
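Numeric coding of binary and categorical variables is easier to keep consistent when the codes live in a single codebook. The sketch below assumes a hypothetical codebook; the variable names, labels and codes are invented for illustration and are not taken from the Codebook worksheet in Additional file 1.

```python
# Hypothetical codebook: variable name -> {numeric code: text label}.
CODEBOOK = {
    "blinded": {0: "no", 1: "yes"},
    "sex": {1: "female", 2: "male", 3: "mixed"},
}


def encode(variable: str, label: str) -> int:
    """Return the numeric code for a text label, rejecting uncoded values."""
    lookup = {text: code for code, text in CODEBOOK[variable].items()}
    key = label.strip().lower()
    if key not in lookup:
        raise ValueError(f"'{label}' is not a coded value of '{variable}'")
    return lookup[key]


print(encode("blinded", " Yes "))
```

Rejecting values that are absent from the codebook, rather than storing them as free text, surfaces inconsistent entries while the paper is still open in front of the reviewer.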

Discussion
Although guides and forms for evidence synthesis already exist [2,3], these are aimed more at those wishing to perform data extractions for standard pairwise meta-analyses. Currently, no such guide exists for more complex evidence synthesis techniques, such as NMA or multivariate meta-analyses, which often require larger and more complex data extractions. The DECiMAL guide aims to address this by providing a series of suggestions for improving data extraction for complex meta-analysis, with examples showing how to extract different types of data. It is intended to support reviewers embarking on a complex meta-analysis and to prepare them in advance for situations during data extraction that might lead to inconsistency in the way results are extracted and coded. It does not provide advice on good statistical practice but suggests steps to ensure that sufficient information is extracted to allow any type of analysis (e.g. handling missing data by either complete case analysis or imputation).
Results from the pilot study showed that the guide was both easy to learn and useful, though the type and format of data to be extracted can add complications when developing a data extraction template. Reviewers found that whilst the DECiMAL guide gave them useful advice in a form that was easy to refer to whilst working, starting a complex data extraction without support from someone with experience was challenging, and the guide could not be a replacement for technical expertise.