Systematic review and meta-analysis of diagnostic accuracy of detection of any level of diabetic retinopathy using digital retinal imaging

Background Visual impairment from diabetic retinopathy (DR) is an increasing global public health concern, which is preventable with screening and early treatment. Digital retinal imaging has become a preferred choice as it enables higher coverage of screening. The aim of this review is to evaluate how different characteristics of the DR screening (DRS) test affect diagnostic test accuracy (DTA) and how relevant they are to a low-income setting.
Methods We conducted a systematic literature search to identify clinic-based studies on DRS using digital retinal imaging of people with DM (PwDM). Summary estimates of different sub-groups were calculated using DTA values weighted according to the sample size. The DTA of each screening method was derived after exclusion of ungradable images and considering the eye as the unit of analysis. The meta-analysis included studies which measured DTA of detecting any level of DR. We also examined the effect on detection of using different combinations of retinal fields, pupil status, index test graders and setting.
Results Six thousand six hundred forty-six titles and abstracts were retrieved, and data were extracted from 122 potentially eligible full reports. Twenty-six studies were included in the review, and 21 studies, mostly from high-income settings (18/21, 85.7%), were included in the meta-analysis. The highest sensitivity was observed in the mydriatic greater than two field strategy (92%, 95% CI 90–94%). The highest specificity was observed in greater than two field methods (94%, 95% CI 93–96%), where mydriasis did not affect specificity. Overall, there was no difference in sensitivity between non-mydriatic and mydriatic methods (86%, 95% CI 85–87%) after exclusion of ungradable images. The highest DTA (sensitivity 90%, 95% CI 88–91%; specificity 95%, 95% CI 94–96%) was observed when screening was delivered at secondary/tertiary level clinics.
Conclusions A non-mydriatic two-field strategy could be a more pragmatic approach in starting DRS programmes for facility-based PwDM in low-income settings, with dilatation of the pupils of those who have ungradable images. There was insufficient evidence in primary studies to draw firm conclusions on how graders’ background influences DTA. Conducting more context-specific DRS validation studies in low-income and non-ophthalmic settings can be recommended.
Electronic supplementary material The online version of this article (10.1186/s13643-018-0846-y) contains supplementary material, which is available to authorized users.


Background
Diabetes mellitus (DM) is one of the most prevalent non-communicable diseases and has significant impacts on health systems [1]. The International Diabetes Federation (IDF) estimated that there were 425 million people with DM (PwDM) in the world in 2017, a figure projected to increase to 629 million by 2045 [2]. The greatest impact falls on low- and middle-income countries (LMIC) (overall increase 69%) due to ageing populations, obesity and sedentary lifestyles [3]. This is exacerbated by weak health systems coupled with slow economic development [4]. Diabetic retinopathy (DR) is a common microvascular complication of DM caused by chronic hyperglycaemia [5]. A pooled meta-analysis using population-based studies conducted in the USA, Australia, Europe and Asia showed that the prevalence of any DR in PwDM aged 20 to 70 years was 34.6% (95% CI 34.5-34.8%): proliferative DR affected 6.96% (95% CI 6.87-7.04%) and sight-threatening DR (STDR) affected 10.2% (95% CI 10.1-10.3%), globally translating to approximately 28 million PwDM affected by STDR [6]. DR is a leading cause of blindness among young and middle-aged adults in most high-income countries (HIC).
Many studies have shown that control of risk factors, early DR screening (DRS) and appropriate treatment can reduce the risk of blindness and visual impairment due to DR [7][8][9][10][11][12]. Digital retinal imaging is widely practised and is an accurate method for DRS [13]. Providing appropriate training to photographers is of paramount importance, and with enough practice, high levels of competence can be achieved by those taking images regularly. Non-mydriatic digital imaging methods cause less discomfort and are more convenient for service providers. However, poor image quality is an important limitation of digital retinal imaging, particularly if non-mydriatic systems are used in countries where cataract is common [14].
In the current literature, one systematic review showed that dilated imaging aided by fundoscopy for ungradable images was an effective modality to screen for DR [15]. This review included studies from 1985 to 1998, when digital retinal imaging technology was not available. Shi et al. concluded that accuracy of detecting presence/absence of DR by tele-medicine using digital imaging is high (pooled sensitivity 80%, 95% CI 84-88%; pooled specificity 89%, 95% CI 88-91%) [16]. Another meta-analysis concluded that dilatation of the pupils did not have a bearing on the diagnostic test accuracy (DTA) for any level of DR (sensitivity: odds ratio (OR) 0.89, 95% CI 0.56-1.41, p = 0.61; specificity: OR 0.94, 95% CI 0.57-1.54, p = 0.80) [17]. A limitation of this review was that results from different imaging methods (i.e. polaroid, film and digital) and clinical examination were pooled into one estimate.
A DRS modality which is suited to the health system and its context is a key factor in the success of a programme [18]. A screening programme requires substantial investment in infrastructure and workforce development. LMICs have low capacity to implement a population-based DRS programme (DRSP) with routine call/recall and a full DR patient list. Yet there is a high burden of unmet need, with higher levels of uncontrolled DM leading to higher rates of DR progression. Weak health systems require a DRSP in which any DR is detected using the most effective and efficient instruments. In addition, resources are scarce, and so efficient use of both equipment and human resources is essential. Detecting any DR among clinic-based PwDM will enable early identification and stratification of risk groups, and safe screening at a lower threshold in non-ophthalmic settings. Therefore, a feasible way of providing accessible services is to offer digital photographic DRS when PwDM present for routine medical care at diabetologists'/physicians' clinics. In a low-income setting, classifying a person as having any DR or no DR would be a helpful stratification for providers. In a practical programme guideline, we would suggest performing mydriatic imaging for those with ungradable images or referring them to the next level. There is also a lack of understanding among PwDM about the benefits of mydriasis. Discomfort experienced after pupil dilatation has led to low uptake of dilated examination [19]. Therefore, it is important to understand the best method to detect any DR in non-specialist settings that will be suited to LMICs [18].
The objectives of this review were to evaluate how using or not using pharmacological dilatation of the pupil and the number of fields captured influence DTA, and how well different ophthalmic and non-ophthalmologist health care professionals perform DR grading compared to seven-field image grading or mydriatic ophthalmoscopy by ophthalmologists in different clinical settings. This will inform decision-making when choosing a strategy for those aspects of a DRSP. This is an assessment of the accuracy of instruments for systematic clinic-based screening rather than of a population-based screening tool. Using this evidence, we plan to propose the most efficient modality for provision of DRS to PwDM at non-ophthalmic settings (i.e., medical clinic, endocrinology clinic).

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines were followed in reporting (The PRISMA checklist is available as Additional file 1).

Eligibility criteria and study context
We included cross-sectional studies that aimed to evaluate the accuracy of DRS using digital imaging as the index test, in PwDM at permanent healthcare facilities. We used the Early Treatment Diabetic Retinopathy Study (ETDRS) seven-field image interpretation as the gold standard and mydriatic bio-microscopy/ophthalmoscopy by an ophthalmologist/retinologist as the clinical reference standard where the gold standard was not performed. The primary context considered for this review was institutional DRS clinics/programmes using digital imaging. We categorised the context as either primary or secondary/tertiary. We excluded studies that were conducted in informal health facilities, used automated analysis systems, used non-digital imaging methods in the index test, used mobile screening methods or did not report DTA as an outcome measure.

Primary outcome
The outcomes examined were the sensitivity and specificity of detection of 'any level of DR'. It is important to understand the optimal method to detect any DR in non-specialist settings, especially in LMICs where PwDM have a higher risk of progression due to poorly controlled risk factors and irregular follow-up. 'Any level' of DR was considered appropriate as we felt that such an approach would have collateral benefits, such as raising awareness among providers and among PwDM of the importance of regular follow-up and control of risk factors, thereby minimising progression to STDR.

Search and study selection
We developed a comprehensive search strategy to obtain published articles by consulting an information specialist and searched MEDLINE (Ovid), the Cochrane Database of Systematic Reviews (CDSR) and CENTRAL in the Cochrane Library. The databases were searched from their date of inception to September 2016, to identify any published reviews on this topic and to see whether relevant trials were included in the CENTRAL database. The search terms and strategy are shown in Table 1 and Fig. 1 respectively. Two reviewers (PN and SK) independently assessed the eligibility of the titles and abstracts, and discrepancies were resolved by consulting a third reviewer (GV). Full papers of the eligible articles (n = 122) were obtained from the publishers/authors.

Data collection process
A data extraction form was prepared, and data were extracted and entered into a formatted MS Excel® database. Data from all the full reports of filtered citations (n = 122) were extracted. We used modified Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for cross-sectional studies to identify the components to extract [20]. The modifications were based on Cochrane guidelines on conducting systematic reviews of studies of DTA [21]. Two independent reviewers (PN and SK) extracted the data from full reports. In the piloting stage, data were extracted from 10% (12/122) of the articles by two reviewers and consistency was checked (SH). Corrections to the data extraction sheets and databases were made at this stage. The data extracted from all the included articles (n = 26) were checked by the co-reviewer (SK) for consistency.

Data items
The data extracted from each study included country, study design, study setting, sample size and participant characteristics (mean age with standard deviation and range, male to female ratio, number of years with DM). The next section of the extraction included study objectives, sampling strategy, methods of index test (degree of view, number of fields, pupil status and type of camera) and method of reference standard. Finally, data on DTA (sensitivity with 95% CI, specificity with 95% CI, number of true positives, true negatives, false positives and false negatives, kappa value and gradability) were extracted. Studies were categorised according to the status of pupils, number of fields in imaging, type of index test grader and type of reference standard.
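As an illustration of how the extracted 2 × 2 counts relate to the reported accuracy measures, the following minimal Python sketch derives sensitivity and specificity with an approximate 95% CI. The review does not state which interval method was used; the Wilson score interval, the function names and the counts below are assumptions chosen for illustration only.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def dta_from_counts(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a Wilson interval, from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),           # diseased eyes correctly detected
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),           # non-diseased eyes correctly cleared
        "specificity_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical counts for illustration only (not taken from any included study).
result = dta_from_counts(tp=90, fp=20, fn=10, tn=180)
print(result)
```

Other interval methods (for example exact binomial intervals) would give slightly different limits, so the choice should follow whatever the primary study reports.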

Meta-analysis
Meta-analysis of the data was conducted to examine differences in outcome due to pupil status (mydriatic and non-mydriatic), number of retinal fields (one field, two fields, greater than two fields), type of index test grader (ophthalmologist, retinologist, retinal reader, ophthalmic registrars) and by the context (primary and secondary/tertiary). A sub-group meta-analysis was undertaken to determine the DTA of 'any level' of DR by non-ophthalmic personnel. Further sub-analyses were conducted by considering the studies which reported on DTA using the same participant imaged before and after pupil dilatation.
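The abstract states that sub-group summary estimates were calculated from DTA values weighted by sample size. A minimal sketch of that kind of pooling is shown below; the function name, the grouping labels and all numbers are hypothetical, and the review's actual software and weighting details are not specified, so this should be read as one plausible reading rather than the authors' procedure.

```python
from collections import defaultdict


def pooled_by_subgroup(rows):
    """Sample-size-weighted mean estimate per sub-group.

    rows: iterable of (subgroup, estimate, sample_size) tuples.
    """
    weighted_sum = defaultdict(float)
    total_n = defaultdict(int)
    for subgroup, estimate, n in rows:
        weighted_sum[subgroup] += estimate * n   # weight each estimate by its sample size
        total_n[subgroup] += n
    return {group: weighted_sum[group] / total_n[group] for group in weighted_sum}


# Hypothetical study-level sensitivities, for illustration only.
rows = [
    ("one field", 0.80, 100),
    ("one field", 0.90, 300),
    ("two fields", 0.85, 200),
]
pooled = pooled_by_subgroup(rows)
print(pooled)
```

With this weighting, a large study dominates its sub-group estimate, which is worth bearing in mind when one study contributes most of the participants.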

Risk of bias in individual studies
We assessed the variations in bias using the Quality Assessment of Diagnostic Accuracy Studies, 2nd version (QUADAS-2) framework [22]. The methodological quality and applicability of the studies were considered using signalling questions under the four domains of patient selection, index test, reference standard, and flow and timing [22]. We examined the differences in reported DTA estimates based on QUADAS-2 quality assessment guidelines, and the results given in the meta-analysis were based on the studies identified as having low risk of bias. The methodological quality of the studies included in the review and meta-analysis is described in Table 2. All included studies were cross-sectional in design, as these demonstrated less bias in the QUADAS assessment. When assessing bias, we considered the signalling questions according to the QUADAS-2 guidelines, for example masking of the graders, inclusion of a range of disease spectrum to reduce spectrum bias, and all participants undertaking all tests.
DR positives and negatives were taken as reported in the classification of findings under the different categories of DR. The meta-analysis was conducted using the DTA of any DR, after excluding the ungradable images. Sub-analyses were conducted using the estimates that reported DTA on the same participant groups before and after pupil dilatation and by non-ophthalmic index test graders. Heterogeneity was assessed between the studies and between different modalities in the same study. Due to differences in definitions of the ungradable image category, we decided to exclude all ungradable images to minimise heterogeneity. At a practical programme level, all PwDM with ungradable images will be referred to the ophthalmologist's clinic for further assessment. However, in this study, we were interested in the accuracy of the intervention to detect any DR, rather than any referable PwDM in a programme model.

Results
The electronic database search yielded 6646 titles and abstracts, and 122 studies were selected for full-report review. Twenty-six studies were included in the review (Fig. 1). The details of the excluded articles are available as Additional file 2. We included 26 cross-sectional studies, and 88% (23/26) were conducted in HICs. The remaining studies (3/26, 11%) were conducted in South East Asian upper middle-income countries (Thailand (one) [46], China (one) [47] and Taiwan (one) [48]). There were 6 studies (10 estimates) which reported DTA in which the same participants underwent imaging before and after pupil dilatation [25,35,40,42,44,47].

Studies included in secondary output analysis
Four studies were eligible for the secondary output of the meta-analysis of DTA of DRS, as they used different non-ophthalmologist personnel [27,28,46,48]. However, there were too few studies to meta-analyse by pupil status and field strategy. The details of these studies are described in Additional file 3 (participants' characteristics) and Additional file 4 (DTA).

Risk of bias and applicability concerns within studies
The methodological quality and applicability of the included studies (Table 2) were assessed according to the QUADAS-2 guidelines. Risk of bias was minimal in the conduct of the index tests and reference tests (15.38% high risk). Nineteen percent of the studies showed high risk of bias in selection and 30.7% in participant flow and timing (Fig. 2). In the assessment of applicability, risk was minimal for the reference standard (3.8%), while 34% of the studies showed high risk in applicability with regard to patient selection and 50% with regard to the index test (Fig. 3).

Risk of bias in the included studies
There was selection bias in some studies: Baeza et al. excluded patients who had visited an ophthalmologist within 6 months of screening and those with hyper-mature cataract [44], and Boucher et al. purposively selected participants who had a greater risk of DR [31]. There were also applicability concerns when authors reported the DTA of referable level of DR [38-40, 43, 47]. The study conducted by Hansen et al., which selected people with diabetes through a record review, was weighted towards less severe retinopathy, as mentioned by the authors [25]. Two studies attempted non-mydriatic methods and ended up dilating the pupils due to a high proportion of ungradable images [23,32]. In the study by Lopez-Bastida et al., the time interval between the index and reference tests was not stated, nor whether participants with ungradable images (90/773, 10%) underwent mydriasis while performing the index test [45]. Similarly, flow and timing were not mentioned in the study by Ku et al. [37]. Two studies selected indigenous populations, which led to generalizability concerns [32,37]. Furthermore, some studies were conducted in eye/retinal clinics where there was a possibility of high prevalence of advanced DR [39,43,48].
Reporting of DR was not uniform. In several studies, DTAs were reported for different levels of DR, leading to some heterogeneity [25,26,31,38-40,43]. In these studies, we considered results for the detection of any level of DR. For example, Phiri et al. had defined DR including the macular signs which other authors had not considered and which would have an impact on the analysis [38].
Fig. 3 Proportion of included studies with applicability concerns

Diagnostic test accuracy in mydriatic imaging
The highest pooled sensitivity of detection of any level of DR using different mydriatic digital imaging field strategies was for the greater than two field strategy (92%, 95% CI 90-94%). The sensitivity of the one-field strategy was 80% (95% CI 77-82%), and it was 85% (95% CI 84-87%) for the two-field strategy (Fig. 6, Table 3). The mean proportion of ungradable images for the mydriatic method was 6.2% (SE ± 2.2, 95% CI 1.7-10.8%). The summary estimate of specificity in detection of any level of DR using mydriatic digital imaging was highest in the greater than two field strategy at 94% (95% CI 93-96%), followed by the one-field strategy at 93% (95% CI 92-94%) and then the two-field strategy at 82% (95% CI 81-83%) (Fig. 7, Table 3).
Fig. 4 Forest plot of summary estimates of sensitivity of non-mydriatic imaging using different field strategies (1: one field, 2: two fields, 3: greater than two fields)
Fig. 5 Forest plot of summary estimates of specificity of non-mydriatic imaging using different field strategies (1: one field, 2: two fields, 3: greater than two fields)
7F ETDRS: Early Treatment Diabetic Retinopathy Study seven-field strategy
The optimum level of referable DR will depend on the accuracy of the screening strategy chosen and the resources available in the specific screening setting, in order to strike a balance between screening PwDM at non-ophthalmic settings safely and not overloading the eye clinics with further assessments. Annual DRS, followed by timely treatment of those confirmed to have STDR, is the recommended screening pathway [51]. The current method of DRS in most LMICs is opportunistic screening using mydriatic bio-microscopic ophthalmoscopy by an ophthalmologist [18]. This is not an efficient way of screening for DR considering the limitations in human resources and access barriers. In contrast, DRS using digital imaging requires specific training and skills, but these can be obtained by non-medical personnel, and as such the pool of potential workforce is much larger than for trained ophthalmologists.
Fig. 7 Forest plot of summary estimates of specificity of mydriatic imaging using different field strategies (1: one field, 2: two fields, 3: greater than two fields)
In this meta-analysis, we aimed to show the effect of pupil status on DTA for any DR. For those image sets with gradable images, the pooled sensitivity of non-mydriatic strategies was the same as that of the mydriatic strategies. However, only six studies (6/21) used the same participants before and after pupil dilatation [25,35,40,42,44,47]. The non-mydriatic results were dominated by one larger study (sample size n = 1549) conducted in a HIC [40], and another study used wide-field (Optomap® 180-200° field of view) imaging [26]. Therefore, the outcome of this review should be applied to LMICs cautiously. A similar result was reported in a meta-analysis by Bragge et al., although heterogeneity among those studies was high due to pooling of different examination techniques in one estimate [17]. In the current meta-analysis, heterogeneity was minimised by including only studies which used digital retinal imaging in the index test.
A DRS method which is suited to the health system is a key factor in the success of a programme. Non-mydriatic imaging can be used in settings where there are fewer ophthalmic personnel and avoiding pupil dilatation reduces screening time and causes less perceived inconvenience to PwDM. A concern, however, is variability in image quality, particularly in populations with a high prevalence of cataract and corneal opacities [14,52]. The Scottish National Health Services DRSP now uses non-mydriatic imaging systems, with minimal need for pupil dilatation in screened patients [53]. This is an evidence-based pragmatic approach with greater convenience for PwDM and lower cost to service providers [54,55]. However, implementation of non-mydriatic test in DRS will depend on population characteristics such as the prevalence of cataract. Selection of suitable personnel for DRS and grading depends on workforce capacity and availability. DRS by ophthalmologists is not an efficient way of screening for any setting [55]. DM-related blindness is still on the rise everywhere in the world and is a public health concern in LMIC settings as well [18]. These countries will have to rapidly adopt clinically safe and cost-effective strategies to address this issue, using the limited resources available and establish such a programme quickly [56]. In this analysis, retinal image graders could achieve the recommended level of 80% sensitivity and specificity closer to 95% in both mydriatic and non-mydriatic strategies. Therefore, it is justifiable to train non-ophthalmic personnel in DR grading, just as it was done in the UK national programme. DR screening's success depends on the gradability of images, as such most of the studies included only gradable images. High population coverage with good quality gradable images is an important pragmatic consideration to achieve high DTA and high acceptability of a DRSP. 
Therefore, interpretation of the results shown in this study requires judgement of the context and objectives of a specific DRSP. PwDM with ungradable images are a special category of people whose fundus is not visible due to some other ocular pathology such as dense lenticular opacities. These people therefore need not only the management that test negatives receive for diabetic retinopathy but also additional management of the ocular pathology which is obscuring the fundus image. This meta-analysis therefore highlights the question of how to manage data on ungradable images, as studies differ in their approach. Most authors (13 studies) had excluded ungradable images from their analysis, while others included them as having screened positive (six studies). In addition, reporting of ungradable images by study authors was heterogeneous, which implies a need for standardised reporting of ungradable images in DRS.
The mean proportions of ungradable images in non-mydriatic and mydriatic imaging were 17.8% (95% CI 10.8-24.8%) and 6.1% (95% CI 3.7-8.4%) respectively. The decisions made by the authors of each study may have introduced reporting bias in their measures of DTA. Considering ungradable images as test positives may have led to inflated estimates of DTA in some studies [25,26,40,42-44]. The mean proportions of ungradable images included by study authors as test positives in non-mydriatic and mydriatic imaging were 12.5% (95% CI 9.0-16.1%) and 2.5% (95% CI 1.0-3.9%) respectively. Therefore, we adjusted DTA by excluding ungradable images to reduce heterogeneity. This was possible for four of the six studies in which ungradable images were included as screening positive [25,26,40,43], but two did not report adequate data to allow for this [42,44]. As an example, we adjusted the inflated DTA (reported sensitivity 98%, specificity 100%) in the study of Ahmed et al. using the 2 × 2 table data reported by the study authors (calculated sensitivity 42/49 = 85.7%, specificity 227/262 = 86.6%) [29]. In another two studies, it was not clear how ungradable images had been managed [28,38]. The proportions of ungradable images and DTA after adjustments in each strategy are available in Additional file 9.
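The adjustment arithmetic quoted above for Ahmed et al. can be reproduced with a short Python sketch. The gradable counts (42 of 49 and 227 of 262) come from the text; the false-negative and false-positive counts are simply the differences (7 and 35). The second function and its ungradable counts are hypothetical, added only to show the direction in which counting ungradable images as test positives can move the estimates.

```python
def sens_spec(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2 x 2 counts."""
    return tp / (tp + fn), tn / (tn + fp)


def with_ungradable_as_positive(tp, fn, tn, fp, ungradable_dr, ungradable_no_dr):
    """Treat every ungradable image as screen positive.

    Ungradable eyes with DR on the reference standard become true positives;
    ungradable eyes without DR become false positives.
    """
    return sens_spec(tp + ungradable_dr, fn, tn, fp + ungradable_no_dr)


# Gradable-image counts implied by the text: 42/49 and 227/262.
sens_excl, spec_excl = sens_spec(tp=42, fn=49 - 42, tn=227, fp=262 - 227)
print(f"excluded: sensitivity {sens_excl:.1%}, specificity {spec_excl:.1%}")

# Hypothetical ungradable counts, for illustration only.
sens_incl, spec_incl = with_ungradable_as_positive(
    tp=42, fn=7, tn=227, fp=35, ungradable_dr=5, ungradable_no_dr=10
)
print(f"included: sensitivity {sens_incl:.1%}, specificity {spec_incl:.1%}")
```

Under these assumed counts, including ungradable images as positives raises sensitivity and lowers specificity, which is one reason a consistent rule for handling them matters when estimates are pooled.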

Limitations
The definition of ungradable images was not uniform in the studies included in the current review. We minimised the heterogeneity by excluding the ungradable images and by sub-group analysis.
The studies which used non-mydriatic imaging techniques were more recent, conducted after rapid advances in imaging technology that allow better quality images to be obtained without pupil dilatation; this is a major confounder in the meta-analysis. The results of the different strategies described in this review should be considered in full if a comprehensive DRSP facilitating greater screening coverage with improved accessibility and good quality imaging is to be set up. However, because of the absence of good quality studies from LMICs, sub-analysis by countries' income setting was not possible.
We excluded three articles which were not in English due to practical barriers in translations and assessment of methodological quality.
The DTA of detection of maculopathy was not considered. Maculopathy is also an important aspect of DRS, and it may have to be considered in a separate review.

Conclusions
Diagnostic test accuracy for the detection of any level of DR showed that DRS using two fields delivered at non-primary care settings is a feasible approach. Dilatation of the pupils did not improve the detection of any level of DR for those with gradable images, but the proportion of ungradable images varied so widely across these studies that this aspect must be taken into account when setting up a DRSP. There was insufficient evidence in primary studies to comment on the DTA of non-ophthalmological human resources in DRS, so this aspect requires further research. Good quality digital imaging has the potential for real-time interpretation of retinal images, which together with counselling for risk factors may improve the acceptability of DRS and uptake of referral for ophthalmic assessment if conducted in a culturally acceptable way.

Recommendations
Diagnostic test accuracies of the newer non-mydriatic imaging systems should be further explored in different environments and using a different skill-mix of graders, especially in LMICs.
Studies should focus on the accuracy of non-ophthalmic graders and non-ophthalmic settings to explore the potential of initiating DRSP especially in low-income settings. This will reduce the number of referrals to eye departments, many of which are already over-burdened with cataract and other eye conditions, particularly in LMIC where resources are limited.
The reporting definitions of technical failures or ungradability of the images should be standardised using a reporting guideline.
A systematic review and meta-analysis of DTA of different levels of DR and maculopathy can be recommended in future research.