Table 3 Summary of reliability and validity measures used in included studies

From: Can automated content analysis be used to assess and improve the use of evidence in mental health policy? A systematic review

Columns: Author | Study design | Reliability and validity measures | Bias
Baek, Cappella and Bindman* [29]
Study design: Formal evaluation of Wordscores. Good description of methods. Tested two automated content-analytic methods for validity against manual coding. Method of study explicitly specified as 'testing'.
Reliability and validity measures: Completed reliability and validity tests. Krippendorff's alpha of .61 using 7% of reference texts and > .70 using 50% of reference texts; concurrent validity and comparative predictive validity tests.
Bias: Only two methods tested for reliability, one of them the affective-intonation method created by the authors of this article, so favourable results for that method may have been expected. Low risk of bias.

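The Krippendorff's alpha values reported by Baek, Cappella and Bindman measure inter-coder agreement corrected for chance. For nominal codes it can be computed from a coincidence matrix; a minimal two-coder sketch is below (the example codes are invented for illustration, not data from the study):

```python
# Minimal sketch of Krippendorff's alpha for nominal data with two coders
# and no missing values; the example codes below are invented.
from collections import Counter
from itertools import product

def krippendorff_alpha(coder1, coder2):
    # Coincidence matrix: each coded unit contributes both ordered pairs.
    o = Counter()
    for a, b in zip(coder1, coder2):
        o[(a, b)] += 1
        o[(b, a)] += 1
    n = sum(o.values())  # total number of pairable values
    values = {v for pair in o for v in pair}
    n_c = {c: sum(o[(c, k)] for k in values) for c in values}
    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    d_o = sum(o[(c, k)] for c, k in product(values, repeat=2) if c != k) / n
    # Expected disagreement under chance pairing of all values.
    d_e = sum(n_c[c] * n_c[k] for c, k in product(values, repeat=2) if c != k) / (n * (n - 1))
    return 1 - d_o / d_e

print(round(krippendorff_alpha([1, 1, 0, 0], [1, 1, 0, 1]), 3))  # prints 0.533
```

Values near 1 indicate near-perfect agreement; the study's threshold of > .70 is a commonly cited benchmark for acceptable reliability.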
Baumann, Debus and Müller [41]
Study design: Used Wordscores without a formal evaluation. Some study-design issues: no background literature for Wordscores, and descriptive analysis without reference to statistical significance. Results tabulated.
Reliability and validity measures: None reported.
Bias: No limitations of the study design or of using Wordscores mentioned. Unclear risk of bias.

Bernauer and Bräuninger* [38]
Study design: Formal evaluation of Wordscores. Good description of the study design. Descriptive statistics; results tabulated.
Reliability and validity measures: Used expert scoring for reliability comparison with Wordscores. Compared results of validity tests. Established strong face validity for Wordscores.
Bias: Good discussion of the strengths and limitations of the study design. Low risk of bias.

Budge and Pennings* [37]
Study design: Formal evaluation of Wordscores; the focus is a comparative evaluation of methods. Descriptive statistics; results tabulated.
Reliability and validity measures: Used the Comparative Manifesto Project and expert scoring for reliability comparison with Wordscores results. Emphasis on validity and reliability testing. Established some reliability issues for analysing policy positions with Wordscores.
Bias: The article is part of a debate series; it starts from the premise that computerised content analysis does not work and builds a case against Wordscores. High risk of bias.

Coffé and Da Roit [39]
Study design: Used Wordscores without a formal evaluation. Good description of the study design. Descriptive statistics; results tabulated.
Reliability and validity measures: Used the Comparative Manifesto Project for reliability comparison with Wordscores results.
Bias: No limitations of the study design mentioned. Unclear risk of bias.

Costa, Gilmore, Peeters, McKee and Stuckler [28]
Study design: Used Wordscores without a formal evaluation as its aim. Study design described in detail and stated to be quantitative automated content analysis. Descriptive statistics; results tabulated.
Reliability and validity measures: Used various reference texts to test the reliability of Wordscores. No expert comparisons or validity tests performed. Accounted for potential issues with the reliability and validity of Wordscores.
Bias: Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias.

Debus [35]
Study design: Editorial. Good description of different content-analysis methods, including Wordscores, and a good overview of the strengths and benefits of Wordscores compared with other methods. No data sets analysed; no result tables.
Reliability and validity measures: None reported.
Bias: Editorial for a special issue dedicated to content-analysis methods, including automated content analysis, so potentially biased in favour of automated methods. High risk of bias.

Hug and Schulz [40]
Study design: Used Wordscores without a formal evaluation as its aim. Good description of methods. Several analyses conducted with different reference texts to test the method. Descriptive statistics; results tabulated. Measures reported to statistically significant levels.
Reliability and validity measures: Completed reliability and validity tests. Used expert data and the Comparative Manifesto Project for reliability comparison with Wordscores; accounted for limitations in all methods. Potential impacts on reliability and validity assessed.
Bias: Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias.

Klemmensen, Hobolt and Hansen* [30]
Study design: Formal evaluation of Wordscores. Good description of methods. Several analyses conducted with different reference texts to test the method. Descriptive statistics; results tabulated. Measures reported to statistically significant levels. Good-quality analysis.
Reliability and validity measures: Completed reliability and validity tests. Used expert data, the Comparative Manifesto Project and another automated method for reliability comparison with Wordscores (Spearman's rho used for correspondence analysis). Potential impacts on reliability and validity assessed.
Bias: Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias.

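Spearman's rho, as used by Klemmensen and colleagues, rank-correlates two sets of position estimates: a value near 1 means the two methods order the texts almost identically. A hand-rolled sketch, assuming no tied ranks (the position estimates below are invented, not data from the study):

```python
# Minimal sketch of Spearman's rank correlation for comparing two methods'
# position estimates (assumes no tied ranks); the example values are invented.
def spearman_rho(x, y):
    def ranks(v):
        # Rank 1 = smallest value.
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical Wordscores vs. expert position estimates for five manifestos.
print(spearman_rho([0.2, 0.5, 0.9, 1.4, 2.0], [0.4, 0.3, 1.1, 1.3, 2.2]))  # prints 0.9
```

For data with ties, a midrank correction (or `scipy.stats.spearmanr`) would be needed; the simplified formula above only holds when all ranks are distinct.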
Laver, Benoit and Garry* [25]
Study design: Formal evaluation of Wordscores. Stated study design: cross-validation of different methods to validate policy estimates. Used both English and non-English texts for cross-validation. Good description of methods.
Reliability and validity measures: Completed reliability and validity tests using expert data, the Comparative Manifesto Project and Wordscores for comparison. Potential impacts on reliability and validity assessed.
Bias: Wordscores thoroughly assessed for both strengths and limitations. The authors are the creators of the Wordscores method. Low risk of bias.

Lowe* [31]
Study design: Formal evaluation of Wordscores. Good overview of the mechanics of how Wordscores works, focusing on the processes the Wordscores algorithm uses for score estimation and their reliability.
Reliability and validity measures: Core focus is reliability testing of Wordscores.
Bias: Starts from the hypothesis that there are issues with Wordscores and builds a case against the method. High risk of bias.

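The score-estimation step that Lowe examines can be summarised briefly: each word receives a score equal to the reference-text positions weighted by the probability of seeing that word in each reference text, and an unscored ("virgin") text is then scored as the frequency-weighted mean of its known word scores. A minimal sketch under those assumptions (the reference texts and positions below are invented toy data):

```python
# Minimal sketch of Wordscores-style score estimation; reference texts,
# positions, and the virgin text below are invented toy data.
from collections import Counter

def word_scores(ref_texts, ref_positions):
    # Relative frequency of each word within each reference text.
    freqs = [Counter(t.split()) for t in ref_texts]
    rel = []
    for f in freqs:
        n = sum(f.values())
        rel.append({w: c / n for w, c in f.items()})
    scores = {}
    for w in set().union(*freqs):
        fw = [r.get(w, 0.0) for r in rel]
        total = sum(fw)
        # Word score: positions weighted by P(reference r | word w).
        scores[w] = sum((f / total) * a for f, a in zip(fw, ref_positions))
    return scores

def score_virgin(text, scores):
    # Virgin-text score: frequency-weighted mean over words seen in training.
    counts = Counter(w for w in text.split() if w in scores)
    n = sum(counts.values())
    return sum(scores[w] * c for w, c in counts.items()) / n

s = word_scores(["tax cut tax", "spend welfare spend"], [-1.0, 1.0])
print(round(score_virgin("tax tax spend", s), 3))  # prints -0.333
```

Words appearing only in the left-positioned reference text receive its position exactly; Lowe's critique concerns, among other things, how this estimation behaves as reference and virgin vocabularies diverge.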
Volkens* [36]
Study design: Formal evaluation of Wordscores. Tabulated overviews of the strengths and weaknesses of the three methods evaluated.
Reliability and validity measures: Used expert data, the Comparative Manifesto Project and another automated method (CACA) for reliability comparison with Wordscores. Good comparative overview of method reliability and validity. Potential impacts on reliability and validity assessed.
Bias: Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias.