Bayesian factor analysis in primary care: a systematic review protocol

Background: Bayesian factor analysis allows efficient use of preliminary data and information, which meets the increasing need for questionnaire construct validation in primary care research. This systematic review will summarize evidence on the current use of Bayesian factor analysis in primary care. Methods: A comprehensive search strategy will be adopted to identify relevant literature (research studies in primary care) indexed in Medline, Scopus, EMBASE, CINAHL and Cochrane Library. The search strategy includes terms and synonyms for Bayesian statistics, factor analysis and primary care. Forward and backward citation searches will be conducted manually for articles that meet the inclusion and exclusion criteria to identify further eligible studies. Multiple reviewers will conduct data extraction independently. The analyses will include a descriptive synthesis summarizing how the respective Bayesian factor analysis approaches are used and reported in primary care. Discussion: This systematic review will provide a comprehensive overview of the current use of Bayesian factor analysis in primary care and provide recommendations for its proper future use in primary care and beyond.


Background
In the past few decades, primary care research has proliferated in many countries, with publications in the field increasing by about 75% and major primary care research databases approximately tripling the amount of information stored between 2004 and 2013 in the UK alone (1,2).
Despite this noticeable growth in available health information across health domains, ensuring sufficient data quality, i.e., validity and reliability, remains a major challenge (3,4). In primary care practice and research, efficient validation of questionnaire instruments is critical for responding in a timely manner to emerging needs for effective clinical assessments, for addressing urgent research questions and, more generally, for a learning healthcare system that builds upon evidence-based medicine.
Factor analysis examines the strength of the correlation of each individual questionnaire item with the latent domain or construct (i.e. the "factor") and is widely used for questionnaire construct validation in educational, psychological, social and healthcare research (5). Empirical validation of the measurement properties of an instrument through factor analysis typically requires a relatively large sample of completed questionnaires, which can be prohibitively expensive and time-consuming to obtain. The feasibility of these necessary pilot inquiries therefore depends strongly on the practicalities of recruitment. The minimum sample sizes suggested in the literature for conventional confirmatory factor analysis range from approximately 100 to 300 responses (6). A further limitation is that individuals who participate in pilot validation studies typically cannot be involved in subsequent phases of instrument validation. This is critical because primary care research is often conducted in community settings and may aim to address the needs of populations that are comparatively small and difficult to identify and recruit.
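The link between factor loadings and item-factor correlations can be checked with a small simulation. The sketch assumes a one-factor model with standardized items and a standardized factor; under that assumption each loading equals the correlation between the item and the latent factor, whereas with several correlated factors or unstandardized variables the two quantities differ. The function names and loading values are hypothetical.

```python
import numpy as np

def simulate_one_factor(loadings, n, seed=0):
    """Simulate standardized item responses from a one-factor model.

    Each item is x_j = lambda_j * f + sqrt(1 - lambda_j**2) * e_j, with the
    latent factor f and the unique errors e_j drawn as independent N(0, 1),
    so every item has unit variance.
    """
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    f = rng.standard_normal(n)                      # latent factor scores
    e = rng.standard_normal((n, loadings.size))     # unique (error) parts
    x = f[:, None] * loadings + e * np.sqrt(1.0 - loadings**2)
    return f, x

def item_factor_correlations(f, x):
    """Sample correlation of each item with the latent factor."""
    return np.array([np.corrcoef(x[:, j], f)[0, 1] for j in range(x.shape[1])])

loadings = [0.8, 0.6, 0.4]   # hypothetical standardized loadings
f, x = simulate_one_factor(loadings, n=50_000)
print(np.round(item_factor_correlations(f, x), 2))  # close to the loadings
```

In practice the factor scores are unobserved, so this check is only possible in simulation; it illustrates why the interpretation of "factor loading" depends on how the model is parameterized.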
Bayesian methods offer promising solutions to this impasse, as the incorporation of prior information can increase efficiency compared with conventional methods and yield posterior estimates with meaningful precision (6)(7)(8)(9). In instrument development, Bayesian factor analysis allows the knowledge and opinions of health professionals and patients to be included as prior information, potentially increasing the practical value and applicability of the instrument.
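To make the efficiency argument concrete, the sketch below uses the simplest possible case: a single factor loading with a normal prior and a normal approximation to the pilot-data likelihood. This conjugate normal-normal update is a textbook illustration chosen for this sketch, not the method of any particular study; the function name and all numbers are hypothetical.

```python
import math

def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update for a single parameter (e.g. one loading).

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data estimate.
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / data_se**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Hypothetical numbers: a small pilot gives a loading estimate of 0.55
# (standard error 0.15); expert opinion suggests about 0.70 (sd 0.10).
mean, sd = normal_posterior(prior_mean=0.70, prior_sd=0.10,
                            data_mean=0.55, data_se=0.15)
print(f"posterior mean {mean:.3f}, posterior sd {sd:.3f}")
# prints: posterior mean 0.654, posterior sd 0.083
```

The posterior standard deviation (about 0.083) is smaller than both the prior standard deviation and the pilot standard error, which is the sense in which prior information can substitute for part of a larger validation sample. Real Bayesian factor analyses estimate many parameters jointly, usually by Markov chain Monte Carlo sampling rather than a closed-form update.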
Nevertheless, a preliminary screening of the literature suggests that Bayesian factor analysis is underutilized in primary care and that its application and reporting vary widely. To date, no systematic review has quantitatively and qualitatively characterized the use of Bayesian factor analysis in primary care. The aim is therefore to provide the first comprehensive systematic review on this matter.
The objectives of this systematic review are: (1) to identify and consolidate the existing Bayesian factor analysis approaches in primary care practice and research; (2) to assess the quality of the implementation and reporting of Bayesian factor analysis in primary care practice and research settings; and (3) to conduct a synthesis of used approaches for prior elicitation and Bayesian inference, including different estimation procedures and software routines.

Inclusion Criteria
To be included in the systematic review, articles must report on Bayesian methods and factor analysis in the context of primary care practice or research. In applied statistics, "Bayesian methods" refers to any inferential method that combines prior distributions with the observed information (i.e. the likelihood) to arrive at an estimate of the parameter of interest.
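In symbols, this definition corresponds to Bayes' theorem. The notation is introduced here for illustration only: θ stands for the parameters of interest (for example, factor loadings) and y for the observed questionnaire responses.

```latex
p(\theta \mid y)
  = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}
  \;\propto\; p(y \mid \theta)\, p(\theta)
```

The prior p(θ) is where expert knowledge or earlier pilot data enter; with a flat (non-informative) prior, the posterior is proportional to the likelihood alone.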
For the purpose of this review, the definition of primary care follows that of the American Academy of Family Physicians: care that is "comprehensive", "first contact" and "continuing", and that covers "any undiagnosed sign, symptom, or health concern" (10). Family medicine is characterized as "an academic and scientific discipline" and "a clinical specialty" "orientated to primary care" (11). Eligible studies may cover aspects of primary care as addressed in family medicine, epidemiology, and health services and policy research. A high-sensitivity search strategy is used to identify all potentially relevant records on family medicine and primary care (12,13).
Types of studies
Review articles, conference abstracts, and theses or dissertations will be included. Research studies in a primary care context that apply Bayesian methods to similar model structures, such as structural equation models and latent variable models, or to item response theory, factor loadings and item-domain correlations, will also be included. Review articles will serve to identify literature covered within these reviews. Some conferences publish full-text papers (i.e. conference proceedings) alongside abstracts; only conference abstracts with accessible full text will be considered. The full text of all included studies must be available. There is no language restriction for the articles to be included.

Exclusion Criteria
Editorials, commentaries, book reviews, hypotheses, critical appraisals, reflections, surveys, case reports, studies using only frequentist statistical methods (such as hypothesis tests, confidence intervals and p-values) and other studies that do not employ Bayesian methods will be excluded. Studies that contain some of the keywords but use them with different meanings will also be excluded. Examples of ineligible uses include "primary studies", "prior to", "human epidermal growth factor" and "genetic factor".
Studies using Bayesian methods in other types of analyses will be excluded, such as Bayes' rule, Bayes factor (or "Bayesian factor") studies, variational Bayes, the Bayesian Information Criterion, Bayesian random effects models, Bayesian or Bayes networks (also called belief networks or probabilistic directed acyclic graphical models) and Bayes(ian) models. Studies using hierarchical Poisson models with latent variables, Gaussian processes or risk factor analysis instead of factor analysis will be excluded. Studies outside family medicine or primary care that use related words will be excluded; examples include "a family of methods" and "exponential family".
Search methods for identification of studies
A comprehensive search strategy will be developed for Medline and translated to the other databases (Scopus, EMBASE, CINAHL and Cochrane Library) to identify potential studies. The search strategy includes the terms and synonyms for Bayesian statistics, factor analysis and primary care shown in Table 1. It has been developed with a specialized librarian and will be conducted independently by at least two reviewers. Electronic database searches will be combined with hand searches of reference lists. The computer-based searches will combine medical subject headings (MeSH) terms, free-text and full-text searches related to Bayesian factor analysis in primary care and family medicine. Studies using Bayesian factor analysis that report on (i) data characteristics and (ii) the use of Bayesian factor analysis will be systematically searched for from database inception.
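The block structure of such a strategy (synonyms combined with OR within each concept, concepts combined with AND) can be sketched as follows. The terms below are illustrative placeholders, not the actual strategy in Table 1, and database-specific syntax such as MeSH tags, field codes and truncation rules is deliberately left out; `build_query` is a hypothetical helper.

```python
def build_query(concept_blocks):
    """Combine concept blocks into one Boolean search string.

    Synonyms within a block are joined with OR; the blocks themselves are
    joined with AND, so a record must match every concept.
    """
    groups = []
    for terms in concept_blocks:
        # Quote multi-word phrases so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Illustrative terms only; the actual search terms are those in Table 1.
blocks = [
    ["Bayes*", "posterior distribution", "prior distribution"],
    ["factor analysis", "factor loading*", "structural equation model*"],
    ["primary care", "primary health care", "family medicine", "general practice"],
]
print(build_query(blocks))
```

Keeping the concept blocks as data makes it easier to translate one strategy consistently across databases, since only the per-database syntax layer has to change.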
Databases and time frame
Articles published before January 1st, 2020 will be considered.

Searching other resources
The first 100 records in Google Scholar will be manually scanned for supplementary information. The reference lists of the retrieved articles and the articles citing them (forward citations) will be manually searched for two additional rounds. The PRISMA-P guidance is used to structure the systematic review (14).
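The iterative backward and forward citation search can be sketched as a breadth-first expansion over a citation graph. This is a minimal sketch under stated assumptions: `references`, `cited_by` and `is_eligible` are hypothetical inputs standing in for manual look-ups and reviewer judgement, only records judged eligible are expanded in the next round, and whether "two additional rounds" counts the initial pass is not specified, so `rounds` is left as a parameter.

```python
def snowball(seeds, references, cited_by, is_eligible, rounds=2):
    """Iterative backward and forward citation searching.

    references[a]: articles cited by a (backward search)
    cited_by[a]:   articles citing a (forward search)
    Only records judged eligible in one round are expanded in the next.
    """
    seen = set(seeds)
    frontier = set(seeds)
    added = set()
    for _ in range(rounds):
        candidates = set()
        for article in frontier:
            candidates.update(references.get(article, ()))  # backward
            candidates.update(cited_by.get(article, ()))    # forward
        candidates -= seen            # screen each record only once
        seen |= candidates
        frontier = {a for a in candidates if is_eligible(a)}
        added |= frontier
        if not frontier:
            break
    return added

# Hypothetical toy citation graph with article IDs A to F.
references = {"A": ["B", "C"], "B": ["D"]}
cited_by = {"A": ["E"], "E": ["F"]}
eligible = {"B", "D", "E", "F"}
print(sorted(snowball({"A"}, references, cited_by, eligible.__contains__)))
# prints: ['B', 'D', 'E', 'F']
```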

Data collection and analysis
Selection of studies
Titles and abstracts of the records retrieved by the search strategy will be screened sequentially by at least two independent reviewers in the software Rayyan, applying the inclusion and exclusion criteria (15). If the title or abstract gives no information about any of the three criteria, i.e. no indication of whether the study is Bayesian, uses factor analysis or is set in primary care, the study will be included at the initial stage of screening. In cases of doubt, in particular when the term "factor analysis" is mentioned without specifying whether it is Bayesian, the article will be kept for the next round of full-text review. The full texts of articles that meet the inclusion criteria will be retrieved and examined independently by seven reviewers. Any disagreement between reviewers about the eligibility of specific studies will be discussed, involving an additional reviewer if necessary, until consensus is reached. For studies with multiple publication records, the most comprehensive or up-to-date record will be used.
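The title and abstract screening rule can be written as a small decision function. The sketch assumes one reading of the rule above: each criterion is recorded as met, not met, or not reported, and a record is excluded only when a criterion is clearly not met. The names `Decision` and `title_abstract_decision` are hypothetical.

```python
from enum import Enum

class Decision(Enum):
    EXCLUDE = "exclude"
    FULL_TEXT = "retrieve full text"

def title_abstract_decision(is_bayesian, uses_factor_analysis, in_primary_care):
    """Title/abstract screening rule for the three eligibility criteria.

    Each argument is True (clearly met), False (clearly not met) or None
    (not reported in the title or abstract). A record is excluded only if
    some criterion is clearly not met; unclear records go to full-text review.
    """
    criteria = (is_bayesian, uses_factor_analysis, in_primary_care)
    if any(c is False for c in criteria):
        return Decision.EXCLUDE
    return Decision.FULL_TEXT

# "Factor analysis" is mentioned, but not whether it is Bayesian:
print(title_abstract_decision(None, True, True))   # Decision.FULL_TEXT
```

Using `None` for "not reported" keeps the conservative default explicit: missing information never leads to exclusion at this stage.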

Data extraction and management
Data extraction will be conducted independently by at least two reviewers. All records will be coded and categorized under the predefined themes in the codebook from the Canadian Institutes of Health Research (CIHR) Grants and Awards Guide (16). Although guidelines and recommendations exist on reporting general Bayesian methods, confirmatory factor analysis and questionnaire development, no single comprehensive recommendation was found on the reporting of Bayesian confirmatory factor analysis (17)(18)(19). Where applicable, the following data will be extracted: the type of journal, the publication date, the geographical location, the sample size, the number of items or questions used for the Bayesian factor analysis, the number of factors, domains or constructs, the reported item-domain correlations, regression parameters, factor loadings or parameters of structural equation models, the use of prior information and the assumed prior distributions, and the primary care setting. A standardized, predesigned data collection form will be used for data extraction. The following assessment criteria will be applied:
1. Did the authors use a Bayesian confirmatory factor analysis, a Bayesian exploratory factor analysis, a Bayesian latent variable model or a Bayesian structural equation model?
2. If so, which parameter did they aim to estimate using a Bayesian approach: item-to-domain correlation, factor loading or latent model regression parameter? In other words, on which parameter did they impose a prior distribution?
3. How did they inform their prior distribution of the respective parameter? What was the prevalence of studies that employed non-informative priors?
4. If they used the term "factor loading", did they explain it and, if so, how did they interpret it, i.e. as an item-to-domain correlation or as a model parameter (latent variable coefficient)?
6. Were credible intervals or confidence intervals reported for factor loadings, item-to-domain correlations, model parameters or regression coefficients?
7. What software or libraries were used? Were software code or original data available (reproducibility)?
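A standardized extraction form can be represented as a typed record, which also makes comparing two reviewers' independent extractions straightforward. The field names below are hypothetical and only mirror the items listed above; they are not the authors' actual form.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class ExtractionRecord:
    """One study's entry in a (hypothetical) data extraction form."""
    study_id: str
    journal_type: Optional[str] = None
    publication_year: Optional[int] = None
    country: Optional[str] = None
    sample_size: Optional[int] = None
    n_items: Optional[int] = None
    n_factors: Optional[int] = None
    bayesian_model: Optional[str] = None          # criterion 1: CFA, EFA, LVM or SEM
    parameter_with_prior: Optional[str] = None    # criterion 2
    prior_source: Optional[str] = None            # criterion 3
    non_informative_prior: Optional[bool] = None  # criterion 3
    loading_interpretation: Optional[str] = None  # criterion 4
    interval_type: Optional[str] = None           # criterion 6: credible, confidence or none
    software: list[str] = field(default_factory=list)  # criterion 7
    code_or_data_available: Optional[bool] = None       # criterion 7
    primary_care_setting: Optional[str] = None

def disagreements(a: ExtractionRecord, b: ExtractionRecord) -> list[str]:
    """Return the names of fields on which two reviewers' extractions differ."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]

# Two reviewers' hypothetical extractions of the same study.
r1 = ExtractionRecord("S01", sample_size=120, bayesian_model="Bayesian CFA")
r2 = ExtractionRecord("S01", sample_size=210, bayesian_model="Bayesian CFA")
print(disagreements(r1, r2))   # ['sample_size']
```

Listing the differing fields gives the reviewers a concrete agenda for the consensus discussion.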

Assessment of quality of implementation and reporting of Bayesian methods
Risk of bias in individual studies is not applicable and will not be assessed, since the goal is to summarize the use of Bayesian methods, i.e. there is no single effect parameter of primary interest. The data collected across studies will indicate the presence or absence of each of the seven criteria for assessing appropriate design, conduct and reporting of Bayesian factor analysis. The quality of implementation and reporting of Bayesian methods in each eligible study will be assessed and rated as very low, low, moderate or high on the following aspects: reporting of methodology, Bayesian model, estimated parameters, factor loadings, informed priors, and basic information about the publication. The quality assessment will be presented in tables in the final publication of the systematic review. No critical appraisal tools yet exist for the use of Bayesian factor analysis; however, the quality appraisal proposed by the authors (i.e. a methodological 'peer review') will help to identify prevalent issues and initiate discussions of better reporting standards. A recommended procedure or tool for the use and reporting of Bayesian factor analysis will be developed.
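The four-level rating scale can be encoded as an ordered type and tabulated per aspect across studies, which is the raw material for the planned quality tables. This is a minimal sketch: the protocol does not specify how ratings are aggregated, so the code only counts them, and the aspect labels and ratings are hypothetical shorthand for the aspects listed above.

```python
from collections import Counter
from enum import IntEnum

class Rating(IntEnum):
    """Ordinal quality rating; IntEnum keeps the order VERY_LOW < HIGH."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4

ASPECTS = ("methodology", "Bayesian model", "estimated parameters",
           "factor loading", "informed prior", "publication information")

def rating_distribution(ratings_by_study):
    """Count, for each aspect, how many studies received each rating."""
    table = {aspect: Counter() for aspect in ASPECTS}
    for study_ratings in ratings_by_study.values():
        for aspect in ASPECTS:
            table[aspect][study_ratings[aspect]] += 1
    return table

# Hypothetical ratings for two studies.
ratings = {
    "S01": dict.fromkeys(ASPECTS, Rating.MODERATE) | {"informed prior": Rating.LOW},
    "S02": dict.fromkeys(ASPECTS, Rating.HIGH) | {"informed prior": Rating.LOW},
}
dist = rating_distribution(ratings)
print(dist["informed prior"])
```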

Strategy for data descriptions and synthesis
A descriptive-analytical synthesis of the findings from the included studies, with graphs and tables, will be provided, detailing the use of Bayesian factor analysis based on a common analytical framework covering authors, years of publication, estimates, the number of publications over time, geographical locations, study populations, the aims of the study, data types, key information about the data (e.g. sample sizes, number of questions in a questionnaire, or number of domains or factors), the type of Bayesian method used, and the estimation procedures and software routines (e.g. analytical vs. sampling-based solutions). The results will be compared, summarized and reported using content analysis based on the research themes in the CIHR guidebook to examine the use of Bayesian factor analysis (16).

Statistical Synthesis
Descriptive analyses will be conducted to summarize the relative frequencies of the Bayesian factor analysis approaches used, as well as general and specific reporting issues.
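The descriptive summaries amount to simple counting over the extracted data. The sketch below computes relative frequencies of approaches and the number of publications per year; the same `relative_frequencies` helper also gives the prevalence of any yes/no assessment criterion when passed booleans. Both helper names and the data are hypothetical.

```python
from collections import Counter

def relative_frequencies(values):
    """Share of studies in each category (e.g. Bayesian approach used)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

def publications_per_year(years):
    """Number of included publications per year, in chronological order."""
    return dict(sorted(Counter(years).items()))

# Hypothetical extracted data for five studies.
approaches = ["Bayesian CFA", "Bayesian SEM", "Bayesian CFA",
              "Bayesian EFA", "Bayesian CFA"]
years = [2012, 2015, 2015, 2018, 2019]
reported_intervals = [True, False, True, True, False]   # e.g. criterion 6

print(relative_frequencies(approaches))
print(publications_per_year(years))
print(relative_frequencies(reported_intervals))
```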

Anticipated Results
Description of studies
The current use of Bayesian factor analysis will be summarized through descriptive statistics, for example frequency distributions showing the prevalence of the seven predefined assessment criteria across studies, and through content analysis. The subjective quality appraisal developed in this review will help to initiate discussions of better reporting standards.

Discussion
This systematic review will provide a detailed summary of how Bayesian factor analysis methods are applied in primary care practice and research settings. It will enable the identification of shortcomings in the application and reporting of Bayesian factor analysis studies in primary care and will help to improve practice by discussing and refining current reporting standards. No single agreed definition of the research domain of primary care and family medicine yet exists, which might affect the search results. Another weakness is the lack of a standard appraisal instrument for assessing the appropriateness of the design, conduct and reporting of Bayesian factor analysis. However, the quality appraisal conducted by the authors will help to identify major gaps and may inform the future development of such an appraisal tool.

Declarations
Ethics approval and consent to participate