Following the searches, 903 citations were identified. The results of the search are reported in a flow diagram (Fig. 1), adapted from the PRISMA-ScR [44] structure.
There were 25 studies [13, 45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68] that did not meet the full inclusion criteria but contained important information on our topic. We placed these in exclusion tables (Additional file 4: Tables 4 and 5). Of these 25 studies, 20 [13, 45,46,47, 49,50,51,52, 54, 55, 57,58,59,60,61,62,63,64,65,66] included late-preterm and term infants but did not report them as a distinct group with respect to their diagnosis of NE, their CP outcomes, or the use of GMA as a predictive tool. Additional file 4: Table 4 summarizes the characteristics of the excluded studies, which varied widely; the key characteristics are summarized here. The studies spanned the period from 1997 to 2021. They were mainly prospective studies [13, 45, 46, 48,49,50, 52,53,54, 57, 59, 60, 62,63,64,65] (16 of the 25), and the majority used clinical assessments only to identify infants at high risk [46, 49, 50, 52,53,54, 60, 61, 63,64,65,66,67,68] (14 of the 25). Of the 25 studies, 23 used the Prechtl GMA and the other two used the Hadders-Algra method [48, 65]. In the study by Dekkers et al. [48], all the children with an abnormal score had either CP or another severe developmental delay.
For the age at which CP was diagnosed, 16 of the excluded studies used the same criterion as this study, that is, CP diagnosis at 2–3 years of age or later [13, 46, 48, 49, 53, 54, 56,57,58,59,60,61, 63, 64, 66]. For the method of CP diagnosis, a variety of standardized assessments were used, the most frequent being the Amiel-Tison and Grenier assessment [69] (5 of the 25) [13, 47, 54, 60, 63] and the Touwen Infant Neurological Examination (TINE) [70] (5 of the 25) [46, 47, 49, 54, 64], with other assessments [71,72,73,74,75,76,77,78,79,80,81] detailed in Additional file 4: Table 4.
Eight studies either used non-standardized methods or did not clearly state their method [45, 50, 51, 56, 57, 59, 62, 65].
Additional file 4: Table 5 summarizes the key findings of these excluded studies and the reasons for their exclusion. These findings showed that in high-risk infants, including those with NE, GMA is a strong predictor of CP [45], especially when used in the fidgety period [46, 53, 55, 57, 61, 63, 65, 66]. In 1997, Prechtl et al. [13] demonstrated that movement quality was important. In a mixed group of preterm and term infants, abnormal quality and absent fidgety movements predicted neurological abnormalities with a sensitivity of 96%, and the majority of these abnormalities were diagnosed as CP. Our results show that this finding has been replicated repeatedly over time, with CS [49] and absent fidgety [58] GM shown to be highly predictive of CP. The trajectory of the GMA over serial assessments was reported to be the more important predictor of CP [46].
The GMA is more sensitive than the traditional neurological examination [47, 49, 52], and its sensitivity increases when it is combined with other modalities such as electroencephalogram (EEG) [52], neuroimaging [58, 62], and HINE [27].
For these excluded studies, sensitivity values were as high as 100% [45, 46, 49, 54, 57], and specificity was similarly close to or at 100% [45, 53, 58, 61, 66]. We contacted the authors of the two studies closest to our inclusion criteria, Solemani et al. [61] and Goyen et al. [53]. Solemani et al. [61] delineated their populations by NE and by GA, but their outcome was reported as “neurodevelopmental outcomes” and not CP. They confirmed to us that CP was not specifically reported, so the study could not be included. Similarly, Goyen et al. [53] did not report their outcomes separately for preterm versus term infants, as their aim was to describe the NICU experience, and we were unable to include this study in our final count. Both studies, however, reported high predictive validity of the Prechtl GMA at 3 months for neurodevelopmental outcomes at 2–3 years, and the Goyen et al. [53] study specifically for CP at that age. Nine of the excluded studies [13, 52, 55, 58,59,60,61,62, 66] reported positive predictive value (PPV) and negative predictive value (NPV), with some studies reporting a PPV as high as 98% when GMA was used in combination with HINE and neuroimaging [58] or 75% when combined with EEG and ERP [66]. The NPV was reported at 98.31% [66] or at 100% [54]. The limitations identified by the authors can be summarized as limited external validity due to small population size [48,49,50, 56, 58, 63, 64, 66], selection bias related to recruitment from high-risk populations [13, 51, 58], and practice variation between sites [45, 54, 57]. The most common reasons for the exclusion of these studies were failure to delineate participants by the diagnosis of NE, with most describing their participants as high-risk infants, or failure to divide GA into the groups relevant to our questions (late-preterm and term) [45, 49,50,51,52,53,54, 57, 58, 60, 62,63,64,65,66].
Only three articles, therefore, by Ferrari et al. [82], Glass et al. [83], and Prechtl et al. [84], were identified as meeting the selection criteria and were included in the final review. The final studies comprised one prospective cohort study from the USA [83] and two case series [82, 84] from Italy. The total number of participants was only 118 term neonates (58, 34, and 26 participants); none of the studies included late-preterm neonates. Neonatal encephalopathy was reported as a single group by Glass et al. [83] and Ferrari et al. [82] but was divided into mild-moderate and severe by Prechtl et al. [84]. The cohort study [83] used a combination of clinical diagnosis, EEG, and MRI where possible to identify the NE population, while in both case studies NE was identified by history only. All three studies used the Prechtl GMA. Additional file 4: Table 6 presents the characteristics of these three studies. The prospective cohort study was published in 2021 [83] and reflected data collected within the previous 6 years, which would reflect current management practices, especially as 68% of its population was reported to have received therapeutic hypothermia. Both of the case studies were published more than 5 years ago, with data collected in excess of 15 years ago, during which time the standard of care for NE is likely to have differed from current practice. All three studies reported sensitivity, specificity, PPV, and NPV.
A variety of standardized tools were used for CP diagnosis across the three studies. Additional file 4: Table 7 details the key findings and the outcomes evaluated, with the limitations identified by the authors. Glass et al. [83] reported on the absence of FM for the prediction of CP. Their specificity (96–98%) was higher than their sensitivity (29–50%) for any CP and for moderate to severe CP, respectively. Notably, their NPV was 90% for any CP and 98% for moderate to severe CP, indicating that the presence of FM at 3 months is a strong indicator of an infant at low risk for a later diagnosis of CP, especially in the moderate to severe category. Although not part of the specific aims of our study, it is noteworthy that when Glass et al. [83] combined the Prechtl GMA and MRI findings, the NPV increased to 100%. Ferrari et al. [82] reported that the presence of any CS movements between term and 4 to 5 months post-term had a sensitivity of 100% and a specificity of 68.7%, with a PPV of 100% and an NPV of 78.3%, for predicting CP. The oldest study, by Prechtl et al. [84], examined predictive ability in relation to the timing of the GMA, that is, early assessments in the first 2 weeks of life versus late assessments between 15 and 22 weeks of life. Their findings were a sensitivity of 100% and a specificity of 46.2%, with a PPV of 65.0% and an NPV of 100%, for the early assessments, compared with 84.6% across the board for sensitivity, specificity, PPV, and NPV for the late assessments. Neither of the case studies included infants receiving therapeutic hypothermia for NE, which was not yet the standard of care. Ferrari et al. [82] identified selection bias as a limitation, in that mild HIE as a contributor to NE may have been underrepresented because these infants were not referred for evaluation. Prechtl et al. [84] did not state their limitations.
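The four accuracy measures reported by the included studies all derive from the same 2×2 table of GMA result against later CP diagnosis. The following Python sketch shows how they relate. The class and function names are illustrative, and the counts are hypothetical: they were chosen for a group of 26 infants so that the output resembles the pattern described for the early assessments, and they are an assumption rather than data extracted from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table: GMA result (abnormal/normal) versus later CP diagnosis."""
    true_pos: int   # abnormal GMA, CP diagnosed
    false_pos: int  # abnormal GMA, no CP
    false_neg: int  # normal GMA, CP diagnosed
    true_neg: int   # normal GMA, no CP


def accuracy_measures(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    def pct(numerator: int, denominator: int) -> float:
        # An empty denominator leaves the measure undefined.
        return float("nan") if denominator == 0 else 100.0 * numerator / denominator

    return {
        "sensitivity": pct(c.true_pos, c.true_pos + c.false_neg),
        "specificity": pct(c.true_neg, c.true_neg + c.false_pos),
        "ppv": pct(c.true_pos, c.true_pos + c.false_pos),
        "npv": pct(c.true_neg, c.true_neg + c.false_neg),
    }


# Hypothetical counts for 26 infants (illustration only, not extracted data).
example = ConfusionCounts(true_pos=13, false_pos=7, false_neg=0, true_neg=6)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.1f}%")
# sensitivity: 100.0%, specificity: 46.2%, ppv: 65.0%, npv: 100.0%
```

With these counts there are no false negatives, so sensitivity and NPV are both 100%, while the seven false positives lower both specificity and PPV. Sensitivity and specificity depend only on the test, whereas PPV and NPV also shift with the proportion of infants who go on to develop CP, which is one reason values from small, high-risk samples may not transfer to other populations.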
Risk of bias
Although this was a scoping review and did not require critical appraisal of the three included articles, the JBI critical appraisal tools [85, 86] helped to assess the quality of the articles and to identify the differences and similarities between the two case studies. The main points are summarized here, and details are presented in Additional file 5: Table 8.
The quality of evidence derived from a review is largely dependent on the quality of the studies included.
The observational prospective cohort study by Glass et al. [83] scored 100% on 10 of the 11 questions and was therefore assessed as being of high quality. The single question on which the study did not score 100% concerned the strategies used to address incomplete follow-up. Incomplete follow-up may result in selection bias. According to the JBI method, it is important that all the outcomes are assessed, and participants with unequal follow-up periods must be taken into account in the analysis. In this study, patients with incomplete follow-up were not analyzed. If such an analysis was not statistically feasible, this was not stated in the study.
Neither of the case studies scored 100% on all ten questions of the checklist; each scored 100% on six of the ten. On this basis, the two included case studies were assessed as moderate-quality case series with some limitations. They scored well for using valid methods to identify the condition in all participants, for clearly reporting the demographics of the participants, and for providing clear clinical information on the participants. The outcomes of the cases were clearly reported in both studies. Both also clearly reported the demographic information of the presenting site and used appropriate statistical analysis.
According to the JBI method, the authors should provide clear inclusion and exclusion criteria for the study participants. These criteria should be specified in sufficient detail, with all the information critical to the study. While Ferrari et al. [82] fulfilled these criteria, Prechtl et al. [84] did not state their exclusion criteria, which may limit the generalizability of the results. A good-quality case series should clearly describe the method used to measure the condition. This should be done in a standard (i.e., the same way for all patients) and reliable (i.e., repeatable and reproducible) way. The clinical condition for our study is NE. Both case studies listed a number of criteria for possible inclusion for NE but did not state the number or combination of these criteria required for the diagnosis, and so scored 0% for this question. They did use a standard, albeit different, method for grading NE severity, with Ferrari et al. [82] using the Sarnat staging [7] and Prechtl et al. [84] using the Levene method [87]. Studies that indicate consecutive inclusion are more reliable than those that do not. Neither of the case studies stated clearly whether they consecutively included every neonate meeting the inclusion criteria at their institutions during the identified periods. Thus, they both scored 0% for this question. In a similar vein, the completeness of a case series contributes to its reliability. Studies that indicate complete inclusion are more reliable than those that do not. Neither Ferrari et al. [82] nor Prechtl et al. [84] clearly stated that they included all eligible patients in their studies, and both scored 0% for this question.
The relevant sources of bias include selection bias, information bias, and sampling variation. Selection bias is typical of case series, as these involve choosing a series of patients with a particular illness (NE) and a suspected linked outcome (CP) [88]. Selection bias limits the generalizability of the results. Information bias is less in retrospectively collected data, as it is determined by what is already documented in the medical chart. All three studies used prospectively collected data, making them susceptible to information bias. With regard to sampling variation, the precise determination of the rate of a disease, other than by chance, requires a large sample size. All of the included studies had small sample sizes: Glass et al. [83] had the highest number of participants at 58, while Ferrari et al. [82] had 34 cases and Prechtl et al. [84] had 26 cases with a follow-up period of over 3 to 4 years. Sample size may have been limited by the collection method, as no study stated whether it included every neonate meeting the inclusion criteria at its institution during the identified period.
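The effect of sample size on precision can be illustrated with a confidence interval for a proportion. The Python sketch below uses the Wilson score interval, which is one common choice for small samples and is not a method used by the included studies. The function name and the counts are hypothetical: the same observed proportion is evaluated in a group of 13 and in a group ten times larger.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: the same observed proportion at two sample sizes.
small = wilson_interval(11, 13)
large = wilson_interval(110, 130)
print(f"n=13:  {small[0]:.2f} to {small[1]:.2f}")
print(f"n=130: {large[0]:.2f} to {large[1]:.2f}")
```

The interval for the smaller group is considerably wider, so an estimate from a sample of that size is compatible with a broad range of true values.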