Diagnostic performance of the different serum markers of fibrosis
The performance of a test can be studied with the area under the receiver operating characteristic curve (AUROC), which reflects the specificity and sensitivity of the test, that is, its discriminatory power. However, one pitfall is that the liver biopsy, which is considered to be the gold standard, is not a perfect standard as it suffers from inherent sampling problems [2]. Another problem is the dichotomous categorization of liver biopsy results: mild versus significant fibrosis in the METAVIR scoring system (F0F1 vs F2F3F4), or, if the test is aimed at detecting severe fibrosis, F0F1F2 vs F3F4, or, for cirrhosis, F0F1F2F3 vs F4. However, grouping different stages may not be relevant [2]. These categories are descriptive and do not reflect an arithmetical progression of fibrosis in numerical terms, that is, F4 is not twice the amount of fibrosis as F2, and categories that happen to have numbers rather than names cannot be used as a continuous variable in a statistical analysis. Moreover, the AUROC will depend on the prevalence of each fibrosis stage in the population under study. For example, the AUROC for the detection of significant fibrosis will be higher if the population sampled is mainly composed of F0 and F4 patients than if it is mainly composed of F1 and F2 patients [41]. Therefore, comparison of different tests must be performed on the same population; if this is not the case, comparing AUROCs from different studies makes no sense. Use of standardized AUROCs is a better way to make comparisons [41] but has seldom been used.
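The dependence of the AUROC on the mix of fibrosis stages can be illustrated with a minimal Python sketch. The marker scores below are invented for illustration and do not come from any cited study; the function name `auroc` is likewise an assumption. The AUROC is computed as the probability that a randomly chosen patient with significant fibrosis has a higher score than a randomly chosen patient without it.

```python
from itertools import product


def auroc(negatives, positives):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUROC).
    """
    pairs = list(product(negatives, positives))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for n, p in pairs)
    return wins / len(pairs)


# Hypothetical marker scores by METAVIR stage (invented values, not study data).
scores = {
    "F0": [0.10, 0.20],
    "F1": [0.30, 0.45],
    "F2": [0.40, 0.55],
    "F4": [0.85, 0.95],
}

# Same marker, same target (significant fibrosis), two different populations.
extreme = auroc(scores["F0"], scores["F4"])   # only F0 and F4 patients
adjacent = auroc(scores["F1"], scores["F2"])  # only F1 and F2 patients

print(extreme, adjacent)  # 1.0 0.75
```

With identical marker behavior, the apparent discriminatory power drops from 1.0 to 0.75 simply because the second population contains only adjacent stages.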
A recent overview reported standardized AUROCs for the diagnosis of significant fibrosis (F2F3F4) of 0.84 (95% CI: 0.83–0.86) for FibroTest, without any differences amongst the different causes of liver disease [42]. This meta-analysis also compared FibroTest to the other patented biomarkers, and did not show any significant differences between the AUROCs of these markers for advanced fibrosis (A1c). However, because of the limited number of patients included in the direct comparisons, a clinically significant difference cannot be excluded, particularly between Hepascore with a smaller AUROC (0.04 difference) versus FibroTest and Fibrometer [42]. To date, in hepatitis C, two independent studies have compared three of the five patented tests [30, 43], that is, FibroTest, Fibrometer and Hepascore. There was no difference between these tests when considering the AUROC for the detection of significant fibrosis (F2F3F4), severe fibrosis (F3F4) or cirrhosis (F4) (A1d). For other etiologies of liver disease, there are no published comparisons of the diagnostic performance of the different tests.
Another way to compare these tests could be to compare their accuracy, or its opposite, the proportion of misclassified patients, for each stage of fibrosis. A Profile Performance Test can then be established. Halfon et al. did this for FibroTest and Fibrometer [30]. In this study, overall accuracy was similar for the two tests (71% and 72% for all stages together, p = ns), but FibroTest classified more F1 patients correctly, whereas Fibrometer classified more F2 and F3 patients correctly [30]. These results have to be confirmed by other studies.
The study of discordant cases is of major importance when interpreting the results of a test. To date, FibroTest is the test for which discordant results have been best studied. With FibroTest there are well-described causes of false positives (hemolysis, Gilbert’s syndrome and sepsis) and false negatives (inflammation). In an independent study, discordant results were attributed to FibroTest in 29% of cases (5% of the entire population) and to liver biopsy in 21%; in the remainder, attribution was undetermined. Not surprisingly, discordant results occurred mainly in patients with an intermediate stage of fibrosis [31]. However, the performance of liver biopsy in discriminating between two intermediate stages can also be poor [2]. Although examination of the entire liver is certainly the gold standard, a liver biopsy 15 mm long (the median biopsy length in tertiary centers) is not. It has an AUROC of 0.82 between F1 and F2. That is, there are around 20% false positives or false negatives [2]. Therefore, a test with an AUROC of 0.66 between F1 and F2 (usually described as a “weak” value when using a true gold standard) has a relative AUROC versus the best possible AUROC of 0.66/0.82 = 0.80, which can be considered acceptable for a non-invasive test. The size of the liver biopsy is not solely responsible for discordance [44], but few studies have evaluated cohorts with biopsies of optimal size.
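The relative AUROC calculation above can be written as a short Python sketch. The function name `relative_auroc` and its input check are illustrative assumptions; only the two AUROC values come from the text.

```python
def relative_auroc(observed, best_achievable):
    """Rescale an observed AUROC by the best AUROC the reference can reach.

    `best_achievable` is the AUROC of the imperfect reference itself
    (here, a 15 mm biopsy judged against the whole liver).
    """
    if not 0.5 < best_achievable <= 1.0:
        raise ValueError("best_achievable must lie in (0.5, 1.0]")
    return observed / best_achievable


# Values quoted in the text: test AUROC 0.66, 15 mm biopsy AUROC 0.82 (F1 vs F2).
print(round(relative_auroc(0.66, 0.82), 2))  # 0.8
```

Against a perfect reference (`best_achievable = 1.0`) the observed value is returned unchanged.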
The number of liver biopsies that could be avoided could also serve as a criterion for comparing tests. However, one must be aware that the number of biopsies avoided depends on the specificity and sensitivity required of a test. For example, in the first publication of FibroTest by Imbert-Bismut, which suggested a requirement of 95% specificity and 100% sensitivity for the diagnosis of significant liver fibrosis, only 46% of biopsies could be avoided [16]. Similarly, Parkes et al. arbitrarily defined the “inaccurate” zone of a marker when one “cannot reliably attribute test results” in comparison to tests with lower sensitivities/specificities at thresholds with positive predictive values < 90%, and negative predictive values > 95% [45]. Choosing these thresholds could be acceptable if a true gold standard existed. However, if this definition were applied to 15 mm liver biopsies, the biopsy would be inaccurate in 40% of cases for a diagnosis between F1 and F2.
In summary, non-invasive tests for fibrosis are difficult to compare, and independent validations have not identified one as being better than any other (B4). Tests have different profiles, and the overall proportion of discordant results between tests and liver biopsy is in the range of 20%, considering all fibrosis stages together.
Limitations of serum markers
One of the main limitations of serum markers is their availability. The tests on the market are available, but are not often reimbursed by health authorities or insurance companies. The reproducibility of some indirect components, for example platelet count, is not good. For FibroTest, as mentioned before, the interpretation should take into account the well-described causes of discordance: false positives because of hemolysis (decrease in haptoglobin and increase in bilirubin) and false negatives because of inflammation.
Transient elastography
Principles
Transient elastography measures liver stiffness, using a device that includes an ultrasound transducer probe mounted on the axis of a vibrator [46]. Vibrations transmitted by the transducer induce an elastic shear wave that propagates through the underlying tissues. Pulse-echo ultrasound acquisitions are used to follow the propagation of the wave and to measure its velocity, which is directly related to tissue stiffness: the stiffer the tissue, the faster the shear wave propagates. This procedure is non-invasive, painless, fast and easy to perform. Results are immediately available, are expressed in kilopascals (kPa), correspond to the median of ten validated measurements and range from 2.5 to 75 kPa [47]. The results of a transient elastography examination can be considered valid only when several conditions are fulfilled: the interquartile range (IQR), reflecting the variability between the different measurements, should not exceed 30% of the median value, and the success rate must reach at least 60%. An expert in the field should always perform the clinical interpretation of transient elastography [48].
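The validity conditions above can be expressed as a minimal Python sketch. The function and field names are hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption; the device may compute the IQR differently. The success rate is taken here as valid measurements divided by total attempts.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class ExamSummary:
    median_kpa: float    # reported stiffness: median of the valid measurements
    iqr_ratio: float     # IQR divided by the median
    success_rate: float  # valid measurements / total attempts
    valid: bool


def summarize_exam(valid_kpa, total_attempts, min_valid=10,
                   max_iqr_ratio=0.30, min_success_rate=0.60):
    """Apply the validity conditions described in the text to one examination."""
    if len(valid_kpa) < 2 or total_attempts < len(valid_kpa):
        raise ValueError("need at least 2 valid readings and attempts >= readings")
    med = median(valid_kpa)
    # Quartile method is an assumption; other conventions give slightly different IQRs.
    q1, _, q3 = quantiles(valid_kpa, n=4, method="inclusive")
    iqr_ratio = (q3 - q1) / med
    success_rate = len(valid_kpa) / total_attempts
    valid = (len(valid_kpa) >= min_valid
             and iqr_ratio <= max_iqr_ratio
             and success_rate >= min_success_rate)
    return ExamSummary(med, iqr_ratio, success_rate, valid)


# Invented readings in kPa: ten valid measurements out of twelve attempts.
readings = [5.8, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.8, 7.0]
print(summarize_exam(readings, total_attempts=12))
```

A check of this kind only flags technically unreliable examinations; it does not replace the expert clinical interpretation the text calls for.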
Diagnostic performance of transient elastography
As for serum markers, elastography was first validated in patients with chronic hepatitis C [49]. The diagnostic performance of transient elastography was fairly good, with AUROCs ranging from 0.79 to 0.83 for the diagnosis of clinically significant fibrosis, and from 0.95 to 0.97 for cirrhosis [29, 49]. Cutoffs with optimal accuracy were proposed for each METAVIR fibrosis stage. These cutoffs differ between etiologies of liver disease, however, so the diagnosis must be known for correct interpretation. As for serum markers, a substantial overlap of stiffness values was observed between adjacent stages of fibrosis. However, as already mentioned, the diagnostic performance of liver biopsy when compared to the whole liver is also far from satisfactory. Transient elastography has also been validated in diseases other than chronic hepatitis C: chronic hepatitis B [50, 51], HIV-HCV coinfection [52, 53], NASH [54], alcoholic liver disease [55], cholestatic diseases [56] and liver transplant patients [57, 58].
In a recent meta-analysis, the sensitivity and specificity for the diagnosis of cirrhosis with transient elastography were very good (87% and 91% respectively) [59] (A1c). However, there was significant heterogeneity that could be explained by a “cutoff effect”. Indeed, optimal cutoffs for cirrhosis remain variable, ranging from 10.3 kPa for hepatitis B to 17.3 kPa for cholestatic diseases, with no consensus. Similar to serum markers, the AUROCs for transient elastography depend on the prevalence of cirrhosis in the different populations [41].
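To show how pooled sensitivity and specificity translate into predictive values in practice, the following Python sketch applies Bayes' rule to the 87% and 91% figures quoted above. The prevalence values (10% and 50%) are assumptions chosen for illustration and are not taken from the meta-analysis; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a dichotomous test at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity from the text; prevalences are illustrative assumptions.
for prevalence in (0.10, 0.50):
    ppv, npv = predictive_values(0.87, 0.91, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions the positive predictive value is about 0.52 at 10% prevalence and about 0.91 at 50% prevalence, which is one reason results from populations with different proportions of cirrhosis are hard to compare.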
Transient elastography was compared to FibroTest in one study: the diagnostic performance measured by AUROCs was similar [29]. Looking at the discordant results between transient elastography and FibroTest, more false negatives were observed with elastography than with FibroTest [60]. However, FibroTest more frequently overestimated the fibrosis found in the liver biopsy. In another study, elastography had the best diagnostic performance for early detection of cirrhosis in patients with chronic hepatitis C [61].
Limitations of transient elastography
Liver stiffness measurements can be difficult in overweight patients or in those with narrow intercostal spaces, and impossible in patients with ascites [48]. Recent studies suggest that liver stiffness is influenced by ALT flares, with a risk of overestimation of fibrosis. Reproducibility of liver stiffness measurement is excellent, with good inter- and intra-observer agreement [62], but it is poorer in patients with a low degree of fibrosis, with steatosis or with an increased BMI.
Is there a way to optimize the performance of the tests?
The combination of different serum markers with transient elastography results in a much better estimation of the degree of fibrosis. Sebastiani proposed stepwise algorithms for patients with compensated chronic hepatitis C, in which APRI and FibroTest are performed sequentially and liver biopsy is reserved for the remaining patients. With this approach, liver biopsy could be avoided in 71% of cases with an accuracy of 93% [63]. The same approach was also suggested for chronic hepatitis B [35]. Castera proposed the combination of FibroTest and transient elastography, which significantly improved performance compared to each non-invasive test alone, with an accuracy of 84% when combining the tests [29]. This is probably a very promising approach. The most important point when analysing results of non-invasive tests for the screening of liver fibrosis is to consider them in light of the clinical setting, and to establish whether there are clinical situations that may lead to discordant results. If such situations exist, the tests should be repeated after a few months to see whether the discordant result persists.
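The general shape of a stepwise algorithm of this kind can be sketched in Python as follows. This is not the published algorithm: the rule functions, the field names and every cutoff below are hypothetical placeholders with no clinical validity, used only to show how sequential tests can defer to liver biopsy when they are indeterminate.

```python
from typing import Callable, Optional, Sequence, Tuple

# A rule returns a classification, or None when the test is indeterminate.
Rule = Tuple[str, Callable[[dict], Optional[str]]]


def stepwise_workup(patient: dict, rules: Sequence[Rule]) -> Tuple[str, Optional[str]]:
    """Apply each test in order; fall back to biopsy if none is conclusive."""
    for name, rule in rules:
        result = rule(patient)
        if result is not None:
            return name, result
    return "liver biopsy", None


def apri_rule(patient: dict) -> Optional[str]:
    # Placeholder cutoffs, for illustration only.
    if patient["apri"] >= 1.5:
        return "significant fibrosis"
    if patient["apri"] <= 0.5:
        return "no significant fibrosis"
    return None


def fibrotest_rule(patient: dict) -> Optional[str]:
    # Placeholder cutoffs, for illustration only.
    if patient["fibrotest"] >= 0.6:
        return "significant fibrosis"
    if patient["fibrotest"] <= 0.3:
        return "no significant fibrosis"
    return None


rules = [("APRI", apri_rule), ("FibroTest", fibrotest_rule)]
print(stepwise_workup({"apri": 1.0, "fibrotest": 0.4}, rules))
# ('liver biopsy', None)
```

The proportion of patients reaching the final fallback corresponds to the biopsies that cannot be avoided, which is why the choice of cutoffs drives both the accuracy and the number of biopsies saved.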
Can we expect more from non-invasive tests?