Purpose: The most frequently used model for simulating multireader multicase (MRMC) data that emulate confidence-of-disease ratings from diagnostic imaging studies has been the Roe and Metz (RM) model, proposed in 1997 and later generalized by Hillis (2012), Abbey et al. (2013), and Gallas and Hillis (2014). A problem with these models is that it has been difficult to set model parameters such that the simulated data are similar to MRMC data encountered in practice. To remedy this situation, Hillis (2018) mapped parameters from the RM model to Obuchowski–Rockette (OR) model parameters that describe the distribution of the empirical AUC outcomes computed from RM-model simulated data. We continue that work by providing the reverse mapping, i.e., by deriving an algorithm that expresses RM parameters as functions of the OR empirical AUC distribution parameters.
Approach: We solve for the corresponding RM parameters in terms of the OR parameters using numerical methods.
Results: An algorithm is developed that results in, at most, one solution of RM parameter values corresponding to inputted OR parameter values. The algorithm can be implemented using an R software function. Examples are provided that illustrate the use of the algorithm, and a simulation study validates it.
Conclusions: The resulting algorithm makes it possible to easily determine RM model parameter values such that simulated data emulate a specific real-data study. Thus, MRMC analysis methods can be empirically tested using simulated data similar to that encountered in practice.
1. Introduction
For the typical diagnostic radiology study, several readers (typically radiologists) assign confidence-of-disease ratings to each case (i.e., subject) based on one or more corresponding radiologic images. The resulting data are called multireader multicase (MRMC) data. These studies are typically used to compare different imaging modalities with respect to reader performance. Often, measures of reader performance are functions of the estimated receiver-operating-characteristic (ROC) curve, such as the area under the ROC curve (AUC). The Obuchowski–Rockette (OR) method1 is a commonly used method for analyzing reader performance outcomes that yields conclusions generalizing to both the reader and case populations. The most frequently used model for simulating MRMC data that emulate confidence-of-disease ratings from such studies has been the model first proposed by Roe and Metz2 and later generalized by Hillis,3 Abbey et al.,4 and Gallas and Hillis.5 We will refer to each of these models as the “Roe and Metz” (RM) model when there is no need to distinguish between them. Numerous studies have used this model for evaluating MRMC analysis and sample-size methods. As discussed by Hillis,6 the RM model generates continuous confidence-of-disease ratings based on an underlying binormal model for each reader–test combination, with the separation between the normal and abnormal rating distributions varying across readers. Because RM model parameters are expressed in terms of the latent rating-data distribution, in contrast to MRMC analysis results, which are almost always expressed in terms of parameters that describe the distribution of the reader performance outcomes, it has been difficult to set RM model parameter values such that the simulated data exhibit characteristics similar to MRMC data encountered in practice.
To remedy this situation, Gallas and Hillis5 mapped the RM model parameters to variance and covariance parameters that describe the distribution of the empirical AUC outcomes computed from RM simulated data. Similarly, Hillis6 mapped the RM model parameters to OR parameters that describe the distribution of empirical AUC outcomes computed from RM simulated data. This paper continues that work by developing a numerical algorithm that expresses the RM parameters as functions of the empirical AUC distribution OR parameters. This result makes it easy to determine RM model parameter values such that the simulated data emulate a specific real-data study. The primary uses for the proposed algorithm are testing MRMC analysis methods and computing power estimates, using simulated MRMC data that match real data sets with respect to the empirical AUC distribution OR parameter estimates. An outline of this paper is as follows. In Sec. 2, we discuss the original Roe and Metz model, the Hillis3 generalization of it, and the OR model and analysis method. In Sec. 3, we discuss the numerical OR-to-RM algorithm that maps OR parameters to RM parameters, which is derived in Appendix A for the Hillis3 generalization of the original RM model. In Sec. 4, we illustrate the use of the OR-to-RM algorithm and the previously derived RM-to-OR algorithm to simulate data emulating a real-data study, along with other examples and remarks concerning the use of the two algorithms. The paper concludes in Secs. 5 and 6.
2. Previous Methods
2.1. Roe and Metz Models: Original and Constrained Unequal-Variance
2.1.1. Original RM model
A confidence-of-disease rating assigned by a reader to a case is often called a decision variable (DV).
The original RM simulation model proposed by Roe and Metz2 is a mixed four-factor (test, reader, case, and truth) ANOVA model for with case nested within truth; test, reader, and truth crossed; test and truth treated as fixed factors; and reader and case treated as random factors. Note that we use “test” as a general term that can refer to a diagnostic test, imaging modality, or a treatment. Throughout this paper, we only consider the situation of comparing two tests. Using the RM notation, the model is given as where denotes the confidence-of-disease rating for test , reader , case of truth state , and , with “−” indicating a nondiseased case and “+” indicating a diseased case. Here, is the effect of truth state , is the interaction effect of test and truth state , is the interaction effect of reader and truth state , is the effect of case nested within truth state , the multiple symbols in parentheses denote interactions, and is the error term. Thus, denotes the confidence-of-disease rating assigned to case of truth state by reader when reading under test . All effects are random except for and . The random effects are mutually independent and normally distributed with zero means. Roe and Metz denote the corresponding variance components by , , , , , , and . They note that and cannot be estimated separately for this model with no replications, as re-reading images in radiological studies is uncommon due to the cost, and hence define Although not mentioned by Roe and Metz, the omission of test, reader, and test-by-reader effects that do not depend on truth is justified by the invariance of the ROC curve to location shifts; thus inclusion of these terms would not change the ROC curve for a given reader. Note that interactions with truth are denoted only by a subscript in Eq. 
(1). Roe and Metz constrain the sum of the error variance and variance components involving case to be equal to one: It follows from this constraint6 that the fixed-reader nondiseased and diseased DV distributions have unit variances (and hence their ROC curves are symmetric about the negative 45-deg diagonal), with the fixed-reader AUCs varying across the reader population. Without loss of generality, Roe and Metz impose the constraints which result in the same DV distributions for both tests 1 and 2. Under this constraint, it can be shown6 that the mean and median separation of the nondiseased and diseased DV distributions across the reader population is given by and the median reader-specific AUC is given by , where is the cumulative distribution function of the standard normal distribution.
2.1.2. Unequal test DV distributions
Although Roe and Metz only consider simulations for equal test DV distributions for each reader, the model can be easily modified to allow for test DV distributions that differ in their median AUC values by not setting to zero; that is, only the constraints are imposed. It follows that the median AUCs for tests 1 and 2 are equal to and , respectively, where and are the mean and median separations of the nondiseased and diseased DV distributions for tests 1 and 2, respectively, across the reader population. From constraints Eq. (4), it follows that for test 1 and for test 2. To ensure that , we assume Note that the RM model that allows for test-dependent AUCs is completely defined by seven parameters: Note that can be computed using Eqs. (2) and (7).
2.1.3. Constrained unequal-variance RM model (RMH model)
In practice, estimated binormal-model nondiseased and diseased distribution variances for a reader–test combination are often different, with diseased subjects typically having more variable test results.
Thus, to better emulate real data, Hillis3 modified the original RM model by allowing variance components involving cases to depend on truth, with variance components involving diseased cases set equal to those involving nondiseased cases multiplied by the factor , . Specifically, the model is given by Eq. (1) with variance components (using an obvious notation) denoted by , , , , , , , , , and , with , , , . Similar to Eq. (2), the constraint is imposed. It follows that Constraint Eq. (6) is also imposed. We will refer to this model as the constrained unequal-variance RM model or simply as the RMH model, with the “H” in RMH indicating that it is the generalization of the original RM model proposed by Hillis.3 Similar to the original RM model,2 imposing constraint Eq. (3) results in the null model with , and imposing constraint Eq. (4) results in the nonnull model with where again denotes the median AUC across the reader population for test , is defined by Eq. (5), and is the mean and median DV separation for test across readers. The algorithm discussed in this paper will be for the RMH model, which includes the original RM model2 as a special case when is set equal to 1. Note that the RMH model that allows for test-dependent AUCs is completely defined by the eight linearly independent parameters , and . We let denote the vector of these parameters:
2.2. Obuchowski–Rockette Model
Obuchowski and Rockette1 proposed a test × reader factorial ANOVA model for the AUC estimates, but unlike a conventional ANOVA model, the errors are assumed to be correlated to account for correlation due to each reader evaluating the same cases. Their model, which we refer to as the OR model, is given as where is the intercept term, denotes the fixed effect of test , denotes the random effect of reader , denotes the random test × reader interaction, and is the error term. The and are assumed to be mutually independent and normally distributed with zero means and respective variances and .
(OR in the subscripts is to distinguish OR effects and variance components from similarly notated RMH-model quantities.) The are assumed to be normally distributed with mean zero and variance and are assumed uncorrelated with the and . Three possible error covariances are assumed: The OR model assumes7 . These error variance–covariance parameters are typically estimated by averaging corresponding conditional-on-readers estimates computed using the jackknife,8–10 the bootstrap,10,11 the method proposed by DeLong et al.12 (for empirical AUC estimates), or the method proposed by Metz et al.13 based on the semiparametric binormal ROC model. These four estimation methods are consistent but not unbiased. An unbiased error covariance estimation method (the unbiased method) was recently proposed by Hillis6,14 for use when the empirical AUC is the outcome. This method utilizes the unbiased fixed-reader method discussed by Gallas [Ref. 15, p. 362] for estimating the error variance, and extensions of it for estimating the error covariances. This method results in unbiased OR parameter estimates when data are generated from the RMH model.6 OR analysis using this method is included in the freely available R software package MRMCaov.16 The error terms can be interpreted as AUC measurement error attributable to the random selection of cases and to within-reader variability, which describes how a fixed reader interprets the same image in different ways on different occasions. The OR model can alternatively be described with population correlations replacing the corresponding covariances.
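To make the three-level error covariance structure concrete, the following sketch builds the implied covariance matrix of the OR error terms for two tests and a handful of readers. The variance and correlation values are illustrative only, not taken from any study.

```python
import numpy as np

def or_error_cov(n_readers, var_e, r1, r2, r3):
    """Build the OR error covariance matrix over (test, reader) pairs.

    Covariance between two error terms:
      same reader, different test      -> r1 * var_e  (Cov1)
      different reader, same test      -> r2 * var_e  (Cov2)
      different reader, different test -> r3 * var_e  (Cov3)
    """
    pairs = [(t, j) for t in range(2) for j in range(n_readers)]
    n = len(pairs)
    cov = np.empty((n, n))
    for i, (t1, j1) in enumerate(pairs):
        for k, (t2, j2) in enumerate(pairs):
            if t1 == t2 and j1 == j2:
                cov[i, k] = var_e           # variance on the diagonal
            elif j1 == j2:
                cov[i, k] = r1 * var_e      # same reader, different test
            elif t1 == t2:
                cov[i, k] = r2 * var_e      # different reader, same test
            else:
                cov[i, k] = r3 * var_e      # different reader, different test
    return cov

# Illustrative values with r1 >= r2 >= r3, a typical ordering
cov = or_error_cov(n_readers=5, var_e=0.0008, r1=0.47, r2=0.43, r3=0.39)
print(np.all(np.linalg.eigvalsh(cov) > 0))  # check it is a valid covariance matrix
```

For correlation-based inputs, dividing each covariance by the error variance recovers the corresponding correlation, which is exactly the alternative description mentioned above.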
The OR model for two tests, similar to the RMH model, is defined by eight linearly independent parameters: or equivalently, by We let denote the vector of these parameters: Note that when the outcome is the empirical AUC, and are the test 1 and test 2 expected values for the empirical AUC estimates across readers and cases.
3. Proposed Methods
3.1. OR-to-RMH Algorithm for Estimating RMH Parameter Values When the Goal Is to Emulate a Real-Data MRMC Study
The RMH-to-OR mapping, previously derived by Hillis,6 and the new OR-to-RMH algorithm that maps OR parameters to RMH parameters, together with its development, are provided in Tables 6 and 10, respectively, in Appendix A. In this section, we discuss the main points of the OR-to-RMH algorithm when the goal is to emulate data from a real study with the RMH model, i.e., to determine RMH parameter values such that the expected values of the OR parameter estimates from the simulated MRMC samples are described by the vector Eq. (15), estimated from a real study. The vector Eq. (15) implicitly provides information about the shape of the underlying ROC curve through the value of , which is a function of the RMH parameter b in the RMH-to-OR mapping. The method used for estimating the RMH parameter b for the OR-to-RMH algorithm is called the b_method. To estimate a vector Eq. (9) that maps to a particular vector Eq. (15), the algorithm requires use of the option b_method = unspecified, which we assume throughout this section. Two other options for b_method and the situations where they are useful will be discussed in Sec. 3.2.
3.1.1. Overview of OR-to-RMH algorithm
Table 6 in Appendix A gives the previously derived analytical RMH-to-OR mapping formulas.6 Mathematically, we describe this transformation by the function that maps the RMH parameter vector and the case sample sizes that will be used for the simulations to the resulting OR parameter vector: This function is analytical and thus does not require a numerical algorithm. The OR-to-RMH algorithm requires inputted values for , and , where is given by Eq. (15) and and are the corresponding real-study nondiseased and diseased case sizes. To derive the OR-to-RMH algorithm, we first assume that there exists an RMH parameter vector corresponding to such that Eq. (16) is true. We then express the OR parameters in terms of the RMH parameters and solve for the RMH parameters using numerical methods (see Appendix A for details). It is possible that there are several vectors satisfying Eq. (16), in which case the corresponding vectors will differ only in their b values, as discussed in Appendix A. It is also possible that there is no vector that satisfies Eq. (16). To force the OR-to-RMH algorithm to produce at most one output, the vector with b closest to 1 with is chosen; if no corresponding vector has , then the corresponding vector with b closest to 1 with is chosen. If there are no corresponding solution vectors with , the algorithm does not return a solution for b; see Sec. 3.1.3 for what to do when this happens. Let denote the function defined by the OR-to-RMH algorithm, with b_method = unspecified, that maps to a solution for , denoted by ; i.e., Ideally, will be such that the RMH-to-OR mapping will return the original OR parameter vector, i.e., However, it is possible for the OR-to-RMH algorithm to return a solution such that Eq. (18) holds only approximately, i.e., The approximation results because of constraints on the RMH parameters that are imposed by the algorithm, as discussed in Appendix A and given in Eq. (23) in Table 7.
For example, if the inputted value of exceeds that of , then the solution will be such that in .
Rationale for the b limits
The lower and upper limits for b of 0.01 and 4 are chosen because values outside these limits are not realistic for most real data sets. In most situations, a meaningful DV should be an increasing transformation of the likelihood ratio (the likelihood of being diseased divided by the likelihood of not being diseased).17 A DV having this property and its corresponding ROC curve are said to be proper; otherwise, they are said to be improper [Ref. 18, pp. 19, 37]. A proper ROC curve is concave (down) and never crosses the chance line.17 It follows that an ROC curve that has “hooks” and crosses the chance line is improper. Pan and Metz19 note that hooks for fitted binormal ROC curves do not appear when fitting curves to reliable data sets, which strongly suggests that the true underlying ROC curves do not show such hooks for real-data studies. Thus, we have limited the underlying ROC curves to have b values between 0.01 and 4.0, since for typical AUC values () it can be shown that ROC curves with b values outside of these boundaries have noticeable hooks. For example, Fig. 1 shows ROC curves with AUCs of 0.8, 0.9, and 0.95 for b values of [Fig. 1(a)] and [Fig. 1(b)]. We see that the ROC curves for the extreme cases of [Fig. 1(a)] and [Fig. 1(b)] are noticeably improper because they have hooks in the upper right and lower left corners, respectively, with the ROC curves below the chance line in those regions. Although not shown, the improperness becomes more noticeable as b decreases below 0.01 or increases above 4.0, or as the AUC decreases below 0.8. The ROC curves were computed using the equation TPF = Φ(a + b·Φ⁻¹(FPF)), with a = √(1 + b²)·Φ⁻¹(AUC) and TPF and FPF denoting the true positive fraction (sensitivity) and false positive fraction (1 − specificity), respectively. (The expression for a follows from the conventional binormal ROC relationships TPF = Φ(a + b·Φ⁻¹(FPF)) and AUC = Φ(a/√(1 + b²)).)
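Using the conventional binormal relationships just cited, one can trace an ROC curve for a given AUC and b and check its hook numerically. The sketch below assumes the standard forms TPF = Φ(a + b·Φ⁻¹(FPF)) and AUC = Φ(a/√(1 + b²)), and uses b = 4 to reproduce a noticeably improper curve.

```python
from math import sqrt
from scipy.integrate import quad
from scipy.stats import norm

def binormal_roc(auc, b):
    """Return TPF as a function of FPF for the binormal ROC curve
    implied by the given AUC and slope parameter b."""
    a = sqrt(1.0 + b * b) * norm.ppf(auc)  # intercept implied by AUC and b
    return lambda fpf: norm.cdf(a + b * norm.ppf(fpf))

tpf = binormal_roc(auc=0.9, b=4.0)

# The traced curve integrates back to the requested AUC ...
area, _ = quad(tpf, 0.0, 1.0)

# ... but at the b = 4 extreme it dips below the chance line at small FPF,
# i.e., the curve has a noticeable hook in the lower left corner.
print(round(area, 3), tpf(0.01) < 0.01)
```

The same function with b near 1 produces a nearly symmetric, hook-free curve, which is consistent with the limits adopted above.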
Simulation of data to emulate a real-data study
Figure 2 summarizes how the OR-to-RMH and RMH-to-OR algorithms can be used to simulate data that emulate a real-data study. The OR-to-RMH algorithm (with b_method = unspecified) is applied to OR estimates () obtained from a real-data study, resulting in the corresponding RMH model. This model is then used for generating MRMC samples for any specified number of readers and cases, with and denoting the case numbers for the simulations and and denoting the case numbers for the original real-data study. The distribution of the empirical AUCs for the simulated data is described by . We recommend always checking how closely the simulated data emulate the study data by comparing and when the simulation model generates samples with the same case sizes as the original study, i.e., with and .
3.1.2. Should the simulated ROC curves resemble the original study ROC curves?
We emphasize that even when simulating data using an RMH model such that in Fig. 3, we do not claim that the resulting empirical ROC curves will be visually similar to those estimated from a real-data study. Rather, we claim only that the expected values of the OR parameter estimates for the simulated data will be the same as those computed from the original real-data study, given by Eq. (13). (Note that Eq. (13) contains the error covariances rather than the error correlations.) However, because of the robustness of the binormal model assumption for fitting ROC curves to real data,20–22 we typically expect there to be some resemblance, although the degree of resemblance will be limited by the RMH model having only eight parameters. In particular, we note that the RMH model requires each reader’s ROC curve to have the same b value, which determines the shape of the ROC curve for a given reader AUC value; this result follows from the one-to-one correspondence between and , with , as mentioned in Sec. 3.1.1.
3.1.3. Reasons for neither an exact nor an approximate solution
OR-to-RMH algorithm does not work because there is no solution for b
For given values of the RMH parameters , and (computed in steps 1 to 3 of the OR-to-RMH algorithm in Table 10), the value of b (computed in step 4) determines the value of . It can happen that the algorithm does not produce a solution for b that will yield the inputted value for , given the values of , and that have been computed by the algorithm in previous steps, either because no solution exists or because the only solutions lie outside the limits of 0.01 and 4. When this occurs, one can choose to use one of the other two methods for estimating b, as discussed in Sec. 3.2.
OR-to-RMH algorithm does not work because there is no solution for an RMH parameter other than b
When required, the algorithm imposes the constraints in Table 7(b) by somewhat altering the inputted OR parameter values, which can lead to an approximate solution as given by Eq. (19). However, when other constraints, which are implied by the RMH-to-OR mapping in Table 6, do not hold, the result is a missing value for the particular RMH parameter and for all other RMH parameters requiring it for their computation. For example, from the equations in Tables 8 and 9, it can be shown that there is an upper limit for , which is a function of the inputted values for and . Similarly, it can be shown that there are upper limits for , and , which are functions of parameters computed in previous steps. When one of these values exceeds its upper limit, the algorithm does not yield a solution. This problem is more likely to happen when inputted values for are conjectured than when they are estimates from a real-data study. If this problem occurs, we first recommend that the inputted values be checked for entry errors. If there are none, then we suggest inputting a different (typically smaller) value for the OR parameter corresponding to the RMH parameter that cannot be estimated.
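The no-solution situation for b can be caricatured with a one-parameter toy problem: given a target AUC and a fixed binormal intercept a (a hypothetical stand-in for the quantities computed in the earlier steps), solve AUC = Φ(a/√(1 + b²)) for b within the limits (0.01, 4). When no root exists inside the limits, no value of b is returned, mirroring the behavior described above. This is only a sketch; the actual algorithm in Table 10 solves a different, multi-step system.

```python
from math import sqrt
from scipy.optimize import brentq
from scipy.stats import norm

def solve_b(target_auc, a, lo=0.01, hi=4.0):
    """Solve AUC = Phi(a / sqrt(1 + b^2)) for b within the b limits.

    Returns None when no root exists in (lo, hi), analogous to the
    algorithm returning no solution for b.
    """
    f = lambda b: norm.cdf(a / sqrt(1.0 + b * b)) - target_auc
    if f(lo) * f(hi) > 0:   # target AUC not attainable within the limits
        return None
    return brentq(f, lo, hi)

b = solve_b(target_auc=0.85, a=2.0)      # a root exists inside (0.01, 4)
no_b = solve_b(target_auc=0.99, a=2.0)   # no root inside the limits -> None
```

With a = 2, an AUC of 0.99 is unattainable for any b in (0.01, 4), so the toy solver returns None; in that situation the text above recommends switching to one of the other b estimation methods.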
See Appendix A and Table 5 for more details and Sec. 4.3.7 for examples illustrating this problem.
3.2. OR-to-RMH Algorithm for Estimating RMH Parameter Values When the Goal Is to Emulate AUCs, OR Correlations, and Variance Components, But Not the Error Variance
As discussed by Hillis,6,23 the OR parameters , and have meaningful interpretations that do not depend on sample size, and , , and have meaningful interpretations that remain approximately (but not exactly) constant as the sample sizes change. On the other hand, the error variance varies with the sample sizes. In this section, we discuss two approaches for determining RMH parameters that result in simulated MRMC data for which the empirical AUC distribution matches conjectured values of the parameters in Note that is the same as but without . The value of for the simulated data will be determined by the sample sizes and the RMH parameters. These approaches are useful when one is primarily interested in simulating data that match an OR correlation and variance component structure and a real-data value of is not available. They are also useful when real-data estimates for are available but there is no solution for b using the OR-to-RMH algorithm with b_method = unspecified.
3.2.1. Overview
The two approaches are similar to that described in Sec. 3.1, except that estimation of b does not depend on an inputted value for . Instead, b is either (1) explicitly specified, using b_method = specified and setting the value of the input variable equal to the desired value for b; or (2) computed so as to result in a specified median mean-to-sigma ratio across readers, using b_method = mean_to_sigma and setting the value of the input variable mean_sig_input equal to the desired mean-to-sigma ratio. Use of the OR-to-RMH and RMH-to-OR algorithms to simulate data using these two approaches is summarized in Fig. 3. Figure 3 is similar to Fig. 2 with these differences: (1) No input value for is included because the input values are for instead of for .
(2) For the OR-to-RMH algorithm, one of the two functions defined below is used in place of the function. Note that the outputted OR parameter values include a value for .
Approach 1: b_method = specified
With this approach, the value of b is specified. For example, the parameter values for the original2 RM model can be determined by setting b = 1. Let denote the function defined by the OR-to-RMH algorithm, with b_method = specified, that maps and an inputted value of b to a solution for , denoted by ; i.e., Again, ideally will be such that . However, similar to using b_method = unspecified, it is possible for the OR-to-RMH algorithm to return a solution such that because of the constraints on the RMH parameters [Eq. (23) in Table 7] that are imposed by the algorithm.
Approach 2: b_method = mean_to_sigma
Recall from Sec. 3.1.3 that when b_method = unspecified is used, the value of b (based on the computed values of the RMH parameters and ) is determined such that for the simulated data will match the inputted value for . In contrast, when b_method = mean_to_sigma is used, the user specifies a desired median mean-to-sigma value (see discussion of the mean-to-sigma measure below) across readers for the test corresponding to the minimum of the inputted and values. Let denote the function defined by the OR-to-RMH algorithm with b_method = mean_to_sigma that maps and an inputted value of the mean-to-sigma ratio, denoted by , to a solution for : As was the case for the other two estimation methods, ideally , but it is possible for this relationship to hold only approximately because of constraints on the RMH parameters.
3.2.2. Mean-to-sigma ratio
The mean-to-sigma ratio, denoted by , is defined as the difference of the latent diseased and nondiseased DV means divided by the difference of their standard deviations. The mean-to-sigma ratio was first introduced by Swets,24 who noticed that it seemed to be approximately constant for a variety of experiments.
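The chance-line interpretation of the mean-to-sigma ratio can be verified numerically. For a latent nondiseased mean of 0 and diseased mean μ, with standard deviations σ0 and σ1, the binormal parameters are a = μ/σ1 and b = σ0/σ1, and setting Φ(a + b·Φ⁻¹(FPF)) = FPF gives Φ⁻¹(FPF) = a/(1 − b) = μ/(σ1 − σ0) = r, i.e., the curve meets the chance line at FPF = Φ(r). The parameter values below are illustrative only.

```python
from scipy.stats import norm

# Illustrative latent-distribution parameters (not from any study):
mu, sigma0, sigma1 = 1.5, 1.0, 2.0
a, b = mu / sigma1, sigma0 / sigma1   # binormal intercept and slope
r = mu / (sigma1 - sigma0)            # mean-to-sigma ratio

tpf = lambda fpf: norm.cdf(a + b * norm.ppf(fpf))

# The ROC curve crosses the chance line TPF = FPF at FPF = Phi(r):
fpf_cross = norm.cdf(r)
print(abs(tpf(fpf_cross) - fpf_cross) < 1e-9)  # True
```

As σ1 approaches σ0 the ratio tends to infinity and the crossing point Φ(r) is pushed to the corner of the unit square, which matches the symmetric-curve limiting case mentioned below.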
Some support for this conclusion was provided by later analyses.22,25,26 For example, Green and Swets26 note that is typical for many studies. As discussed by Hillis and Berbaum,27 can be used as a measure of improperness for a binormal ROC curve; specifically, it indicates that the ROC curve crosses the chance line at , where fpf is the false positive fraction. They point out that it follows that an absolute value indicates a noticeably improper binormal curve and that an absolute value of infinity indicates a symmetric curve (). For the RMH model, the mean-to-sigma ratio varies across readers. To avoid simulating data based on visibly improper binormal curves, we suggest that the probability of a reader’s true ROC curve being noticeably improper be small for each test, e.g., . This probability can be computed as a function of the RMH parameters, as discussed in Appendix B.1.
4. Results and Examples
4.1. R Language Functions
Two functions written in the R statistical software language that perform the OR-to-RMH and RMH-to-OR mappings are available within the freely available MRMCaov R package,16 which can be downloaded from the GitHub repository: https://github.com/brian-j-smith/MRMCaov. The function OR_to_RMH transforms OR parameters to RMH parameters using the numerical algorithm described in Table 10, and the function RMH_to_OR performs the analytical RMH-to-OR transformation described in Table 6.
4.2. Example: Using the Algorithms to Simulate Data Emulating a Real-Data Study
4.2.1. Approach
In this section, we illustrate the use of the algorithms to simulate data that emulate data provided by Carolyn Van Dyke (VanDyke),28 which we have used for examples in previous papers,29,30 with the empirical AUC being the reader performance metric. The study compared the relative performance of single spin-echo magnetic resonance imaging (SE MRI) to cinematic presentation of MRI (CINE MRI) for the detection of thoracic aortic dissection.
There were 69 patients without a dissection and 45 patients with an aortic dissection, imaged with both SE MRI and CINE MRI; cases were evaluated by five readers using a five-point ordinal confidence-of-disease scale. Similarly, each RMH simulated sample emulated five readers, each evaluating the same 69 nondiseased and 45 diseased cases. We apply the OR-to-RMH algorithm to the set of parameter estimates (“original” values) obtained from an OR analysis of the data set to obtain corresponding RMH parameter values, simulate 10,000 MRMC samples based on the RMH values, and analyze each simulated sample using an OR analysis with the unbiased error covariance method, with the outcome being the empirical AUC. We set b_method = unspecified for the OR-to-RMH algorithm. Figure 4 shows the computation of the RMH simulation model and the “true values,” which we define as the OR parameter values that describe the true distribution of the empirical AUCs computed from the simulated samples; i.e., the true values are the same as the outputted OR parameter values, given by . We see that for this data set the outputted values are the same as the inputted values, and hence the original OR estimates exactly describe the true distribution of the simulated empirical AUC estimates. The R code and output for the OR-to-RMH and RMH-to-OR functions used to produce the results in Fig. 4 are provided in Appendix C.1.
4.2.2. Simulation study results
Table 1 presents the simulation study results. “Unbiased estimates” are the empirical estimates (the means across the simulated sample estimates) for the first eight parameters (, , ), where OR estimates for each sample were computed using the OR method with the unbiased covariance estimation method discussed in Sec. 2.2.
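The core of the simulation loop just described, generating latent binormal ratings reader by reader and computing each reader's empirical AUC, can be sketched in a few lines. This is a deliberately stripped-down, one-test caricature: the separation delta and reader variability sigma_r are illustrative values of our choosing, and the full RMH model (two tests, all interaction variance components, truth-dependent case variances) is what the actual simulations use.

```python
import numpy as np

rng = np.random.default_rng(12345)

def empirical_auc(x0, x1):
    """Mann-Whitney empirical AUC: P(X1 > X0) + 0.5 * P(X1 = X0)."""
    diff = x1[:, None] - x0[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

delta, sigma_r = 1.5, 0.2        # population separation; reader variability
n_readers, n0, n1 = 300, 100, 100

aucs = []
for _ in range(n_readers):
    d_j = delta + sigma_r * rng.standard_normal()  # this reader's separation
    x0 = rng.standard_normal(n0)                   # nondiseased DVs, variance 1
    x1 = d_j + rng.standard_normal(n1)             # diseased DVs, variance 1
    aucs.append(empirical_auc(x0, x1))

# With unit-variance DVs, E[AUC] = Phi(delta / sqrt(2 + sigma_r^2)) ~= 0.853
print(float(np.mean(aucs)))
```

The reader-averaged empirical AUC converges to the closed-form expectation, which is the kind of agreement between simulated estimates and true parameter values that the Table 1 study checks far more thoroughly.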
Because the sample estimates for the sample-level correlations , and are not unbiased, instead of reporting the empirical estimates we report the quotients resulting from dividing the corresponding empirical covariance estimates by the empirical error variance estimate. For example, the estimate of 0.434 for is computed by dividing the estimate (0.000343) by the estimate (0.000791). Because the resulting estimates are not the means of the sample-level correlations, empirical bias estimates and 95% confidence intervals for the correlations are not included.
Table 1. Simulation study estimates of OR parameters.
Notes: There were 10,000 simulated samples based on the Fig. 4 RMH model with 5 readers and n0=69, n1=45. “True values” are the βOR;output values from Fig. 4 and the corresponding error variance and covariances. For the first eight parameters, “unbiased-method estimates” and “DeLong estimates” are the empirical estimates (i.e., means across the 10,000 samples) corresponding to using the unbiased and DeLong error covariance estimation methods with the OR method. The correlation estimates are the quotients from dividing the corresponding empirical covariance estimates by the empirical error variance estimate. For the DeLong estimates, results for μ1:OR and μ2:OR are omitted since they are exactly the same as for the unbiased estimates. “(Est − true)/true” is defined as (estimate − true value)/(true value); it describes the deviation of the estimate from the true value, expressed as a percentage of the true value. For the first eight parameters (i.e., not the correlations), these values can also be interpreted as the empirical estimates of statistical bias expressed as a percentage of the true value. “Within 95% CI?” is “yes” if the empirical 95% confidence interval (not shown) includes the true value, and “no” otherwise.
We see that the unbiased estimates for the first eight parameters differ by from the true values and that the correlation estimates differ by . Moreover, all of the 95% empirical confidence intervals include the true value. Thus, the unbiased estimates agree with the true parameter values and hence provide validation for the OR-to-RMH algorithm. Plots of the empirical ROC curves for the VanDyke original data and for the first three simulated MRMC samples, based on the RMH model given in Fig. 4, are displayed in Fig. 5.
Like the VanDyke study, each simulated sample has five independent readers reading the same set of 69 nondiseased and 45 diseased cases. The plots look somewhat different because the VanDyke plots are based on at most five distinct ratings, whereas the simulated-data plots are based on a continuous rating scale; nevertheless, the simulated-data ROC curves show a definite resemblance to the VanDyke ROC curves, although this is only our subjective assessment.
4.3. Other Remarks and Examples
4.3.1. DeLong error covariance estimation
For comparison, we also include in Table 1 results using the DeLong et al.12 (DeLong) error covariance estimation method. Results for and are omitted since they depend only on the AUC estimation method and hence remain the same. We see from the confidence intervals that the DeLong estimates for , , and are positively biased and the estimate is negatively biased. Similar results were obtained by Hillis.6 Although the DeLong method is biased, the estimates are relatively close to the true values, suggesting that results using the DeLong or another resampling error-covariance method, such as the jackknife or bootstrap, will typically be similar to those obtained using the unbiased method. This point is illustrated by the example in the next section.
4.3.2. Example of computing power
Suppose our goal is to estimate the power for detecting a difference in test AUCs for a study such as the VanDyke study, assuming that the reader-averaged empirical AUC estimates (0.897 and 0.941) are the true population values. This can be done by simulating similar data (as we did for Table 1) and then estimating power by the proportion of samples for which the null hypothesis is rejected. The power estimates from doing this, based on the simulated samples used for Table 1, are 0.106 for the unbiased method and 0.107 for the DeLong method, illustrating that the choice of error covariance method makes almost no difference in our power estimates.
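The power-by-simulation recipe is generic: simulate many samples under an assumed nonnull model, apply the test, and report the rejection fraction. The miniature example below applies the recipe to a simple two-sided z-test with a known effect-to-SE ratio (illustrative numbers, not the VanDyke values) so that the simulated power can be checked against the closed form; it is a stand-in for, not a reimplementation of, the OR analysis used above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)

effect, se, n_sims = 0.025, 0.010, 20000   # illustrative values only
z_crit = norm.ppf(0.975)                   # two-sided 5% critical value

# Simulate the test statistic under the alternative and count rejections
z_stats = effect / se + rng.standard_normal(n_sims)
power_sim = float(np.mean(np.abs(z_stats) > z_crit))

# Closed-form power of the two-sided z-test, for comparison
power_exact = norm.cdf(effect / se - z_crit) + norm.cdf(-effect / se - z_crit)
print(round(power_sim, 3), round(power_exact, 3))
```

The simulated rejection fraction agrees with the analytic power up to Monte Carlo error; in the MRMC setting, the same rejection-counting step is applied to the OR test on each simulated sample.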
4.3.3. Ordinal rating scale

A limitation of the OR-to-RMH algorithm is that it applies only to continuous simulated ratings. For example, in Sec. 4 the simulated data emulated continuous ratings for which the empirical AUC distribution could be described by the original OR parameter values, but the VanDyke data set that yielded the original OR estimates consisted of ratings on a five-point ordinal scale. Although ordinal data can be simulated based on the RMH model by binning the simulated continuous data, the mapping from the RMH model to the corresponding OR parameters when the data are binned has not yet been developed, and hence neither has the corresponding OR-to-RMH algorithm. We conducted a simulation study to investigate how closely the original OR parameter values describe the distribution of the empirical AUC for ordinal ratings resulting from binning the continuous ratings generated by the RMH model given in Fig. 4. The simulation study was performed similarly to the Table 1 study, except that five-category ordinal ratings were created by binning the simulated continuous ratings. The binning thresholds corresponded to the empirical cumulative probabilities for ratings 1,…,5 for the VanDyke nondiseased cases, pooled across readers. Results are presented in Table 2. As expected, the two AUC estimates are less than those for the continuous ratings, but only by a maximum of 1.44%. We also see that the correlations are similar to those for the continuous ratings, with the relative values of the three correlations even more similar: r1 ≈ r2, as was the case for the continuous ratings, and r3 is 0.12 lower than the other two, compared with being 0.13 lower for the continuous ratings. The maximum change in the error variance and covariance estimates was 8.07%, and there were changes of 6.7% and similar magnitude in the reader and reader-by-test variance component estimates; these changes are in the same “ballpark” as for the continuous ratings.
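The binning step itself is a simple threshold lookup: each continuous rating maps to the ordinal category determined by four cutpoints. A minimal sketch, using the thresholds reported in the Table 2 notes (Python for illustration; the paper's simulations use R):

```python
import bisect

# Binning thresholds from the Table 2 notes (these correspond to the
# empirical VanDyke cumulative probabilities for the nondiseased cases).
THRESHOLDS = [-0.2085494, 1.0270435, 1.7437654, 2.3781446]

def bin_rating(x):
    """Map a continuous rating to an ordinal rating in 1..5."""
    return bisect.bisect_right(THRESHOLDS, x) + 1

print([bin_rating(x) for x in (-1.3, 0.4, 1.2, 2.0, 3.1)])  # [1, 2, 3, 4, 5]
```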
Table 2 Simulation results when continuous ratings are binned into a five-point ordinal scale.
Notes: See notes for Table 1. OR parameter estimates are based on five-category ordinal ratings resulting from binning the continuous simulated ratings using the thresholds −0.2085494, 1.0270435, 1.7437654, and 2.3781446; these thresholds correspond to the empirical cumulative probabilities 0.4174, 0.8478, 0.9594, and 0.9913 for ratings 1 to 5 that were computed from the VanDyke data. We conclude that, compared with the continuous data, the empirical AUC distribution for the binned data has a similar correlation structure, similar AUC estimates, and somewhat similar values for the error variance, error covariances, and reader and reader-by-test variance components. Thus, this example shows that the simulated ordinal data approximately emulate the VanDyke data set. Moreover, one could adjust the RMH parameters to achieve a closer emulation using an iterative approach, where each iteration consists of adjusting the original OR values based on results from the previous-iteration simulation study, computing the corresponding RMH values, and running a corresponding simulation study. For example, a first iteration might begin by adjusting the μ1:OR and μ2:OR values upward.

4.3.4. Changing the numbers of readers and cases

In our examples thus far, we have set the numbers of readers, diseased cases, and nondiseased cases to be the same as those of the VanDyke data set. However, often a researcher will want to investigate the performance of a reader-performance metric for a range of these numbers.

Readers. For a given set of RMH parameter values, changing the number of readers has no effect on any of the corresponding OR parameters, as shown by the omission of the number of readers from the RMH-to-OR algorithm formulas in Table 6 in Appendix A.

Cases. For a given set of RMH parameter values, changing the number of cases has no effect on μ1:OR, μ2:OR, σR:OR2, or σTR:OR2, as shown by the omission of the case sample sizes from the corresponding formulas in Table 6. In contrast, σε:OR2, Cov1:OR, Cov2:OR, and Cov3:OR will be affected.
Although the correlations are also affected, changes in the correlations will typically be small [Ref. 6, p. 2078]. For example, Table 3(b) shows that when the case sizes are doubled (n0=138, n1=90), σε:OR2 is reduced by 50%, the correlations are virtually unchanged (maximum change of 0.6%), and there is no change in the AUCs, σR:OR2, or σTR:OR2. Table 3(c) shows that when the case sample sizes are switched (n0=45, n1=69), σε:OR2 is reduced by 19% and there is a small increase in the correlations (maximum increase of 2.3%), with all other values remaining unchanged. These results are computed using the RMH-to-OR formulas in Table 6, thus eliminating the need for simulations.

Table 3 Effect of different case sizes and RMH δ1 and δ2 values on OR parameters.
Notes: Part (a) shows the set of OR parameter “true values” (βOR;solution) from Fig. 4 that correspond to simulations using the RMH model parameters (βRMH;solution) in Fig. 4 when n0=69 and n1=45. In addition, the median mean-to-sigma ratios q1 and q2 corresponding to the test 1 and 2 latent RMH rating distributions are included, as well as Pr1, defined as the probability that a reader’s true ROC curve is noticeably improper for test 1. Parts (b)–(e) show the corresponding values when the indicated changes are made to the case sizes for the simulated samples or to the RMH model values. The OR values are computed by applying the RMH-to-OR algorithm to the RMH model from Fig. 4 with the changes in the left column incorporated. Values in parentheses are the percentage change in the OR parameters from the original values. See Appendix C.2 for the R code that produced these results.

4.3.5. Null and power simulations

The example in Sec. 4.3.2 showed how power could be easily computed for simulated data that emulate a particular study, assuming the effect size (the difference between the two test AUCs) is equal to the observed effect size. Other effect sizes can be investigated by adjusting δ1 and δ2 in the RMH parameter set accordingly, using the relationship (from Table 6) μi:OR = Φ(δi/√V), which implies δi = √V Φ−1(μi:OR), where Φ is the cumulative standard normal distribution function and V = 1 + b−2 + 2(σR2+σTR2). In addition, often the researcher wants to empirically compute the type I error rate for testing the null hypothesis of equal test AUCs. This can be done by creating a null RMH model by setting δ1 = δ2, with the empirical type I error rate given by the proportion of simulated samples where the null hypothesis is rejected. For example, in Table 3(d) we alter the RMH model given in Fig. 4 by setting δ1 = δ2, with the common value determined such that the corresponding μi:OR values are both equal to 0.919, the mean of the two original OR AUC values, 0.897 and 0.941, in Fig. 4. It follows from Eq. (22), with μi:OR = 0.919, that δ1 = δ2 = 2.6452, using the values for b, σR2, and σTR2 given in Fig. 4. In Table 3(e), we similarly determine for a null RMH model the value of δ1 = δ2 that corresponds to μ1:OR = μ2:OR = 0.75.
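The δ computation for the null models can be checked numerically. A minimal sketch, assuming the relationship δi = √V Φ−1(μi:OR) with V = 1 + b−2 + 2(σR2 + σTR2) and the Fig. 4 values b = 0.656081, σR2 = 0.1223413, and σTR2 = 0.005180485 (Python used for illustration; the paper's functions are in R):

```python
from statistics import NormalDist

# RMH values from Fig. 4 (see the Appendix C.1 output)
b, var_R, var_TR = 0.656081, 0.1223413, 0.005180485
V = 1 + b ** -2 + 2 * (var_R + var_TR)  # V = 1 + b^-2 + 2(sigma_R^2 + sigma_TR^2)

def delta_for_auc(auc):
    """Invert mu_OR = Phi(delta / sqrt(V)) to get delta for a target AUC."""
    return V ** 0.5 * NormalDist().inv_cdf(auc)

print(round(delta_for_auc(0.919), 4))  # 2.6452, the null-model value in Table 3(d)
print(round(delta_for_auc(0.75), 4))   # 1.2759, the null-model value in Table 3(e)
```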
In both Table 3(d) and 3(e), we see that all of the original OR parameter values are changed, as well as the mean-to-sigma ratios, with Table 3(e) showing much more change. For this reason, we suggest that if the researcher wants to simulate data with error correlations and reader and reader-by-test variance components similar to those from an OR analysis of a real-data study, but with much different AUC values, the OR-to-RMH algorithm with b_method = mean_to_sigma should be used to determine the corresponding δ1 and δ2 values, as discussed in the next section. The R code and output for the OR_to_RMH and RMH_to_OR functions used to produce the results in Table 3 are included in Appendix C.2.

4.3.6. Mean-to-sigma ratios and the specified and mean_to_sigma b_method options

From Table 3, parts (a)–(c), we see that the mean-to-sigma ratios are q1 = 4.56 and q2 = 5.64 for the Fig. 4 RMH model latent distributions, as well as for the models when the case sizes are changed. However, in parts (d) and (e), we see that when the values for the RMH parameters δ1 and δ2 are changed, the mean-to-sigma ratios also change. In Table 3, Pr1 is the probability that a reader’s true ROC curve is noticeably improper for test 1. (See Appendix B.1 for how to compute this probability.) We see that this probability is relatively small (<0.004) for the first four models and thus is not of concern. In contrast, Pr1 = 0.326 for null model 2, and thus we recommend not using this model for a simulation study. (Note: although the analogous probability for test 2 is not included in Table 3, conclusions based on it were the same.) In Table 4, we see for the specified and mean_to_sigma b_method options that the OR parameters corresponding to the resulting RMH models are equal to all of the original OR values except for the error variance and covariances (not shown).

Table 4 Comparison of RMH parameter values and corresponding true OR values resulting from using the three different b_methods.
The RMH parameter values (βRMH;solution) are obtained by applying the OR-to-RMH algorithm to the “original” OR parameter values (βOR;input) in Fig. 4. The true OR values (βOR;output) result from applying the RMH-to-OR algorithm to the RMH parameter values. See Appendix C.3 for the corresponding R code and the complete sets of RMH and OR values.
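The reported median mean-to-sigma ratios can be reproduced from the Fig. 4 RMH values. A minimal sketch, assuming the median ratio takes the form qi = δi/(1/b − 1), a form inferred here from the reported outputs rather than taken from a stated formula; it reproduces the Fig. 4 values mean_to_sig1 = 4.563553 and mean_to_sig2 = 5.64101 (Python for illustration):

```python
# Fig. 4 RMH values (see the Appendix C.1 output)
b = 0.656081
delta1, delta2 = 2.392224, 2.957029

# Assumed form of the median mean-to-sigma ratio: q_i = delta_i / (1/b - 1).
q1 = delta1 / (1 / b - 1)
q2 = delta2 / (1 / b - 1)
print(round(q1, 4), round(q2, 4))  # 4.5636 5.641
```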
The R code for generating Table 4 is included in Appendix C.3.

4.3.7. Troubleshooting

Table 5 provides examples where the OR-to-RMH algorithm fails to produce a solution. In each example, the OR-to-RMH algorithm is applied to the original parameter estimate values from the VanDyke study, given in Fig. 4, but with one value altered so that the algorithm fails. For example, in part (a) σR:OR2 is changed from 0.00154 (original value) to 0.154 and the algorithm fails. Using Table 11 in Appendix A, we can identify which input value is causing the problem by checking for the first parameter in the sequence x1,…,x7, b that is missing (NA), where x1,…,x7 are the alternative RM parameters discussed in Appendix A. Noting that the first parameter with a missing value is x3, the rules in Table 11 suggest reducing the value of σR:OR2. Similarly, in part (b), σTR:OR2 is increased and x4 is the first parameter with a missing value; here, Table 11 suggests reducing the value of σTR:OR2. In part (c), σε:OR2 is increased and x5 is the first parameter with a missing value; here, Table 11 suggests either changing (reducing or increasing) the value of σε:OR2 or using b_method = specified or b_method = mean_to_sigma. The R code for generating the results in Table 5 is provided in Appendix C.4. The values for the x1,…,x7 parameters are by default not printed unless the option all = T is included in the print function, as illustrated in Appendix C.4. Also note in Appendix C.4 that the OR_to_RM function suggests a remedy, based on Table 11, when the algorithm fails to produce a solution.

Table 5 Troubleshooting examples. For each example, one of the original parameter estimate values from the VanDyke study, as given by βOR;orig in Fig. 4, is replaced by a value that causes the OR-to-RMH algorithm to fail. These examples show how the value responsible for the algorithm failure can be identified from the alternative parameters x1,…,x7 and b values using the rules given in Table 11. All examples use b_method = unspecified. See Appendix C.4 for the R code that produced these results.
Note that to print the x1,…,x7 variables the option all = T must be included in the print function (see Appendix C.4 for examples).
4.3.8. Using the algorithm with Gallas parameter estimates

For a real-data MRMC study analyzed by the Gallas method,15,31 a method has been developed to convert the U-statistic-based empirical AUC and variance estimates to RM model parameters.32 Alternatively, it has been shown by Hillis14 that the Gallas MRMC method produces the same empirical AUC single-test and difference-of-two-tests variance estimates as the OR method if the constraints given by Eq. (12) are not imposed on the OR estimates. As a result, OR parameter estimates can be computed from the Gallas parameter estimates using formulas provided in Hillis.14 Hence, RMH model parameters that correspond to real-data studies can be derived by applying the OR-to-RMH algorithm to the transformed Gallas parameter estimates.

5. Discussion

A previous problem with the original RM model and its later generalized versions was that the RM model parameters were expressed only in terms of the latent binormal rating distributions, as opposed to the more familiar reader-performance-measure distributions. Thus, it has been difficult to set RM model parameters such that the simulated data were similar to MRMC data encountered in practice. Assuming the constrained unequal-variance RM model,3 which we have referred to as the RMH model in this paper, Hillis6 recently remedied this problem by deriving formulas for computing the OR parameter values that describe the distribution of empirical AUC outcomes computed from RMH simulated data. However, that paper did not provide a reverse OR-to-RMH mapping. This paper overcomes that limitation by deriving a numerical OR-to-RMH algorithm that computes RMH parameter values from a specified set of OR parameter values and by providing an R function to implement the algorithm.
The OR-to-RMH algorithm and its corresponding R function make it easy to calibrate the RMH model to produce simulated data that emulate specific real data sets with respect to the distribution of the empirical AUC estimates. The original RM model paper2 presented several simulation structures that were intended to represent ROC analyses of representative real data sets, which was useful because researchers could then assess the performance of MRMC analysis methods using a commonly accepted set of RM simulation structures. However, there was a mistake in some of the computations of the RM parameters, and the model was limited to equal-variance binormal ROC curves, which are not common in practice.6 The present approach has several limitations that we hope to remedy in future research. It is limited to generating continuous rating data that emulate a set of inputted OR parameter values describing the distribution of the empirical AUC estimates. Although the simulated continuous rating data can be binned, the distribution of the empirical AUC estimates for the binned data will not emulate the inputted OR parameter values as closely. We suggested a method to adjust the parameter values to better fit discrete ordinal ratings through an iterative simulation approach, but this process is time consuming, and we hope to develop RMH-to-OR and OR-to-RMH algorithms, similar to the ones in this paper, that are designed primarily for simulation of rating data with a few ordinal values (e.g., 1, 2, 3, 4, or 5). The present approach is also limited to the empirical AUC as the reader performance measure. We hope to develop an approach that allows for a semiparametric outcome, such as the binormal AUC. Finally, our algorithm is based on the RMH model,3 which assumes the same latent-distribution structure for both tests.
Thus, another area for future research is to relax this assumption and develop algorithms for a more general RM model, such as the unconstrained unequal-variance model,6 the generalized RM model,5 or some other generalization of the original RM model.

6. Conclusions

The main contributions of this paper are the OR-to-RMH algorithm and the corresponding R software OR_to_RMH function; these contributions make it easy to calibrate RMH model parameters to match real-data OR parameter estimates, thus making it easy to simulate rating data that emulate real data sets for testing MRMC analysis methods or for performing power analyses. These contributions will allow researchers to develop sets of RMH simulation structures that are representative of a wide spectrum of MRMC studies, which can then be used to validate MRMC analysis methods. We expect these new RMH simulation structures will replace the original RM model structures, which were not linked to specific real-world data sets and were limited to equal-variance ROC curves, making their representativeness difficult to evaluate.

7. Appendix A: Algorithm Details for Mapping OR Model Parameters to RMH Model Parameters

In this section, we derive the mapping from OR model parameters to RMH model parameters. For the mapping, we assume the RMH model because it has the same number of parameters as the OR model. The mapping from a more general RM model, which includes the RMH model as a special case, to the OR model was derived by Hillis.6 Modifying this more general RM model by constraining the error variance and the variance components involving diseased cases to be equal to those involving nondiseased cases multiplied by b−2 results in the RMH model. Table 6 presents the resulting analytical RMH-to-OR mapping. To facilitate the derivation of the reverse (OR-to-RMH) mapping, an alternative parameterization for the RMH model is presented in Table 7.
Table 7(a) expresses the alternative RMH parameters in terms of the RMH parameters, Table 7(b) presents the constraints on these parameters, and Table 7(c) expresses the RMH model parameters in terms of the alternative RMH parameters. Table 8 expresses the OR parameters in terms of the alternative RMH parameters, and Table 9 expresses the alternative RMH parameters in terms of the OR parameters.

Table 6 RMH-to-OR mapping: OR parameters expressed in terms of the RMH model parameters for the empirical AUC.
Notes: The numbers of nondiseased and diseased cases are denoted by n0 and n1; FBVN(.,.;ρ) denotes the standardized bivariate normal distribution function with correlation ρ; μi:OR = μOR + τi:OR; δi = μ+ + τi+; V = 1 + b−2 + 2(σR2+σTR2); c1 = 1/(n0n1); c2 = (n1−1)/(n0n1); c3 = (n0−1)/(n0n1); c4 = 1 − (n0+n1−1)/(n0n1). This table is reprinted, adapted, and revised with permission from Hillis [Ref. 6, Table 3]; the notation is the same except that (σfixed(−)2+σfixed(+)2) has been replaced by 1+b−2, b>0, which results in the RMH model.

Table 7 Alternative parameterization for the RMH model.
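The case-size coefficients c1,…,c4 in the Table 6 notes sum to one, which gives a quick consistency check on their definitions; the reading c4 = 1 − (n0+n1−1)/(n0n1) is an assumption here. A minimal sketch using the VanDyke case sizes (Python for illustration, with exact rational arithmetic):

```python
from fractions import Fraction

# Case-size coefficients from the Table 6 notes, for n0 = 69, n1 = 45.
n0, n1 = 69, 45
N = Fraction(n0 * n1)
c1 = Fraction(1) / N                  # 1/(n0*n1)
c2 = Fraction(n1 - 1) / N             # (n1-1)/(n0*n1)
c3 = Fraction(n0 - 1) / N             # (n0-1)/(n0*n1)
c4 = 1 - Fraction(n0 + n1 - 1) / N    # 1 - (n0+n1-1)/(n0*n1), assumed reading

print(c1 + c2 + c3 + c4)  # 1
```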
Table 8 RMH-to-OR mapping: OR parameters for the empirical AUC expressed in terms of the alternative parameterization of the RMH model given in Table 7.
Notes: This table results from replacing the RMH model parameters in Table 6 by the alternative RMH model parameters, as defined in Table 7(a). FBVN(.,.;ρ) is the standardized bivariate normal distribution function with correlation ρ; V = 1 + b−2 + 2(σR2+σTR2); c1 = 1/(n0n1); c2 = (n1−1)/(n0n1); c3 = (n0−1)/(n0n1); c4 = 1 − (n0+n1−1)/(n0n1).

Table 9 Alternative RMH parameters expressed in terms of OR parameters.
Notes: These results follow from the Table 8 relationships and the constraints given by Eq. (23) in Table 7. FBVN(.,.;ρ) is the standardized bivariate normal distribution function with correlation ρ; c1 = 1/(n0n1); c2 = (n1−1)/(n0n1); c3 = (n0−1)/(n0n1); c4 = 1 − (n0+n1−1)/(n0n1).

Table 10 OR-to-RMH algorithm for computing parameter values for the RMH model that correspond to specified OR parameter values.
Notes: θ^1 and θ^2 denote specified values of the reader-averaged empirical AUCs for tests 1 and 2, respectively; σ^R:OR2, σ^TR:OR2, and σ^ε:OR2 denote specified values of the corresponding OR parameters, and r^1, r^2, and r^3 denote specified values for the OR correlations defined by ri = Covi:OR/σε:OR2. These specified values can be computed from real data or conjectured. FBVN(.,.;ρ) is the standardized bivariate normal distribution function with correlation ρ. Note that the constraints given by Eq. (23) in Table 7 have been incorporated into the preceding steps.

Table 11 Troubleshooting the OR-to-RMH algorithm when missing parameter values result.
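The correlations ri defined in the notes above are simple quotients of the error covariances and the error variance. As a quick check, dividing the true covariance values for the Fig. 4 model (from the Appendix C.1 output) by the true error variance recovers the original correlations exactly (Python used for illustration; the paper's software is in R):

```python
# OR error correlations r_i = Cov_i:OR / sigma_eps:OR^2, computed from the
# true OR values reported for the Fig. 4 model in the Appendix C.1 output.
error_var = 0.0007880002
covs = {"cov1": 0.0003412041, "cov2": 0.0003388401, "cov3": 0.0002356121}

corrs = {name.replace("cov", "corr"): value / error_var
         for name, value in covs.items()}
for name in ("corr1", "corr2", "corr3"):
    print(f"{name} = {corrs[name]:.3f}")  # 0.433, 0.430, 0.299
```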
The proposed algorithm is presented in Table 10. Steps 1 to 6 replace the OR parameters in Table 8 by specified values and then solve for the corresponding alternative RMH parameter values. Note that these steps incorporate the alternative parameter constraints given in Table 7(b). Using the Table 7(c) mappings, step 7 computes the final RMH parameter estimates as functions of the estimated alternative RMH parameter values. From Table 9, it follows that for each of the alternative parameters other than b, there can be only one solution. It then follows from Table 7(c) that there can be only one solution for the RMH parameters other than b. Hence, if there is more than one solution, the solutions differ only in their b values. Sometimes there is not an exact or approximate solution, and the OR-to-RMH algorithm returns missing values. When this happens, changing the values of the inputted OR parameters or changing the b_method option will generally result in a solution, as discussed in Sec. 3.1.3. The algorithm solves for the alternative RMH parameters in the order x1, x2, x3, x4, b, x5, x6, and x7. Because the solution for each parameter may require the solutions of preceding but not subsequent parameters, all parameters following a parameter with no solution are assigned a missing value by the algorithm. Table 11 describes the appropriate corrective action that will produce a solution for the OR-to-RMH algorithm, according to which parameter is the first to not have a solution.

8. Appendix B: Mean-to-Sigma Details

8.1. B.1 Computation of the Probability of a Noticeably Improper ROC Curve

For the RMH model, the mean-to-sigma ratio varies across readers. Letting qit denote the mean-to-sigma ratio for test t and reader i, Hillis3 derives the distribution of qit implied by the RMH model. It follows for test t that the probability that a reader's ROC curve is noticeably improper (i.e., the absolute value of the mean-to-sigma ratio is less than 2, as discussed in Sec.
3.2.2) can be computed from this distribution.

8.2. B.2 Derivation of b in Step 4 of Table 10 when b_method = mean_to_sigma

Without loss of generality, we assume that test 1 has the lower OR AUC input. Given the solution values from the preceding steps in Table 10, we want to solve for b such that the implied median mean-to-sigma ratio for test 1 equals the specified value. Recall that the median separation between the latent normal and abnormal rating distributions for test 1 across readers is equal to δ1; it follows that the median mean-to-sigma ratio can be expressed in terms of δ1 and b. Substituting the Table 6 expression for the AUC, using the relationships in Table 7, and collecting terms results in a quadratic equation in b, which is solved using the quadratic formula.

9. Appendix C: Commands and Output for Tables from Applying the OR_to_RMH and RMH_to_OR R Functions

This appendix includes the R commands and resulting output that were used to produce the content of Fig. 4 and Tables 3–5. Note that both the OR_to_RMH and RMH_to_OR functions return values for mean_to_sig1, mean_to_sig2, mean_sig1_025, and mean_sig2_025; these are not RMH-model or OR-model parameters but rather parameters describing the distributions of the true reader AUC values.

9.1. C.1 R Commands and Output Corresponding to Fig.
4

9.1.1. C.1.1 Computation of RMH values by applying OR-to-RMH algorithm to VanDyke original OR values

> VanDyke_OR_orig_values <- data.frame(n0 = 69, n1 = 45, AUC1 = 0.897,
+   AUC2 = 0.941, var_R = 0.00154, var_TR = 0.000208,
+   error_var = 0.000788, corr1 = 0.433,
+   corr2 = 0.430, corr3 = 0.299)
> RM_values <- OR_to_RMH(VanDyke_OR_orig_values)
> print(RM_values)
  n0 n1   delta1   delta2     var_R      var_TR     var_C    var_TC
1 69 45 2.392224 2.957029 0.1223413 0.005180485 0.4716964 0.1222262
     var_RC var_error        b    b_method mean_to_sig1 mean_to_sig2
1 0.1091448 0.2969327 0.656081 unspecified     4.563553      5.64101
  Pr1_improper Pr2_improper
1  0.003896242 7.862956e-05

9.1.2. C.1.2 Computation of OR true values by applying RMH-to-OR algorithm to RMH values

> OR_true_values <- RMH_to_OR(RM_values)
> print(OR_true_values)
  n0 n1  AUC1  AUC2   var_R   var_TR    error_var         cov1
1 69 45 0.897 0.941 0.00154 0.000208 0.0007880002 0.0003412041
          cov2         cov3 corr1 corr2 corr3        b mean_to_sig1
1 0.0003388401 0.0002356121 0.433  0.43 0.299 0.656081     4.563553
  mean_to_sig2 Pr1_improper Pr2_improper
1      5.64101  0.003896242 7.862956e-05

9.2. C.2 R Commands and Output Corresponding to Table 3

> # Create data frame with 5 rows, with row 1 same as RM_values in Table 3a
> # and rows 2-5 changed slightly.
> VanDyke_OR_orig_values <- data.frame(n0 = 69, n1 = 45, AUC1 = 0.897,
+   AUC2 = 0.941, var_R = 0.00154, var_TR = 0.000208, error_var = 0.000788,
+   corr1 = 0.433, corr2 = 0.430, corr3 = 0.299)
> RM_values <- OR_to_RMH(VanDyke_OR_orig_values)
> RM_Table3 <- RM_values[c(1,1,1,1,1),] # creates data frame with 5 rows, each = RM_values
> RM_Table3[2, c("n0", "n1")] <- c(138, 90)
> RM_Table3[3, c("n0", "n1")] <- c(45, 69)
> RM_Table3[4, c("delta1", "delta2")] <- c(2.6452, 2.6452)
> RM_Table3[5, c("delta1", "delta2")] <- c(1.2759, 1.2759)
> print(RM_Table3)
     n0 n1   delta1   delta2     var_R      var_TR     var_C
1    69 45 2.392224 2.957029 0.1223413 0.005180485 0.4716964
1.1 138 90 2.392224 2.957029 0.1223413 0.005180485 0.4716964
1.2  45 69 2.392224 2.957029 0.1223413 0.005180485 0.4716964
1.3  69 45 2.645200 2.645200 0.1223413 0.005180485 0.4716964
1.4  69 45 1.275900 1.275900 0.1223413 0.005180485 0.4716964
       var_TC    var_RC var_error        b    b_method mean_to_sig1
1   0.1222262 0.1091448 0.2969327 0.656081 unspecified     4.563553
1.1 0.1222262 0.1091448 0.2969327 0.656081 unspecified     4.563553
1.2 0.1222262 0.1091448 0.2969327 0.656081 unspecified     4.563553
1.3 0.1222262 0.1091448 0.2969327 0.656081 unspecified     4.563553
1.4 0.1222262 0.1091448 0.2969327 0.656081 unspecified     4.563553
    mean_to_sig2 Pr1_improper Pr2_improper
1        5.64101  0.003896242 7.862956e-05
1.1      5.64101  0.003896242 7.862956e-05
1.2      5.64101  0.003896242 7.862956e-05
1.3      5.64101  0.003896242 7.862956e-05
1.4      5.64101  0.003896242 7.862956e-05
> OR_values_Table3 <- RMH_to_OR(RM_Table3)
> print(OR_values_Table3)
     n0 n1      AUC1      AUC2       var_R       var_TR    error_var
1    69 45 0.8970000 0.9410000 0.001540000 2.080000e-04 0.0007880002
1.1 138 90 0.8970000 0.9410000 0.001540000 2.080000e-04 0.0003912576
1.2  45 69 0.8970000 0.9410000 0.001540000 2.080000e-04 0.0006344427
1.3  69 45 0.9190000 0.9190000 0.001644069 7.426773e-05 0.0007890063
1.4  69 45 0.7500034 0.7500034 0.007014410 3.019443e-04 0.0023458109
            cov1         cov2         cov3     corr1     corr2
1   0.0003412041 0.0003388401 0.0002356121
0.4330000 0.4300000
1.1 0.0001703301 0.0001691406 0.0001176498 0.4353401 0.4322997
1.2 0.0002800701 0.0002778178 0.0001940871 0.4414426 0.4378927
1.3 0.0003644012 0.0003363961 0.0002513892 0.4618483 0.4263542
1.4 0.0012240655 0.0012083227 0.0009406161 0.5218091 0.5150981
        corr3        b mean_to_sig1 mean_to_sig2 Pr1_improper
1   0.2990000 0.656081     4.563553     5.641010  0.003896242
1.1 0.3006966 0.656081     4.563553     5.641010  0.003896242
1.2 0.3059174 0.656081     4.563553     5.641010  0.003896242
1.3 0.3186150 0.656081     5.046146     5.046146  0.000783834
1.4 0.4009769 0.656081     2.433985     2.433985  0.326185605
    Pr2_improper
1   7.862956e-05
1.1 7.862956e-05
1.2 7.862956e-05
1.3 7.838340e-04
1.4 3.261856e-01

9.3. C.3 R Commands and Output Corresponding to Table 4

> VanDyke_OR_orig_values <- data.frame(n0 = 69, n1 = 45, AUC1 = 0.897,
+   AUC2 = 0.941, var_R = 0.00154, var_TR = 0.000208, var_error = 0.000788,
+   corr1 = 0.433, corr2 = 0.430, corr3 = 0.299)
> Table4_OR1 <- VanDyke_OR_orig_values[c(1,1,1),] # creates data frame with 3 rows,
> # each the same as VanDyke_OR_orig_values
> Table4_OR2 <- data.frame(b_method = c("unspecified", "mean_to_sigma", "specified"),
+   b_input = c(NA, NA, 1), mean_sig_input = c(NA, 5.2, NA))
> Table4_OR <- cbind(Table4_OR1, Table4_OR2)
> print("Original OR parameter values")
[1] "Original OR parameter values"
> print(Table4_OR)
    n0 n1  AUC1  AUC2   var_R   var_TR var_error corr1 corr2 corr3
1   69 45 0.897 0.941 0.00154 0.000208  0.000788 0.433  0.43 0.299
1.1 69 45 0.897 0.941 0.00154 0.000208 0.000788* 0.433  0.43 0.299
1.2 69 45 0.897 0.941 0.00154 0.000208 0.000788* 0.433  0.43 0.299
         b_method b_input mean_sig_input
1     unspecified      NA             NA
1.1 mean_to_sigma      NA            5.2
1.2     specified       1             NA
*Note that with b_method = mean_to_sigma or specified it is not necessary to specify a value for var_error, or the value can be NA.
> Table4_RMH <- OR_to_RMH(Table4_OR)
> print("Table 4 RMH parameter values")
[1] "Table 4 RMH parameter values"
> print(Table4_RMH)
    n0 n1   delta1   delta2      var_R      var_TR     var_C
1   69 45 2.392224
2.957029 0.12234134 0.005180485 0.4716964
1.1 69 45 2.303940 2.847902 0.11347812 0.004805176 0.4674676
1.2 69 45 1.855834 2.293997 0.07362882 0.003117776 0.4498198
       var_TC    var_RC var_error         b      b_method
1   0.1222262 0.1091448 0.2969327 0.6560810   unspecified
1.1 0.1220955 0.1089342 0.3015027 0.6929693 mean_to_sigma
1.2 0.1215947 0.1080172 0.3205683 1.0000000     specified
    mean_to_sig1 mean_to_sig2 Pr1_improper Pr2_improper
1       4.563553     5.641010  0.003896242 7.862956e-05
1.1     5.200000     6.427723  0.001778344 2.748745e-05
1.2          Inf          Inf  0.000000000 0.000000e+00
> Table4_true_values <- RMH_to_OR(Table4_RMH)
> print("Table 4 True OR values")
[1] "Table 4 True OR values"
> print(Table4_true_values)
    n0 n1  AUC1  AUC2   var_R   var_TR    var_error         cov1
1   69 45 0.897 0.941 0.00154 0.000208 0.0007880002 0.0003412041
1.1 69 45 0.897 0.941 0.00154 0.000208 0.0007664249 0.0003318620
1.2 69 45 0.897 0.941 0.00154 0.000208 0.0006584975 0.0002851294
            cov2         cov3 corr1 corr2 corr3         b
1   0.0003388401 0.0002356121 0.433  0.43 0.299 0.6560810
1.1 0.0003295627 0.0002291610 0.433  0.43 0.299 0.6929693
1.2 0.0002831539 0.0001968908 0.433  0.43 0.299 1.0000000
    mean_to_sig1 mean_to_sig2 Pr1_improper Pr2_improper
1       4.563553     5.641010  0.003896242 7.862956e-05
1.1     5.200000     6.427723  0.001778344 2.748745e-05
1.2          Inf          Inf  0.000000000 0.000000e+00

9.4. C.4 R Commands and Output Corresponding to Table 5

9.4.1. C.4.1 Table 5(a) code (var_R changed from 0.00154 to 0.154)

> VanDyke_OR_altered_values_a <- data.frame(n0 = 69, n1 = 45, AUC1 = 0.897,
+   AUC2 = 0.941, var_R = 0.154, var_TR = 0.000208, var_error = 0.000788,
+   corr1 = 0.433, corr2 = 0.430, corr3 = 0.299)
> RM_values <- OR_to_RM(VanDyke_OR_altered_values_a)
Warning message:
In OR_to_RM.default(n0 = 69, n1 = 45, AUC1 = 0.897, AUC2 = 0.941, :
  Conversion failed. Try reducing the value of var_R.
> print(RM_values, all = T)
  n0 n1 delta1 delta2 var_R var_TR var_C var_TC var_RC var_error  b
1 69 45     NA     NA    NA     NA    NA     NA     NA        NA NA
     b_method mean_to_sig1 mean_to_sig2 Pr1_improper Pr2_improper
1 unspecified           NA           NA           NA           NA
        x1       x2 x3 x4 x5 x6 x7
1 1.264641 1.563224 NA NA NA NA NA

9.4.2. C.4.2 Table 5(b) code (var_TR changed from 0.000208 to 0.208)

> VanDyke_OR_altered_values_b <- data.frame(n0 = 69, n1 = 45, AUC1 = 0.897,
+   AUC2 = 0.941, var_R = 0.00154, var_TR = 0.208, var_error = 0.000788,
+   corr1 = 0.433, corr2 = 0.430, corr3 = 0.299)
> RM_values <- OR_to_RM(VanDyke_OR_altered_values_b)
Warning message:
In OR_to_RM.default(n0 = 69, n1 = 45, AUC1 = 0.897, AUC2 = 0.941, :
  Conversion failed. Try reducing the value of var_TR.
> print(RM_values, all = T)
  n0 n1 delta1 delta2 var_R var_TR var_C var_TC var_RC var_error  b
1 69 45     NA     NA    NA     NA    NA     NA     NA        NA NA
     b_method mean_to_sig1 mean_to_sig2 Pr1_improper Pr2_improper
1 unspecified           NA           NA           NA           NA
        x1       x2         x3 x4 x5 x6 x7
1 1.264641 1.563224 0.06838082 NA NA NA NA

9.4.3. C.4.3 Table 5(c) code (var_error changed from 0.000788 to 0.00788)

> VanDyke_OR_altered_values_c <- data.frame(n0 = 69, n1 = 45, AUC1 = 0.897,
+   AUC2 = 0.941, var_R = 0.00154, var_TR = 0.000208, var_error = 0.00788,
+   corr1 = 0.433, corr2 = 0.430, corr3 = 0.299)
> RM_values <- OR_to_RM(VanDyke_OR_altered_values_c)
Warning message:
In OR_to_RM.default(n0 = 69, n1 = 45, AUC1 = 0.897, AUC2 = 0.941, :
  Conversion failed. If using b_method = "unspecified," there are two possible solutions: (a) Try changing (reduce or increase) the value of var_error. (b) Try using one of the other two b_method options, which should always work.
> print(RM_values,all=T)
  n0 n1 delta1 delta2 var_R var_TR var_C var_TC var_RC var_error  b
1 69 45     NA     NA    NA     NA    NA     NA     NA        NA NA
     b_method mean_to_sig1 mean_to_sig2 Pr1_improper Pr2_improper
1 unspecified           NA           NA           NA           NA
        x1       x2         x3         x4 x5 x6 x7
1 1.264641 1.563224 0.06838082 0.07127637 NA NA NA

Acknowledgments

For the first and second authors, this research was supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under Award No. R01EB025174. Some of the information presented in this paper was presented in a prior SPIE proceedings paper by the first author.34 We thank two reviewers and an associate editor for their helpful comments, which greatly improved the manuscript.

References

1. N. A. Obuchowski and H. E. Rockette, “Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations,” Commun. Stat. Simul. Comput. 24(2), 285–308 (1995). https://doi.org/10.1080/03610919508813243
2. C. A. Roe and C. E. Metz, “Dorfman–Berbaum–Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation,” Acad. Radiol. 4(4), 298–303 (1997). https://doi.org/10.1016/S1076-6332(97)80032-3
3. S. L. Hillis, “Simulation of unequal-variance binormal multireader ROC decision data: an extension of the Roe and Metz simulation model,” Acad. Radiol. 19(12), 1518–1528 (2012). https://doi.org/10.1016/j.acra.2012.09.011
4. C. K. Abbey, F. W. Samuelson, and B. D. Gallas, “Statistical power considerations for a utility endpoint in observer performance studies,” Acad. Radiol. 20(7), 798–806 (2013). https://doi.org/10.1016/j.acra.2013.02.008
5. B. D. Gallas and S. L. Hillis, “Generalized Roe and Metz receiver operating characteristic model: analytic link between simulated decision scores and empirical AUC variances and covariances,” J. Med. Imaging 1(3), 031006 (2014). https://doi.org/10.1117/1.JMI.1.3.031006
6. S. L. Hillis, “Relationship between Roe and Metz simulation model for multireader diagnostic data and Obuchowski–Rockette model parameters,” Stat. Med. 37(13), 2067–2093 (2018). https://doi.org/10.1002/sim.7616
7. S. L. Hillis, “A marginal-mean ANOVA approach for analyzing multireader multicase radiological imaging data,” Stat. Med. 33(2), 330–360 (2014). https://doi.org/10.1002/sim.5926
8. M. H. Quenouille, “Approximate tests of correlation in time series,” J. R. Stat. Soc. Ser. B 11, 68–84 (1949). https://doi.org/10.1111/j.2517-6161.1949.tb00023.x
9. J. Shao and D. Tu, The Jackknife and Bootstrap, Springer-Verlag, New York (1995).
10. B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans, SIAM (1982).
11. B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York (1993).
12. E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics 44(3), 837–845 (1988). https://doi.org/10.2307/2531595
13. C. Metz, B. Herman, and C. Roe, “Statistical comparison of two ROC-curve estimates obtained from partially-paired datasets,” Med. Decis. Making 18(1), 110–121 (1998). https://doi.org/10.1177/0272989X9801800118
14. S. L. Hillis, “Relationship between Obuchowski–Rockette–Hillis and Gallas methods for analyzing multi-reader diagnostic imaging data with empirical AUC as the reader performance measure,” Biostat. Epidemiol. (2022). https://doi.org/10.1080/24709360.2022.2062115
15. B. D. Gallas, “One-shot estimate of MRMC variance: AUC,” Acad. Radiol. 13(3), 353–362 (2006). https://doi.org/10.1016/j.acra.2005.11.030
16. B. J. Smith, S. L. Hillis, and L. L. Pesce, “MRMCaov: multi-reader multi-case analysis of variance,” (2021).
17. M. Pepe, The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, New York (2003).
18. J. P. Egan, Signal Detection Theory and ROC-Analysis, Academic Press (1975).
19. X. C. Pan and C. E. Metz, “The ‘proper’ binormal model: parametric receiver operating characteristic curve estimation with degenerate data,” Acad. Radiol. 4(5), 380–389 (1997). https://doi.org/10.1016/S1076-6332(97)80121-3
20. J. Hanley, “The robustness of the binormal assumptions used in fitting ROC curves,” Med. Decis. Making 8(3), 197–203 (1988). https://doi.org/10.1177/0272989X8800800308
21. J. A. Hanley, “The use of the ‘binormal’ model for parametric ROC analysis of quantitative diagnostic tests,” Stat. Med. 15(14), 1575–1585 (1996). https://doi.org/10.1002/(SICI)1097-0258(19960730)15:14<1575::AID-SIM283>3.0.CO;2-2
22. J. Swets, “Form of empirical ROCs in discrimination and diagnostic tasks: implications for theory and measurement of performance,” Psychol. Bull. 99(2), 181–198 (1986). https://doi.org/10.1037/0033-2909.99.2.181
23. S. L. Hillis and K. M. Schartz, “Multireader sample size program for diagnostic studies: demonstration and methodology,” J. Med. Imaging 5(4), 045503 (2018). https://doi.org/10.1117/1.JMI.5.4.045503
24. J. Swets, W. Tanner, and T. Birdsall, “Decision processes in perception,” Psychol. Rev. 68(5), 301–340 (1961). https://doi.org/10.1037/h0040547
25. J. Swets, “Indices of discrimination or diagnostic accuracy: their ROCs and implied models,” Psychol. Bull. 99(1), 100–117 (1986). https://doi.org/10.1037/0033-2909.99.1.100
26. D. Green and J. Swets, Signal Detection Theory and Psychophysics, Peninsula Publishing, Los Altos (1988).
27. S. L. Hillis and K. S. Berbaum, “Using the mean-to-sigma ratio as a measure of the improperness of binormal ROC curves,” Acad. Radiol. 18(2), 143–154 (2011). https://doi.org/10.1016/j.acra.2010.09.002
28. C. Van Dyke et al., “Cine MRI in the diagnosis of thoracic aortic dissection,” in 79th RSNA Meetings (1993).
29. S. L. Hillis et al., “A comparison of the Dorfman–Berbaum–Metz and Obuchowski–Rockette methods for receiver operating characteristic (ROC) data,” Stat. Med. 24, 1579–1607 (2005). https://doi.org/10.1002/sim.2024
30. S. L. Hillis, “A comparison of denominator degrees of freedom methods for multiple observer ROC analysis,” Stat. Med. 26(3), 596–619 (2007). https://doi.org/10.1002/sim.2532
31. B. D. Gallas et al., “A framework for random-effects ROC analysis: biases with the bootstrap and other variance estimators,” Commun. Stat. Theory Methods 38(15), 2586–2603 (2009). https://doi.org/10.1080/03610920802610084
32. X. Zhu and W. Chen, “Simulation of multi-reader multi-case study data with realistic ROC performance characteristics,” Proc. SPIE 11316, 113160M (2020). https://doi.org/10.1117/12.2550545
33. D. P. Tihansky, “Properties of the bivariate normal cumulative distribution,” J. Am. Stat. Assoc. 67(340), 903–905 (1972). https://doi.org/10.1080/01621459.1972.10481314
34. S. L. Hillis, “Determining Roe and Metz model parameters for simulating multireader multicase confidence-of-disease rating data based on read-data or conjectured Obuchowski–Rockette parameter estimates,” Proc. SPIE 11316, 113160N (2020). https://doi.org/10.1117/12.2550541
Biography

Stephen L. Hillis is a research professor in the Departments of Radiology and Biostatistics at the University of Iowa. He received his PhD in statistics in 1987 and his MFA degree in music in 1978, both from the University of Iowa. He is the author of more than 100 peer-reviewed journal articles and four book chapters. Since 1998, his research has focused on methodology for multireader diagnostic radiologic imaging studies.

Brian J. Smith is a professor in the Department of Biostatistics at the University of Iowa and director of the Biostatistics Core in the Holden Comprehensive Cancer Center. He received his PhD in biostatistics in 2001 from the University of Iowa. His research is cancer focused and includes statistical computing, predictive modeling, and methods for medical imaging studies.

Weijie Chen is a research physicist in the Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, CDRH, US FDA, where he conducts research and regulatory reviews of medical devices. He earned his PhD in medical physics in 2007 from the University of Chicago. He has published 36 peer-reviewed journal articles, 31 proceedings papers, two book chapters, two editorials, and one patent. His research interests include performance characterization and assessment methodologies for imaging and AI/ML/CAD devices.