In addition to the computations described in this work, we tested the classifier performance using only those cases that both readers had assigned to the same class. For all considered artifacts, the sensitivity, specificity, and AUC of the algorithms were slightly higher, showing that the presented approach partly depends on the image data used and on the manual annotations. For the nipple position classification, for example, the AUC rose to 0.99, yielding a sensitivity of 0.46 at a specificity of more than 0.99. Nevertheless, we decided to include the cases with disagreement as negatives so as not to bias the database by excluding the difficult cases. In another experiment (not explicitly shown in this manuscript), the classifiers were trained only on the clear, i.e., concordantly annotated, cases of dataset A and tested on the unclear cases. The classifier accuracies (sum of true positives and true negatives divided by all cases) when compared to Reader 1 and Reader 2, respectively, were 0.52 and 0.48 for the nipple position, 0.27 and 0.73 for the nipple shadow, and 0.45 and 0.55 for the breast contour shape. Thus, the trained classifiers were not consistently more in line with one reader than with the other: they agreed slightly more often with Reader 1 for the nipple position, but more often with Reader 2 for the nipple shadow and the breast contour shape.
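The accuracy measure used in the second experiment (sum of true positives and true negatives divided by all cases) can be illustrated with a minimal sketch. The function name `accuracy` and the label lists below are hypothetical and made up for illustration; they are not data or code from this study. The sketch assumes binary labels and that the two readers disagree on every unclear case. Under these assumptions, every classifier decision matches exactly one of the two readers, so the two accuracies sum to 1, which is consistent with the reported pairs (e.g., 0.27 and 0.73).

```python
def accuracy(predictions, reference):
    """Return (TP + TN) / all cases for binary labels (1 = artifact present)."""
    if len(predictions) != len(reference):
        raise ValueError("predictions and reference must have equal length")
    if not predictions:
        raise ValueError("at least one case is required")
    # A case counts as correct (TP or TN) when prediction and reference agree.
    correct = sum(1 for p, r in zip(predictions, reference) if p == r)
    return correct / len(predictions)


# Toy labels for illustration only (assumption, not study data).
reader_1 = [1, 0, 1, 1, 0, 0, 1, 0]
# Unclear cases: Reader 2 assigns the opposite class in every case.
reader_2 = [1 - label for label in reader_1]
# Hypothetical classifier output on the same unclear cases.
classifier = [1, 0, 0, 0, 0, 1, 0, 1]

acc_reader_1 = accuracy(classifier, reader_1)
acc_reader_2 = accuracy(classifier, reader_2)
print(f"Accuracy vs. Reader 1: {acc_reader_1:.3f}")
print(f"Accuracy vs. Reader 2: {acc_reader_2:.3f}")
```

Because the two values are complementary in this setting, an accuracy near 0.5 against either reader indicates that the classifier sides with neither reader in particular, whereas a value far from 0.5 indicates a preference for one reader's annotation style.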