**Pulmonary Nodule Detection**
Hyunjun Eun, Daeyeong Kim, and Changick Kim
This work proposes a method for effectively reducing false positives in pulmonary nodule detection. The candidate detection stage deliberately produces a large number of false positives so that as few nodules as possible are missed (false negatives). To reduce these false positives effectively, the proposed method first categorizes non-nodules into specific types and then performs nodule versus non-nodule classification. Non-nodule categorization is carried out automatically with a deep autoencoder: similar non-nodules are clustered together, and the resulting groups are used to train the CNNs that classify nodules and non-nodules. This enables effective learning from extremely diverse non-nodule data, and it also mitigates the imbalance between the numbers of non-nodule and nodule samples. Our ensemble of 2D CNNs uses single-view 2D patches to improve both computational and memory efficiency compared to previous techniques exploiting 3D CNNs. We first categorize non-nodules on the basis of features encoded by an autoencoder. Then, all 2D CNNs are trained with the same nodule samples but with different types of non-nodules. By extending the learning capability, this training scheme eases the difficulty of extracting representative features from non-nodules with large appearance variations. Note that, instead of manual categorization, which imposes a heavy workload on radiologists, we propose to categorize non-nodules automatically based on the autoencoder and k-means clustering.
**Method** The proposed framework, an ensemble of single-view 2D CNNs with fully automatic non-nodule categorization, aims to achieve more efficient and accurate false positive reduction in pulmonary nodule detection. As delineated in Fig. 2, the framework is composed of two parts: non-nodule categorization for training, and nodule probability prediction. In the non-nodule categorization stage, we automatically categorize non-nodules using an autoencoder and k-means clustering. Based on these categorized non-nodules, we train each 2D CNN with a different category of non-nodules and the same nodules to predict a nodule probability. At test time, the nodule probability of a nodule candidate is predicted by combining the prediction results of all 2D CNNs.
**Non-Nodule Categorization**
To train 2D CNNs with different types of non-nodules and the same nodules, we categorize the non-nodules, which have large appearance variations. By extending the learning capability, this training scheme can produce expert networks, each specialized in distinguishing nodules from one specific type of non-nodule. The ensemble of these 2D CNNs is then able to reduce false positives effectively. For fully automatic non-nodule categorization without the involvement of radiologists, we introduce an autoencoder to extract representative features of non-nodules and adopt k-means clustering to group non-nodules based on these features.
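The grouping step can be sketched as plain Lloyd's k-means over per-candidate feature vectors. This is a minimal sketch under stated assumptions: the autoencoder is not reproduced, the `features` argument simply stands in for the encoder output of each non-nodule, and the function names, random initialization, and convergence test are illustrative choices rather than the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[Vector], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns one cluster label per sample and the k centroids.

    `features` is a hypothetical stand-in for the autoencoder's encoded
    features of each non-nodule candidate.
    """
    rng = random.Random(seed)
    # Initialize the centroids with k distinct samples drawn at random.
    centroids = [list(f) for f in rng.sample(features, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: attach every sample to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(f, centroids[c]))
                      for f in features]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy 2D "features" forming two well-separated groups.
    toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    labels, centroids = kmeans(toy, k=2)
    print(labels)
```

With the setting reported in this work, one would call `kmeans(features, k=5)` on the encoded non-nodule features; each returned label then selects the non-nodule category used to train one 2D CNN.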
We show several reconstruction results of non-nodules in Fig. 3. As can be seen, the trained autoencoder reconstructs non-nodules that closely resemble the inputs. Based on the encoded features, we perform k-means clustering with K clusters. In Fig. 6, we show clustering results with K = 5 on 1,500 randomly selected non-nodules. For visualization, we use t-SNE to reduce the high-dimensional features to two dimensions.
**Nodule Probability Prediction**
For predicting the nodule probability of candidates, we introduce an ensemble of 2D CNNs. Employing 2D CNNs with single-view 2D patches has clear advantages in memory and computational efficiency. However, using only a single 2D view might yield unstable predictions, because a 2D patch carries less information than 3D volumetric data. To alleviate this problem, we extend the capability of the networks by training each CNN with the same nodules and a different type of non-nodule obtained from non-nodule categorization (see Fig. 2(b)). With this training strategy, each CNN can more easily learn meaningful features from a small number of similar non-nodules than from a large number of diverse non-nodules. Furthermore, the ensemble of these 2D CNNs removes a large number of false positives and improves prediction performance by reducing the bias of any single network. The architecture of the 2D CNN is illustrated in Fig. 7.
In Fig. 9, we show the categorized non-nodules accounting for the majority of each group, with K = 5. Note that the non-nodule samples used for categorization are the nodule candidates extracted during candidate detection; we do not randomly sample non-nodules from everything in a scan other than nodules. Such random selection could lead to poor modeling performance, since the variance among non-nodule samples would be very large. As can be seen, each group contains visually similar non-nodules. For example, non-nodule 1 commonly contains arbitrary, thick structures. Non-nodule 2 and 5 heavily include chest walls, but in non-nodule 5 the lung regions are more homogeneous than in non-nodule 2. Non-nodule 3 and 4 generally contain vessel branches; however, the vessels in non-nodule 4 are clearer and thicker than those in non-nodule 3.
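The training scheme and the test-time combination described in this section can be sketched as follows. This is a minimal, framework-free sketch: the CNNs are represented by arbitrary callables that return a nodule probability, the candidate IDs and the `Sample` record are hypothetical, and simple averaging is only an assumed combination rule, since how the per-network predictions are combined is not specified here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample record: (candidate_id, label) with 1 = nodule, 0 = non-nodule.
Sample = Tuple[str, int]


def build_training_sets(nodules: Sequence[str], non_nodules: Sequence[str],
                        categories: Sequence[int], k: int) -> List[List[Sample]]:
    """Return k training sets: every set shares all nodules but holds only the
    non-nodules of one category (categories[i] is the cluster of non_nodules[i])."""
    training_sets: List[List[Sample]] = []
    for c in range(k):
        positives = [(n, 1) for n in nodules]
        negatives = [(m, 0) for m, cat in zip(non_nodules, categories) if cat == c]
        training_sets.append(positives + negatives)
    return training_sets


def ensemble_probability(models: Sequence[Callable[[str], float]], candidate: str) -> float:
    """Combine the per-CNN nodule probabilities by simple averaging (assumed rule)."""
    return sum(model(candidate) for model in models) / len(models)


if __name__ == "__main__":
    sets = build_training_sets(["n1", "n2"], ["a", "b", "c", "d"], [0, 1, 0, 1], k=2)
    print(sets[0])  # all nodules plus the category-0 non-nodules "a" and "c"
    # Two stand-in "CNNs" that return fixed probabilities.
    models = [lambda cand: 0.5, lambda cand: 1.0]
    print(ensemble_probability(models, "candidate-1"))
```

Because each set pairs all nodules with only one category of non-nodules, every CNN sees a smaller and more homogeneous negative class than a single network trained on all candidates would.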
We compared our framework with the top-6 teams that participated in the false positive reduction track of the LUNA16 challenge, referring to the LUNA16 leaderboard for details of these teams. Note that our entry and the entries of teams that use the initially provided list of nodule candidates are marked with an asterisk on the leaderboard. Table 2 describes the architectures of the top-6 teams and ours. In terms of data type, four teams used 3D patches to obtain rich information; however, 3D CNNs operating on 3D data are extremely expensive in computational cost. By contrast, DIAG CONVNET [21] and our architecture exploit 2D patches for computational and storage efficiency. To measure computational complexity, we also calculated the number of parameters and floating-point operations (FLOPs) of each architecture. Even though CUMedVis [11] uses fewer networks than DIAG CONVNET and ours, it has the highest computational complexity, with 1,422K parameters and 5,961M FLOPs, because it employs 3D CNNs. We also measured the training and test times of our framework. The average training time is 2.1 hours for the autoencoder and 1 hour for a single CNN, and the k-means clustering that categorizes non-nodules takes only 10 s. The test time per scan, which contains 621 candidates on average, is 0.49 s. In Table 3, we present sensitivities at seven false positive rates (0.125, 0.25, 0.5, 1, 2, 4, and 8 false positives per scan) and the competition performance metric (CPM) score, which is the average sensitivity at these seven false positive rates. As can be seen, the other teams have low sensitivities at small false positive rates because non-nodules similar to nodules receive high nodule probabilities. Unlike them, the proposed framework (i.e., AE5) obtains relatively high sensitivities even at small false positive rates.
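The CPM can be made concrete with a small computation. In the sketch below, an FROC curve is built by sweeping the score threshold over the candidates, the sensitivity at each of the seven rates is read off step-wise, and the CPM is their mean, CPM = (1/7) Σ sens(f) over f in {0.125, 0.25, 0.5, 1, 2, 4, 8}. The function names, the step-wise reading of the curve, tie-breaking by input order, and the one-hit-per-nodule assumption are illustrative simplifications; the official LUNA16 evaluation script performs its own candidate-to-nodule matching, so its numbers may differ.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# False positive rates (per scan) at which LUNA16 reports sensitivity.
FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def froc_points(scores: Sequence[float], is_nodule: Sequence[bool],
                n_scans: int, n_nodules: int) -> Tuple[List[float], List[float]]:
    """FROC curve as (FPs per scan, sensitivity) pairs, one per score threshold.

    Simplifying assumption: every true-positive candidate hits a distinct nodule,
    so duplicate detections of the same nodule are not handled.
    """
    # Highest score first; Python's stable sort breaks ties by input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    fps, sens = [0.0], [0.0]
    for i in order:  # lower the threshold one candidate at a time
        if is_nodule[i]:
            tp += 1
        else:
            fp += 1
        fps.append(fp / n_scans)
        sens.append(tp / n_nodules)
    return fps, sens


def sensitivity_at(fps: List[float], sens: List[float], rate: float) -> float:
    """Best sensitivity reached without exceeding `rate` FPs per scan (step-wise reading)."""
    return sens[bisect_right(fps, rate) - 1]


def cpm(scores: Sequence[float], is_nodule: Sequence[bool],
        n_scans: int, n_nodules: int) -> float:
    """Competition performance metric: mean sensitivity over the seven FP rates."""
    fps, sens = froc_points(scores, is_nodule, n_scans, n_nodules)
    return sum(sensitivity_at(fps, sens, r) for r in FP_RATES) / len(FP_RATES)


if __name__ == "__main__":
    # Toy example: 7 candidates from 2 scans containing 4 nodules in total.
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    is_nodule = [True, True, False, True, False, False, True]
    print(round(cpm(scores, is_nodule, n_scans=2, n_nodules=4), 4))
```

Because sensitivity can only grow as the threshold drops, taking the last curve point whose false positive rate does not exceed each target rate yields the best sensitivity attainable at that rate.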
Furthermore, our framework achieved the highest CPM score even with low computational complexity.
Hyunjun Eun, Daeyeong Kim, Chanho Jung, and Changick Kim, "Single-View 2D CNNs with Fully Automatic Non-nodule Categorization for False Positive Reduction in Pulmonary Nodule Detection," accepted to *Computer Methods and Programs in Biomedicine*.