Convolutional Neural Network for Skin Cancer Classification

Exploring image classification techniques for skin cancer diagnoses using the HAM-10000 dataset

An Advanced Machine Learning Project by Kolton Fowler, Oluwaseun Ibitoye, Avery Shepherd, and Rahul Singla

Melanoma

There are three main goals for this project:

1) Develop a model that predicts the correct image class with high accuracy (multi-class classification).

2) Develop a model that predicts whether a lesion is a melanoma or non-melanoma with high accuracy (binary classification).

3) Ensure the developed models have low false negative rates and false positive rates pertaining to melanoma.

Ultimately, we want to decrease these false diagnoses and improve the accuracy of a doctor’s melanoma diagnosis with image classification.

Melanoma from the HAM-10000 dataset

Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: Actinic keratoses and intraepithelial carcinoma / Bowen’s disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (vasc). The distribution of diagnostic categories is shown below. As seen below, there is an imbalance in the target classes and this will be addressed later on in the discussion.

Distribution of Diagnostic Categories

More than 50% of the lesions were confirmed through histopathology; the ground truth for the remaining cases is follow-up examination, expert consensus, or confirmation by in-vivo confocal microscopy.

Because all of the images come from the same source and were prepared specifically for image classification, they were already standardized to 600 x 450 pixels. We could have scaled the images down during pre-processing for faster processing. However, given the nature of the problem and the high-cost nature of the application, we decided to keep the high-resolution images rather than risk losing useful information.

We used the Keras Sequential API, part of TensorFlow, to build all of the neural networks described below.

First CNN Architecture (1)

We started with a commonly used CNN architecture.

After training this architecture on ~5000 images, we found that its accuracy was poor: the network predicted almost every image as belonging to a single class. Further investigation traced the problem to dying ReLU units and the absence of normalization. If a unit's weights become strongly negative, its pre-activation stays below zero, so ReLU outputs zero and passes back no gradient, and the unit stops learning. Architecture 2 addresses both problems.
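
To make the dying ReLU effect concrete, here is a minimal pure-Python sketch of a single one-input ReLU unit. It is not the project's code: the weights, biases, and inputs are made-up values chosen only to show that a sufficiently negative pre-activation yields zero output and zero gradient for every input.

```python
def relu(z):
    """ReLU activation: passes positive values, clamps negatives to zero."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU with respect to z (taken as 0 for z <= 0)."""
    return 1.0 if z > 0 else 0.0


def unit_forward(x, weight, bias):
    """Return (output, local gradient) of a one-input ReLU unit."""
    z = weight * x + bias
    return relu(z), relu_grad(z)


# Hypothetical pixel intensities scaled to [0, 1].
inputs = [0.0, 0.25, 0.5, 0.75, 1.0]

# A healthy unit: some inputs produce a positive pre-activation.
healthy = [unit_forward(x, weight=1.0, bias=-0.3) for x in inputs]

# A "dead" unit: the bias is so negative that z < 0 for every input,
# so both the output and the gradient are zero across the whole dataset.
dead = [unit_forward(x, weight=1.0, bias=-5.0) for x in inputs]

print("healthy:", healthy)
print("dead:   ", dead)
```

Because the dead unit's gradient is zero for every input, gradient descent cannot move its weights back into a useful range.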

Second CNN Architecture (2)

Compared to the previous architecture, two changes were made:

Third CNN Architecture (3)

Since melanoma is the deadliest of these skin lesions, we finally created an architecture that predicts whether or not a given lesion is melanoma. To do so, we replaced the softmax layer at the end of the network with a sigmoid layer.
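
The difference between the two output layers can be sketched in a few lines of plain Python. This is an illustration only: the raw scores below are hypothetical, and the real networks compute these functions inside Keras layers rather than by hand.

```python
import math


def softmax(logits):
    """Multi-class output: one probability per class, summing to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit):
    """Binary output: a single probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical raw scores for the seven HAM-10000 classes.
class_names = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
multi_class_probs = softmax([0.2, -1.0, 0.5, -0.7, 1.1, 2.3, -1.5])

# Hypothetical single raw score for "melanoma vs. non-melanoma".
melanoma_prob = sigmoid(-0.4)

print(dict(zip(class_names, (round(p, 3) for p in multi_class_probs))))
print("P(melanoma) =", round(melanoma_prob, 3))
```

With softmax, the seven probabilities compete and must sum to 1; with sigmoid, the network emits one number that can be thresholded directly as melanoma or non-melanoma.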

Cascading — CNN + Random Forest

CNN architecture 3 achieved a decent false negative rate but had a very high false positive rate (the repercussions of a high false positive rate are discussed below). To address this, we used a cascaded model: an architecture in which one ML model uses the output of another ML model to make the final prediction. In our case, a Random Forest makes the final prediction of whether a lesion is melanoma. It takes as input the soft probability from the CNN together with additional patient data such as age, sex, and the location of the lesion.

Before diving into the results, it helps to understand how the CNN works and which key features it extracts from an image. The chart below shows the activation maps produced after each CNN layer.
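
A cascade of this kind can be sketched with scikit-learn as follows. Everything here is an assumption made for illustration: the data is synthetic, the column encodings and the `RandomForestClassifier` settings are arbitrary, and `cnn_prob` merely stands in for the sigmoid output of the CNN.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for labels, CNN sigmoid outputs, and patient metadata.
is_melanoma = rng.integers(0, 2, size=n)
cnn_prob = np.clip(0.3 * is_melanoma + rng.uniform(0.0, 0.7, size=n), 0.0, 1.0)
age = rng.integers(20, 85, size=n)
sex = rng.integers(0, 2, size=n)           # encoded as 0/1
localization = rng.integers(0, 5, size=n)  # integer code per body site

# Second-stage features: the CNN's soft probability plus the metadata columns.
X = np.column_stack([cnn_prob, age, sex, localization])

# Train the forest on the first 150 rows and predict on the remaining 50.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[:150], is_melanoma[:150])
final_pred = forest.predict(X[150:])

print("final predictions:", final_pred[:10])
print("feature importances:", forest.feature_importances_.round(3))
```

In a real cascade, the CNN probabilities used to train the forest should come from images the CNN did not see during its own training; otherwise the forest learns from overconfident first-stage outputs.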

Key observations:

In any multi-class classification problem, there are two issues to be aware of:

One heuristic for handling these issues is to assign different class weights.

The following table shows the effect of changing the class weight of melanoma from 1 to 5 (all other classes keep a weight of 1). When the classes were equally weighted, only 28% of the actual melanoma cases were classified as melanoma; the rest were classified as other, non-cancerous lesions. This kind of misclassification can have serious consequences for patients with melanoma. After the weight update, 92% of the melanoma cases are correctly classified, a large improvement from a small change. One caveat: the false positive rate also increased with this weight update, so weights should be changed with caution.

Now that updating the class weights has mitigated the class imbalance problem, the same principle is applied to the binary classification problem. Here the weight for the melanoma class is 7 and the weight for the non-melanoma class is 1. As mentioned above, the main metrics we are concerned with are the false positive rate and the false negative rate. The initial results show a false positive rate of 57.4% and a false negative rate of 6.9%. This false positive rate improves on the 69.03% false positive rate of a first in-person visual inspection. In other words, the model does a better job than a doctor's first visual inspection at identifying a lesion as negative when it is actually negative.
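
One way to see why a class weight has this effect is to look at a weighted cross-entropy loss for a single sample. The sketch below is a simplified illustration: the predicted probabilities are made up, only three of the seven classes are shown, and the article does not state exactly how the weights were passed to the model (in Keras, the `class_weight` argument of `Model.fit` is the usual route).

```python
import math


def weighted_cross_entropy(true_class, predicted_probs, class_weights):
    """Cross-entropy loss for one sample, scaled by the weight of its true class."""
    return -class_weights[true_class] * math.log(predicted_probs[true_class])


# Hypothetical model output for one image whose true class is melanoma.
probs = {"mel": 0.2, "nv": 0.7, "bkl": 0.1}

equal_weights = {"mel": 1.0, "nv": 1.0, "bkl": 1.0}
melanoma_upweighted = {"mel": 5.0, "nv": 1.0, "bkl": 1.0}

loss_equal = weighted_cross_entropy("mel", probs, equal_weights)
loss_weighted = weighted_cross_entropy("mel", probs, melanoma_upweighted)

print(round(loss_equal, 3), round(loss_weighted, 3))
```

A missed melanoma now costs five times as much as before, while mistakes on the other classes cost the same, so training is pushed toward predicting melanoma more readily. That is also why the false positive rate rises.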
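
For reference, the two metrics can be computed from binary predictions as in the sketch below. The labels and predictions are made-up toy values, not the project's results, and the helper names are our own.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = melanoma, 0 = non-melanoma)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def false_positive_rate(fp, tn):
    """Share of actual negatives that were predicted positive."""
    return fp / (fp + tn)


def false_negative_rate(fn, tp):
    """Share of actual positives that were predicted negative."""
    return fn / (fn + tp)


# Toy example: 4 melanoma cases and 6 non-melanoma cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
fpr = false_positive_rate(fp, tn)
fnr = false_negative_rate(fn, tp)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")  # prints "FPR = 0.50, FNR = 0.25"
```

In this toy case, half of the healthy lesions are flagged as melanoma, while one in four melanomas is missed.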

Unfortunately, the model did not improve the false negative rate, so we adjusted some hyperparameters to see whether that would help. First, we added a layer with a kernel size of 9; then we increased the kernel size of one of the original layers from 5 to 7, giving a final sequence of kernel sizes of 9, 7, 5, 5, 3. This change improved the false positive rate further, to 50.9%, but the false negative rate increased to 14.7%.

Using a cascading model, with a Random Forest applied to the output of the binary classifier together with metadata such as age, sex, and localization, reduces the false positive rate significantly, from 50.9% in the original model to 26.7%.

The inclusion of metadata contributed significantly to the improvement observed in the cascading model. As shown in the feature importance map, the method by which the lesion was confirmed is the most important feature, especially whether the lesion was confirmed through histopathology.

As we dug deeper into the nature of skin cancer, we realized that melanoma carries the most severe consequences of any skin cancer. This changed the goal of the study somewhat: we originally set out to better classify all types of skin cancer, but because melanoma is the one we care about most due to its high cost, we refocused the study on classifying instances of melanoma as well as possible.

The results of our analysis support the hypothesis that image classification should supplement a doctor's diagnosis, not replace it. Our final model reduces the false positive rate to 26.7%, compared with 69.03% for a first in-person visual inspection. Different models can be used to confirm a positive or a negative diagnosis: for a positive diagnosis, we can use the cascaded model (which has the lower false positive rate), whereas for a negative diagnosis, we can use the pre-cascade model with 4 layers (which has the lower false negative rate).

Ultimately, using convolutional neural networks to support cancer diagnosis could save the money, time, and emotional distress associated with an incorrect diagnosis.
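
The confirmation rule described above could be sketched as a small helper. This is one possible reading of the idea, not something the project implemented: the function name and its arguments are hypothetical, and each prediction is assumed to be 1 for melanoma and 0 for non-melanoma.

```python
def confirm_diagnosis(doctor_says_melanoma, cascaded_pred, pre_cascade_pred):
    """Return True if the model chosen for this diagnosis agrees with the doctor.

    cascaded_pred and pre_cascade_pred are 1 (melanoma) or 0 (non-melanoma).
    """
    if doctor_says_melanoma:
        # Positive call: check it against the model with the lower false positive rate.
        return cascaded_pred == 1
    # Negative call: check it against the model with the lower false negative rate.
    return pre_cascade_pred == 0


# Doctor suspects melanoma and the cascaded model agrees.
print(confirm_diagnosis(True, cascaded_pred=1, pre_cascade_pred=1))   # True
# Doctor rules out melanoma but the pre-cascade model flags the lesion.
print(confirm_diagnosis(False, cascaded_pred=0, pre_cascade_pred=1))  # False
```

A disagreement does not overrule the doctor; it only signals that the case may deserve a second look.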

Future Work

