In this third part, I will discuss underfitting and overfitting (and how to detect them), penalized regression (ridge, lasso, and elastic net), and how to implement them in Python.

photo from: rasalv88

You can find the first part (introduction and math) here.

You can find the second part (fit, problem diagnosis, and solving) here.

Underfitting and overfitting

Let’s briefly discuss an important point: underfitting and overfitting.

A good model generalizes well when it encounters new data. This ability is what allows the model to make reliable predictions on unseen data.

Underfitting is when a model can’t accurately capture the…
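One common way to detect both problems is to compare the error on the training data with the error on held-out data. The sketch below is only an illustration under stated assumptions: the data are synthetic (a noisy line, not the AML dataset), and the three models (`mean_model`, `memorizer`, `linear_model`) are hypothetical stand-ins chosen to make the pattern visible using only the standard library.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def make_data(n):
    """Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


x_train, y_train = make_data(30)
x_test, y_test = make_data(30)

mean_x = sum(x_train) / len(x_train)
mean_y = sum(y_train) / len(y_train)


def mean_model(x):
    """Too simple: ignores x and always predicts the training mean."""
    return mean_y


def memorizer(x):
    """Too flexible: returns the target of the closest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


# Least-squares line fitted on the training data only.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train)) / sum(
    (x - mean_x) ** 2 for x in x_train
)
intercept = mean_y - slope * mean_x


def linear_model(x):
    """Matches the true structure of the synthetic data."""
    return intercept + slope * x


for name, model in [
    ("mean (underfit)", mean_model),
    ("memorizer (overfit)", memorizer),
    ("linear", linear_model),
]:
    train_err = mse(model, x_train, y_train)
    test_err = mse(model, x_test, y_test)
    print(f"{name:20s} train MSE = {train_err:7.3f}   test MSE = {test_err:7.3f}")
```

In this setup, the underfit model has a high error on both sets, while the memorizer has zero training error but a clearly higher test error. That gap between training and test error is the usual warning sign of overfitting.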

In this second tutorial, I will show how to fit a linear regression model to Acute Myeloid Leukemia data and how to evaluate the results. Moreover, I will discuss the plots used to investigate the algorithm's results, and solutions to the most common problems.

photo from: rasalv88

You can find the previous tutorial here.

Fit a simple linear regression model

Let’s start by defining our input variables (X) and our dependent variable. In this case, we want to estimate the value of MYC, an mRNA transcript, based on the behavior of the other genes. We choose MYC because it is the most famous oncogene (actually, the first to be recognized). Many groups…
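To make the idea of an input variable and a dependent variable concrete, here is a minimal sketch of a least-squares fit with a single predictor. Everything in it is an assumption made for illustration: the expression values are invented, `gene_a` is a hypothetical predictor gene, and the tutorial's actual model uses many genes as inputs rather than one.

```python
# Invented expression values (not taken from the AML dataset).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical input variable (X)
myc = [3.1, 4.9, 7.2, 8.8, 11.0]     # dependent variable (y)


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_linear_regression(gene_a, myc)


def predict(x_new):
    """Predicted MYC value for a new value of the hypothetical predictor."""
    return intercept + slope * x_new


print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"predicted MYC at gene_a = 6.0: {predict(6.0):.2f}")
```

With several predictor genes, the same principle applies, but the coefficients are usually obtained with a linear algebra routine or a library rather than by hand.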

In this tutorial, I will discuss how to use linear regression with transcriptomic data for Acute Myeloid Leukemia (AML). In this first part, I will introduce linear regression and the math behind it.

photo from: rasalv88

This is the index of the tutorial, so you can choose which part interests you:

  1. Introduction to supervised learning and linear regression
  2. Introduction to the dataset and its preprocessing
  3. Digging into the algorithm and the math behind it
  4. Fit a simple linear regression model
  5. Algorithm evaluation: error metrics, assumptions, plots, and solutions
  6. Underfitting and overfitting
  7. Penalized regression: ridge, lasso, and elastic net
  8. Other resources
  9. Bibliography

in this first part…

This review is an introduction to artificial intelligence applied to SARS-CoV-2 medical images.

Figure source: here

In this review, I will present the types of medical images available, an overview of the published deep learning models, an overview of current challenges with possible solutions, and links to datasets you can use. At the end, I will also list some available resources if you want to go deeper into some of the topics, and I will provide literature references to explore.


  1. Medical images’ data type
  2. Use of artificial intelligence in COVID medical images
  3. Current Challenges
  4. Conclusion
  5. Available datasets
  6. Other resources
  7. Bibliography


Since its outbreak, SARS-CoV-2 has reached more than 100 million cases and led to more than…

figure source: Rasalv88

This tutorial is the continuation of Clustering techniques with Gene Expression Data, where we discussed algorithms belonging to partitional clustering (k-means) and hierarchical clustering.

To load the Acute Myeloid Leukemia gene expression dataset, refer to the first part of the previous tutorial, where the preprocessing of the dataset is explained.

You can find the first part here.

Tutorial structure:

  1. Introduction to Density-based clustering
  2. Implementing DBSCAN in Python
  3. Introduction to Gaussian Mixture Models
  4. Implementing Gaussian Mixture Models in Python
  5. Previous tutorials
  6. Bibliography

Introduction to Density-based clustering

Density-based clustering determines cluster assignment based on the density of data observations in a particular region. Clusters are regions where there…
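The density idea can be sketched with a small, standard-library-only version of DBSCAN. This is an illustrative sketch rather than the implementation used in the tutorial: the points are invented 2D coordinates (not gene expression data), and the function names `region_query` and `dbscan`, along with the chosen `eps` and `min_samples` values, are assumptions.

```python
from math import dist

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            # Not dense enough to start a cluster; may still become a
            # border point of a cluster discovered later.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                # points[j] is also a core point, so keep expanding from it.
                queue.extend(j_neighbors)
    return labels


# Two dense groups and one isolated point (invented data).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 50),
]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Unlike k-means, the number of clusters is not given in advance: it follows from `eps` and `min_samples`, and isolated observations are left as noise instead of being forced into a cluster.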

Fig 1. Scikit-learn cheat sheet

In this tutorial, I will focus on different clustering techniques using gene expression data. I will use data from acute myeloid leukemia (AML), which is one of the most fatal malignancies. Data clustering methods are unsupervised techniques that are widely used in machine learning and have proved to be useful tools in biomedical research. In contrast to supervised machine learning, clustering techniques interpret the data to find clusters (or groups) in the feature space. Technically, clusters are areas of density in the feature space where the observations are closer to the members of a given cluster…
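As a rough illustration of finding groups in the feature space, here is a minimal k-means sketch using only the standard library. The points are invented 2D coordinates rather than gene expression profiles, and the function name `kmeans`, the fixed starting centroids, and the iteration count are assumptions made to keep the example deterministic.

```python
from math import dist


def kmeans(points, initial_centroids, n_iter=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


# Two well-separated groups (invented data).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[4]])
print(labels)
print(centroids)
```

In practice, the starting centroids are usually chosen at random, and the result can depend on that choice, which is why library implementations typically run several initializations.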

Fig 1. CT scans from SARS-CoV-2 patients (left) and non-infected patients (right). Figure source: Adapted from the dataset page

This article is based on the work of Nitin Mane and his GitHub release. The original dataset (1) was published by Lancaster University and is accessible here.

The dataset and the original model were collected and developed by collaborators of the Asociación de Investigacion en Inteligencia Artificial Para la Leucemia Peter Moss: Professor Plamen Angelov of Lancaster University, Centre Director of the Lancaster Intelligent, Robotic and Autonomous systems (LIRA) Research Centre, and his researcher, Eduardo Soares, PhD.

Figure 1. Example of a microarray. Figure source: (2).

This tutorial will focus on different complexity reduction techniques using gene expression data. In this tutorial and the following ones, I will use data from acute myeloid leukemia (AML), which is one of the most fatal malignancies. Genomic and transcriptomic information have already led to a rethinking of the classification of AML. Indeed, leukemia malignancies show specific signatures that allow them to be subclassified. This is important for identifying patient subsets and better tailoring therapeutic options, and for identifying new potential targets and predictive models. Primary diagnosis (which relies on immunophenotyping and morphology analysis) is expensive and requires experts and specific infrastructure. …

figure source: here

In this third article, we focus on other tasks and other types of data exploited by deep learning (DL) in hematology. As mentioned previously, deep learning can be useful in many tasks related to leukemia. In the previous review, we focused on image data analysis. Here we will consider other data sources and applications, such as therapy selection, differential diagnosis, and risk prediction (1).

As an example, medical diagnosis in hematology is based on blood tests, where clinicians focus on values that fall outside a reference range. This can lead to missing patterns and relationships between parameters. …

Figure 1. Examples of different blood cells according to morphology. N (Normal lymphocytes), CLL (chronic lymphocytic leukemia), SMZL (splenic marginal zone lymphoma), MCL (mantle cell lymphoma), HCL (hairy cell leukemia), FL (follicular lymphoma), PL (B and T prolymphocytic leukemia), LGL-T (large granular lymphocyte lymphoma), SS (Sézary syndrome), BC (Blast cells), PC (plasma cell), RL (Reactive lymphocytes). Figure source: (2).

Previous article on Acute Myeloid Leukemia in general: here.

In leukemia, and in hematology in general, there are many potential applications for machine learning techniques, with the aims of differential diagnosis, aiding therapy decisions, predicting risk, and reducing potential medical errors. The main applications nowadays are predictive modelling, diagnostics, and medical image analysis (1). In this section, we will focus on machine learning and deep learning in medical image diagnosis. The increase in available data, hardware capabilities, and cloud computing is enabling great development in the field, and medicine is benefiting from this revolution. …

Salvatore Raieli

Bioinformatician and data scientist, passionate about bioinformatics, machine learning, and AI, and writing about them.
