Machine Learning


6 min read

Origins of machine learning

  • Machine learning has its origins in statistics and mathematical modeling of data

Fundamental idea of machine learning

  • The fundamental idea of machine learning is to use data from past observations to predict unknown outcomes or values


  • An ice cream store owner using historical sales and weather records to predict daily ice cream sales

  • A doctor using clinical data to predict a patient's risk of diabetes

  • A researcher using past observations to automate the identification of penguin species

Machine learning model

  • A machine learning model is a software application that calculates an output value based on input values

  • The process of defining the model's function is known as training

  • After training, the model can be used to predict new values in a process called inferencing.

Diagram showing the training and inferencing phases in machine learning.

Training data

  • The training data consists of past observations

  • Observations include the observed features and the known label

  • Features are often referred to as x, and the label as y


  • In the ice cream sales scenario, features (x) are weather measurements and the label (y) is the number of ice creams sold

  • In the medical scenario, features (x) are patient measurements and the label (y) is the likelihood of diabetes

  • In the Antarctic research scenario, features (x) are penguin attributes and the label (y) is the species

Algorithm and model

  • An algorithm is applied to determine a relationship between features and label

  • The result is a model that is a function denoted as f

  • The model is used for inferencing by inputting feature values and receiving a prediction of the label

  • The output from the model is often denoted as ŷ or "y-hat"

Types of machine learning

Diagram showing supervised machine learning (regression and classification) and unsupervised machine learning (clustering).

Supervised machine learning

  • Training data includes both feature values and known label values

  • Used to train models by determining a relationship between features and labels

  • Predicts unknown labels for features in future cases


  • Form of supervised machine learning with numeric label predictions

  • Predicts values like number of ice creams sold or selling price of a property


  • Form of supervised machine learning with categorical label predictions

  • Two common scenarios: binary classification and multiclass classification

Binary classification

  • Predicts one of two outcomes, true/false or positive/negative

  • Examples: risk for diabetes, loan default, response to marketing offer

Multiclass classification

  • Predicts one of multiple possible classes

  • Examples: species of a penguin, genre of a movie

Unsupervised machine learning

  • Training data consists only of feature values without known labels


  • Most common form of unsupervised machine learning

  • Identifies similarities between observations based on features and groups them into clusters

  • Examples: grouping flowers, identifying similar customers

Segmenting Customers:

  • Segment customers into groups

Analyzing Customer Groups:

  • Identify and categorize different classes of customers

  • Examples of customer classes could include high value-low volume customers, frequent small purchasers, etc.

Labeling Clustering Results:

  • Use categorizations to label observations in clustering results

Training a Classification Model:

  • Utilize the labeled data to train a classification model

  • The model will predict which customer category a new customer might belong to.


Training a Regression Model

  • Regression models are trained to predict numeric label values based on training data

  • The training data includes both features and known labels

  • The training process involves multiple iterations

  • An appropriate algorithm is used to train the model

  • The model's predictive performance is evaluated

  • The model is refined by repeating the training process with different algorithms and parameters

  • The goal is to achieve an acceptable level of predictive accuracy

Key Elements of the Training Process

  • Splitting the training data to create a dataset for training the model and another subset for validation

  • Using an algorithm (e.g., linear regression) to fit the training data to a model

  • Using the validation data to test the model by predicting labels for the features

  • Comparing the predicted labels with the actual labels in the validation dataset

  • Calculating a metric to indicate the accuracy of the model's predictions

Example of Regression

  • Training a model to predict ice cream sales based on temperature as the feature

  • Historic data includes records of daily temperatures and ice cream sales.

Diagram showing the process of training an evaluating a supervised model.

Mean Absolute Error (MAE)

  • The mean absolute error (MAE) measures the average absolute difference between predicted and actual values.

  • In the ice cream example, the MAE is calculated by finding the mean of the absolute errors (2, 3, 3, 1, 2, and 3), resulting in a value of 2.33.

Mean Squared Error (MSE)

  • The mean squared error (MSE) measures the average squared difference between predicted and actual values.

  • It amplifies larger errors by squaring individual errors and calculating the mean of the squared values.

  • In the ice cream example, the MSE is calculated by finding the mean of the squared absolute values (4, 9, 9, 1, 4, and 9), resulting in a value of 6.

Root Mean Squared Error (RMSE)

  • The root mean squared error (RMSE) is calculated by taking the square root of the MSE.

  • In the ice cream example, the RMSE is calculated as the square root of 6, resulting in a value of 2.45 (ice creams).

Coefficient of determination (R2)

  • The coefficient of determination (R2) measures the proportion of variance in the validation results explained by the model.

  • R2 values range between 0 and 1, with higher values indicating a better fit.

  • In the ice cream example, the R2 calculated from the validation data is 0.95, indicating that the model explains 95% of the variance in the data.

Iterative training

  • In real-world scenarios, data scientists use an iterative process to train and evaluate models.

  • This process involves varying feature selection, algorithm selection, and algorithm parameters to improve model performance.

Selection of the best model

  • The model that results in the best evaluation metric is selected

  • The selected model should have an acceptable evaluation metric for the specific scenario.

Binary classification

Classification in machine learning

  • Classification is a supervised machine learning technique

  • It follows an iterative process of training, validating, and evaluating models

Binary classification

  • Binary classification predicts one of two possible labels for a single class

  • It often uses multiple features (x) and a y value of 1 or 0

Example - binary classification

  • In a simplified example, blood glucose level is used to predict diabetes

  • The model predicts whether the label (y) is 1 (diabetes) or 0 (no diabetes)

Training a binary classification model

  • To train the model, we use an algorithm to fit the training data to a function that calculates the probability of the class label being true (diabetes)

  • The probability is measured between 0.0 and 1.0, where 1.0 represents a high probability of having diabetes

  • The function describes the probability of the class label being true for a given value of x

  • Three observations in the training data have a known class label of true (1.0), and three observations have a known class label of false (0.0)

  • An S-shaped curve represents the probability distribution, where values above the threshold predict true (1) and values below predict false (0)

  • The threshold is defined at a probability of 0.5

  • By applying the function to new data, we can predict the class label (diabetes) based on the probability output