Machine Learning

Origins of machine learning

  • Machine learning has its origins in statistics and mathematical modeling of data

Fundamental idea of machine learning

  • The fundamental idea of machine learning is to use data from past observations to predict unknown outcomes or values

Examples

  • An ice cream store owner using historical sales and weather records to predict daily ice cream sales

  • A doctor using clinical data to predict a patient's risk of diabetes

  • A researcher using past observations to automate the identification of penguin species

Machine learning model

  • A machine learning model is a software application that calculates an output value based on input values

  • The process of defining the model's function is known as training

  • After training, the model can be used to predict new values in a process called inferencing.

Diagram showing the training and inferencing phases in machine learning.

Training data

  • The training data consists of past observations

  • Observations include the observed features and the known label

  • Features are often referred to as x, and the label as y

Examples

  • In the ice cream sales scenario, features (x) are weather measurements and the label (y) is the number of ice creams sold

  • In the medical scenario, features (x) are patient measurements and the label (y) is the likelihood of diabetes

  • In the Antarctic research scenario, features (x) are penguin attributes and the label (y) is the species

Algorithm and model

  • An algorithm is applied to the training data to determine a relationship between the features and the label

  • The result of training is a model, which is essentially a function, conventionally denoted f

  • The model is used for inferencing: feature values are input and a prediction of the label is returned

  • The model's output is often denoted ŷ or "y-hat", so the prediction can be written as ŷ = f(x); a minimal sketch follows below
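
As a minimal sketch of this idea (not tied to any particular library), a trained model can be treated as an ordinary function f that maps feature values to a predicted label ŷ; the coefficients below are hypothetical stand-ins for parameters that training would normally learn.

```python
# A trained model viewed as a function: y-hat = f(x).
# The coefficients are hypothetical; in practice they are learned during training.
def f(temperature: float) -> float:
    """Predict ice cream sales from temperature (hypothetical parameters)."""
    return 5.0 + 2.0 * temperature

# Inferencing: apply the function to a new feature value to obtain a prediction (y-hat).
y_hat = f(30.0)
print(y_hat)  # predicted ice creams sold on a 30-degree day
```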

Types of machine learning

Diagram showing supervised machine learning (regression and classification) and unsupervised machine learning (clustering).

Supervised machine learning

  • Training data includes both feature values and known label values

  • Used to train models by determining a relationship between features and labels

  • The trained model can then predict unknown label values from feature values in new cases

Regression

  • A form of supervised machine learning in which the model predicts a numeric label value

  • Predicts values like number of ice creams sold or selling price of a property

Classification

  • A form of supervised machine learning in which the model predicts a categorical label (a class)

  • Two common scenarios: binary classification and multiclass classification

Binary classification

  • Predicts one of two mutually exclusive outcomes, such as true/false or positive/negative

  • Examples: whether a patient is at risk of diabetes, whether a loan applicant will default, whether a customer will respond to a marketing offer

Multiclass classification

  • Predicts one of multiple possible classes

  • Examples: species of a penguin, genre of a movie

Unsupervised machine learning

  • Training data consists only of feature values without known labels

Clustering

  • Most common form of unsupervised machine learning

  • Identifies similarities between observations based on features and groups them into clusters

  • Examples: grouping flowers, identifying similar customers

Segmenting Customers:

  • Segment customers into groups

Analyzing Customer Groups:

  • Identify and categorize different classes of customers

  • Examples of customer classes might include high-value low-volume customers, frequent small purchasers, and so on

Labeling Clustering Results:

  • Use these categorizations to label the observations in the clustering results

Training a Classification Model:

  • Utilize the labeled data to train a classification model

  • The model will predict which customer category a new customer might belong to.
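
As a hedged sketch of the workflow above (cluster the customers, treat the resulting cluster categorizations as class labels, then train a classifier), the example below uses scikit-learn with synthetic customer data; the feature columns and values are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic features per customer: [annual spend in $1000s, number of purchases]
X = rng.normal(loc=[5.0, 20.0], scale=[1.5, 8.0], size=(200, 2))

# 1. Segment customers into groups (unsupervised clustering).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# 2-3. In practice an analyst would inspect and name each cluster; here the
#      cluster IDs themselves stand in for those customer categories.
# 4. Train a classification model on the now-labeled observations.
classifier = LogisticRegression(max_iter=1000).fit(X, cluster_labels)

# Predict which customer category a new customer might belong to.
new_customer = np.array([[6.5, 12.0]])
print(classifier.predict(new_customer))
```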

Regression

Training a Regression Model

  • Regression models are trained to predict numeric label values based on training data

  • The training data includes both features and known labels

  • The training process involves multiple iterations

  • An appropriate algorithm is used to train the model

  • The model's predictive performance is evaluated

  • The model is refined by repeating the training process with different algorithms and parameters

  • The goal is to achieve an acceptable level of predictive accuracy

Key Elements of the Training Process

  • Splitting the training data randomly into a subset used to train the model and a subset held back for validation

  • Using an algorithm (e.g., linear regression) to fit the training data to a model

  • Using the validation data to test the model by predicting labels for the features

  • Comparing the predicted labels with the actual labels in the validation dataset

  • Calculating a metric to indicate the accuracy of the model's predictions

Example of Regression

  • Training a model to predict ice cream sales based on temperature as the feature

  • Historic data includes records of daily temperatures and ice cream sales.

Diagram showing the process of training and evaluating a supervised model.
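
The steps listed above can be sketched for the ice cream example with scikit-learn; since the actual historic records aren't given, the temperature and sales values below are synthetic assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
temperature = rng.uniform(10, 40, size=100).reshape(-1, 1)      # feature x
sales = 2.5 * temperature.ravel() + rng.normal(0, 5, size=100)  # label y

# Split the data: one subset to train the model, one held back for validation.
X_train, X_val, y_train, y_val = train_test_split(temperature, sales, test_size=0.3, random_state=0)

# Fit the training data to a model using a regression algorithm.
model = LinearRegression().fit(X_train, y_train)

# Predict labels for the validation features and compare with the actual labels.
y_pred = model.predict(X_val)
print("MAE:", mean_absolute_error(y_val, y_pred))
```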

Mean Absolute Error (MAE)

  • The mean absolute error (MAE) measures the average absolute difference between predicted and actual values.

  • In the ice cream example, the MAE is calculated by finding the mean of the absolute errors (2, 3, 3, 1, 2, and 3), resulting in a value of 2.33.

Mean Squared Error (MSE)

  • The mean squared error (MSE) measures the average squared difference between predicted and actual values.

  • It amplifies larger errors by squaring individual errors and calculating the mean of the squared values.

  • In the ice cream example, the MSE is calculated by finding the mean of the squared errors (4, 9, 9, 1, 4, and 9), resulting in a value of 6.

Root Mean Squared Error (RMSE)

  • The root mean squared error (RMSE) is calculated by taking the square root of the MSE.

  • In the ice cream example, the RMSE is calculated as the square root of 6, resulting in a value of 2.45 (ice creams).
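
The three error metrics above can be checked with a few lines of Python using just the six absolute errors quoted in the example.

```python
import math

absolute_errors = [2, 3, 3, 1, 2, 3]

mae = sum(absolute_errors) / len(absolute_errors)
mse = sum(e ** 2 for e in absolute_errors) / len(absolute_errors)
rmse = math.sqrt(mse)

print(round(mae, 2))   # 2.33
print(mse)             # 6.0
print(round(rmse, 2))  # 2.45
```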

Coefficient of determination (R2)

  • The coefficient of determination (R2) measures the proportion of variance in the validation results explained by the model.

  • R2 values range between 0 and 1, with higher values indicating a better fit.

  • In the ice cream example, the R2 calculated from the validation data is 0.95, indicating that the model explains 95% of the variance in the data.
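
For completeness, R2 can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares; the actual and predicted values below are hypothetical and won't reproduce the 0.95 quoted above, since the underlying validation data isn't given.

```python
# R2 = 1 - (residual sum of squares / total sum of squares), on hypothetical data.
actual    = [10, 14, 18, 22, 26]
predicted = [11, 13, 19, 21, 27]

mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)

r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # close to 1 means the model explains most of the variance
```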

Iterative training

  • In real-world scenarios, data scientists use an iterative process to train and evaluate models.

  • This process involves varying feature selection, algorithm selection, and algorithm parameters to improve model performance.

Selection of the best model

  • The model that results in the best evaluation metric is selected

  • The selected model should have an acceptable evaluation metric for the specific scenario.
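
One way to picture this iteration, as a rough sketch rather than a prescribed method, is to train several candidate algorithms on the same train/validation split and keep whichever gives the best validation metric; the data and the candidate models below are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic regression data (feature: temperature, label: sales).
rng = np.random.default_rng(1)
X = rng.uniform(10, 40, size=(100, 1))
y = 2.5 * X.ravel() + rng.normal(0, 5, size=100)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Try several candidate algorithms and score each on the validation data.
candidates = {
    "linear regression": LinearRegression(),
    "lasso": Lasso(alpha=1.0),
    "decision tree": DecisionTreeRegressor(max_depth=3, random_state=0),
}
scores = {name: mean_absolute_error(y_val, algo.fit(X_train, y_train).predict(X_val))
          for name, algo in candidates.items()}

# Select the model with the best (lowest) validation MAE.
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```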

Binary classification

Classification in machine learning

  • Classification is a supervised machine learning technique

  • It follows an iterative process of training, validating, and evaluating models

Binary classification

  • Binary classification predicts one of two possible labels for a single class, such as true/false or positive/negative

  • Real scenarios often use multiple features (x); the label (y) takes a value of 1 or 0

Example - binary classification

  • In a simplified example, blood glucose level is used to predict diabetes

  • The model predicts whether the label (y) is 1 (diabetes) or 0 (no diabetes)

Training a binary classification model

  • To train the model, we use an algorithm to fit the training data to a function that calculates the probability of the class label being true (diabetes)

  • The probability is measured on a scale from 0.0 to 1.0, where values closer to 1.0 indicate a higher probability of having diabetes

  • The function describes the probability of the class label being true for a given value of x

  • Three observations in the training data have a known class label of true (1.0), and three observations have a known class label of false (0.0)

  • The fitted function follows an S-shaped (sigmoid) curve; probabilities above the threshold are predicted as true (1) and probabilities below it as false (0)

  • The threshold is defined at a probability of 0.5

  • By applying the function to new data, we can predict the class label (diabetes) based on the probability output
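
As a minimal sketch of this process (with hypothetical glucose readings and labels, not real clinical data), scikit-learn's logistic regression fits exactly this kind of S-shaped probability function, and the 0.5 threshold can then be applied to its output.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: three patients with diabetes (1) and three without (0).
glucose = np.array([[67], [103], [114], [72], [116], [65]])  # feature x
diabetic = np.array([0, 1, 1, 0, 1, 0])                      # known label y

model = LogisticRegression().fit(glucose, diabetic)

# Inferencing on a new patient: probability that the class label is true (diabetes).
new_patient = np.array([[90]])
probability = model.predict_proba(new_patient)[0, 1]
prediction = int(probability >= 0.5)  # apply the 0.5 threshold
print(round(probability, 2), prediction)
```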