Course Outline

Introduction

This section introduces when machine learning is an appropriate tool, what to consider before applying it, and its wider implications, including its advantages and disadvantages. Topics include data types (structured, unstructured, static, streamed); data validity and volume; data-driven versus user-driven analytics; statistical models versus machine learning models; the challenges of unsupervised learning; the bias-variance trade-off; iteration and evaluation; cross-validation methods; and the distinctions between supervised, unsupervised, and reinforcement learning.

MAJOR TOPICS

1. Understanding naive Bayes

  • Core concepts of Bayesian methods
  • Probability theory
  • Joint probability
  • Conditional probability utilizing Bayes' theorem
  • The naive Bayes algorithm
  • Naive Bayes classification
  • The Laplace estimator
  • Applying numeric features with naive Bayes
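
As a rough illustration of how the naive Bayes algorithm and the Laplace estimator fit together, the sketch below counts categorical feature values per class and adds a small constant to every count so that an unseen feature value does not force a probability of zero. The function names and the toy spam/ham data are hypothetical and are not part of the course material.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, laplace=1):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values, laplace

def predict_proba(model, row):
    """Return normalized posterior probabilities for each class."""
    class_counts, value_counts, feature_values, laplace = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            n_values = len(feature_values[i])
            # Laplace estimator: add `laplace` to every count.
            score *= (count + laplace) / (n_label + laplace * n_values)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy data: (first word, contains a link?) -> class
rows = [("offer", "link"), ("offer", "nolink"), ("hello", "nolink"), ("hello", "nolink")]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(rows, labels)
print(predict_proba(model, ("hello", "link")))
```

Without the Laplace estimator (laplace=0), the message above would get a zero score for both classes, because each class has one feature value it has never seen.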

2. Understanding decision trees

  • Divide and conquer strategy
  • The C5.0 decision tree algorithm
  • Selecting the optimal split
  • Pruning the decision tree
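
One common way to select a split is to compare the information gain (the reduction in entropy) offered by each candidate feature. The sketch below shows only that basic calculation; C5.0 itself applies refinements that are not reproduced here, and the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    weighted = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - weighted

labels = ["yes", "yes", "no", "no"]
print(information_gain(["a", "a", "b", "b"], labels))  # feature separates the classes
print(information_gain(["a", "b", "a", "b"], labels))  # feature carries no information
```

A divide-and-conquer tree builder would pick the feature with the highest gain, split on it, and repeat on each partition.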

3. Understanding neural networks

  • Transition from biological to artificial neurons
  • Activation functions
  • Network topology
  • Determining the number of layers
  • Direction of information flow
  • Node count per layer
  • Training neural networks via backpropagation
  • Deep learning
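
To make activation functions, network topology, and the direction of information flow more concrete, here is a minimal sketch of a forward pass through a small feedforward network with sigmoid activations. It covers the forward direction only, not backpropagation; the layer layout and the weights are arbitrary values chosen for illustration.

```python
from math import exp

def sigmoid(x):
    """Logistic activation function, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute one layer; weights[j] holds the incoming weights of node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in order."""
    for weights, biases in layers:
        inputs = layer_forward(inputs, weights, biases)
    return inputs

# Hypothetical topology: 2 inputs -> 2 hidden nodes -> 1 output node
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
print(network_forward([1.0, 0.0], layers))
```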

4. Understanding support vector machines

  • Classification using hyperplanes
  • Maximizing the margin
  • Handling linearly separable data
  • Handling non-linearly separable data
  • Utilizing kernels for non-linear spaces

5. Understanding clustering

  • Clustering as a machine learning objective
  • The k-means clustering algorithm
  • Using distance metrics for cluster assignment and updates
  • Determining the appropriate number of clusters
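
The assignment and update steps of k-means can be sketched in a few lines. This is a simplified version that assumes Euclidean distance and caller-supplied starting centers; practical implementations usually choose the starting centers randomly and run several restarts.

```python
from math import dist

def k_means(points, centers, iterations=10):
    """Alternate assignment and update steps until the centers stop moving."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                new_centers.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centers.append(center)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```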

6. Measuring performance for classification

  • Working with classification prediction data
  • Examining confusion matrices in detail
  • Evaluating performance using confusion matrices
  • Performance metrics beyond accuracy
  • The kappa statistic
  • Sensitivity and specificity
  • Precision and recall
  • The F-measure
  • Visualizing performance tradeoffs
  • ROC curves
  • Estimating future performance
  • The holdout method
  • Cross-validation
  • Bootstrap sampling
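
Most of the metrics listed above can be derived from the four cells of a two-class confusion matrix. The sketch below computes them for a binary problem; the function name and the sample predictions are made up for illustration, and the code assumes every denominator is non-zero.

```python
def classification_metrics(actual, predicted, positive=1):
    """Derive common metrics from the cells of a 2x2 confusion matrix."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # Kappa adjusts accuracy for the agreement expected by chance alone.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(actual, predicted))
```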

7. Tuning standard models for enhanced performance

  • Automated parameter tuning using caret
  • Creating a simple tuned model
  • Customizing the tuning process
  • Improving model performance through meta-learning
  • Understanding ensembles
  • Bagging
  • Boosting
  • Random forests
  • Training random forests
  • Evaluating random forest performance

MINOR TOPICS

8. Understanding classification using nearest neighbors

  • The kNN algorithm
  • Calculating distance
  • Selecting an appropriate k
  • Preparing data for kNN application
  • Why is the kNN algorithm considered lazy?
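
A minimal kNN classifier shows why the algorithm is called lazy: there is no training step, and all of the work happens at prediction time. The sketch assumes Euclidean distance and features that are already on comparable scales; the names and data are illustrative.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (8.5, 8.5)))
```

Because distances drive the vote, features measured on very different scales would normally be rescaled first, which is the data preparation step listed above.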

9. Understanding classification rules

  • Separate and conquer approach
  • The One Rule algorithm
  • The RIPPER algorithm
  • Deriving rules from decision trees

10. Understanding regression

  • Simple linear regression
  • Ordinary least squares estimation
  • Correlations
  • Multiple linear regression
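
For simple linear regression, the ordinary least squares estimates have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below computes these along with Pearson's correlation; the function name and data are illustrative only.

```python
from math import sqrt

def simple_ols(x, y):
    """Return (intercept, slope, correlation) for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / sqrt(sxx * syy)
    return intercept, slope, correlation

print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))
```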

11. Understanding regression trees and model trees

  • Incorporating regression into trees

12. Understanding association rules

  • The Apriori algorithm for association rule learning
  • Measuring rule interest through support and confidence
  • Constructing a set of rules using the Apriori principle
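
Support and confidence can be computed directly from a list of transactions. The sketch below shows only these two measures, not the Apriori candidate-generation procedure itself; the basket data is invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

The Apriori principle relies on the fact that an itemset can only be frequent if all of its subsets are frequent, so itemsets with low support can be discarded before larger ones are examined.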

Extras

  • Spark/PySpark/MLlib and Multi-armed bandits

Requirements

Knowledge of Python

Duration: 21 hours
