Preview Machine Learning with Spark 3.0

Introduction

Overview

What is Spark ML

Introduction to Machine Learning

Setting Up the Environment

Impact of Databricks Community Edition Changes (2026) & Transition to Zeppelin

Requirements

(Hands On) Installing Java

Steps for Installing Java

(Hands On) Setting Up the Java Environment

Steps for Setting Up the Java Environment

(Hands On) Apache Zeppelin Installation Steps on an Ubuntu Machine

Steps for Installing Apache Zeppelin on an Ubuntu Machine

(Hands On) Installing Docker Desktop on Windows 10/11

Steps for Installing Docker on Windows

(Hands On) Running Apache Zeppelin on Docker (Windows)

Steps for Running Apache Zeppelin on Docker

(Hands On) Configuring and Connecting to the Spark Interpreter

Steps for Configuring and Connecting to the Spark Interpreter

Zeppelin Basics

What is Zeppelin

Features & Benefits

Notebook UI Overview

Markdown and text formatting

Creating and running paragraphs

(Hands On) Creating and Running Paragraphs

Visualization Options (Tables, Bar chart, Pie chart, etc.)

(Hands On) Types of Default Charts in Zeppelin

Zeppelin with Apache Spark

Spark interpreter details

Working with RDDs and DataFrames

Spark SQL queries and caching

Visualizing Spark outputs

Job tracking and performance tuning basics

Apache Spark Basics (Optional)

Introduction to Apache Spark

(Old) Free Account creation in Databricks

Provisioning a Spark Cluster

Basics about notebooks

Why Should We Learn Apache Spark?

Spark RDD (Create and Display Practical)

Spark Dataframe (Create and Display Practical)

Anonymous Functions in Scala

Extra (Optional on Spark DataFrame)

Extra (Optional on Spark DataFrame) in Detail

Spark Datasets (Create and Display Practical)

Apache Spark Machine Learning

Types of Machine Learning

Steps Involved in a Machine Learning Program

Spark MLlib

Importing Notebook and Data Upload

Basic Statistics: Correlation

Data Source

Data Source CSV File

Data Source JSON File

Data Source LIBSVM File

Data Source Image File

Data Source Avro File

Data Source Parquet File

Machine Learning Data Pipeline Overview

Machine Learning Project as an Example

Machine Learning Pipeline Example Project (Will it Rain Tomorrow in Australia) 1

Machine Learning Pipeline Example Project (Will it Rain Tomorrow in Australia) 2

Machine Learning Pipeline Example Project (Will it Rain Tomorrow in Australia) 3

Components of a Machine Learning Pipeline

Extracting, transforming and selecting features

TF-IDF (Feature Extractor)

Word2Vec (Feature Extractor)

CountVectorizer (Feature Extractor)

FeatureHasher (Feature Extractor)

Tokenizer (Feature Transformers)

StopWordsRemover (Feature Transformers)

n-gram (Feature Transformers)

Binarizer (Feature Transformers)

PCA (Feature Transformers)

Polynomial Expansion (Feature Transformers)

Discrete Cosine Transform (DCT) (Feature Transformers)

StringIndexer (Feature Transformers)

IndexToString (Feature Transformers)

OneHotEncoder (Feature Transformers)

SQLTransformer (Feature Transformers)

VectorAssembler (Feature Transformers)

RFormula (Feature Selector)

ChiSqSelector (Feature Selector)

Classification Model

Decision tree classifier Project

Logistic Regression Model (a Classification Model, Despite "Regression" in the Name)

Naive Bayes Project (Iris flower class prediction)

Random Forest Classifier Project

Gradient-boosted tree classifier Project

Linear Support Vector Machine Project

One-vs-Rest classifier (a.k.a. One-vs-All) Project

Regression model

Linear Regression Model Project

Decision tree regression Model Project

Random forest regression Model Project

Gradient-boosted tree regression Model Project

Clustering KMeans Project (Mall Customer Segmentation)

Explanation of a Few Terms Used in the Model

Linear Regression Model Project - Predict Ads Click

Download Data

Download Source Code

Predict Ads Code and Data (Project)

Preview - Machine Learning with Spark 3.0