Introduction
Download Resources
Download Source Code for Spark Course
Download Data for Spark Course
Setting Up the Environment
Impact of Databricks Community Edition Changes (2026) & Transition to Zeppelin
Requirements
(Hands On) Installing Java
Steps for Installing Java
(Hands On) Setting Java Environment Variables
Steps for Setting Java Environment Variables
(Hands On) Apache Zeppelin Installation Steps on an Ubuntu Machine
Steps for Installing Apache Zeppelin on an Ubuntu Machine
(Hands On) Installing Docker Desktop on Windows 10/11
Steps for Installing Docker on Windows
(Hands On) Running Apache Zeppelin on Docker (Windows)
Steps for Running Apache Zeppelin on Docker
(Hands On) Configuring and Connecting to the Spark Interpreter
Steps for Configuring and Connecting to the Spark Interpreter
Zeppelin Basics
What is Apache Zeppelin?
Features & Benefits
Notebook UI Overview
Markdown and text formatting
Creating and Running Paragraphs
Hands On - Creating and Running Paragraphs
Visualization Options (Tables, Bar Chart, Pie Chart, etc.)
Hands On - Types of Default Charts in Zeppelin
Zeppelin with Apache Spark
Spark Interpreter Details
Working with RDDs and DataFrames
Spark SQL queries and caching
Visualizing Spark outputs
Job tracking and performance tuning basics
Introduction to Spark and Spark Architecture Components
Introduction to Apache Spark
(Old) Free Account creation in Databricks
(New) Free Account creation in Databricks
Provisioning a Spark Cluster
Basics about notebooks
Why Should We Learn Apache Spark?
Spark Architecture Components
Driver
Partitions
Executors
Spark Execution
Spark Jobs
Spark Stages
Spark Tasks
Practical Demonstration of Jobs, Tasks and Stages
Spark SQL, DataFrames and Datasets
Spark RDD (Create and Display Practical)
Spark Dataframe (Create and Display Practical)
Anonymous Functions in Scala
Extra (Optional on Spark DataFrame)
Extra (Optional on Spark DataFrame) in Details
Spark Datasets (Create and Display Practical)
Caching
Notes on reading files with Spark
Data Source CSV File
Data Source JSON File
Data Source LIBSVM File
Data Source Image File
Data Source Avro File
Data Source Parquet File
Untyped Dataset Operations (aka DataFrame Operations)
Running SQL Queries Programmatically
Global Temporary View
Creating Datasets
Scalar Functions (Built-in Scalar Functions) Part 1
Scalar Functions (Built-in Scalar Functions) Part 2
Scalar Functions (Built-in Scalar Functions) Part 3
User Defined Scalar Functions
Spark RDD
Operations
Transformations
map(function)
filter(function)
flatMap(function)
mapPartitions(func)
mapPartitionsWithIndex(func)
sample(withReplacement, fraction, seed)
union(otherDataset)
intersection(otherDataset)
distinct([numPartitions])
groupBy(func)
groupByKey([numPartitions])
reduceByKey(func, [numPartitions])
aggregateByKey(zeroValue)(seqOp, combOp, [numPartitions])
sortByKey([ascending], [numPartitions])
join(otherDataset, [numPartitions])
cogroup(otherDataset, [numPartitions])
cartesian(otherDataset)
coalesce(numPartitions)
repartition(numPartitions)
repartitionAndSortWithinPartitions(partitioner)
Wide vs Narrow Transformations
Actions
reduce(func)
collect()
count()
first()
take(n)
takeSample(withReplacement, num, [seed])
takeOrdered(n, [ordering])
countByKey()
foreach(func)
Shuffling
Persistence (Cache)
Unpersist
Broadcast Variables
Accumulators
Preview - Apache Spark with Scala useful for Databricks Certification