Introduction
Download Resources
Download Source Code for Spark Course
Download Data for Spark Course
Introduction to Spark and Spark Architecture Components
Introduction to Apache Spark
(Old) Free Account Creation in Databricks
(New) Free Account Creation in Databricks
Provisioning a Spark Cluster
Notebook Basics
Why Should We Learn Apache Spark?
Spark Architecture Components
Driver
Partitions
Executors
Spark Execution
Spark Jobs
Spark Stages
Spark Tasks
Practical Demonstration of Jobs, Tasks and Stages
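A minimal sketch of how jobs, stages, and tasks fit together (the session setup is an assumption for a local run; in Databricks a `spark` session is already provided). One action triggers one job, the shuffle introduced by reduceByKey splits that job into two stages, and each stage runs one task per partition:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: app name and master are assumptions for a local run.
val spark = SparkSession.builder()
  .appName("jobs-stages-tasks")
  .master("local[4]")
  .getOrCreate()
val sc = spark.sparkContext

val counts = sc.parallelize(1 to 1000, numSlices = 8)
  .map(n => (n % 10, 1))  // narrow: stays in the first stage
  .reduceByKey(_ + _)     // wide: shuffle boundary starts a second stage

counts.collect()          // the action: triggers one job, visible in the Spark UI

spark.stop()
```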
Spark SQL, DataFrames and Datasets
Spark RDD (Create and Display Practical)
Spark DataFrame (Create and Display Practical)
Anonymous Functions in Scala
Extra (Optional) on Spark DataFrame
Extra (Optional) on Spark DataFrame in Detail
Spark Datasets (Create and Display Practical)
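A minimal sketch contrasting the three abstractions covered above (assuming a spark-shell or Databricks session where `spark` is in scope; the Person case class and sample rows are illustrative):

```scala
import spark.implicits._

case class Person(name: String, age: Int)

// RDD: low-level distributed collection of arbitrary objects
val rdd = spark.sparkContext.parallelize(Seq(("Alice", 34), ("Bob", 45)))
rdd.collect().foreach(println)

// DataFrame: untyped rows with a named-column schema
val df = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")
df.show()

// Dataset: the same data with compile-time types
val ds = Seq(Person("Alice", 34), Person("Bob", 45)).toDS()
ds.show()
```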
Caching
Notes on Reading Files with Spark
Data Source CSV File
Data Source JSON File
Data Source LIBSVM File
Data Source Image File
Data Source Avro File
Data Source Parquet File
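The readers for all of these formats share one DataFrameReader interface; a sketch with hypothetical file paths (Avro additionally needs the spark-avro package on the cluster):

```scala
// Paths below are hypothetical placeholders.
val csvDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/people.csv")

val jsonDf    = spark.read.json("/data/people.json")
val parquetDf = spark.read.parquet("/data/people.parquet")

// Requires the org.apache.spark:spark-avro package on the classpath.
val avroDf = spark.read.format("avro").load("/data/people.avro")

// Image files load as a DataFrame with a struct column of pixel data.
val imageDf = spark.read.format("image").load("/data/images/")

// LIBSVM loads into a DataFrame of (label, features) for MLlib.
val libsvmDf = spark.read.format("libsvm").load("/data/sample.libsvm")
```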
Untyped Dataset Operations (aka DataFrame Operations)
Running SQL Queries Programmatically
Global Temporary View
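A short sketch of both view scopes (assuming an existing `spark` session; the DataFrame and view names are illustrative):

```scala
import spark.implicits._

val df = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")

// Session-scoped: visible only in this SparkSession.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 40").show()

// Global: registered in the global_temp database, visible across sessions.
df.createGlobalTempView("people_global")
spark.sql("SELECT * FROM global_temp.people_global").show()
spark.newSession().sql("SELECT * FROM global_temp.people_global").show()
```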
Creating Datasets
Scalar Functions (Built-in Scalar Functions) Part 1
Scalar Functions (Built-in Scalar Functions) Part 2
Scalar Functions (Built-in Scalar Functions) Part 3
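A few of the built-in scalar functions from org.apache.spark.sql.functions, as a sketch (the column names and values are illustrative):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("  Spark  ", 2.71828)).toDF("name", "value")

df.select(
  trim($"name").as("trimmed"),      // string function
  upper($"name").as("upper"),
  round($"value", 2).as("rounded"), // math function
  current_date().as("today")        // date/time function
).show()
```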
User Defined Scalar Functions
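A minimal user-defined scalar function sketch (the function body and names are illustrative, not from the course):

```scala
import org.apache.spark.sql.functions.udf
import spark.implicits._

// DataFrame-API form
val shout = udf((s: String) => s.toUpperCase)
val df = Seq("spark", "scala").toDF("word")
df.select(shout($"word").as("loud")).show()

// SQL form: register under a name, then call it from a query
spark.udf.register("shout", (s: String) => s.toUpperCase)
df.createOrReplaceTempView("words")
spark.sql("SELECT shout(word) AS loud FROM words").show()
```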
Spark RDD
Operations
Transformations
map(func)
filter(func)
flatMap(func)
mapPartitions(func)
mapPartitionsWithIndex(func)
sample(withReplacement, fraction, seed)
union(otherDataset)
intersection(otherDataset)
distinct([numPartitions])
groupBy(func)
groupByKey([numPartitions])
reduceByKey(func, [numPartitions])
aggregateByKey(zeroValue)(seqOp, combOp, [numPartitions])
sortByKey([ascending], [numPartitions])
join(otherDataset, [numPartitions])
cogroup(otherDataset, [numPartitions])
cartesian(otherDataset)
coalesce(numPartitions)
repartition(numPartitions)
repartitionAndSortWithinPartitions(partitioner)
Wide vs Narrow Transformations
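A sketch that contrasts the two kinds of transformation (assuming an existing SparkContext `sc`); toDebugString shows where a shuffle splits the lineage into stages:

```scala
val nums = sc.parallelize(1 to 100, numSlices = 4)

// Narrow: each output partition depends on exactly one input partition,
// so no data crosses the network.
val narrow = nums.map(_ * 2).filter(_ % 3 == 0)

// Wide: grouping by key needs rows from every partition, forcing a shuffle.
val wide = nums.map(n => (n % 5, n)).reduceByKey(_ + _)

println(narrow.toDebugString) // one stage
println(wide.toDebugString)   // ShuffledRDD marks the stage boundary
```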
Actions
reduce(func)
collect()
count()
first()
take(n)
takeSample(withReplacement, num, [seed])
takeOrdered(n, [ordering])
countByKey()
foreach(func)
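Unlike transformations, each of these actions runs immediately and returns a value to the driver; a quick sketch of several of them (assuming an existing SparkContext `sc`, with illustrative data):

```scala
val rdd = sc.parallelize(Seq(5, 3, 8, 1, 9, 2))

println(rdd.reduce(_ + _))          // 28: combine all elements
println(rdd.count())                // 6
println(rdd.first())                // 5
println(rdd.take(3).toList)         // List(5, 3, 8)
println(rdd.takeOrdered(3).toList)  // List(1, 2, 3)

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
println(pairs.countByKey())         // Map(a -> 2, b -> 1)

rdd.foreach(x => println(x))        // side effect runs on the executors, not the driver
```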
Shuffling
Persistence (Cache)
Unpersist
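A minimal persistence sketch (assuming an existing SparkContext `sc`): the first action computes and caches the RDD, later actions reuse it, and unpersist releases the storage:

```scala
import org.apache.spark.storage.StorageLevel

val expensive = sc.parallelize(1 to 1000000).map(n => n.toLong * n)

expensive.persist(StorageLevel.MEMORY_AND_DISK) // or expensive.cache() for MEMORY_ONLY
println(expensive.count()) // first action: computes and caches the partitions
println(expensive.sum())   // second action: reads from the cache

expensive.unpersist()      // release the cached partitions when done
```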
Broadcast Variables
Accumulators
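A sketch combining both shared-variable types (assuming an existing SparkContext `sc`; the lookup table and names are illustrative):

```scala
// Broadcast: ship a read-only lookup table to every executor once.
val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))

// Accumulator: executors add to it; only the driver reads the result.
// Note: updates made inside a transformation may be re-applied on task
// retries, so counts here are best-effort.
val missing = sc.longAccumulator("missing keys")

val labelled = sc.parallelize(Seq(1, 2, 3, 4)).map { n =>
  lookup.value.getOrElse(n, { missing.add(1); "unknown" })
}
labelled.collect().foreach(println)
println(s"Keys not found: ${missing.value}")
```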
Preview - Apache Spark with Scala (Useful for Databricks Certification)