Preview Learn Big Data Hadoop: Hands-On for Beginner

Introduction to Big Data

Introduction

Introduction to Big Data

Three Vs of Big Data

How Big is BIG DATA?

How analysis of Big Data is useful for organizations?

Challenges of Traditional Systems

Big Data Engineering Learning Roadmap

Introduction to HADOOP

What is Hadoop

Why Hadoop and its Use Cases

Different Ecosystems of Hadoop

Structured, Unstructured, and Semi-Structured Data

Relation between Big Data and Hadoop

Future of Hadoop

Challenges with Big Data

Hadoop vs. RDBMS

Hadoop vs. Data Warehouse

Hadoop vs. Teradata

Types of Big Data Projects

What is a Cluster Environment?

What Is a Hadoop Cluster?

Apache Hadoop 3.3.0 Single Node Installation on Windows 10

Download Java 8 and Apache Hadoop 3.3.0

Installing and Configuring Java

Installing and Configuring Hadoop

Starting Apache Hadoop 3.3.0 Single Node Cluster

Stopping Apache Hadoop 3.3.0 Single Node Cluster

Apache Hadoop 3.3.0 Single Node Installation on Ubuntu Linux

Apache Hadoop 3.3.0 installation on Ubuntu Part 1

Apache Hadoop 3.3.0 installation on Ubuntu Part 2

(Ubuntu) Starting Apache Hadoop 3.3.0 Single Node Cluster

(Ubuntu) Stopping Apache Hadoop 3.3.0 Single Node Cluster

HDFS (Hadoop Distributed File System) Commands

Hadoop Distributed File System (HDFS)

File System (FS) shell

(Hands On) FileSystem Shell Command to Check Hadoop version

(Hands On) FileSystem Shell Command to get help for any command

(Hands On) FileSystem Shell Command to Make Directory in HDFS

(Hands On) FileSystem Shell Command to display data [cat]

(Hands On) FileSystem Shell Command [checksum]

(Hands On) FileSystem Shell Command [copyFromLocal]

(Hands On) FileSystem Shell Command [copyToLocal]

(Hands On) FileSystem Shell Command [count]

(Hands On) FileSystem Shell Command [cp]

(Hands On) FileSystem Shell Command [df]

(Hands On) FileSystem Shell Command [du]

(Hands On) FileSystem Shell Command [find]

(Hands On) FileSystem Shell Command [get]

(Hands On) FileSystem Shell Command [getfacl]

(Hands On) FileSystem Shell Command [head]

(Hands On) FileSystem Shell Command [ls]

(Hands On) FileSystem Shell Command [moveFromLocal]

(Hands On) FileSystem Shell Command [mv]

(Hands On) FileSystem Shell Command [put]

(Hands On) FileSystem Shell Command [rm]

(Hands On) FileSystem Shell Command [rmdir]

(Hands On) FileSystem Shell Command [tail]

(Hands On) FileSystem Shell Command [touchz]

(Hands On) FileSystem Shell Command to append data [appendToFile]

(Hands On) FileSystem Shell Command to change group [chgrp]

(Hands On) FileSystem Shell Command to change permission [chmod]

(Hands On) FileSystem Shell Command to change owner [chown]

(Hands On) FileSystem Shell Command to merge files [getmerge]

(Hands On) FileSystem Shell Command to change replication [setrep]

(Hands On) FileSystem Shell Command to view statistics [stat]

(Hands On) FileSystem Shell Command to change modification timestamp [touch]

(Hands On) FileSystem Shell Command to concat files [concat]

(Hands On) FileSystem Shell Command to display classpath [classpath]

(Hands On) FileSystem Shell Command to display environment variables [envvars]

(Hands On) FileSystem Shell Command fsck [fsck]

(Hands On) FileSystem Shell Command getconf [getconf]

(Hands On) FileSystem Shell Command group [group]

(Hands On) FileSystem Shell Command datanode [datanode]

HDFS and YARN Architecture

HDFS Overview

HDFS Architecture

Storage aspects of HDFS

Hadoop Modes of Installation

NameNode

DataNode

NodeManager

ResourceManager

Secondary NameNode

Data Replication

Rack Awareness

Robustness

HDFS Snapshot

Balancer

YARN

What is YARN?

Difference between MapReduce and YARN

YARN Architecture

Schedulers for YARN (CapacityScheduler/FairScheduler)

Example: Running MapReduce on YARN

YARN Web UI

MapReduce

Overview

What is MapReduce?

MapReduce Limitations

Mapper

Reduce

Shuffle

Sort

Secondary Sort

How Many Maps?

How Many Reduces?

Reducer NONE

Partitioner

Counter

InputSplit

RecordReader

Example

Apache Hadoop and MapReduce Interview FAQ

How to unzip .gz files in a new directory in Hadoop?

Scenario Based Question

Can multiple files in HDFS use different block sizes?

Do wildcard characters work correctly in FsShell?

How to deal with small files in Hadoop?

What steps do you follow to improve the performance of a MapReduce job?

What is the purpose of the shuffle and sort phases in the reducer in MapReduce?

Is it important for Hadoop MapReduce jobs to be written in Java?

Apache Pig

Introduction to Apache Pig

MapReduce vs. Apache Pig

Installing Apache Pig

Execution Modes

Batch Mode

Pig Latin Statements

Data types

Example of Simple Data Type

Example of Complex Data Type

Loading Data

Working with Data

FILTER operator (Hands On)

FOREACH operator (Hands On)

GROUP operator (Hands On)

COGROUP operator (Hands On)

JOIN operator (Hands On)

UNION operator (Hands On)

SPLIT operator (Hands On)

Storing Data (Hands On)

Debugging Pig Latin (Hands On)

DUMP operator (Hands On)

DESCRIBE operator (Hands On)

EXPLAIN operator (Hands On)

ILLUSTRATE operator (Hands On)

Comparison Operators (Hands On)

ORDER BY operator (Hands On)

RANK operator (Hands On)

Apache Pig - Built In Functions

AVG - Eval Functions (Hands On)

CONCAT - Eval Functions (Hands On)

COUNT - Eval Functions (Hands On)

MAX - Eval Functions (Hands On)

MIN - Eval Functions (Hands On)

SIZE - Eval Functions (Hands On)

SUM - Eval Functions (Hands On)

IN - Eval Functions (Hands On)

ABS - Math Functions (Hands On)

CBRT - Math Functions (Hands On)

FLOOR - Math Functions (Hands On)

LOG - Math Functions (Hands On)

FAQ in Apache Pig

Scenario Based Question (File modification based)

How to remove single quotes from data using Pig?

How to compute sum of a field in all the rows from an alias?

Is there a way to pass parameters to a Pig script (e.g., the name of the file to be processed)?

Scenario Based Question (Date)

How to transpose a few corresponding columns in Pig?

Scenario Based Question (Programming)

Write a word count program in Pig.

How to load files with a different delimiter each time in Pig Latin?

Apache Hive

Introduction to Apache Hive

Hive Architecture

How a Hive Query Flows Through the System

Hive Features

Hive Limitations

Installation Steps of Apache Hive

Installing Apache Hive on Windows Machine using Docker Desktop

Installing Docker on Windows

Downloading Apache Hive Image on Docker

Running Apache Hive on Docker Desktop

Docker Hive Installation Commands

Hive Data Model Diagram

Tables

Partitions

Buckets or Clusters

Hive Data Types

Primitive Type

Complex Type

Create Database

Create Table

Create Table (Hands On)

Managed and External Tables

Managed and External Tables (Hands On)

Storage Formats

LOAD

SELECT

INSERT

UPDATE

DELETE

String Functions

Metastore

Partitions in Detail

Partitions (Hands On)

Bucketing Theory

Bucketing (Hands On)

Frequently Asked Interview Questions and Answers (Hive)

How to create a Hive table with a multi-character delimiter?

How to load data from a .txt file into a table stored as ORC in Hive?

How to skip header rows from a table in Hive?

How to create a single Hive table for small files without degrading performance?

How will you consume this CSV file into the Hive warehouse using the built-in SerDe?

Is it possible to change the default location of a managed table?

Can Hive queries be executed from script files? How?

Can we run Unix shell commands from Hive? Give an example.

Apache Sqoop

Introduction to Apache Sqoop

Installing Apache Sqoop on Ubuntu

MySQL client and Server Installation

Importing Data with Apache Sqoop

Sqoop-import

Mandatory Steps to be Performed in MySQL

Mandatory Steps to be Performed in the Sqoop lib Directory

Transferring an Entire Table

Specifying a Target Directory

Importing Only a Subset of Data

Protecting Your Password

Using a File Format

Compressing Imported Data

Speeding Up Transfers

Controlling Parallelism

Importing All Your Tables

Importing Only New Data (Incremental Import)

Free-Form Query Import

Exporting Data with Apache Sqoop

Sqoop-export

Transferring Data from Hadoop

Apache Flume

Introduction to Apache Flume

Installing Apache Flume on Ubuntu

Apache Flume Architecture

Features of Apache Flume

Pros and Cons of Apache Flume

When should you go for Apache Flume?

Apache Flume Applications

Hands on Example

Apache Kafka

What is event streaming?

Introduction to Apache Kafka

How does Kafka work in a nutshell?

Elements of Kafka

Core Components of Apache Kafka

Installing Single Node Kafka Cluster

Sending a Data File to a Kafka Topic

Reading a Kafka Topic

Kafka Command-Line Interface (CLI) Tools

kafka-server-start.sh

kafka-server-stop.sh

zookeeper-server-start.sh

zookeeper-server-stop.sh

kafka-cluster.sh

kafka-broker-api-versions.sh

kafka-topics.sh

kafka-console-producer.sh

kafka-console-consumer.sh

kafka-producer-perf-test.sh

Kafka Topic Operations

Add a Topic

Describe a Topic

Change the retention value for a topic

Increase partitions for a topic

Delete a Topic

Python using Databricks

Getting Started with Python

Variables and Data Types

Conditionals and Loops

Methods, Functions, and Packages

Collections and Classes
