Preview Apache Hadoop and MapReduce Interview Q&A

Section 1

Introduction

How to unzip .gz files in a new directory in Hadoop? (Theory)

How to unzip .gz files in a new directory in Hadoop? (Hands On)

Scenario Based Question

How does the Hadoop NameNode failover process work?

Scenario Based Question

How can we initiate a manual failover when automatic failover is configured?

When not to use Hadoop?

Is there a simple Hadoop command that can rename a file?

When to use Hadoop?

Scenario Based Question

Section 2

Can multiple files in HDFS use different block sizes?

Scenario Based Question

Hadoop is described as highly scalable; how well does it scale?

What platforms and Java versions does Hadoop run on?

What kind of hardware scales best for Hadoop?

Is there an easy way to see the status and health of a cluster?

Scenario Based Question

Section 3

How to Troubleshoot “Connection Refused” Errors in a Hadoop Cluster

Does Hadoop require SSH?

What does NFS: Cannot create lock on (some dir) mean?

Scenario Based Question

Section 4

What is the purpose of the Secondary NameNode?

Scenario Based Question

How to Configure Multiple Storage Volumes in Hadoop Nodes

Scenario Based Question

Does HDFS Maintain Record Boundaries Between Data Blocks?

Do wildcard characters work correctly in FsShell?

Hadoop Error: File Could Only Be Replicated to 0 Nodes

Scenario Based Question

What happens when two clients try to write into the same HDFS file?

How to limit a DataNode's disk usage?

Section 5

Scenario Based Question

On an individual DataNode, how do you balance the blocks on the disk?

Scenario Based Question

Difference between hadoop fs -put and hadoop fs -copyFromLocal?

Scenario Based Question

How to check HDFS Directory size?

Scenario Based Question

On what concept does the Hadoop framework work?

What is Hadoop streaming?

Section 6

Explain the process of inter-cluster data copying.

Scenario Based Question

Differentiate between structured and unstructured data.

Explain the differences between NameNode, Backup Node and Checkpoint NameNode.

How can you overwrite the replication factors in HDFS?

What is the process to change the files at arbitrary locations in HDFS?

Explain the indexing process in HDFS.

What is rack awareness, and on what basis is data stored in a rack?

What happens to a NameNode that has no data?

Scenario Based Question

Section 7

Scenario Based Question

When a client submits a Hadoop job, who receives it?

What do you understand by edge nodes in Hadoop?

What are real-time industry applications of Hadoop?

Which modes can Hadoop run in?

Explain the major difference between an HDFS block and an InputSplit.

What are the most common Input Formats in Hadoop?

What is Speculative Execution in Hadoop?

What is Fault Tolerance?

What is a heartbeat in HDFS?

Section 8

How to keep HDFS cluster balanced?

How to deal with small files in Hadoop?

Scenario Based Question

What types of problems can MapReduce solve?

What is the difference between Hadoop MapReduce and Google MapReduce?

How to get the input file name in the mapper in a Hadoop program?

Scenario Based Question

Can you set the number of map tasks in MapReduce?

Section 9

If your MapReduce job launches 20 tasks, can you limit it to 10 tasks?

Scenario Based Question

What is Shuffling and Sorting in Hadoop MapReduce?

How do I submit extra content (jars, static files, etc.) for a MapReduce job to use

How do I get my MapReduce Java program to read the cluster's configuration?

Explain what happens when Hadoop spawned 50 tasks for a job and one of the task

What is OutputCommitter?

What is a RecordReader in MapReduce?

What is a MapReduce Combiner?

What do you understand by the term straggler?

Section 10

What are the identity mapper and identity reducer?

What is the role of a MapReduce partitioner?

When should you use a reducer?

What steps do you follow in order to improve the performance of a MapReduce job?

What is the purpose of the shuffling and sorting phase in the reducer in MapReduce

Scenario Based Question

What do you understand by compute and storage nodes?

Is it possible to rename the output file?

What is the default input type in MapReduce?

How is reporting controlled in Hadoop?

Section 11

Scenario Based Question

How do Map/Reduce InputSplits handle record boundaries correctly?

Scenario Based Question

Can we search files using wildcards?

What is the difference between Hadoop and RDBMS?

Can reducers communicate with each other?

What is a TaskInstance?

What are the primary phases of a Reducer?

Scenario Based Question

How do you gracefully stop a running job?

Section 12

How do I limit task slot usage?

How to increase the number of slots used?

Scenario Based Question

What is the process of changing the split size if there is limited storage space

Is it important for Hadoop MapReduce jobs to be written in Java?

What is the relationship between Job and Task in Hadoop?

When is it suggested to use a combiner in a MapReduce job?

Explain the differences between a combiner and reducer.

Where is Mapper output stored?

Is it possible to split 100 lines of input as a single split in MapReduce?

Section 13

List the configuration parameters that have to be specified when running an MR job

Scenario Based Question

When is it not recommended to use MapReduce paradigm for large scale data?

What is the fundamental difference between a MapReduce split and an HDFS block?

What happens when a DataNode fails during the write process?

Scenario Based Question

How is data split in Hadoop?

Explain the basic parameters of the mapper and reducer functions.
