Section 1
Introduction
How to unzip .gz files in a new directory in Hadoop?
Scenario Based Question
How does the Hadoop NameNode failover process work?
Scenario Based Question
How can we initiate a manual failover when automatic failover is configured?
When should you not use Hadoop?
Is there a simple Hadoop command that can rename a file?
When To Use Hadoop?
Scenario Based Question
Section 2
Can multiple files in HDFS use different block sizes?
Scenario Based Question
Hadoop is described as highly scalable. How well does it scale?
What platforms and Java versions does Hadoop run on?
What kind of hardware scales best for Hadoop?
Is there an easy way to see the status and health of a cluster?
Scenario Based Question
Scenario Based Question
Scenario Based Question
Scenario Based Question
Section 3
I am seeing connection refused in the logs. How do I troubleshoot this?
Does Hadoop require SSH?
What does NFS: Cannot create lock on (some dir) mean?
Scenario Based Question
Scenario Based Question
Scenario Based Question
Scenario Based Question
Scenario Based Question
Scenario Based Question
Scenario Based Question
Section 4
What is the purpose of the secondary name-node?
Scenario Based Question
How do I set up a Hadoop node to use multiple volumes?
Scenario Based Question
Does HDFS make block boundaries between records?
Do wildcard characters work correctly in FsShell?
What does "file could only be replicated to 0 nodes, instead of 1" mean?
Scenario Based Question
What happens when two clients try to write into the same HDFS file?
How to limit a DataNode's disk usage?
Section 5
Scenario Based Question
Scenario Based Question
On an individual data node, how do you balance the blocks on the disk?
Scenario Based Question
Difference between hadoop fs -put and hadoop fs -copyFromLocal?
Scenario Based Question
How to check HDFS Directory size?
Scenario Based Question
On what concept does the Hadoop framework work?
What is Hadoop streaming?
Section 6
Explain the process of inter-cluster data copying.
Scenario Based Question
Differentiate between structured and unstructured data.
Explain the difference between NameNode, Backup Node and Checkpoint NameNode?
How can you overwrite the replication factors in HDFS?
What is the process to change the files at arbitrary locations in HDFS?
Explain the indexing process in HDFS.
What is rack awareness, and on what basis is data stored in a rack?
What happens to a NameNode that has no data?
Scenario Based Question
Section 7
Scenario Based Question
Whenever a client submits a Hadoop job, who receives it?
What do you understand by edge nodes in Hadoop?
What are real-time industry applications of Hadoop?
In which modes can Hadoop be run?
Explain the major difference between HDFS block and InputSplit?
What are the most common Input Formats in Hadoop?
What is Speculative Execution in Hadoop?
What is Fault Tolerance?
What is a heartbeat in HDFS?
Section 8
How to keep HDFS cluster balanced?
How to deal with small files in Hadoop?
Scenario Based Question
What types of problems can MapReduce solve?
What is the difference between Hadoop MapReduce and Google MapReduce?
How to get the input file name in the mapper in a Hadoop program?
Scenario Based Question
Scenario Based Question
Scenario Based Question
Can you set the number of map tasks in MapReduce?
Section 9
If your MapReduce job launches 20 tasks, can you limit it to 10 tasks?
Scenario Based Question
What is Shuffling and Sorting in Hadoop MapReduce?
How do I submit extra content (jars, static files, etc.) for a MapReduce job to use
How do I get my MapReduce Java program to read the cluster's configuration?
Explain what happens when Hadoop spawned 50 tasks for a job and one of the task
What is OutputCommitter?
What is a RecordReader in MapReduce?
What is a MapReduce Combiner?
What do you understand by the term straggler?
Section 10
What are the identity mapper and identity reducer?
What is the role of a MapReduce partitioner?
When should you use a reducer?
What steps do you follow in order to improve the performance of a MapReduce job?
What is the purpose of the shuffling and sorting phase in the reducer in MapReduce
Scenario Based Question
What do you understand by compute and storage nodes?
Is it possible to rename the output file?
What is the default input type in MapReduce?
How is reporting controlled in Hadoop?
Section 11
Scenario Based Question
How do MapReduce InputSplits handle record boundaries correctly?
Scenario Based Question
Can we search files using wildcards?
What is the difference between Hadoop and RDBMS?
Can reducers communicate with each other?
What is a TaskInstance?
What are the primary phases of a Reducer?
Scenario Based Question
How do you gracefully stop a running job?
Section 12
How do I limit task slot usage?
How to increase the number of slots used?
Scenario Based Question
What is the process of changing the split size if there is limited storage space
Is it important for Hadoop MapReduce jobs to be written in Java?
What is the relationship between Job and Task in Hadoop?
When is it suggested to use a combiner in a MapReduce job?
Explain the differences between a combiner and reducer.
Where is Mapper output stored?
Is it possible to split 100 lines of input as a single split in MapReduce?
Section 13
List the configuration parameters that have to be specified when running a MRjob
Scenario Based Question
When is it not recommended to use the MapReduce paradigm for large-scale data?
What is the fundamental difference between a MapReduce split and an HDFS block?
What happens when a DataNode fails during the write process?
Scenario Based Question
How is data split in Hadoop?
Explain the basic parameters of the mapper and reducer functions.
Preview - Apache Hadoop and MapReduce Interview Questions and Answers