Hadoop Developer Interview Questions
Amazon
Q. What is the
difference between TextInputFormat and KeyValueTextInputFormat in Hadoop?
Q. A log file contains entries
such as: user A visited page 1, user B visited page 3, user C visited page 2,
user D visited page 4. How will you implement a Hadoop job that answers
the following queries in real time: which page did user C visit more
than 4 times in a day, and which page was visited by only one user exactly 3
times in a day?
Q. What is the advantage of
having a Distributed Cache in Hadoop?
Q. You have a file that contains
200 billion URLs. How will you find the first unique URL using Hadoop
MapReduce?
Q. What is InputSplit in
Hadoop?
Q. How will you scale a
system to handle huge amounts of unstructured data?
Q. Assume that the web
server creates a log file with timestamp and query. How will you design the
Hadoop architecture (explaining how you will store the data) so that it can
return the top 15 queries made in the last 12 hours?
Q. You have a huge file (several
GB in size) that contains data in multiple languages. How will you find the n
most frequently occurring patterns in it using Hadoop MapReduce?
Capgemini
Q. What is speculative
execution in Hadoop?
Q. How are big data problems
solved in the retail sector?
Q. What is the largest
amount of data that you have handled?
MindTree
Q. What is a heap error and how can
you fix it?
Q. How many types of joins does MapReduce
have and when will you use each type of join?
Q. What are sinks and sources in
Apache Flume when working with Twitter data?
Q. How many JVMs run on a DataNode
and what is their use?
Q. If you have configured Java
version 8 for Hadoop and Java version 7 for Apache Spark, how will you set the
environment variables in the basic configuration file?
Q. Differentiate between bash and
basic profile.
Infosys
Q. Implement a word count program in
Apache Hive.
Q. Differentiate between bucketing
and partitioning. When will you use each of these?
Q. How can you implement global
sort and partitioning logic in Apache Hive?
Apple
Q. There are 100,000 files spread
across multiple servers which need to be processed. How will you do that using
Hadoop?
Q. What are the Map and Reduce
functions in the standard Hadoop “Hello World” word count program?
Q. How will you manage multiple
nodes together without having a master node in your architecture design?
Q. Find the occurrence of every
word (the number of pages on which the word appears) in a huge file or book
using Hadoop MapReduce.
Accenture
Q. Can you load 3TB of data in Apache Hive?
Microsoft
Q. Explain how the Hadoop architecture works and the role of its various
components.
Q. Why do you need HBase when you can use Hive to query Hadoop?
Expedia
Q. Every day a new log file is created that contains User ID details.
Given a range of n days, how will you find the top 5 users?
Google
Q. There is a table employee (employee_id int, employee_name varchar,
employee_salary decimal, employee_manager_id int). We want to get the details
of those employees whose salary is higher than their manager's or who do not
have a manager at all. Implement the mapper and reducer functions to achieve
this using Hadoop.
Q. Can you design a counter across all the Google servers using
the Hadoop stack?
Twitter
Q. Suggest an algorithm to design Twitter trends.
Q. Will you use Apache Pig or Hadoop MapReduce for ad-hoc and
scheduled jobs?
Facebook
Q. There is a huge file that cannot fit into memory, and you have to
calculate the number of unique words present in the file. Assume that you have
more than one system available and the problem can be distributed.
Q. How does Facebook handle the single point of failure problem?
Q. Do you know about the AvatarNode implementation at Facebook?
Q. Facebook decides to award an Audi to the user who submits the
billionth search query on a particular day, announcing it with a banner on
their search results page. Considering the scale of Facebook, how will you
implement it?
Q. How does Facebook store users’ status updates and likes?
Q. In which database are all Facebook messages sent from desktop and mobile
persisted?
TCS
Q. What is the difference between data and big data?
Q. Which object will you use to track the progress of a job?
Other Top Tech Companies like Cognizant, CTS, Wipro
Q. What Hadoop components will you use to design a Craigslist-based
architecture?
Q. Why can't you use Java primitive data types in Hadoop MapReduce?
Q. Can HDFS blocks be broken?
Q. Does Hadoop replace data warehousing systems?
Q. How will you protect the data at rest?
Q. Propose a design to develop a system that can handle ingestion of
both periodic data and real-time data.
Q. A folder contains 10,000 files, each larger than 3 GB. The files
contain users, their names and date. How will you get the
count of all the unique users from the 10,000 files using Hadoop?
Q. “File could only be replicated to 0 nodes, instead of 1.” Have you ever
come across this message? What does it mean?
Q. How do reducers communicate with each other?
Q. How can you back up file system metadata in Hadoop?
Q. What do you understand by a straggler in the context of MapReduce?
So far we have been able to collect these Hadoop developer
interview questions, but we would like your input: what
questions were you asked in your Hadoop developer interview? Please comment
below with your questions to help the Hadoop community at large.