Company-Specific Interview Q&A

Hadoop Developer Interview Questions

Amazon 
Q. What is the difference between TextInputFormat and KeyValueTextInputFormat in Hadoop?

Q. A log file contains entries such as: user A visited page 1, user B visited page 3, user C visited page 2, user D visited page 4. How will you implement a Hadoop job to answer the following queries in real time: which page was visited by user C more than 4 times in a day, and which page was visited by only one user exactly 3 times in a day?

Q. What is the advantage of having a Distributed Cache in Hadoop?

Q. You have a file that contains 200 billion URLs. How will you find the first unique URL using Hadoop MapReduce?
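
One possible approach, sketched here in plain Python rather than Hadoop's Java API: the map step emits each URL with its position in the file, the reduce step keeps the earliest position and the total count per URL, and a final pass picks the URL with count 1 and the smallest position. The function names (map_phase, reduce_phase, first_unique_url) are hypothetical, and the in-memory dictionary only stands in for Hadoop's shuffle; this is an illustration of the idea, not a production job.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (url, (position, 1)) for each line; position is the line index."""
    for position, url in enumerate(lines):
        yield url.strip(), (position, 1)


def reduce_phase(pairs):
    """Per URL, keep the earliest position seen and the total occurrence count."""
    grouped = defaultdict(lambda: [None, 0])
    for url, (position, count) in pairs:
        entry = grouped[url]
        entry[0] = position if entry[0] is None else min(entry[0], position)
        entry[1] += count
    return grouped


def first_unique_url(lines):
    """Return the earliest URL that occurs exactly once, or None if there is none."""
    stats = reduce_phase(map_phase(lines))
    unique = [(pos, url) for url, (pos, count) in stats.items() if count == 1]
    return min(unique)[1] if unique else None
```

For example, first_unique_url(["a.com", "b.com", "a.com", "c.com"]) picks "b.com", because "a.com" repeats and "b.com" appears before "c.com".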

Q. What is InputSplit in Hadoop?

Q. How will you scale a system to handle huge amounts of unstructured data?

Q. Assume that the web server creates a log file with timestamp and query. How will you design the Hadoop architecture (explaining how you will store the data) so that it can return the top 15 queries made in the last 12 hours?

Q. You have a huge file (in GBs) that contains data in multiple languages. Find the n most frequently occurring patterns in a text file using Hadoop MapReduce.

Capgemini
Q. What is speculative execution in Hadoop?

Q. How are big data problems solved in the retail sector?

Q. What is the largest amount of data that you have handled?

MindTree
Q. What is a heap error and how can you fix it?

Q. How many joins does MapReduce have and when will you use each type of join?

Q. What are sinks and sources in Apache Flume when working with Twitter data?

Q. How many JVMs run on a DataNode and what is their use?

Q. If you have configured Java version 8 for Hadoop and Java version 7 for Apache Spark, how will you set the environment variables in the basic configuration file?

Q. Differentiate between bash and basic profile.

Infosys
Q. Implement a word count program in Apache Hive.

Q. Differentiate between bucketing and partitioning. When will you use each of these?

Q. How can you implement global sort and partitioning logic in Apache Hive?

Apple
Q. There are 100,000 files spread across multiple servers which need to be processed. How will you do that using Hadoop?

Q. What are the Map and Reduce functions in the standard Hadoop “Hello World” word count program?

Q. How will you manage multiple nodes together without having a master node in your architecture design?

Q. Find the occurrence of every word (the number of pages on which the word appears) in a huge file or book using Hadoop MapReduce.
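
A minimal sketch of one way to think about this, written in plain Python instead of a real MapReduce job: the map step emits each distinct word on a page together with the page number, and the reduce step counts how many distinct pages each word was seen on. The function names and the simple tokenizing regex are assumptions made for illustration, and the input is assumed to be already split into pages.

```python
import re
from collections import defaultdict


def map_page(page_no, text):
    """Emit (word, page_no) once per distinct word on the page."""
    for word in set(re.findall(r"[a-z0-9']+", text.lower())):
        yield word, page_no


def reduce_pages(pairs):
    """Count the number of distinct pages on which each word appears."""
    pages = defaultdict(set)
    for word, page_no in pairs:
        pages[word].add(page_no)
    return {word: len(page_set) for word, page_set in pages.items()}


def pages_per_word(pages):
    """Run the map and reduce steps over a list of page texts (pages numbered from 1)."""
    pairs = (
        pair
        for page_no, text in enumerate(pages, start=1)
        for pair in map_page(page_no, text)
    )
    return reduce_pages(pairs)
```

With the pages ["the cat sat", "the dog", "a cat the cat"], the word "the" maps to 3 pages and "cat" to 2, even though "cat" occurs twice on the last page.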

Accenture
Q. Can you load 3TB of data in Apache Hive?

Microsoft
Q. Explain the working of Hadoop architecture with various components.

Q. Why do you need HBase when you can use Hive to query Hadoop?

Expedia
Q. Every day a new log file is created that contains User ID details. Given a range of n days, how will you find the top 5 users?
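
A rough sketch of one approach, in plain Python rather than Hadoop code. It assumes that "top" means the users with the most log entries over the chosen days, which the question does not state. Each daily log is counted on its own (similar to a map step with a combiner), the per-day counts are merged (the reduce step), and the k largest totals are selected. The names count_day, merge_counts, and top_users are hypothetical.

```python
import heapq
from collections import Counter


def count_day(user_ids):
    """Map plus combine: count entries per user within one day's log."""
    return Counter(user_ids)


def merge_counts(daily_counts):
    """Reduce: sum the per-day counts into one total per user."""
    totals = Counter()
    for counts in daily_counts:
        totals.update(counts)
    return totals


def top_users(daily_logs, k=5):
    """Return the k (user_id, total) pairs with the highest totals."""
    totals = merge_counts(count_day(day) for day in daily_logs)
    # Ties on the total are broken by user ID so the result is deterministic.
    return heapq.nlargest(k, totals.items(), key=lambda kv: (kv[1], kv[0]))
```

Selecting only the days in the requested range is left to the caller: pass just those n daily logs to top_users.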

Google
Q. There is a table employee (employee_id int, employee_name varchar, employee_salary decimal, employee_manager_id int). We want to get the details of those employees who have a salary higher than their manager's or do not have a manager at all. Implement the mapper and reducer functions to achieve this using Hadoop.

Q. Can you design a counter across all the Google servers using the Hadoop stack?
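
One way to approach this is a reduce-side self-join. The sketch below uses plain Python to imitate the mapper, shuffle, and reducer; it is not Hadoop API code. Each row is emitted twice: once under its own ID tagged "M" (as a potential manager) and once under its manager's ID tagged "E" (as a report). Rows with no manager are assumed to carry None as the manager ID, salaries are simplified to integers, and reports whose manager row is missing are skipped; all three are assumptions of this sketch.

```python
from collections import defaultdict


def mapper(record):
    """Emit the row as a possible manager and as a report of its manager."""
    emp_id, name, salary, manager_id = record
    yield emp_id, ("M", salary)        # keyed by own ID: salary as a manager
    yield manager_id, ("E", record)    # keyed by manager ID (None if no manager)


def reducer(key, values):
    """Output reports that earn more than the manager identified by key."""
    values = list(values)
    if key is None:
        # Employees without a manager are always part of the result.
        for tag, payload in values:
            if tag == "E":
                yield payload
        return
    manager_salary = next((p for tag, p in values if tag == "M"), None)
    for tag, payload in values:
        if tag == "E" and manager_salary is not None and payload[2] > manager_salary:
            yield payload


def run(records):
    """Imitate the shuffle: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return sorted(output, key=lambda r: r[0])
```

In a real job the tagged values would be Writable types and the grouping would be done by the framework; the tagging idea is the part that carries over.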

Twitter
Q. Suggest an algorithm to design Twitter trends.

Q. Will you use Apache Pig or Hadoop MapReduce for ad-hoc and scheduled jobs?

Facebook
Q. There is a huge file that cannot fit into the memory, you have to calculate the number of unique words present in the file. Assume that you have more than one system available and the problem can be distributed.
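
A small sketch of the usual partitioning idea, in plain Python: every word is routed to a partition by hashing, so all copies of the same word land on the same partition, each partition can count its distinct words independently, and the per-partition counts can simply be added. The partitions here are in-memory sets standing in for separate machines, whitespace splitting is an assumed tokenizer, and the function names are hypothetical.

```python
import zlib


def partition_of(word, num_partitions):
    """Pick a partition deterministically from the word's CRC32 checksum."""
    return zlib.crc32(word.encode("utf-8")) % num_partitions


def count_unique_words(chunks, num_partitions=4):
    """Route words to partitions, then add up the distinct words per partition."""
    partitions = [set() for _ in range(num_partitions)]
    for chunk in chunks:
        for word in chunk.split():
            partitions[partition_of(word, num_partitions)].add(word)
    # A word lives in exactly one partition, so the sizes never double count.
    return sum(len(partition) for partition in partitions)
```

Each chunk stands for a piece of the file small enough to read at once; the full file never has to sit in memory on a single machine.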

Q. How does Facebook handle the single point of failure problem?

Q. Do you know about the AvatarNode implementation at Facebook?

Q. Facebook decides to award an Audi to the user who submits the billionth search query on a particular day by displaying a banner on their search results page. Considering the scale of Facebook, how will you implement it?

Q. How does Facebook store users' status updates and likes?

Q. Which database persists all Facebook messages sent from desktop and mobile?

TCS
Q. What is the difference between data and big data?

Q. Which object will you use to track the progress of a job?

Other Top Tech Companies like Cognizant, CTS, Wipro
Q. What Hadoop components will you use to design a Craigslist-based architecture?

Q. Why can you not use Java primitive data types in Hadoop MapReduce?

Q. Can HDFS blocks be broken?

Q. Does Hadoop replace data warehousing systems?

Q. How will you protect the data at rest?

Q. Propose a design to develop a system that can handle ingestion of both periodic data and real-time data.

Q. A folder contains 10,000 files, each larger than 3 GB. The files contain users, their names and dates. How will you get the count of all the unique users from the 10,000 files using Hadoop?

Q. "File could only be replicated to 0 nodes, instead of 1." Have you ever come across this message? What does it mean?

Q. How do reducers communicate with each other?

Q. How can you back up file system metadata in Hadoop?

Q. What do you understand by a straggler in the context of MapReduce? 

These are the Hadoop developer interview questions we have collected so far, and we would like your input: what questions were you asked in your Hadoop developer interview? Please comment below with your questions to help the Hadoop community at large.











Copyright © Become a Big Data - Hadoop Professional Distributed By ITGetup Team & Design by Hadoop Specialist Team