Hortonworks Certified Apache Hadoop 2.0 Developer: Practice Test
Question No : 1
What does the following command do?
register '/piggybank/pig-files.jar';
A. Invokes the user-defined functions contained in the jar file
B. Assigns a name to a user-defined function or streaming command
C. Transforms Pig user-defined functions into a format that Hive can accept
D. Specifies the location of the JAR file containing the user-defined functions
Answer: D
Question No : 2
Consider the following two relations, A and B.
A Pig JOIN statement that combines relation A by its first field and relation B by its second field would produce what output?
A. 2 Jim Chris 2
3 Terry 3
4 Brian 4
B. 2 cherry
2 cherry
3 orange
4 peach
C. 2 cherry Jim, Chris
3 orange Terry
4 peach Brian
D. 2 cherry Jim 2
2 cherry Chris 2
3 orange Terry 3
4 peach Brian 4
Answer: D
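The two relations are not reproduced above, but answer D implies tuples such as (2,cherry) in A and (Jim,2) in B. The sketch below uses those hypothetical tuples and a simple nested-loop equi-join in plain Java. This is not how Pig executes JOIN (its default join runs as a reduce-side join). It only shows why an inner join on A.$0 = B.$1 repeats an A tuple once per matching B tuple and appends B's fields after A's.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of an inner equi-join such as JOIN A BY $0, B BY $1 (not Pig itself).
class PigJoinSketch {
    // Each tuple is a List<String>; each output tuple is A's fields followed by B's fields.
    static List<List<String>> join(List<List<String>> a, int aKey,
                                   List<List<String>> b, int bKey) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> ta : a) {
            for (List<String> tb : b) {
                if (ta.get(aKey).equals(tb.get(bKey))) { // keep only matching pairs (inner join)
                    List<String> row = new ArrayList<>(ta);
                    row.addAll(tb);
                    out.add(row);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical tuples inferred from answer D; the original relations are not shown.
        List<List<String>> a = List.of(List.of("2", "cherry"), List.of("3", "orange"), List.of("4", "peach"));
        List<List<String>> b = List.of(List.of("Jim", "2"), List.of("Chris", "2"),
                                       List.of("Terry", "3"), List.of("Brian", "4"));
        for (List<String> row : join(a, 0, b, 1)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Pig does not guarantee the order of output tuples; the order here simply follows the loops.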
Question No : 3
You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?
A. HDFS command
B. Pig LOAD command
C. Sqoop import
E. Ingest with Flume
F. Ingest with Hadoop
Answer: C
Reference: Hadoop and Pig for Large-Scale Web Log Analysis
Question No : 4
You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:
output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));
How many times will the Reducer's reduce method be invoked?
A. 6
B. 3
C. 1
D. 0
E. 5
Answer: B
Explanation: reduce() gets called once for each [key, (list of values)] pair. To explain, let's say you called:
out.collect(new Text("Car"), new Text("Subaru"));
out.collect(new Text("Car"), new Text("Honda"));
out.collect(new Text("Car"), new Text("Ford"));
out.collect(new Text("Truck"), new Text("Dodge"));
out.collect(new Text("Truck"), new Text("Chevy"));
Then reduce() would be called twice, with the pairs
reduce(Car, <Subaru, Honda, Ford>)
reduce(Truck, <Dodge, Chevy>)
Reference: Mapper output.collect()?
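The following is a minimal plain-Java simulation of the shuffle, assuming the five collect() calls from the question. Pairs are grouped by key, and each distinct key produces one reduce() call. The TreeMap stands in for the sort phase. Real Hadoop does not guarantee the order of values within a key, although this sketch happens to keep emission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates shuffle grouping: one reduce() call per distinct intermediate key.
class ReduceCallCount {
    // Groups (key, value) pairs by key; TreeMap keeps keys sorted, like the sort phase.
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : pairs) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] emitted = {
            {"Apple", "Red"}, {"Banana", "Yellow"}, {"Apple", "Yellow"},
            {"Cherry", "Red"}, {"Apple", "Green"}
        };
        Map<String, List<String>> groups = group(emitted);
        groups.forEach((key, values) -> System.out.println("reduce(" + key + ", " + values + ")"));
        System.out.println("reduce() calls: " + groups.size());
    }
}
```

With more than one reducer the keys are partitioned across reduce tasks, but the total number of reduce() calls is still one per distinct key.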
Question No : 5
Given the following Pig commands:
Which one of the following statements is true?
A. The $1 variable represents the first column of data in 'my.log'
B. The $1 variable represents the second column of data in 'my.log'
C. The severe relation is not valid
D. The grouped relation is not valid
Answer: B
Question No : 6
What data does a Reducer reduce method process?
A. All the data in a single input file.
B. All data produced by a single mapper.
C. All data for a given key, regardless of which mapper(s) produced it.
D. All data for a given value, regardless of which mapper(s) produced it.
Answer: C
Explanation: Reducing lets you aggregate values together. A reducer function receives an iterator of input values from an input list. It then combines these values together, returning a single output value.
All values with the same key are presented to a single reduce task.
Reference: Yahoo! Hadoop Tutorial, Module 4: MapReduce
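Here is a small illustration of a key's values being gathered no matter which mapper emitted them. It is a hypothetical two-mapper word-count sketch in plain Java. The method names shuffle and reduce are illustrative only and are not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {
    // Merges every simulated mapper's output; values for one key end up together
    // regardless of which mapper emitted them.
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, List<Integer>> byKey = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> kv : output) {
                byKey.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return byKey;
    }

    // A summing reduce function: aggregates all values for one key into a single output value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapper1 = List.of(Map.entry("hadoop", 1), Map.entry("pig", 1));
        List<Map.Entry<String, Integer>> mapper2 = List.of(Map.entry("hadoop", 1), Map.entry("hadoop", 1));
        shuffle(List.of(mapper1, mapper2))
            .forEach((key, values) -> System.out.println(key + "\t" + reduce(key, values)));
    }
}
```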
Question No : 7
All keys used for intermediate output from mappers must:
A. Implement a splittable compression algorithm.
B. Be a subclass of
C. Implement WritableComparable.
D. Override isSplitable.
E. Implement a comparator for speedy sorting.
Answer: C
Explanation: The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
Reference: MapReduce Tutorial
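Hadoop's real interface is org.apache.hadoop.io.WritableComparable, which combines Writable (write/readFields) with Comparable. That library is not available in a self-contained example, so the sketch below declares a local stand-in interface with the same two methods. It then shows a hypothetical composite key implementing it: write and readFields handle serialization, and compareTo defines the sort order the framework would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Local stand-in with the same shape as Hadoop's WritableComparable; NOT the real interface.
interface WritableComparableSketch<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;    // serialize this object's fields
    void readFields(DataInput in) throws IOException; // overwrite fields from the stream
}

// Hypothetical intermediate key: sorts by year, then by station id.
class YearStationKey implements WritableComparableSketch<YearStationKey> {
    int year;
    String station = "";

    YearStationKey() {} // no-arg constructor: keys are created empty, then filled by readFields
    YearStationKey(int year, String station) { this.year = year; this.station = station; }

    @Override public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeUTF(station);
    }

    @Override public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        station = in.readUTF();
    }

    @Override public int compareTo(YearStationKey o) {
        int byYear = Integer.compare(year, o.year);
        return byYear != 0 ? byYear : station.compareTo(o.station);
    }

    // equals/hashCode matter too: Hadoop's default HashPartitioner uses hashCode() to pick a reducer.
    @Override public boolean equals(Object o) {
        if (!(o instanceof YearStationKey)) return false;
        YearStationKey k = (YearStationKey) o;
        return year == k.year && station.equals(k.station);
    }

    @Override public int hashCode() { return Objects.hash(year, station); }

    @Override public String toString() { return year + "/" + station; }

    // Serializes and deserializes a key, roughly what happens as map output is shuffled.
    static YearStationKey roundTrip(YearStationKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            key.write(out);
            out.flush();
            YearStationKey copy = new YearStationKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```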
Question No : 8
Which Hadoop component is responsible for managing the distributed file system?
A. NameNode
B. Metanode
C. DataNode
D. NameSpaceManager
Answer: A
Question No : 9
You need to move a file titled "weblogs" into HDFS. When you try to copy the file, you can't. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
A. Increase the block size on all current files in HDFS.
B. Increase the block size on your remaining files.
C. Decrease the block size on your remaining files.
D. Increase the amount of memory for the NameNode.
E. Increase the number of disks (or size) for the NameNode.
F. Decrease the block size on all current files in HDFS.
Answer: C
Question No : 10
In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next() method return?
A. It returns a reference to a different Writable object each time.
B. It returns a reference to a Writable object from an object pool.
C. It returns a reference to the same Writable object each time, but populated with different data.
D. It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object.
E. It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.
Answer: C
Explanation: Calling Iterator.next() will always return the SAME EXACT instance of IntWritable, with the contents of that instance replaced with the next value.
Reference: manipulating iterator in mapreduce
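The behaviour described in the explanation can be reproduced with a small plain-Java simulation. MutableInt is a hypothetical stand-in for IntWritable. The iterator refills one shared instance on every next() call, which is why storing the returned references without copying them is a bug.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simulation of the reduce-side value iterator reusing one object (plain Java, not Hadoop).
class ReusedIteratorSketch {
    // Hypothetical mutable holder standing in for IntWritable.
    static final class MutableInt {
        private int value;
        MutableInt() {}
        MutableInt(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    // Every next() call refills and returns the SAME MutableInt instance.
    static Iterator<MutableInt> reusingIterator(int[] raw) {
        MutableInt shared = new MutableInt();
        return new Iterator<MutableInt>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.length; }
            @Override public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    // Pitfall: the list holds N references to one object, so every element shows the last value.
    static List<MutableInt> keepReferences(int[] raw) {
        List<MutableInt> kept = new ArrayList<>();
        for (Iterator<MutableInt> it = reusingIterator(raw); it.hasNext(); ) kept.add(it.next());
        return kept;
    }

    // Fix: copy the contents before storing (in Hadoop, e.g. new IntWritable(value.get())).
    static List<MutableInt> copyValues(int[] raw) {
        List<MutableInt> kept = new ArrayList<>();
        for (Iterator<MutableInt> it = reusingIterator(raw); it.hasNext(); ) {
            kept.add(new MutableInt(it.next().get()));
        }
        return kept;
    }
}
```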
Question No : 11
MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.
A. Health state checks
B. Resource management
C. Job scheduling/monitoring
D. Job coordination between the ResourceManager and NodeManager
E. Launching tasks
F. Managing file system
G. MapReduce metric
H. Managing tasks
Answer: B,C
Explanation: The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of MapReduce jobs or a DAG of jobs.
The central goal of YARN is to clearly separate two things that are unfortunately smushed together in current Hadoop, specifically in (mainly) the JobTracker:
– Monitoring the status of the cluster with respect to which nodes have which resources available. Under YARN, this will be global.
– Managing the parallel execution of any specific job. Under YARN, this will be done separately for each job.
Reference: Apache Hadoop YARN – Concepts &
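As a rough illustration of this division of labour, here is a toy model using hypothetical classes, not the YARN API. The ResourceManager only tracks and hands out cluster-wide container capacity. A per-job ApplicationMaster decides how its own tasks run and requests containers from the ResourceManager. Real containers carry memory and CPU sizes and are requested over RPC; this sketch reduces them to a simple count.

```java
// Toy model of the YARN split; hypothetical classes, not the YARN API.
class YarnSplitSketch {
    // Global role: tracks free capacity across the whole cluster and hands it out.
    static final class ResourceManager {
        private int freeContainers;
        ResourceManager(int capacity) { this.freeContainers = capacity; }

        // Grants up to 'wanted' containers, limited by what is currently free.
        int allocate(int wanted) {
            int granted = Math.min(wanted, freeContainers);
            freeContainers -= granted;
            return granted;
        }

        void release(int containers) { freeContainers += containers; }
        int free() { return freeContainers; }
    }

    // Per-application role: plans and runs one job's tasks, requesting containers from the RM.
    static final class ApplicationMaster {
        private final ResourceManager rm;
        ApplicationMaster(ResourceManager rm) { this.rm = rm; }

        // Runs the job's tasks in waves (one task per container); returns the number of waves.
        int runTasks(int tasks) {
            int waves = 0;
            while (tasks > 0) {
                int granted = rm.allocate(tasks);
                if (granted == 0) throw new IllegalStateException("cluster has no free containers");
                tasks -= granted;    // assume each granted container completes one task
                rm.release(granted); // containers return to the global pool
                waves++;
            }
            return waves;
        }
    }
}
```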
Question No : 12
For each input key-value pair, mappers can emit:
A. As many intermediate key-value pairs as designed. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B. As many intermediate key-value pairs as designed, but they cannot be of the same type as the input key-value pair.
C. One intermediate key-value pair, of a different type.
D. One intermediate key-value pair, but of the same type.
E. As many intermediate key-value pairs as designed, as long as all the keys have the same type and all the values have the same type.
Answer: E
Explanation: Mapper maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs.
Reference: Hadoop Map-Reduce Tutorial
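The following word-count style map function is written in plain Java, not the Hadoop Mapper class. It shows both halves of the rule. One input record may produce zero or many intermediate pairs. The output types (String, Integer) differ from the input types (long offset, String line), but they are the same for every emitted pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Word-count style map function (plain Java sketch, not the Hadoop Mapper API).
class MapperEmitSketch {
    // Input: (byte offset, line). Output: zero or more (word, 1) pairs, all of the same types.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {          // an empty line yields no pairs at all
                out.add(Map.entry(word, 1));
            }
        }
        return out;                         // the offset is ignored, as in classic word count
    }
}
```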
Question No : 13
Which one of the following statements describes the relationship between the ResourceManager and the ApplicationMaster?
A. The ApplicationMaster requests resources from the ResourceManager
B. The ApplicationMaster starts a single instance of the ResourceManager
C. The ResourceManager monitors and restarts any failed Containers of the
D. The ApplicationMaster starts an instance of the ResourceManager within each
Answer: A
Question No : 14
Given the following Hive commands:
Which one of the following statements is true?
A. The file mydata.txt is copied to a subfolder of /apps/hive/warehouse
B. The file mydata.txt is moved to a subfolder of /apps/hive/warehouse
C. The file mydata.txt is copied into Hive's underlying relational database.
D. The file mydata.txt does not move from its current location in HDFS
Answer: A
Question No : 15
Which YARN component is responsible for monitoring the success or failure of a
A. ResourceManager
B. ApplicationMaster
C. NodeManager
D. JobTracker
Answer: A
Question No : 16
When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?
A. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value and when the reduce operation is both commutative and associative.
B. When the signature of the reduce method matches the signature of the combine method
C. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
D. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
E. Never. Combiners and reducers must be implemented separately because they serve different purposes.
Answer: A
Explanation: You can use your reducer code as a combiner if the operation performed is commutative and associative.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
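Here is a quick numeric check using hypothetical values for a single key split across two map tasks. Summing partial sums gives the same total as summing everything. Averaging partial averages does not, so a mean-computing reducer cannot be reused as a combiner as-is.

```java
import java.util.List;

// Why only commutative, associative reduce operations are safe to reuse as combiners.
class CombinerSketch {
    static double sum(List<Double> xs) {
        double total = 0;
        for (double x : xs) total += x;
        return total;
    }

    static double mean(List<Double> xs) {
        return sum(xs) / xs.size();
    }

    public static void main(String[] args) {
        List<Double> mapTask1 = List.of(1.0, 2.0, 3.0); // hypothetical values for one key
        List<Double> mapTask2 = List.of(10.0);
        List<Double> all = List.of(1.0, 2.0, 3.0, 10.0);

        // Sum: combining per map task first changes nothing (16.0 both ways).
        System.out.println(sum(all) + " vs " + sum(List.of(sum(mapTask1), sum(mapTask2))));
        // Mean: the mean of partial means (6.0) differs from the true mean (4.0).
        System.out.println(mean(all) + " vs " + mean(List.of(mean(mapTask1), mean(mapTask2))));
    }
}
```

A common workaround for averages is a separate combiner that emits partial (sum, count) pairs. This changes the intermediate value type, so it cannot simply be the reducer class.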
Question No : 17
Review the following data and Pig code:
What command to define B would produce the output (M,62,95102) when invoking the DUMP operator on B?
A. B = FILTER A BY (zip == '95102' AND gender == 'M');
(gender == 'M' AND zip == '95102');
C. B = JOIN A BY (gender == 'M' AND zip == '95102');
D. B = GROUP A BY (zip == '95102' AND gender == 'M');
Answer: A
Question No : 18
Which one of the following Hive commands uses an HCatalog table named x?
D. Hive commands cannot reference an HCatalog table
Answer: C
Question No : 19
To use a Java user-defined function (UDF) with Pig, what must you do?
A. Define an alias to shorten the function name
B. Pass arguments to the constructor of the UDF's implementation class
C. Register the JAR file containing the UDF
D. Put the JAR file into the user's home folder in HDFS
Answer: C
Question No : 20
What does the following WebHDFS command do?
curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
A. Make a directory
B. Read a file /foo/bar
C. List a directory /foo
D. Delete a directory
Answer: B
Question No : 21
What are the TWO main components of the YARN ResourceManager process? Choose 2
A. Job Tracker
B. Task Tracker
C. Scheduler
D. Applications Manager
Answer: C,D
Question No : 22
Given a directory of files with the following structure: line number, tab character, string:
You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line conf.setInputFormat(____.class);?
A. SequenceFileAsTextInputFormat
B. SequenceFileInputFormat
C. KeyValueFileInputFormat
D. BDBInputFormat
Answer: C
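In the Hadoop API, the class that handles this layout is named KeyValueTextInputFormat. Each line becomes one record, split at the first tab into key and value (the separator is configurable). A line with no separator becomes the key, with an empty value. The plain-Java sketch below mimics only that per-line rule; it is not Hadoop's record reader.

```java
import java.util.Map;

// Simulates the per-line rule of Hadoop's KeyValueTextInputFormat (plain Java sketch).
class KeyValueLineSketch {
    // Splits at the FIRST separator; everything after it, including later tabs, is the value.
    static Map.Entry<String, String> parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return Map.entry(line, "");   // no separator: whole line is the key, value is empty
        }
        return Map.entry(line.substring(0, pos), line.substring(pos + 1));
    }
}
```

For a line such as "42<TAB>some text", the Mapper would receive the key "42" and the value "some text".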
Question No : 23
In a MapReduce job with 500 map tasks, how many map task attempts will there be?
A. It depends on the
number of reduces in the job.
B. Between 500 and 1000.
C. At most 500.
D. At least 500.
E. Exactly 500.
Answer: D
From Cloudera Training Course:
A task attempt is a particular instance of an attempt to execute a task.
– There will be at least as many task attempts as there are tasks
– If a task attempt fails, another will be started by the
– Speculative execution can also result in more task attempts than completed tasks
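Here is a simple counting sketch. It assumes each failed task succeeds on its second attempt and that speculative execution adds a given number of duplicate attempts; both inputs are hypothetical. Whatever the values, the count can never drop below the number of tasks, which is why the answer is "at least 500".

```java
import java.util.Set;

// Counting sketch for task attempts; failure and speculation inputs are hypothetical.
class TaskAttemptSketch {
    static int countAttempts(int tasks, Set<Integer> failFirstAttempt, int speculativeAttempts) {
        int attempts = 0;
        for (int task = 0; task < tasks; task++) {
            attempts++;                          // every task gets at least one attempt
            if (failFirstAttempt.contains(task)) {
                attempts++;                      // a failed attempt is retried
            }
        }
        return attempts + speculativeAttempts;   // speculative duplicates add more attempts
    }
}
```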
Question No : 24
Which HDFS command uploads a local file X into an existing HDFS directory Y?
A. hadoop scp X Y
B. hadoop fs -localPut X
C. hadoop fs -put X Y
D. hadoop fs -get X Y
Answer: C
Question No : 25
Which one of the following files is required in every Oozie Workflow application?
A. job.properties
B. config-default.xml
C. workflow.xml
D. Oozie.xml
Answer: C
Question No : 26
Given the following Pig command:
logevents = LOAD 'input/my.log' AS (date:chararray, level:string, code:int,
Which one of the following statements is true?
A. The logevents relation represents the data from the my.log file, using a comma as the parsing delimiter
B. The logevents relation represents the data from the my.log file, using a tab as the parsing delimiter
C. The first field of logevents must be a properly-formatted date string or table return an
D. The statement is not a valid Pig command
Answer: B
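PigStorage, the loader LOAD uses when none is given, splits each line on tabs by default. Positional references are zero-based, so $1 is the second column (see also Question 5). The plain-Java sketch below mimics that rule on a hypothetical log line; it is not Pig's parser.

```java
// Simulates PigStorage's default parsing: tab-separated fields, $n = zero-based field n.
class PigStorageSketch {
    static String field(String line, int n) {
        String[] fields = line.split("\t", -1);        // -1 keeps trailing empty fields
        return n < fields.length ? fields[n] : null;   // this sketch returns null for a missing field
    }
}
```

A comma inside a field is ordinary data under this rule: for the line "a,b<TAB>c", $0 is "a,b".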
Question No : 27
What does Pig provide to the overall Hadoop solution?
A. Legacy language integration with MapReduce framework
B. Simple scripting language for writing MapReduce programs
C. Database table and storage management services
D. C++ interface to MapReduce and data warehouse infrastructure
Answer: B
Question No : 28
You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce, but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?
A. SequenceFiles
B. Avro
Answer: B
Reference: Hadoop binary files processing introduced by image duplicates finder
Question No : 29
Which best describes how TextInputFormat processes input files and line breaks?
A. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
B. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
C. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
D. Input file splits may cross line breaks. A line that crosses file splits is ignored.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
Answer: A
Reference: How Map and Reduce operations are actually carried out
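The rule behind answer A can be simulated on an in-memory string. This is a simplified sketch of the logic Hadoop's LineRecordReader follows, not its actual code. A reader whose split does not start at offset 0 skips through the first newline it finds. Every reader keeps reading any line that starts at or before its end offset, even if that line finishes in the next split.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified simulation of the split-boundary rule used for line-oriented input.
class LineSplitSketch {
    // Returns the lines read by the reader for the split [start, end) of 'data'.
    static List<String> readSplit(String data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Skip through the first newline at or after 'start': the previous split's
            // reader owns that line, even when 'start' is exactly at a line boundary.
            int nl = data.indexOf('\n', start);
            pos = (nl == -1) ? data.length() : nl + 1;
        }
        // Read every line that STARTS at or before 'end', finishing it past 'end' if needed.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl == -1) ? data.length() : nl;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }
}
```

With data "aa\nbb\ncc\n" split at offset 4, the line "bb" (offsets 3 to 5) crosses the boundary and is returned only by the first split, whose range contains its beginning.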
Question No : 30
For each intermediate key, each reducer task can emit:
A. As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B. As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
C. As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
D. One final key-value pair per value associated with the key; no restrictions on the type.
E. One final key-value pair per key; no restrictions on the type.
Answer: C
Reference: Hadoop Map-Reduce Tutorial; Yahoo! Hadoop Tutorial, Module 4: MapReduce
Question No : 31
You want to run Hadoop jobs on your development workstation for testing before you submit them to your production cluster. Which mode of operation in Hadoop allows you to most closely simulate a production cluster while using a single machine?
A. Run all the nodes in your production cluster as virtual machines on your development workstation.
B. Run the hadoop command with the -jt local and the -fs file:/// options.
C. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.
D. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
Answer: C
Question No : 32
Assuming default settings, which best describes the order of data provided to a reducer's reduce method?
A. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
B. Both the keys and values passed to a reducer always appear in sorted order.
C. Neither keys nor values are in any predictable order.
D. The keys given to a reducer are in sorted order but the values associated with each key are in no predictable order
Answer: D
Explanation: Reducer has 3 primary phases:
1. Shuffle
The Reducer copies the sorted output from each Mapper using HTTP across the network.
2. Sort
The framework merge sorts Reducer inputs by keys (since different Mappers may have output the same key).
The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
3. Reduce
In this phase the reduce(Object, Iterable, Context) method is called for each <key, (collection of values)> in the sorted inputs.
The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object).
The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce, Class
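The secondary-sort recipe can be sketched in plain Java with a hypothetical composite key. Records are sorted by the full key (natural key, then secondary value). Consecutive records whose natural keys compare equal under the grouping comparator form one reduce() call, so each call sees its values in sorted order. The comparators here are ordinary java.util.Comparator objects, not Hadoop's RawComparator classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of a secondary sort (sort by full key, group by natural key).
class SecondarySortSketch {
    // Hypothetical composite key: the natural key plus the value we want sorted.
    static final class CompositeKey {
        final String natural;
        final int secondary;
        CompositeKey(String natural, int secondary) { this.natural = natural; this.secondary = secondary; }
    }

    // Sort comparator: uses the entire composite key.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.natural).thenComparingInt(k -> k.secondary);

    // Grouping comparator: looks only at the natural key.
    static final Comparator<CompositeKey> GROUP = Comparator.comparing((CompositeKey k) -> k.natural);

    // Returns the value list each reduce() call would receive, in call order.
    static List<List<Integer>> reduceCalls(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Integer>> calls = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                calls.add(new ArrayList<>()); // natural key changed: a new reduce() call starts
            }
            calls.get(calls.size() - 1).add(k.secondary);
            previous = k;
        }
        return calls;
    }
}
```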
Question No : 33
MapReduce v2 (MRv2/YARN) is designed to address which two
A. Single point of failure in the NameNode.
B. Resource pressure on the JobTracker.
C. HDFS latency.
D. Ability to run frameworks other than MapReduce, such as MPI.
E. Reduce complexity of the MapReduce APIs.
F. Standardize on a single MapReduce API.
Answer: B,D
Reference: Apache Hadoop YARN – Concepts &