Hortonworks Certified Apache Hadoop Developer Test


Hortonworks-Certified-Apache-Hadoop-2.0-Developer : Practice Test

Question No : 1
What does the following command do?
register '/piggybank/pig-files.jar';
A. Invokes the user-defined functions contained in the jar file
B. Assigns a name to a user-defined function or streaming command
C. Transforms Pig user-defined functions into a format that Hive can accept
D. Specifies the location of the JAR file containing the user-defined functions
Answer: D

Question No : 2
Consider the following two relations, A and B.
A Pig JOIN statement that combines relation A by its first field and relation B by its
second field would produce what output?
A. 2 Jim Chris 2
3 Terry 3
4 Brian 4
B. 2 cherry
2 cherry
3 orange
4 peach
C. 2 cherry Jim, Chris
3 orange Terry
4 peach Brian
D. 2 cherry Jim 2
2 cherry Chris 2
3 orange Terry 3
4 peach Brian 4
Answer: D

Question No : 3
You have user profile records in your OLTP database that you want to join with web logs
you have already ingested into the Hadoop file system. How will you obtain these user
records?
A. HDFS command
B. Pig LOAD command
C. Sqoop import
D. Hive LOAD DATA command
E. Ingest with Flume agents
F. Ingest with Hadoop Streaming
Answer: C
Reference: Hadoop and Pig for Large-Scale Web Log Analysis

Question No : 4
You have written a Mapper which invokes the following five calls to the
OutputCollector.collect method:
output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));
How many times will the Reducer’s reduce method be invoked?
A. 6
B. 3
C. 1
D. 0
E. 5
Answer: B
Explanation: reduce() gets called once for each [key, (list of values)] pair. To explain, let's
say you called:
out.collect(new Text("Car"),new Text("Subaru");
out.collect(new Text("Car"),new Text("Honda");
out.collect(new Text("Car"),new Text("Ford");
out.collect(new Text("Truck"),new Text("Dodge");
out.collect(new Text("Truck"),new Text("Chevy");
Then reduce() would be called twice with the pairs
reduce(Car, <Subaru, Honda, Ford>)
reduce(Truck, <Dodge, Chevy>)
Reference: Mapper output.collect()?
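
To make the grouping concrete, here is a minimal reducer sketch in the old org.apache.hadoop.mapred API (the one implied by OutputCollector above); the class name is illustrative. With the five calls above, the framework builds three [key, value-list] pairs (Apple, Banana, Cherry), so this reduce() runs exactly three times.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ColorReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    StringBuilder colors = new StringBuilder();
    while (values.hasNext()) {
      colors.append(values.next().toString()).append(' ');
    }
    // e.g. key=Apple, values=Red Yellow Green -> one reduce() call
    output.collect(key, new Text(colors.toString().trim()));
  }
}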

Question No : 5
Given the following Pig commands:
Which one of the following statements is true?
A. The $1 variable represents the first column of data in 'my.log'
B. The $1 variable represents the second column of data in 'my.log'
C. The severe relation is not valid
D. The grouped relation is not valid
Answer: B

Question No : 6
What data does a Reducer reduce method process?
A. All the data in a single input file.
B. All data produced by a single mapper.
C. All data for a given key, regardless of which mapper(s) produced it.
D. All data for a given value, regardless of which mapper(s) produced it.
Answer: C
Explanation: Reducing lets you aggregate values together. A reducer function receives an
iterator of input values from an input list. It then combines these values together, returning
a single output value.
All values with the same key are presented to a single reduce task.
Reference: Yahoo! Hadoop Tutorial, Module 4: MapReduce
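
A minimal sketch of that aggregation, using the newer org.apache.hadoop.mapreduce API (assumed here for illustration): every value emitted under a given key, no matter which mapper produced it, arrives in one reduce() call and can be rolled up into a single output.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();   // aggregate every value for this key
    }
    context.write(key, new IntWritable(sum));   // single output per key
  }
}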

Question No : 7
All keys used for intermediate output from mappers must:
A. Implement a splittable compression algorithm.
B. Be a subclass of FileInputFormat.
C. Implement WritableComparable.
D. Override isSplitable.
E. Implement a comparator for speedy sorting.
Answer: C
Explanation: The MapReduce framework operates exclusively on <key, value> pairs, that
is, the framework views the input to the job as a set of <key, value> pairs and produces a
set of <key, value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to
implement the Writable interface. Additionally, the key classes have to implement the
WritableComparable interface to facilitate sorting by the framework.
Reference: MapReduce Tutorial
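
For illustration, a minimal custom intermediate key (the class name is hypothetical). Implementing WritableComparable gives the framework both the serialization it needs and the compareTo() it uses to sort keys during the shuffle.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class YearKey implements WritableComparable<YearKey> {
  private int year;

  public void write(DataOutput out) throws IOException { out.writeInt(year); }
  public void readFields(DataInput in) throws IOException { year = in.readInt(); }
  public int compareTo(YearKey other) {            // drives the sort phase
    return Integer.compare(year, other.year);
  }
  public int hashCode() { return year; }           // used by the default partitioner
  public boolean equals(Object o) {
    return o instanceof YearKey && ((YearKey) o).year == year;
  }
}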

Question No : 8
Which Hadoop component is responsible for managing the distributed file system
metadata?
A. NameNode
B. Metanode
C. DataNode
D. NameSpaceManager
Answer: A

Question No : 9
You need to move a file titled “weblogs” into HDFS. When you try to copy the file, you can’t.
You know you have ample space on your DataNodes. Which action should you take to
relieve this situation and store more files in HDFS?
A. Increase the block size on all current files in HDFS.
B. Increase the block size on your remaining files.
C. Decrease the block size on your remaining files.
D. Increase the amount of memory for the NameNode.
E. Increase the number of disks (or size) for the NameNode.
F. Decrease the block size on all current files in HDFS.
Answer: C

Question No : 10
In the reducer, the MapReduce API provides you with an iterator over Writable values.
What does calling the next() method return?
A. It returns a reference to a different Writable object each time.
B. It returns a reference to a Writable object from an object pool.
C. It returns a reference to the same Writable object each time, but populated with different
data.
D. It returns a reference to a Writable object. The API leaves unspecified whether this is a
reused object or a new object.
E. It returns a reference to the same Writable object if the next value is the same as the
previous value, or a new Writable object otherwise.
Answer: C
Explanation: Calling Iterator.next() always returns the same instance of IntWritable,
with the contents of that instance replaced by the next value.
Reference: manipulating iterators in MapReduce
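
A sketch of the practical consequence (class name illustrative): a reducer that wants to keep values beyond the current iteration must copy them, because every element returned by the iterator aliases the same reused instance.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CachingReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    List<Text> cached = new ArrayList<Text>();
    for (Text v : values) {
      // cached.add(v) would be a bug: every list entry would alias the
      // single reused instance and end up holding the last value seen.
      cached.add(new Text(v));   // defensive copy of the reused object
    }
    for (Text v : cached) {
      context.write(key, v);
    }
  }
}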

Question No : 11
MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate
daemons? Select two.
A. Health status checks (heartbeats)
B. Resource management
C. Job scheduling/monitoring
D. Job coordination between the ResourceManager and NodeManager
E. Launching tasks
F. Managing file system metadata
G. MapReduce metric reporting
H. Managing tasks
Answer: B,C
Explanation: The fundamental idea of MRv2 is to split up the two major functionalities of
the JobTracker, resource management and job scheduling/monitoring, into separate
daemons. The idea is to have a global ResourceManager (RM) and per-application
ApplicationMaster (AM). An application is either a single job in the classical sense of Map-
Reduce jobs or a DAG of jobs.
Note:
The central goal of YARN is to clearly separate two things that are unfortunately smushed
together in current Hadoop, specifically in (mainly) JobTracker:
/ Monitoring the status of the cluster with respect to which nodes have which resources
available. Under YARN, this will be global.
/ Managing the parallel execution of any specific job. Under YARN, this will be done
separately for each job.
Reference: Apache Hadoop YARN – Concepts & Applications

Question No : 12
For each input key-value pair, mappers can emit:
A. As many intermediate key-value pairs as designed. There are no restrictions on the
types of those key-value pairs (i.e., they can be heterogeneous).
B. As many intermediate key-value pairs as designed, but they cannot be of the same type
as the input key-value pair.
C. One intermediate key-value pair, of a different type.
D. One intermediate key-value pair, but of the same type.
E. As many intermediate key-value pairs as designed, as long as all the keys have the
same types and all the values have the same type.
Answer: E
Explanation: Mapper maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks that transform input records into intermediate records. The
transformed intermediate records do not need to be of the same type as the input records.
A given input pair may map to zero or many output pairs.
Reference: Hadoop Map-Reduce Tutorial
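
As a sketch, a classic word-count mapper (names illustrative) shows the rule in answer E: one input record may yield zero or many intermediate pairs, but all of them use the mapper's declared key and value types (here Text and IntWritable).

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tok = new StringTokenizer(line.toString());
    while (tok.hasMoreTokens()) {            // zero or many emits per input pair
      word.set(tok.nextToken());
      context.write(word, ONE);              // types fixed to Text/IntWritable
    }
  }
}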

Question No : 13
Which one of the following statements describes the relationship between the
ResourceManager and the ApplicationMaster?
A. The ApplicationMaster requests resources from the ResourceManager
B. The ApplicationMaster starts a single instance of the ResourceManager
C. The ResourceManager monitors and restarts any failed Containers of the
ApplicationMaster
D. The ApplicationMaster starts an instance of the ResourceManager within each
Container
Answer: A

Question No : 14
Given the following Hive commands:
Which one of the following statements is true?
A. The file mydata.txt is copied to a subfolder of /apps/hive/warehouse
B. The file mydata.txt is moved to a subfolder of /apps/hive/warehouse
C. The file mydata.txt is copied into Hive's underlying relational database
D. The file mydata.txt does not move from its current location in HDFS
Answer: A

Question No : 15
Which YARN component is responsible for monitoring the success or failure of a
Container?
A. ResourceManager
B. ApplicationMaster
C. NodeManager
D. JobTracker
Answer: A

Question No : 16
When can a reduce class also serve as a combiner without affecting the output of a
MapReduce program?
A. When the types of the reduce operation's input key and input value match the types of
the reducer's output key and output value and when the reduce operation is both
commutative and associative.
B. When the signature of the reduce method matches the signature of the combine
method.
C. Always. Code can be reused in Java since it is a polymorphic object-oriented
programming language.
D. Always. The point of a combiner is to serve as a mini-reducer directly after the map
phase to increase performance.
E. Never. Combiners and reducers must be implemented separately because they serve
different purposes.
Answer: A
Explanation: You can use your reducer code as a combiner if the operation performed is
commutative and associative.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What
are combiners? When should I use a combiner in my MapReduce Job?
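
A minimal driver sketch of this reuse, assuming a word-count job built from the TokenMapper and SumReducer sketches above: integer addition is commutative and associative, and the sum reducer's input and output types match, so the same class can be registered as both combiner and reducer.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenMapper.class);     // illustrative mapper from above
    job.setCombinerClass(SumReducer.class);    // reducer reused as combiner
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}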

Question No : 17
Review the following data and Pig code:
What command to define B would produce the output (M,62,95102) when invoking the
DUMP operator on B?
A. B = FILTER A BY (zip == '95102' AND gender == 'M');
B. B = FOREACH A BY (gender == 'M' AND zip == '95102');
C. B = JOIN A BY (gender == 'M' AND zip == '95102');
D. B = GROUP A BY (zip == '95102' AND gender == 'M');
Answer: A

Question No : 18
Which one of the following Hive commands uses an HCatalog table named x?
A. SELECT * FROM x;
B. SELECT x.* FROM org.apache.hcatalog.hive.HCatLoader('x');
C. SELECT * FROM org.apache.hcatalog.hive.HCatLoader('x');
D. Hive commands cannot reference an HCatalog table
Answer: C

Question No : 19
To use a Java user-defined function (UDF) with Pig, what must you do?
A. Define an alias to shorten the function name
B. Pass arguments to the constructor of the UDF's implementation class
C. Register the JAR file containing the UDF
D. Put the JAR file into the user's home folder in HDFS
Answer: C

Question No : 20
What does the following WebHDFS command do?
curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"
A. Make a directory /foo/bar
B. Read a file /foo/bar
C. List a directory /foo
D. Delete a directory /foo/bar
Answer: B
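
For comparison, a sketch of the same OPEN call from Java (the host name and the Hadoop 2 default WebHDFS port 50070 are assumptions here). Like curl -L, the client must follow the HTTP 307 redirect from the NameNode to the DataNode that actually serves the bytes.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsOpen {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://namenode:50070/webhdfs/v1/foo/bar?op=OPEN");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setInstanceFollowRedirects(true);   // follow the redirect, like -L
    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);              // prints /foo/bar (text assumed)
    }
    in.close();
  }
}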

Question No : 21
What are the TWO main components of the YARN ResourceManager process? Choose 2
answers
A. Job Tracker
B. Task Tracker
C. Scheduler
D. Applications Manager
Answer: C,D

Question No : 22
Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you
use to complete the line: conf.setInputFormat(____.class); ?
A. SequenceFileAsTextInputFormat
B. SequenceFileInputFormat
C. KeyValueTextInputFormat
D. BDBInputFormat
Answer: C
Explanation:
http://stackoverflow.com/questions/9721754/how-to-parse-customwritable-from-text-in-hadoop
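
A minimal configuration sketch in the old org.apache.hadoop.mapred API that the question's conf.setInputFormat(...) call implies. KeyValueTextInputFormat splits each line at the first tab character, so the line number becomes the key and the string becomes the value.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class DriverSnippet {
  public static void configure(JobConf conf) {
    conf.setInputFormat(KeyValueTextInputFormat.class);
    // Each record reaches the Mapper as (Text key, Text value),
    // e.g. key="1", value="abialkjfjkaoasdfjksdlkjhqweroij".
  }
}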

Question No : 23
In a MapReduce job with 500 map tasks, how many map task attempts will there be?
A. It depends on the number of reduces in the job.
B. Between 500 and 1000.
C. At most 500.
D. At least 500.
E. Exactly 500.
Answer: D
Explanation:
From Cloudera Training Course:
Task attempt is a particular instance of an attempt to execute a task
– There will be at least as many task attempts as there are tasks
– If a task attempt fails, another will be started by the JobTracker
– Speculative execution can also result in more task attempts than completed tasks

Question No : 24
Which HDFS command uploads a local file X into an existing HDFS directory Y?
A. hadoop scp X Y
B. hadoop fs -localPut X Y
C. hadoop fs -put X Y
D. hadoop fs -get X Y
Answer: C
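
For reference, the programmatic equivalent of hadoop fs -put, a sketch using the HDFS Java API with placeholder paths: FileSystem.copyFromLocalFile uploads a local file into an HDFS directory.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    fs.copyFromLocalFile(new Path("X"), new Path("Y"));  // local X -> HDFS dir Y
    fs.close();
  }
}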

Question No : 25
Which one of the following files is required in every Oozie Workflow application?
A. job.properties
B. config-default.xml
C. workflow.xml
D. oozie.xml
Answer: C

Question No : 26
Given the following Pig command:
logevents = LOAD &apos;input/my.log&apos; AS (date:chararray, levehstring, code:int,
message:string);
Which one of the following statements is true?
A. The logevents relation represents the data from the my.log file, using a comma as the
parsing delimiter
B. The logevents relation represents the data from the my.log file, using a tab as the
parsing delimiter
C. The first field of logevents must be a properly-formatted date string or the load will
return an error
D. The statement is not a valid Pig command
Answer: B

Question No : 27
What does Pig provide to the overall Hadoop solution?
A. Legacy language integration with the MapReduce framework
B. Simple scripting language for writing MapReduce programs
C. Database table and storage management services
D. C++ interface to MapReduce and data warehouse infrastructure
Answer: B

Question No : 28
You want to perform analysis on a large collection of images. You want to store this data in
HDFS and process it with MapReduce but you also want to give your data analysts and
data scientists the ability to process the data directly from HDFS with an interpreted
high-level programming language like Python. Which format should you use to store this
data in HDFS?
A. SequenceFiles
B. Avro
C. JSON
D. HTML
E. XML
F. CSV
Answer: B
Reference: Hadoop binary files processing introduced by image duplicates finder

Question No : 29
Which best describes how TextInputFormat processes input files and line breaks?
A. Input file splits may cross line breaks. A line that crosses file splits is read by the
RecordReader of the split that contains the beginning of the broken line.
B. Input file splits may cross line breaks. A line that crosses file splits is read by the
RecordReaders of both splits containing the broken line.
C. The input file is split exactly at the line breaks, so each RecordReader will read a series
of complete lines.
D. Input file splits may cross line breaks. A line that crosses file splits is ignored.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the
RecordReader of the split that contains the end of the broken line.
Answer: A
Reference: How Map and Reduce operations are actually carried out

Question No : 30
For each intermediate key, each reducer task can emit:
A. As many final key-value pairs as desired. There are no restrictions on the types of those
key-value pairs (i.e., they can be heterogeneous).
B. As many final key-value pairs as desired, but they must have the same type as the
intermediate key-value pairs.
C. As many final key-value pairs as desired, as long as all the keys have the same type
and all the values have the same type.
D. One final key-value pair per value associated with the key; no restrictions on the type.
E. One final key-value pair per key; no restrictions on the type.
Answer: C
Reference: Hadoop Map-Reduce Tutorial; Yahoo! Hadoop Tutorial, Module 4: MapReduce

Question No : 31
You want to run Hadoop jobs on your development workstation for testing before you
submit them to your production cluster. Which mode of operation in Hadoop allows you to
most closely simulate a production cluster while using a single machine?
A. Run all the nodes in your production cluster as virtual machines on your development
workstation.
B. Run the hadoop command with the -jt local and the -fs file:/// options.
C. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single
machine.
D. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
Answer: C

Question No : 32
Assuming default settings, which best describes the order of data provided to a reducer’s
reduce method:
A. The keys given to a reducer aren’t in a predictable order, but the values associated with
those keys always are.
B. Both the keys and values passed to a reducer always appear in sorted order.
C. Neither keys nor values are in any predictable order.
D. The keys given to a reducer are in sorted order but the values associated with each key
are in no predictable order.
Answer: D
Explanation: Reducer has 3 primary phases:
1. Shuffle
The Reducer copies the sorted output from each Mapper using HTTP across the network.
2. Sort
The framework merge sorts Reducer inputs by keys (since different Mappers may have
output the same key).
The shuffle and sort phases occur simultaneously; i.e., while outputs are being fetched
they are merged.
Secondary sort:
To achieve a secondary sort on the values returned by the value iterator, the application
should extend the key with the secondary key and define a grouping comparator. The keys
will be sorted using the entire key, but will be grouped using the grouping comparator to
decide which keys and values are sent in the same call to reduce.
3. Reduce
In this phase the reduce(Object, Iterable, Context) method is called for each <key,
(collection of values)> in the sorted inputs.
The output of the reduce task is typically written to a RecordWriter via
TaskInputOutputContext.write(Object, Object).
The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce, Class
Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

Question No : 33
MapReduce v2 (MRv2/YARN) is designed to address which two issues?
A. Single point of failure in the NameNode.
B. Resource pressure on the JobTracker.
C. HDFS latency.
D. Ability to run frameworks other than MapReduce, such as MPI.
E. Reduce complexity of the MapReduce APIs.
F. Standardize on a single MapReduce API.
Answer: A,B
Reference: Apache Hadoop YARN – Concepts & Applications
