Hortonworks Certified Apache Hadoop Developer Test

Hortonworks-Certified-Apache-Hadoop-2.0-Developer : Practice Test

Question No : 1

What does the following command do?

A. Invokes the user-defined functions contained in the jar file

B. Assigns a name to a user-defined function or streaming command

C. Transforms Pig user-defined functions into a format that Hive can accept

D. Specifies the location of the JAR file containing the user-defined functions

Answer: D

Question No : 2

Consider the following two relations, A and B.

A Pig JOIN statement that combined relations A by its first field and B by its second field

would produce what output?

A. 2 Jim Chris 2

3 Terry 3

4 Brian 4

B. 2 cherry

2 cherry

3 orange

4 peach

C. 2 cherry Jim, Chris

3 orange Terry

4 peach Brian

D. 2 cherry Jim 2

2 cherry Chris 2

3 orange Terry 3

4 peach Brian 4

Answer: D

Question No : 3

You have user profile records in your OLTP database that you want to join with web logs

you have already ingested into the Hadoop file system. How will you obtain these user

records?

A. HDFS command

B. Pig LOAD command

C. Sqoop import

D. Hive LOAD DATA command

E. Ingest with Flume agents

F. Ingest with Hadoop Streaming

Answer: C

Reference: Hadoop and Pig for Large-Scale Web Log Analysis

Question No : 4

You have written a Mapper which invokes the following five calls to the

OutputCollector.collect method:

output.collect(new Text("Apple"), new Text("Red"));

output.collect(new Text("Banana"), new Text("Yellow"));

output.collect(new Text("Apple"), new Text("Yellow"));

output.collect(new Text("Cherry"), new Text("Red"));

output.collect(new Text("Apple"), new Text("Green"));

How many times will the Reducer’s reduce method be invoked?

A. 6

B. 3

C. 1

D. 0

E. 5

Answer: B

Explanation: reduce() gets called once for each [key, (list of values)] pair. To explain, let's

say you called:

out.collect(new Text("Car"), new Text("Subaru"));

out.collect(new Text("Car"), new Text("Honda"));

out.collect(new Text("Car"), new Text("Ford"));

out.collect(new Text("Truck"), new Text("Dodge"));

out.collect(new Text("Truck"), new Text("Chevy"));

Then reduce() would be called twice with the pairs

reduce(Car, <Subaru, Honda, Ford>)

reduce(Truck, <Dodge, Chevy>)

Reference: Mapper output.collect()?
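
The grouping described above can be sketched in plain Java. This is only a simulation of the idea, not Hadoop code: the class name ReduceCallCount and the method groupByKey are hypothetical, and a TreeMap stands in for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of how map output is grouped by key before reduce.
// No Hadoop classes are used; ReduceCallCount and groupByKey are hypothetical names.
class ReduceCallCount {

    // Groups (key, value) pairs by key, keeping keys in sorted order.
    static Map<String, List<String>> groupByKey(String[][] pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[][] emitted = {
            {"Apple", "Red"}, {"Banana", "Yellow"}, {"Apple", "Yellow"},
            {"Cherry", "Red"}, {"Apple", "Green"}
        };
        Map<String, List<String>> grouped = groupByKey(emitted);
        // One simulated reduce() call per distinct key.
        grouped.forEach((key, values) ->
            System.out.println("reduce(" + key + ", " + values + ")"));
        System.out.println("reduce calls: " + grouped.size());
    }
}
```

With the five pairs from the question there are three distinct keys, so the simulated reduce step runs three times.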

Question No : 5

Given the following Pig commands:

Which one of the following statements is true?

A. The $1 variable represents the first column of data in 'my.log'

B. The $1 variable represents the second column of data in 'my.log'

C. The severe relation is not valid

D. The grouped relation is not valid

Answer: B

Question No : 6

What data does a Reducer's reduce method process?

A. All the data in a single input file.

B. All data produced by a single mapper.

C. All data for a given key, regardless of which mapper(s) produced it.

D. All data for a given value, regardless of which mapper(s) produced it.

Answer: C

Explanation: Reducing lets you aggregate values together. A reducer function receives an

iterator of input values from an input list. It then combines these values together, returning

a single output value.

All values with the same key are presented to a single reduce task.

Reference: Yahoo! Hadoop Tutorial, Module 4: MapReduce
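
One way to see why every value for a key reaches the same reduce task is to look at hash partitioning. The sketch below uses the same formula as Hadoop's default HashPartitioner, but applies it to java.lang.String rather than Hadoop's Text type (whose hashCode differs), so the concrete partition numbers are illustrative only. PartitionSketch and partitionFor are hypothetical names.

```java
// Sketch of default hash partitioning without any Hadoop classes.
// PartitionSketch and partitionFor are hypothetical names.
class PartitionSketch {

    // The same key always maps to the same partition, whichever mapper emitted it.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        String[] mapperOneKeys = {"Apple", "Banana"};
        String[] mapperTwoKeys = {"Apple", "Cherry"};
        for (String key : mapperOneKeys) {
            System.out.println("mapper 1: " + key + " -> reducer " + partitionFor(key, reducers));
        }
        for (String key : mapperTwoKeys) {
            System.out.println("mapper 2: " + key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

"Apple" is printed with the same reducer number for both mappers, because the partition depends only on the key and the number of reduce tasks.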

Question No : 7

All keys used for intermediate output from mappers must:

A. Implement a splittable compression algorithm.

B. Be a subclass of FileInputFormat.

C. Implement WritableComparable.

D. Override isSplitable.

E. Implement a comparator for speedy sorting.

Answer: C

Explanation: The MapReduce framework operates exclusively on <key, value> pairs, that

is, the framework views the input to the job as a set of <key, value> pairs and produces a

set of <key, value> pairs as the output of the job, conceivably of different types.

The key and value classes have to be serializable by the framework and hence need to

implement the Writable interface. Additionally, the key classes have to implement the

WritableComparable interface to facilitate sorting by the framework.

Reference: MapReduce Tutorial
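
As a rough illustration of what the interface demands, the sketch below writes the three methods a key type needs (write, readFields, compareTo) against java.io only, so that it runs without Hadoop on the classpath. YearKey and roundTrip are hypothetical names; a real key would declare that it implements org.apache.hadoop.io.WritableComparable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for a Hadoop key type (hypothetical). It mirrors the method
// shapes of WritableComparable but depends only on the JDK.
class YearKey implements Comparable<YearKey> {
    private int year;

    YearKey() { }                       // no-arg constructor for deserialization
    YearKey(int year) { this.year = year; }

    int getYear() { return year; }

    // Serialization, the Writable half of the contract.
    void write(DataOutput out) throws IOException { out.writeInt(year); }

    // Deserialization, the Writable half of the contract.
    void readFields(DataInput in) throws IOException { year = in.readInt(); }

    // Ordering, the Comparable half; used when keys are sorted.
    @Override
    public int compareTo(YearKey other) { return Integer.compare(year, other.year); }

    // Serializes a key to bytes and reads it back into a fresh instance.
    static YearKey roundTrip(YearKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            YearKey copy = new YearKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        YearKey copy = roundTrip(new YearKey(2013));
        System.out.println(copy.getYear());
        System.out.println(copy.compareTo(new YearKey(2014)) < 0);
    }
}
```

A production key class would normally also override equals and hashCode so that partitioning and grouping stay consistent with compareTo.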

Question No : 8

Which Hadoop component is responsible for managing the distributed file system

metadata?

A. NameNode

B. Metanode

C. DataNode

D. NameSpaceManager

Answer: A

Question No : 9

You need to move a file titled “weblogs” into HDFS. When you try to copy the file, you can’t.

You know you have ample space on your DataNodes. Which action should you take to

relieve this situation and store more files in HDFS?

A. Increase the block size on all current files in HDFS.

B. Increase the block size on your remaining files.

C. Decrease the block size on your remaining files.

D. Increase the amount of memory for the NameNode.

E. Increase the number of disks (or size) for the NameNode.

F. Decrease the block size on all current files in HDFS.

Answer: C

Question No : 10

In the reducer, the MapReduce API provides you with an iterator over Writable values.

What does calling the next () method return?

A. It returns a reference to a different Writable object each time.

B. It returns a reference to a Writable object from an object pool.

C. It returns a reference to the same Writable object each time, but populated with different

data.

D. It returns a reference to a Writable object. The API leaves unspecified whether this is a

reused object or a new object.

E. It returns a reference to the same Writable object if the next value is the same as the

previous value, or a new Writable object otherwise.

Answer: C

Explanation: Calling Iterator.next() will always return the SAME EXACT instance of

IntWritable, with the contents of that instance replaced with the next value.

Reference: manupulating iterator in mapreduce
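
The practical consequence of this reuse is that storing the returned references does not work; the contents have to be copied out. The following plain-Java simulation shows the pitfall. MutableInt and ReusingIterator are hypothetical stand-ins for IntWritable and the framework's value iterator, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simulates a value iterator that hands back one shared, mutable holder.
// All names here are hypothetical; no Hadoop classes are involved.
class IteratorReuseDemo {

    static class MutableInt {
        int value;
    }

    static class ReusingIterator implements Iterator<MutableInt> {
        private final int[] data;
        private int index = 0;
        private final MutableInt shared = new MutableInt();

        ReusingIterator(int[] data) { this.data = data; }

        @Override
        public boolean hasNext() { return index < data.length; }

        @Override
        public MutableInt next() {
            shared.value = data[index++];  // same object, new contents
            return shared;
        }
    }

    // Buggy: keeps references, so every list element aliases one object.
    static List<Integer> collectReferences(int[] data) {
        List<MutableInt> kept = new ArrayList<>();
        Iterator<MutableInt> it = new ReusingIterator(data);
        while (it.hasNext()) { kept.add(it.next()); }
        List<Integer> result = new ArrayList<>();
        for (MutableInt m : kept) { result.add(m.value); }
        return result;
    }

    // Correct: copies the contents out before advancing the iterator.
    static List<Integer> collectCopies(int[] data) {
        List<Integer> result = new ArrayList<>();
        Iterator<MutableInt> it = new ReusingIterator(data);
        while (it.hasNext()) { result.add(it.next().value); }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(collectReferences(values)); // [3, 3, 3]
        System.out.println(collectCopies(values));     // [1, 2, 3]
    }
}
```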

Question No : 11

MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate

daemons? Select two.

A. Health status checks (heartbeats)

B. Resource management

C. Job scheduling/monitoring

D. Job coordination between the ResourceManager and NodeManager

E. Launching tasks

F. Managing file system metadata

G. MapReduce metric reporting

H. Managing tasks

Answer: B,C

Explanation: The fundamental idea of MRv2 is to split up the two major functionalities of

the JobTracker, resource management and job scheduling/monitoring, into separate

daemons. The idea is to have a global ResourceManager (RM) and per-application

ApplicationMaster (AM). An application is either a single job in the classical sense of

Map-Reduce jobs or a DAG of jobs.

Note:

The central goal of YARN is to clearly separate two things that are unfortunately smushed

together in current Hadoop, specifically in (mainly) JobTracker:

– Monitoring the status of the cluster with respect to which nodes have which resources

available. Under YARN, this will be global.

– Managing the parallelized execution of any specific job. Under YARN, this will be done

separately for each job.

Reference: Apache Hadoop YARN – Concepts & Applications

Question No : 12

For each input key-value pair, mappers can emit:

A. As many intermediate key-value pairs as desired. There are no restrictions on the

types of those key-value pairs (i.e., they can be heterogeneous).

B. As many intermediate key-value pairs as desired, but they cannot be of the same type

as the input key-value pair.

C. One intermediate key-value pair, of a different type.

D. One intermediate key-value pair, but of the same type.

E. As many intermediate key-value pairs as desired, as long as all the keys have the

same types and all the values have the same type.

Answer: E

Explanation: Mapper maps input key/value pairs to a set of intermediate key/value pairs.

Maps are the individual tasks that transform input records into intermediate records. The

transformed intermediate records do not need to be of the same type as the input records.

A given input pair may map to zero or many output pairs.

Reference: Hadoop Map-Reduce Tutorial
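
A word-count style map function illustrates the point: one input pair can yield zero or many output pairs, all of one output type that differs from the input type. This is a plain-Java sketch with hypothetical names (MapEmitSketch, map); it returns a list instead of writing to a Hadoop collector.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a map function; MapEmitSketch and map are hypothetical names.
class MapEmitSketch {

    // Input pair: (Long offset, String line). Output pairs: (String word, Integer 1).
    // Every emitted pair has the same key type and the same value type.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emitted.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "apple banana apple").size()); // 3
        System.out.println(map(19L, "   ").size());               // 0
    }
}
```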

Question No : 13

Which one of the following statements describes the relationship between the

ResourceManager and the ApplicationMaster?

A. The ApplicationMaster requests resources from the ResourceManager

B. The ApplicationMaster starts a single instance of the ResourceManager

C. The ResourceManager monitors and restarts any failed Containers of the

ApplicationMaster

D. The ApplicationMaster starts an instance of the ResourceManager within each

Container

Answer: A

Question No : 14

Given the following Hive commands:

Which one of the following statements is true?

A. The file mydata.txt is copied to a subfolder of /apps/hive/warehouse

B. The file mydata.txt is moved to a subfolder of /apps/hive/warehouse

C. The file mydata.txt is copied into Hive's underlying relational database.

D. The file mydata.txt does not move from its current location in HDFS

Answer: A

Question No : 15

Which YARN component is responsible for monitoring the success or failure of a

Container?

A. ResourceManager

B. ApplicationMaster

C. NodeManager

D. JobTracker

Answer: A

Question No : 16

When can a reduce class also serve as a combiner without affecting the output of a

MapReduce program?

A. When the types of the reduce operation’s input key and input value match the types of

the reducer’s output key and output value and when the reduce operation is both

commutative and associative.

B. When the signature of the reduce method matches the signature of the combine

method.

C. Always. Code can be reused in Java since it is a polymorphic object-oriented

programming language.

D. Always. The point of a combiner is to serve as a mini-reducer directly after the map

phase to increase performance.

E. Never. Combiners and reducers must be implemented separately because they serve

different purposes.

Answer: A

Explanation: You can use your reducer code as a combiner if the operation performed is

commutative and associative.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What

are combiners? When should I use a combiner in my MapReduce Job?
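
A small numeric check makes the condition concrete. In the sketch below (hypothetical names, plain Java, no Hadoop), summing per-mapper partial results gives the same answer as summing everything at once, whereas averaging per-mapper averages does not. That is why a sum reducer can double as a combiner and a naive mean reducer cannot.

```java
// Compares a combiner-safe operation (sum) with an unsafe one (mean).
// CombinerSketch, sum, and mean are hypothetical names.
class CombinerSketch {

    static double sum(double... values) {
        double total = 0;
        for (double v : values) { total += v; }
        return total;
    }

    static double mean(double... values) {
        return sum(values) / values.length;
    }

    public static void main(String[] args) {
        double[] mapperOne = {1, 2, 3};
        double[] mapperTwo = {10};

        // Sum: combining per mapper first does not change the result.
        System.out.println(sum(sum(mapperOne), sum(mapperTwo)));    // 16.0
        System.out.println(sum(1, 2, 3, 10));                       // 16.0

        // Mean: combining per mapper first changes the result.
        System.out.println(mean(mean(mapperOne), mean(mapperTwo))); // 6.0
        System.out.println(mean(1, 2, 3, 10));                      // 4.0
    }
}
```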

Question No : 17

Review the following data and Pig code:

What command to define B would produce the output (M,62,95102) when invoking the

DUMP operator on B?

A. B = FILTER A BY (zip == '95102' AND gender == 'M');

B. B = FOREACH A BY (gender == 'M' AND zip == '95102');

C. B = JOIN A BY (gender == 'M' AND zip == '95102');

D. B = GROUP A BY (zip == '95102' AND gender == 'M');

Answer: A

Question No : 18

Which one of the following Hive commands uses an HCatalog table named x?

A. SELECT * FROM x;

B. SELECT x.* FROM org.apache.hcatalog.hive.HCatLoader('x');

C. SELECT * FROM org.apache.hcatalog.hive.HCatLoader('x');

D. Hive commands cannot reference an HCatalog table

Answer: C

Question No : 19

To use a Java user-defined function (UDF) with Pig, what must you do?

A. Define an alias to shorten the function name

B. Pass arguments to the constructor of the UDF's implementation class

C. Register the JAR file containing the UDF

D. Put the JAR file into the user's home folder in HDFS

Answer: C

Question No : 20

What does the following WebHDFS command do?

curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"

A. Make a directory /foo/bar

B. Read a file /foo/bar

C. List a directory /foo

D. Delete a directory /foo/bar

Answer: B

Question No : 21

What are the TWO main components of the YARN ResourceManager process? Choose 2

answers

A. Job Tracker

B. Task Tracker

C. Scheduler

D. Applications Manager

Answer: C,D

Question No : 22

Given a directory of files with the following structure: line number, tab character, string:

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij

2	kadfjhuwqounahagtnbvaswslmnbfgy

3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you

use to complete the line: conf.setInputFormat (____.class) ; ?

A. SequenceFileAsTextInputFormat

B. SequenceFileInputFormat

C. KeyValueFileInputFormat

D. BDBInputFormat

Answer: C

Explanation:

http://stackoverflow.com/questions/9721754/how-to-parse-customwritable-from-text-inhadoop

Question No : 23

In a MapReduce job with 500 map tasks, how many map task attempts will there be?

A. It depends on the number of reduces in the job.

B. Between 500 and 1000.

C. At most 500.

D. At least 500.

E. Exactly 500.

Answer: D

Explanation:

From Cloudera Training Course:

Task attempt is a particular instance of an attempt to execute a task

– There will be at least as many task attempts as there are tasks

– If a task attempt fails, another will be started by the JobTracker

– Speculative execution can also result in more task attempts than completed tasks

Question No : 24

Which HDFS command uploads a local file X into an existing HDFS directory Y?

A. hadoop scp X Y

B. hadoop fs -localPut X Y

C. hadoop fs -put X Y

D. hadoop fs -get X Y

Answer: C

Question No : 25

Which one of the following files is required in every Oozie Workflow application?

A. job.properties

B. config-default.xml

C. workflow.xml

D. oozie.xml

Answer: C

Question No : 26

Given the following Pig command:

logevents = LOAD 'input/my.log' AS (date:chararray, level:string, code:int,

message:string);

Which one of the following statements is true?

A. The logevents relation represents the data from the my.log file, using a comma as the

parsing delimiter

B. The logevents relation represents the data from the my.log file, using a tab as the

parsing delimiter

C. The first field of logevents must be a properly-formatted date string, or the statement

will return an error

D. The statement is not a valid Pig command

Answer: B

Question No : 27

What does Pig provide to the overall Hadoop solution?

A. Legacy language integration with the MapReduce framework

B. Simple scripting language for writing MapReduce programs

C. Database table and storage management services

D. C++ interface to MapReduce and data warehouse infrastructure

Answer: B

Question No : 28

You want to perform analysis on a large collection of images. You want to store this data in

HDFS and process it with MapReduce but you also want to give your data analysts and

data scientists the ability to process the data directly from HDFS with an interpreted high-level

programming language like Python. Which format should you use to store this data in

HDFS?

A. SequenceFiles

B. Avro

C. JSON

D. HTML

E. XML

F. CSV

Answer: B

Reference: Hadoop binary files processing introduced by image duplicates finder

Question No : 29

Which best describes how TextInputFormat processes input files and line breaks?

A. Input file splits may cross line breaks. A line that crosses file splits is read by the

RecordReader of the split that contains the beginning of the broken line.

B. Input file splits may cross line breaks. A line that crosses file splits is read by the

RecordReaders of both splits containing the broken line.

C. The input file is split exactly at the line breaks, so each RecordReader will read a series

of complete lines.

D. Input file splits may cross line breaks. A line that crosses file splits is ignored.

E. Input file splits may cross line breaks. A line that crosses file splits is read by the

RecordReader of the split that contains the end of the broken line.

Answer: A

Reference: How Map and Reduce operations are actually carried out
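
The behavior in answer A can be modeled with a simplified line reader. The sketch below is an approximation of how a line-oriented record reader treats split boundaries, written in plain Java over a String rather than over HDFS byte streams; SplitLineReader, readSplit, and nextLineStart are hypothetical names. A reader whose split does not start at offset 0 discards its first (possibly partial) line, and every reader keeps going past its split end to finish the last line it started.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of reading lines from one split [start, end) of a text file.
// Hypothetical names throughout; offsets are character positions in a String.
class SplitLineReader {

    // Returns the offset just past the next newline at or after 'from'.
    static int nextLineStart(String data, int from) {
        int newline = data.indexOf('\n', from);
        return newline < 0 ? data.length() : newline + 1;
    }

    static List<String> readSplit(String data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // The first line belongs to the previous split's reader, so skip it.
            pos = nextLineStart(data, pos);
        }
        // Read every line that begins at or before 'end', running past the
        // split boundary if needed to complete the last one.
        while (pos <= end && pos < data.length()) {
            int next = nextLineStart(data, pos);
            String line = data.substring(pos, next);
            if (line.endsWith("\n")) {
                line = line.substring(0, line.length() - 1);
            }
            lines.add(line);
            pos = next;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "alpha\nbravo\ncharlie\n";
        // The boundary at offset 8 falls in the middle of "bravo".
        System.out.println(readSplit(data, 0, 8));   // [alpha, bravo]
        System.out.println(readSplit(data, 8, 20));  // [charlie]
    }
}
```

The broken line "bravo" is returned only by the reader of the split that contains its beginning, and no line is read twice or dropped.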

Question No : 30

For each intermediate key, each reducer task can emit:

A. As many final key-value pairs as desired. There are no restrictions on the types of those

key-value pairs (i.e., they can be heterogeneous).

B. As many final key-value pairs as desired, but they must have the same type as the

intermediate key-value pairs.

C. As many final key-value pairs as desired, as long as all the keys have the same type

and all the values have the same type.

D. One final key-value pair per value associated with the key; no restrictions on the type.

E. One final key-value pair per key; no restrictions on the type.

Answer: C

Reference: Hadoop Map-Reduce Tutorial; Yahoo! Hadoop Tutorial, Module 4: MapReduce

Question No : 31

You want to run Hadoop jobs on your development workstation for testing before you

submit them to your production cluster. Which mode of operation in Hadoop allows you to

most closely simulate a production cluster while using a single machine?

A. Run all the nodes in your production cluster as virtual machines on your development

workstation.

B. Run the hadoop command with the -jt local and the -fs file:/// options.

C. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single

machine.

D. Run simldooop, the Apache open-source software for simulating Hadoop clusters.

Answer: C

Question No : 32

Assuming default settings, which best describes the order of data provided to a reducer’s

reduce method:

A. The keys given to a reducer aren’t in a predictable order, but the values associated with

those keys always are.

B. Both the keys and values passed to a reducer always appear in sorted order.

C. Neither keys nor values are in any predictable order.

D. The keys given to a reducer are in sorted order but the values associated with each key

are in no predictable order.

Answer: D

Explanation: Reducer has 3 primary phases:

1. Shuffle

The Reducer copies the sorted output from each Mapper using HTTP across the network.

2. Sort

The framework merge sorts Reducer inputs by keys (since different Mappers may have

output the same key).

The shuffle and sort phases occur simultaneously i.e. while outputs are being fetched they

are merged.

SecondarySort

To achieve a secondary sort on the values returned by the value iterator, the application

should extend the key with the secondary key and define a grouping comparator. The keys

will be sorted using the entire key, but will be grouped using the grouping comparator to

decide which keys and values are sent in the same call to reduce.

3. Reduce

In this phase the reduce(Object, Iterable, Context) method is called for each <key,

(collection of values)> in the sorted inputs.

The output of the reduce task is typically written to a RecordWriter via

TaskInputOutputContext.write(Object, Object).

The output of the Reducer is not re-sorted.

Reference: org.apache.hadoop.mapreduce, Class

Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
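
The secondary sort idea described in the explanation can be sketched without Hadoop. In this plain-Java simulation (hypothetical names: SecondarySortSketch, CompositeKey, sortAndGroup), records are sorted on the full composite key and then grouped on the natural key alone, which is the role a sort comparator and a grouping comparator play in a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort; all names are hypothetical.
class SecondarySortSketch {

    // Composite key: the natural key extended with the value to sort on.
    static final class CompositeKey {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
    }

    static Map<String, List<Integer>> sortAndGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        // Sort step: order by the entire composite key.
        sorted.sort(Comparator.comparing((CompositeKey k) -> k.natural)
                              .thenComparingInt(k -> k.secondary));
        // Grouping step: only the natural key decides which records share a group.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.natural, n -> new ArrayList<>()).add(k.secondary);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
            new CompositeKey("b", 2), new CompositeKey("a", 3),
            new CompositeKey("a", 1), new CompositeKey("b", 1));
        System.out.println(sortAndGroup(keys)); // {a=[1, 3], b=[1, 2]}
    }
}
```

Each group's values arrive in sorted order, which the default configuration does not guarantee.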

Question No : 33

MapReduce v2 (MRv2/YARN) is designed to address which two issues?

A. Single point of failure in the NameNode.

B. Resource pressure on the JobTracker.

C. HDFS latency.

D. Ability to run frameworks other than MapReduce, such as MPI.

E. Reduce complexity of the MapReduce APIs.

F. Standardize on a single MapReduce API.

Answer: B,D

Reference: Apache Hadoop YARN – Concepts & Applications
