Title of test:
HD

Description:
test of HD

Creation Date: 2023/04/13

Category: Others

Number of questions: 378

Content:

You are working in an organization, QuickTechie.com Inc., which provides data-based products to its various clients. All data processing is done using three CDP clusters. Your administrator told you that you can use Cloudera Manager to manage, configure, and monitor them. Which of the following statements is correct?. All 3 CDP Private Cloud Base clusters can be managed using a single Cloudera Manager. You need 3 Cloudera Manager installations, one for each CDP cluster, to manage the 3 clusters. All 3 clusters' Cloudera Runtime services can be managed using a single Cloudera Manager. Using one Cloudera Manager instance, you can manage the Cloudera Runtime services of only 1 cluster.

You are working as an Administrator at HadoopExam.com Inc., where a 30-node CDP cluster was created in a North America data center. Each host in the cluster has the same configuration and has CentOS 7.0 installed. You are using Cloudera Manager to monitor the services installed across the entire cluster. Which of the following statements is correct for monitoring and managing a CDP cluster?. You would install the Cloudera Manager Server on each host of the cluster. You would install the Cloudera Manager Agent on each host of the cluster. You would install the Cloudera Manager Agent and Cloudera Manager Server on only one of the hosts in the cluster. You would install the Cloudera Manager Server only on the NameNode and 3 other nodes in the cluster. You would install the Cloudera Manager Server on only one of the hosts in the cluster.

You are working on a cluster set up at the QuickTechie.com North America data center. Which of the following activities can you perform using the Cloudera Manager Admin UI as an Administrator?. Cluster stop and start. Service start and stop. Configuring a new service. Upgrading the cluster. Updating configuration. Changing security configurations.

You have been given an activity to set up a scheduled restart of a few services. Note that there are 3 CDP clusters and all are managed by a single Cloudera Manager. - Cluster-1: Hive service restart every Sunday 6:00 PM IST - Cluster-2: Impala service restart every Saturday 6:00 PM IST - Cluster-3: Spark service restart every Sunday 6:00 PM EST Which of the following options would you choose to implement this?. You would use the Cloudera Manager console and schedule all this activity using cron expressions. You would use the Cloudera Manager console and schedule all this activity by putting an explicit time for each service on each cluster. You would use the Cloudera Manager API and schedule all this activity by putting an explicit time for each service on each cluster. You have to restart the entire cluster at each mentioned time, since you cannot restart an individual service.

Which of the following statements are correct with regard to Cloudera Manager?. The Cloudera Manager API can be used to automate admin tasks. The Cloudera Manager version is not dependent on the CDP Private Cloud Base version. Cloudera Manager can manage older versions of CDH clusters. The Cloudera Manager Admin Console provides a cluster-wide real-time view of hosts and running services. Using Cloudera Manager, you can also manage security and encryption functionality.

Can you please map the following? Keep in mind everything is related to Cloudera Manager only. A category of functionality in Cloudera Manager. It represents one of the running services in Cloudera Manager. Each service can have more than one of these. It represents an individual or separate running process or component on a host in a CDP cluster. A set of configuration properties.

You are working with a cluster set up in the QuickTechie.com North America data centers, which currently has 25 nodes (hosts). The following services are running on each node in this cluster: - Spark DataNode - HDFS DataNode Now you want to add 3 new nodes to the cluster, and they should run the same services as the rest of the cluster. Which of the following features would you use to add 3 new nodes with the same services running on them?. Config Template. Host Template. Rack Template. Service Template.

Can you please map the following? Keep in mind everything is related to Cloudera Manager only. For Cloudera Manager, it represents the configuration of Cloudera Manager and all the clusters it manages. It represents a physical entity that contains a set of physical hosts typically served by the same switch. This is a set of configuration properties for a set of role instances. It is a named configuration of resources and a policy for scheduling the resources among YARN applications.

Select all the statements which are correct for a CDP cluster and Cloudera Manager?. Each individual cluster can only be associated with a single Cloudera Manager Server. In a single CDP cluster, we can have more than one Cloudera Runtime environment. A single host of a CDP cluster can join more than one CDP cluster, all managed by a single Cloudera Manager. The Gateway node role can be used to provide a client with access to a specific service from the cluster's services.

A CDP cluster supports different types of edge or gateway nodes. Please select the correct statements with regard to the Gateway node?. Gateway is a type of role that provides a client with access to a specific service from the cluster's services. It is mandatory to have a name for the Gateway role. To have client configuration deployed, you have to use the Deploy Client Configuration option in the Cloudera Manager Admin Console. Hue Kerberos Ticket Renewer is a gateway role that proxies tickets from Kerberos.

Which of the following statements are correct with regard to Cloudera Manager and its hosts?. A Cloudera Manager host runs the Admin Console. The Cloudera Manager host serves the APIs. The Cloudera Manager Server is responsible for installing software. The Cloudera Manager Server configures services. The Cloudera Manager Server starts and stops services. The Cloudera Manager Server is responsible for managing the cluster on which the services run.

Which of the following are the responsibilities of the Agents which are installed on all the hosts in a CDP cluster?. Starting and stopping processes. Unpacking configurations. Triggering installations. Monitoring hosts. Hosting APIs. The Agent on each host sends a heartbeat every 15 seconds to the Cloudera Manager Server.

Cloudera Manager maintains two states, the Model State and the Runtime State, and these are stored in a database. Suppose, for example, you have to update the port for the Hue web server. Which state is updated?. Runtime State. Model State. In-Memory State. None of the above.

Cloudera Manager maintains two states, the Model State and the Runtime State, and these are stored in a database. Suppose you have updated the model state for a service while its process was already running. In that case, the Model State and Runtime State are in a ___________ state. Mismatched. Matched. Sync.

Cloudera Manager maintains two states, the Model State and the Runtime State, and these are stored in a database. Which of the following statements are correct for the Runtime State?. It tells you what processes are running and where. It tells you what commands it is running. The runtime state includes the exact configuration files needed to run a process. It tells you what processes are not running and where they were supposed to run.

Two different software distribution formats are possible with Cloudera Manager: - Packages - Parcels Which of the following is included in a parcel?. Compiled code. Program files. Metadata which is used by Cloudera Manager.

Which of the following are advantages of parcels over packages?. You don't have to manually update the entire cluster; rather, a single click will bring down the entire cluster, apply the Cloudera Runtime updates, and then restart the cluster. For parcels, if multiple versions are installed, you can mark one of them as the active one. If you want to do rolling upgrades, then parcels are required. Instead of having a separate package for each component of Cloudera Runtime, parcels are distributed as a single object.

Which of the following open-source tools is used by Cloudera Manager to start and stop role instances?. supervisord. supervisor. init.d. start.d.

Select the correct statements with regard to the Cloudera Manager Server and Agents. If the Cloudera Manager Server which is managing your CDP cluster is down, then the respective cluster is not reachable and would be down too. If the Cloudera Agent installed on a host is down, then the services on that host will also be down. If the Cloudera Agent installed on a host is down, then the services on that host will not be down. If the Cloudera Manager Server which is managing your CDP cluster is down, then the respective cluster is still reachable and would not be down.

When the Agent receives an instruction from the Cloudera Manager Server, as part of the heartbeat, to start a service, which of the following actions would be taken by the Agent running on that particular host?. The Agent will create a new directory under /var/run/cloudera-scm-agent/. The Agent will create a new directory /var/run/cloudera-scm-agent-$service/, where $service represents the name of the service, e.g. HDFS. The Agent unpacks the required configurations. The Agent contacts init.d, which starts the process. The Agent contacts supervisord, which starts the process.

Cloudera SDX (Shared Data Experience) is used for?. Cluster Configuration. Security. Governance. MetaData. Cluster Monitoring.

When you create a CDP Private Cloud Base cluster, you have a specific node architecture, and that architecture has different types of nodes. Each node has a physical configuration that is specialized for its role in the cluster. These nodes are further specialized through the software services that are assigned to them. Can you please map the below?. This node runs all the services that are required to manage the cluster storage and compute services. This node runs Cloudera Manager and the Cloudera Management Services. This node contains all client-facing configurations and services, including gateway configurations. This node runs all the services that are required to store blocks of data on the local hard drives and run processing tasks against that data.

Which of the following services should you consider for one of the Master nodes in your cluster?. NameNode. FailoverController. Cloudera Manager. Apache Ranger. NodeManager. JournalNode.

You are working in a company called HadoopExam Inc. and you have been asked to set up a CDP Private Cloud cluster. After some analysis, you contact your manager about setting up such a cluster. He promptly asks what the size of the cluster should be. Which of the following things would you try to find out to come up with a better cluster size?. How much storage capacity is needed?. What are the data ingest volumes and growth rates?. What are the memory and processor capacity requirements?. What are the Service Level Agreements?.

When you are planning to expand or scale your live Cloudera CDP Private Cloud Base cluster, which of the following can be considered for scaling?. Rack. Pod. Cluster. Data.

When you are considering scaling your existing CDP Private Cloud Base cluster, which of the below is considered the smallest size designation?. Rack. Pod. Cluster.

Which of the following statements are correct with regard to a pod in a CDP Private Cloud Base cluster?. A pod is the set of nodes that are connected to the first level of network switches in the cluster. A pod consists of one or more racks. A pod is a second-level fault zone above the rack level. A pod is a first-level fault zone above the rack level.

Which of the following can affect the storage capacity of a CDP cluster?. Number of nodes in the cluster. Raw storage per node. Compression ratio. Storage efficiency.
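The factors listed above combine into an overall capacity estimate. The following is a minimal sketch of one common way to combine them; the formula, the 75% efficiency figure, and the example numbers are illustrative assumptions, not Cloudera's official sizing method.

```python
# Hedged sketch: estimating usable (post-replication) HDFS capacity from
# the factors in the question. All defaults below are assumptions.
def usable_capacity_tb(nodes, raw_tb_per_node, replication=3,
                       storage_efficiency=0.75, compression_ratio=1.0):
    """storage_efficiency models space lost to OS, logs and temp data;
    compression_ratio > 1.0 means data shrinks on disk."""
    raw = nodes * raw_tb_per_node
    return raw * storage_efficiency * compression_ratio / replication

# e.g. 20 nodes with 8 TB of raw disk each:
print(usable_capacity_tb(20, 8))  # 40.0
```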

Which of the following are supported operating systems for installing Cloudera CDP Private Cloud Base?. Windows. macOS. RHEL. CentOS. Oracle Linux. Ubuntu. SLES.

Which of the following RDBMS databases are supported while installing CDP Private Cloud Base?. Oracle. Sybase. MySQL. MariaDB. PostgreSQL.

Which of the following types of processors are supported by Cloudera CDP Private Cloud Base hosts?. Dell Power. EMC Power. IBM Power Systems. x86. ARM.

Which of the below are correct recommendations from Cloudera when you set up a cluster using a Linux operating system?. All Runtime hosts in a logical cluster must run on the same major OS release. Cloudera Manager must run on the same OS release as one of the clusters it manages. Cloudera recommends running the same minor release on all cluster nodes. Cloudera recommends using Runtime cluster deployments in Docker containers.

You have set up a CDP Private Cloud Base cluster. However, on your cluster you have multiple versions of Python installed, such as Python 2.7, Python 3.4, and Python 3.6. A Data Analyst complains that, while using PySpark, these multiple versions of Python are creating issues for his PySpark program. What will you do?. Uninstall the Python versions that are not needed. You will keep only the highest version of Python and uninstall the rest. You would ask the Data Analyst to use the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables. You will set up a new cluster with the required version of Python, since a single CDP deployment can support multiple clusters.
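The environment-variable option above can be sketched as follows. The interpreter path is an assumption; adjust it to wherever the desired Python version lives on your hosts.

```python
# Sketch: pinning the interpreter PySpark uses via environment variables.
# The path /usr/bin/python3.6 is an illustrative assumption.
import os

os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.6"         # executors
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.6"  # driver

# Any SparkSession created after this point will launch its driver and
# executor Python processes with the pinned interpreter.
print(os.environ["PYSPARK_PYTHON"])
```

These variables must be set before the SparkSession is created (or exported in the shell that launches the job) for them to take effect.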

Which of the following File Systems are supported by Cloudera for HDFS?. Ext3. Ext4. XFS. S3.

When you do a read from a Linux filesystem, a metadata entry (recording when the file was last accessed) is written to the filesystem. Hence, indirectly, your read request is making a write request as well. It is therefore recommended to disable this write to improve read performance?. True. False.

You have a cluster running with the FIFO Scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then you submit job B to the cluster, which you expect to run for only a couple of minutes. You submit both jobs with the same priority. Which two best describe how the FIFO Scheduler arbitrates the cluster resources for a job and its tasks?. Given jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B. The order of execution of tasks within a job may vary, and tasks are scheduled in the order of their jobs' submission. The FIFO Scheduler will give, on average, an equal share of the cluster resources over the job lifecycle. Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time. The FIFO Scheduler will pass an exception back to the client when job B is submitted, since all slots on the cluster are in use.

Your Hadoop cluster has 12 slave nodes, a block size set to 64MB, and a replication factor of three. Choose the option which best describes how the Hadoop framework distributes block writes into HDFS from a Reducer outputting a 150MB file?. The Reducer will generate twelve blocks and write them to slave nodes nearest the node on which the Reducer runs. The Reducer will generate nine blocks and write them randomly to nodes throughout the cluster. The slave node on which the Reducer runs gets the first copy of every block written; other block replicas will be placed on other nodes. Reducers don't write blocks into HDFS.

You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster. You submit a job A, so that only job A is running on the cluster. A while later, you submit job B. Now job A and job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?. When job A gets submitted, it consumes all the task slots. When job A gets submitted, it doesn't consume all the task slots. When job B gets submitted, job A has to finish first before job B can get scheduled. When job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.

In the context of configuring a Hadoop cluster for HDFS High Availability (HA), 'fencing' refers to: Isolating a failed NameNode from write access to the fsimage and edits files so that it cannot resume write operations if it recovers. Isolating the cluster's master daemon to limit write access only to authorized clients. Isolating both HA NameNodes to prevent a client application from killing the NameNode daemons. Isolating the standby NameNode from write access to the fsimage and edits files.

You are planning a Hadoop cluster, and you expect to be receiving just under 1TB of data per week, which will be stored on the cluster using Hadoop's default replication. You decide that your slave nodes will be configured with 4 x 1TB disks. Calculate how many slave nodes you need to deploy, at a minimum, to store one year's worth of data. 1000 slave nodes. 100 slave nodes. 10 slave nodes. 50 slave nodes.
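The sizing arithmetic behind this question can be sketched as follows; it assumes Hadoop's default replication factor of 3 and all raw disk usable for HDFS, with no allowance for overhead.

```python
# Worked arithmetic for the one-year storage question (a sketch under
# the stated assumptions).
import math

weekly_tb = 1                        # "just under 1TB per week"
raw_needed_tb = weekly_tb * 52 * 3   # one year, triple-replicated
per_node_tb = 4 * 1                  # 4 x 1TB disks per slave node
nodes = math.ceil(raw_needed_tb / per_node_tb)
print(nodes)  # 39
```

39 nodes is the raw minimum; of the offered choices, 50 slave nodes is the smallest deployment that covers it.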

Your cluster block size is set to 128MB. A client application (client application A) is writing a 500MB file to HDFS. After client application A has written 300MB of data, another client (client application B) attempts to read the file. What is the effect of a second client requesting a file during a write?. Client application B can read 256MB of the file. Client application B returns an error. Client application B can read the 300MB that has been written so far. Client application B must wait until the entire file has been written, and will then read its entire contents.
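The arithmetic behind this question: a reader generally sees only the blocks that have been completely written, a simplification of HDFS visibility semantics sketched below.

```python
# Worked arithmetic for the read-during-write question: only whole,
# completed blocks are visible to a concurrent reader (simplified).
block_mb = 128
written_mb = 300
visible_mb = (written_mb // block_mb) * block_mb   # 2 complete blocks
print(visible_mb)  # 256
```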

Your cluster has nodes in seven racks, and you have provided a rack topology script. What is Hadoop's block placement policy, assuming a block replication factor of three?. One copy of the block is written to a node in each of three racks. One copy of the block is written to a node in one rack; two copies are written to two nodes in a different rack. All three copies of the block are written to nodes on the same rack. Because there are seven racks, the block is written to a node on each rack.

You have a cluster running 32 slave nodes and 3 master nodes running MapReduce v1 (MRv1). You execute the command: $ hadoop fsck / Which four cluster conditions will running this command return to you?. The current state of the file system returned from scanning individual blocks on each DataNode. Number of dead DataNodes. Configured capacity of your cluster. Under-replicated blocks. Blocks replicated improperly or that don't satisfy your cluster placement policy (e.g., too many blocks replicated on the same node). Number of DataNodes. The current state of the file system according to the NameNode. The location of every block.

You're running a Hadoop cluster with a NameNode on the host mynamenode. What are two ways you can determine the available HDFS space in your cluster?. Run hadoop fs -du / and locate the DFS Remaining value. Connect to http://mynamenode:50070/ and locate the DFS Remaining value. Run hadoop dfsadmin -report and locate the DFS Remaining value. Run hadoop dfsadmin -spaceQuota and subtract DFS Used % from the configured capacity.

What is the best disk configuration for slave nodes in a Hadoop cluster where each node has 6 x 2TB drives?. Three RAID 1 arrays. A RAID 5 array. Six separate volumes. A single Linux LVM (Logical Volume Manager) volume.

What is the rule governing the formatting of the underlying filesystems in a Hadoop cluster?. They must all use the same filesystem, but this does not need to be the same filesystem as the one used by the NameNode. They must all be left as raw disk; Hadoop formats them automatically. They must all use the same filesystem as the NameNode. They must all be left as unformatted raw disk; Hadoop uses raw unformatted disk for HDFS. They can use different filesystems.

You have a cluster running with the Fair Scheduler enabled and configured. You submit multiple jobs to the cluster. Each job is assigned to a pool. What are the two key points to remember about how jobs are scheduled with the Fair Scheduler?. Each pool gets 1/M of the total available task slots, where M is the number of nodes in the cluster. Pools are assigned priorities; pools with higher priorities are executed before pools with lower priorities. Each pool gets 1/N of the total available task slots, where N is the number of jobs running on the cluster. Pools get a dynamically allocated share of the available task slots (subject to additional constraints). Each pool's share of the task slots remains static within the execution of any individual job. Each pool's share of task slots may change throughout the course of job execution.

Your cluster has 9 slave nodes. The cluster block size is set to 128MB and its replication factor is set to three. How will the Hadoop framework distribute block writes into HDFS from a reducer outputting a 300MB file?. Reducers don't write blocks into HDFS. The node on which the reducer is running will receive one copy of each block; the other replicas will be placed on other nodes in the cluster. The 9 blocks will be written randomly to the nodes; some may receive multiple blocks, some may receive none. All 9 nodes will each receive exactly one block. The 9 blocks will be written to 3 nodes, such that each of the three gets one copy of each block.

What does each block of a file contain when it is written into HDFS?. Each block writes a separate meta file containing information on the file name of which the block is a part. Each block has a header and footer containing metadata. Each block contains only data from the file. Each block has a header containing metadata.

Which two things occur when individual blocks are written to a DataNode's local filesystem on a cluster?. The DataNode updates its log of checksum verifications. The DataNode writes a metadata file with the name of the file the block is associated with. A metadata file is written to the DataNode containing the checksums for each block. A metadata file is written to the DataNode containing all the other node locations in the namespace. The DataNode runs a block scanner to verify the written blocks.

Once a client application validates its identity and is granted access to a file in a cluster, what is the remainder of the read path back to the client?. The NameNode gives the client the block IDs and a list of DataNodes on which those blocks are found, and the application reads the blocks directly from the DataNodes. The NameNode maps the read request against the block locations in its stored metadata and reads those blocks from the DataNodes; the client application then reads the blocks from the NameNode. The NameNode maps the read request against the block locations in its stored metadata; the block IDs are sorted by their distance to the client and moved to the DataNode closest to the client according to the Hadoop rack topology. The client application then reads the blocks from that single DataNode.

What is the smallest number of slave nodes you would need to configure in your Hadoop cluster to store 100TB of data, using Hadoop's default replication values, on nodes with 10TB of raw disk space per node?. 100. 25. 40. 10.
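The sizing arithmetic for this question can be sketched as follows, assuming the default replication factor of 3 and all 10TB per node usable by HDFS.

```python
# Worked arithmetic for the 100TB storage question (a sketch under the
# stated assumptions; real clusters reserve space for non-HDFS use).
import math

data_tb = 100
raw_needed_tb = data_tb * 3     # default replication factor
per_node_tb = 10
nodes = math.ceil(raw_needed_tb / per_node_tb)
print(nodes)  # 30
```

The raw minimum is 30 nodes; of the offered choices, 40 is the smallest that covers it once any non-HDFS overhead is accounted for.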

A slave node in your cluster has 24GB of RAM and 12 physical processor cores on a hyperthreading-enabled processor. You set the value of mapred.child.java.opts to -Xmx1G, and the value of mapred.tasktracker.map.tasks.maximum to 12. What is the appropriate value to set for mapred.tasktracker.reduce.tasks.maximum?. 24. 16. 6. 2. 12.
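One way to reason about this MRv1 slot question is the common rule of thumb of configuring roughly half as many reduce slots as map slots, then checking that the combined task heap fits in node RAM. This is a sketch of that heuristic, not a Cloudera-mandated formula.

```python
# Sketch of a common MRv1 sizing heuristic (assumption: reduce slots
# roughly half the map slots, then sanity-check against node RAM).
ram_gb = 24
heap_per_task_gb = 1            # from -Xmx1G
map_slots = 12
reduce_slots = map_slots // 2   # rule-of-thumb assumption
total_task_heap_gb = (map_slots + reduce_slots) * heap_per_task_gb
assert total_task_heap_gb < ram_gb  # leaves headroom for OS and daemons
print(reduce_slots)  # 6
```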

A client application opens a file write stream on your cluster. Which two metadata changes occur during a file write?. The NameNode triggers a block report to update block locations in the edits file. The change is written to the NameNode disk. The change is written to the edits file. The metadata in RAM on the NameNode is updated. The metadata in RAM on the NameNode is flushed to disk. The change is written to the fsimage file. The change is written to the secondary NameNode.

Each slave node in your cluster has four 2TB hard drives installed (4 x 2TB). You set the value of the dfs.datanode.du.reserved parameter to 100GB on each slave node. How does this alter HDFS block storage?. 25GB on each hard drive may not be used to store HDFS blocks. 100GB on each hard drive may not be used to store HDFS blocks. All hard drives may be used to store HDFS blocks as long as at least 100GB in total is available on the node. A maximum of 100GB on each hard drive may be used to store HDFS blocks.

What are the permissions of a file in HDFS with the following: rw-rw-r-x?. HDFS runs in user space, which makes all users with access to the namespace able to read, write, and modify all files. The owner and group cannot delete the file, but others can. The owner and group can modify the contents of the file; others can't. The owner and group can read the file; others can't. No one can modify the contents of the file.
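To make the options concrete, a permission string like rw-rw-r-x decodes into three triplets (owner, group, other). The small helper below is an illustrative sketch, not part of any Hadoop API.

```python
# Sketch: decoding a POSIX-style permission string such as "rw-rw-r-x"
# into per-class rights (owner / group / other).
def decode(perms):
    out = {}
    for i, who in enumerate(("owner", "group", "other")):
        triplet = perms[i * 3:(i + 1) * 3]
        out[who] = {
            "read": triplet[0] == "r",
            "write": triplet[1] == "w",
            "execute": triplet[2] == "x",
        }
    return out

p = decode("rw-rw-r-x")
print(p["owner"]["write"], p["other"]["write"])  # True False
```

So for rw-rw-r-x, the owner and group can modify the file's contents, while others can only read (and execute).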

Which three file actions can you execute as you write a file into HDFS?. You can index the file. You can update the file's contents. You can rename the file. You can delete the file. You can move the file.

You set the value of dfs.block.size to 64MB in hdfs-site.xml on a client machine, but you set the same property to 128MB on your cluster's NameNode. What happens when the client writes a file to HDFS?. An exception will be thrown when the client attempts to write the file, because the values are different. A block size of 64MB will be used. A block size of 128MB will be used. The file will be written successfully with a block size of 64MB, but clients attempting to read the file will fail because the NameNode believes the blocks to be 128MB in size.

Using Hadoop's default settings, how much data will you be able to store on your Hadoop cluster if it has 12 nodes with 4TB of raw disk space per node allocated to HDFS storage?. Approximately 3TB. Approximately 12TB. Approximately 16TB. Approximately 48TB.
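The arithmetic here follows directly from the default replication factor of 3, as sketched below.

```python
# Worked arithmetic for the capacity question above, assuming Hadoop's
# default replication factor of 3 and the full raw disk usable by HDFS.
nodes, raw_tb_per_node = 12, 4
replication = 3
usable_tb = nodes * raw_tb_per_node / replication
print(usable_tb)  # 16.0, i.e. "Approximately 16TB"
```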

A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which option best describes the file access rules in HDFS if the file has a single block that is stored on DataNodes A, B, and C?. The file will be marked as corrupted if DataNode B fails during the creation of the file. Each DataNode locks the local file to prohibit concurrent readers and writers of the file. Yes, so long as both tables fit into memory. Each DataNode stores a copy of the file in the local filesystem with the same name as the HDFS file. The file can be accessed if at least one of the DataNodes storing the file is available.

Each node in your Hadoop cluster, running YARN, has 48GB of memory and 12 cores. Your yarn-site.xml has the following configuration: You want YARN to launch a maximum of 10 containers per node. Enter the property value that would restrict YARN from launching more than 10 containers per node: 12. 10. 2048. 4096.

For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?. Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode. Cached in the YARN container running the task, then copied into HDFS on job completion. In HDFS, in the directory of the user who generated the job. On the local disk of the slave node running the task.

Each node in your Hadoop cluster, running YARN, has 64GB of memory and 24 cores. Your yarn-site.xml has the following configuration: <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>32768</value> </property> <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>12</value> </property> You want YARN to launch no more than 16 containers per node. What should you do?. Modify yarn-site.xml with the following property: <name>yarn.scheduler.minimum-allocation-mb</name> <value>2048</value>. Modify yarn-site.xml with the following property: <name>yarn.scheduler.minimum-allocation-mb</name> <value>4096</value>. Modify yarn-site.xml with the following property: <name>yarn.nodemanager.resource.cpu-vcores</name>. No action is needed: YARN's dynamic resource allocation automatically optimizes the node memory and cores.
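The arithmetic behind this question: the per-node container count is bounded by the NodeManager's memory allotment divided by the scheduler's minimum allocation, as sketched below.

```python
# Worked arithmetic for the YARN container question. With 32768MB
# available to containers, a 2048MB minimum allocation caps the node at
# 16 containers.
node_memory_mb = 32768      # yarn.nodemanager.resource.memory-mb
target_containers = 16
min_alloc_mb = node_memory_mb // target_containers
print(min_alloc_mb)  # 2048 -> yarn.scheduler.minimum-allocation-mb
```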

You want a node to swap Hadoop daemon data from RAM to disk only when absolutely necessary. What should you do?. Delete the /dev/vmswap file on the node. Delete the /etc/swap file on the node. Set the ram.swap parameter to 0 in core-site.xml. Set the vm.swappiness kernel parameter on the node. Delete the /swapfile file on the node.

You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster's master nodes?. HMaster. ResourceManager. TaskManager. JobTracker. NameNode. DataNode.

You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1GB and your io.sort.mb value is set to 1000MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?. For a 1GB child heap size, an io.sort.mb of 128MB will always maximize memory-to-disk I/O. Increase io.sort.mb to 1GB. Decrease the io.sort.mb value to 0. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.

You are running a Hadoop cluster with a NameNode on host myNameNode, a Secondary NameNode on host mySecondaryNameNode, and several DataNodes. Which best describes how you determine when the last checkpoint happened?. Execute hdfs namenode -report on the command line and look at the Last Checkpoint information. Execute hdfs dfsadmin -saveNamespace on the command line, which returns the last checkpoint value in the fstime file. Connect to the web UI of the Secondary NameNode (http://mysecondary:50090/) and look at the Last Checkpoint information. Connect to the web UI of the NameNode (http://myNameNode:50070) and look at the Last Checkpoint information.

Which YARN daemon or service monitors a container's per-application resource usage (e.g., memory, CPU)?. ApplicationMaster. NodeManager. ApplicationManagerService. ResourceManager.

Which is the default scheduler in YARN?. YARN doesn't configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml. Capacity Scheduler. Fair Scheduler. FIFO Scheduler.

Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?. Complexity Fair Scheduler (CFS). Capacity Scheduler. Fair Scheduler. FIFO Scheduler.

What is data locality?. Before processing the data, bringing it to the local node. Hadoop will start the map task on the node where the data block is kept in HDFS. Both 1 and 2 are possible. None of the above is correct.

Which daemon is responsible for the housekeeping of the NameNode?. JobTracker. TaskTracker. NameNode itself. Secondary NameNode.

Which daemon distributes individual tasks to machines?. TaskTracker. JobTracker. MasterTracker. NameNode.

How does Hadoop process large volumes of data?. Hadoop uses a lot of machines in parallel; this optimizes data processing. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware. Hadoop ships the code to the data instead of sending the data to the code. Hadoop uses sophisticated caching techniques on the NameNode to speed up the processing of data.

What action occurs automatically on a cluster when a DataNode is marked as dead?. The NameNode forces re-replication of all the blocks which were stored on the dead DataNode. The replication factor of the files which had blocks stored on the dead DataNode is temporarily reduced, until the dead DataNode is recovered and returned to the cluster. The NameNode informs the clients which wrote the blocks that they are no longer available; the clients then re-write the blocks to a different node. All of the above.

How will you define HDFS Federation?. HDFS Federation is used to start communication between two NameNodes which are not part of the same cluster. HDFS Federation allows the Standby NameNode to automatically resume the services of an active NameNode. HDFS Federation can be used to help HDFS scale horizontally; it divides the single NameNode's work across multiple NameNodes, and each individual NameNode independently manages part of the HDFS namespace. If there is Network failure then one DataNode.

Which of the following components will start the Map or Reduce task?. TaskTracker. NameNode. JobTracker. ResourceManager. None of the Above.

What happens when a running task fails in Hadoop?. Failed task data will be lost. The master will detect that failure and re-assign the work to a different node on the system. If a failed node restarts, it is automatically added back to the system and assigned new tasks. Both 2 and 3 are correct.

If a node appears to be running slowly, then:. The master can redundantly execute another instance of the same task. The result from the first to finish will be used. No new task will be restarted. Both 1 and 2 are correct.

Configuration changes made on the NameNode will be used by all the DataNodes, even without restarting the whole cluster. True. False.

Without the metadata on the NameNode, a file can be recovered. True. False.

HDFS balancer is a tool that balances disk space usage on an HDFS cluster when some DataNodes become full or when new empty nodes join the cluster. True. False.

Hadoop gives preference to intra-rack data transfer in order to conserve bandwidth. True. False.

In case of the Fair Scheduler there are multiple pools and each pool has its own priority. True. False.

Deploying a compute cluster and a storage cluster separately disaggregates storage from compute in a Hadoop environment, which enables compute and storage to grow independently as per business requirements. True. False.

In CDP Private Cloud Base, you can have a single filesystem based on HDFS and create multiple compute clusters on it. True. False.

It is not recommended that you use “nscd” to cache both DNS name resolution and static name resolution for Kudu. True. False.

CDP Private Cloud Base is supported on platforms with Security-Enhanced Linux (SELinux) enabled and in enforcing mode. Cloudera is not responsible for SELinux policy development, support, or enforcement. If you experience issues running Cloudera software with SELinux enabled, contact your OS provider for assistance. True. False.

Cloudera recommends placing DataNode data directories on NAS devices. True. False.

In a Hadoop cluster, increasing the number of drives on DataNodes for more data storage does not affect the network or bandwidth requirements. True. False.

In CDP Private Cloud Base for Hadoop, It is generally better to have more average nodes than fewer super nodes. True. False.

The number of physical drives per host may be a limiting factor in determining the number of container slots configured per node. True. False.

If a customer has experienced fsync delays and other I/O related issues with ZooKeeper, ZooKeeper’s dataDir and dataLogDir can be configured to use separate disks?. True. False.

A host on which you are running Kudu should not have the filesystem hole punching capability enabled. True. False.

CDP Private Cloud Base deployments are restricted to data centers within a geographic region. Single clusters spanning large geographic regions are not supported. True. False.

You should always keep the Service Monitor and Reports Manager on the same host. True. False.

Which of the following is the functionality of the SecondaryNameNode role?. It is used in case of failover and works as a cold NameNode. It is a Backup Node. If the primary NameNode is very busy then the Secondary NameNode is used to read the file block locations. The Secondary NameNode is responsible for the housekeeping of the NameNode and it is not a backup of the NameNode.

Which statement is true about storing files in HDFS?. Files are split into blocks. All the blocks of a file should remain on the same machine. A master node keeps track of all the blocks of a file.

Which is the master node for tracking the file blocks in HDFS?. JobTracker. DataNode. NameNode. DataMasterNode.

Select the correct statement for the NameNode?. The NameNode daemon must be running at all times. The NameNode holds all its metadata in RAM for fast access. The NameNode controls the complete job and assigns the blocks to process.

Your existing Hadoop cluster has 30 slave nodes, each of which has 4 x 2 TB hard drives. You plan to add another 10 nodes. How much disk space can your new nodes contain?. The new nodes must all contain 8TB of disk space, but it does not matter how the disks are configured. The new nodes cannot contain more than 8TB of disk space. The new nodes can contain any amount of disk space. The new nodes must all contain 4 x 2TB hard drives.

Select the correct option?. The NameNode is the bottleneck for reading files in HDFS. The NameNode is used to determine all the blocks of a file. Reading the blocks of a file is always routed to the NameNode. All of the above.

Which is the correct command to copy files from the local filesystem to HDFS?. hadoop fs -copy pappu.txt pappu.txt. hadoop fs -copyFromPath pappu.txt pappu.txt. hadoop fs -copyFromLocal pappu.txt pappu.txt. None of the above.

Which is the correct command to list all the files in the current directory?. hadoop fs -ls. hadoop fs -list. hadoop fs . All of the above.

Arrange the life cycle of a MapReduce job based on the options below. Clients submit the MapReduce job to the JobTracker. The JobTracker assigns Map and Reduce tasks to the other nodes on the cluster. Each node runs a software daemon known as the TaskTracker. The TaskTracker is responsible for actually instantiating the Map and Reduce tasks. The TaskTracker reports the task's progress back to the JobTracker.

What is the meaning of the "concerning" status in Cloudera Manager?. The NameNode does not have enough space to hold the file metadata. A DataNode does not have enough space to hold file blocks. A DataNode has not sent a heartbeat message in the last 60 minutes.

What is the recommended disk configuration for DataNode in Hadoop Cluster. JBOD. RAID 5. RAID 10. JDOB.

HDFS is configured in High Availability using Quorum-based storage and no HDFS federation is implemented. To avoid a split-brain situation, how many NameNodes should be configured?. Three NameNodes, to maintain three copies of the metadata. Two NameNodes, both in active mode. Two active and one passive NameNode. One active and one standby NameNode.

Cluster Summary: 45 files and directories, 12 blocks = 57 total. Heap Size is 15.31 MB / 193.38 MB (7%) Configured Capacity: 17.33 GB, DFS Used: 144 KB, Non DFS Used: 5.49 GB DFS Remaining: 11.84 GB, DFS Used%: 0%, DFS Remaining%: 68.32%, Live Nodes: 6, Dead Nodes: 1, Decommissioning Nodes: 0, Number of Under-Replicated Nodes: 6. Refer to the above summary. You configure the Hadoop cluster with seven DataNodes and the NameNode's web UI displays the details shown above. What does this tell you?. The HDFS cluster is in the safe mode. Your cluster has lost all HDFS data which had blocks stored on the dead DataNode. One physical host crashed. The DataNode JVM on one host is not active.
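As a quick sanity check on a NameNode summary like the one above, the displayed percentages can be recomputed from the raw figures. A minimal sketch in Python, using the capacity and remaining-space values quoted in the question:

```python
# Recompute the DFS Remaining% shown in the NameNode web UI from the
# raw figures quoted in the cluster summary above (values are taken
# from that text).
configured_capacity_gb = 17.33
dfs_remaining_gb = 11.84

dfs_remaining_pct = dfs_remaining_gb / configured_capacity_gb * 100
print(round(dfs_remaining_pct, 2))  # matches the 68.32% in the summary
```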

You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve maximum memory-to-disk I/O ratio?. TaskTracker. JobTracker. ResourceManager. NameNode. Application Master.

In your Hadoop cluster you have 10 nodes and each node has a 3 TB hard drive attached, so the total available space is 30 TB. How much data will you be able to store?. 30 TB. 10 TB. 9 TB. 90 TB.
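The arithmetic behind this question: usable HDFS capacity is raw capacity divided by the replication factor. A minimal sketch, using the figures from the question and assuming Hadoop's default replication factor of 3:

```python
# Usable HDFS capacity = raw capacity / replication factor.
# Node count and disk size are from the question; replication factor 3
# is Hadoop's default (an assumption, since the question doesn't state it).
nodes = 10
tb_per_node = 3
replication = 3

raw_tb = nodes * tb_per_node        # 30 TB of raw disk
usable_tb = raw_tb / replication    # each block is stored 3 times
print(usable_tb)  # 10.0
```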

In an HDFS cluster which is using NameNode federation, one NameNode manages the /pappu namespace and one NameNode manages the /pintu namespace. What happens when a client tries to write /pintu/myfile.txt?. The file is successfully written to /pintu/myfile.txt and the metadata for the file is managed by the first NameNode to which the client connects. The file write will fail because the NameNode info is not provided while writing. An exception will be thrown saying the NameNode info cannot be resolved.

What should be the slave node configuration considerations in Hadoop cluster design?. The ratio between the number of processor cores and the amount of memory. The ratio between the number of processors and the storage capacity. The ratio between the number of processor cores and the number of disk drives.

Will settings using Java API overwrite values in configuration files?. No. The configuration settings in the configuration file takes precedence. Yes. The configuration settings using Java API take precedence. It depends when the developer reads the configuration file. If it is read first then no. Only global configuration settings are captured in configuration files on NameNode. There are only a very few job parameters that can be set using Java API.

What is the distributed cache?. The distributed cache is a special component on the NameNode that will cache frequently used data for faster client response; it is used during the reduce step. The distributed cache is a special component on the DataNode that will cache frequently used data for faster client response; it is used during the map step. The distributed cache is a component that caches Java objects. The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.

Which of the following daemon failures will cause whole-cluster unavailability?. DataNode. Application Manager. TaskTracker. NameNode.

In a cluster configured with HDFS High Availability (HA) but NOT HDFS federation, each map task runs:. In the same Java Virtual Machine as the DataNode. In the same Java Virtual Machine as the TaskTracker. In its own Java Virtual Machine. In the same Java Virtual Machine as the JobTracker.

What happens if a DataNode loses network connection for a few minutes?. The NameNode will detect that a DataNode is not responsive and will start replication of the data from the remaining replicas. When the DataNode comes back online, the administrator will need to manually delete the extra replicas. All data will be lost on that node. The administrator has to ensure proper data distribution between nodes. If the DataNode comes back online just after a few minutes, the cluster won't detect that it was unavailable and will continue working normally. The NameNode will detect that a DataNode is not responsive and will start replication of the data from the remaining replicas. When the DataNode comes back online, the extra replicas will be deleted.

What happens if one of the DataNodes has a much slower CPU? How will it affect the performance of the cluster?. The task execution will be as fast as the slowest worker. However, if speculative execution is enabled, the slowest worker will not have such a big impact. The slowest worker will significantly impact job execution time; it will slow everything down. The NameNode will detect the slowest worker and will never send tasks to that node. It depends on the level of priority assigned to the task. All high-priority tasks are executed in parallel twice; a slower DataNode would therefore be bypassed. If the task is not high priority, however, performance will be affected.

The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or a number of machines are performing poorly and starts more copies of a map or reduce task. All the tasks run simultaneously and the task that finishes first is used. Which term describes this behaviour?. Partitioning. Combining. Identity. Speculative Execution.

Which are the components of the ResourceManager?. Scheduler. Applications Manager. Node Manager. All of the above. Only Scheduler and Applications Manager are correct.

Which statement is true about the ApplicationsManager?. It is responsible for accepting job submissions. It negotiates the first container for executing the application-specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure. It monitors resource usage (CPU, memory, disk, network) and reports the same to the ResourceManager or Scheduler.

You have a cluster configured with the FIFO scheduler enabled, and there are two jobs: Job1, which takes 10 hours to finish, and Job2, which takes 1 hour to finish, submitted in this order. Then:. Job1 was submitted before Job2, hence it is guaranteed that all tasks from Job1 will finish before all tasks of Job2. The scheduler will pre-empt the job submitted later in FIFO. The processing order may vary as per the availability of resources. Submission of Job2 will not be accepted.

How must you format the underlying filesystem of your Hadoop cluster's slave nodes running on Linux?. They may be formatted with any Linux filesystem. They must be formatted as HDFS. They must be formatted as either ext3 or ext4. They must not be formatted; HDFS will format the filesystem automatically.

Identify four pieces of cluster information that are stored on disk on the NameNode?. A Catalog of DataNodes and the blocks that are stored on them. Names of the files in HDFS. The directory structure of the files in HDFS. An edit log of changes that have been made since the last snapshot of the NameNode. An edit log of changes that have been made since the last snapshot compaction by the Secondary NameNode. File permissions of the files in HDFS. The status of the heartbeats of each DataNode.

Which command does Hadoop offer to discover missing or corrupt HDFS data?. The map only checksum utility. Fsck. fck. fk.

Your Hadoop cluster contains nodes in three racks. Which scenario results if you leave the dfs.hosts property in the NameNode's configuration file empty (blank)?. The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with a dfsadmin -refreshNodes. Any machine running the DataNode daemon can immediately join the cluster. Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster. No new nodes can be added to the cluster until you specify them in the dfs.hosts file.

What metadata is stored on a DataNode when a block is written to it?. None; only the block itself is written. Checksums for the data in the block, as a separate file. Information on the file's location in HDFS. Node location of each block belonging to the same namespace.

In a High Availability HDFS cluster with two NameNodes, PN01 and PN02, what happens when the following haadmin command is executed: hdfs haadmin -failover PN01 PN02?. PN02 will be the standby NameNode and PN01 becomes the active node. PN01 is in standby and PN01 becomes the active NameNode. PN01 is fenced, and PN02 becomes the active NameNode. PN02 becomes the standby NameNode and PN02 becomes the active NameNode.

There is an HDFS cluster configured with NameNode federation: one NameNode manages the /pappu namespace and another NameNode manages the /pintu namespace. How do you configure a client machine so that it can access both directories?. It is not possible. No configuration is required. Add both the NameNodes in core-site.xml.

Where are Hadoop's task log files stored?. In the HDFS directory of the user who ran the job. On the local disk of the slave node running the task. No logs are generated for individual tasks.

Which statement is true if we consider the NameNode and DataNodes?. The NameNode requires more memory but less disk capacity. The NameNode requires less memory and less disk capacity than the DataNodes. The NameNode and DataNodes should have the same hardware configuration. None of the above is correct.

Which scheduler should be used in the cluster which allows small jobs to finish within a reasonable time without starving long-running jobs. FIFO. Capacity. Fair. All of the above support such behaviour.

You have to design a Hadoop cluster where you will receive 2 TB of data each week, the replication factor is 2, and your slave node has 10 TB of space. How many DataNodes are required to store 1 year of data?. 21. 40. 50. 100.
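The sizing arithmetic can be sketched as follows. This is a rough estimate that ignores non-DFS overhead and intermediate data, using only the figures stated in the question:

```python
import math

# Rough DataNode count: a year's intake times the replication factor,
# divided by per-node capacity. All figures are from the question.
weekly_intake_tb = 2
weeks_per_year = 52
replication = 2
node_capacity_tb = 10

stored_tb = weekly_intake_tb * weeks_per_year * replication  # 208 TB
nodes_needed = math.ceil(stored_tb / node_capacity_tb)
print(nodes_needed)  # 21
```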

When should a higher-speed 25 GbE Ethernet fabric be considered for a Hadoop cluster?. When the typical workloads generate a large amount of intermediate data, on the order of the input data itself. When the typical workloads consist of processor-intensive tasks. When the typical workloads consume a large amount of input data, relative to the entire capacity of HDFS. When the typical workloads generate a large amount of output data, significantly larger than the amount of intermediate data.

A Hadoop file has permissions as below: rw-r--r--. What does it mean?. The file can be deleted only by the owner. Once the file is written to HDFS it can never be deleted. The file cannot be modified by the owner only. None of the above.
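HDFS borrows POSIX-style permission strings, so the pattern above (rw-r--r--, i.e. octal 644) can be decoded with Python's stat module. A minimal sketch; the regular-file type bit is added only so that filemode renders the familiar string:

```python
import stat

# rw-r--r-- is octal 644; S_IFREG marks it as a regular file so that
# stat.filemode() renders the permission string in `ls -l` style.
mode = stat.S_IFREG | 0o644
print(stat.filemode(mode))  # -rw-r--r--

owner_can_write = bool(mode & stat.S_IWUSR)   # owner has the w bit
others_can_write = bool(mode & stat.S_IWOTH)  # group/others are read-only
print(owner_can_write, others_can_write)  # True False
```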

In YARN (MRv2), which daemon is responsible for launching application containers and monitoring application resource usage?. ResourceManager. ApplicationMaster. NodeManager. JobTracker.

What happens if a Mapper task goes into an infinite loop?. After some time (as per the configuration), the TaskTracker will kill that Map task. The job will keep running and the administrator has to kill this job. The TaskTracker will be restarted after some time.

When a Hadoop cluster is configured properly, which one of the following scenarios will still go undetected?. HDFS is full. The NameNode goes down. One DataNode leaves the cluster. A Map task is in an infinite loop.

When setting up an HDFS cluster, which of the following considerations is the least important?. Amount of memory on the NameNode. Number of DataNodes. Disk capacity of the NameNode. Number of files that will be stored in HDFS.

Running the HDFS balancer periodically can be used to:. Help HDFS deliver consistent performance under heavy load. Ensure consistent disk utilization across the DataNodes. Ensure that there is capacity in HDFS for additional data.

Which of the following information is stored on the NameNode?. A catalog of DataNodes and the blocks that are stored on them. Names of the files in HDFS. The directory structure of the files in HDFS. An edit log of changes that have been made since the last snapshot compaction by the Secondary NameNode.

By default, log files for individual tasks in a job are stored:. On the TaskTracker's local disk, and in the job's output directory in HDFS. On the TaskTracker's local disk only. In the job's output directory in HDFS only. On the TaskTracker's local disk, and on the JobTracker's local disk. On the JobTracker's local disk only.

How does HDFS Federation help HDFS scale horizontally?. HDFS Federation improves the resiliency of HDFS in the face of network issues by removing the NameNode as a single point of failure. HDFS Federation allows the Standby NameNode to automatically resume the services of an active NameNode. HDFS Federation provides cross-data-center (non-local) support for HDFS, allowing a cluster administrator to split the Block Storage outside the local cluster. HDFS Federation reduces the load on any single NameNode by using multiple, independent NameNodes to manage individual parts of the filesystem namespace.

Choose which best describes a Hadoop cluster's block size storage parameters once you set the HDFS default block size to 64 MB?. The block size of files in the cluster can be determined as the block is written. The block size of files in the cluster will all be multiples of 64 MB. The block size of files in the cluster will all be at least 64 MB. The block size of files in the cluster will all be exactly 64 MB.
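The splitting behaviour behind these options can be sketched numerically: a file is cut into full-size blocks plus one final, possibly smaller, block. The 150 MB file size below is a hypothetical example:

```python
# With a 64 MB block size, a file is stored as full 64 MB blocks plus
# one final block holding the remainder; only the last block may be
# smaller than the configured block size.
block_size_mb = 64
file_size_mb = 150  # hypothetical file size for illustration

full_blocks, remainder = divmod(file_size_mb, block_size_mb)
blocks = [block_size_mb] * full_blocks + ([remainder] if remainder else [])
print(blocks)  # [64, 64, 22]
```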

Which two steps must you take if you are running a Hadoop cluster with a single NameNode and six DataNodes, and you want to change a configuration parameter so that it affects all six DataNodes?. You must restart the NameNode daemon to apply the changes to the cluster. You must modify the configuration files on the NameNode only; DataNodes read their configuration from the master nodes. You must restart all six DataNode daemons to apply the changes to the cluster. You don't need to restart any daemon, as they will pick up changes automatically. You must modify the configuration files on each of the six DataNode machines. You must modify the configuration files on only one of the DataNode machines.

You install Cloudera Manager on a cluster where each host has 1 GB of RAM. All of the services show their status as concerning. However, all submitted jobs complete without an error. Why is Cloudera Manager showing the concerning status for the services?. A slave node's disk ran out of space. The slave nodes haven't sent a heartbeat in 60 minutes. The slave nodes are swapping. A DataNode service instance has crashed.

What is the recommended disk configuration for slave nodes in your Hadoop cluster with 6 x 2 TB hard drives?. RAID 10. JBOD. RAID 5. RAID 1+0.

Your Hadoop cluster has 25 nodes with a total of 100 TB (4 TB per node) of raw disk space allocated to HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to store?. Approximately 100 TB. Approximately 25 TB. Approximately 10 TB. Approximately 33 TB.
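The replication arithmetic for this question can be sketched as below. A rough estimate assuming Hadoop's default replication factor of 3 and ignoring non-HDFS overhead on each disk:

```python
# Usable capacity under Hadoop's default replication factor of 3.
# Node count and per-node capacity are from the question.
nodes = 25
tb_per_node = 4
replication = 3

usable_tb = nodes * tb_per_node / replication
print(round(usable_tb, 1))  # 33.3 -- "approximately 33 TB"
```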

You set up the Hadoop cluster using NameNode Federation. One NameNode manages the /users namespace and one NameNode manages the /data namespace. What happens when a client tries to write a file to /reports/myreport.txt?. The file successfully writes to /users/reports/myreports/myreport.txt. The client throws an exception. The file successfully writes to /report/myreport.txt; the metadata for the file is managed by the first NameNode to which the client connects. The file write fails silently; no file is written, no error is reported.

You are working on a replication of HDFS files from a source CDP cluster to a destination CDP cluster. However, there are thousands of files and subdirectories, and you need to increase the memory. How would you do it?. You need to increase the heap size in the hadoop-env.sh file. To increase the heap size, add the key-value pair HADOOP_CLIENT_OPTS=-Xmx<memory_value>. To increase the heap size, add the key-value pair MAPRED_DISTCP_OPTS="-Xmx<memory_value>". To increase the heap size, add the key-value pair HADOOP_USER_PARAMS=-Xmx<memory_value>.

Every week you are replicating data from a Production cluster to a Test cluster. However, you have found that Replication Manager logs are accumulating after each replication. How can you make sure logs are not retained?. You can set the “Replication Manager Log Retention” property value to 0. You can set the “Replication Manager Log Retention” property value to 90. You can delete the “Replication Manager Log Retention” property. You can set the “Replication Manager Log Retention” property value to -1.

You are setting up Cloudera Private Cloud Base and for that you are preparing the hosts. Which of the following are the basic minimum requirements for the VM or physical machine to run CDP?. Linux-based operating system. Windows-based operating system. JDK. MongoDB. HBase. PostgreSQL.

While setting up a Cloudera CDP Private Cloud Base cluster, you have to have some mandatory services installed from the Cloudera Runtime. Which of the following are the mandatory Runtime components or services?. Hive Metastore. Ranger. Atlas. HDFS. Ozone. Arrow. Impala. Phoenix.

Which of the following statements are correct for Cloudera CDP Private Cloud Base?. Cloudera CDP Private Cloud Base also includes the Data Experiences, which are created using Kubernetes clusters. CDP Private Cloud Base includes HDFS. CDP Private Cloud Base can be created on VMs or physical servers. CDP Private Cloud Base does not support a separation of compute and storage.

IBM Spectrum is software for file and object storage, and it can be deployed as a software-defined storage management solution with sophisticated data management capabilities. Which of the following can be the best fit in Cloudera CDP Private Cloud Base to use IBM Spectrum based storage?. Hive. Impala. NiFi. HDFS. Atlas.

Which of the following are the components of the CDP Private Cloud Base (CDP PCB)?. Cloudera Management Console. Cloudera CDP Data Warehouse. Cloudera Manager. Cloudera Data Hub Runtime Components.

HDFS supports HDFS Transparency. What are the advantages of using HDFS Transparency?. Hadoop applications can run unmodified over other supported object storage. It provides immediate support for Hadoop applications and ISVs. It helps in having a single namespace for Hadoop and non-Hadoop workloads. It reuses the HDFS client as-is.

Which of the following statements are correct with regards to Cloudera Manager?. Users can manage multiple Hadoop clusters. Users can monitor multiple Hadoop clusters. Users can configure multiple Hadoop clusters. Cloudera Manager does not expose any API to end users.

Which of the following statements are true with regards to Cloudera Manager Custom Service Descriptors?. Using Cloudera Service Descriptors (CSDs), you can add your own managed service. Using Cloudera Manager CSDs, you can use features of Cloudera Manager like monitoring, resource management, configuration, distribution, and lifecycle management. A CSD is linked to one service type in Cloudera Manager. If you use a CSD then your service is packaged and distributed as a .jar file.

Please map the following. Manages auditing HDFS resources and access control through a user interface that ensures consistent policy administration in CDP clusters. Provides a set of metadata management and governance services that enable you to manage CDP cluster assets. Is a distributed filesystem designed to run on commodity hardware. Is highly reliable, scalable and fault-tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, and centralized configuration. Is a network authentication protocol.

Please map the following hosts from a Cloudera Manager or CDP Private Cloud Base perspective. It runs Hadoop processes such as the HDFS NameNode and YARN ResourceManager. It runs the cluster processes such as Cloudera Manager and the Hive Metastore. These are client access points for launching jobs in the cluster. It runs DataNodes and other distributed processes such as Impalad.

When you are using Replication Manager, which of the following are correct?. The Run As Username field is used to launch the MapReduce job for copying data. The Run on Peer as Username field is used to run the copy listing on the source. The Run on Peer as Username field is used to launch the MapReduce job for copying data. The Run As Username field is used to run the copy listing on the source.

Data encryption at rest in HDFS is provided by:. By default, in CDP Private Cloud Base, HDFS has this feature enabled and no setup is needed. It is provided by encryption keys and an encryption policy. It is provided by the SSL protocol. It is provided by the TLS protocol.

You are using a third-party storage layer such as IBM Spectrum instead of the default HDFS provided by Cloudera CDP Private Cloud Base. IBM Spectrum has built-in encryption on a per-node basis and HDFS provides user-level encryption. If user-level encryption is enabled, then you cannot use built-in encryption. If user-level encryption is enabled, then you can also use built-in encryption, which is on a per-node basis. If user-level encryption is disabled, then you cannot use built-in encryption. If user-level encryption is disabled, then you can use built-in encryption, which is on a per-node basis.

Which of the following can you have with the help of Cloudera Workload XM?. It provides insights to help you gain an in-depth understanding of the workloads you send to clusters managed by Cloudera Manager. It can provide information that can be used to troubleshoot failed jobs. It can help in optimizing slow jobs which run on the cluster managed by Cloudera Manager. It can be used to display metrics about the performance of a job. It can compare the current run of a job to previous runs of the same job by creating baselines.

Which of the following should run on the worker Hosts?. Node Manager. Impalad. ZooKeeper. Schema Registry.

You are setting up Cloudera Manager for your CDP Private Cloud Base. You know your storage requirement increases when a greater number of parcels are downloaded. Which of the following host partition directories would you configure with higher storage for such a scenario?. /usr. /var. /opt. /cm.

Please select the correct statement. You can use Replication Manager to replicate data from an unsecure cluster, one that does not use Kerberos authentication, to a secure cluster, a cluster that uses Kerberos. You can use Replication Manager to replicate data from a secure cluster, one that does use Kerberos authentication, to an insecure cluster, a cluster that does not use Kerberos. You can use Replication Manager to replicate data from a secure cluster, one that does use Kerberos authentication, to a secure cluster, a cluster that uses Kerberos.

To copy data from a source cluster to a destination cluster:. You have to configure the peer Cloudera Manager in the Source Cloudera Manager instance. You have to configure the peer Cloudera Manager in the Destination Cloudera Manager instance. You have to configure the peer Cloudera Manager in both the Source and Destination Cloudera Manager instances.

While adding a peer in a destination cluster for replication, you know that your cluster is using SAML authentication. Hence, which level of permissions, at the minimum, is needed to add a peer?. Full Administrator. User Administrator. User.

Which of the following is true while replicating HDFS data?. The destination service must be managed by the Cloudera Manager Server where the replication is being set up; the source service can be managed by that same server or by a peer Cloudera Manager Server. You can also replicate HDFS data within a cluster by specifying different source and destination directories. You cannot replicate HDFS data within a cluster. Remote Replication Manager automatically copies HDFS metadata to the destination cluster as it copies files.

When you are using HDFS replication, which of the following statements are correct with regards to the source data host?. A file added during replication does not get replicated. If you delete a file during replication, the replication fails. Replication fails if source files are open. You can configure the replication to continue despite errors.

Which of the following is/are correct with regards to Replication Manager performance and stability?. The maximum number of files for a single replication job can be up to 100 million. The maximum number of files for a replication policy that runs more frequently than once in 8 hours can be 10 million. The throughput of the replication job depends on the absolute read and write throughput of the source and destination clusters. Regular rebalancing of your HDFS clusters is required for efficient operation of replications.

In Cloudera CDP Private Cloud Base, Kudu supports which of the following filesystems?. ext3. ext4. XFS. S3. GCS. Azure Storage.

You add the fstab entry /dev/sdb1 /data1 ext4 defaults,noatime 0 0 on one of the Linux hosts in a CDP Private Cloud Base cluster and then run mount -o remount /data1. What does it do?. It would keep the metadata update timestamp accurate while reading data. It would keep the metadata update timestamp accurate while writing data to disk. It would not maintain the metadata update timestamp while reading data.

Select the correct statement with regards to the sync mount option for filesystems. Using the sync filesystem mount option reduces performance for services that write data to disks, such as HDFS, YARN, Kafka and Kudu. Using the sync filesystem mount option increases performance for services that write data to disks, such as HDFS, YARN, Kafka and Kudu. Using the sync filesystem mount option does not impact the performance for services that write data to disks, such as HDFS, YARN, Kafka and Kudu. Using the sync filesystem mount option may or may not impact the performance for services that write data to disks, such as HDFS, YARN, Kafka and Kudu.

Select the correct options which are applicable to DataNode Data Directory mounts?. NFS options are supported for use as DataNode Data Directory mounts, when using Hierarchical Storage features. NFS options are not supported for use as DataNode Data Directory mounts, even when using Hierarchical Storage features. NAS options are supported for use as DataNode Data Directory mounts, when using Hierarchical Storage features. NAS options are not supported for use as DataNode Data Directory mounts, even when using Hierarchical Storage features.

Select correct statements with regards to Cloudera filesystem and /tmp directory. Cloudera does not support mounting /tmp. Cloudera does support mounting /tmp with the noexec option. Cloudera does not support mounting /tmp with the noexec option.

Select the correct statements with regards to the filesystem and nproc. Cloudera Manager automatically sets the nproc configuration in /etc/security/limits.conf. Cloudera Manager automatically sets the nproc configuration in /etc/security/limits.conf, but this configuration can be overridden by individual files in /etc/security/limits.d/. We need to make sure nproc limits are set sufficiently high, such as 65536 or 262144. A low nproc value can be a problem for Apache Impala.
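The limit the question refers to can be inspected from a shell; the drop-in file name below is an example of how an override under /etc/security/limits.d/ would look:

```shell
# Show the current max-user-processes (nproc) limit for this session.
ulimit -u
# A persistent override (requires root) would go in a drop-in file that
# takes precedence over /etc/security/limits.conf, e.g.:
#   echo 'impala - nproc 262144' > /etc/security/limits.d/99-nproc.conf
```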

When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster cannot start or function correctly. Hence, Cloudera suggests that your database must be:. RDBMS services should be highly available. RDBMS services may be load balanced, but with multiple active RDBMS services you must ensure all connections are routed to a different RDBMS service instance at any given time. RDBMS services may be load balanced, but with multiple active RDBMS services you must ensure all connections are routed to a single RDBMS service at any given time. Cloudera components are not designed for and do not support High Availability.

Select correct statements with regards to JDK requirements for the Cloudera CDP Platform. Cloudera supports both 32-bit and 64-bit JDKs. Running Runtime nodes within the same cluster on different JDK releases is supported. All cluster hosts may use different versions of a supported JDK update level. Cloudera strongly recommends installing the Oracle JDK at /usr/java/<jdk-version> and OpenJDK at /usr/lib/jvm. Running Runtime nodes within the same cluster on different JDK releases is not supported. All cluster hosts must use the same JDK update level.

Select correct statements with regards to networking and security for a Cloudera Private Cloud Base setup. CDH requires IPv6, and support for IPv4 must be disabled. By default, Cloudera Runtime supports multihoming. Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. The Cloudera Manager Agent runs as root.

Select correct statements with regards to networking and security for a Cloudera Private Cloud Base setup. Cloudera supports Security-Enhanced Linux (SELinux) enabled in enforcing mode. Iptables and firewalld must be enabled and should not be updated, to avoid misconfigurations. For RHEL and CentOS, the /etc/sysconfig/network file on each host must contain the correct hostname. Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file.

In Hadoop, data-at-rest encryption can be applied at which of the following levels?. OS Filesystem Level. Network Level. HDFS Level. Binary Level.

Please select the correct statements with regards to data-at-rest encryption in Cloudera Private Cloud Base. On Linux systems, if the entropy is consistently low (500 or less), you must increase it. Key Trustee Server installation requires the default umask of 0022. Cloudera recommends using self-signed certificates for the hostname of your Key Trustee Server, to ensure secure network traffic. Cryptographic operations do not mandate entropy when used with Cloudera Manager.
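On a Linux host, the entropy and umask values mentioned above can be checked directly; this is a read-only sketch:

```shell
# Available kernel entropy; Cloudera flags values that stay at 500 or
# below as too low for cryptographic operations.
cat /proc/sys/kernel/random/entropy_avail
# Key Trustee Server installation expects the default umask of 0022.
umask
```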

You have been asked to install the trial version of Cloudera Private Cloud Base on a 4-node cluster set up on Google Cloud Platform. Which of the following are correct statements for this setup?. You must be able to log in to the Cloudera Manager Server host using the root user account. The Cloudera Manager Server host must have uniform SSH access on the same port to all hosts. All hosts must have access to standard package repositories for the operating system and either archive.cloudera.com or a local repository with the required installation files. SELinux must be disabled or set to permissive mode before running the installer.

Select the correct statements with regards to the Cloudera CDP Private Cloud Base network setup?. Cloudera recommends that Hadoop be placed in a separate physical network with its own core switches. Cloudera Hadoop supports the concept of rack locality and takes advantage of the network topology to minimize network congestion. Having redundant core switches in a full mesh configuration allows the cluster to continue operating in the event of a core switch failure. Cloudera recommends allowing access to the Cloudera Enterprise cluster through edge nodes only. If you completely disconnect the cluster from the Internet, you block access for software updates, which makes maintenance difficult.

What are the advantages of the using Parcels for installing Cloudera Runtime using Cloudera Manager?. Flexible installation location. Installation without sudo. Reduced upgrade downtime. Rolling upgrades, and easy downgrades. No need to upgrade CDP Private Cloud Base. It supports hot deployment of new changes.

As you are aware, while running MapReduce jobs data is sent to reducers. Which of the following needs to be adjusted according to the volume of data sent to reducers?. Disk space under the /lib directory. Disk space under the /usr directory. Raw disk space for temporary storage. Raw disk space for YARN mandatory storage.

Select correct statements with regards to Kafka and drive setup in CDP Private Cloud Base?. Kafka clusters are often run on dedicated servers that do not run HDFS data nodes or processing components such as YARN and Impala. Because Kafka is a message-based system, fast storage and network I/O are critical to performance. Kafka brokers should be configured with dedicated spinning hard drives for the log data directories. Kafka drives should also be configured as RAID 10, because the loss of a single drive on a Kafka broker will cause the broker to experience an outage.

Select correct statements with regards to Cloudera Private Cloud Base and disk setup?. Cloudera does not support more than 200 TB per data node. Cloudera does not support drives larger than 8 TB. Running CDP DC on storage platforms other than direct-attached physical disks can result in suboptimal performance. Cloudera Runtime and the majority of the Hadoop platform are optimized to provide high performance by distributing work across a cluster that can utilize data locality and fast local I/O.

Select the correct statements with regards to various bottlenecks in a CDP Private Cloud Base cluster in terms of CPU, disk, and network. In general, CPU resources are not a bottleneck for MapReduce and HBase. In most cases, the performance bottleneck is the drive or the network. In most cases, the performance bottleneck is the CPU. Inefficient Hive queries can cause performance bottlenecks.

In a CDP Private Cloud Base cluster, which of the following statements are wrong with respect to CPU cores and multithreading?. CPU clock speed does not matter, since performance is a function of the drive or network. CPU clock speed does matter, and you should try to purchase the fastest CPUs available. Computationally intensive Spark jobs benefit more from faster CPUs than I/O-bound MapReduce applications. Within a given MapReduce job, a single task typically uses one thread at a time.

Please select correct statements with regards to power supplies for the nodes in CDP Private Cloud Base. Redundant hot-swap power supplies are necessary for worker nodes. Redundant hot-swap power supplies are not necessary for worker nodes. Redundant hot-swap power supplies are not necessary for master nodes. Redundant hot-swap power supplies are necessary for master nodes. If using single power supplies on worker nodes, then alternate the power feeds for each rack.

Which of the following is a recommended value for vm.swappiness for Cloudera CDP Private Cloud Base host?. 1. 10. 60. 99. 0.
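A minimal sketch for checking and applying the recommended swappiness value; the write requires root, so it is shown as a comment:

```shell
# Current value; Cloudera recommends 1 for CDP Private Cloud Base hosts.
cat /proc/sys/vm/swappiness
# Apply at runtime (requires root):
#   sysctl -w vm.swappiness=1
# Persist across reboots by adding this line to /etc/sysctl.conf:
#   vm.swappiness = 1
```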

Select correct statements with regards to Cloudera Operating System Filesystems for Logical Volume Manager. Logical Volume Manager (LVM) should not be used for data drives. Logical Volume Manager (LVM) should be used for data drives. Logical Volume Manager (LVM) should not be used for OS drives. Logical Volume Manager (LVM) should be used for OS drives. You should use an extent-based file system.

When creating ext4 filesystems for use with Hadoop data volumes, which of the following are Cloudera's correct recommendations?. Use one inode per 1 MB. Minimize the number of superblock backups (sparse_super). Enable journaling (has_journal). Use b-tree indexes for directory trees (dir_index). Use extent-based allocations (extent).
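These recommendations can be tried safely against a loopback image instead of a real device; the image path and size below are arbitrary, and this assumes e2fsprogs (mkfs.ext4, tune2fs) is installed:

```shell
# Create a small scratch image so no real disk is touched.
truncate -s 64M /tmp/hadoop-vol-demo.img
# Format it with the recommended ext4 options: one inode per 1 MB (-i),
# sparse superblock backups, journaling, b-tree directory indexes, and
# extent-based allocation.
mkfs.ext4 -F -q -i 1048576 -O sparse_super,has_journal,dir_index,extent \
  /tmp/hadoop-vol-demo.img
# Confirm the features actually set on the filesystem.
tune2fs -l /tmp/hadoop-vol-demo.img | grep -i features
```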

Select correct statements regarding disk mount options for hosts in Cloudera Private Cloud Base?. All drives used by DataNode machines for data need to be mounted without the use of RAID. All drives used by DataNode machines for data need to be mounted in /etc/fstab using the noatime option. All drives used by DataNode machines for data need to be mounted in /etc/fstab without using the noatime option. In the case of SSD or flash storage, turn on TRIM by specifying the discard option when mounting; this reduces premature SSD wear and device failures, while primarily avoiding long garbage collection pauses.

Select correct statements with regards to Cloudera Private Cloud Base and various components in it. ZooKeeper is sensitive to disk latency. The NameNode memory should be increased over time as HDFS has more files and blocks stored. The block size can also be specified by an HDFS client on a per-file basis. Erasure Coding (EC) is an alternative to the 3x replication scheme. Hadoop optimizes performance and redundancy when rack awareness is configured for clusters that span across multiple racks. When setting up a multi-rack environment, place each master node on a different rack.

Please select the correct statements with regards to Cloudera HDFS data balancing. Hadoop can help mitigate uneven data distribution by rebalancing data across the cluster using the balancer tool. Running the balancer is a manual process that can be executed from within Cloudera Manager as well as from the command line. Running the balancer is an automated process within Cloudera Manager. By default, the maximum bandwidth a DataNode uses for rebalancing is set to 1 MB/second. It is not recommended to run the balancer on an HBase cluster.
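The command-line balancer workflow can be sketched as below; these commands assume an HDFS client with administrative rights on a running cluster:

```
# Raise the per-DataNode rebalancing bandwidth from the 1 MB/s default
# to 10 MB/s (value is in bytes per second):
hdfs dfsadmin -setBalancerBandwidth 10485760
# Run the balancer until every DataNode is within 10% of average usage:
hdfs balancer -threshold 10
```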

Select the correct statements with regards to YARN. The YARN service manages the MapReduce tasks. The YARN service manages the Spark tasks. Applications run in YARN containers, which use Linux Cgroups for resource management and process isolation. Applications run in YARN containers, which use Linux NameSpace for resource management and process isolation.

Select the correct statements with regards to Cloudera Private Cloud Base Storage options. The HDFS data directories should use local storage, which provides all the benefits of keeping compute resources close to the storage and not reading remotely over the network. To guard against data center failures, you can set up a scheduled distcp operation to persist data to a supported cloud object storage platform. To guard against data center failures, you can leverage Cloudera Manager’s Replication Manager features to backup data to another running cluster. The HDFS data directories should not use local storage, which provides all the benefits of keeping compute resources close to the storage and not reading remotely over the network.

Cloudera recommends which Java Garbage collection algorithms for Service Monitor?. Serial Garbage Collector. Parallel Garbage Collector. CMS Garbage Collector. G1GC Garbage Collector. Z Garbage Collector.

Which of the following are specs correctly recommended by Cloudera for the Reports Manager component?. You should configure the Java heap size for Reports Manager to 3-4 times the size of the fsimage. The recommended CPU is 2-4 cores. Disk should be at least 20 times the size of the fsimage. Cloudera recommends an HDD, on a dedicated disk, for Reports Manager.

Which of the following are the correct statements with regards to CDP Private Cloud Base operating system recommendations by Cloudera?. All Runtime hosts in a logical cluster must run on the same major OS release. It supports a temporarily mixed OS configuration during an OS upgrade project. Cloudera Manager can run on a different OS release than the clusters it manages. Cloudera recommends running the same minor release on all cluster nodes. Cloudera does support Runtime cluster deployments in Docker containers. Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled and in enforcing mode.

A developer running PySpark jobs complains that his jobs are not using the required version of Python. What do you recommend?. Ask the developer to set the PYSPARK_PYTHON environment variable to point to the correct Python executable before running the pyspark command. Ask the developer to set the PYSPARK_DRIVER_PYTHON environment variable to point to the correct Python executable before running the pyspark command. You need to have only one version of Python on the Cloudera CDP Private Cloud Base cluster. You need to make the changes in the Cloudera Manager Spark service to point to the correct Python version.
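A minimal sketch of the environment-variable fix; the interpreter path is an example and should be replaced with the cluster's actual Python install:

```shell
# Select the Python interpreter PySpark uses on executors and the driver.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
echo "executor interpreter: $PYSPARK_PYTHON"
echo "driver interpreter:   $PYSPARK_DRIVER_PYTHON"
# pyspark   # launching now would pick up the interpreters selected above
```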

Cloudera CDP Private Cloud Base, HDFS supports which of the following Filesystems?. Ext3. Ext4. XFS. S3. GCS. Azure Storage.

In a CDP Private Cloud Base cluster, which of the following are correct statements with respect to RAM allocation?. Applications such as Impala and Cloudera Search are often configured to use large amounts of heap. It is critical to performance that the total memory allocated to all Hadoop-related processes (including processes such as HBase) is less than the total memory on the node. Oversubscribing the memory on a system can lead to the Linux kernel's out-of-memory process killer being invoked and important processes being terminated. Performance might be affected by over-allocation of memory to a Hadoop process, as this can lead to long Java garbage collection pauses.

Which of the following tools ship with Hadoop to benchmark and baseline overall Hadoop performance?. ZooKeeper. BenchmarkHadoop. Teragen. Terasort.
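A typical TeraGen/TeraSort baseline run looks like the sketch below; the examples jar path varies by distribution and is an assumption here, and the commands require a running cluster:

```
# Path to the MapReduce examples jar; adjust for your parcel layout.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples.jar
# Generate 10 GB of input (100 million rows of 100 bytes each).
hadoop jar "$EXAMPLES_JAR" teragen 100000000 /benchmarks/teragen
# Sort it, then validate that the output is correctly ordered.
hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```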

Which of the following is true about the Cloudera Manager Time Line?. It is a graphical representation of the performance metrics for individual services or roles. It is used to display events that have occurred in the Cloudera Manager cluster. It is a tool used to diagnose issues related to the Cloudera Manager server. It is used to monitor resource usage across the entire cluster.

When selecting a point in time or a time range for monitoring and diagnostics in Cloudera Manager, which of the following statements is true?. A point in time allows you to view all monitoring data up until that point, while a time range allows you to specify a specific period of time to view monitoring data. A point in time and a time range both allow you to view monitoring data for a specific period of time, but a point in time is only available for services that have been running for at least a week. A point in time allows you to view monitoring data for a specific period of time, while a time range allows you to view all monitoring data up until that point. A point in time and a time range both allow you to view monitoring data for a specific period of time, but a point in time is only available for services that have been running for at least a day.

Which of the following is a health test that can be performed using Cloudera Manager to ensure the health of your cluster?. HDFS Disk Balancer Health Test. YARN Resource Manager Health Test. Impala Metadata Daemon Health Test. ZooKeeper Ensemble Health Test.

Which of the following Health Tests can be run in Cloudera Manager for a CDP Private Cloud Base Cluster?. Disk Space Usage Test: This test checks the disk space usage on each host in the cluster and alerts when it exceeds the threshold. YARN Queue Latency Test: This test monitors the latency of YARN queues and alerts when the latency exceeds the threshold. Node Health Test: This test checks the health status of each host in the cluster and alerts when a host is down or unreachable. Network Bandwidth Test: This test checks the network bandwidth between hosts in the cluster and alerts when the bandwidth exceeds the threshold. ZooKeeper Ensemble Test: This test checks the health of the ZooKeeper ensemble and alerts when a ZooKeeper server is down or unreachable.

Which of the following statements is true about the Cloudera Manager Health Tests?. The Cloudera Manager Health Tests diagnose the health of all components in the Cloudera Manager environment. The Cloudera Manager Health Tests perform a comprehensive set of tests on a cluster's HDFS, YARN, and MapReduce services. The Cloudera Manager Health Tests provide real-time insights into the performance and health of a Cloudera Manager deployment. The Cloudera Manager Health Tests require manual configuration and execution by a Cloudera Administrator.

What are the two types of health tests described in the passage, and what are some examples of each type? Select all that apply: Pass-fail tests - compare a property to a numeric value, such as the amount of disk space used or free. Pass-fail tests - there are two types: compare a property to a yes-no value, such as whether a DataNode is connected to its NameNode. Pass-fail tests - exercise a service lightly to confirm it is working and responsive, such as HDFS (NameNode role), HBase, and ZooKeeper services perform. Metric tests - compare a property to a threshold, such as how many pages were swapped to disk in the previous 15 minutes. Metric tests - exercise a service lightly to confirm it is working and responsive, such as HDFS (NameNode role), HBase, and ZooKeeper services perform.

What are the options available in Cloudera Manager for suppressing health test results, and how do they work? Select all that apply: Disable the test altogether, so it is not run. Exclude the test from the calculation of overall health for the entity. Configure the test to ignore certain threshold values. Set the test to run only during specific times of the day. Suppress the test results, but still run the test.

Which of the following statements about viewing charts for instances in Cloudera Manager are true? (Select all that apply). Charts can only be viewed for services and not for individual instances. Charts can be viewed for both services and individual instances. The Metrics tab in Cloudera Manager provides a range of charts for monitoring instances. The Metrics tab only provides charts for CPU and memory usage. The charts can only be viewed for instances that are currently running.

Which of the following statements about configuring monitoring settings in Cloudera Manager are true? (Select all that apply). Monitoring settings can be configured at the cluster, service, and role level. The Metrics tab provides access to all the monitoring settings in Cloudera Manager. Monitoring settings can be configured for both CDP Private Cloud Base and CDP Data Center clusters. The monitoring settings can be adjusted for individual users or groups. The Cloudera Manager Agent must be installed on all hosts in the cluster to enable monitoring.

Which of the following statements about the Service Monitor role in Cloudera Manager are true? (Select all that apply). The Service Monitor role is responsible for monitoring the health and performance of the cluster. The Service Monitor role must be manually installed on a host in the cluster. The Service Monitor role communicates with the Cloudera Manager Server to collect and store monitoring data. Multiple Service Monitor roles can be installed in a cluster for redundancy and high availability. The Service Monitor role is only required in CDP Data Center clusters and not in CDP Private Cloud Base clusters.

Which of the following monitoring settings can be configured in Cloudera Manager? (Select all that apply). Health tests for services and roles. Free space monitoring for directories on hosts. Monitoring of Kafka topics and consumers. Configuration change alerts for specific services. Configuration of logging thresholds and log directories. Monitoring settings for monitoring roles themselves.

What are some ways to configure health monitoring in Cloudera Manager? (Select all that apply). Enabling or disabling health tests for services and roles. Configuring health monitoring for Apache Spark only. Modifying thresholds for the status of certain health tests. Setting up email notifications for critical health events. Configuring monitoring settings for monitoring roles themselves.

What can you configure for directory monitoring in Cloudera Manager? (Select all that apply). The directories to be monitored. The frequency of directory monitoring. The actions to be taken when a threshold is breached. The types of files to be excluded from monitoring.

What can be configured when setting up YARN application monitoring in Cloudera Manager? (Select all that apply). Collection of detailed performance metrics for YARN applications. Enabling of automatic application tag-based grouping. Configuration of alerts for specific YARN application types. Enabling of automatic log collection for YARN applications. Configuration of thresholds for specific YARN application metrics.

Which of the following can be configured when setting up Impala query monitoring in Cloudera Manager? (Select all that apply). Enabling of automatic query history collection. Configuration of alerts for specific Impala query types. Collection of detailed performance metrics for Impala queries. Enabling of automatic query profiling. Configuration of query latency thresholds.

What is the benefit of enabling automatic query history collection in Cloudera Manager for Impala queries?. It allows for the creation of detailed performance metrics for Impala queries. It allows for the configuration of alerts for specific Impala query types. It enables automatic log collection for Impala queries. It allows for the review and analysis of past queries and their results.

What is the purpose of configuring Impala query data store maximum size in Cloudera Manager? (Select all that apply). To limit the amount of memory used by Impala queries. To limit the size of the query data store. To improve Impala query performance. To configure automatic data compression for the query data store.

What happens if the Impala query data store maximum size is reached in Cloudera Manager?. Impala will stop executing queries. Older data will be evicted from the query data store. The query data store will automatically increase in size. The cluster will shut down to prevent data loss.

What is the purpose of a metrics filter in Cloudera Manager? (Select all that apply). To filter out irrelevant metrics from being displayed. To display only metrics related to a specific service or host. To configure thresholds for health tests. To monitor the performance of a specific component. To generate alerts based on certain metrics.

What can be configured in Cloudera Manager related to log events? (Select all that apply). Thresholds for logging levels. Directories to monitor for log events. Frequency of log event generation. Triggering of alerts based on log events. Configuration of event server and alert publisher settings.

What can be configured in Cloudera Manager related to logging thresholds? (Select all that apply). Log directory. Log event capture. Logging thresholds for roles. Frequency of log rotation. Severity level for log messages.

What are the benefits of enabling log event capture in Cloudera Manager? (Select all that apply). The ability to generate alerts based on specific log messages. Improved system performance by reducing the number of log messages captured. Improved troubleshooting by capturing detailed log information. Improved security by capturing and analyzing security-related log events.

What are the different types of log messages that can be converted into events in Cloudera Manager? (Select all that apply). Fatal messages. Error messages. Warning messages. Debug messages. Trace messages.

Which of the following statements regarding monitoring clusters in Cloudera Manager are true? (Select all that apply). Cloudera Manager can monitor multiple clusters simultaneously. Metrics are collected from various sources including agents and JMX. Metrics are stored in the Hadoop Distributed File System (HDFS). Cloudera Manager only provides monitoring for Hadoop services. The Cloudera Manager interface allows for real-time monitoring of cluster health.

Which of the following types of metrics can be monitored in Cloudera Manager? (Select all that apply). CPU utilization. Memory usage. Network traffic. Disk I/O. Application logs.

Which of the following statements are true about the Cloudera Manager cluster utilization report? (Select all that apply). The report provides information on the CPU and memory utilization of individual hosts in the cluster. The report can be used to identify the most resource-intensive services in the cluster. The report displays real-time data and cannot be configured to show historical data. The report can be exported to a PDF or CSV format. The report can be filtered by time range and service.

What types of data can be displayed in the Cloudera Manager cluster utilization report? (Select all that apply). CPU and memory utilization for individual hosts. Network bandwidth usage for individual hosts. Disk I/O statistics for individual hosts. Database query performance for individual services. Service-specific metrics, such as number of running jobs or queries.

What is the purpose of Cloudera Manager's Cluster Utilization Report?. To monitor the performance of individual hosts in the cluster. To identify and troubleshoot issues with specific services. To provide insight into the overall utilization of the cluster's resources. To generate recommendations for optimizing cluster performance.

Which configuration properties need to be modified to enable YARN utilization metrics collection in Cloudera Manager? Select all that apply. Enable Impala Admission Control. Enable Dynamic Resource Pools. Container Usage MapReduce Job User. Cloudera Manager Container Usage Metrics Directory. Cloudera Manager Container Usage Output Directory. Container Usage MapReduce Job Pool.

What are the types of metrics that can be viewed in the Cluster Utilization Report? (Select all that apply). CPU usage. Memory usage. Network usage. Disk usage. User login data.

What is the difference between the Cluster Utilization Report and the Cluster Diagnostics Report in Cloudera Manager?. The Cluster Utilization Report analyzes resource usage over time, while the Cluster Diagnostics Report provides detailed information on specific problems in the cluster. The Cluster Utilization Report provides detailed information on specific problems in the cluster, while the Cluster Diagnostics Report analyzes resource usage over time. The Cluster Utilization Report and the Cluster Diagnostics Report are the same report with different names. The Cluster Utilization Report and the Cluster Diagnostics Report provide the same information but in different formats.

What can be viewed on the Overview tab of the Cluster Utilization Report in Cloudera Manager? (Select all that apply). A timeline of the cluster's overall usage. The amount of data stored in the HDFS. The number of containers allocated to each application. The CPU and memory utilization of each node in the cluster. The top users and applications by usage.

Which of the following statements are true about CPU and Memory Utilization reports in Cloudera Manager?. The Maximum Utilization value for CPU indicates the highest CPU utilization for the entire cluster during the reporting window. The Average Daily Peak value for CPU indicates the average daily peak CPU consumption for the entire cluster during the reporting period. The Memory Utilization report shows the average physical memory available in the cluster during the reporting window. Clicking on "View Time Series Chart" shows a chart of peak utilization for both CPU and Memory Utilization reports. You can view the details about jobs running when maximum utilization occurred by selecting either "View YARN Applications Running at the Time" or "View Impala Queries Running at the Time" from the drop-down menu next to the date.

Which of the following statements are true about CPU utilization in Cloudera Manager?. Maximum Utilization represents the maximum CPU utilization for the entire cluster, including resources consumed only by user applications. Average Daily Peak represents the average daily peak CPU consumption for the entire cluster during the reporting period. Utilization by Tenant displays overall CPU utilization for each tenant, which can be either pools or users. View Impala Queries Running at the Time is an option to view details about jobs running when maximum utilization occurred for YARN and Impala.

Which of the following statements are true about YARN + Impala Utilization in Cloudera Manager?. Maximum Utilization represents the maximum resource consumption by YARN applications and Impala queries that ran on the cluster, including resources consumed only by user applications. Average Daily Peak represents the average daily peak resource consumption by YARN applications and Impala queries during the reporting period, including resources consumed only by user applications. Utilization by Tenant displays overall utilization for each tenant, which can be either pools or users. View YARN Applications Running at the Time is an option to view details about jobs running when maximum utilization occurred for YARN and Impala.

Which of the following statements are true about Utilization by Tenant in Cloudera Manager?. It displays overall utilization for each tenant, which can be either pools or users. It represents the maximum CPU utilization for the entire cluster, including resources consumed by user applications and Cloudera Runtime services. It represents the maximum resource consumption by YARN applications and Impala queries that ran on the cluster, including resources consumed by user applications and Cloudera Runtime services. It is not applicable for CPU Utilization.

Which memory utilization metrics can be viewed on the Cloudera Manager cluster utilization report overview tab?. Overall Cluster Utilization. YARN + Impala Utilization. Utilization by Tenant. Disk Utilization.

What does the "Average Daily Peak" metric represent in the memory utilization section of the Cloudera Manager cluster utilization report overview tab?. The average memory consumption for the entire cluster during the reporting window. The maximum memory consumption for the entire cluster during the reporting window. The average daily peak memory consumption for the entire cluster during the reporting window. The maximum daily peak memory consumption for the entire cluster during the reporting window.

What can be viewed by clicking the drop-down menu next to the date in the YARN + Impala Utilization section of the Cloudera Manager cluster utilization report overview tab?. View YARN Applications Running at the Time. View Impala Queries Running at the Time. View Spark Applications Running at the Time. View Hive Queries Running at the Time.

Which tabs are available in the Impala tab for displaying CPU and memory utilization?. Queries Tab. Overview Tab. Peak Memory Usage Tab. Spilled Memory Tab.

Which tabs are available in the Impala section of the Cloudera Service Monitor, and what information do they display? Select all that apply: Overview. CPU Usage. Peak Memory Usage. Queries. Spilled Memory.

What information is displayed in the Peak Memory Usage Tab of the Impala Memory Utilization Report in Cloudera Manager? Select all that apply. Max Allocated. Peak Allocation Time. Utilization by User. Utilized at the Time. Histogram of Allocated Memory.

What information can be viewed on the Cloudera Manager's Cluster Utilization Report Overview tab for Impala queries at the time of maximum memory usage?. Peak Usage Time. Reserved at the Time. Average Utilization. Max Utilized. Histogram of Utilized Memory at Peak Usage Time.

What information can be found on the Spilled Memory tab in the Impala monitoring report?. The amount of disk spills for Impala queries by tenant. The average spill per query for each tenant. The maximum memory spilled per hour for each tenant. Recommendations for improving Impala query performance.

Which of the following statements are true regarding downloading Cluster Utilization Reports using Cloudera Manager API? Select all that apply: The Cluster Utilization Report can be downloaded as a JSON file. Cloudera Manager REST API does not provide any endpoints for downloading the Cluster Utilization Report. The Impala Utilization Report can be downloaded as a JSON file. The YARN Utilization Report can be downloaded as a JSON file.
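A minimal sketch of how the utilization report might be fetched as JSON through the Cloudera Manager REST API. The endpoint path (`/clusters/{name}/utilization`), the `v19` API version, and the `from`/`to` parameter names are assumptions modeled on CM API conventions; verify them against the API docs for your CM release before use.

```python
# Build the URL for a cluster-level utilization report request.
# Endpoint path and parameter names are assumptions -- check your
# CM version's REST API reference.
from urllib.parse import urlencode

def utilization_url(base, api_version, cluster, start, end):
    """URL for the cluster utilization report (JSON response)."""
    params = urlencode({"from": start, "to": end})
    return f"{base}/api/{api_version}/clusters/{cluster}/utilization?{params}"

url = utilization_url("http://cm-host:7180", "v19", "Cluster1",
                      "2023-04-01T00:00:00", "2023-04-07T00:00:00")
print(url)
```

The same pattern would apply to the Impala- and YARN-specific utilization endpoints, with authentication (for example HTTP basic auth) added when the request is actually issued.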

Which of the following statements are true regarding creating a custom Cluster Utilization Report in Cloudera Manager?. Cloudera Manager provides a Cluster Utilization Report that displays aggregated utilization information for HDFS and ZooKeeper jobs. You can create a custom Cluster Utilization Report using different metrics and queries. Custom reports cannot be built based on the same metrics data using the Cloudera Manager Admin console or the Cloudera Manager API. Metrics and queries can be used to create a custom Cluster Utilization Report in Cloudera Manager. You can retrieve metric data and view the Cloudera Manager Service Monitor data storage granularities in Cloudera Manager. You can build charts that query time series data using the Cloudera Manager Admin console.

Which of the following metrics can be used to monitor the overall health of a Cloudera Manager instance?. Number of active services. CPU usage of the Cloudera Manager Server process. Memory usage of the Cloudera Manager Server process. Number of active hosts.

Which of the following is a recommended approach to diagnosing a performance issue in a cluster?. Restart all affected services. Review logs for relevant errors and warnings. Increase the number of worker nodes. Add more memory to the Cloudera Manager Server.

Which of the following statements about collecting diagnostic data in Cloudera Manager are true?. Collecting diagnostic data is done automatically and does not require any user input. Diagnostic data can be collected for a specific service or for the entire cluster. Diagnostic data collection can be scheduled to run at regular intervals. Diagnostic data is automatically deleted after 7 days.

Which of the following metrics can be used to monitor the health of a Hadoop Distributed File System (HDFS) instance?. Block count. File count. Free disk space. Total disk space.

Which of the following statements about Cloudera Manager alerts are true?. Alerts are triggered by events or thresholds exceeding specified limits. Alerts can only be sent via email. Alerts can be configured to send notifications to multiple recipients. Alerts can be configured to automatically remediate issues.

Which of the following statements are true regarding the handling of metric values in Cloudera Manager's Cluster Utilization reports? Select all that apply: The frequency of metric values returned by a query depends on the collection frequency of the metric itself and the data granularity used in the tsquery statement. YARN container metrics are generated every minute resulting in one raw metric value every minute. Hourly aggregates include a set of statistics that summarize the raw metric values, including the sum, maximum, minimum, and count. When using hourly granularity, you lose the single values of the raw metric values, but can still get peak usage data for such metrics. For some of the YARN metrics described in this topic, peak usage data can be obtained across pool and user combinations, as well as at the pool level. When calculating CPU/Memory usage percentage, it is important to pay attention to the units for each metric.

What information can be obtained from the "Data Duration Covered" table in Cloudera Manager's Service Monitor Storage section? Select all that apply: The earliest available data points for each level of granularity. The duration for which data at each granularity level is retained. The age of the oldest data in the table. The data points for each granularity level that have been purged.

What does the Cloudera Manager REST API provide functionality for? Select all that apply: Specifying the desired data granularity. Retrieving metrics during the specified time window. Aggregating data from raw metric values. Providing information on the duration for which data at each granularity level is retained.
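As a hedged illustration of "specifying the desired data granularity" through the API, the sketch below builds a time-series query URL. The `/timeseries` endpoint with `desiredRollup` and `mustUseDesiredRollup` parameters follows the documented CM API shape, but the exact parameter set varies by API version, so treat this as an assumption to verify.

```python
# Sketch: a time-series request that asks CM for hourly rollups
# instead of raw metric values. Parameter names are assumptions;
# confirm against your CM API version.
from urllib.parse import urlencode

def timeseries_url(base, query, rollup="HOURLY"):
    params = urlencode({
        "query": query,                    # a tsquery statement
        "desiredRollup": rollup,           # e.g. RAW, TEN_MINUTELY, HOURLY, DAILY
        "mustUseDesiredRollup": "true",    # fail rather than fall back
    })
    return f"{base}/api/v19/timeseries?{params}"

print(timeseries_url("http://cm-host:7180", "select cpu_percent where category = HOST"))
```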

Which of the following statements are true about Cloudera Manager's Network Performance Inspector? Select all that apply: The inspector can be used to diagnose networking issues that can affect the performance of workloads such as MapReduce jobs, Spark jobs, Hive queries, and Impala queries. The inspector can only be run on-demand. The latency test run by the inspector reports the average ping time and packet loss percentage. The bandwidth test run by the inspector measures the speed of data transfer between hosts in the same cluster. The bandwidth test can only be run between clusters managed by an instance of Cloudera Manager.

Which types of inspections are run by Cloudera Manager's Network Performance Inspector? Select all that apply: Disk I/O test. Memory test. Latency test. CPU test. Bandwidth test.

Which endpoints can be used to invoke the Network Performance Inspector using the Cloudera Manager API? Select all that apply: /cm/commands/hostsPerfInspector. /cm/commands/clusterPerfInspector. /cm/clusters/clusterName/commands/perfInspector. /cm/commands/allHostsPerfInspector.

Which of the following statements is true about invoking the Network Performance Inspector using the Cloudera Manager API? Select all that apply: The inspector can only be invoked on the hosts that are part of the cluster. The endpoint /cm/commands/hostsPerfInspector can be used to invoke the inspector across an arbitrary set of hosts, including hosts that are not part of the cluster. The endpoint /cm/commands/clusterPerfInspector can be used to invoke the inspector across the hosts in two clusters. The endpoint /cm/clusters/clusterName/commands/perfInspector can be used to invoke the inspector across the hosts of a specified cluster.
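A rough sketch of what a request body for the `/cm/commands/hostsPerfInspector` endpoint might look like. The field names (`hostNames`, `pingArgs`) are assumptions about the argument schema, not a confirmed contract; check the `ApiHostsPerfInspectorArgs` model in your CM API reference.

```python
# Hypothetical JSON body for invoking the Network Performance
# Inspector across an arbitrary set of hosts. Field names are
# assumptions -- verify against the CM API model definitions.
import json

def hosts_perf_inspector_body(host_names, ping_args="-c 10"):
    return json.dumps({"hostNames": host_names, "pingArgs": ping_args})

body = hosts_perf_inspector_body(["host1.example.com", "host2.example.com"])
print(body)
```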

Which of the following functions are included in Cloudera Manager's Service Monitoring feature? Select all that apply: Presents health and performance data in a variety of formats, including interactive charts. Monitors metrics against static, unchangeable thresholds. Generates events related to system and service health and critical log entries and makes them available for searching and alerting. Maintains a complete record of service-related actions and configuration changes. Provides access to diagnostic utility tools such as "collect stack traces" and "heap dump" for all role processes. Allows you to enable and configure the periodic collection of thread stack traces in Cloudera Manager. Allows you to view and run recent commands for a cluster, service, or role.

What types of information can you view by monitoring service and role instance status in Cloudera Manager? Select all that apply: A summary of the status for each service. The name of the role instance, the host on which it is running, the rack assignment, and more. The status for a role instance. A complete record of service-related actions and configuration changes. Recent and running commands for a cluster, service, or role.

What actions can you perform from a service page in Cloudera Manager?. Monitor the status of the services running on your clusters. Manage the software packages installed on the hosts in your clusters. Add new services to your clusters. View the maintenance mode status of a cluster. View the log files for a specific service instance. View the health and status of a role instance. View the URLs of the client configuration files generated by Cloudera Manager.

What can be done with Cloudera Manager's client configuration files? Select all that apply: They can be generated automatically by Cloudera Manager based on the installed services. They can be manually downloaded and distributed to users if needed. They can be used to configure the services on the cluster. They can be used to view the status of the cluster. They can be accessed by clicking the "View Client Configuration URLs" button in the Actions menu on the Status page.

Which services in Cloudera Manager provide a Summary panel with additional statistics about their operation and performance? Select all that apply: HDFS. Hue. Oozie. MapReduce. Impala. Flume. ZooKeeper.

Which diagnostic utility tools can a Cluster Administrator run against most Java-based role processes using Cloudera Manager? Select all that apply. List Open Ports. Collect Stack Traces. Heap Dump. Heap Histogram.

Which of the following statements are true about periodic stacks collection in Cloudera Manager? Select all that apply. Periodic stacks collection can help diagnose performance issues such as deadlock, slow processing, or excessive numbers of threads. Stacks collection may impact performance for the processes being collected as well as other processes on the host. Stacks collection is turned on by default for all roles in Cloudera Manager. Cloudera Support may ask you to enable stacks collection and send the resulting logs for analysis. Stacks collection is not available for some roles in Cloudera Manager.

What are the benefits of enabling periodic stacks collection in Cloudera Manager? Select all that apply. It helps to diagnose performance issues such as deadlock, slow processing, or excessive numbers of threads. It improves the performance of the processes being collected as well as other processes on the host. It is turned on by default for all roles in Cloudera Manager. Cloudera Support does not require stacks collection logs for performance troubleshooting. It is available for all roles in Cloudera Manager.

Which of the following statements are true about viewing and downloading Stacks logs in Cloudera Manager UI and API? Select all that apply. Stacks logs are collected and logged to a compressed, rotated log file. The log data is always stored in an uncompressed file. Once the total number of files exceeds the configured retention limit, the newest files are deleted. Collected stacks data is available for download through the Cloudera Manager UI and API. To view or download stacks logs through the UI, you need to complete certain steps.

What information can be obtained from the Commands tab for a selected service or role instance?. The status, progress, and results of currently running commands. The status and results of commands run in the past. The cause of a service or role not starting up or shutting down correctly. The version of the software running on the service or role instance.

Which of the following statements are true about Cloudera Manager host monitoring?. Host monitoring allows for the collection of metrics on a per-host basis. The Cloudera Manager Agent is responsible for collecting host monitoring metrics. Host monitoring is only available for physical hosts, not virtual hosts. Host monitoring metrics can be used to detect hardware failures or performance issues.

Which of the following metrics are collected by Cloudera Manager host monitoring?. CPU usage. Memory usage. Network activity. Disk usage.

What information can be viewed in the Hosts tab in Cloudera Manager?. The list of hosts in the cluster. The status of services running on each host. The hardware configuration of each host. The number of users logged in to each host.

What information can be obtained from the Disks Overview page in Cloudera Manager?. The total number of disks in the cluster. The amount of free space and used space on each disk. The disk type, such as HDD or SSD. The average CPU utilization of the nodes in the cluster.

What actions can be performed on the Disks Overview page in Cloudera Manager?. Adding or removing disks from the cluster. Defragmenting disks for better performance. Filtering the disks by usage or type. Generating alerts for low disk space.

Which of the following types of information can be viewed in the Cloudera Manager Host Details page?. Hardware information. Network information. Operating system information. Application-specific information.

What are the benefits of using the Cloudera Manager Host Details page? Select all that apply. It provides real-time monitoring of hardware and software components. It allows for easy troubleshooting of host-related issues. It allows for easy customization of host configurations. It provides detailed logs and metrics for each host.

Which of the following can be found on the Status page of a selected host in Cloudera Manager? (Select all that apply). Basic system configuration. Health test results. Disk and CPU resources summary. Detailed information about the Host agent. Health history record. Role instances running on the host. Charts for each host instance in the cluster.

Which of the following types of process health checks are performed by Cloudera Manager? (Select all that apply). JVM memory usage. Network connectivity. Disk usage. CPU usage. Service uptime.

Which of the following resources can be monitored using Cloudera Manager? (Select all that apply). CPU usage. Memory usage. Network bandwidth. Disk I/O. Power consumption.

Which of the following monitoring options are available in Cloudera Manager? (Select all that apply). Host-level monitoring. Service-level monitoring. Role-level monitoring. Metric-level monitoring. User-level monitoring.

Which of the following are true about the Configuration page in Cloudera Manager? (Select all that apply). The Configuration page shows all the configuration properties of a service or host. The Configuration page allows you to edit configuration properties for a service or host. The Configuration page provides information about the status of a service or host. The Configuration page can be accessed from the Service or Host tabs in Cloudera Manager. The Configuration page displays a search box that can be used to filter the configuration properties. The Configuration page displays only the default configuration properties of a service or host.

Which of the following statements are true about Cloudera Manager's configuration management capabilities? (Select all that apply). Cloudera Manager allows you to set configuration properties for individual services or for the entire cluster. Cloudera Manager can automatically detect changes in the configuration of a service or host. Cloudera Manager allows you to revert configuration changes made to a service or host. Cloudera Manager can send alerts when configuration changes are made to a service or host. Cloudera Manager provides a read-only view of the configuration properties for a service or host. Cloudera Manager can be used to backup and restore the configuration of a service or host.

Which of the following statements are true about the Cloudera Manager Host Inspector tool? (Select all that apply). It is used to collect diagnostic information about a host in a Cloudera Manager-managed cluster. It can only be run from the command line interface (CLI). It can be used to identify performance bottlenecks on a specific host. It can collect information on system configuration, resource utilization, and network settings. It can only be run on hosts with Cloudera Manager agents installed. It can collect information on running services and their logs.

Which of the following types of information can be collected by the Cloudera Manager Host Inspector tool? (Select all that apply). System configuration information. Network settings information. Running services and their logs. Performance metrics of the entire cluster. Resource utilization information. Security-related information.

Which of the following statements are true about Cloudera Manager's activity monitoring capability? (Select all that apply). It can monitor Hive, Oozie, and streaming jobs. It can only monitor MapReduce jobs and not individual jobs. It can monitor YARN applications. It can display all possible statistics in the Activities list by default. It allows sorting of the Activities list by the contents of any column. It can filter the list of activities based on values of any of the metrics that are available. The Compare tab shows the performance of the selected job compared with the performance of other different jobs.

Which chart type can be used to view the resource usage of a cluster in Cloudera Manager's Activity Monitor?. Line Chart. Area Chart. Bar Chart. Pie Chart.

Which of the following statements is true regarding viewing jobs in a Pig, Oozie, or Hive activity in Cloudera Manager?. The jobs can be viewed from the Cloudera Manager Home page. The jobs can only be viewed from the respective application UI. The jobs can only be viewed using command-line tools. The jobs cannot be viewed in Cloudera Manager.

What is the purpose of the Task Distribution Chart in Cloudera Manager? Select all that apply. To monitor the performance of individual tasks in real-time. To visualize the distribution of running tasks across nodes. To track the utilization of resources by running tasks. To display the dependencies between different tasks in a workflow.

Which of the following statements is/are true regarding the TaskTracker table in Cloudera Manager?. The table shows the TaskTracker hosts that processed the tasks in the selected cell. The table shows the number of task attempts each host executed. The table allows you to view the status of the TaskTracker instances. Clicking on a TaskTracker hostname takes you to the Role Status page for that TaskTracker instance. The area above the TaskTracker table shows the type of task and range of data volume (or User CPUs) and duration times for the task attempts that fall within the cell.

What is the default time interval for Impala queries in Cloudera Manager's Impala Queries page?. 5 seconds. 30 seconds. 1 minute. 5 minutes.

What actions can be performed on the Impala Best Practices page in Cloudera Manager?. Adjust the time range to see data on queries run at different times. Click the charts to get more detail on individual queries. Use the filter box to adjust which data is shown on the page. Create a trigger based on any best practice. View a list of recommended best practices. Delete any best practice that is not being followed.

What is the primary purpose of the CM Query Details page in Cloudera Manager?. To view and analyze data collected by Cloudera Manager agents. To create and execute custom SQL queries against Cloudera Manager's internal database. To configure and manage monitoring and diagnostic settings for Cloudera Manager. To view and manage active alerts generated by Cloudera Manager.

Which of the following is a tool provided by Cloudera Manager for monitoring YARN applications?. YARN ResourceManager. YARN NodeManager. YARN ApplicationMaster. YARN Timeline Server. YARN Fair Scheduler.

Which of the following statements are true about viewing jobs in Cloudera Manager? Select all that apply. The jobs page shows information about completed, running, and failed jobs. The jobs page can only be accessed by the cluster administrator. The jobs page allows users to view job details, logs, and metrics. The jobs page can only display information for jobs that were submitted via Cloudera Manager.

Which configuration property needs to be set to enable YARN application monitoring in Cloudera Manager?. yarn.nodemanager.log-aggregation.enabled. yarn.log-aggregation-enable. yarn.application-monitoring.enable.

What is the purpose of enabling "YARN Application Diagnostic Data Collection" in Cloudera Manager? Select all that apply. To allow Cloudera Support to diagnose and troubleshoot issues related to YARN applications. To collect and send data on YARN application usage to Cloudera for analysis and optimization. To monitor YARN application performance and resource utilization. To automatically fix issues in YARN applications.

Which of the following components can be monitored for Spark applications using Cloudera Manager?. Spark driver and executor logs. Spark application CPU utilization. Spark application network traffic. Spark application memory usage.

Which of the following statements are true about managing Spark driver logs in Cloudera Manager? (Select two.). Cloudera Manager can manage Spark driver logs automatically. Spark driver logs can be managed manually by modifying configuration files. Spark driver logs are stored on the local file system of each worker node. Spark driver logs can be viewed and downloaded from Cloudera Manager's web interface.

When using the Web Application UI in Cloudera Manager to visualize Spark applications, which of the following statements is correct?. The Timeline tab shows the entire lifecycle of the Spark application. The Executors tab displays information about each executor in the cluster. The Storage tab shows detailed information about the Spark application's input and output data. The Event Timeline tab displays Spark events in a timeline format.

Which of the following is true regarding Cloudera Manager events? Select all that apply. Events are notifications triggered by specific actions or conditions in the Cloudera Manager environment. Events can be viewed in the Cloudera Manager UI. Events can only be sent via email. Events can be configured to trigger alerts and actions.

What are the categories of events supported by Cloudera Manager? Select all that apply. AUDIT_EVENT. CONFIG_EVENT. HEALTH_CHECK. SYSTEM_EVENT. LOG_MESSAGE.

What can be filtered for on the Events page in Cloudera Manager? Select all that apply. Services or role instances. Hosts. Users. Commands. Categories of events.

What are the available alert types in Cloudera Manager? Select all that apply. Service Alerts. Host Alerts. Event Alerts. Metric Alerts. Configuration Alerts.

Which of the following can be used as a notification method for Cloudera Manager alerts?. Email. SNMP. Slack. All of the above.

Which of the following is true about configuring alert thresholds in Cloudera Manager?. Alert thresholds can be set globally for all services in Cloudera Manager. Alert thresholds can be set for specific services and hosts. Alert thresholds cannot be customized. Alert thresholds are set automatically and cannot be modified.

Which of the following statements are true about configuring log events in Cloudera Manager? Select all that apply: Log event thresholds can be configured based on log level, host, service, and category. Log event thresholds can only be configured at the service level. Once a log event threshold is reached, an alert can be triggered. Log event thresholds can only be configured for HDFS and YARN services.

Which of the following is NOT a benefit of using Cloudera Manager for log event management?. Centralized log management. Automatic log rotation. Real-time log monitoring. Log aggregation across multiple hosts.

Which of the following statements are true about triggers in Cloudera Manager?. Triggers can be used to automate certain actions based on specific events in Cloudera Manager. Triggers can only be created by using Cloudera Manager API. Triggers can only be used to send notifications via email. Triggers can be used to stop or start specific services or roles based on certain conditions.

Which of the following are the two types of conditions that can be used in triggers in Cloudera Manager?. Health-based conditions. Time-based conditions. Configuration-based conditions. Role-based conditions.

Which use cases are suitable for CM Triggers in Cloudera Manager?. Monitoring hardware failures. Alerting when a service is down. Notifying when a role reaches a specific memory utilization threshold. Alerting when a user exceeds their HDFS quota. All of the above.

What is the purpose of configuring a trigger threshold in Cloudera Manager?. To specify the minimum number of times an event must occur before triggering an alert. To define the maximum threshold value for a given metric. To determine the frequency at which an alert should be triggered. To set a threshold value for a metric or service and trigger an alert when that value is exceeded. None of the above.
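CM trigger expressions follow an `IF (tsquery) DO action` grammar, so a threshold trigger is just a tsquery with a comparison baked in. The sketch below assembles one in that style; the `entityName=$ENTITYNAME` placeholder and the `fd_open` metric are illustrative assumptions modeled on examples in Cloudera's documentation.

```python
# Sketch: assembling a CM trigger expression of the form
#   IF (<tsquery>) DO <action>
# The variable placeholder and metric name are illustrative
# assumptions; confirm the grammar in the CM triggers docs.
def make_trigger(metric, threshold, action="health:bad"):
    tsquery = (f"SELECT {metric} WHERE entityName=$ENTITYNAME "
               f"AND last({metric}) > {threshold}")
    return f"IF ({tsquery}) DO {action}"

# Mark a role unhealthy when open file descriptors exceed 500.
print(make_trigger("fd_open", 500))
```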

Which of the following are the consequences of running out of memory on a cluster node in Cloudera Manager? Select all that apply: The node becomes unresponsive or crashes. The cluster may become unstable or slow. Data corruption may occur. The node automatically restarts to free up memory.

What is the purpose of memory triggers in Cloudera Manager? Select all that apply: To alert administrators when memory usage exceeds a certain threshold. To automatically adjust memory allocation on cluster nodes. To shut down services or nodes when memory usage is too high. To generate recommendations for optimizing memory usage.

Which of the following actions can be taken by Cloudera Manager when a CPU capacity trigger is violated?. Send an email notification to the cluster administrator. Restart the affected service or services. Scale out the cluster by adding new nodes. Adjust the CPU usage thresholds for the trigger.

What is a CPU capacity trigger in Cloudera Manager?. A configuration setting that limits the amount of CPU resources that can be allocated to a particular service. A tool for monitoring the CPU usage of nodes in a cluster and triggering actions when usage exceeds specified thresholds. A metric used to track the total amount of CPU capacity available in a cluster. A process for tuning the performance of CPU-intensive workloads running in a cluster.

What are some benefits of using Cloudera Manager for security auditing and lifecycle management in CDP Private Cloud Base? (Select all that apply). Automatic auditing of all security events in the cluster. Streamlined installation and configuration of security components. Integration with LDAP/AD for centralized user authentication and authorization. Ability to view detailed logs of user activity and cluster events. Automated remediation of security issues.

What types of audit events can be logged by Cloudera Manager in CDP Private Cloud Base? (Select all that apply). User logins and logouts. Configuration changes. File system events. Cluster service restarts. Network traffic monitoring.

Which of the following options are true regarding downloading audit events in Cloudera Manager? Select all that apply. Audit events can be downloaded in CSV format. The Audit events can be filtered by event type. Cloudera Manager does not allow downloading audit events for security reasons. Audit events can be downloaded from the Cloudera Manager home page.

What is the recommended method to download audit events in Cloudera Manager? Select one option. Use the Cloudera Manager API to download audit events. Use the Cloudera Manager user interface to download audit events. Use the Cloudera Manager command line interface to download audit events. Use a third-party tool to download audit events.
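A hedged sketch of fetching audit events through the API route mentioned above. The `/audits` endpoint and its `maxResults`/`offset` paging parameters are assumptions based on the CM API conventions; verify them, and the supported output formats, against your CM release.

```python
# Build a URL for retrieving a page of audit events from the CM API.
# Endpoint and parameter names are assumptions -- check your version's
# API reference before relying on them.
from urllib.parse import urlencode

def audits_url(base, limit=100, offset=0):
    params = urlencode({"maxResults": limit, "offset": offset})
    return f"{base}/api/v19/audits?{params}"

print(audits_url("http://cm-host:7180", limit=500))
```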

Which of the following statements about the CM Tsquery Language is TRUE?. Tsquery is used to create and manipulate time-series data. Tsquery is used to monitor and diagnose Cloudera Manager (CM) clusters. Tsquery can only be used to query time-series data stored in Prometheus. Tsquery is used to configure Cloudera Manager (CM) settings.

Which of the following operators is used in the CM Tsquery Language to group multiple conditions together?. AND. OR. NOT. ().
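For context on the two questions above: tsquery statements take the shape `SELECT <metric expressions> WHERE <predicates>`. The snippet below holds a couple of examples in that shape; the specific metric names are illustrative and should be checked against the metric reference for your CM version.

```python
# Illustrative tsquery statements (SELECT ... WHERE ...).
# Metric names are examples only, not guaranteed to exist in
# every CM release.
queries = [
    "SELECT cpu_user_rate WHERE roleType = DATANODE",
    "SELECT total_cpu_user + total_cpu_system WHERE category = HOST",
]
for q in queries:
    # every tsquery begins with a SELECT clause
    assert q.upper().startswith("SELECT")
    print(q)
```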

Which of the following statements are true about metric aggregation in Cloudera Manager? Select all that apply: Metric aggregation can be used to improve monitoring efficiency by reducing the amount of data that needs to be stored. Metric aggregation is a feature exclusive to Cloudera Manager Enterprise edition. Metric aggregation can be configured to aggregate metrics across clusters and services. Metric aggregation can only be used with predefined metrics and cannot be used to aggregate custom metrics.

What are the benefits of using metric aggregation in Cloudera Manager? Select all that apply: Reducing storage requirements for monitoring data. Improving monitoring efficiency by reducing the number of metrics to be monitored. Allowing for more precise monitoring of individual services and components. Increasing the number of data points available for analysis.

Which of the following are types of aggregate data that Cloudera Manager presents? Select all that apply. Per-second metrics. Per-minute metrics. Per-hour metrics. Per-day metrics. Per-week metrics.

Which of the following statements is true about the aggregation of metrics in Cloudera Manager? Select one: Cloudera Manager does not aggregate metrics from different hosts. Cloudera Manager aggregates metrics from different hosts into a single value. Cloudera Manager aggregates metrics from different services into a single value. Cloudera Manager only aggregates metrics for HDFS.

Which metrics can you filter in Cloudera Manager?. Service. Role. Host. Configuration. All of the above.

Which of the following are ways to access the Cloudera Manager log files? Select all that apply. Using the Cloudera Manager Admin Console. Using the command line interface (CLI) on the Cloudera Manager server. Using the command line interface (CLI) on the Cloudera Manager agent. Accessing the log files directly on the host machine.

Which of the following is NOT a type of log file generated by Cloudera Manager?. Service logs. Audit logs. Performance logs. User logs.

Which of the following statements is true about the log files in Cloudera Manager?. Log files are stored in a centralized location on the Cloudera Manager Server host only. Log files are rotated based on size and time. All log files are compressed and archived after rotation. Log files contain information about the status and health of the entire cluster.

Which of the following log files contains information about the Cloudera Manager Agent?. /var/log/messages. /var/log/cloudera-scm-agent/cloudera-scm-agent.log. /var/log/cloudera-scm-server/cloudera-scm-server.log. /var/log/hadoop-yarn/yarn-yarn-nodemanager-*.log.

Which of the following statements are true regarding the Cloudera Manager Server log? Select all that apply: The Cloudera Manager Server log contains information about the server's startup and shutdown processes. The Cloudera Manager Server log is stored in a file named cmf-server.log. The Cloudera Manager Server log can be viewed using the Cloudera Manager Admin Console. The Cloudera Manager Server log contains information about the health and status of the managed clusters.

What are the steps to view the Cloudera Manager Server log using the Cloudera Manager Admin Console? Select all that apply: Log in to the Cloudera Manager Admin Console. Click on the "Logs" tab in the top navigation bar. Select the "Cloudera Manager Server" role from the dropdown menu. Click on the "Actions" button and select "View Logs".

Which of the following options describes the recommended approach to viewing Cloudera Manager (CM) agent logs?. Use the "tail -f" command on the command line to view logs in real-time. Use the CM user interface to access the logs for a specific host or service. Use the CM API to programmatically retrieve logs for a specific time range. Manually navigate to the directory containing the logs and view them with a text editor.
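The command-line approaches above can be sketched with a self-contained example. The real agent log lives at /var/log/cloudera-scm-agent/cloudera-scm-agent.log; a temporary stand-in file is used here so the commands are runnable anywhere.

```shell
# Stand-in for /var/log/cloudera-scm-agent/cloudera-scm-agent.log so the
# commands below are self-contained.
LOG=$(mktemp)
printf 'INFO heartbeat ok\nERROR connection refused\nINFO heartbeat ok\n' > "$LOG"

# Show the last two lines (use `tail -f` on the real log to follow it live):
tail -n 2 "$LOG"

# Filter for error entries only:
grep ERROR "$LOG"

rm -f "$LOG"
```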

What is the purpose of the "Log Aggregation" feature in Cloudera Manager?. To automatically aggregate logs from multiple hosts into a central location. To provide real-time monitoring of log files for critical errors or warnings. To automatically rotate and compress log files to save disk space. To provide an interface for searching and filtering log files.

What is the recommended threshold for the maximum percentage of disk space used by Cloudera Manager (CM) logs?. 50%. 60%. 70%. 80%.

What is the default retention period for Cloudera Manager (CM) logs?. 3 days. 7 days. 14 days. 30 days.

Which of the following tools can be used to monitor disk usage for Cloudera Manager (CM) logs?. df command. du command. Cloudera Manager UI. htop command.
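The `df` and `du` tools mentioned above can be combined into a simple usage check. This is a minimal sketch, not a CM feature; the 80% threshold is an assumption chosen to match the guidance discussed above.

```shell
# Sketch: check how full the filesystem holding the logs is.
THRESHOLD=80
# Percent used on the filesystem holding /var/log (POSIX df output, column 5):
USED=$(df -P /var/log | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "$USED" -ge "$THRESHOLD" ]; then
  echo "WARNING: log filesystem is ${USED}% full"
else
  echo "OK: log filesystem is ${USED}% full"
fi

# Largest entries under /var/log, biggest first (sizes in KiB):
du -sk /var/log/* 2>/dev/null | sort -rn | head -n 5
```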

Which of the following is NOT a category of reports available in Cloudera Manager?. User Activity. Service Activity. System Activity. Role Activity. Cluster Activity.

What is the purpose of the "Report Period" setting in Cloudera Manager?. To specify the time range for which the report data should be displayed. To specify the name of the report to be generated. To specify the format of the report output. To specify the recipient of the report.

Which of the following is a type of report available in Cloudera Manager?. Table Report. Line Report. Bar Report. Pie Report. Scatter Report.

What are the benefits of using the CM Directory Usage report in Cloudera Manager? Select all that apply: Provides information about inactive user accounts. Helps in identifying overutilized directories. Allows for analysis of user and group access patterns. Provides information about the amount of data stored in the directories.

Which of the following directories are included in the CM Directory Usage report in Cloudera Manager? Select all that apply: /tmp. /var/log/hadoop-yarn. /var/run/cloudera-scm-agent/process. /etc/hive/conf.

Which of the following statements are true about the Directory Usage Report in Cloudera Manager?. It provides information on the total number of files and directories in a given Hadoop file system. It helps to identify top users or applications that are utilizing the most space in HDFS. It provides real-time alerts for changes made to HDFS directories. It is a built-in report that requires no additional setup or configuration.

Which of the following actions can be performed using the Directory Usage Report in Cloudera Manager?. Deleting files and directories in HDFS. Modifying file permissions in HDFS. Identifying and investigating storage usage patterns in HDFS. Scheduling regular backups of HDFS directories.

Which of the following statements are true about disk usage reports in Cloudera Manager? Select all that apply: Disk usage reports can be generated at the cluster level. Disk usage reports can be generated for individual hosts. Disk usage reports can be generated for individual services. Disk usage reports can only be generated for HDFS.

What is the purpose of the "Filter by Mount Point" option in the disk usage report? Select all that apply: It allows you to view disk usage for only specific mount points. It allows you to exclude certain mount points from the report. It allows you to see which services are using the most disk space. It allows you to see how much disk space is being used by each file system.

What information can you obtain from the "Disk Usage by User Group or Directory" report in Cloudera Manager? Select all that apply: Total disk usage for each user. Total disk usage for each user group. Total disk usage for each directory. Percentage of disk usage for each user. Percentage of disk usage for each user group. Percentage of disk usage for each directory.

Which of the following actions can you perform using the "Disk Usage by User Group or Directory" report in Cloudera Manager? Select all that apply: Delete files and directories. Change ownership of files and directories. View files and directories for each user group or directory. Set disk usage quotas for each user group or directory. Monitor disk usage trends over time.
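A per-directory usage summary like the report above can be sketched with plain `du` on local stand-in directories (on a real cluster the equivalent HDFS data comes from `hdfs dfs -du -h /user`). All paths and user names below are illustrative only.

```shell
# Build a throwaway directory tree that mimics per-user storage.
BASE=$(mktemp -d)
mkdir -p "$BASE/user/alice" "$BASE/user/bob"
head -c 1048576 /dev/zero > "$BASE/user/alice/data.bin"   # ~1 MiB
head -c 2097152 /dev/zero > "$BASE/user/bob/data.bin"     # ~2 MiB

# Usage per user directory, largest first (sizes in KiB):
du -sk "$BASE"/user/* | sort -rn

rm -rf "$BASE"
```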

Which of the following reports can be generated using Cloudera Manager's Activity Monitor? Select all that apply. Application Reports. Query Reports. Service Reports. Host Reports.

What type of reports can be generated using Cloudera Manager's Activity Monitor for applications? Select all that apply. Resource Allocation Reports. Node Level Reports. Fair Scheduler Reports. Queue Reports.

Which of the following actions can be performed using the Cloudera Manager File Browser? Select all that apply. View the contents of a file. Edit the contents of a file. Download a file. Upload a file. Delete a file.

Which of the following statements about the Cloudera Manager File Browser are true? Select all that apply. The File Browser is only accessible to the Cloudera Manager Administrator. The File Browser can be used to manage files on hosts managed by Cloudera Manager. The File Browser can be used to manage files on hosts not managed by Cloudera Manager. The File Browser allows users to edit the contents of files. The File Browser allows users to view the contents of directories.

Which of the following statements is true about HDFS directory access permission reports in Cloudera Manager?. These reports show the access permissions for all directories in HDFS. These reports can be generated for a specific directory or for all directories in HDFS. These reports can be downloaded in CSV, JSON, and PDF formats. These reports can only be generated by the HDFS administrator.

What is the purpose of the "hdfs.directory.access.report.includePermission" configuration property in Cloudera Manager?. It specifies which users have access to a particular directory in HDFS. It controls whether or not HDFS directory access permission reports are generated. It specifies which HDFS directories are included in the permission reports. It sets the permission level required to generate HDFS directory access permission reports.

Which of the following statements are true about the diagnostic data collected by Cloudera Manager (CM) in a Cloudera Data Platform (CDP) Private Cloud Base cluster?. CM collects diagnostic data for all services in the cluster, including the operating system and network. The diagnostic data collected by CM is stored on the hosts running the services. CM collects configuration and metadata information along with the diagnostic data. CM collects performance data for the services in the cluster, but not for the underlying operating system and network.

Which of the following methods can be used to collect diagnostic data from a CDP Private Cloud Base cluster using Cloudera Manager (CM)?. Manually copying log files from the hosts running the services. Using the command-line interface (CLI) on the Cloudera Manager Server. Using the CM Diagnostic Bundle option in the CM Admin Console. Running a diagnostic script on each host in the cluster.

Which of the following types of sensitive information can be redacted from diagnostic bundles in Cloudera Manager?. IP addresses. Usernames. Passwords. Java stack traces.

What is the purpose of redacting sensitive information from diagnostic bundles in Cloudera Manager?. To protect confidential information. To improve system performance. To reduce the size of diagnostic bundles. To enable remote troubleshooting.

Which of the following commands can be used to manually trigger the collection and transfer of diagnostic data in Cloudera Manager?. sudo service cloudera-scm-agent hard_stop; sudo service cloudera-scm-agent start. sudo systemctl stop cloudera-scm-agent; sudo systemctl start cloudera-scm-agent. sudo systemctl restart cloudera-scm-agent. sudo systemctl start cloudera-scm-agent.

Which of the following actions can you perform with the diagnostic data collected in Cloudera Manager? (Select all that apply). Upload the data to a remote server for analysis. Download the data to your local machine for analysis. View the data directly in Cloudera Manager. Use the data to troubleshoot issues with your Cloudera environment. Use the data to generate reports on the health of your Cloudera environment.

What are the steps you should take to resolve the issue of the Cloudera Manager service not running due to an Out of Memory error? (Select all that apply). Restart the Cloudera Manager service using the command "sudo service cloudera-scm-server restart". Examine the heap dump file in the /tmp directory with file extension .hprof and file permission of 600. Delete the heap dump file in the /tmp directory to free up space. Increase the amount of memory allocated to the Cloudera Manager service. Check the Cloudera Manager Server log file at /var/log/cloudera-scm-server/cloudera-scm-server.log for a stacktrace with "java.lang.OutOfMemoryError" logged.
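The log check described above can be sketched as follows. A temporary stand-in file is used so the example is self-contained; on a real host you would grep /var/log/cloudera-scm-server/cloudera-scm-server.log instead.

```shell
# Stand-in for the CM server log, containing a sample OOM stacktrace line.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2023-04-13 10:01:02 INFO  Server started
2023-04-13 10:15:44 ERROR java.lang.OutOfMemoryError: Java heap space
EOF

if grep -q 'java.lang.OutOfMemoryError' "$LOG"; then
  echo "OOM detected: examine /tmp/*.hprof heap dumps, raise the CM heap,"
  echo "then restart with: sudo service cloudera-scm-server restart"
fi

rm -f "$LOG"
```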

What are the possible reasons for the failure of the service cloudera-scm-server start command on a Cloudera Manager server, and what is the recommended solution for each? (Select all that apply). The server has been disconnected from the network. The server has insufficient disk space. The database has stopped responding or has shut down. The Java Virtual Machine has run out of memory. The configuration file for the Cloudera Manager server is missing or incorrect.

What are the possible solutions for the issue of logs including APPARENT DEADLOCK entries for c3p0 in Cloudera Manager? (Select all that apply). Ignore the log entries if system performance is not affected. Modify the timer triggers to stop the log entries from occurring. Increase the number of threads in the c3p0 pool to increase resources for task progress. Restart the Cloudera Manager service. Remove c3p0 from the Cloudera Manager configuration.

Which of the following steps can be taken to address the problem of the Finished status not displaying after clicking the Start button to start a service in Cloudera Manager? (Select all that apply). Check the network connectivity between the host and the server. Check the logs for the service for potential causes of the problem. Restart the agents on the hosts where the heartbeats are missing. Reinstall the Cloudera Manager server and agents on all affected hosts. Wait for a certain amount of time before checking the status again.

When starting a service in Cloudera Manager, what are possible reasons why the Finished status may not be displayed, and what are the corresponding solutions? (Select all that apply). Network connectivity issues caused the service start to fail. Subcommands failed, resulting in errors in the log file. The service started successfully, but the Finished status did not update in the UI. The target port was already occupied, preventing the service start. The Cloudera Manager server was down during the service start.

Which of the following statements are true regarding the solution for starting a service in Cloudera Manager when a port specified in the service's Configuration tab is already in use in your cluster? (Select all that apply). The subcommands to start service components (such as JobTracker and one or more TaskTrackers) start successfully. The Finished status displays after clicking Start to start a service. An available port number needs to be entered in the port property (such as JobTracker port) in the Configuration tab of the service. The reason for the error messages is that the service components are not installed correctly. The error messages prevent the service from starting.

What are the possible solutions to address a job failure caused by a "no space left on device" error in Hadoop? (Select all that apply). Use a system monitoring tool such as Nagios to alert on disk space. Quickly check disk space across all systems. Check the % used column on the NameNode Live Nodes page. Drill down from the job, to the map or reduce, to the task attempt details to see which TaskTracker the task executed and failed on due to disk space. Increase the replication factor of the Hadoop Distributed File System.

Which of the following steps can you take to diagnose and fix SMTP errors when Cloudera Manager is not sending any alerts? (Select all that apply). Use the Send Test Alert link under Administration > Alerts. Change the Mail Server Protocol to smtp (or smtps). Change the Alerts: Mail Server TCP Port to 587 (or to 465 for SMTPS). Restart the Cloudera Manager Admin Console. Restart the Alert Publisher.
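The protocol-to-port pairing referenced above (587 for SMTP, 465 for SMTPS) can be captured in a small helper. The `mail_port` function name is illustrative only, not part of Cloudera Manager.

```shell
# Map a mail server protocol to its conventional submission port.
mail_port() {
  case "$1" in
    smtp)  echo 587 ;;
    smtps) echo 465 ;;
    *)     echo "unknown protocol: $1" >&2; return 1 ;;
  esac
}

mail_port smtp
mail_port smtps
```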

What information can be found in Cloudera Manager logs and events? (Select all that apply). Configuration changes made to the Cloudera Manager Admin Console. Metrics related to the performance of the Cloudera environment. Status updates on services and hosts in the Cloudera environment. Detailed debugging information for individual Cloudera components. Information on user logins and authentication events.
