Hope you are doing well.
Please find below the solution related to your query.
Q.3. You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?
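Solution :- io.sort.mb is the total amount of buffer memory, in megabytes, used while sorting map output; the default value is 100. When the number of spilled records exceeds the number of map output records, some records are being written to disk more than once. So we should increase io.sort.mb until the number of spilled records equals the number of map output records. The buffer is allocated from the child heap, so the value must stay well below the 1 GB heap size.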
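For example, the tuning loop described above can be sketched in Python as below. This is only an illustration: the function name, the 50 MB step, and the rule of keeping the buffer under half of the child heap are assumptions made for this sketch, not Hadoop settings. The two record counts would come from the job counters.

```python
def next_io_sort_mb(current_mb, heap_mb, spilled_records, map_output_records,
                    step_mb=50, max_heap_fraction=0.5):
    """Suggest the next io.sort.mb value to try (hypothetical helper).

    step_mb and max_heap_fraction are assumptions for this sketch only.
    """
    # Each record spilled exactly once is the target, so stop tuning here.
    if spilled_records <= map_output_records:
        return current_mb
    # The sort buffer lives inside the child heap, so cap it below the heap size.
    ceiling_mb = int(heap_mb * max_heap_fraction)
    return min(current_mb + step_mb, ceiling_mb)


# Spilled records exceed map output records, so a larger buffer is suggested.
print(next_io_sort_mb(100, 1024, spilled_records=300, map_output_records=200))
```

After each change, rerun the job and compare the two counters again before raising the value further.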
Q.16. You are running two Data Nodes with 2 TB storage each. You have added two Data Nodes of 3 TB each to meet your business needs. What will be the total HDFS storage available?
Solution :- The correct answer to this question is 10 TB. The four Data Nodes provide 2 x 2 TB + 2 x 3 TB = 10 TB in total, and the full storage can be used for HDFS.
For eg -> If 2 TB of the 10 TB capacity is already used by non-HDFS data, then only 8 TB is available to HDFS. If there is no non-HDFS data, the full 10 TB is available.
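The same arithmetic can be written as a small Python sketch. The function name and its parameters are made up for illustration; it only adds up the disk sizes and subtracts any space taken by non-HDFS data, as in the example above.

```python
def hdfs_capacity_tb(datanode_sizes_tb, non_hdfs_used_tb=0):
    """Total storage available to HDFS, in TB (hypothetical helper)."""
    # Raw capacity is the sum of all Data Node disks; non-HDFS data reduces it.
    return sum(datanode_sizes_tb) - non_hdfs_used_tb


datanodes = [2, 2, 3, 3]  # two 2 TB nodes plus two 3 TB nodes

print(hdfs_capacity_tb(datanodes))                      # 10
print(hdfs_capacity_tb(datanodes, non_hdfs_used_tb=2))  # 8
```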
Q.19. On a cluster running MapReduce v1 (MRv1), the value of the mapred.tasktracker.map.tasks.maximum configuration parameter in the mapred-site.xml file should be set to: The maximum number of Map tasks that can run simultaneously on an individual node.
Solution :- The correct answer is the maximum number of Map tasks that can run simultaneously on an individual node.
For eg -> If a machine has 4 CPUs, we can run only 4 mappers + reducers on that particular machine, and the values we set in the mapred-site.xml file should reflect that limit.
Mapper + reducer tasks = number of CPUs in the machine.
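As a rough sketch of this rule, the Python snippet below splits the CPUs of one node between map and reduce slots so that the two add up to the CPU count. The function name and the map_share parameter are assumptions made for this example; the two property names are the MRv1 settings in mapred-site.xml.

```python
def split_task_slots(cpu_count, map_share=0.5):
    """Split a node's CPUs between map and reduce slots (hypothetical helper).

    map_share is an assumed fraction of CPUs given to map tasks.
    """
    # Keep at least one map slot and never more map slots than CPUs.
    map_slots = min(cpu_count, max(1, round(cpu_count * map_share)))
    # Whatever is left goes to reducers, so the total equals the CPU count.
    reduce_slots = cpu_count - map_slots
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


# On a 4-CPU machine the two values add up to 4.
for name, value in split_task_slots(4).items():
    print(name, "=", value)
```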
Feel free to contact us in case you have any query.
Kindly share your feedback by clicking on either of the smileys.
Please note if you are not happy with the response on this ticket, please escalate it to email@example.com.
We assure you that we will get back to you within 24 hours.