1) An RDBMS has the concept of an optimal path: the first time a complex query is executed to retrieve data, the RDBMS chooses the optimal path, and it reuses the same route/path on every later execution of that query.
Does MapReduce have the same concept, where it knows the data nodes and blocks it fetched from last time and goes directly to those data nodes?
Sol: No. For Hadoop, every run is a new query/job. Each time the job is executed, it goes to the Name node and gets the latest available data nodes and blocks to fetch the data from, even when the same job is executed multiple times.
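To make the point concrete, here is a minimal toy sketch in Python. It is not Hadoop code and uses no Hadoop API; ToyNameNode, run_job, decommission and the block/node names are all hypothetical names invented for illustration. It only shows the idea that each run asks the Name node again, so a change in the cluster between runs changes where the data is read from.

```python
class ToyNameNode:
    """Hypothetical stand-in for the Name node's block map (not a Hadoop API)."""

    def __init__(self, block_map):
        # block id -> list of data nodes currently holding a replica
        self.block_map = {b: list(nodes) for b, nodes in block_map.items()}
        self.lookups = 0

    def get_block_locations(self, blocks):
        # Each call is answered from the current state; nothing is remembered per job.
        self.lookups += 1
        return {b: list(self.block_map[b]) for b in blocks}

    def decommission(self, node):
        # Remove a data node from every block's replica list.
        for nodes in self.block_map.values():
            if node in nodes:
                nodes.remove(node)


def run_job(name_node, blocks):
    """Plan one job run: ask the Name node afresh, then read from the first replica."""
    locations = name_node.get_block_locations(blocks)
    return {b: nodes[0] for b, nodes in locations.items()}


name_node = ToyNameNode({"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]})
first_plan = run_job(name_node, ["blk_1", "blk_2"])
name_node.decommission("dn1")  # the cluster changes between two runs of the same job
second_plan = run_job(name_node, ["blk_1", "blk_2"])

print(first_plan)   # {'blk_1': 'dn1', 'blk_2': 'dn2'}
print(second_plan)  # {'blk_1': 'dn2', 'blk_2': 'dn2'}
```

The second run reads blk_1 from dn2 only because it looked the locations up again; a plan saved from the first run would still point at dn1.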
2) Does Hadoop have a RUN STATS concept, or is LOAD BALANCING the same as RUN STATS?
Sol: When a new data node is added, new blocks written to HDFS tend to be placed on it, because it is the least utilized node in terms of storage. The existing blocks on the other nodes are not moved to the new node automatically. To balance the blocks across the new and the old data nodes, the balancer has to be run: start-balancer.sh starts it and stop-balancer.sh stops it.
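As a rough illustration of what "balanced" means here, the Python sketch below compares each data node's storage utilization with the cluster-wide utilization and flags nodes that differ by more than a threshold. This is a simplified model and not the balancer's actual code; classify_nodes, the node names and the usage figures are hypothetical, and the 10 percent threshold is assumed to match the balancer's default.

```python
def classify_nodes(used, capacity, threshold_pct=10.0):
    """Split data nodes into over-utilized, under-utilized and balanced groups.

    used / capacity: dicts of node name -> storage in any consistent unit.
    """
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    over, under, balanced = [], [], []
    for node in sorted(used):
        node_util = 100.0 * used[node] / capacity[node]
        if node_util > cluster_util + threshold_pct:
            over.append(node)      # candidate source of blocks to move
        elif node_util < cluster_util - threshold_pct:
            under.append(node)     # candidate target for moved blocks
        else:
            balanced.append(node)
    return cluster_util, over, under, balanced


# Hypothetical cluster: three old data nodes plus one newly added, empty node.
used = {"dn1": 80, "dn2": 70, "dn3": 75, "dn_new": 0}
capacity = {"dn1": 100, "dn2": 100, "dn3": 100, "dn_new": 100}

cluster_util, over, under, balanced = classify_nodes(used, capacity)
print(cluster_util)  # 56.25
print(over)          # ['dn1', 'dn2', 'dn3']
print(under)         # ['dn_new']
```

In this example the empty new node is far below the cluster average, so nothing is considered balanced until blocks are moved from the old nodes to the new one.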
Please feel free to reply if you need any further help.