Below are a few of the FAQs we have received about the Big Data and Hadoop Quiz.
1. Assume that there are 50 nodes in your Hadoop cluster with a total of 200 TB (4 TB per node) of raw disk space allocated to HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to store?
According to the question:
Total number of nodes = 50
Disk space per node = 4 TB
So total raw disk space = 50 × 4 TB = 200 TB
Hadoop's default configuration (as stated in the question) uses a replication factor of 3, so every block is stored three times.
Data that can be stored, considering the replication factor = 200 TB / 3 ≈ 66.7 TB (roughly 66 TB).
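The arithmetic above can be written as a small helper. This is a minimal sketch: the function name `usable_hdfs_capacity_tb` is hypothetical, and it ignores any disk space a real cluster reserves for non-HDFS use, so it only gives the upper bound the quiz asks about.

```python
def usable_hdfs_capacity_tb(nodes: int, disk_per_node_tb: float, replication: int = 3) -> float:
    """Logical HDFS capacity: raw disk space divided by the replication factor.
    Hypothetical helper; ignores space reserved for non-HDFS use."""
    raw_tb = nodes * disk_per_node_tb
    return raw_tb / replication


capacity = usable_hdfs_capacity_tb(50, 4)  # 50 nodes x 4 TB, default replication of 3
print(f"{capacity:.1f} TB")  # 66.7 TB
```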
2. If there is a file of 100 TB, what happens when we copy it to HDFS?
Option 1: Is the file first copied to the NameNode, which then divides the data into blocks, stores the metadata, and moves the blocks to available DataNodes?
Option 2: Or, while copying, is the data pushed directly to available blocks on the DataNodes, with only the metadata stored on the NameNode?
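Option 2 is correct: while the file is being copied, the data is pushed directly to available blocks on the DataNodes, and only the metadata is stored on the NameNode. The client asks the NameNode which DataNodes should hold each block and then streams the block data to those DataNodes itself, so the file contents never pass through the NameNode.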
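To make the division of labour concrete, here is a toy, in-memory model of that write path. It is only a sketch under stated assumptions: the `NameNode`, `DataNode`, and `client_write` names are hypothetical and are not Hadoop's API, block placement is simple round-robin (real HDFS placement is rack-aware), and the client writes each replica itself instead of using the DataNode pipeline. A 128 MB block size is assumed for the 100 TB block count.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024**2  # assumption: 128 MB block size
REPLICATION = 3             # Hadoop's default replication factor


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> block contents (the actual data)


@dataclass
class NameNode:
    datanodes: list
    # Metadata only: file path -> list of (block_id, [DataNode names]); no file bytes here.
    namespace: dict = field(default_factory=dict)
    next_block_id: int = 0

    def allocate_block(self, path):
        """Choose REPLICATION target DataNodes for a new block and record it in the metadata.
        Round-robin for simplicity; real HDFS placement is rack-aware."""
        block_id = self.next_block_id
        self.next_block_id += 1
        targets = [self.datanodes[(block_id + i) % len(self.datanodes)]
                   for i in range(REPLICATION)]
        self.namespace.setdefault(path, []).append((block_id, [dn.name for dn in targets]))
        return block_id, targets


def client_write(namenode, path, data, block_size=BLOCK_SIZE):
    """Split data into blocks; for each block, ask the NameNode where to put it,
    then send the bytes straight to the DataNodes (never through the NameNode).
    Simplification: the client writes every replica itself, whereas real HDFS
    forwards each block along a pipeline of DataNodes."""
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        block_id, targets = namenode.allocate_block(path)
        for dn in targets:
            dn.blocks[block_id] = chunk


# A 100 TB file at a 128 MB block size would need this many blocks:
blocks_for_100_tb = math.ceil(100 * 1024**4 / BLOCK_SIZE)
print(blocks_for_100_tb)  # 819200

# Tiny demo: 10 bytes with a 4-byte block size -> 3 blocks, 3 replicas each.
dns = [DataNode(f"dn{i}") for i in range(4)]
nn = NameNode(dns)
client_write(nn, "/data/file.bin", b"x" * 10, block_size=4)
print(nn.namespace["/data/file.bin"])
```

After the demo, the NameNode's `namespace` holds only block IDs and DataNode names, while the bytes themselves live only in the DataNodes' `blocks`.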
3. You need to move a file titled “weblogs” into HDFS. When you try to copy the file, you can’t, even though you know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
In HDFS, data is split into blocks that are distributed across multiple nodes in the cluster. Each block is typically 64 MB or 128 MB in size and is replicated multiple times; by default, each block is replicated three times, with the replicas stored on different nodes. HDFS uses the local file system to store each HDFS block as a separate file. The HDFS block size is not comparable to the block size of a traditional file system: HDFS blocks are tens or hundreds of megabytes, whereas local file system blocks are typically a few kilobytes.
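The block-and-replica arithmetic described above can be sketched as follows. The function names are hypothetical, the 200 MB file size is an illustrative assumption (the question does not give the size of “weblogs”), and the defaults mirror the 64 MB block size and replication factor of 3 mentioned in the answer.

```python
def split_into_blocks(file_size_mb: int, block_size_mb: int = 64) -> list[int]:
    """Sizes (MB) of the HDFS blocks a file is split into; the last block may be partial."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def replica_summary(file_size_mb: int, block_size_mb: int = 64, replication: int = 3):
    blocks = split_into_blocks(file_size_mb, block_size_mb)
    return {
        "blocks": blocks,
        # each replica is kept as a separate file on some DataNode's local file system
        "block_replicas": len(blocks) * replication,
        # a partial last block uses only its actual size, not a full block
        "raw_mb_used": sum(blocks) * replication,
    }


print(replica_summary(200))
# {'blocks': [64, 64, 64, 8], 'block_replicas': 12, 'raw_mb_used': 600}
```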
4. You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?
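While a file is being copied into HDFS, another user can see the data written so far, but only up to the last completed block: the block that is currently being written is not guaranteed to be visible to other readers.
If you are writing 300 MB of data with a block size of 64 MB, then after 200 MB has been written, three full blocks (3 × 64 MB = 192 MB) are complete and the fourth block is still being written. Another user would therefore see the file through the last completed block, i.e. the first 192 MB, not the full 200 MB.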
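The "last completed block" rule reduces to integer division. This is a simplified sketch: `visible_bytes` is a hypothetical helper, and it models only the guarantee described above, ignoring any explicit flushes the writer might perform.

```python
MB = 1024**2


def visible_bytes(bytes_written: int, block_size: int) -> int:
    """Bytes another reader is guaranteed to see: completed blocks only.
    The block still being written is excluded."""
    completed_blocks = bytes_written // block_size
    return completed_blocks * block_size


print(visible_bytes(200 * MB, 64 * MB) // MB)  # 192
```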
Please feel free to get back to us if you need any further help.