- Number of slides: 15
VIRTUALISATION OF HADOOP CLUSTERS
Dr G Sudha Sadasivam
Assistant Professor, Department of CSE, PSGCT
Introduction
• A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance.
• Challenges
  – Partitioning of a machine
  – Concurrent execution of multiple operating systems
  – Isolation of virtual machines from one another
  – Support for heterogeneous applications
  – Low performance overhead
• Xen is a virtual machine monitor (hypervisor) for x86 that supports the execution of multiple guest operating systems.
• Layers: hypervisor, kernel and user-space applications
Objective
• Automate the creation and deletion of a virtual cluster for hosting Hadoop, using Xen.
• Simulate a large physical cluster on a few physical machines.
Steps
• The user supplies the configuration by editing configuration files.
• The system generates the user-specified number of VMs running Hadoop.
• Users can manage the Hadoop Distributed File System (HDFS).
• Users can submit jobs for each physical machine.
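The first two steps (user configuration in, a given number of Hadoop VMs out) can be sketched as follows. This is a minimal Python illustration, not the system described in these slides: the `key = value` file format, the keys `num_vms`, `first_ip`, `memory_mb` and `name_prefix`, and the names `VMSpec`, `parse_user_config` and `plan_vms` are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VMSpec:
    name: str
    ip: str
    memory_mb: int


def parse_user_config(text: str) -> dict[str, str]:
    """Parse hypothetical 'key = value' lines; blank lines and '#' comments are ignored."""
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        config[key.strip()] = value.strip()
    return config


def plan_vms(config: dict[str, str]) -> list[VMSpec]:
    """Expand the user's settings into one spec per guest VM."""
    count = int(config["num_vms"])
    first_ip = ipaddress.IPv4Address(config["first_ip"])
    memory_mb = int(config.get("memory_mb", "512"))
    prefix = config.get("name_prefix", "hadoop-slave")
    # Guests get consecutive names and consecutive IPv4 addresses.
    return [VMSpec(f"{prefix}{i + 1}", str(first_ip + i), memory_mb) for i in range(count)]


if __name__ == "__main__":
    user_config = """
    num_vms = 3
    first_ip = 192.168.1.101   # address of the first guest
    memory_mb = 1024
    """
    for vm in plan_vms(parse_user_config(user_config)):
        print(vm)
```

Assigning consecutive names and addresses keeps the guest list predictable, which is convenient when the same list later has to be written into Hadoop's configuration files.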
Need for virtualisation
• Ability to recover quickly from software problems by saving a copy of the guest image.
• High availability by relocating guests when a server machine is inoperable.
• Dynamic load balancing by migrating guests off heavily loaded server machines.
• Consolidation of many services on one physical machine, each administered independently in its own VM.
• Use of the abundant computational power of the physical machine.
• Minimisation of cost.
• Switching between applications on different OSs using hypervisors.
HADOOP CLUSTER CONFIGURATION
• The host node is configured as the master (NameNode, NN) and also acts as a slave (DataNode, DN).
• Each guest node is configured as a slave (DN).
The master is the host OS, which acts as the JobTracker/NameNode. Each slave is a guest OS, which acts as a TaskTracker/DataNode.
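The master/slave roles above map onto Hadoop 1.x configuration files roughly as sketched below. In Hadoop 1.x, `conf/slaves` lists the hosts that run a DataNode and TaskTracker, `conf/masters` names the SecondaryNameNode host, `fs.default.name` in `core-site.xml` gives the NameNode address and `mapred.job.tracker` in `mapred-site.xml` gives the JobTracker address. The host names, the ports 9000/9001 (common choices in setup guides rather than required values) and the helper functions are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def hadoop_site_xml(properties: dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout of Hadoop *-site.xml files."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def cluster_files(host: str, guests: list[str], hdfs_port: int = 9000,
                  jobtracker_port: int = 9001) -> dict[str, str]:
    """Hypothetical helper: Hadoop 1.x conf/ file contents, host = master + slave, guests = slaves."""
    return {
        # In Hadoop 1.x, conf/masters names the SecondaryNameNode host.
        "masters": host + "\n",
        # The host is listed as a slave too, so it also runs a DataNode/TaskTracker.
        "slaves": "\n".join([host, *guests]) + "\n",
        "core-site.xml": hadoop_site_xml({"fs.default.name": f"hdfs://{host}:{hdfs_port}"}),
        "mapred-site.xml": hadoop_site_xml({"mapred.job.tracker": f"{host}:{jobtracker_port}"}),
    }


if __name__ == "__main__":
    files = cluster_files("master", ["guest1", "guest2"])
    ET.fromstring(files["core-site.xml"])  # sanity check: the output is well-formed XML
    print(files["slaves"], end="")
```

Listing the host in `slaves` is what lets the master node also contribute storage and task slots, as the configuration slide describes.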
Steps in implementation
• Installation of the Xen kernel
• Creation of the guest OS
• Configuration of the guest OS
• Installation of the Java Development Kit (JDK)
• Extraction and configuration of the Hadoop cluster
• Creation of an OS image for new guest machines
• Creation and removal of other virtual machines by copying the OS image
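The last two steps, building an OS image once and copying it for every new guest, amount to cloning and deleting disk image files. The following is a minimal Python sketch, assuming file-backed guest disks stored as `<name>.img` in one directory (a hypothetical layout; `clone_guest_images` and `remove_guest_images` are illustrative names). In a real deployment each clone would also need its own hostname and network settings before it joins the Hadoop cluster; that step is not shown.

```python
import shutil
import tempfile
from pathlib import Path


def clone_guest_images(base_image: Path, image_dir: Path, guest_names: list[str]) -> list[Path]:
    """Copy the prepared base OS image once per guest (hypothetical <name>.img layout)."""
    image_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in guest_names:
        target = image_dir / f"{name}.img"
        if target.exists():
            # Refuse to overwrite a disk that may belong to an existing guest.
            raise FileExistsError(f"image already exists: {target}")
        shutil.copyfile(base_image, target)
        created.append(target)
    return created


def remove_guest_images(image_dir: Path, guest_names: list[str]) -> None:
    """Delete the per-guest images when their virtual machines are removed."""
    for name in guest_names:
        (image_dir / f"{name}.img").unlink(missing_ok=True)


if __name__ == "__main__":
    # Demonstration with a small dummy file standing in for a real disk image.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "base.img"
        base.write_bytes(b"\x00" * 1024)
        images = clone_guest_images(base, Path(tmp) / "guests", ["guest1", "guest2"])
        print([p.name for p in images])  # ['guest1.img', 'guest2.img']
        remove_guest_images(Path(tmp) / "guests", ["guest1", "guest2"])
```

Refusing to overwrite an existing image is a deliberate choice: silently replacing the disk of a running guest would corrupt it.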
Automated Creation of a Hadoop Virtual Cluster
An XML file holds the configuration details of each new VM.
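The slide does not show the XML format, so the element and attribute names below (`<cluster>`, `<vm>` with `name`, `memory`, `vcpus`, `image`) are hypothetical. The sketch parses such a file with Python's `xml.etree.ElementTree` and emits one configuration text per VM in the syntax of Xen's `xl` toolstack (`name`, `bootloader`, `memory`, `vcpus`, `disk`, `vif`). The default image path, the `pygrub` bootloader and the `xenbr0` bridge are assumptions about the host setup.

```python
import xml.etree.ElementTree as ET


def xl_configs_from_xml(xml_text: str, bridge: str = "xenbr0") -> dict[str, str]:
    """Turn each <vm> element of a (hypothetical) cluster XML file into xl config text."""
    root = ET.fromstring(xml_text.strip())
    configs: dict[str, str] = {}
    for vm in root.iter("vm"):
        name = vm.get("name")
        if not name:
            raise ValueError("every <vm> element needs a name attribute")
        memory_mb = int(vm.get("memory", "512"))
        vcpus = int(vm.get("vcpus", "1"))
        image = vm.get("image", f"/var/lib/xen/images/{name}.img")  # assumed location
        configs[name] = "\n".join([
            f'name = "{name}"',
            'bootloader = "pygrub"',  # assumes a paravirtualised guest booted via pygrub
            f"memory = {memory_mb}",
            f"vcpus = {vcpus}",
            f"disk = [ 'file:{image},xvda,w' ]",
            f"vif = [ 'bridge={bridge}' ]",
        ]) + "\n"
    return configs


if __name__ == "__main__":
    cluster_xml = """
    <cluster>
      <vm name="guest1" memory="1024" vcpus="1"/>
      <vm name="guest2" memory="1024" vcpus="1"/>
    </cluster>
    """
    for name, text in xl_configs_from_xml(cluster_xml).items():
        # Each text would be saved as a file and started with: xl create <file>
        print(f"# {name}.cfg\n{text}")
```

Keeping the XML file as the single description of the cluster means that adding a node is a one-line change; the per-VM `xl` files are derived from it.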
Automated Shutdown of a Hadoop Virtual Cluster
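An automated shutdown mainly needs the right order: stop the Hadoop daemons on the master first, then shut down the guest domains. The sketch below assumes a Hadoop 1.x layout with `bin/stop-all.sh` under a hypothetical `hadoop_home`, and uses `xl shutdown -w`, which waits until each guest has finished shutting down. `shutdown_plan` and `run_plan` are illustrative names, and the plan defaults to a dry run that only prints the commands.

```python
import subprocess


def shutdown_plan(hadoop_home: str, guests: list[str]) -> list[list[str]]:
    """Order matters: stop Hadoop first, then the guest domains."""
    # Stopping the daemons first keeps the NameNode from treating the
    # disappearing guest DataNodes as failed nodes.
    plan = [[f"{hadoop_home}/bin/stop-all.sh"]]
    # 'xl shutdown -w' asks each guest OS to shut down and waits until it has.
    plan += [["xl", "shutdown", "-w", guest] for guest in guests]
    return plan


def run_plan(plan: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print (and, unless dry_run, execute) each command in order; stop on the first failure."""
    executed = []
    for argv in plan:
        command = " ".join(argv)
        print(command)
        executed.append(command)
        if not dry_run:
            subprocess.run(argv, check=True)
    return executed


if __name__ == "__main__":
    run_plan(shutdown_plan("/usr/local/hadoop", ["guest1", "guest2"]))
```

Separating the plan from its execution makes the dry run trivial and lets the same plan be logged before any machine is touched.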
Advantages of automated virtualisation in Hadoop
1. Effective isolation of the DataNode from the load that other processes place on the machine makes the DataNode more responsive and reliable.
2. Running multiple virtual machines on each physical machine lowers the granularity of scheduling units. This makes it possible to schedule multiple TaskTrackers on the same machine and improves the overall utilisation of the whole cluster.
3. A snapshot of a virtual cluster makes it possible to re-activate the same cluster in the future and resume work from the snapshot (rollback).
Enhancements
1. Providing a graphical console for monitoring and managing the virtual cluster.
2. Creation and migration of virtual machines for load balancing.
3. Enabling snapshots of virtual machines for checkpointing.
4. Providing an intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts that virtual machine, increasing reliability.
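The proposed monitoring system (enhancement 4) could start as a reconciliation loop: compare the guests that should exist with the domains Xen reports, and recreate any that are missing. This is a minimal sketch, assuming each guest's `xl` configuration file is stored as `<config_dir>/<name>.cfg` (a hypothetical layout) and that the first column of `xl list` output is the domain name; all function names are illustrative. It only detects guests that have disappeared; a crashed domain that Xen keeps in the list (for example with `on_crash = "preserve"`) would need a check of the State column as well.

```python
import subprocess


def running_domains(xl_list_output: str) -> set[str]:
    """Extract domain names from the text printed by 'xl list' (first column, header skipped)."""
    names = set()
    for line in xl_list_output.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def guests_to_restart(expected: list[str], xl_list_output: str) -> list[str]:
    """Guests that should be running but are absent from the domain list."""
    return sorted(set(expected) - running_domains(xl_list_output))


def restart_commands(missing: list[str], config_dir: str) -> list[list[str]]:
    """Hypothetical layout: each guest's xl config lives at <config_dir>/<name>.cfg."""
    return [["xl", "create", f"{config_dir}/{name}.cfg"] for name in missing]


def check_once(expected: list[str], config_dir: str) -> list[str]:
    """One monitoring pass: query Xen, recreate any missing guests, return their names."""
    output = subprocess.run(["xl", "list"], capture_output=True, text=True, check=True).stdout
    missing = guests_to_restart(expected, output)
    for argv in restart_commands(missing, config_dir):
        subprocess.run(argv, check=True)
    return missing
```

A monitor would call `check_once` periodically, for example in a loop with `time.sleep`; keeping the parsing and decision logic in pure functions lets them be tested without a Xen host.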
Performance of Physical vs Virtual clusters
Master as a Physical Node (7 nodes)
• Data nodes – 6 virtual nodes
• Name node – 1 physical node
Master as a Virtual Node (7 nodes)
• Data nodes – 1 physical node + 5 virtual nodes
• Name node – 1 virtual node
Performance with a varying number of virtual nodes