VIRTUALISATION OF HADOOP CLUSTERS
Dr G Sudha Sadasivam
Assistant Professor, Department of CSE, PSGCT

Introduction
• A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance.
• Challenges
  – Partitioning of a machine
  – Concurrent execution of multiple operating systems
  – Isolation of virtual machines from one another
  – Support for heterogeneous applications
  – Low performance overhead
• Xen is a virtual machine monitor for x86 that supports execution of multiple guest operating systems.
• Components shown: hypervisor, kernel and user space applications

Objective
• Automation of creation and deletion of a virtual cluster for hosting Hadoop using Xen
• A large physical cluster can be simulated on a few physical machines
Steps
• The user supplies the configuration by editing configuration files.
• A user-specified number of VMs running Hadoop is generated.
• Users can manage the Hadoop file system.
• Users can submit jobs for each physical machine.

Need for virtualisation
• Ability to recover from software problems quickly by saving a copy of the guest image.
• High availability by relocating guests when a server machine is inoperable.
• Dynamic load balancing by migrating guests between server machines.
• Consolidation of many services on one physical machine, each administered independently in its own VM.
• Usage of the abundant computational power on the physical machine. Minimisation of cost.
• Switching between applications on different operating systems using hypervisors.

HADOOP CLUSTER CONFIGURATION
• Host node is configured as master (NN) and also acts as slave (DN)
• Guest node (DN) is configured as slave

• Master is the Host OS, which acts as job tracker/Name node.
• Slave is the Guest OS, which acts as task tracker/Data node.
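As a rough illustration of the master/slave layout above, the sketch below builds the contents of a Hadoop slaves file in which the host appears alongside the guests, since the host also runs a Data node. The hostnames are hypothetical, and the one-hostname-per-line format assumes a Hadoop 1.x style `conf/slaves` file; the slides do not state which Hadoop version was used.

```python
def build_slaves_file(host, guests):
    """Return the text of a slaves file: one worker hostname per line.

    The host is listed first because it acts as a slave (Data node)
    in addition to being the master (Name node).
    """
    workers = [host]
    for guest in guests:
        if guest not in workers:  # avoid listing a machine twice
            workers.append(guest)
    return "\n".join(workers) + "\n"


# Hypothetical hostnames, for illustration only.
print(build_slaves_file("host0", ["guest1", "guest2"]), end="")
```

Listing the host first is only a readability choice here; the order of entries is not something the slides specify.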

Steps in implementing
• Installation of Xen kernel
• Creation of Guest OS
• Configuration of Guest OS
• Installation of Java Development Kit
• Extraction and Configuration of Hadoop Cluster
• Creating OS image for new Guest Machines
• Creation and removal of other virtual machines; copying of the OS images
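The last two steps (preparing an OS image and copying it for each additional guest) might look roughly like the sketch below. This is an assumption about how the copy could be scripted, not the authors' tool: the function names, the `hadoop-guest` prefix and the `.img` naming scheme are all hypothetical.

```python
import shutil
from pathlib import Path


def clone_guest_images(base_image, target_dir, count, prefix="hadoop-guest"):
    """Copy the prepared base OS image once per new guest.

    Returns the list of created image paths, in guest order.
    """
    base_image = Path(base_image)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for index in range(1, count + 1):
        image = target_dir / f"{prefix}{index}.img"
        shutil.copyfile(base_image, image)  # full copy: each guest gets its own disk
        created.append(image)
    return created


def remove_guest_images(images):
    """Delete the image files of guests that are no longer needed."""
    for image in images:
        Path(image).unlink()
```

A full copy per guest is the simplest approach but costs disk space proportional to the number of guests; the slides do not say whether any space-saving scheme was used.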

Automated Creation of a Hadoop Virtual cluster
• XML file has configuration details of new VM
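The slides do not show the XML layout, so the sketch below invents one purely for illustration: a `<cluster>` element with one `<vm>` child per guest. It reads that file and renders a minimal guest definition in the key/value style used by Xen's classic `xm` configuration files (`name`, `memory`, `vcpus`, `disk`, `vif`). The element names, attribute names, paths and bridge name are assumptions, and a real guest definition needs more than this (boot settings, for example).

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the slides do not show the real XML layout.
EXAMPLE_XML = """
<cluster>
  <vm name="hadoop-guest1" memory="512" vcpus="1"
      image="/var/lib/xen/images/hadoop-guest1.img" bridge="xenbr0"/>
  <vm name="hadoop-guest2" memory="512" vcpus="1"
      image="/var/lib/xen/images/hadoop-guest2.img" bridge="xenbr0"/>
</cluster>
"""


def parse_vm_specs(xml_text):
    """Turn each <vm> element into a plain dict of settings."""
    root = ET.fromstring(xml_text)
    specs = []
    for vm in root.findall("vm"):
        specs.append({
            "name": vm.get("name"),
            "memory": int(vm.get("memory")),  # in MB
            "vcpus": int(vm.get("vcpus")),
            "image": vm.get("image"),
            "bridge": vm.get("bridge"),
        })
    return specs


def render_xen_config(spec):
    """Render a minimal xm-style guest config (boot settings omitted)."""
    lines = [
        f'name = "{spec["name"]}"',
        f'memory = {spec["memory"]}',
        f'vcpus = {spec["vcpus"]}',
        f"disk = ['file:{spec['image']},xvda,w']",
        f"vif = ['bridge={spec['bridge']}']",
    ]
    return "\n".join(lines) + "\n"


for spec in parse_vm_specs(EXAMPLE_XML):
    print(render_xen_config(spec))
```

Keeping parsing and rendering separate means the same parsed specs could also drive other steps, such as choosing image names or building the list of slave hostnames.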

Automated Shut down of Hadoop Virtual cluster
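The slide gives no detail on the shutdown sequence, so the following is only a plausible ordering: stop the Hadoop daemons first, then shut down each guest. It assumes a Hadoop 1.x layout with `bin/stop-all.sh` and Xen's classic `xm` toolstack (`xm shutdown <domain>`); neither is confirmed by the slides, and the install path and guest names are hypothetical. The function only builds the command list and does not execute anything.

```python
def plan_cluster_shutdown(hadoop_home, guest_names):
    """Return the shutdown commands, in order, as argument lists."""
    # Assumed ordering: stop Hadoop daemons before powering off guests.
    commands = [[f"{hadoop_home}/bin/stop-all.sh"]]
    for name in guest_names:
        commands.append(["xm", "shutdown", name])
    return commands


# Hypothetical install path and guest names.
for command in plan_cluster_shutdown("/opt/hadoop", ["hadoop-guest1", "hadoop-guest2"]):
    print(" ".join(command))
```

Returning argument lists rather than running them keeps the plan easy to inspect; each list could later be passed to `subprocess.run` on the host.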

Advantages of automated virtualization in Hadoop
1. Effective isolation of the datanode from the load caused by other processes on the machine makes the datanode more responsive and reliable.
2. The availability of multiple virtual machines on each physical machine lowers the granularity of scheduling units, making it possible to schedule multiple task trackers on the same machine and to improve the overall utilization of the whole cluster.
3. A snapshot of a virtual cluster makes it possible to re-activate the same cluster in the future and resume work from the snapshot (rollback).

Enhancements
1. Providing a graphical console for monitoring and managing the virtual cluster.
2. Creation and migration of virtual machines for the purpose of load balancing.
3. Enabling snapshots of the virtual machine for checkpointing.
4. Providing an intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts that virtual machine, increasing reliability.

Performance of Physical vs Virtual clusters

Master as a Physical Node
• 7 Nodes
• Data nodes – 6 virtual nodes
• Name node – 1 physical node

Master as a Virtual Node
• 7 Nodes
• Data nodes – 1 physical node + 5 virtual nodes
• Name node – 1 virtual node

Performance with varying number of Virtual nodes