Advanced I/O Techniques for Efficient and Highly Available Process Crash Recovery Protocols
Thesis Presentation
Jason Cornwell
03/15/2011

Agenda
• Introduction
• Challenges
• Pertinent Background
• Proposed Techniques
• Implementations
• Experimental Setup & Results
• Conclusions
• Future Work

Computing Intensive Applications

Network Centric Services

Recent Advances

Motivation & Goals
• Demand for more computing power and high-bandwidth network connections
• Advances in microprocessors and networks
• Parallel computing
• Performance and scalability
• Reliability and availability
• Simplicity and accessibility

Reliability Problems
• Large numbers of CPUs, memory modules, hard disk drives, network interfaces, and network switches
• Low Mean-Time-To-Failure (MTTF) and/or high Failure-In-Time (FIT) rate
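
To make the scaling concern concrete, here is a small C sketch. It assumes independent components with constant failure rates, which the slide does not state. Under that assumption component FIT rates (failures per 10^9 device-hours) add up, and the system MTTF in hours is 10^9 divided by the total FIT. The `component` struct, function names, and any counts or FIT values passed in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One class of hardware component (hypothetical type for illustration). */
struct component {
    const char *name;     /* e.g. "disk" */
    unsigned    count;    /* how many of them the system contains */
    double      fit_each; /* failures per 1e9 device-hours, per unit */
};

/* Assumption: failures are independent with constant rates,
 * so the system FIT is the sum of all component FITs. */
double system_fit(const struct component *parts, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += parts[i].count * parts[i].fit_each;
    return total;
}

/* Convert a FIT rate to mean time to failure in hours. */
double mttf_hours(double fit)
{
    assert(fit > 0.0);
    return 1e9 / fit;
}
```

For example, 1,000 identical parts rated at 100 FIT each give a system rate of 100,000 FIT, which corresponds to an MTTF of 10,000 hours, even though each individual part has an MTTF of 10,000,000 hours.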

Classification of Failure
• Transient Failure
  – Power glitch
  – System patch and reboot
  – ECC trap
• Partial “Permanent” Failure
  – Disk failure
  – Partial network failure
• Wholesale “Permanent” Failure
  – Total hardware failure
  – Natural disaster

Availability Problems
• Large numbers of processes, threads, software barriers, and busy waiting
• Temporarily unresponsive and/or unavailable

Possible Solutions
• Transient Failure
  – Restart/replay/resume on the same node
  – Task migration is possible
• Permanent Partial Failure
  – Rebalance the workload on surviving nodes
  – Partial task migration is needed
• Permanent Wholesale Failure
  – Reconfigure the applications and services
  – Massive task migration to a new platform

Checkpointing
• Common feature in high-performance computing (HPC) platforms
• Saves the execution state
• Application or system-level
• Mechanism for task migration

Application vs System Level
• Application-level Recovery Point
  – Developed specifically for each application
  – Generally smaller footprint
  – Data accessibility restrictions
• Kernel-level Recovery Point
  – Snapshots processes
  – Full resource restoration
  – Flexibility due to system-level preemption

Berkeley Lab Checkpoint/Restart (BLCR)
• System-level kernel module
• Checkpoint creation implemented
• Process recovery implemented
• Linked to BLCR libraries at execution
• Stores checkpoint data locally (stack, heap, registers, signals, etc.)

Contribution
• Enhanced BLCR performance through a latency-tolerant technique
• Increased BLCR availability through a novel checkpoint creation technique

I/O Optimization
• Avoided extensive modification to BLCR
• Reduced the disk latency of checkpoint creation
• Implemented a caching technique
• Improved I/O performance 4-fold or more
• System overhead less than 300 KB in experimental test results

Checkpoint Caching
• Buffer used as temporary storage
• Storage block flushed in large volume
• Trade-off between resource consumption and improved I/O efficiency

cr_copy(chkptData, count)
    if (chkptBuf is NULL)
        kmalloc count bytes for chkptBuf;
        copy chkptData into chkptBuf;
    else
        kmalloc count + chkptBuf size for tempBuf;
        copy chkptBuf into tempBuf;
        append chkptData to tempBuf;
        krealloc chkptBuf to its expanded size;
        memmove tempBuf into chkptBuf;
        kfree tempBuf;
    end if
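
The caching step can be sketched in user-space C as follows. This is not the BLCR kernel code: it substitutes `realloc` for the `kmalloc`/`krealloc`/temporary-buffer sequence on the slide, and the `chkpt_buf` struct and `chkpt_buf_free` helper are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for the checkpoint cache. */
struct chkpt_buf {
    char  *data; /* cached checkpoint bytes, NULL while empty */
    size_t len;  /* number of bytes currently cached */
};

/* Append count bytes of checkpoint data to the cache.
 * Returns 0 on success, -1 if memory could not be allocated. */
int cr_copy(struct chkpt_buf *buf, const void *chkpt_data, size_t count)
{
    assert(buf != NULL && chkpt_data != NULL);
    if (count == 0)
        return 0;

    /* realloc(NULL, n) behaves like malloc(n), so the empty-cache
     * branch and the grow branch collapse into one call. */
    char *grown = realloc(buf->data, buf->len + count);
    if (grown == NULL)
        return -1; /* the old buffer is untouched and still valid */

    memcpy(grown + buf->len, chkpt_data, count);
    buf->data = grown;
    buf->len += count;
    return 0;
}

/* Release the cache after it has been flushed. */
void chkpt_buf_free(struct chkpt_buf *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->len = 0;
}
```

Growing the buffer in place avoids the extra temporary copy, at the cost of relying on the allocator to preserve the existing contents, which `realloc` guarantees.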

Optimized Write Operation

Remote Checkpoint
• BLCR is limited to local disk storage
• Remote checkpoint offers an off-site storage option
• Uses sockets to transmit data
• Needs a predefined destination
• Outperforms BLCR in some experimental tests

Remote Checkpoint Server
• Single-threaded daemon
• Compiled with GCC
• Stores the recovery point external to the client node
• Could be ported to a Microsoft platform

while (true)
    create socket;
    bind to address;
    listen for incoming connections;
    wait for client to connect;
    create file descriptor;
    while (buffered data received)
        write checkpoint data;
    close file descriptor;
    close socket;
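
A minimal sketch of such a server in C with POSIX sockets is shown below. It is an illustration under assumptions, not the thesis implementation: the function names, the 4096-byte receive buffer, and the file permissions are hypothetical, and the listening socket is created once rather than on every loop iteration as in the pseudocode.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Copy everything received on conn_fd into out_fd until the peer closes.
 * Returns the number of bytes stored, or -1 on error. */
long store_checkpoint(int conn_fd, int out_fd)
{
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t r = read(conn_fd, buf, sizeof buf);
        if (r == 0)
            return total;              /* client finished sending */
        if (r < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        for (ssize_t off = 0; off < r; ) {  /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(r - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
        total += r;
    }
}

/* Create a TCP socket bound to the given port and start listening. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Single-threaded service step: accept one client and store its
 * checkpoint stream in the file at path. */
long serve_one(int listen_fd, const char *path)
{
    assert(path != NULL);
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        return -1;
    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    long n = (out < 0) ? -1 : store_checkpoint(conn, out);
    if (out >= 0)
        close(out);
    close(conn);
    return n;
}
```

A daemon would call `make_listener` once and then call `serve_one` in an endless loop, choosing a new file path for each incoming checkpoint.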

Modified Write Operation
• TCP packets
• MTU must be reached before delivery
• Only modification is to the write operation of BLCR

if (remote chkpt)
    if (socket is NULL)
        create socket;
        establish connection; if handshake fails, break and perform the original_chkpt;
    end if
    package checkpoint data;
    send data message;
end if
if (original_chkpt)
    original BLCR write operation;
end if
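
The fallback logic above might look like this in user-space C. It is a sketch under assumptions rather than the BLCR patch itself: the `chkpt_target` struct, the function names, and the use of plain `write` on a connected socket are all hypothetical, and the connection is attempted lazily on the first write.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical description of where checkpoint data should go. */
struct chkpt_target {
    int            want_remote; /* nonzero: remote checkpoint requested */
    int            remote_fd;   /* -1 until a connection is established */
    int            local_fd;    /* local checkpoint file (original path) */
    const char    *server_ip;   /* predefined destination, dotted quad */
    unsigned short port;
};

/* Write all n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Try to connect to the checkpoint server; returns a socket or -1. */
static int connect_remote(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send the data to the remote server when possible; if the connection
 * cannot be established, fall back to the original local write. */
int chkpt_write_remote_or_local(struct chkpt_target *t,
                                const void *data, size_t count)
{
    assert(t != NULL && data != NULL);
    if (t->want_remote) {
        if (t->remote_fd < 0) {
            t->remote_fd = connect_remote(t->server_ip, t->port);
            if (t->remote_fd < 0)
                t->want_remote = 0; /* handshake failed: use local path */
        }
        if (t->want_remote)
            return write_all(t->remote_fd, data, count);
    }
    return write_all(t->local_fd, data, count);
}
```

Clearing `want_remote` after a failed connection keeps every later chunk on the local path, so a checkpoint is never split between the two destinations.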

Design
I/O Optimization Write

write(chkptData, count)
    if (chkptBuf has space for the incoming chkptData)
        cr_copy(chkptData, count);
    else
        vfs_write(chkptBuf);
        vfs_write(chkptData);
        kfree(chkptBuf);
    end if

Remote Checkpoint Write
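
The I/O optimization write path can be illustrated with a user-space C sketch. The names, the fixed 4096-byte cache size, and the use of `write` on a file descriptor in place of the kernel's `vfs_write` are assumptions made for the example, and the cache is a fixed array rather than a dynamically allocated buffer.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define CHKPT_BUF_CAP 4096 /* hypothetical cache size */

/* Hypothetical fixed-size cache in front of the checkpoint file. */
struct chkpt_cache {
    int    fd;                 /* destination file descriptor */
    size_t used;               /* bytes currently cached */
    char   buf[CHKPT_BUF_CAP];
};

/* Write all n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Push the cached bytes to the file and empty the cache. */
int chkpt_flush(struct chkpt_cache *c)
{
    if (write_all(c->fd, c->buf, c->used) < 0)
        return -1;
    c->used = 0;
    return 0;
}

/* Cache small writes; when the incoming data does not fit, flush the
 * cache first and then write the new data straight through. */
int chkpt_cached_write(struct chkpt_cache *c, const void *data, size_t count)
{
    assert(c != NULL && data != NULL);
    if (count <= CHKPT_BUF_CAP - c->used) {
        memcpy(c->buf + c->used, data, count);
        c->used += count;
        return 0;
    }
    if (chkpt_flush(c) < 0)
        return -1;
    return write_all(c->fd, data, count);
}
```

The caller must invoke `chkpt_flush` once more after the last write so that any bytes still held in the cache reach the file.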

Experimental Setup
I/O Optimization
• Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB memory, 5,400 RPM hard disk, Linux 2.6
• Implementations:
  – BLCR
  – Optimized BLCR (O-BLCR)
Remote Checkpoint
• Dell PowerEdge 700, 2.80 GHz dual-processor Intel Pentium 4, 3 GB memory, 5,400 RPM hard disk, Linux 2.6
• Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB memory, 5,400 RPM hard disk, Linux 2.6
• Implementations:
  – BLCR
  – BLCR with NFS (BLCR+NFS)
  – BLCR with our Remote Checkpoint Technique (BLCR+R)

Benchmarks
Resource utilization by benchmark (CPU, Memory, I/O)
• TSP (NP-Complete): High, Low
• AES (Data Encryption): High, Low, Medium
• GE (Linear Equation Solver): Low, High, Medium
• HC (File Compression)

I/O Optimization Results

Remote Checkpoint Results

Conclusion
• Minimal modification to BLCR
• The I/O optimization technique reduced the write latency of BLCR
• Remote checkpointing increases BLCR availability by adding an off-node storage feature
• These techniques should be integrated into the core BLCR source code

Future Work
• Server authentication protocol
• Data packet encryption
• Automated process load balancing

Questions