Cluster Grid with Web and Semantic Services

Dr G Sudha Sadasivam
Professor, CSE
PSG College of Technology, Coimbatore 641 004

Agenda
• Web Services
• SOA
• Semantics
• Grid Architecture
• 3rd Generation Grid Architecture
• Semantic Grid
• Cluster Architecture
• Hadoop
• Amazon Web Services
• Work at Grid and Cloud Computing Lab, PSGCT

ORGANISING A BIRTHDAY PARTY?

PRODUCTS AND SERVICES – A TRADITIONAL WAY OF DISCOVERING AND ACCESSING

INFORMATION SERVICES

1. Web Service
A service is a set of actions that form a coherent whole from the point of view of both service providers and service requesters (e.g., arranging a birthday party).
Web services provide a standard means of interoperating between different software applications, running on a variety of platforms and/or frameworks, in a transparent and loosely coupled manner.
A Web service is a software system designed to:
• support interoperable machine-to-machine interaction
• have an interface described in a machine-processable format (WSDL)
• communicate using standard SOAP messages, typically over HTTP, with an XML serialization, in conjunction with other Web-related standards
• be registered in a UDDI registry
• be identified by a URI
A Web service is an entity that can be:
• Described (using WSDL)
• Published
• Discovered
• Invoked by a client
Web services are standardized through the W3C technology standardization process.
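To make the "SOAP messages with an XML serialization" point concrete, here is a minimal Python sketch that builds and parses a SOAP 1.1 request envelope with the standard library. The operation `BookVenue` and the namespace `urn:example:party` are hypothetical, chosen to fit the birthday-party example; a real service would take them from its WSDL, and the HTTP transport is left out.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:party"                         # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)  # serialize the envelope with a "soap:" prefix


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[1]


def build_request(operation: str, params: dict) -> bytes:
    """Requester side: wrap an operation call and its parameters in Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def read_request(payload: bytes) -> tuple[str, dict]:
    """Provider side: recover the operation name and parameters from the Body."""
    envelope = ET.fromstring(payload)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    return _local_name(call.tag), {_local_name(c.tag): c.text for c in call}


payload = build_request("BookVenue", {"date": "2010-05-01", "guests": 20})
print(payload.decode("utf-8"))
```

Because both sides only agree on the envelope format and the service namespace, the requester and provider can run on different platforms, which is the interoperability claim above.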

Web Service Interactions

COMPONENTS
• A Web service is an abstract notion that is implemented by a concrete agent.
• Elements
– Provider entity: the person or organization that provides an appropriate agent to implement a particular service.
– Requester entity: a person or organization that wishes to use a provider entity's Web service.
– Registry: where the services are registered.
• Web Service Discovery
– Before exchanging messages, the requester entity and the provider entity must agree on both the semantics and the mechanics of the message exchange.
– The Web service description (WSD) (message formats, datatypes, transport protocols and transport serialization formats) is a contract governing the mechanics of interacting with a particular service.
– The semantics is a contract governing the meaning (consequence and purpose) of that interaction.
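The provider, requester and registry roles can be sketched in Python with an in-memory registry. `Registry`, `ServiceDescription` and the category strings are hypothetical stand-ins for a UDDI registry and WSDL descriptions, and a Python callable stands in for the network endpoint.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical stand-in for a WSDL description plus its registry entry."""
    name: str
    category: str
    endpoint: Callable[..., object]  # a callable plays the role of the endpoint


@dataclass
class Registry:
    """In-memory stand-in for a UDDI-style registry."""
    entries: list[ServiceDescription] = field(default_factory=list)

    def publish(self, desc: ServiceDescription) -> None:
        # Provider side: make the service discoverable.
        self.entries.append(desc)

    def discover(self, category: str) -> list[ServiceDescription]:
        # Requester side: find candidate services by category.
        return [d for d in self.entries if d.category == category]


# Provider entity publishes a service.
registry = Registry()
registry.publish(ServiceDescription(
    name="CakeShop", category="catering",
    endpoint=lambda flavour, servings: f"{servings} servings of {flavour} cake"))

# Requester entity discovers and then invokes it.
found = registry.discover("catering")
result = found[0].endpoint("chocolate", 12)
```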

2. SOA
• Aim: alignment of business needs with IT
• An architectural style for building enterprise solutions based on services
• SOA is a blueprint that governs the creation, deployment, execution and management of reusable business services.
• The Web Services Architecture (WSA) is one instance of SOA; SOA itself is independent of any particular technology.
• Services provide independent, loosely coupled, transparent and composable invocation of tasks in a standard way.
• SOA separates functions into distinct units (services), which can be distributed over a network and combined and reused to create business applications. Services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services.
• Guiding principles: reusability, open standards

Alignment of Business needs with IT

• Services created using an SOA and provided by an organisation's IT should directly support the services that the organisation provides to its customers (business process to IT). A service-oriented business delivers services to its customers; SOA, as the blueprint that governs the creation, deployment, execution and management of reusable business services, aligns business and technology.
[Figure: service delivery channels (human-mediated service, self-service, system-to-system service) linked through a service-oriented architecture contract to legacy, new and composite systems]

SOA roles
• Business role: SOA is viewed as a set of services that a business wants to expose to customers and clients.
• Architectural role: SOA is an architectural style that requires a service provider, a service requestor and a service description. Its services foster modularity, encapsulation, loose coupling, separation of concerns, reuse, composability and single implementation.
• Implementation role: SOA is a complete programming model (process) with standards, tools, methods, techniques and technologies.

SOA suite
• Model and capture business processes and policies
• Monitor activity to gain real-time information on business processes
• Apply runtime policies to services and govern them
• Deploy composite applications and perform service level management
• Integrate the services using an ESB and orchestrate them into business processes
• Develop, connect and bind services to build composite applications

Service
• A service is a manager entity that consists of a collection of components that work together to deliver a business function (e.g., currency conversion, airline reservations).
• A service maps to a business function, whereas a component maps to business entities and the business rules that operate on them.
• Bank teller application
– Components: a loan component, a savings bank component (withdrawal/deposit) and an account manager (to create new accounts).
– Services: the interfaces of these components can be grouped, composed and exposed as services such as account creation, withdrawal, deposit and loan services.
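A minimal Python sketch of the bank teller example: three fine-grained components are composed behind one coarse-grained service interface. The class names, the loan rule (a loan of up to `max_ratio` times the balance) and the in-memory storage are illustrative assumptions, not part of the original design.

```python
from __future__ import annotations


class AccountManager:
    """Component: creates accounts and holds balances."""
    def __init__(self) -> None:
        self.balances: dict[int, float] = {}
        self._next_id = 1

    def create(self, initial: float = 0.0) -> int:
        account_id = self._next_id
        self._next_id += 1
        self.balances[account_id] = initial
        return account_id


class SavingsComponent:
    """Component: business rules for deposits and withdrawals."""
    def __init__(self, accounts: AccountManager) -> None:
        self.accounts = accounts

    def deposit(self, account_id: int, amount: float) -> float:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.accounts.balances[account_id] += amount
        return self.accounts.balances[account_id]

    def withdraw(self, account_id: int, amount: float) -> float:
        if amount > self.accounts.balances[account_id]:
            raise ValueError("insufficient funds")
        self.accounts.balances[account_id] -= amount
        return self.accounts.balances[account_id]


class LoanComponent:
    """Component: a hypothetical loan-eligibility rule."""
    def __init__(self, max_ratio: float = 5.0) -> None:
        self.max_ratio = max_ratio

    def approve(self, balance: float, amount: float) -> bool:
        return amount <= balance * self.max_ratio


class BankTellerService:
    """Service: composes the components and exposes business functions only."""
    def __init__(self) -> None:
        self._accounts = AccountManager()
        self._savings = SavingsComponent(self._accounts)
        self._loans = LoanComponent()

    def open_account(self, initial: float = 0.0) -> int:
        return self._accounts.create(initial)

    def deposit(self, account_id: int, amount: float) -> float:
        return self._savings.deposit(account_id, amount)

    def withdraw(self, account_id: int, amount: float) -> float:
        return self._savings.withdraw(account_id, amount)

    def apply_for_loan(self, account_id: int, amount: float) -> bool:
        return self._loans.approve(self._accounts.balances[account_id], amount)
```

Consumers see only `BankTellerService`, so a component can be reimplemented without changing the service contract.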

[Figure: services support business goals, have a service description, and are combined through choreography and dynamic reconfiguration]

SOA layers: UI, business processes, service layer, component layer, object layer
• Presentation layer: portal that aggregates content for users
• Business process layer: automation logic; orchestration of services
• Service layer: collection of units of work (interfaces); processing logic
• Component layer: operations that are units of work; SLAs
• Object / legacy layer: messages for communication (operational)

Terms in SOA
• Services
• Service provider
• Service consumer (or service requestor)
• Service locator or service registry
• Service broker: passes service requests to one or more service providers

SOA LIFE CYCLE (incremental and iterative, driven by business drivers)
• Expose: create services from existing or new components
• Compose: combine existing services
• Consume: use services
Consumer view: service identification, service categorisation, service exposure, choreography, QoS
Provider view: component identification, component specification, service realisation, service management, standards, implementation

Advantages
• Standardisation
• Faster time to market
• Operational efficiency and adaptability
• Agility to collaborate
• Continuous improvement
• Aligns business with IT
• Ease of introducing new technologies
• Return on investment (ROI)
• Vendor diversity
• Service properties: encapsulation, loose coupling, contract, reuse, composability, autonomy, dynamism, higher granularity

SERVICE ORIENTED ARCHITECTURE
[Figure: SOA stack of business process, service management, transaction, transport layer, security, service communication protocol (ESB), policy, service registry and service description]

Problems in point-to-point Web services
• Service consumers must be modified whenever the service provider's interface changes (dynamism).
• Every consumer needs a suitable protocol adapter for each provider it connects to (interoperability).
ESB
• The ESB acts as a mediator that transforms, routes, notifies and augments information.
• It provides virtualization of enterprise resources.
• The Enterprise Service Bus is an enterprise-class messaging bus.
• It provides the following facilities:
– messaging infrastructure
– message transformation between consumer and provider
– content-based routing between service consumers and providers
– conversion of transport protocols between consumer and provider
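The mediation role of the ESB (content-based routing plus message transformation) can be illustrated with a toy in-process bus in Python. `ServiceBus`, the message fields and the legacy provider's cents-based format are hypothetical; a production ESB also handles transports, reliability and security.

```python
from __future__ import annotations

from typing import Callable


class ServiceBus:
    """Toy mediator: routes messages by content and transforms them for providers."""

    def __init__(self) -> None:
        # Each route: (predicate on message content, transformation, provider endpoint).
        self.routes: list[tuple[Callable, Callable, Callable]] = []

    def register(self, predicate: Callable, transform: Callable, provider: Callable) -> None:
        self.routes.append((predicate, transform, provider))

    def send(self, message: dict) -> object:
        # Content-based routing: the first route whose predicate matches wins.
        for predicate, transform, provider in self.routes:
            if predicate(message):
                return provider(transform(message))
        raise LookupError(f"no route for message type {message.get('type')!r}")


def legacy_billing(msg: dict) -> str:
    # Hypothetical legacy provider expecting upper-case keys and integer cents.
    return f"BILLED {msg['AMT_CENTS']} CENTS TO {msg['CUST']}"


def modern_shipping(msg: dict) -> str:
    return f"shipping {msg['item']} to {msg['customer']}"


bus = ServiceBus()
bus.register(lambda m: m["type"] == "invoice",
             lambda m: {"CUST": m["customer"].upper(),
                        "AMT_CENTS": round(m["amount_usd"] * 100)},
             legacy_billing)
bus.register(lambda m: m["type"] == "shipment", lambda m: m, modern_shipping)
```

The consumer always sends the same message shape; if the billing provider's interface changes, only the transformation registered on the bus changes, which addresses the point-to-point problems listed above.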

SOA-based Web services (layer: standard)
• Business process: BPEL
• Management: WS-Manageability
• Transport layer: HTTP, JMS, SMTP
• Transaction: WS-Transaction
• Service communication protocol: SOAP
• Security: WS-Security
• Service description: XML, WSDL
• Policy: WS-Policy
• Service registry: UDDI

SAHANA
[Figure: SAHANA architecture. Presentation/UI: office systems, responders (laptop/PDA/cell), wired channel access, web client, mobile Internet, SMS. Business processes and services: match, person, organisation, camp, requests, shelter, family services, person aids, place, alerts, search, volunteer search, search procedures, DDoS protection and load balancing. Business services offered by the server grid: missing person, organisation registration, camps registration, request management, shelter registration, mobile.]

• Missing persons registry with efficient search
• Organisation registry with efficient matching and volunteer coordination
• Camps registry
• Request management registry with inventory management, optimisation and search
• Shelter registry
• Messaging alerts
• Damages registry
• Grid management module to coordinate efforts among districts and relief organisations
• Bulletin board (user area)

SOA – screen shots
1. Organisation Registry
• New organization registration with the system
• Maintaining details about each organization with a unique ID
• Updating an organization's services

DESCRIPTION
• When an organization wants to provide a service, it must supply its name, city and branch to the system.
• By default, every organization that registers for the first time provides a single service.
• On successful registration, an automatically generated organization ID is displayed to the organization's authority.
• To update the services provided, both the organization ID and the password are validated.
• The available services are displayed in a form, from which the service provider selects additional services.

[Figure: New organization registration. The service provider submits the organization name, city, branch and service to the registration system, which updates the organization DB and returns an organization ID.]

[Figure: Organization's service updation. The service provider submits the organization ID and password; the registration system retrieves and validates the record and returns the services list; the selected service is updated and the updated service information form is displayed.]
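The registration and service-update flow described above can be sketched in Python. The ID counter starting at 1001, the salted SHA-256 password hash, and setting the password at registration time are assumptions for illustration: the slides do not say how IDs or passwords are generated or stored, and a real system should use a dedicated password-hashing scheme.

```python
from __future__ import annotations

import hashlib
import itertools
import secrets


class OrganizationRegistry:
    """Hypothetical in-memory version of the organisation registry."""

    def __init__(self, available_services: list[str]) -> None:
        self.available_services = list(available_services)
        self._orgs: dict[int, dict] = {}
        self._ids = itertools.count(1001)  # assumed ID scheme

    @staticmethod
    def _hash(salt: str, password: str) -> str:
        return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

    def register(self, name: str, city: str, branch: str, service: str, password: str) -> int:
        # First registration: exactly one service, as in the description.
        if service not in self.available_services:
            raise ValueError(f"unknown service {service!r}")
        org_id = next(self._ids)
        salt = secrets.token_hex(8)
        self._orgs[org_id] = {"name": name, "city": city, "branch": branch,
                              "services": [service], "salt": salt,
                              "pw_hash": self._hash(salt, password)}
        return org_id  # auto-generated ID shown to the organisation

    def add_service(self, org_id: int, password: str, service: str) -> list[str]:
        record = self._orgs.get(org_id)
        # Validate both the organisation ID and the password.
        if record is None or not secrets.compare_digest(
                record["pw_hash"], self._hash(record["salt"], password)):
            raise PermissionError("invalid organisation ID or password")
        if service not in self.available_services:
            raise ValueError(f"unknown service {service!r}")
        if service not in record["services"]:
            record["services"].append(service)
        return list(record["services"])
```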

BUSINESS PROCESSES
• Service provider registers with the system
• Service provider login validation
• Services updating
FORMS (3 XForms)
• LOGIN XFORM
• ORGANIZATION DETAILS XFORM
• SERVICE UPDATED XFORM

BUSINESS PROCESS

LOGIN XFORM

ORGANISATION DETAILS XFORM & GETTING DETAILS XML

SERVICE UPDATED XFORM

SERVICE SELECTION XFORM

DATABASE RELATIONSHIPS

[Figure: comparison of computing paradigms]
• High Throughput Computing (HTC): distributed computing, loosely coupled; disparate, autonomous, heterogeneous systems; computation intensive; sharing; single administration
• High Performance Computing (HPC): tightly coupled, fine-grain parallelism; homogeneous systems; high computing power for a short period; low-latency communication
• Clusters: nodes close to each other, usually homogeneous; centralised control; cooperative working
• P2P: mainly for file sharing; geographically dispersed peers; autonomous nodes; decentralised resource sharing
• Parallel systems, multicore: divide and conquer; synchronization; tightly coupled; shared memory computing
• Grid: system integration; heterogeneous systems, HTC; VO (trust groups, dynamic, cross-organisational); geographically dispersed resource sharing; scientific; distribution of work among all resources
• Cloud: virtualisation; heterogeneous systems, HPC; on-demand resource provisioning over the Internet; data-centric with a grid backbone; utility value; elastic; business; full utilization of resources
• Web services: application integration; separation of concerns; data integration, interoperability
• Virtualisation: viewing a single system as multiple resources
• Multi-tenancy: sharing a resource among multiple clients

Some Characteristics of Grids
• Numerous resources
• Owned by multiple organizations and individuals
• Connected by heterogeneous, multi-level networks
• Different security requirements and policies
• Unreliable resources and environments
• Different resource management policies
• Heterogeneous resources
• Geographically separated

Stages to using the Grid: classical view
• Write (code) to solve the problem
• "Compile" against the middleware
• Advertise
• Select resources
• Stage data
• Deploy to resources
• Submit to the Grid middleware
• Steering and visualisation
• Throughout: security and accounting

Technical capabilities
• Resource modeling
• Monitoring and notification
• Allocation
• Provisioning, life cycle management and decommissioning
• Accounting and auditing
• Security

G2 (second-generation Grid): layered architecture
• Fabric layer: provides the resources for shared access.
• Connectivity layer: core communication and authentication protocols.
• Resource layer: protocols for secure negotiation, initiation, monitoring, control and accounting on individual resources.
• Collective layer: protocols and services that capture interactions among a collection of resources.
• Application layer: user applications that operate within a VO environment.

G3: Services, OGSA
• Service-based infrastructure for the Grid
• The Grid aims to integrate, virtualize and manage resources and services within distributed, heterogeneous, dynamic "virtual organizations" (VOs).
• Standardization is critical to creating interoperable, portable, secure, robust, scalable and reusable components and systems.
• The goal is to standardize Grid services by specifying a set of standard interfaces.
• It aims to develop a common, standard and open architecture for Grid-based applications.
• The Open Grid Services Architecture (OGSA), a service-oriented architecture, addresses this need for standardization by defining a set of core capabilities and behaviors that address key concerns in Grid systems.
• OGSA is based on the Grid service (an extension of the Web service).

• OGSA realizes the logical middle layer in terms of services, the interfaces these services expose, the individual and collective state of resources belonging to these services, and the interactions between these services within a service-oriented architecture (SOA).
• The architecture is not layered.
• Services are loosely coupled peers that work either singly or as part of an interacting group of services.

OGSI
• Requirements not met by Web services were implemented as Grid services conforming to the OGSI specification.
• The OGSI specification defines:
– how Grid service instances are named and referenced
– the interfaces and behaviors common to all Grid services
– how to specify additional interfaces, behaviors and extensions
• Features:
– GWSDL (Grid WSDL)
– Service Data Elements (SDEs)
– portType inheritance
– Grid Service Handle (GSH)
– Grid Service Reference (GSR)
– Factory
– Handle resolver
– Notification
– Service groups (lightweight registries)

Service relationships

Grid vs Web services
• Web services
– Message exchange
– Documents
– No notion of a "pointer"
– Service orientation?
• Grid services
– The architecture encourages everything to be exposed through an interface rather than sent as a document
– The GSH is the "pointer"
– Object orientation? (CORBA?)
– Two-level naming scheme: GSH and GSR
– SDEs support dynamic discovery, compared with static discovery in Web services
– Instantiation and life cycle management through factories
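The two-level naming scheme (a permanent GSH resolved to a current GSR), factory-based instantiation and per-instance service data can be illustrated with a small Python sketch. The `gsh://` and `http://` formats, the class names and the `migrate` operation are hypothetical and do not follow the OGSI wire formats.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class GridServiceInstance:
    gsr: str                                            # current reference; may change
    service_data: dict = field(default_factory=dict)    # SDE-like, queryable state


class HandleResolver:
    """Maps permanent handles (GSH) to service instances and their current GSR."""

    def __init__(self) -> None:
        self._table: dict[str, GridServiceInstance] = {}

    def bind(self, gsh: str, instance: GridServiceInstance) -> None:
        self._table[gsh] = instance

    def resolve(self, gsh: str) -> GridServiceInstance:
        return self._table[gsh]


class Factory:
    """Creates stateful service instances and registers their handles."""

    def __init__(self, resolver: HandleResolver, host: str) -> None:
        self.resolver = resolver
        self.host = host
        self._ids = itertools.count(1)

    def create(self, **initial_state) -> str:
        n = next(self._ids)
        gsh = f"gsh://example.org/counter/{n}"
        instance = GridServiceInstance(gsr=f"http://{self.host}/instances/{n}",
                                       service_data=dict(initial_state))
        self.resolver.bind(gsh, instance)
        return gsh  # clients keep the handle, not the reference

    def migrate(self, gsh: str, new_host: str) -> None:
        # The instance moves: its GSR changes, but its GSH and state do not.
        instance = self.resolver.resolve(gsh)
        instance_no = gsh.rsplit("/", 1)[1]
        instance.gsr = f"http://{new_host}/instances/{instance_no}"
```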

STATEFUL WEBSERVICE

[Figure: three-step stateful Web service interaction; step 2 is CREATE]

G4 Grid: WSRF
• OGSA services are defined and implemented as Web services (Web Services Resource Framework).

3. Semantic Web
• Information management approaches
– Keywords
– Statistical
– Natural language
– Semantic Web
• Semantic Web architecture
– automated conversion of unstructured text into a machine-processable format, and its storage
– automatically extracts and processes the concepts and context in the database
– uses intelligent techniques
– uses metadata to capture the meaning of the information

To capture knowledge
• Metadata
• Ontology
– a formal specification of information
– a network of concepts, relationships and constraints that provides context for data and information as well as processes
– classes (concepts) and relationships (hierarchy) in the domain; it provides a shared understanding of the domain
– ontology languages: XML, RDF, OWL
• Logic
– formal languages for representing knowledge with semantics
– reasoners to infer conclusions
• Agents
– pieces of software that work autonomously and proactively
– e.g., search personalisation

Semantic Web Architecture

Architecture
• Unicode: international encoding standard; any language can be used on the Web in one standardized form.
• Uniform Resource Identifier (URI): uniquely identifies resources (e.g., documents); URL + URN.
• XML: language for writing structured Web documents with a user-defined vocabulary; used to send documents across the Web.
• RDF: data model (representation) for Web objects, with an XML-based syntax.

• RDFS: modeling primitives to organise Web objects into hierarchies (taxonomies): class, subclass, properties, domain and range restrictions. Based on RDF; used to write ontologies.
• Logic layer: application-specific declarative knowledge; RIF and SWRL.
• Proof layer: deductive process. SPARQL, an SQL-like language, can be used to query ontologies and knowledge bases.
• Trust layer: users' trust in using Web services.

RDF
• Statements are triples: subject-predicate-object.
• "Joe Smith has homepage http://www.example.org/~joe/":
– subject: http://www.example.org/~joe/contact.rdf#joesmith, intended to identify Joe Smith
– predicate: http://xmlns.com/foaf/0.1/homepage
– object: Joe's homepage, http://www.example.org/~joe/

[Figure: RDF graph describing Joe Smith, including the statement "Joe has family name Smith"]

RDFS for the company (resource): the company is identified by the URI http://www.w3.org/Organization/contact#WebifySolutions; its name is Webify Solutions, its e-mail address is info@webifysolutions.com, and its phone number is 1 800 4 WEBIFY.
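The triple model behind the Joe Smith example can be shown with plain Python tuples, plus a wildcard matcher and an N-Triples-style serializer. Using FOAF's `familyName` property for "Joe has family name Smith" is an assumption (the slide's graph is not reproduced), the matcher is a toy stand-in for a SPARQL basic graph pattern rather than a SPARQL engine, and literal escaping is not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """An RDF literal value, as opposed to a URI reference."""
    value: str


FOAF = "http://xmlns.com/foaf/0.1/"
JOE = "http://www.example.org/~joe/contact.rdf#joesmith"

# The RDF graph as a set of (subject, predicate, object) triples.
graph = {
    (JOE, FOAF + "homepage", "http://www.example.org/~joe/"),
    (JOE, FOAF + "familyName", Literal("Smith")),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted((t for t in graph
                   if (s is None or t[0] == s)
                   and (p is None or t[1] == p)
                   and (o is None or t[2] == o)), key=str)


def term(x) -> str:
    # N-Triples style: URIs in angle brackets, literals in double quotes (no escaping).
    return f'"{x.value}"' if isinstance(x, Literal) else f"<{x}>"


def to_ntriples(graph) -> str:
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ."
                     for s, p, o in sorted(graph, key=str))


print(to_ntriples(graph))
```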

OWL
• Classes: named classes, intersection classes, union classes, complement classes, restrictions and enumerated classes
• Properties
– object properties
– datatype properties
– property characteristics: functional, inverse functional, symmetric, transitive
• Individuals: instances of classes, related to one another by properties
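To show what declaring a property symmetric, transitive or the inverse of another gives a reasoner, here is a toy forward-chaining sketch in Python. It applies only those three rules until no new triples appear; it is not an OWL reasoner, and the bank-flavoured property and individual names are hypothetical.

```python
def infer(triples, transitive=frozenset(), symmetric=frozenset(), inverse_of=None):
    """Forward-chain symmetric, inverse and transitive property rules to a fixpoint."""
    inverse_of = dict(inverse_of or {})
    # owl:inverseOf works in both directions, so add the reverse mapping too.
    inverse_of.update({v: k for k, v in list(inverse_of.items())})
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:          # nothing new was derived: fixpoint reached
            return facts
        facts |= new


facts = infer(
    {("atm7", "partOf", "branchA"),
     ("branchA", "partOf", "regionSouth"),
     ("regionSouth", "partOf", "bank"),
     ("alice", "jointHolderWith", "bob"),
     ("alice", "holds", "acct42")},
    transitive={"partOf"},
    symmetric={"jointHolderWith"},
    inverse_of={"holds": "heldBy"},
)
```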

Need for ontology in IT: a bank
• The bank offers a number of services that use the same data, but with redundancy.
• New services can be added, but they should reuse existing data and functionality.
• An ontology-driven approach
– can capture and represent the bank's total product knowledge in a language-neutral form
– deploys the knowledge in a central (shared) repository
– gives a single, unified view of data across its applications
– enables precise retrieval of information and seamless enterprise integration
– lets business processes and various data sources map to each other through a common meta-model
• A shared ontology
– eliminates point-to-point integration
– simplifies application integration
– reduces data redundancy and provides the same semantic meaning across applications
– eases the bank's maintenance and upgrades

Need for the Semantic Web
• The WWW holds a vast amount of heterogeneous information.
– Searching is based on contents.
– Semantic meaning attached to content items describes the information precisely, so the relevance of extracted information can be improved.
• Provided services can be tagged with meaning.
– Web-based software agents can then find these services dynamically, on the fly, and use them for your benefit or in collaboration with other services.

Need for semantics in SOA
• In SOA, representations of the available services must be maintained:
– metadata to discover and organize services
– metadata to model and assemble services
– metadata to encapsulate business logic for dynamic binding
– metadata to manage the metadata itself
• Ontologies provide a powerful and flexible way to aggregate, visualize and normalize the service metadata layer.
• Ontologies enhance service discovery, modeling, assembly, mediation and semantic interoperability.
• Semantic technologies provide an abstraction layer above existing IT technologies, one that enables the bridging and interconnection of data, content and processes across business and IT silos.

Semantics for Business
• A business ontology is a formal specification of business concepts and their interrelationships that facilitates machine reasoning and inference.
• A business ontology ties systems together using metadata, much as a database ties together discrete pieces of data.
• Organizations can provide a single, unified view of data across their applications.
• It allows precise retrieval of information,
• simplifies enterprise and SOA integration,
• reduces data redundancy,
• provides uniform semantic meaning across applications, and
• eases development, maintenance and upgrades across the enterprise.

Grid semantics
• The Grid's vision of sharing diverse resources in a flexible, coordinated and secure manner, through the dynamic formation and disbanding of virtual communities, depends strongly on metadata. Ad hoc expression and use of metadata causes chronic dependency on human intervention.
• The Semantic Grid is an extension of the Grid in which resource metadata is exposed and handled explicitly, and shared and managed via Grid protocols.
• It exposes semantically rich information associated with Grid resources in order to build more intelligent Grid services.
• Layering an explicit semantic infrastructure over the Grid infrastructure increases interoperability and flexibility.
• S-OGSA is a reference architecture that extends OGSA (standardisation) to support the explicit handling of semantics, and defines the associated knowledge services to support a spectrum of service capabilities.
• S-OGSA defines a model (abstraction), the capabilities (what) and the mechanisms (how) for the Semantic Grid.

• Metadata: labels Grid resources and entities with concepts (e.g., a data file annotated according to its application domain).
• Rules and classification-based reasoning can generate new metadata from existing metadata (e.g., VO membership).
• S-OGSA has:
– a model (elements and relationships)
– capabilities (services for the components)
– mechanisms (elements that deliver the services)

S-OGSA entities and relationships
• Grid entities (identified in the Grid)
• Knowledge entities (K-entities): Grid entities that operate on knowledge
• Semantic bindings: associations between Grid entities and knowledge entities
• Semantic Grid entities: entities subject to semantic bindings, the semantic bindings themselves, or knowledge entities

S-OGSA

• Fabric layer: resources are virtualised through Web services.
• Grid middleware: OGSA services interact with one another; Web services are deployed with port types through which resources are accessed.
• OGSA is extended with lightweight semantics and knowledge services to support a spectrum of service capabilities.
• Top: application layer.
• The semantics of the middleware and fabric layers are considered.

• Services
– Semantic provisioning services
   • Knowledge provisioning services
   • Semantic binding provisioning services
– Semantic-aware Grid services
   • consume semantic bindings and take actions based on knowledge and metadata

Semantic-aware authorisation service
• Subject: John Doe; object: a resource
• Authorisation is decided by matching semantic bindings
• An ontology service provides the knowledge needed to interpret the semantic bindings
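A minimal Python sketch of semantic-aware authorisation: the subject and the resource each carry a semantic binding to an ontology concept, the policy is written at the concept level, and a tiny subclass hierarchy plays the part of the ontology service. The ontology, bindings, policy and resource URI are all hypothetical.

```python
# Hypothetical ontology: concept -> direct superclass (a tiny subclass hierarchy).
SUBCLASS_OF = {
    "PhDStudent": "Researcher",
    "Researcher": "VOMember",
    "GenomeData": "ProjectData",
}

# Semantic bindings: Grid entity (subject or resource) -> concept it is annotated with.
BINDINGS = {
    "John Doe": "PhDStudent",
    "gsiftp://grid.example.org/genome.dat": "GenomeData",
}

# Concept-level policy: (subject concept, action, resource concept).
POLICY = [("Researcher", "read", "ProjectData")]


def is_a(concept, target):
    """Ontology service: follow subclass links upward from concept."""
    while concept is not None:
        if concept == target:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def authorise(subject, action, resource):
    """Semantic-aware authorisation: decide by matching semantic bindings."""
    s_concept = BINDINGS.get(subject)
    r_concept = BINDINGS.get(resource)
    if s_concept is None or r_concept is None:
        return False  # no semantic binding, so no basis for granting access
    return any(is_a(s_concept, ps) and action == pa and is_a(r_concept, pr)
               for ps, pa, pr in POLICY)
```

The policy never names John Doe or the file; access follows from the bindings plus the ontology, so new users or datasets need only new bindings.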

Hadoop
What is Hadoop?
• A framework for storing and processing huge amounts of data by running applications on large clusters of commodity hardware
• Apache Software Foundation project; open source
• Amazon's EC2, Google
• Alpha (0.21) release available for download
• Hadoop includes:
– HDFS, a distributed filesystem
– Map/Reduce, a programming model that Hadoop implements on top of the DFS
• Hadoop is an offline (batch) computing engine.
• Concept: moving computation is more efficient than moving large data.

• Data-intensive applications with petabytes of data
• Web pages: 20+ billion web pages x 20 KB = 400+ terabytes
– One computer can read 30 to 35 MB/sec from disk: about four months to read the web
– The same problem with 1,000 machines: roughly 3 to 4 hours
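The arithmetic behind these figures, as a short Python check. It assumes decimal units (1 KB = 1,000 bytes), data spread evenly across machines, and every machine reading in parallel at the same disk rate with no other overhead.

```python
PAGES = 20e9              # 20 billion web pages
PAGE_BYTES = 20e3         # 20 KB per page, decimal units
SECONDS_PER_DAY = 86_400


def read_time_seconds(total_bytes, bytes_per_second, machines=1):
    # Ideal case: data evenly spread, all machines reading in parallel.
    return total_bytes / (bytes_per_second * machines)


total = PAGES * PAGE_BYTES    # 4e14 bytes = 400 TB
for rate_mb in (30, 35):
    one = read_time_seconds(total, rate_mb * 1e6)
    many = read_time_seconds(total, rate_mb * 1e6, machines=1000)
    print(f"{rate_mb} MB/s: 1 machine {one / SECONDS_PER_DAY:.0f} days, "
          f"1000 machines {many / 3600:.1f} hours")
# 30 MB/s: 1 machine 154 days, 1000 machines 3.7 hours
# 35 MB/s: 1 machine 132 days, 1000 machines 3.2 hours
```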

FACTS
• Single-thread performance doesn't matter: we have large problems, and total throughput per price is more important than peak performance.
• Stuff breaks, so reliability must be designed in:
– If you have one server, it may stay up three years (1,000 days).
– If you have 10,000 servers, expect to lose ten a day.
• "Ultra-reliable" hardware doesn't really help: at large scales, super-fancy reliable hardware still fails, albeit less often, so software still needs to be fault tolerant.
• Commodity machines without fancy hardware give a better price/performance ratio.
DECISION: COMMODITY HARDWARE.
DFS: HADOOP. REASONS?
WHAT SOFTWARE MODEL?

HDFS: why? Seek vs transfer
• B-tree (relational DBs): operates at seek rate, log(N) seeks per access
• Memory/stream-based sort/merge of flat files (MapReduce): operates at transfer rate, log(N) transfers per sort
• Batch based
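A back-of-the-envelope Python comparison of the two access patterns for updating 1% of a 1 TB dataset. All figures are assumptions chosen for illustration: 10 ms per seek, 100 MB/s sequential transfer, 100-byte records, one seek per random update, and a sort/merge pass that reads and writes the whole dataset once.

```python
def random_update_seconds(n_records, fraction, seek_s=0.010, seeks_per_update=1):
    # B-tree style: every updated record costs at least one disk seek.
    return n_records * fraction * seeks_per_update * seek_s


def sort_merge_seconds(n_records, record_bytes, transfer_bytes_per_s=100e6):
    # Stream style: read the whole dataset once and write it back once.
    return 2 * n_records * record_bytes / transfer_bytes_per_s


N = 10_000_000_000   # 1e10 records x 100 bytes = 1 TB
seek = random_update_seconds(N, 0.01)
stream = sort_merge_seconds(N, 100)
print(f"random updates: {seek / 86400:.1f} days; sort/merge: {stream / 3600:.1f} hours")
# random updates: 11.6 days; sort/merge: 5.6 hours
```

Even though sort/merge touches every byte, it runs at transfer rate rather than seek rate, which is why batch rewrites win once more than a small fraction of records change.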

Characteristics
• Fault-tolerant, scalable, efficient and reliable distributed storage system
• Moves computation to where the data is
• A single cluster holds both computation and data
• Processes huge amounts of data
• Scalable: stores and processes petabytes of data
• Economical

• Data model
– Data is organized into files and directories.
– Files are divided into uniform-sized blocks and distributed across cluster nodes.
– Blocks are replicated to handle hardware failure.
– Checksums of the data are used for corruption detection and recovery.
– Block placement is exposed so that computation can be migrated to the data.
• Supports large streaming reads and small random reads.

• Files are broken into large blocks.
– Typical block size: 128 MB
– Blocks are replicated for reliability.
– Default placement: one replica on the local node, the second on a node in a remote rack, the third on a different node in that same remote rack; additional replicas are placed randomly.
• HDFS understands rack locality.
– Data placement is exposed so that computation can be migrated to the data.
• Clients talk to both the NameNode and the DataNodes.
– Data is not sent through the NameNode; clients access data directly from DataNodes.
– Throughput of the file system scales nearly linearly with the number of nodes.
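A Python sketch of splitting a file into blocks and choosing replica locations under the default rack-aware policy described above. The topology and node names are hypothetical, this is not HDFS's placement-policy API, and real HDFS also weighs node load and free space when choosing targets.

```python
from __future__ import annotations

import random

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, as on the slide


def num_blocks(file_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    # Ceiling division: the last block may be partly full; an empty file has none.
    return -(-file_bytes // block_size)


def place_replicas(writer: str, topology: dict[str, list[str]],
                   replication: int = 3, rng: random.Random | None = None) -> list[str]:
    """Pick replica nodes: local node, a remote-rack node, another node on that rack."""
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = [writer]                                   # 1st replica: local node
    remote_racks = [r for r in topology if r != rack_of[writer]]
    if replication >= 2 and remote_racks:
        remote = rng.choice(remote_racks)
        second = rng.choice(topology[remote])           # 2nd: node on a remote rack
        chosen.append(second)
        if replication >= 3:
            same_rack = [n for n in topology[remote] if n != second]
            if same_rack:
                chosen.append(rng.choice(same_rack))    # 3rd: same remote rack
    # Any further replicas: random distinct nodes.
    others = [n for n in rack_of if n not in chosen]
    rng.shuffle(others)
    chosen.extend(others[: replication - len(chosen)])
    return chosen
```

Keeping two replicas on one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.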

Block Placement

Hadoop Cluster Architecture:

Components
• DFS master: "NameNode"
– Manages the file system namespace
– Controls read/write access to files
– Manages block replication
– Checkpoints the namespace and journals namespace changes for reliability
• NameNode metadata is kept in memory
– The entire metadata is in main memory
– No demand paging of file system metadata
– Types of metadata: list of files, file and chunk namespaces, list of blocks, locations of replicas, file attributes, etc.

DFS slaves: DataNodes
• Serve read/write requests from clients
• Perform replication tasks upon instruction from the NameNode
DataNodes act as:
1) Block server
– Stores data in the local file system
– Stores the metadata of a block (e.g., CRC)
– Serves data and metadata to clients
2) Block report: periodically sends a report of all existing blocks to the NameNode
3) Heartbeat: periodically sends a heartbeat to the NameNode (used to detect node failures)
4) Pipelining: forwards data to other specified DataNodes
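How heartbeats and block reports let the NameNode detect failed DataNodes and find blocks that need re-replication can be sketched in Python. The 30-second timeout is a hypothetical value, the class is illustrative rather than Hadoop's implementation, and time is passed in explicitly so the behaviour is deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameNodeView:
    """Hypothetical NameNode bookkeeping driven by heartbeats and block reports."""
    timeout_s: float = 30.0   # assumed dead-node timeout
    replication: int = 3      # target replicas per block
    last_heartbeat: dict[str, float] = field(default_factory=dict)
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def block_report(self, node: str, blocks: list[str], now: float) -> None:
        # A block report also shows that the node is alive.
        self.heartbeat(node, now)
        for block in blocks:
            self.block_locations.setdefault(block, set()).add(node)

    def live_nodes(self, now: float) -> set[str]:
        return {n for n, t in self.last_heartbeat.items() if now - t <= self.timeout_s}

    def under_replicated(self, now: float) -> dict[str, int]:
        """Blocks with fewer live replicas than the target, and how many are missing."""
        live = self.live_nodes(now)
        missing = {}
        for block, nodes in self.block_locations.items():
            live_copies = len(nodes & live)
            if live_copies < self.replication:
                missing[block] = self.replication - live_copies
        return missing
```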

• Map/Reduce master: "JobTracker"
– Accepts MR jobs submitted by users
– Assigns Map and Reduce tasks to TaskTrackers
– Monitors task and TaskTracker status; re-executes tasks upon failure
• Map/Reduce slaves: "TaskTrackers"
– Run Map and Reduce tasks upon instruction from the JobTracker
– Manage storage and transmission of intermediate output

SECONDARY NAMENODE
• Copies the FsImage and transaction log (edit log) from the NameNode to a temporary directory
• Merges the FsImage and transaction log into a new FsImage in the temporary directory
• Uploads the new FsImage to the NameNode
– The transaction log on the NameNode is purged
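The checkpoint step (replaying the edit log onto a copy of the FsImage, then starting a fresh log) can be sketched in Python. The namespace is modelled as a dict and the edit operations are hypothetical tuples; the real FsImage and edit log are binary on-disk formats.

```python
from __future__ import annotations

import copy


def apply_edit(image: dict, op: tuple) -> None:
    """Replay one journaled namespace change onto an in-memory image."""
    kind, path, *args = op
    if kind == "create":
        image[path] = {"blocks": []}
    elif kind == "add_block":
        image[path]["blocks"].append(args[0])
    elif kind == "rename":
        image[args[0]] = image.pop(path)
    elif kind == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit {kind!r}")


def checkpoint(fsimage: dict, edit_log: list) -> tuple[dict, list]:
    # Work on a copy (the "temporary directory"), leaving the live image untouched.
    new_image = copy.deepcopy(fsimage)
    for op in edit_log:
        apply_edit(new_image, op)
    # The merged image replaces the old one and the edit log starts empty again.
    return new_image, []
```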

HDFS Architecture
• NameNode: (filename, offset) → block id; block → DataNodes
• DataNode: block → local disk
• Secondary NameNode: periodically merges edit logs
• A block is also called a chunk.
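The two NameNode mappings above can be sketched as a lookup in Python: a byte offset in a file selects a block by integer division, and the block map gives the DataNodes the client then reads from directly. The table contents and node names are hypothetical, and the end-of-file check only looks at whole blocks.

```python
from __future__ import annotations

BLOCK_SIZE = 128 * 1024 * 1024

# NameNode tables (hypothetical contents).
namespace = {"/logs/web.log": ["blk_101", "blk_102", "blk_103"]}   # file -> ordered blocks
block_map = {"blk_101": {"dn1", "dn4", "dn5"},
             "blk_102": {"dn2", "dn4", "dn6"},
             "blk_103": {"dn3", "dn5", "dn6"}}                     # block -> DataNodes


def locate(filename: str, offset: int,
           block_size: int = BLOCK_SIZE) -> tuple[str, int, list[str]]:
    """(filename, offset) -> (block id, offset inside that block, DataNodes holding it)."""
    blocks = namespace[filename]
    index = offset // block_size
    if index >= len(blocks):
        raise ValueError("offset lies past the last block")
    block_id = blocks[index]
    return block_id, offset % block_size, sorted(block_map[block_id])
```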

JOBTRACKER, TASKTRACKER AND JOBCLIENT

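Software Model?
• Parallel programming improves performance and efficiency.
• In a parallel program, the processing is broken up into parts, each of which can be executed concurrently.
• First identify whether the problem can be parallelised: computing Fibonacci numbers, where each term depends on the previous two, does not parallelise easily.
• Matrix operations whose elements are computed independently parallelise well.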

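CALCULATING PI
• The area of the square is $A_s = (2r)^2 = 4r^2$; the area of the inscribed circle is $A_c = \pi r^2$.
• Hence $\pi = 4 A_c / A_s \approx 4 N_c / N_s$, where $N_c$ is the number of random points in the square that fall inside the circle and $N_s$ is the total number of points generated in the square.
• MAP: generate random points in the square and count those that are inside the circle.
• REDUCE: sum the counts and compute $\pi \approx 4\rho$, where $\rho = N_c / N_s$.
• MapReduce is a restricted parallel programming model meant for large clusters; the user implements Map() and Reduce().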
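The Monte Carlo estimate above, written as map and reduce functions in plain Python. It is a sketch under these assumptions: the map tasks run sequentially here (on a cluster each would run on its own node), each uses its own seeded pseudo-random stream, and the square is taken as $[-1, 1]^2$ with $r = 1$.

```python
from __future__ import annotations

import random
from functools import reduce


def map_task(seed: int, samples: int) -> tuple[int, int]:
    """Map: throw random points into the square [-1, 1] x [-1, 1] and count hits."""
    rng = random.Random(seed)          # independent, reproducible stream per task
    inside = 0
    for _ in range(samples):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:       # point lies inside the circle of radius 1
            inside += 1
    return inside, samples


def reduce_task(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Reduce: add up (inside, total) counts from the map tasks."""
    return a[0] + b[0], a[1] + b[1]


def estimate_pi(num_maps: int = 8, samples_per_map: int = 50_000) -> float:
    partials = [map_task(seed, samples_per_map) for seed in range(num_maps)]
    inside, total = reduce(reduce_task, partials)
    return 4 * inside / total          # pi ~ 4 * N_c / N_s


print(estimate_pi())
```

The map tasks share nothing, so adding machines adds samples without coordination; only the small (inside, total) pairs travel to the reducer.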

WORD COUNT EXAMPLE

• Input files: file 1 contains "Hello World Bye World"; file 2 contains "Hello Hadoop Goodbye Hadoop".
• Map: for this input, the first map emits:
< Hello, 1> < World, 1> < Bye, 1> < World, 1>
• The second map emits:
< Hello, 1> < Hadoop, 1> < Goodbye, 1> < Hadoop, 1>

The output of the first combine:
< Bye, 1> < Hello, 1> < World, 2>
The output of the second combine:
< Goodbye, 1> < Hadoop, 2> < Hello, 1>
Thus the output of the job (reduce) is:
< Bye, 1> < Goodbye, 1> < Hadoop, 2> < Hello, 2> < World, 2>

• Map()
– Input: <filename, file text>
– Parses the file and emits <word, count> pairs, e.g. <"hello", 1>
• Reduce()
– Sums all values for the same key and emits <word, TotalCount>, e.g. <"hello", (3 5 2 7)> => <"hello", 17>
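The word count job traced above (map, combine, shuffle, reduce) can be simulated in a few lines of plain Python. This is a single-process sketch of the data flow, not the Hadoop Java API; the dictionary-based shuffle stands in for the framework's sort-and-group step.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterator


def map_fn(filename: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit <word, 1> for every word in the file."""
    for word in text.split():
        yield word, 1


def combine(pairs) -> list[tuple[str, int]]:
    """Combiner: local aggregation of one mapper's output, sorted by key."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return sorted(counts.items())


def reduce_fn(word: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum all counts for the same key."""
    return word, sum(values)


def run_job(files: dict[str, str]) -> list[tuple[str, int]]:
    shuffled = defaultdict(list)   # shuffle: group combined values by key
    for name, text in files.items():
        for word, n in combine(map_fn(name, text)):
            shuffled[word].append(n)
    return [reduce_fn(word, values) for word, values in sorted(shuffled.items())]


files = {"file01": "Hello World Bye World", "file02": "Hello Hadoop Goodbye Hadoop"}
print(run_job(files))
# [('Bye', 1), ('Goodbye', 1), ('Hadoop', 2), ('Hello', 2), ('World', 2)]
```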


MR model
• Map(): processes a key/value pair to generate intermediate key/value pairs
• Reduce(): merges all intermediate values associated with the same key
• Users implement an interface of two primary methods:
1. Map: (key1, val1) → list(key2, val2)
2. Reduce: (key2, [val2]) → [val3]
• Map corresponds to the GROUP BY clause (the choice of key) of an SQL aggregate query.
• Reduce corresponds to the aggregate function (e.g., average) computed over all rows with the same group-by attribute (key).

Cloud need
• 'Era of tera':
– ever-growing datasets
– changing demands/loads
– unpredictable traffic patterns
– demand for faster response times
• Elasticity: use and relinquish resources as demand changes
• Software applications should be Internet-accessible
• Large-scale applications: the cloud provides a large number of machines when needed, distributes work among them, provisions new machines on failure, auto-scales, and relinquishes machines when they are not needed

Advantages
• Almost zero up-front infrastructure investment
• Just-in-time infrastructure
• More efficient resource utilization
• Usage-based costing
• Potential for shrinking processing time
• Less development time
Basis: automated, on-demand elasticity
Example: e-ticketing application

AWS The Amazon Web Services (AWS) cloud provides a highly reliable and scalable infrastructure for deploying web scale solutions, with minimal support and administration costs, and good flexibility

• Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud.
• The operating system, application software and associated configuration settings can be bundled into an Amazon Machine Image (AMI).
• Scaling up/down is done by provisioning/decommissioning multiple instances using simple web service calls.
• Purchasing options: On-Demand Instances, Reserved Instances, Spot Instances
• Amazon S3 is used to store/retrieve input/output datasets.
– Stores/retrieves large amounts of data as objects in buckets (containers) on the web, using standard HTTP
– Copies can be made in 14 locations using CloudFront
• Amazon Simple Queue Service (Amazon SQS) is a reliable, highly scalable, distributed queue for storing messages as they travel between computers and application components.

• Amazon SimpleDB is a web service for real-time lookup and simple querying of structured data.
• Amazon Relational Database Service (Amazon RDS) provides an easy way to set up, operate and scale a relational database in the cloud.
• Amazon Elastic MapReduce provides a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud: on-demand Hadoop clusters with distributed processing, automatic parallelization and job scheduling.
• Amazon Virtual Private Cloud (Amazon VPC) extends a corporate network into a private cloud contained within AWS.

• Availability Zones are distinct locations engineered to be insulated from failures in other Availability Zones; they provide inexpensive, low-latency network connectivity to other Availability Zones in the same Region.
• Elastic IP addresses: allocate a static IP address and programmatically assign it to an instance.
• Amazon CloudWatch can monitor an Amazon EC2 instance for resource utilization, operational performance and overall demand patterns.
• The Auto Scaling feature creates Auto Scaling groups.
• Incoming traffic can be distributed using the Elastic Load Balancing service.
• Amazon Elastic Block Store (EBS) volumes provide network-attached persistent storage to Amazon EC2 instances.
• AWS offers payment and billing services.
• Amazon CloudFront provides a high-performance, globally distributed content delivery service.
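The monitoring-plus-auto-scaling loop can be illustrated with a threshold rule in Python. This is a hypothetical sketch, not the AWS API: the CPU thresholds, group bounds and step size are assumptions, and on AWS the metric would come from CloudWatch and the capacity change would be applied by Auto Scaling.

```python
def desired_capacity(current: int, avg_cpu: float, *, low: float = 30.0, high: float = 70.0,
                     min_size: int = 1, max_size: int = 10, step: int = 1) -> int:
    """Threshold-based scaling decision for a group of identical instances."""
    if avg_cpu > high:
        target = current + step        # scale out under load
    elif avg_cpu < low:
        target = current - step        # scale in when idle
    else:
        target = current               # within the comfort band: no change
    return max(min_size, min(max_size, target))  # stay within group bounds


# Simulated sequence of average CPU readings (percent) across the group.
capacity = 2
history = []
for cpu in [80, 90, 75, 50, 20, 10]:
    capacity = desired_capacity(capacity, cpu)
    history.append(capacity)
print(history)
# [3, 4, 5, 5, 4, 3]
```

The gap between the low and high thresholds keeps the group from oscillating when load hovers around a single cut-off.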

GrepTheWeb Application

Cloud services best practices
• Design for failure and nothing will fail: design, implement and deploy for automated recovery from failure.
• In AWS:
– fail over gracefully using Elastic IPs
– utilize multiple Availability Zones
– maintain an Amazon Machine Image
– utilize Amazon CloudWatch
• Decouple the components: following the SOA design principle of loose coupling, decoupled components scale better.
– Message queues: if one component fails, the system buffers the messages and processes them when the component comes back up.

1) SQS for decoupling and buffering
2) Service interfaces for components
3) AMIs created
4) Stateless applications
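Queue-based decoupling can be illustrated in Python with a buffer that keeps each message until the consumer has processed it. This mirrors the SQS idea that a consumer deletes a message only after handling it, but it is not the SQS API; `Worker` and `drain` are hypothetical names.

```python
from __future__ import annotations

from collections import deque


class Worker:
    """Hypothetical downstream component that can go down."""

    def __init__(self) -> None:
        self.up = True
        self.processed: list[str] = []

    def handle(self, message: str) -> None:
        if not self.up:
            raise ConnectionError("worker unavailable")
        self.processed.append(message.upper())


def drain(buffer: deque, worker: Worker) -> bool:
    """Deliver buffered messages in order; remove each one only after success."""
    while buffer:
        try:
            worker.handle(buffer[0])   # peek: the message stays queued until handled
        except ConnectionError:
            return False               # leave this and later messages buffered
        buffer.popleft()
    return True
```

While the worker is down, producers can keep appending to the buffer; when it comes back up, `drain` processes the backlog in order and nothing is lost.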

• Implement elasticity
• Think parallel: the beauty of the cloud shines when you combine elasticity and parallelization.
• Keep dynamic data closer to the compute and static data closer to the end user.

PSG Yahoo Grid and Cloud Computing Lab (2008 to date)
• 54 rack servers: SC145 and PowerEdge 2950
• 40 end connectors
• 10 client nodes
• RHEL
• Hadoop
• Globus
• OpenVZ
• Xen

• Courses conducted: 10
• Papers published: 11
• Internships: 3
• Placements: 3
• Ph.D.: 4
• Conference talks: 3

• An Efficient Approach to Task Scheduling in Computational Grids
• Data Discovery in Grid using Content Based Searching Technique
• P2P Information Retrieval Framework for Digital Library System using Hadoop DFS
• Integration of Xen and Hadoop framework
• DNA sequencing using Hadoop data grids
• DNA sequencing in public clouds
• Virtualisation using Xen and OpenVZ: a comparison of performance
• Grid Security: a tree-based dynamic approach
• Study of some existing scheduling algorithms
• Grid Task Scheduling using PPSO
• Content-based Image Retrieval
• Modification of fair-share scheduling in Hadoop
• Two-level scheduler for clouds
• Hybrid search using content-based and semantic approaches