Optimisation of Data Access in Grid Environment



Optimisation of Data Access in Grid Environment*
Darin Nikolow (1), Łukasz Dutka (1), Piotr Nyczyk (1), Renata Słota (1), Jacek Kitowski (1,2), Mariusz Dziewierz (1)
(1) Institute of Computer Science, AGH
(2) Academic Computer Centre CYFRONET, AGH
University of Mining and Metallurgy, Cracow, Poland
* CrossGrid Project, Task 3.4
Cracow Grid Workshop, Nov. 5-6,

Outline
• Background
• Bottom-up approach
• Media management software
  – middleware for existing HSM
  – dedicated VTSS
• Local component-expert systems
• Global policy for migration/replication
For more info: http://www.icsr.agh.edu.pl/

Motivation
• Large and growing volumes of data
• Multimedia database systems (applications: medical, educational, virtual reality, virtual laboratories, digital libraries, advanced simulations, ...)
• Solution: Tertiary Storage Systems (TSS) = Media Libraries + Management Software
• Examples of existing TSS:
  – HPSS, DataCutter, APRIL, Condor, OmniStore, UniTree, ...
• Possible directions
  – Data access time estimation system: efficient usage
  – Data distribution and grid implementation: large-scale experiments
  – Expert system for data management

Background
• PARMED Project (Uni. of Klagenfurt, Uni. of Mining & Metallurgy, Cracow)
  – to support physicians with telematic services for:
    • long-distance collaboration of medical centers
    • medical tele-education
    • case archives
[Figure: PARMED architecture. Sites 1-4 connected over a WAN, with components labelled Client, Video Server, Storage Server, Disk Server and Meta-Database; links labelled d1-d3, a1-a3, r1-r3.]

Media Management Software and its usage in X#
Darin Nikolow, darin@uci.agh.edu.pl

Motivation
• Main purpose of the developed TSS: efficient index-based retrieval of video fragments (instead of file fragments)
  – specific requirements for frequent data reading
    • startup latency
    • transfer time
    • minimal transfer rate > video bitrate
• Two prototypes proposed and benchmarked
  – middleware layer for existing HSM
  – dedicated TSS
• The developed systems are of general use, hence possible grid implementations

Multimedia Storage and Retrieval System (MMSRS)
• Middleware layer on HSM
  – use existing software (UniTree HSM)
• Requirements
  – reduce latency (start-up delay), i.e. reduce file granularity
  – file fragmentation (subfiles)
• Implementation
  – splitting files into pieces of similar size
• Consists of:
  – Automated Media Library
  – UniTree HSM managing system
  – MPEG extension for HSM (MEH)
• MEH receives the name of the video file and the frame range (start/end frames)
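The subfile idea can be illustrated with a minimal sketch. It assumes a constant-bitrate stream and a fixed frame rate (25 frames/s is an assumption, not a figure from the slides), so that a frame number maps linearly to an offset in the file; the real MEH may well use an index instead. The function name `subfiles_for_frames` and its parameters are hypothetical.

```python
import math

def subfiles_for_frames(start_frame, end_frame, fps=25.0,
                        bitrate_mb_s=0.4, subfile_mb=16.0):
    """Return the indices of the subfiles covering start_frame..end_frame.

    Assumes constant bitrate and frame rate (an illustrative simplification).
    """
    if start_frame < 0 or end_frame < start_frame:
        raise ValueError("invalid frame range")
    mb_per_frame = bitrate_mb_s / fps
    first_mb = start_frame * mb_per_frame        # offset where the range starts
    last_mb = (end_frame + 1) * mb_per_frame     # offset just past the last frame
    first = int(first_mb // subfile_mb)
    # ceil(...) - 1 is the subfile holding the last needed byte.
    last = max(first, math.ceil(last_mb / subfile_mb) - 1)
    return list(range(first, last + 1))
```

Only the returned subfiles would have to be staged from tape, which is where the reduction in start-up latency comes from.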

Video Tertiary Storage System (VTSS)
• Dedicated TSS
• Repository Daemon (REPD)
  – keeps repository information
• Tertiary File Manager Daemon (TFMD)
  – manages:
    • filedb: tape ident and startup position of the fragment
    • tapedb: information about tape usage
• Client requests to VTSS can be of the following kinds:
  – write a new file to VTSS, read a file fragment from VTSS, delete a file from VTSS
• The fragment range is defined in frame units
• Two daemons implemented in C using Unix sockets

MMSRS and VTSS performance
• Hardware (AML Quantum|ATL)
  – ATL 4/52 (DLT 2000)
  – ATL 7100 (DLT 7000)
  – HP D-class server (with UniTree HSM)
• Data
  – 790 MB MPEG-1 file with bitrate B = 0.4 MB/s (33 min.)
  – subfile for MMSRS: 16 MB (8, 16 and 32 MB tested)
    • as short as possible while keeping playback smooth (low latency)
    • "optimal" subfile length depends on
      – positioning time
      – drive transfer rate
      – bitrate of the video file
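A back-of-the-envelope check of the figures above, as a sketch. The playing time of one subfile is its size divided by the bitrate, and the next subfile has to be fetched (positioning plus transfer) before the current one has been played. The function names are hypothetical, and the positioning time and drive rate are free parameters here, not measurements from the slides.

```python
import math

def video_duration_min(file_mb, bitrate_mb_s):
    """Playing time of the whole file in minutes."""
    return file_mb / bitrate_mb_s / 60.0

def subfile_count(file_mb, subfile_mb):
    """Number of subfiles needed; the last one may be shorter."""
    return math.ceil(file_mb / subfile_mb)

def keeps_up(subfile_mb, bitrate_mb_s, positioning_s, drive_mb_s):
    """True if fetching one subfile takes no longer than playing one."""
    fetch_s = positioning_s + subfile_mb / drive_mb_s
    play_s = subfile_mb / bitrate_mb_s
    return fetch_s <= play_s
```

`video_duration_min(790, 0.4)` gives about 32.9 minutes, in line with the 33 min. quoted, and `subfile_count(790, 16)` gives 50 subfiles. `keeps_up` shows the trade-off behind the "optimal" length: a longer subfile amortises the positioning time better, while a shorter one lowers the start-up latency.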

Benchmarks
• Startup latency: time elapsed from issuing the request to receiving the first byte
• Transfer time: time from receiving the first byte until the end of transmission
• Minimal rate: minimal transfer rate experienced by a client with an unlimited buffer (should be greater than the bitrate of the video stream to have smooth playback)
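The three measures can be computed from a trace of chunk arrivals, as in the sketch below. The trace format and the function name `benchmark` are assumptions. The minimal rate is read here as the lowest average rate seen since the first byte, sampled just before each later chunk arrives; this is one plausible reading of the definition above, not necessarily the one used in the measurements.

```python
def benchmark(request_time, arrivals):
    """arrivals: list of (timestamp_s, megabytes), strictly increasing in time.

    Returns (startup_latency_s, transfer_time_s, minimal_rate_mb_s).
    """
    if len(arrivals) < 2:
        raise ValueError("need at least two chunks")
    t_first = arrivals[0][0]
    startup_latency = t_first - request_time
    transfer_time = arrivals[-1][0] - t_first
    received = 0.0
    min_rate = float("inf")
    for (_, size), (t_next, _) in zip(arrivals, arrivals[1:]):
        received += size
        # Data buffered just before the next chunk lands,
        # averaged over the time since the first byte.
        min_rate = min(min_rate, received / (t_next - t_first))
    return startup_latency, transfer_time, min_rate
```

Playback that starts with the first byte stays smooth as long as the returned minimal rate is not below the video bitrate.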

System performance for the whole video file transfer (DLT 2000)

Minimal transfer rate
[Figure panels: VTSS (DLT 2000), MMSRS (DLT 2000), VTSS (DLT 7000)]
For DLT 2000: T = 10 GB, N = 64, Br = 0.4 MB/s, Qdt = 400 s
For DLT 7000: T = 35 GB, N = 52, Br = 0.4 MB/s, Qdt = 1723 s
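The slides do not spell out how Qdt is obtained, but both values are reproduced by dividing the tape capacity T by N and by the bitrate Br, taking 1 GB = 1024 MB. The sketch below checks that arithmetic; reading this as the definition of Qdt is an inference from the numbers, and the function name is hypothetical.

```python
def qdt_seconds(capacity_gb, n, bitrate_mb_s):
    """Playing time of capacity/n worth of data (formula inferred from the slide values)."""
    return capacity_gb * 1024 / (n * bitrate_mb_s)

print(round(qdt_seconds(10, 64, 0.4)))  # DLT 2000
print(round(qdt_seconds(35, 52, 0.4)))  # DLT 7000
```

Rounded to whole seconds, the two calls give 400 and 1723, matching the values quoted for the two drive types.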

Access Time Estimation: Motivation for X#
• Retrieving a file from TSS can take anywhere from a few seconds to a few hours
• User satisfaction increases when the access time of the data is known (e.g. a user waiting to watch a selected video; an administrator recovering from backup)
• Efficient use of storage resources in a Grid environment (data replication subsystem)

Access Time Estimation: Approaches
• Open TSS approach
  – source code changes
  – will be used as experimental platform
• Black Box TSS approach for existing HSMs in X# sites
  – retrieving the TSS's state info via its native tools and available internal files

Access Time Estimation: Black Box TSS Approach
[Figure: data-flow diagram with components Client, Request Monitor & Proxy, TSS Monitor, Simulator, Monitoring tools, Disk cache and TSS (databases, logs, conf. files, events collecting); numbered flows: fileid, ETA? [1], fileid [2], queue state [3], update [4], TSS state [5], ETA [6], ETA [7], fileid [8], fileid [9], data [10], data [11], feedback [12].]
Info needed by the Simulator:
  – number of drives
  – tape labels
  – media types
  – position of the file on the media
  – number of requests
  – ...
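To make the Simulator's role concrete, here is a deliberately simple sketch of an ETA estimate built from the kind of state listed above. The cost model (queue wait + mount + positioning + transfer), every timing parameter and all names (`TssState`, `estimate_eta`) are assumptions for illustration; the slides do not describe the Simulator's internals.

```python
from dataclasses import dataclass, field

@dataclass
class TssState:
    n_drives: int                                     # number of drives
    queued_requests: int                              # requests already waiting
    mounted_tapes: set = field(default_factory=set)   # tape labels currently in drives
    # fileid -> (tape label, position on tape in MB, file size in MB)
    files: dict = field(default_factory=dict)

def estimate_eta(state, fileid, mount_s=60.0, seek_mb_s=100.0,
                 read_mb_s=1.25, avg_service_s=300.0):
    """Rough ETA in seconds; all timing parameters are illustrative guesses."""
    tape, position_mb, size_mb = state.files[fileid]
    # Requests ahead of this one are served n_drives at a time.
    queue_wait = (state.queued_requests // state.n_drives) * avg_service_s
    mount = 0.0 if tape in state.mounted_tapes else mount_s
    positioning = position_mb / seek_mb_s
    transfer = size_mb / read_mb_s
    return queue_wait + mount + positioning + transfer
```

In a black-box setting the fields of `TssState` would be filled from the HSM's native tools and log files, and the parameters calibrated per media type.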

Conclusions
• MMSRS and VTSS are more efficient than the standard UniTree HSM
• MMSRS is efficient enough to be used as a middleware for existing HSM of UniTree type (in X# sites)
• Proposed measurements could be used for:
  – building more sophisticated distributed storage systems (faster access to files stored in TSS)
  – building an access time estimation subsystem
• Access time estimation subsystem: an information provider for X# replication and migration of data
http://www.icsr.agh.edu.pl/

Basics of Component-Expert Technology and its usage in X#
Łukasz Dutka, dutka@agh.edu.pl

Classical component strategy

Component-expert strategy

Component structure

Component header structure

Structure of component code

Call-Environment
• Describes the state of the call place
• Describes the requirements of the call place
• Carries information about the wishes of the user or programmer
• The expert system processes the Call-Environment and finds the best component for it

Expert Subsystem
• Rule-based expert system
• A typical rule looks like: If log-expr Then action1 Else action2
• The rules describe what is meant by "the best component for a given Call-Environment"
• The expert system logs calls and stores deduction results for further analysis
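The rule form above can be sketched in a few lines. This is an illustration only: the scoring scheme, the helper names (`make_rule`, `prefer`, `choose_component`) and the component names in the usage are hypothetical, since the slides say only that the rules define which component is "best".

```python
def make_rule(condition, then_action, else_action=None):
    """Build a rule of the form: If condition Then then_action Else else_action."""
    def rule(call_env, scores):
        action = then_action if condition(call_env) else else_action
        if action is not None:
            action(scores)
    return rule

def prefer(component, weight=1):
    """Action that raises the score of one component."""
    def action(scores):
        scores[component] = scores.get(component, 0) + weight
    return action

def choose_component(call_env, rules, components, log):
    """Apply all rules to the Call-Environment and return the best-scoring component."""
    scores = {name: 0 for name in components}
    for rule in rules:
        rule(call_env, scores)
    best = max(components, key=lambda name: scores[name])
    log.append((dict(call_env), best))   # keep the deduction for later analysis
    return best
```

A usage example with made-up names:

```python
rules = [make_rule(lambda env: env.get("data_type") == "video",
                   prefer("video_reader"), prefer("block_reader"))]
log = []
choose_component({"data_type": "video"}, rules,
                 ["block_reader", "video_reader"], log)
```

The caller never names a component; it only describes its situation, which is the point of the Call-Environment.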

Benefits of Component-Expert technology
• System can be expanded dynamically
• Ease of solving new problems
• Minimises the programmer's responsibility for component choice
• Ease of programming in a heterogeneous environment
• Maximal reusability of components
• Internal simplicity of component code
• Increased efficiency of the programming process

Component-Expert Technology for X# Task 3.4

Basic analysis of data-access problems in X#
• Different data set types
• Huge data files
• Distributed environment
• Long-distance connections
• Mission-critical applications
• Heterogeneous data storing systems
• Heterogeneous computing systems
• Open system
• Unpredictable file types

Basic connection diagram

Sequence Diagram

Example of Component-Expert technology usage for data access in X#
• Sample attributes
  – User ID
  – Computing Node ID
  – Preferred replica location
  – Required throughput
  – Application purpose
  – Data sharing
  – Critical level
  – Replica expiration
  – ...
• Examples of local decisions
  – Device selection (according to availability and type)
  – Storing format (blocks, multimedia streams, ...)
  – Available delivery performance (network, storage devices, ...)
  – ... and much more
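As a sketch of one such local decision, the code below carries a few of the sample attributes in a Call-Environment record and picks a storage device by availability, type and required throughput. The field names, the `Device` record and the "slowest device that is still fast enough" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class CallEnvironment:
    user_id: str
    computing_node_id: str
    preferred_replica_location: str
    required_throughput_mb_s: float
    critical_level: int

@dataclass
class Device:
    name: str
    kind: str                 # e.g. "disk" or "tape"
    available: bool
    throughput_mb_s: float

def choose_device(env, devices, kinds=None):
    """Pick the slowest available device that still meets the required throughput.

    kinds optionally restricts the choice to certain device types.
    Returns None when no device qualifies.
    """
    candidates = [d for d in devices
                  if d.available
                  and (kinds is None or d.kind in kinds)
                  and d.throughput_mb_s >= env.required_throughput_mb_s]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.throughput_mb_s)
```

Choosing the slowest adequate device leaves faster ones free for more demanding requests; a rule base could just as well encode a different preference.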

System Management for Migration/Replication Strategies (2/2)
• In cooperation with other projects
• High-level control system (e.g. cooperating with LDAP)
• Two possible realizations
  – heuristic reinforcement learning based on heuristic strategies for migration/replication and system state
  – classical rule-based expert system

Conclusions
• Some elements have been defined and implemented
• Working on higher-level structure and cooperation with other X# modules and services