1db8e8494f71a4aafb892b50c4730f29.ppt
- Number of slides: 41
IBM ProtecTIER Deduplication Solutions
Stanislav Dzúrik, IBM FTSS Storage, stanislav_dzurik@sk.ibm.com
Got data? Too much? And not enough (blank) to store it all?
(blank) = Time, Money, People, Floor Space, Electricity, Air Conditioning
Protect More. Store Less. ®
The tidal wave of data continues …
- The amount of digital information continues to grow exponentially
- And we need to keep more of it, longer
- And the costs of losing data are increasingly unacceptable
  - Lost revenues
  - Lost customer confidence
  - Embarrassment in the market
  - Fines from contracts, government agencies
  - CEO and CFO could go to jail
- But budgets are not increasing
We need to do more with less, and we need to do it smarter.
Chart (2005 to 2010): data created and copied is expected to grow at 48% CAGR through 2010. Source: various external consultant reports.
Survey: what are your two biggest storage pain points?*
Source: TheInfoPro Storage Study, F1000 sample, n=149; Other n=14. *Multiple responses recorded.
Storage efficiency strategies and best practices
- Stop storing so much
- Move data to the right place
- Store more with what’s on the floor
A set of essential technologies enables storage efficiency
- Stop storing so much
  - Data Compression
  - Data Deduplication
- Move data to the right place
  - Automated Tiering
  - Automated Data Migration
- Store more with what’s on the floor
  - Storage Virtualization
  - Thin Provisioning
The pressures on backup administrators are growing
- Growth: more new data coming
- Backup: backup takes longer
- Manage: can’t buy more storage
- Recover: recovery takes longer
Using the right balance of high density tape and high performance disk will help...
- Long Term Retention
  - Cost effective capacity
  - Removable & transportable
- Compliance
  - Meet financial & regulatory requirements
  - Data encryption, WORM
- Short Term Retention
  - Use disk for daily backup & restore operations
- Performance
  - Fast backups
  - Even faster restores
  - Meet “backup windows”
Compression and Deduplication use less physical storage
- Store data more efficiently
- Lower operating expenses: power, cooling, floor space
- Keep more data online for analytics and fast restores
And data deduplication is the key to using more disk more cost effectively!
Data Deduplication Overview
Deduplication Architectures
Diagram labels: Client, LAN, Server, SAN, Storage Devices
- Client side
  - Reduces load on server
  - Reduces bandwidth on LAN
  - Adds load to client
  - No cross correlation among multiple clients
- Server side
  - Allows cross correlation of data among multiple clients
  - Adds load to server
- Block Storage Device
  - Transparent to clients and servers
  - Reduces load on server and client
  - Adds load to storage device
  - No file or format awareness
Data Deduplication Process (simplified)
Assume a data object or data stream as the subject for deduplication.
- The data object is split into chunks (fixed or variable size)
- For each chunk an identity characteristic is determined
- Duplicate chunks are identified
Example chunk sequence: A B C D A E F F D B A F. Chunks stored after deduplication: A B C D E F.
- Identical chunks are referenced with pointers (references)
- Non-identical chunks or single instances are effectively stored
- Compression may be performed in addition
Required Disk-Cache is reduced.
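The simplified process above can be sketched in a few lines of Python. This is an illustration only, not how ProtecTIER or any other product implements deduplication: it assumes fixed-size chunks, uses SHA-256 as the "identity characteristic", and the function names `deduplicate` and `restore` are invented for this example.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns (store, recipe): store maps a chunk's SHA-256 digest to its bytes,
    recipe lists the digests (the "pointers") in original order.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Only the first instance of a chunk is stored; repeats add a pointer.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[digest] for digest in recipe)


# The chunk sequence from the slide, one byte per chunk.
stream = b"ABCDAEFFDBAF"
store, recipe = deduplicate(stream, chunk_size=1)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
assert restore(store, recipe) == stream
```

For the slide's sequence this references 12 chunks but stores only the 6 distinct ones (A to F).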
Methods for Data Chunking
- File based
  - One chunk is one file, most appropriate for file systems
  - E.g. TSM Incremental Backup forever helps eliminate redundant data
- Fixed block
  - Data object is split into fixed blocks
  - Used by block storage devices
- Format aware
  - Understands explicit data formats and chunks the data object according to format
  - Example: breaking a PowerPoint deck into separate slides
- Format agnostic
  - Chunking is based on an algorithm that looks for logical breaks or similar elements within a data object
The chunking method influences the deduplication ratio.
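Why the chunking method influences the deduplication ratio can be shown with a small experiment. The sketch below compares fixed block chunking with a toy format-agnostic (content-defined) chunker after a single byte is inserted at the front of the data. The fingerprint (sum of the last few bytes) is an assumption chosen for readability; real systems use stronger rolling hashes, and nothing here represents the ProtecTIER algorithm.

```python
def fixed_chunks(data: bytes, size: int):
    """Fixed block chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 4, divisor: int = 7):
    """Format-agnostic chunking: cut wherever the last `window` bytes satisfy
    a content-based condition (toy fingerprint: byte sum divisible by divisor).
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if sum(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


original = bytes(range(256)) * 4   # synthetic sample data
shifted = b"!" + original          # same data with one byte inserted in front

fixed_shared = set(fixed_chunks(original, 64)) & set(fixed_chunks(shifted, 64))
cdc_original = content_defined_chunks(original)
cdc_shifted = content_defined_chunks(shifted)
cdc_shared = set(cdc_original) & set(cdc_shifted)

print("fixed blocks still matching:", len(fixed_shared))
print("content-defined chunks still matching:", len(cdc_shared),
      "of", len(set(cdc_original)))
```

With fixed blocks the inserted byte shifts every block boundary, so no block matches any more. The content-defined cut points move together with the data, so every chunk after the first one is found again.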
Method for Determining Duplicates
- Hashing
  - Computes a hash (MD5, SHA-256) for each data chunk
  - Compares the hash with all hashes of existing data
  - Identical hash means most likely identical data
  - Potential (small) risk of hash collisions: identical hash and non-identical data
  - Must be prevented through secondary comparison (additional metadata, second hash method, binary comparison)
- Binary Comparison
  - Compares all bits of similar chunks
- Delta Differencing
  - Computes a “delta” between two “similar” chunks of data where one chunk is the baseline and the second is the delta
  - Since each delta is unique there is no possibility of collision
  - To reconstruct the original chunk the delta(s) have to be re-applied to the baseline chunk
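The two ideas on this slide, a secondary binary comparison behind a hash lookup and delta differencing against a baseline, can be sketched as follows. Everything here is a hypothetical illustration: `weak_hash` is deliberately poor so that a collision is easy to provoke, `VerifyingStore`, `make_delta` and `apply_delta` are invented names, and the delta format (built with the standard library's `difflib`) is an assumption, not the format used by any product.

```python
import difflib


def weak_hash(chunk: bytes) -> int:
    """Deliberately weak 8-bit hash so that collisions are easy to provoke."""
    return sum(chunk) % 256


class VerifyingStore:
    """Hash index with a secondary binary comparison on every hash hit."""

    def __init__(self):
        self.buckets = {}  # hash value -> list of distinct chunks

    def add(self, chunk: bytes) -> bool:
        """Store chunk unless an identical one exists; return True if new."""
        candidates = self.buckets.setdefault(weak_hash(chunk), [])
        if any(existing == chunk for existing in candidates):
            return False          # same hash AND same bytes: true duplicate
        candidates.append(chunk)  # new chunk, or same hash with different bytes
        return True


def make_delta(baseline: bytes, target: bytes):
    """Describe target as copies from baseline plus literal inserts."""
    ops = []
    matcher = difflib.SequenceMatcher(None, baseline, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": baseline bytes that are simply not copied
    return ops


def apply_delta(baseline: bytes, ops) -> bytes:
    """Re-apply the delta to the baseline to reconstruct the target chunk."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += baseline[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


store = VerifyingStore()
# b"ab" and b"ba" collide under weak_hash; the byte compare keeps them apart.
results = [store.add(b"ab"), store.add(b"ba"), store.add(b"ab")]

baseline = b"The quick brown fox jumps over the lazy dog"
target = b"The quick brown cat jumps over the lazy dog!"
delta = make_delta(baseline, target)
assert apply_delta(baseline, delta) == target
```

The first two `add` calls store data even though the hashes are equal, and only the third call is treated as a duplicate. The delta can only be read back together with its baseline, which is the reconstruction cost the slide mentions.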
In-Line Deduplication
- Data is deduplicated before it is actually stored
- Deduplication is performed as data flows into the secondary storage system
Diagram labels: Primary Storage, Backup, Deduplication, VTL, Secondary Storage
- Advantages
  - Processes data once, eliminates additional post-processing tasks
- Disadvantages
  - CPU intensive deduplication process can create a performance bottleneck
  - One process per I/O stream
Out-of-Band Deduplication (Post-Processing)
- Data is first stored and then deduplicated in the background
Diagram labels: Primary Storage, Backup, VTL, Secondary Storage, Deduplication, Secondary Storage
- Advantages
  - Deduplication CPU overhead no longer affects the backup window
  - Supports multiple I/O streams
  - Potentially faster restore for the first version (not deduplicated)
- Disadvantages
  - Data is written, read and written again, thus more I/O intensive
  - Deduplication window must be coordinated with the backup window as it typically takes longer than in-line processing
  - Requires larger secondary storage because the first version is not deduplicated
3 × Deduplication in the IBM Portfolio
Diagram labels: Tape, File, LUN, TSM API; ProtecTIER TS7650G; A-SIS N series Gateway; TSM R6
ProtecTIER Overview
ProtecTIER reduces the required backup disk capacity by up to 25 times!
IBM ProtecTIER Deduplication Innovation and Leadership
Timeline 2003 to 2011 (milestones in slide order):
- 6 PhDs begin researching massively scalable deduplication algorithms
- First Deduplication Virtual Tape Library deployed into production
- First non-hash deduplication algorithm developed, designed for 100% data integrity
- First single node system to store over 1 PB of deduplicated data
- Fastest single node inline deduplication solution
- The only “true” enterprise-class deduplication solution on the market today
- IBM acquires Diligent
- First Deduplication solution for System z
- First to deliver VTL solutions for both Open and Mainframe environments
- First true clustered system with Global Deduplication released
- IBM’s first midrange solution
- First to deliver Many-to-Many replication
- Fastest restore speed: up to 2800 MB/sec!
- Installed in all major industries
  - Over 1,400 ProtecTIER systems sold to date
  - Production systems range in size from 5 TB to over 700 TB
  - Over 90 PB of physical disk capacity behind ProtecTIER servers in production, protecting thousands of PBs of backup data
IBM’s Virtual Tape De-duplication SW Products
- ProtecTIER VT is a scalable and robust virtual tape solution that emulates tape libraries, enabling existing backup applications to send data to the ProtecTIER disk-based platform, rather than directly to tape.
- HyperFactor is a revolutionary de-duplication solution which eliminates redundant data, enabling customers to increase their effective capacity by up to 25 times.
ProtecTIER is powered by HyperFactor and can radically reduce both physical disk capacity and total storage costs.
How ProtecTIER works
Diagram labels: Backup Servers, New Data Stream, ProtecTIER™ Server, HyperFactor™, Memory Resident Index, “Filtered” data, Repository
- Memory Resident Index: only 4 GB needed to map 1 PB of physical disk!
- Backup with inline deduplication: up to 1400 MB/sec per server or 2000 MB/sec with a 2 node cluster!
ProtecTIER Deduplication Operation and Results Example
- Backup application writes data to ProtecTIER as it would to tape
- Only unique data is stored, existing duplicate data is referenced
- When data objects expire, references are removed and free space is reclaimed and reused

Backup Event          Amount Received   Amount Stored   Dedupe Ratio
First Full Backup     1 TB              250 GB          4:1
Incremental Backup    100 GB            10 GB           4.2:1
Incremental Backup    100 GB            10 GB           4.4:1
Second Full Backup    1 TB              10 GB           7.8:1
Incremental Backup    100 GB            10 GB           8:1
Third Full Backup     1 TB              10 GB           11:1
After two months...   7.8 TB            350 GB          22:1
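The ratio column in this example appears to be cumulative: total data received so far divided by total data stored so far. The short check below recomputes it from the slide's numbers, assuming 1 TB = 1000 GB. The slide's figures are reproduced to within rounding (for instance 2200 GB / 280 GB is about 7.86, shown as 7.8:1).

```python
# (event, received GB, stored GB, ratio printed on the slide); 1 TB = 1000 GB
events = [
    ("First Full Backup", 1000, 250, 4.0),
    ("Incremental Backup", 100, 10, 4.2),
    ("Incremental Backup", 100, 10, 4.4),
    ("Second Full Backup", 1000, 10, 7.8),
    ("Incremental Backup", 100, 10, 8.0),
    ("Third Full Backup", 1000, 10, 11.0),
]

total_received = 0
total_stored = 0
cumulative_ratios = []
for name, received, stored, slide_ratio in events:
    total_received += received
    total_stored += stored
    ratio = total_received / total_stored  # cumulative dedupe ratio so far
    cumulative_ratios.append(ratio)
    print(f"{name:<20} {total_received:>5} GB / {total_stored:>3} GB"
          f" = {ratio:.2f}:1  (slide: {slide_ratio}:1)")

# Final row of the slide: 7.8 TB received, 350 GB stored.
two_month_ratio = 7800 / 350
print(f"After two months: {two_month_ratio:.1f}:1 (slide: 22:1)")
```

The ratio climbs with every full backup because each one adds 1 TB of received data but only about 10 GB of new unique data.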
Storage Impact from ProtecTIER Deduplication
Diagram labels: Master Server, Backup Server, ProtecTIER Server, Represented capacity, Physical capacity
Store up to 25 times the backup data on a given physical storage capacity.
Significantly Reduces Replication Bandwidth
- Primary Site: Backup Servers, ProtecTIER Gateway (represented capacity, physical capacity)
- IP-based WAN link
- Secondary Site: Backup Server, ProtecTIER Gateway (physical capacity), tape library
Deduplication enables large amounts of data to be replicated with significantly less bandwidth.
Virtual cartridges can be cloned to tape at the DR site.
ProtecTIER Many-to-One Replication Overview
- Up to 12 branch offices (spokes): Gateways and/or Appliances
- 1 target (hub) at the Central / DR Site: Appliance, Gateway, single or two-node cluster
- IP-based NR links
- Virtual cartridges can be cloned to tape by the main site B/U server
Diagram labels: Backup Server, ProtecTIER Gateway, Physical capacity, Tape library
ProtecTIER Many-to-Many Native Replication Grid
- Up to 4 hubs in a grid (Site A, Site B, Site C, Site D)
- Supports any combination of Gateways, Appliances, single or two-node clusters
Diagram labels: Backup Server, ProtecTIER Gateway, Physical capacity
ProtecTIER Support for Symantec OpenStorage (OST)
- OST API separates the backup logic from the storage appliance logic and implementation
- NetBackup Server: NetBackup Policy and Control, OpenStorage API, ProtecTIER OST Plugin
- ProtecTIER Server
IBM ProtecTIER: backup storage appliance with Deduplication and Native Replication
IBM ProtecTIER® Deduplication Family (scalable capacity and performance)
- TS7610 ProtecTIER Appliance Express
  - Good Performance, Entry Level, Easy to Install
  - Up to 100 MB/sec
  - 4 TB and 5.4 TB Useable Capacity
- TS7650 ProtecTIER Appliances
  - Better Performance, Larger Capacity, Scalable
  - Up to 500 MB/sec
  - 7 TB to 36 TB Useable Capacity
- TS7650G & TS7680 ProtecTIER Gateways
  - Highest Performance, Largest Capacity, High Availability
  - Backup: up to 2000 MB/sec
  - Restore: up to 2800 MB/sec
  - Up to 1 PB Useable Capacity
ProtecTIER Differentiation
ProtecTIER Advantage: Data Integrity
- Unique and patented HyperFactor® deduplication technology
- The only production proven deduplication solution not based on a hash algorithm
- Designed for 100% data integrity
- Bit for bit comparison of data to ensure data is a duplicate
- Can NEVER lose data due to a hash collision
Although the chance of losing data from a hash collision is low, it is NOT ZERO as it is with a ProtecTIER solution.
ProtecTIER Advantage: Restore Performance
- Restoring data from a ProtecTIER solution is even FASTER than backing up
- ProtecTIER can easily restore at 2800 MB/sec!
- High restore performance not limited to certain backup applications or specific data sets like other vendors
- High restore performance achieved on real data with realistic 20% change rate in production environments
- Never requires agents on backup servers
Other vendors’ “CPU-centric” architectures are optimized for processing hashes, not moving data.
ProtecTIER Advantage: Scalability
- A single ProtecTIER system can support up to 1 Petabyte of useable capacity
- ProtecTIER supports the use of any IBM storage system (DS8000, DS5000, XIV, etc.) and most third party storage systems for the repository
- IBM has hundreds of ProtecTIER systems with over 100 TB of useable capacity in production environments throughout the world
- IBM always states “Useable Capacity” and never uses the deceptive “RAW capacity” terms like other vendors
The hidden costs associated with managing, maintaining, powering and cooling multiple appliances are significant and should not be ignored!
ProtecTIER Advantage: Global Deduplication
- ProtecTIER Cluster with true Global Deduplication has been Generally Available and in production since 2008
- Supported with all major backup applications and available for all Open Systems, System z and System i platforms
- No agents or backup server upgrades required
- Other vendors’ Global Deduplication capabilities are immature and incomplete, with very few if any systems in production
- Other vendors’ Global Dedupe is restricted to certain models, works only with NetBackup OST and requires agents to be installed
Many vendors claim to have Global Deduplication but create multiple separate repositories that may contain redundant data!
ProtecTIER Advantage: Inline Deduplication
Example: disk activity needed to ingest and deduplicate 10 TB of backup data
- Post Process Approach: deduplicate after storing
  - 10 TB data, hash-based post process: write 10 TB, then read 10 TB (2x)
  - Requires: > storage, > I/Os, > time, > effort, > admin
- ProtecTIER Inline Approach: deduplicate before storing
  - 10 TB data through HyperFactor: read or write 10 TB (1x)
  - Results: simple, faster, easier, cheaper, efficient
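The 2x versus 1x comparison is simple arithmetic, sketched here with the slide's own accounting. The function names and the optional `dedupe_ratio` parameter are assumptions for illustration: the parameter adds the rewrite of the deduplicated data that the post-processing slide describes as "written, read and written", which this slide leaves out.

```python
def post_process_disk_io(ingest_tb: float, dedupe_ratio: float = None) -> float:
    """Disk traffic for post-processing: write everything, then read it back.

    If dedupe_ratio is given, also count rewriting the deduplicated result.
    """
    total = ingest_tb + ingest_tb
    if dedupe_ratio is not None:
        total += ingest_tb / dedupe_ratio
    return total


def inline_disk_io(ingest_tb: float) -> float:
    """Slide accounting for inline: the data passes the disk path once."""
    return ingest_tb


ingest = 10.0
post = post_process_disk_io(ingest)
inline = inline_disk_io(ingest)
print(f"post-process: {post} TB, inline: {inline} TB, "
      f"factor {post / inline:.0f}x")
```

With an assumed 10:1 ratio, `post_process_disk_io(10.0, dedupe_ratio=10.0)` adds another 1 TB of writes on top of the 20 TB counted on the slide.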
ProtecTIER Advantage: Inline Deduplication
- Inline Processing (Backup Server, ProtecTIER VT, Tape Library, Truck): backup and dedupe run together; time axis 8:00 PM, 2:00 AM, 8:00 PM; SLA is met
- Post Processing (Backup Server, VTL, Tape Library, Truck): dedupe runs after the backup, with overlap; time axis 8:00 PM, 2:00 AM, 8:00 AM, 8:00 PM
With an IBM ProtecTIER Solution you can...
- Store up to 25 times more data on disk
  - Up to 25:1 reduction with 100% data integrity
- Reduce backup and restore times
  - Fast inline deduplication up to 2000 MB/sec
  - Even faster restores up to 2800 MB/sec
- Improve the reliability of backup operations
  - Eliminates mechanical & handling failures
- Drive the cost of disk based backup down
  - Reduces energy, cooling, and space required
- Increase data retention
  - Store more backup data on disk for a longer time with very little additional cost
For More Information on IBM’s ProtecTIER
IBM Customers: the main ProtecTIER Web Page, www.ibm.com/systems/storage/tape/protectier
Trademarks and Disclaimers
© IBM Corporation 1994-2011. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. Intel, Intel logo, Intel Inside logo, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind. The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products.
Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
Thank you for your attention