- Number of slides: 83
Review Content Concepts Access Management Week 4
Tonight • More detailed look at metadata description of content • Content management – Components – Implementation issues – Encryption and Digital Signatures
Looking more closely at basic Dublin Core • Last week, we saw the original 15 elements of Dublin Core and the extension to many more terms. • Here is a specific example, with the data filled in, as a reference to how these fields should be used.
A DL example • Library of Congress American Memory project – http://memory.loc.gov/ammem/index.html – “American Memory provides free and open access through the Internet to written and spoken words, sound recordings, still and moving images, prints, maps, and sheet music that document the American experience. It is a digital record of American history and creativity. These materials, from the collections of the Library of Congress and other institutions, chronicle historical events, people, places, and ideas that continue to shape America, serving the public as a resource for education and lifelong learning.”
Dublin Core for a map • Map found in the LOC American Memory collection – Map at http://memory.loc.gov/ammem/gmdhtml/gmdhome.html • Dublin Core metadata illustration found at http://webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm – Part of a DL course at U. of Alabama – no longer available
Go to the web site to explore what is there, including copyright information, title, history, etc.
Ways to specify the metadata • Embed in the file with the resource – HTML meta tags • Illustrated shortly • Separate file with link in the resource file • Totally separate resource for metadata
Dublin Core: Title • Name given, usually by the creator or publisher <META name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la"> Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
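The embedded-metadata approach can be sketched in a few lines of Python. This is only an illustration: the helper name dc_meta_tags and the "Type" value are assumptions made for the example, not part of the Cornell or LOC material, and per-element attributes such as lang are left out.

```python
from html import escape

def dc_meta_tags(record):
    """Render a dict of Dublin Core element -> value (or list of values) as HTML meta tags.

    Hypothetical helper for illustration; it does not emit optional
    attributes such as lang or scheme.
    """
    tags = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]          # allow a single string or a list of strings
        for value in values:
            # escape() protects quotes and angle brackets inside attribute values
            tags.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return tags

record = {
    "Title": "Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata",
    "Type": "Image",                   # assumed value, chosen only for the example
}

for tag in dc_meta_tags(record):
    print(tag)
```

The resulting lines would go inside the head element of the HTML page that presents the resource.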
Dublin Core: Subject • What the work is about, possibly keywords, terms from classification scheme if available. Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Description • Free text description, abstract, etc. Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Source • Is this object derived from another? Is this map a part of a larger map? Is this text a variation or revision of another piece of text?
Dublin Core: Language • Language of the content of the resource • For the map, there is no language content Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Relation • To what other object(s) or collection is this object related? Does it also exist in another collection? Is it derived from another document or image? How is it related? Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Creator • Person or organization responsible for the Intellectual Content of this object Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Publisher • Entity responsible for making the resource available in its present form • Not shown in the example, but should be something like this: Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Contributor • Any entity making a contribution to this object. • Example: someone who added some information to the original document or image • No entry for this map.
Dublin Core: Rights • A pointer to a copyright notice, a rights management statement, or a rights server.
Dublin Core: Date • Date on which this object was made available in its present form, possibly the date it was entered into this digital collection. Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Type or Category • What sort of thing is this? Some examples: home page, novel, poem, working paper, technical report, essay, dictionary, … • Type should be selected from a controlled list. For example, see the DCMI Type Vocabulary: • http://dublincore.org/documents/2006/08/28/dcmi-type-vocabulary/ Why is this recommended as a controlled vocabulary field?
DCMI Type Vocabulary • Collection • Dataset • Event • Image • InteractiveResource • MovingImage • PhysicalObject • Service • Software • Sound • StillImage • Text See the official page for explanations of the categories. Note that Image is a broad category and MovingImage and StillImage are more restricted subcategories.
Dublin Core: Type • Category of this resource Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Dublin Core: Format • The way the content is encoded. This tells what resource is needed to access this content. http://www.graphcomp.com/info/specs/mime.html
Dublin Core: Unique ID • The key for this object in the collection. • I cannot find one for the map we are looking at, but the ID for the map of which it is a part is g3715 ct000001 • The Metadata specification for that would be Source: http://memory.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))[email protected](COLLID+dsxpmap))
Dublin Core: Coverage • The time, space or other measurement of the scope or completeness of the object. • No coverage entry specified, but might be this: would a controlled vocabulary be better?
International Consensus • Recognition of International Scope of Resource Discovery on Web • 17 Countries Currently Involved in DC Working Groups • 50+ Implementation Projects in 10 Countries Source: www.cs.cornell.edu/courses/cs502/2002sp/.../lecture%202-26-02.ppt
Spot Check • Find any entry at the American Memory collection that you understand well. • Make a complete (as much as you can) Dublin Core set of meta tags for it. • Work in groups of two or three
Guide to Good Practice • The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials • http://www.nyu.edu/its/humanities/ninchguide/index.html
Access Control and Rights Management
Framework for Access Management Source: http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter7.html
Legal and Technical Issues • Legal: When may a resource be digitized and made available? What requirements exist for controlling access? • Technical: How do we control access to a resource that is stored online? – Policies – Encoding – Distribution limitations
Public Domain • Definition: A public domain work is a creative work that is not protected by copyright and which may be freely used by everyone. The reasons that the work is not protected include: – (1) the term of copyright for the work has expired; – (2) the author failed to satisfy statutory formalities to perfect the copyright; or – (3) the work is a work of the U.S. Government. Even in the case of public domain, the origin of the work should be noted if possible. Source: http://www.unc.edu/~unclng/public-d.htm
Date of work | Protected from | Term
Created 1-1-78 or after | When work is fixed in tangible medium of expression | Life + 70 years (or, if work of corporate authorship, the shorter of 95 years from publication, or 120 years from creation)
Published before 1923 | In public domain | None
Published 1923-63 | When published with notice | 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain
Published from 1964-77 | When published with notice | 28 years for first term; now automatic extension of 67 years for second term
Created before 1-1-78 but not published | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2002, whichever is greater
Created before 1-1-78 but published between then and 12-31-2002 | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2047, whichever is greater
Chart created by Lolly Gasaway. Updates at http://www.unc.edu/~unclng/public-d.htm
Works for hire • Usual case -- works created by faculty are not the property of the university. – Faculty surrender copyright to publishers of journals and books – Some publishers allow faculty to retain copyright, giving the publisher specific limited rights to reproduce and distribute the work.
Fair use • No clear, easy answers. • Checklist provided in the article is a good guide to the issues. • Link to the checklist: http://www.nyu.edu/its/humanities/ninchguide/IV/ – Search for checklist
Moral rights • Fair to the creator – Keep the identity of the creator of the work – Do not cut the work – Generally, be considerate of the person (or institution) that created the work.
Getting Permission • With the best will in the world, getting the appropriate permissions is not always easy. – Identify who holds the rights – Get in touch with the rights holder – Get a suitable agreement to cover the needs of your use. • Useful links: http://www.loc.gov/copyright/ http://www.copylaw.com/new_articles/permission.html http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/chapter1/1-b.html http://www.k-state.edu/academicpersonnel/intprop/permission.htm - Includes sample letters requesting permission.
Plagiarism and Infringement Dear Rich: I am a romance novelist and occasionally I borrow material from other books for my historical romances. I’m confused about the difference between plagiarism and infringement. A plagiarist is a person who poses as the originator of words he did not write, ideas he did not conceive, or facts he did not discover. “Plagiarism” is not a legal term; it’s an ethical term. You can plagiarize someone without infringing. For example, if a plagiarist only copies public domain materials, he can’t be sued for copyright infringement. And you can infringe without plagiarizing. For example, this whole answer is pretty much lifted from Chapter 14 of Stephen Fishman’s Nolo book, The Public Domain. (See… I’ve provided attribution; let’s hope he doesn’t sue :-).) Which is worse? A whiff of plagiarism can damage a romance novelist’s reputation, while infringement means dealing with lawyers and hefty judgments. Source: http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/chapter1/1-b.html
Checking copyright status Source: NINCH Guide to Good Practice. Chapter 4: Rights Management
Considering people depicted in the work Source: NINCH Guide to Good Practice. Chapter 4: Rights Management Copyright: Lauryn G. Grant
Spot check Part 1: 5-7 minutes • Working in groups of two or three, construct a scenario of when a work might be used. – Put it in your digital library? – Quote it in a paper? – Use it for an assignment? – Use it to bolster an argument? • Be specific about the exact nature of the work. Image? Text? When created, who created it, etc.
Spot check Part 2 – Rights management 10-15 minutes • Pass your scenario to another group, and receive one in turn. • Make a decision about the rights management issues related to the work you received. • Write a brief summary of the issues involved and what needs to be done
Technical issues • Link the resource to the copyright statements • Maintain that link when the resource is copied or used • Approaches: – Steganography – Encryption – Digital Wrappers – Digital Watermarks
Issues in Encryption • General cases for protection of controlled content: Concern for passive listening, active interference. – Listening: intruder gains information, may not be detected. Effects indirect. – Active interference • Intruder may prevent delivery of the message to the intended recipient. • Intruder may substitute a fake message for the intended one • Effects are direct and immediate • Less likely in the case of digital library content
Message interception (diagram) • Original message (plain text) → Encoding method → Ciphertext → Decoding method → Received message • Intruder: eavesdropping on the ciphertext, or masquerading as a legitimate party
Types of Encryption Methods • Substitution – Simple adjustment, Caesar’s cipher • Each letter is replaced by one that is a fixed distance from it in the alphabet. A becomes D, B becomes E, etc. At the end, wrap around, so X becomes A, Y becomes B, Z becomes C. • May have been confusing the first time it was done, but it would not have taken long to figure it out. • Note the simple example at geocaching.com: No intention to hide or confuse. Just keep a person from seeing too much information about the hide, unless the person wants to see the help. – Simple substitution of other characters for letters -- numbers, dancing men, etc. – More complex substitution. No pattern to the replacement scheme. • See common cryptogram puzzles. These are usually made easier by showing the spaces between the words. (For a very modern version, see http://www.cryptograms.org/)
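The Caesar shift described on this slide is short enough to write out. The sketch below is illustrative only; the function name caesar and the sample text are choices made for this example.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text, shift):
    """Shift each letter A-Z by `shift` places, wrapping around at Z.

    Characters that are not letters (spaces, punctuation) pass through unchanged.
    """
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)

ciphertext = caesar("DIGITAL LIBRARY", 3)   # shift of 3: A becomes D, X becomes A
print(ciphertext)
print(caesar(ciphertext, -3))               # shifting back by 3 recovers the text
```

Because there are only 25 useful shifts, an intruder can simply try them all, which is why the slide says it would not take long to figure out.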
Dancing Men?? • Arthur Conan Doyle: The Adventure of the Dancing Men. A Sherlock Holmes Adventure. “Speaking roughly, T, A, O, I, N, S, H, R, D, and L are the numerical order in which letters occur; but T, A, O, and I are very nearly abreast of each other, and it would be an endless task to try each combination until a meaning was arrived at.” Read the story online and see the images and analysis of the decoding at http://camdenhouse.ignisart.com/canon/danc.htm
Types of encryption - 2 Hiding the text. • The wax tablet example – message written on the base of the tablet and wax put over top of it with another message on the wax • Steganography: (ste-g&n-o´gr&-fē) (n.) The art and science of hiding information by embedding messages within other, seemingly harmless messages. Steganography works by replacing bits of useless or unused data in regular computer files (such as graphics, sound, text, HTML, or even floppy disks) with bits of different, invisible information. This hidden information can be plain text, cipher text, or even images. • Special software is needed for steganography, and there are freeware versions available at any good download site. • Can be used to insert identification into a file to track its source. Definition from www.webopedia.com
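One common way to "replace bits of useless data" is to overwrite the least significant bit of each byte in a cover file. The sketch below assumes a plain byte string stands in for image or sound samples; the names embed and extract are hypothetical, and real tools handle file formats, lengths, and compression that this ignores.

```python
def embed(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the lowest bit of successive bytes of `cover`."""
    # Unpack the secret into individual bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in secret for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this secret")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit      # clear the low bit, then set it to the secret bit
    return bytes(out)

def extract(stego: bytes, n_bytes: int) -> bytes:
    """Read back `n_bytes` of hidden data from the low bits of `stego`."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(256))                   # stand-in for pixel or audio sample values
stego = embed(cover, b"LOC")
print(extract(stego, 3))
```

Each cover byte changes by at most 1, which is why the alteration is hard to see or hear in an image or sound file.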
Types of encryption - 3 • Key-based shuffling – Using a mnemonic to make the key easy to remember. • A machine to do the shuffling A A B B C C D D What shuffling is used? How would “CAB” look?
Monoalphabetic codes • Any kind of substitution in which just one letter (or other symbol) represents one letter from the original alphabet is called monoalphabetic encoding. – Such codes are easy to break. That is what you do when you solve cryptograms. – Frequency distributions of letters in normal text for a given language are well known. • “The twelve most frequently-used letters in the English language are ETAOIN SHRDL, in that order.” - http://www.cryptograms.org/
Letter distributions in English
Letters (%) | Letters (%) | Digrams (%) | Digrams (%) | Words (%)
A 7.81 | N 7.28 | TH 3.18 | OU 0.72 | THE 6.42
B 1.28 | O 8.21 | IN 1.54 | IT 0.71 | OF 4.02
C 2.93 | P 2.15 | ER 1.3 | ES 0.69 | AND 3.15
D 4.11 | Q 0.14 | RE 1.30 | ST 0.68 | TO 2.36
E 13.05 | R 6.64 | AN 1.08 | OR 0.68 | A 2.09
F 2.88 | S 6.46 | HE 1.08 | NT 0.67 | IN 1.77
G 1.39 | T 9.02 | AR 1.02 | HI 0.68 | THAT 1.25
H 5.85 | U 2.77 | EN 1.02 | EA 0.64 | IS 1.03
I 6.77 | V 1.00 | TI 1.02 | VE 0.64 | I 0.94
J 0.23 | W 1.49 | TE 0.98 | CO 0.59 | IT 0.93
K 0.42 | X 0.30 | AT 0.88 | DE 0.55 | FOR 0.77
L 3.60 | Y 1.51 | ON 0.84 | RA 0.55 | AS 0.76
M 2.62 | Z 0.09 | HA 0.84 | RO 0.55 | WITH 0.76
SOURCE: Tanenbaum, Computer Networks, 1981, Prentice Hall
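Counting letters in a ciphertext and comparing the result against a table like this one is the first step in breaking a monoalphabetic code. A minimal sketch, assuming English text and ignoring everything except A-Z (the function name and the sample ciphertext are made up for illustration):

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, percent) pairs for A-Z in `text`, most frequent first."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]

# Hypothetical ciphertext; in a real cryptogram the most frequent symbols
# are good candidates for E, T, A, O and so on.
sample = "WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ"
for letter, percent in letter_frequencies(sample)[:5]:
    print(f"{letter} {percent:.2f}")
```

On short messages the counts are noisy, so the ranking is a guide to guesses rather than a direct decoding.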
Spot Check • Go to the cryptogram site (www.cryptograms.org) and solve a puzzle. • Work in groups of two or three • What information is helpful? • What makes a puzzle hard? • Suppose there were no spaces between the words? Then what would you do?
Disguising frequencies • First trick: use more than 26 symbols and use several different symbols to represent the same letter. The goal is to even out the distribution. • Ex. Use the letters plus the digits. – 36 symbols – Assign five symbols to the letter E, two to the letter I, three to the letter N, two each to R and S.
More complex • Vigenere’s table • Arrange all the letters of the alphabet 26 times, in parallel columns, such that each column begins with a different letter, first A, then B, etc. • Encode each letter by using a different column for each successive letter of the message. • How to know which column to use? Use a keyword. Examples and breaking: http://www.cs.trincoll.edu/~crypto/historical/vigenere.html
Vigenere Cipher • Write out the message • Write the key over the message, repeating as many times as necessary. • To encrypt, use the ROW corresponding to the key letter and find the intersection with the COLUMN of the plaintext letter. • Reverse to decrypt: use the COLUMN of the key and scroll down to the row indicated by the ciphertext. The intersection shows the plaintext. • Question -- how long should the keyword be? Long is hard to remember, short repeats too often.
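Looking up a row and column in the Vigenere table is the same as adding the key letter's position to the plaintext letter's position, modulo 26. The sketch below relies on that equivalence and assumes the message and key contain only the letters A-Z with no spaces; the function name is chosen for this example.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(text, key, decrypt=False):
    """Encrypt (or decrypt) `text` with a repeating keyword.

    Both `text` and `key` must consist of uppercase letters A-Z only.
    """
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ALPHABET.index(key[i % len(key)])   # key repeats as often as needed
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)
print(vigenere(ciphertext, "LEMON", decrypt=True))
```

Note that the two A's at positions 1 and 4 of the message encrypt to different letters, which is what hides the single-letter frequencies.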
Spot Check • Make up a key • Encode a plain text message (not more than 20 characters, but at least 10) • Pass the key and the encoded message to another team. • Decode the message you receive.
How secure? • The Vigenere cipher looks really hard, but is not secure. Since the keyword repeats, it is really just a bunch of monoalphabetic codes. If you can figure out the length of the keyword, you can do standard analysis. • (It was considered unbreakable for nearly 300 years) • Making it harder - instead of a regular arrangement of the letter columns, scramble them in some arbitrary way. – Makes decoding much more difficult, but also makes it difficult to have the arrangement known to the people who are supposed to be able to read the message.
Enigma • Suppose we take a conversion for the first letter of the message and a different mapping for the next letter … • That is what we did with Vigenere • Add additional encodings. Rotate from a fixed starting point through 26 positions of the first set of columns, then iterate a second set of columns. Now have 676 different mappings. • To decode, must figure out the wiring inside each phase, and the order in which they are arranged in the machine.
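The "676 different mappings" idea can be sketched with two rotating substitution tables. This is a deliberately simplified model: the wirings are arbitrary shuffles rather than real Enigma rotors, there is no reflector or plugboard, and all names are hypothetical.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def make_rotor(seed):
    """Return an arbitrary fixed permutation of 0..25, standing in for rotor wiring."""
    wiring = list(range(26))
    random.Random(seed).shuffle(wiring)
    return wiring

def invert(wiring):
    """Return the inverse permutation, used for decoding."""
    inverse = [0] * 26
    for i, j in enumerate(wiring):
        inverse[j] = i
    return inverse

ROTOR1, ROTOR2 = make_rotor(1), make_rotor(2)
INV1, INV2 = invert(ROTOR1), invert(ROTOR2)

def positions(i):
    """Odometer stepping: rotor 1 advances every letter, rotor 2 every 26 letters."""
    return i % 26, (i // 26) % 26

def encode(text):
    out = []
    for i, ch in enumerate(text):
        p1, p2 = positions(i)
        x = ALPHABET.index(ch)
        y = ROTOR2[(ROTOR1[(x + p1) % 26] + p2) % 26]
        out.append(ALPHABET[y])
    return "".join(out)

def decode(text):
    out = []
    for i, ch in enumerate(text):
        p1, p2 = positions(i)
        y = ALPHABET.index(ch)
        x = (INV1[(INV2[y] - p2) % 26] - p1) % 26   # undo rotor 2, then rotor 1
        out.append(ALPHABET[x])
    return "".join(out)

message = "DIGITALLIBRARY" * 3          # longer than 26 letters, so rotor 2 steps once
print(decode(encode(message)) == message)
print(len({positions(i) for i in range(26 * 26)}))   # number of distinct rotor settings
```

The receiver needs the same wirings and the same starting positions, which is the key-distribution problem raised later in the lecture.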
Enigma • German engineer Arthur Scherbius (1878-1929) invented a machine of this type around 1918 and also bought the patent rights to one invented in Holland. He added a reflecting cylinder, which allowed the same machine to encode and decode. He called the machine enigma, from the Greek for riddle. • The enigma used by the Germans in WWII had three rotors, and later four.
Enigma - 2
Encryption/Decryption Keys • Problem is that you have to get the key to the receiver, secretly and accurately. • If you can get the key there, why not use the same method to send the whole message? (Efficiency of scale) • If the key is compromised without the communicators knowing it, the transmissions are open. • Exact working of the enigma machine: – http://www.codesandciphers.org.uk/enigma/example1.htm – How Polish mathematicians broke the enigma – http://www.codesandciphers.org.uk/virtualbp/poles.htm
Summary of encryption goals • High level of data protection • Simple to understand • Complex enough to deter intruders • Protection based on the key, not the algorithm • Economical to implement • Adaptable for various applications • Available at reasonable cost
Data Encryption Standard • Complex sequence of transformations – hardware implementations speed performance – modifications have made it very secure • Known algorithm – security based on difficulty in discovering the key • http://www.itl.nist.gov/fipspubs/fip46-2.htm
The Data Encryption Standard Illustrated 64-bit blocks, 64-bit key (56 effective key bits plus 8 parity bits) Federal Information Processing Standards 46-2 http://www.itl.nist.gov/fipspubs/fip46-2.htm
INTERNET-LINKED COMPUTERS CHALLENGE DATA ENCRYPTION STANDARD LOVELAND, COLORADO (June 18, 1997). Tens of thousands of computers, all across the U.S. and Canada, linked together via the Internet in an unprecedented cooperative supercomputing effort to decrypt a message encoded with the government-endorsed Data Encryption Standard (DES). Responding to a challenge, including a prize of $10,000, offered by RSA Data Security, Inc, the DESCHALL effort successfully decoded RSADSI's secret message. According to Rocke Verser, a contract programmer and consultant who developed the specialized software in his spare time, "Tens of thousands of computers worked cooperatively on the challenge in what is believed to be one of the largest supercomputing efforts ever undertaken outside of government." Using a technique called "brute-force", computers participating in the challenge simply began trying every possible decryption key. There are over 72 quadrillion keys (72,057,594,037,927,936). At the time the winning key was reported to RSADSI, the DESCHALL effort had searched almost 25% of the total. At its peak over the recent weekend, the DESCHALL effort was testing 7 billion keys per second.
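The numbers in the press release can be checked with a little arithmetic. The sketch below takes the key count as 2 to the 56th power and the peak rate of 7 billion keys per second from the text; sustaining the peak rate throughout is an assumption made only to get a rough lower bound on search time.

```python
def exhaustive_search_days(key_bits, keys_per_second):
    """Days needed to try every key of `key_bits` bits at a constant rate."""
    total_keys = 2 ** key_bits
    seconds = total_keys / keys_per_second
    return seconds / 86_400               # 86,400 seconds in a day

total_keys = 2 ** 56                      # DES has 56 effective key bits
peak_rate = 7_000_000_000                 # keys per second, from the press release

print(f"{total_keys:,} keys")
print(f"{exhaustive_search_days(56, peak_rate):.0f} days to try them all at the peak rate")
print(f"{0.25 * exhaustive_search_days(56, peak_rate):.0f} days to cover 25% at the peak rate")
```

Each additional key bit doubles the search time, which is the practical meaning of "protection based on the key, not the algorithm".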
Public Key encryption • Eliminates the need to deliver a key • Two keys: one for encoding, one for decoding • Known algorithm – security based on security of the decoding key – note, no key delivery problem • Essential element: – knowing the encoding key will not reveal the decoding key
Effective Public Key Encryption • Encoding method E and decoding method D are inverse functions on message M: – D(E(M)) = M • Computational cost of E, D reasonable • D cannot be determined from E, the algorithm, or any amount of plaintext attack with any computationally feasible technique • E cannot be broken without D (only D will accomplish the decoding) • Any method that meets these criteria is a valid Public Key Encryption technique
It all comes down to this: • key used for decoding is dependent upon the key used for encoding, but the relationship cannot be determined in any feasible computation or observation of transmitted data
Rivest, Shamir, Adleman (RSA) • Choose 2 large prime numbers, p and q, each more than 100 digits • Compute n=p*q and z=(p-1)*(q-1) • Choose d, relatively prime to z • Find e, such that e*d=1 mod (z) – or e*d mod z = 1, if you prefer. • This produces e and d, the two keys that define the E and D methods.
Public Key encoding • Convert M into a bit string • Break the bit string into blocks, P, of size k – k is the largest integer such that 2^k < n • Encode each block: C = P^e mod n • Decode each block: P = C^d mod n
An example: • p=7; q=11; n=77; z=60 • d=13; e=37; k=6 • Test message = CAT • Using A=1, etc. and 5-bit representation: – 00011 00001 10100 • Since k=6, regroup the bits (arrange right to left so that any padding needed will put 0's on the left and not change the value): – 000000 110000 110100 (three leading zeros added to fill the block) • decimal equivalent: 0 48 52 • Each of those raised to the power 37 (e) mod n: 0 27 24 • Each of those values raised to the power 13 (d) mod n (convert back to the original): 0 48 52
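The CAT example can be reproduced directly. The sketch below uses the slide's toy values (p=7, q=11, d=13) and Python's built-in pow for modular exponentiation; the helper names are hypothetical, pow(d, -1, z) needs Python 3.8 or later, and numbers this small offer no security.

```python
p, q = 7, 11
n = p * q                    # 77
z = (p - 1) * (q - 1)        # 60
d = 13                       # chosen relatively prime to z
e = pow(d, -1, z)            # modular inverse of d, so that e*d mod z == 1

# k is the largest integer with 2**k < n
k = 0
while 2 ** (k + 1) < n:
    k += 1

def to_blocks(message, k):
    """Encode letters as 5-bit values (A=1), then regroup into k-bit blocks from the right."""
    bits = "".join(format(ord(c) - ord("A") + 1, "05b") for c in message)
    bits = "0" * (-len(bits) % k) + bits            # pad with zeros on the left
    return [int(bits[i:i + k], 2) for i in range(0, len(bits), k)]

blocks = to_blocks("CAT", k)
cipher = [pow(P, e, n) for P in blocks]             # C = P^e mod n
plain = [pow(C, d, n) for C in cipher]              # P = C^d mod n

print(e, k)
print(blocks, cipher, plain)
```

With real key sizes the same three lines of arithmetic apply; only the size of p and q, and therefore the cost of each pow call, changes.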
A practical note • There is a lot more to security than encryption. • Encryption coding is done by a few experts • Understanding how the common encryption algorithms work is useful in choosing the right approach for your situation. • Our interest here is in providing assurance that access to protected resources will be limited to those with legitimate rights.
On a practical note: PGP • You can create your own real public and private keys using PGP (Pretty Good Privacy) • See the following Web site for full information. • (MIT site - obsolete) • http://www.pgpi.org/products/pgp/versions/freeware/ • http://www.freedownloadscenter.com/Utilities/Required_Files/PGP.html
Issues • Intruder vulnerability – If an intruder intercepts a request from A for B’s public key, the intruder can masquerade as B and receive messages from A intended for B. The intruder can then send those same or different messages on to B, pretending to be A. – Prevention requires authentication of the public key to be used. • Computational expense – One approach is to use Public Key Encryption to send the key for use in DES, then use the faster DES to transmit messages
Digital Signatures • Some messages do not need to be encrypted, but they do need to be authenticated: reliably associated with the real sender – Protect an individual against unauthorized access to resources or misrepresentation of the individual’s intentions – Protect the receiver against repudiation of a commitment by the originator
Digital Signature basic technique • Sender A to Receiver B: intention to send • B to A: E(Random Number), where E is A’s public key • A to B: message and D(E(Random Number)) = Random Number, decoded as only A could do
Public key encryption with implied signature • Add the requirement that E(D(M)) = M • Sender A has encoding key EA, decoding key DA • Intended receiver has encoding (public) key EB. • A produces EB(DA(M)) • Receiver calculates EA(DB(EB(DA(M)))) – Result is M, but also establishes that only A could have encoded M
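The composition EB(DA(M)) can be tried out with two toy RSA key pairs. In this sketch A's keys are the lecture's p=7, q=11 example; B's keys (p=5, q=17, e=3, d=43) are hypothetical values picked so that B's modulus is larger than A's, which keeps DA(M) within range when B's key is applied.

```python
# Toy key pairs, far too small for real use.
A = {"n": 77, "e": 37, "d": 13}      # lecture example: p=7, q=11, z=60
B = {"n": 85, "e": 3, "d": 43}       # assumed: p=5, q=17, z=64, 3*43 = 129 = 2*64 + 1

def E(key, m):
    """Apply the public (encoding) half of a key pair."""
    return pow(m, key["e"], key["n"])

def D(key, m):
    """Apply the private (decoding) half of a key pair."""
    return pow(m, key["d"], key["n"])

def send(m):
    """A signs with DA, then encrypts for B with EB."""
    return E(B, D(A, m))

def receive(c):
    """B decrypts with DB, then checks A's signature with EA."""
    return E(A, D(B, c))

message = 52                          # must be smaller than A's modulus
print(receive(send(message)) == message)
```

Only A knows DA, so a result that makes sense after applying EA is evidence that A produced it; only B knows DB, so nobody else can read it.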
Digital Signature Standard (DSS) • Verifies that the message came from the specified source and also that the message has not been modified • More complexity than simple encoding of a random number, but less than encrypting the entire message • Message is not encoded. An authentication code is appended to it.
Digital Signature – SHA (Secure Hash Algorithm) FIPS Pub 186 - Digital Signature Standard http://www.itl.nist.gov/fipspubs/fip186.htm
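The idea of appending an authentication code instead of encrypting the message can be sketched as "hash, then sign the hash". This is not the DSA algorithm that the Digital Signature Standard specifies: it swaps in a small RSA-style key built from two known Mersenne primes and uses SHA-256 from Python's hashlib, purely to show how a hash ties the signature to the exact message contents.

```python
import hashlib

# Demonstration key only: two Mersenne primes give a modulus of about 150 bits,
# which is far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the message and reduce it to a number below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Authentication code: the digest transformed with the private key."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (E, N) can check the code against the message."""
    return pow(signature, E, N) == digest(message)

message = b"Map g3715 may be displayed to registered users."
signature = sign(message)

print(verify(message, signature))
print(verify(message + b" And to everyone else.", signature))
```

The message itself travels in the clear; only the short appended code depends on the private key, which is why this costs less than encrypting the whole message.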
Encryption summary • Problems – intruders can obtain sensitive information – intruder can interfere with correct information exchange • Solution – disguise messages so an intruder will not be able to obtain the contents or replace legitimate messages with others
Important methods • DES – fast, reasonably good encryption – key distribution problem • Public Key Encryption – more secure • based on the difficulty of factoring very large numbers – no key distribution problem – computationally intense
Digital signatures • Authenticate messages so the sender cannot repudiate the message later • Protect messages from changes during transmission or at the receiver’s site • Useful when the contents do not need encryption, but the contents must be accurate and correctly associated with the sender
Legal and ethical issues • People who work in these fields face problems with allowable exports, and are not always allowed to talk about their work. • Is it desirable to have government able to crack all codes? • What is the tradeoff between privacy of law abiding citizens vs. the ability of terrorists and drug traffickers to communicate in secret?
Tonight • Further detail of Dublin Core • Look at DL metadata example • Access management – Encryption – Digital Signatures