0ba82b5e2f44e6203c70977befb8a2f6.ppt
- Number of slides: 51
Internet Agents • Web search Agents • Information filtering agents • Off-line delivery agents • Notification agents • Service agents • Web site agents • Mobile agents
Information Search • Ways to Find Information – Browsing: Following hyper-links that seem of interest – Searching: Sending a query to a search engine such as Lycos – Categories: Following existing categories such as Yahoo • Problems – Browsing takes a lot of time and effort to navigate. Can search be made more efficient? – Searching works, but it is difficult to accurately express the user’s intention. – Search engines are not personalized
Search Engines: Web etiquette guidelines for spiders • Identify the name of the agent • Identify the user deploying the agent • Announce the agent by posting a message to the comp.infosystems.www.providers Usenet newsgroup • Announce the agent to the Webmasters of the servers the agent will visit • Provide additional information (using the Referrer field) • Be accessible to fix problems the agent may cause • Design the agent so it does not consume a lot of resources (e.g. does not use successive hits on a single server, does not loop, runs at appointed times, etc.)
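The etiquette guidelines above can be sketched in a few lines. This is only an illustration under stated assumptions: `etiquette_headers` and `PoliteScheduler` are hypothetical names, the agent name, e-mail address and URL are placeholders, and the mapping of the guidelines onto the HTTP `User-Agent`, `From` and `Referer` header fields is one plausible reading of the slide. No network request is made.

```python
from urllib.parse import urlparse


def etiquette_headers(agent_name, operator_email, info_url):
    """Request headers that identify the agent, its operator, and an info page."""
    return {
        "User-Agent": agent_name,  # name of the agent
        "From": operator_email,    # user deploying the agent
        "Referer": info_url,       # HTTP spells the "Referrer" field this way
    }


class PoliteScheduler:
    """Tracks the earliest time each host may be contacted again."""

    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self.next_allowed = {}  # host -> earliest permitted request time
        self.seen = set()       # URLs already scheduled, so the agent does not loop

    def schedule(self, url, now):
        """Return the time at which `url` may be fetched, or None if already seen."""
        if url in self.seen:
            return None
        self.seen.add(url)
        host = urlparse(url).netloc
        when = max(now, self.next_allowed.get(host, now))
        # Space out requests so there are no successive hits on a single server.
        self.next_allowed[host] = when + self.min_delay
        return when
```

A caller would fetch each URL no earlier than the time returned by `schedule`, sending the headers from `etiquette_headers` with every request.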
Advantages and Disadvantages of Search Engines (Feature: Advantage / Disadvantage) • Keyword query: Ease of use / Lost productivity due to poor precision • Instant response: Increased productivity, if user knows what he is looking for / Decreased productivity, due to chasing links • Hierarchical subject categories: Increased productivity due to high precision / Low recall in response to user needs • Information discovery via spiders: Reduced user workload / Lack of scalability and bandwidth inefficiency
Limitations of current search engines • Lack of personalization; this results in low precision of answers • Unscaleability: *the robot must visit not only new links but also old ones to keep them up to date; *the information gathering is centralized Some solutions to scalability issues: • use specialized information brokers for building information indices • use massive replication and caching of popular information • distributed information gathering by placing gatherers on the provider’s site; thus information is ready for analysis as new information comes in, but the provider must implement the software.
Information Filtering Agents • Information Filtering agents find the content of interest to a user. • Information Filtering agents could gather information from different sources • They could filter information based on user’s personal interest • Filtering agents typically use a fixed number of information sources • Information filtering agents may use Information Retrieval techniques *Vector space models, where a document is represented as a vector of attributes *Tree structure, which represents a hierarchical view of a document
Filtering Agent Attributes (Element: Description) • Environment: Internet • Task skills: Information gathering, filtering, presentation • Knowledge: Web, news in different domains • Communication: HTTP, HTML, indexing protocols
Filtering Agent Architecture
Filtering Agent Architecture • Figure 3.4: Filtering based on word usage [plot of word usage frequency, marking the regions of insignificant low-frequency words and insignificant high-frequency words]
Benefits of Information Filtering Agents (Feature: Advantage / User benefit) • Information profile: Easy-to-use, form-based spec / Good for persistent interests • Web page delivery: Info available as Web page / Browser independent; requires site visits • E-mail delivery: Proactive information delivery / Eliminates site visits; e-mail clutter • Profile filtering: One-to-one “broadcasting” / Reduced information overload • Heterogeneous: Combines hetero info sources / Reduces subscription costs
Functionality of WebMate • Learning user’s interests for information filtering – Multiple TF-IDF vectors representation – Incremental and adaptive Learning – Compile personal newspaper • Support for efficiently finding information – Automatic refinement using Trigger Pairs – Relevance feedback _______________ Chen, Sycara, “WebMate: A Personal Agent for Browsing and Searching”, Proceedings of the Second International Conference on Autonomous Agents, Minneapolis, MN, May 1998
Profile Representation • Multiple TF-IDF vectors representation • How many vectors are used? (Settable parameters; depends on # User’s interests, Computational complexity) • How many dimensions are used in a vector? (Computational complexity, typical lexicons in a domain)
Learning Algorithm • Preprocess: Parse HTML page, delete stop words, stemming • Extract TF-IDF vector of the current interesting document • If the number of vectors in the profile is less than predefined number, add the vector to the profile • Otherwise, calculate the cosine similarity between every two TF-IDF vectors in the profile • Combine the two vectors with the greatest similarity. • Sort the weights in the new vector in decreasing order and keep the highest several elements
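The learning steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the tiny stop-word list, the smoothed IDF formula, merging by adding weights, and the choice to include the new vector when looking for the most similar pair are all assumptions. HTML parsing and stemming are omitted.

```python
import math
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}  # tiny illustrative list


def tfidf_vector(text, doc_freq, n_docs):
    """Sparse TF-IDF vector (dict) for one document; uses a smoothed IDF variant."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    return {w: tf * (math.log((1 + n_docs) / (1 + doc_freq.get(w, 0))) + 1.0)
            for w, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def learn(profile, new_vec, max_vectors, max_terms):
    """Return the updated profile (a list of vectors); assumes max_vectors >= 1."""
    if len(profile) < max_vectors:
        return profile + [new_vec]
    pool = profile + [new_vec]
    # Find the pair of vectors with the greatest cosine similarity.
    i, j = max(combinations(range(len(pool)), 2),
               key=lambda ij: cosine(pool[ij[0]], pool[ij[1]]))
    merged = Counter(pool[i])
    merged.update(pool[j])  # adds the weights term by term
    # Sort weights in decreasing order and keep only the highest ones.
    top = dict(merged.most_common(max_terms))
    return [v for k, v in enumerate(pool) if k not in (i, j)] + [top]
```

Keeping the number of vectors and the number of terms per vector as parameters mirrors the settable limits the slides mention for the profile representation.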
Compile Personal Newspaper • Automatically spider a list of URLs or construct a query from the profile • Calculate the similarity and check whether the similarity is greater than some threshold • Experiments: Accuracy in top 10 is between 50% and 60%; accuracy in top 20 is about 50%; overall accuracy is about 30%
Search Refinement • Trigger Pairs Based Automated Refinement – If a word S is significantly¹ correlated with another word T, then (S, T) is considered a “trigger pair”, with S being the trigger and T the triggered word. • Relevance Feedback – The context of the search keywords in the “relevant” pages is used to automatically refine the search • Parallel Search and Rerank • Similarity-based Query __________ ¹Significance is measured by mutual information (MI).
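The slide's own MI formula is not present in this text, so the following is only a sketch that assumes the standard average mutual information between the two binary events "S occurs in a window" and "T occurs in a window". The function name and the count-based interface are hypothetical.

```python
import math


def mutual_information(n_both, n_s, n_t, n_total):
    """Average mutual information, in bits, between 'S occurs' and 'T occurs'.

    n_both:  number of windows containing both S and T
    n_s:     number of windows containing S
    n_t:     number of windows containing T
    n_total: total number of windows
    """
    p_s, p_t = n_s / n_total, n_t / n_total
    joint = {
        (True, True): n_both / n_total,
        (True, False): (n_s - n_both) / n_total,
        (False, True): (n_t - n_both) / n_total,
        (False, False): (n_total - n_s - n_t + n_both) / n_total,
    }
    mi = 0.0
    for (s, t), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            ps = p_s if s else 1 - p_s
            pt = p_t if t else 1 - p_t
            mi += p * math.log2(p / (ps * pt))
    return mi
```

Under this definition, independent words score 0, and pairs whose score exceeds some chosen cutoff would be kept as trigger pairs.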
Examples of Trigger Pairs • Broadcast News Corpus: 140M words, distance between S and T is 500 • Example 1: product << {maker, company, corporation, industry, incorporate, sale, computer, market, business, …} • Example 2: car << {motor, auto, model, maker, vehicle, for, buick, honda, inventory, assembly, chevrolet, sale, …} • Example 3: fare << {airline, maxsaver, carrier, discount, air, coach, flight, traveler, continental, unrestrict, ticket, …} • Example 4: music << {symphony, orchestra, composer, song, concert, tune, concerto, sound, musician, album, …}
Automatic Search Refinement • The user chooses the domain, and the system automatically expands the query using domain specific triggers or ontology • The user chooses the intended definition of the ambiguous words, and the system expands the query according to that definition • For a search with only one keyword, the top several triggers to the keyword are used to expand the search • For a search with more than 2 keywords, the intersection of the triggers to the keywords is used to expand the search
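The last two rules above can be sketched as a small function. The name `expand_query`, the `triggers` dictionary and its contents are hypothetical. The slide does not say what happens with exactly two keywords, so this sketch assumes the intersection rule applies to any query with more than one keyword.

```python
def expand_query(keywords, triggers, top_n=3):
    """Expand a keyword query with trigger words.

    `triggers` maps a word to its trigger words, strongest first.
    """
    if not keywords:
        return []
    if len(keywords) == 1:
        # Single keyword: use its top several triggers.
        extra = triggers.get(keywords[0], [])[:top_n]
    else:
        # Several keywords: use only triggers shared by all of them.
        lists = [triggers.get(k, []) for k in keywords]
        common = set(lists[0]).intersection(*lists[1:])
        extra = [w for w in lists[0] if w in common]
    return list(keywords) + [w for w in extra if w not in keywords]


# Hypothetical trigger lists, loosely based on the example words in the slides.
TRIGGERS = {
    "car": ["motor", "auto", "model", "maker", "vehicle"],
    "product": ["maker", "company", "corporation", "industry", "sale"],
}
```

With these lists, a query for `car` alone gains its strongest triggers, while a query for `car product` gains only the shared word `maker`.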
Relevance Feedback Algorithm • The context of the search keywords in the “relevant” pages is used to refine the search • Given a relevant page, the system looks for the context of the keywords, and calculates the frequency in order to use the top several frequent words to expand the query
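A minimal sketch of this relevance feedback step follows. It assumes that "context" means the words within a fixed window around each keyword occurrence; the function name, the window size and whitespace tokenization are illustrative choices, not details from the slides.

```python
from collections import Counter


def feedback_terms(page_text, keywords, window=3, top_n=5):
    """Return the most frequent words found within `window` tokens of a keyword."""
    tokens = page_text.lower().split()
    keys = {k.lower() for k in keywords}
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in keys:
            lo, hi = max(0, i - window), i + window + 1
            # Count the neighbours of this keyword occurrence, not the keyword itself.
            counts.update(t for t in tokens[lo:hi] if t not in keys)
    return [w for w, _ in counts.most_common(top_n)]
```

The returned words would then be appended to the original query before it is sent to the search engine again.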
The Query Restart Problem • Agent A sends query to Agent B. • Agent B can complete the query in time $X$, where $X = 1$ with probability $p$ and $X = c$ ($c > 1$) with probability $1 - p$. Expectation: $E[X] = p + (1 - p)c$ • If not done by time 1, should agent A abort and restart, or wait? • Can restarting reduce expectation? The variance? Both? • Does it help to repeatedly restart k times? ____________ Chalasani, Jha, Shehory, Sycara, “Query Restart Strategies for Web Agents”, Proceedings of Autonomous Agents 98, Minneapolis, MN, May 1998
A Simple Scenario: Single restart • Strategy: restart just after time 1, if not done by then. • Let $X_i$ = completion time of the $i$-th query, $i = 1, 2$; $X_1, X_2$ are independent, identically distributed. • New completion time: $Y = 1$ if $X_1 = 1$, and $Y = 1 + X_2$ if $X_1 = c$. • New expectation: $E[Y] = p + (1 - p)(1 + E[X_2]) = 1 + p(1 - p) + (1 - p)^2 c$. • $E[Y] < E[X_1]$ if and only if $c > 1 + 1/p$.
A Simple Scenario: k Restarts • [Figure: plot against the number of restarts k]
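The expected completion time under restarts can be checked numerically. The sketch below assumes the strategy "restart just after time 1, at most k times, then wait for the last query", which gives the recursion $E_k = p + (1 - p)(1 + E_{k-1})$ with $E_0 = E[X]$. The function name is hypothetical.

```python
def expected_time(p, c, k=0):
    """Expected completion time when restarting after time 1, at most k times."""
    e = p + (1 - p) * c  # k = 0: never restart, so this is E[X]
    for _ in range(k):
        # Done at time 1 with probability p; otherwise 1 unit is lost and
        # the remaining strategy (one restart fewer) starts over.
        e = p + (1 - p) * (1 + e)
    return e
```

For p = 0.5 the threshold is c = 1 + 1/p = 3: with c = 4 a single restart lowers the expectation from 2.5 to 2.25, with c = 3 it makes no difference, and with c = 2 it raises it from 1.5 to 1.75.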
Off-Line Delivery Agents • Information filtering agents that deliver personalized information without the need for a direct Internet connection • Off-line Delivery Agent Attributes (Element: Description) • Environment: Internet, news feeds • Task skills: Information • Knowledge: Web, news, finance, sports, weather • Communication skills: HTTP, Meta tags, Desktop OS
Benefits of Off-line Delivery Agents (Feature: Advantage / Benefit) • Direct delivery: Transparent delivery / User does not need to visit sites • Automatic delivery: Delivery according to user specified schedule / Avoidance of peak traffic hours • Local viewing: HTML links are locally resolved / Avoids the need to get on-line • Disk management: New information replaces out of date / Relieves user from disk management task
Notification Agents • A notification agent is one that notifies a user of significant events, i.e. a change in the state of information, e.g. • Content change in a particular Web page • Search engine additions for specific keyword queries • User-specified reminders for personal events (e.g. birthdays) • Notification Agent Attributes (Element: Description) • Environment: Internet • Task skills: Monitoring, determining, and notifying change in information • Knowledge: Web • Communication skills: HTTP, Meta Tag, IDML
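Detecting a content change in a Web page can be sketched by comparing fingerprints of successive downloads. This is an illustration only: the slides do not say how change is determined, and the class name `ChangeMonitor` and the use of SHA-256 are assumptions. Fetching the page is left out, so `body` is whatever bytes the caller downloaded.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Stable digest of a page body."""
    return hashlib.sha256(body).hexdigest()


class ChangeMonitor:
    """Remembers the last fingerprint seen for each URL."""

    def __init__(self):
        self.last_seen = {}  # url -> fingerprint of the previous body

    def check(self, url: str, body: bytes) -> bool:
        """Return True if `body` differs from the previously seen body.

        The first sighting of a URL is recorded but not reported as a change.
        """
        digest = fingerprint(body)
        previous = self.last_seen.get(url)
        self.last_seen[url] = digest
        return previous is not None and previous != digest
```

A notification agent built this way would call `check` on a schedule and alert the user only when it returns `True`.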
Benefits of Notification Agents (Feature: Advantage / Benefit) • Monitoring: Monitors for change in information / Reduces user workload • Browserless monitoring: Monitor only header file or body text / Increased network efficiency • Change determination: Machine check of document change / Reduced user workload • Server implementation: Checks each resource for multiple clients / Eliminates bandwidth waste • Notification: Notifies user of changes / Increases site visits
Other Service Agents • Announcement agents • Business information monitoring agents • Classified ads agents: search database of ads • Direct mail agents: deliver direct mail advertising • Financial service agents: deliver e-mails with prices or other financial news • Food and wine agents • Job agents: virtual recruiters to find appropriate employees • Entertainment agents: find communities of interests similar to the user and recommend items, such as music, movies etc. • Shopping agents: comparison shopping for user-specified items • Site agents: virtual hosts at sites
Shopbots Advantages: • Provide unified interface to different stores, thus mitigating need to navigate and deal with different interfaces • Find best price and availability of a product Challenges: • Virtual stores stop agents since they do not want to be compared on price and availability alone • User’s trust in a shopbot’s ability to notice sales and promotions. Solutions: • Cooperative vendor/agent model • Vendor form learning agent
Collaborative Filtering A collaborative filtering system makes recommendations based on the preferences of similar users. People: Yenta, Referral Web Products: Firefly, Tunes, Syskill & Webert Readings: Wisewire, Phoaks
Content vs. Collaboration • Content-based retrieval returns documents that are similar to a query (search) or a user profile (preference) • Collaborative recommendation retrieves documents liked by others with similar profiles
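Collaborative recommendation can be illustrated with a small user-based sketch. It is not the method of any system named in these slides: the function names, the cosine similarity over co-rated items and the similarity-weighted average are common textbook choices, assumed here for illustration, and the ratings are made up.

```python
import math


def similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    na = math.sqrt(sum(a[i] ** 2 for i in shared))
    nb = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items `user` has not rated by similarity-weighted ratings of others."""
    mine = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        if sim <= 0:
            continue
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]


# Made-up ratings on a 1-5 scale.
RATINGS = {
    "ann": {"x": 5, "y": 1},
    "bob": {"x": 5, "y": 1, "z": 5, "w": 1},
    "cat": {"x": 1, "y": 5, "z": 1, "w": 5},
}
```

Here `ann` agrees with `bob` far more than with `cat`, so the item `bob` liked ranks first for her. A content-based system would instead compare item descriptions with her profile and would not need other users at all.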
Early Apps • GroupLens (1994): Filtered newsgroups; the news client displays predicted scores and the user rates after reading • Phoaks: Recommended webpages; uses frequency-of-mention data within Usenet newsgroups to rate URLs
Getting the Data • Explicit: Firefly (rate → match → recommend) • Implicit: Amazon (purchase → match → recommend) • Priming the Pump: Lifestyle Finder uses demographic data to assign users to market research categories • Over the Shoulder: Letizia uses observed browsing behavior & heuristics to recommend links
Problems in Collaborative Filtering Incentives & Startup • Need a critical mass of users/recommenders to make meaningful predictions • Need mechanisms to maintain participation Reliability • Spoofing: will content providers inflate their ratings? • Technical problems with clustering & similarity measures Privacy • Once you share your profile who else may want it?
Synthetic Agents (e.g. Julia) Julia is a chatterbot that tries to convince users of its humanlike behavior: • Repeating user’s input in questions • Admitting ignorance • Changing the topic of conversation • Using conversational statements • Using humorous statements • Providing excerpts from Usenet News • Simulating typing, mimicking a user’s imperfect performance Possible applications of chatterbots: • Visiting on-line chatrooms on topics of interest to your company • Initiating interesting conversations in chatrooms • Presenting comparison ads against your rivals • Querying information requests about your products • Serving as a site guide for finding information • Serving as a product guide on your site (e.g. demonstrate an automobile)
Intranets Business applications of intranets: • Effective communication medium for enterprises • Create virtual communities within an enterprise • Automating order tracking and transaction processing • Marketing support automation • Customer service and knowledge sharing among customers • Internal help desk to provide guidance for corporate processes and resources • Human resources support
Benefits of Intranet Search Agents (Feature: Advantage / Benefit) • Multidatabase search: Client search of all corporate databases / Increased organizational productivity, reduced costs • Search save on servers: Enables sharing of search results within organization / Reduced workload • Multiple-level access: Allows access of certain field to authorized users / Corporate security control • Proactive notification: Notifies users of change in information / Increased productivity, enhanced corporate communications
Intranet Filtering Agent Attributes (Element: Description) • Environment: Intranet • Task skills: Information organizing, sharing and presentation • Knowledge: Corporate database, workgroup discussions, newsfeeds • Communication skills: HTTP, HTML, OLAP
Benefits of Intranet Filtering Agents (Feature: Advantage / Benefit) • Information profile: Form-based specification of individual workgroup interests / Ideal for persistent but cumbersome for dynamic interests • Notification: Proactive information delivery / Increased site visits and increased productivity by alleviating information search • Profile based filtering: Relevant information for critical decisions / Increased organizational productivity • Heterogeneous information sources: Combines heterogeneous information sources / Increased productivity and reduced subscription costs through sharing
Drawbacks and extended features Drawbacks include: · Separate notification for each user interest, cluttering mailbox · Do not incorporate user model for tracking user’s actions upon information delivery Advanced Features · Recommend an agent for each new user interest topic · Modify an existing agent, based on user’s use of agent recommended information (e. g. specialize an information agent) · Remove an agent that the user does not use · Temporally activate an agent based on user interest and disinterest in the agent’s recommendation
Collaboration Agents The software runs over a network and enables a team to work together and share information. It assists groups in: · Group scheduling · Discussion groups · Resource tracking · Document Management It could do some simple tasks: · Save and re-execute shareable queries that search groupware data bases · Perform a script under pre-specified conditions · Perform a script according to pre-specified schedule
Example: Lotus Notes Agent definition · Agent name with optional comment · When the agent should run: *manually *if new mail has arrived *if documents have been created, modified, deleted *at scheduled times, e.g. hourly, daily etc. · What document should the agent act on? *all documents *all new and modified documents since last time agent ran *all unread documents *selected documents · What should the agent do? *User can enter a LotusScript program that can examine named fields, and apply simple conditional logic.
Process Automation Agents The goal is to use agents to automate workflow in business applications Differences between traditional workflow and agent-based workflow · Traditional workflow is centralized; agents offer a distributed infrastructure · Traditional workflow works only in structured environments; agents could manage workflow during execution · Traditional workflow pre-specifies paths to take for exception handling; agents can negotiate new tasks and resources dynamically
Attributes of Process Automation Agents (Element: Description) • Environment: Intranet • Task skills: Process scheduling, negotiation, execution, and notification • Knowledge: Business processes, resources management • Communication skills: KQML, KIF, CORBA
Advantages of Process Agents (Feature: Advantage / Benefit) • Task scheduling: Schedule user tasks, negotiating with server agents / Alleviate the need for user to be present to execute a task • Resource management: Dynamically allocate resources for task execution / Reduced workload as the user no longer needs to worry about resource availability • Exception handling: Renegotiate to reschedule in response to execution errors / Reduced workload as this is transparent to user • Proactive notifications: Proactively notify user of task completion / Increased productivity by reducing user need to monitor
Database Agents that provide Enterprise-based support · Run scheduled database analyses in the background · Exception reporting for operations management · Notify of information changes in a user-specified database object
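Exception reporting can be illustrated with a small query that a database agent might run on a schedule. Everything specific here is hypothetical: the `inventory` table, its columns, the sample rows and the low-stock rule are invented for the example, and an in-memory SQLite database stands in for the enterprise data store.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical inventory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("bolts", 3), ("nuts", 7), ("washers", 120)],
    )
    conn.commit()
    return conn


def exception_report(conn, threshold):
    """Return (item, quantity) rows that violate the user-defined rule."""
    cur = conn.execute(
        "SELECT item, quantity FROM inventory WHERE quantity < ? ORDER BY item",
        (threshold,),
    )
    return cur.fetchall()
```

An agent would run `exception_report` in the background at scheduled times and notify the user only when the result is non-empty.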
Database Agents: Enterprise data delivery system • [Diagram; labelled components: OLAP Server, DSS Agent Desktop, VLDB Drivers, Oracle, Informix Server, ..., SQL Server]
Database Agents Attributes (Element: Description) • Environment: Intranet • Task skills: Data analysis automation, exception reporting, notification of information change • Knowledge: Data warehouse, metadata, RDBMS • Communication skills: SQL, ODBC, OLE
Database Agent Benefits (Feature: Advantage / Benefit) • Automatic data analysis: Automates users’ repetitive data analysis / Reduced workload • Exception reporting: Reports user-defined exceptions in business operations / Faster decision making • Notification alerts: Notifies user of changes in information / Increased productivity
Desired Features of Database Agents · Exception reporting alerts · Time or event triggered report execution · Workflow actions triggered by reports · Incorporation of learning capability into the Database agents · Incorporation of learning into the OLAP server