Project Aristotle(sm):

Automated Categorization of Web Resources

"Users are seeking guidance and organization in a chaotic, dynamic information framework." David Eichmann, Ethical Web Agents, 1994

PROJECTS, RESEARCH, PRODUCTS and SERVICES

AT&T Laboratories, AT&T Research, Murray Hill, New Jersey, USA

Project Name
People Helping One Another Know Stuff (PHOAKS)
Principal Investigators
Will Hill (willhill@research.att.com)

Loren Terveen (terveen@research.att.com)

Project Summary
PHOAKS reads messages posted to thousands of USENET newsgroups and tallies and summarizes the Web resource recommendations that appear in those messages. Its index mirrors the hierarchical structure of Netnews exactly; after each index name, the number of Web resources encountered by PHOAKS is indicated.

PHOAKS "can automatically recognize recommendations with at least 90% accuracy".

Demonstration or Prototype Access
http://www.phoaks.com/index.html
References
HILL, W. and TERVEEN, L. Using frequency-of-mention in public conversations for social filtering. Paper prepared for Cooperating Communities, the 1996 Conference on Computer Supported Cooperative Work, November 16-20, 1996, Boston, Massachusetts, USA.

TERVEEN, L., HILL, W. and AMENTO, B. PHOAKS: A system for sharing recommendations. Communications of the ACM 40(3), 1997,59-62.

Columbia University, Department of Electrical Engineering, Center for Image Technology for New Media and Columbia Digital Library, New York, New York, USA

Project Name
WebSEEk
Principal Investigators
Shih-Fu Chang (sfchang@ctr.columbia.edu)

John R. Smith (jrsmith@ctr.columbia.edu)

Project Summary
"WebSEEk is a content-based image and video catalog and search tool for the World Wide Web. The system collects the images and videos using several autonomous Web agents which automatically analyze, index, and assign the images and videos to subject classes.

The system is novel in that it utilizes text and visual information synergistically to provide for cataloging and searching for the images and videos. The complete system possesses several powerful functionalities, namely, searching using content-based techniques, query modification using content-based relevance feedback, automated collection of visual information, compact presentation of images and videos for displaying query results, image and video subject search and navigation, text-based searching, and search result list manipulations such as intersection, subtraction and concatenation. At present, the system has catalogued over 650,000 images and 10,000 videos from the Web.

New algorithms are being developed for automatic mapping of new unconstrained images/video to semantic-level subject classes in the image taxonomy. A working image taxonomy has been constructed in a semi-automatic way in the current prototype of WebSEEk. The mapping algorithms explore visual features (such as color, texture, spatial layout, video object features), text features (such as associated html documents, transcript, caption), and intelligent clustering techniques in the feature space.

Some of the content-based image retrieval techniques and tools have been developed in a related project, VisualSEEk, which focuses on automatic extraction of local image features, joint feature-spatial visual querying, and fast image similarity matching techniques. Demos and a description of VisualSEEk are available online as well."
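As a flavor of the content-based side, here is a minimal color-feature sketch in Python: a coarse RGB histogram plus histogram intersection, a classic image-similarity measure. It is purely illustrative (hypothetical images and bin counts), not WebSEEk's actual feature set or matching algorithm:

    def color_histogram(pixels, bins=4):
        """Coarse RGB histogram: each channel quantized into `bins` levels.

        pixels: iterable of (r, g, b) tuples with 0-255 channel values.
        Returns a normalized histogram of length bins**3.
        """
        hist = [0.0] * (bins ** 3)
        n = 0
        for r, g, b in pixels:
            idx = ((r * bins // 256) * bins * bins
                   + (g * bins // 256) * bins
                   + (b * bins // 256))
            hist[idx] += 1
            n += 1
        return [h / n for h in hist] if n else hist

    def histogram_intersection(h1, h2):
        """Similarity in [0, 1]; 1 means identical color distributions."""
        return sum(min(a, b) for a, b in zip(h1, h2))

    # Hypothetical query: rank catalogued images by color similarity.
    query = color_histogram([(200, 30, 30)] * 10)          # mostly red
    catalog = {"sunset.jpg": color_histogram([(220, 40, 20)] * 10),
               "ocean.jpg":  color_histogram([(20, 60, 200)] * 10)}
    print(sorted(catalog, reverse=True,
                 key=lambda k: histogram_intersection(query, catalog[k])))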

Demonstration or Prototype Access
http://www.ctr.columbia.edu/webseek

http://www.ctr.columbia.edu/VisualSEEk

References
CHANG, S.-F. Content-based indexing and retrieval of visual information. IEEE Signal Processing Magazine 14(4), July 1997,45-48.

SMITH, J.R. and CHANG, S.-F. Searching for Images and Videos on the World-Wide Web. Technical report, CTR Technical Report #459-96-25, Center for Telecommunications Research, Columbia University, New York, New York, USA.

SMITH, J.R. and CHANG, S.-F. Visually searching the Web for content. IEEE Multimedia Magazine 4(3), Summer 1997,12-20.

CHANG, S.-F., SMITH, J.R. and MENG, H. Efficient techniques for feature-based image/video access and manipulation. Presentation prepared for the Clinic on Library Applications of Data Processing: Digital Image Access and Retrieval, March 24-26, 1996, University of Illinois at Urbana-Champaign, Illinois, USA.

DocuMagix, Inc., Menlo Park, California, USA

Product Name
HotPage Plus
Principal
DocuMagix, Inc. (info@documagix.com)
Product Summary
"With DocuMagix HotPage, [one] can capture and organize the information [one] need[s] from the World Wide Web. DocuMagix HotPage is tightly integrated with Netscape Navigator and installs itself on its menu bar allowing [the user] ... to save the Web content on [a] local PC, while automatically retaining the original live links. The organizational aspect is based on a ... user interface which allows [one] to create and maintain a personal file cabinet on a PC with customizable drawers and folders. Each folder can contain a saved HTML page or a saved Web page as a printed document.

With DocuMagix HotPage, one can view saved Web pages off-line within the Netscape Navigator viewer, link back to [an] original site without needing to remember the exact URL, organize Web pages intuitively, merge them with other Windows documents, search [the] entire cabinet for a particular Web page that contains reference(s) to a particular topic, forward a Web page document by fax or e-mail, ... mark annotations on a Web document, or even add URL links to any Windows documents.

DocuMagix HotPage allows one to file single or multiple Web page documents [and with its] unique proprietary AutoFiling(tm) technology recognizes similar documents and automatically files them into a designated folder."

References
THOMAS, W. Web Addict: The End of Web document chaos. Web Review, March 8-14, 1996.

Drexel University, College of Information Science and Technology, Philadelphia, Pennsylvania, USA

Project Name
SiteMap
Principal Investigators
Xia Lin (linx@post.drexel.edu)
Project Summary
"SiteMap is a Java application that visualizes a given Web site or a collection of links). Through a Web robot, SiteMap first traverses every link of the web site, collects statistical data, and indexes all the words and pages of the site. Based on the statistical data and the indexing, SiteMap converts each page of the site into a vector, and uses these vectors to train a neural network. As the outcome, the trained neural network presents the site in an organized map: subject areas are identified and labeled; their sizes and locations are determined by relationships among the subjects and by their occurrence and co-occurrence frequencies. Links are clustered and located within their respective subject areas, represented by colored dots.

"To help users interact with the [resulting] map, SiteMap provides various interactive tools. For example, areas can be labeled in more/less details through adjusting a scroll bar; links can be selected through clicking or dragging; contents of any selected links can be shown in a separate window, etc."

A Java-enabled browser is required to view the sample SiteMaps.
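The page-vector-to-map step is a self-organizing map (SOM). A minimal generic SOM sketch in Python/NumPy follows; it is not SiteMap's code, and the grid size, decay schedules, and random page vectors are assumptions for illustration:

    import numpy as np

    def train_som(vectors, grid=(8, 8), epochs=50, lr0=0.5, radius0=4.0,
                  seed=0):
        """Train a tiny self-organizing map on page vectors.

        vectors: (n_pages, n_terms) array of term weights per page.
        Returns grid weights of shape (rows, cols, n_terms).
        """
        rng = np.random.default_rng(seed)
        rows, cols = grid
        w = rng.random((rows, cols, vectors.shape[1]))
        coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols),
                                       indexing="ij"))
        for t in range(epochs):
            lr = lr0 * (1 - t / epochs)               # decaying learning rate
            radius = max(1.0, radius0 * (1 - t / epochs))
            for v in vectors:
                # Best-matching unit: cell whose weights are closest to v.
                d = np.linalg.norm(w - v, axis=2)
                bmu = np.unravel_index(d.argmin(), d.shape)
                # Pull the BMU and its neighborhood toward the page vector.
                dist = np.linalg.norm(coords - np.array(bmu), axis=2)
                h = np.exp(-(dist ** 2) / (2 * radius ** 2))[..., None]
                w += lr * h * (v - w)
        return w

    # Hypothetical input: 40 pages described by 10 term weights each.
    pages = np.random.default_rng(1).random((40, 10))
    weights = train_som(pages)
    # A page's map position is the cell whose weights it matches best.
    cell = np.unravel_index(
        np.linalg.norm(weights - pages[0], axis=2).argmin(), (8, 8))
    print(cell)

Pages that land in the same or nearby cells share vocabulary, which is what lets the map label coherent subject areas.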

Demonstration or Prototype Access
http://lislin.gws.uky.edu/Sitemap/Sitemap.html

References

LIN, X. Self-Organizing Semantic Maps as Graphical Interfaces for Information Retrieval. Thesis (Ph.D.), University of Maryland at College Park, 1993.

LIN, X. Searching and browsing on map displays. In ASIS '95, Proceedings of the 58th ASIS Annual Meeting, Converging Technologies: Forging New Partnerships in Information, October 9-12, 1995, Chicago, Illinois, USA.

LIN, X. Visualization for the document space. In Proceedings, Visualization '92, October 19-23, 1992, Boston, Massachusetts, USA.

LIN, X., SOERGEL, D., and MARCHIONINI, G. A Self-organizing semantic map for information retrieval. In SIGIR '91: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, October 13-16, 1991, Chicago, Illinois, USA.

International Business Machines (IBM), Haifa Research Laboratory, Haifa, ISRAEL

Project Name
Bookmark Organizer (BO)
Principal Investigator
Yoelle S. Maarek (yoelle@haifa.vnet.ibm.com)
Project Summary
This project combines manual and automatic organization, giving the user the flexibility to determine when to apply automatic organization to a local repository of Web URLs. The design of an organizer tool, Bookmark Organizer, is described, as are the underlying analysis, clustering and search techniques behind its approach.
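A minimal sketch of content-based bookmark grouping (Python, hypothetical titles) conveys the flavor; the actual tool clusters on document content with proper hierarchical clustering, not this greedy title-word heuristic:

    def group_bookmarks(bookmarks, min_shared=2):
        """Greedy content-based grouping: a bookmark joins the first
        folder whose seed shares enough title words with it; otherwise
        it starts a new folder."""
        folders = []   # each folder: (seed word set, [titles])
        for title in bookmarks:
            words = set(title.lower().split())
            for seed, members in folders:
                if len(words & seed) >= min_shared:
                    members.append(title)
                    break
            else:
                folders.append((words, [title]))
        return [members for _, members in folders]

    marks = ["Java Tutorial Home", "Advanced Java Tutorial",
             "Recipe Archive", "Soup Recipe Archive"]
    print(group_bookmarks(marks))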
References
MAAREK, Y.S. and BEN SHAUL, I.Z. Automatically organizing bookmarks per contents. Paper presented at the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France. Computer Networks and ISDN Systems, 28(7-11), 1996, 1321-33.

Jet Propulsion Laboratory, Pasadena, California, USA

Project Name
Distributed Object Manager (DOM)
Principal Investigator
Rick Borgen (rlborgen@devvax.jpl.nasa.gov)
Project Summary
The "Distributed Object Manager (DOM) is a general-purpose distributed cataloging system. It is general-purpose by means of a schema language that provides specification of types, common attributes, object attributes and collections. A client-server architecture with an SQL-like server protocol language supports flexible distribution. It is a catalog system in the sense that it maintains meta-data descriptions of well-identified objects and supports appropriate search and description features, but it does not try to support full traditional DBMS functionality.

DOM employs a kind of hierarchical organization of collections, except that multiple parent collections are possible. This structure, known as the collection lattice, is a principal organizing mechanism which supports attribute sharing, access permissions, and search paths, as well as the logical integration of multiple servers.

DOM also employs a type system for classifying objects, which can be considered another organizing feature that cuts across the collection lattice. The type system serves the traditional role of providing an attribute template for sets of objects. It also serves a very important role for supporting distributed queries based on object type.

The DOM system provides a systematic scheme for organizing large numbers of servers to work effectively together. The goal is to approach the kind of uniformity and simplicity in the distribution model of the World-Wide-Web, and yet support the kind of sophisticated queries associated with database systems. The collection lattice, the type system and common attributes are the essential mechanisms for accomplishing this multi-server integration."
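The collection lattice's two distinguishing features, multiple parents and attribute sharing, can be sketched in a few lines of Python (the collection names and attributes below are hypothetical, not DOM's schema language):

    class Collection:
        """A node in a collection lattice: like a directory, except a
        collection may have several parents. Attributes declared on a
        collection are shared by everything below it."""

        def __init__(self, name, parents=(), attributes=None):
            self.name = name
            self.parents = list(parents)
            self.attributes = dict(attributes or {})

        def effective_attributes(self):
            """Merge attributes from every ancestor; nearer wins on
            conflict."""
            merged = {}
            for parent in self.parents:
                merged.update(parent.effective_attributes())
            merged.update(self.attributes)
            return merged

    # Hypothetical lattice: "mars_images" sits under both a mission
    # collection and a media-type collection, inheriting from each.
    root = Collection("catalog", attributes={"server": "dom.example"})
    mission = Collection("pathfinder", [root], {"mission": "Pathfinder"})
    images = Collection("images", [root], {"media": "image"})
    mars_images = Collection("mars_images", [mission, images])
    print(mars_images.effective_attributes())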

References
WAGNER, D. and BORGEN, R. JPL distributed search technology. Position Paper prepared for Distributed Indexing/Searching Workshop, May 28-29, 1996, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Johann Wolfgang Goethe-Universitaet, Frankfurt am Main, Department of Computer Science, Distributed Systems and Telematics, Frankfurt am Main, GERMANY

Project Name
WebMap
Principal Investigator
Peter Doemel (doemel@informatik.uni-frankfurt.de)
Project Summary
WebMap is an extension to hypertext navigation techniques such as linear histories, hotlists, and bookmarks. WebMap is designed to provide better history and navigation support through the visualization of Web structure.

Use of a two-dimensional graphical navigation map within WebMap helps users keep track of their position in several ways.

References
DOEMEL, P. WebMap - A Graphical hypertext navigation tool. Paper presented at Mosaic and the Web, The Second International World Wide Web Conference, October 18-20, 1994, Chicago, Illinois, USA. Computer Networks and ISDN Systems 28(1-2), 1995,85-97.

Johns Hopkins University, Department of Computer Science, Baltimore, Maryland, USA

Principal Investigator
Scott A. Weiss (weiss@cs.jhu.edu)
Project Summary
A study of the use of the SMART system to investigate topic classification of USENET newsgroups. The task is to determine the newsgroups to which a new document would most appropriately be posted. The system was trained by forming 'meta-documents' that represent posting topics.

A technique called 'classification-based retrieval' for finding documents similar to a query document is described.

References
WEISS, S.A., KASIF, S. and BRILL, E. Text classification in USENET newsgroups: a progress report. Paper presented at the AAAI Spring Symposium on Machine Learning in Information Access, March 25-27, 1996, Stanford, California, USA.

Knowledge Systems Incorporated, Export, Pennsylvania, USA

Project Name
PetaPlex
Principal Investigator
Robert M. Akscyn (rma@ks.com)
Project Summary
"The PetaPlex Project is a project funded by the US Intelligence Community to develop feasible architectures for very large-scale digital libraries -- to meet the future needs of the community and those of large-scale commercial applications. The specific goals targeted in the current phase of the project is to develop an architecture capable of scaling to 20 petabytes on-line with subsecond response time to access random, fine-grained URN-specified objects, at a sustained rate in excess of 30 million transactions per second.

... [T]o achieve cost feasibility, the architecture is "massively simple" -- it consists only of simple, commodity-cost, COTS [commercial-off-the-shelf] technologies that enable near-automatic construction and maintenance of the system. A principal part of the architecture involves full-text search of the hypermedia-structured database for many concurrent searches, on the order of 100,000 on-going searches at any time. The scheme being explored is highly parallelized, for the incremental maintenance of the indexes, for conducting searches, and for storing results in persistent and accessible form."

References
AKSCYN, R. M. PetaPlex Project. Position Paper prepared for Distributed Indexing/Searching Workshop, May 28-29, 1996, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Korea Advanced Institute of Science and Technology, Department of Computer Science, Taejon, KOREA

Project Name
NetAgent
Principal Investigator
Taeha Park (taeha@cosmos.kaist.ac.kr)
Project Summary
"NetAgent allows users to search for information in context. Instead of using a single indexing system, NetAgent deploys multiple topic-specific indexing agents. A community of users shares a common context over the global information space by interacting with common agents. A user can find a topic-specific agent suitable for the user's community by following a path that agents suggest. By doing this, a user can avoid many search results that are useless in the context of the user's community.

[NetAgent] enhance[s] current indexing techniques in several respects.

References
PARK, T. and CHON, K. Collaborative indexing over networked information resources by distributed agents. Distributed Systems Engineering 1(6), 1994,362-74.

Lund University, Library, NetLab, Lund, SWEDEN

Project Name
Nordic WAIS/World Wide Web Project
Principal Investigator
Anders Ardo (Anders.Ardo@ub2.lu.se)
Project Summary
The Nordic WAIS/WWW Project was established to explore the possibilities of improving navigation and searching on the Internet. The main approach has been to further develop the strengths of two of its most important tools, WAIS (Wide Area Information Server) and the World Wide Web (WWW). The project's results are documented in the references below.
Demonstration or Prototype Access
http://www.ub2.lu.se/auto_new/UDC.html
References
ARDO, A. and KOCH, T. Automatic classification of WAIS databases. Technical report, Lund University Library, 1994.

ARDO, A., FALCOZ, F., KOCH, T., NIELSEN, M. and SANFAER, M. Improving resource discovery and retrieval on the Internet: The Nordic WAIS/World Wide Web Project - summary report. NORDINFO-Nytt, 1994(4), 1994,13-28.

KOCH, T. Experiments with automatic classification of WAIS databases and indexing of WWW: some results from the Nordic WAIS/WWW Project. In Proceedings of the Second Conference, Internet World and Document Delivery World International '94, London. Westport, Connecticut: Mecklermedia, 1994, 112-115.

W4: Nordic WAIS/World Wide Web Project. Project description and plan. NORDINFO-Nytt, 1994(1), 1994,6-16.

W4: Nordic WAIS/World Wide Web Project. Report of Phase I. NORDINFO-Nytt 1994(1), 1994,17-27.

Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, Massachusetts, USA

Project Name
HyPursuit
Principal Investigator
Ron Weiss (rweiss@lcs.mit.edu)
Project Summary
"HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Its clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme, and automatically created hypertext clusters.

HyPursuit's abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. [A] prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites has been constructed. Experience with [the] system suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies."
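One way to picture an abstraction function with controlled information loss is a truncated term vector per cluster: routing servers keep only a cluster's heaviest terms and score queries against that summary. The sketch below (Python, hypothetical documents and cutoff) illustrates the idea, not HyPursuit's actual functions:

    from collections import Counter

    def summarize_cluster(doc_term_counts, k=5):
        """Abstract a cluster into its k heaviest terms (controlled loss).

        doc_term_counts: list of Counter objects, one per document.
        The summary lets a routing server score a query against the
        whole cluster without storing every document's term vector.
        """
        total = Counter()
        for counts in doc_term_counts:
            total.update(counts)
        return dict(total.most_common(k))

    def score(query_terms, summary):
        """Approximate relevance of a cluster to a query."""
        return sum(summary.get(t, 0) for t in query_terms)

    # Hypothetical cluster of three pages about search engines.
    docs = [Counter({"search": 4, "engine": 2, "web": 1}),
            Counter({"index": 3, "search": 2}),
            Counter({"cluster": 2, "search": 1, "web": 2})]
    summary = summarize_cluster(docs, k=3)
    print(summary, score(["search", "web"], summary))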

References
WEISS, R., VELEZ, B., SHELDON, M.A., NAMPREMPRE, C., SZILAGYI, P., DUDA, A. and GIFFORD, D. K. HyPursuit: A Hierarchical network search engine that exploits content-link hypertext clustering. In Hypertext '96: The Seventh ACM Conference on Hypertext, March 16-20, 1996, Washington, D.C., USA.

SHELDON, M.A., WEISS, R., VELEZ, B. and GIFFORD, D. K. Services and metadata representation for distributed information discovery. Position Paper prepared for Distributed Indexing/Searching Workshop, May 28-29, 1996, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Massachusetts Institute of Technology, Media Laboratory, Agents Group, Cambridge, Massachusetts, USA

Project Name
Letizia
Principal Investigator
Henry Lieberman (lieber@media.mit.edu)
Project Summary
Letizia "is a user interface agent that assists a user browsing the World Wide Web. As the user operates a conventional Web browser such as Netscape, the agent tracks user behavior and attempts to anticipate items of interest by doing concurrent, autonomous exploration of links from the user's current position. The agent automates a browsing strategy consisting of a best-first search augmented by heuristics inferring user interest from browsing behavior."

"The model adopted by Letizia is that the search for information is a cooperative venture between the human user and an intelligent software agent. Letizia and the user both browse the same search space of linked Web documents, looking for "interesting" ones. No goals are predefined in advance. The difference between the user's search and Letizia's is that the user's search has a reliable static evaluation function, but that Letizia can explore search alternatives faster than the user can. Letizia uses the past behavior of the user to anticipate a rough approximation of the user's interests."

Letizia operates in tandem with conventional Web browsers such as Mosaic or Netscape.
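The browsing strategy named above, best-first search over outgoing links ranked by inferred interest, can be sketched briefly. In this Python sketch the link graph, interest scores, and budget are hypothetical stand-ins for page fetching and for Letizia's interest heuristics:

    import heapq

    def autonomous_explore(start_url, links_from, interest, budget=20):
        """Best-first exploration of links reachable from the user's
        current position.

        links_from(url) -> list of linked URLs (fetch/parse, stubbed).
        interest(url) -> score inferred from past browsing behavior.
        Returns candidate pages ordered by estimated interest.
        """
        seen = {start_url}
        frontier = [(-interest(u), u) for u in links_from(start_url)]
        heapq.heapify(frontier)
        found = []
        while frontier and len(found) < budget:
            neg_score, url = heapq.heappop(frontier)
            if url in seen:
                continue
            seen.add(url)
            found.append((-neg_score, url))
            for nxt in links_from(url):    # expand most promising first
                if nxt not in seen:
                    heapq.heappush(frontier, (-interest(nxt), nxt))
        return found

    web = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    profile = {"b": 0.9, "c": 0.2, "d": 0.5}  # hypothetical interests
    print(autonomous_explore("a", lambda u: web.get(u, []),
                             lambda u: profile.get(u, 0.0)))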

References
LIEBERMAN, H. An Automated channel-surfing interface agent for the Web. Presented at the Artificial Intelligence-based Tools to Help W3 Users Workshop, May 6, 1996, the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France.

LIEBERMAN, H. Letizia: An Agent that assists Web browsing. Paper presented at the International Joint Conference on Artificial Intelligence, August 20-25, 1995, Montreal, Canada.

Massachusetts Institute of Technology, Media Laboratory, Agents Group, Cambridge, Massachusetts, USA

Project Name
firefly
Principal Investigator
Pattie Maes (pattie@media.mit.edu)
Project Summary
Firefly "is a personal software agent capable of communicating with other users and recommending music that it knows the user will enjoy. Firefly automates the word-of-mouth process, learning about the user and his/her opinions, and leveraging that information to best serve the user's needs. Firefly uses the tastes, opinions, preferences and idiosyncracies of those most similar to [a user's] "nearest neighbors") in order to suggest new music that [the user] might like too. The more the user trains its agent, the more useful and accurate it gets. The more other people use the system, the smarter the firefly community becomes."

Demonstration or Prototype Access
http://www.agents-inc.com/

Michigan State University, Department of Computer Science, Intelligent Systems Laboratory, East Lansing, Michigan, USA

Project Name
Personalized, Adaptive Internet Navigation Tool (PAINT)
Principal Investigator
William F. Punch (punch@cps.msu.edu)
Project Summary
The "Personalized, Adaptive Internet Navigation Tool (PAINT), allows the user to impose a hierarchical organization on Internet sites and documents of interest by creating categories under which to group sites. Such categorization can be used not only by an individual user, but also can be shared among groups of users with similar interests."

"PAINT will also provide local automatic classification based on user parameters and user behavior. PAINT will record visited locations and categorize them according to past use. The user is then free to examine the automated organization, modify it, and make it a personalized view of the Internet."

References
OOSTENDORP, K.A., PUNCH, W.F. and WIGGINS, R.W. A Tool for individualizing the Web. Paper presented at Mosaic and the Web, the Second International World Wide Web Conference, October 18-20, 1994, Chicago, Illinois, USA.

Nara Institute of Science and Technology, Graduate School of Information Science, Nara, JAPAN

Project Name
Intelligent Information Collector and Analyzer (IICA)

Principal Investigator
Michiaki Iwazume (mitiak-i@is.aist-nara.ac.jp)
Project Summary
IICA gathers, classifies, and reorganizes information from heterogeneous resources on the Internet, providing several functions toward this end.
References
IWAZUME, M., SHIRAKAI, K., HATADANI, K., TAKEDA, H. and NISHIDA, T. IICA: An Ontology-based Internet navigation system. Paper presented at the AAAI-96 Workshop on Internet-based Information Systems, August 5, 1996, Portland, Oregon, USA.

IWAZUME, M., TAKEDA, H. and NISHIDA, T. Ontology-based information capturing from the Internet. In Knowledge Organization and Change: Proceedings of the Fourth International ISKO Conference, 15-18 July 1996, Washington, DC, USA. INDEKS Verlag, Frankfurt/Main, 1996.

Netscape Communications Corporation, Mountain View, California, USA

Principal
Netscape Communications Corporation (moreinfo@netscape.com)
Product Name
Netscape Catalog Server
Product Summary
"Using the Netscape Catalog Server, companies can set up and automatically maintain Yahoo-style services for their intranets to make the company's knowledge base and Web environment easy to navigate for employees. Using state-of-the-art development tools such as Java, JavaScript, frames, and LiveWire, Catalog Server enables [users] to create a highly customized, multimedia catalog service. You can easily define [a] taxonomy, layout, search menu, and other features with no programming required."

"Netscape Catalog Server is a new class of software for creating, managing, and maintaining an up-to-date catalog of Internet and intranet resources such as documents, email addresses, and file archives resources. [The] Catalog Server automatically creates and maintains a catalog of corporate documents and other information. Users can quickly and easily browse an up-to-date, user-friendly catalog of networked resources such as email addresses, documents, and applications that are located on servers throughout the World Wide Web."

"The Netscape Catalog Server architecture consists of two primary components:

The Catalog Server provides a variety of mechanisms for building collections.

References
Netscape Catalog Server 1.0 Data Sheet, Product information, Netscape Communications Corporation, Mountain View, California, USA, 1996.

NTT (Nippon Telegraph & Telephone Corporation), Software Research Laboratories, Tokyo, JAPAN

Project Name
Ingrid
Principal Investigator
Paul Francis (francis@slab.ntt.jp)
Project Summary
The Ingrid project is an effort to build a global distributed search infrastructure for the Internet. Using software currently in development, links are automatically placed between a specialized subset of similar web resources. Ingrid adds value to resources by placing them near other relevant resources in Ingrid space. The result is a fully distributed search infrastructure that can be efficiently searched.

Prototypes have been implemented for 1) the infrastructure creation and maintenance software, 2) a robot for gathering resources, 3) multi-lingual term isolation and term weighting software, and 4) a user interface for searching.

The project seeks individuals or organizations who may wish to participate in the pilot studies.

Demonstration or Prototype Access
http://www.ingrid.org/
References
FRANCIS, P., KAMBAYASHI, T., SATO, S. and SHIMIZU, S. Ingrid: A Self-Configuring Information Navigation Grid. Software Laboratories, NTT, March 1995.

FRANCIS, P., KAMBAYASHI, T., SATO, S. and SHIMIZU, S. Ingrid: a self-configuring information navigation infrastructure. Paper presented at The Web Revolution, the Fourth International World Wide Web Conference, December 11-14, 1995, Boston, Massachusetts, USA. NTT R & D 45(2), 1996,159-66.

OCLC Online Computer Library Center, Inc., Office of Research and Special Projects, Dublin, Ohio, USA

Project Name
Scorpion
Principal Investigator
Keith Shafer (shafer@oclc.org)
Project Summary
"Scorpion is a research project at OCLC exploring the indexing and cataloging of electronic resources. Since subject information is key to advanced retrieval, browsing, and clustering, the primary focus of Scorpion is the building of tools for automatic subject recognition based on well known schemes like the Dewey Decimal System."

The "thesis of Scorpion is that the Dewey Decimal Classification ... can be used to perform automatic subject assignment for electronic items," i.e. this scheme "can be used to classify an item and denote subject headings."

Demonstration or Prototype Access
http://purl.oclc.org/scorpion/
References
VIZINE-GOETZ, D. Online classification: implications for classifying and document[-like object] retrieval. In Knowledge Organization and Change: Proceedings of the Fourth International ISKO Conference, 15-18 July 1996, Washington, DC, USA. INDEKS Verlag, Frankfurt/Main, 1996.

Oracle Corporation, Redwood Shores, California, USA

Product
ConText Option
Principal
Oracle Worldwide Customer Support (http://www.oracle.com/support/)

Product Summary
"The Oracle7 Release 7.3 ConText Option is the first text management solution tightly integrated with an industrial-strength database. It enables organizations to leverage text information sources as quickly and easily as structured data. It combines the power and scalability of the Oracle® Universal Server® and its SQL-based tools with advanced text retrieval technology to help users extract exactly the information needed. Together, these technologies allow enterprises to integrate large-scale document databases with mission-critical applications and provide hundreds or even thousands of concurrent users with fast, efficient access to text-based information. The ConText Option is ideal for managing and accessing any information source, from historical news archives to cutting-edge Web content."

"Using the ConText Option's advanced text retrieval, text reduction, and classification features, users can pinpoint required information quickly within very large databases. These advanced features for text retrieval, reduction, and classification include:

Reference
ROBERTSON, D. W. Subject indexing on the Web. Position Paper prepared for Distributed Indexing/Searching Workshop, May 28-29, 1996, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Silicon Graphics, Mountain View, California, USA

Project Name
nif-T-nav
Principal Investigator
Kirsten L. Jones (kirsten@csd.sgi.com)
Project Summary
"The statelessness of HTTP makes it difficult for information providers to present complex hierarchical information in an easily navigable fashion. Many non-WWW interfaces allow users to browse through a hierarchical index of information, divided into categories to reduce confusion. [nif-T-nav] is a program ... developed to create an improved interface for complex hierarchies on the Web."

"nif-T-nav was developed ... using standard HTTP; the state information is stored at the client and passed back to the server through the URL. This allows an information provider to present complex information in an easily navigated fashion, reducing the time needed for the user to find the desired data."

References
JONES, K.L. nif-T-nav: A Hierarchical navigator for WWW pages. Paper presented at the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France. Computer Networks and ISDN Systems 28(7-11), 1996,1345-1354.

Stanford University, Department of Computer Science, Database Group, Stanford, California, USA

Project Name
Glossary-of-Servers Server (GlOSS)
Principal Investigator
Luis Gravano (gravano@cs.stanford.edu)
Project Summary
GlOSS is a broker that can effectively and efficiently locate the databases most appropriate for satisfying a given query, even in a system of hundreds of databases.

GlOSS makes use of partitioned hashing and grid files as useful data structures for summarizing information about included databases.
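The selection idea can be conveyed with one simple estimator in this spirit: from a per-database summary of document frequencies, estimate how many documents would match a conjunctive query under a term-independence assumption, and rank databases by that estimate. This Python sketch uses hypothetical summaries and simplifies what GlOSS actually stores and computes:

    def estimate_matches(query_terms, db_summary):
        """Estimate how many documents in a database match a
        conjunctive query, assuming terms occur independently.

        db_summary: {'size': number of documents,
                     'df': {term: document frequency}}
        Estimate = size * product(df[t] / size) over the query terms.
        """
        size = db_summary["size"]
        estimate = float(size)
        for t in query_terms:
            estimate *= db_summary["df"].get(t, 0) / size
        return estimate

    # Hypothetical summaries for three databases.
    summaries = {
        "geo":  {"size": 1000, "df": {"earthquake": 120, "fault": 80}},
        "news": {"size": 5000, "df": {"earthquake": 50, "fault": 10}},
        "arts": {"size": 2000, "df": {"fault": 5}},
    }
    query = ["earthquake", "fault"]
    ranked = sorted(summaries, reverse=True,
                    key=lambda db: estimate_matches(query, summaries[db]))
    print(ranked)   # databases most likely to satisfy the query first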

Demonstration or Prototype Access
http://gloss.stanford.edu/

References

GRAVANO, L., GARCIA-MOLINA, H., and TOMASIC, A. The Efficacy of GlOSS for the Text Database Discovery Problem, Technical report, STAN-CS-TN-93-2, Department of Computer Science, Stanford University, Stanford, California, USA.

TOMASIC, A., GRAVANO, L., LUE, C., SCHWARZ, P., and HAAS, L. Data structures for efficient broker implementation. ACM Transactions on Information Systems 15(3), 1997,223-253.

Stanford University, Department of Computer Science, Logic Group, Stanford, California, USA

Project Name
Infomaster(tm)
Principal Investigator
Michael Genesereth (genesereth@cs.stanford.edu)
Project Summary
Infomaster(tm) "is an information integration system. It provides integrated access to distributed, heterogeneous information sources, thus giving its users the illusion of a centralized, homogeneous information system. Information consumers can ask questions, confident that the system will provide information from all relevant information sources. Authorized information suppliers can add information updates, confident that the system will distribute those updates appropriately.

An essential feature of Infomaster(tm) is its emphasis on semantic information processing. Infomaster(tm) integrates only structured information sources, sources in which the syntactic structure reflects its semantic structure (in other words, databases and knowledge bases). This restriction enables Infomaster(tm) to process the information in these sources in a semantic fashion; information retrieval and distribution can be conducted on the basis of content as well as form. Information updates can be used to deduce additional updates. Complex queries can be decomposed on semantic grounds into simpler queries, and the answers to these simpler queries can then be combined to form unified answers to the original queries. At various points in this process, information can be translated between different vocabularies and formats."

Demonstration or Prototype Access

http://infomaster.stanford.edu/tutorial/

http://infomaster.stanford.edu:4000/

References
DUSCHKA, O. M. Generating Complete Query Plans Given Approximate Descriptions of Content Providers, Technical report, Logic Group Technical Report 96-1, Department of Computer Science, Stanford University, Stanford, California, USA, February 1996.

GEDDIS, D.F., GENESERETH, M., KELLER, A.M. and SINGH, N.P. Infomaster: A Virtual information system. Paper presented at Intelligent Information Agents Workshop, Fourth International Conference on Information and Knowledge Management, December 1-2, 1995, Baltimore, Maryland, USA.

Stanford University, Department of Computer Science, Nobots Group, Stanford, California, USA

Project Name
Fab
Principal Investigator
Marko Balabanovic (marko@rsv.ricoh.com)
Project Summary
Fab uses adaptive information retrieval techniques to learn a profile of a user over time and provides recommended Web pages based on this profile.

"Fab, like other Web recommendation services, is divided into three components: Collection (first collect the items to be recommended), Selection (then select from the collected items those best for a particular user), Delivery (finally deliver the selected items to the user)". For the collection component, Fab utilizes an evolving population of search agents, which do a best-first search. In Fab, every agent has [its] own profile. Pages found by this and other agents are collected in a central repository.

Whenever a user asks for a Fab recommendation, the user's personal selection agent fetches from the central repository the pages that best match that user's profile. Users are asked to provide feedback on the pages they review; the selection agent uses this feedback to update the user's profile, which over time becomes an increasingly accurate predictor of the user's interests.
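The learning step can be sketched as a Rocchio-style relevance-feedback update on a term-weight profile. This is a generic stand-in for Fab's actual learning rule, with hypothetical terms and a made-up learning rate (Python):

    def update_profile(profile, page_terms, feedback, rate=0.1):
        """Nudge a term-weight profile toward pages rated positively
        and away from pages rated negatively.

        feedback: +1 for a page the user liked, -1 for a disliked page.
        """
        for term, weight in page_terms.items():
            profile[term] = profile.get(term, 0.0) + rate * feedback * weight
        return profile

    def match(profile, page_terms):
        """Dot-product match between a profile and a page's terms."""
        return sum(profile.get(t, 0.0) * w for t, w in page_terms.items())

    profile = {}
    update_profile(profile, {"jazz": 0.8, "festival": 0.5}, feedback=+1)
    update_profile(profile, {"golf": 0.9}, feedback=-1)
    pages = {"jazz-news": {"jazz": 0.7}, "golf-tips": {"golf": 0.8}}
    print(max(pages, key=lambda p: match(profile, pages[p])))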

Demonstration or Prototype Access
No Longer Available

References
BALABANOVIC, M. An adaptive Web page recommendation service. Paper presented at the First International Conference on Autonomous Agents, February 5-8, 1997, Marina del Rey, California, USA. Technical report, SIDL-WP-1996-0041, Stanford Digital Library Project, Stanford University, Stanford, California, USA, September 1996.

BALABANOVIC, M. Learning to Surf: Multiagent Systems for Adaptive Web Page Recommendation. Technical report, CS-TR-98-1605, Stanford University, Stanford, California, USA, March 1998.

BALABANOVIC, M., SHOHAM, Y. and YUN, Y. An Adaptive Agent for Automated Web Browsing. Technical report, CS-TN-97-52, Stanford University, Stanford, California, USA, February 1997.

Stanford University, Department of Computer Science, Project in People, Computers and Design, Stanford, California, USA

Project Name
Grassroots
Principal Investigator
Kenichi Kamiya (kamiya@cs.stanford.edu)
Project Summary
Grassroots "is a system that provides a uniform framework to support people's collaborative activities mediated by collections of information. The system seamlessly integrates functionalities currently found in such disparate sytems as e-mail, newsgroups, shared hotlists, hierarchical indexes, hypermail, etc. Grassroots co-exists with these systems in that users can benefit from the uniform image provided by Grassroots, but other people can continue using other mechanisms, and Grassroots leverages from them. The current Grassroots prototype is based on an http-proxy implementation and can be used with any Web browser.

References
KAMIYA, K., ROESCHEISEN, M. and WINOGRAD, T. Grassroots: A System providing a uniform framework for communication, structuring, sharing information, and organizing people. Paper presented at the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France. Computer Networks and ISDN Systems 28(7-11), 1996,1157-1174.

Universidade do Minho, Departamento de Informatica, Braga, PORTUGAL

Project Name
SOUR
Principal Investigator
F. Luis Neves (fln@di.uminho.pt)
Project Summary
This project modifies and extends the SOUR prototype into a tool for the classification, storage and retrieval of Internet information. Internet links are seen as reusable objects, stored and maintained in a generalized and specialized structure based on a comparison-metrics algorithm.

"On the implementation side, SOUR is extended by making use of OLE Automation within Netscape and DDE interprocess communication mechanisms which allow for a third party application to control the Netscape Navigator client."

References
NEVES, F.L. and OLIVEIRA, J.N. Classifying Internet objects. Paper presented at The Web Revolution, the Fourth International World Wide Web Conference, December 11-14, 1995, Boston, Massachusetts, USA. World Wide Web Journal 1(1), December 1995,711-722.

University of Aberdeen, King's College, Department of Computing Science, Aberdeen, Scotland, UK

Project Name
LAW
Principal Investigator
Peter Edwards (pedwards@csd.abdn.ac.uk)
Project Summary
LAW is a system that assists in the identification and location of new and interesting information on the World-Wide Web, both by interactively suggesting links for the user to browse and by employing a separate Web robot that autonomously searches for pages that may be of interest.

LAW makes use of two different profiles within its architecture: a link profile and a page profile. "The link profile represents the type of links which the user typically explores as the Web is browsed, and is used to provide interactive assistance as the user views new pages. The page profile describes the types of pages which the user finds interesting, and is used in conjunction with the link profile to control a Web robot." "The Web robot is a separate application that explores the World-Wide Web using a best-first search through the links it encounters."

LAW provides interactive assistance as the user browses by highlighting the links which appear most interesting on each page visited, thereby focusing the user's attention on the most salient parts of the page.
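Link highlighting of this kind can be sketched by scoring each link's anchor text against a learned per-term weight profile. The weights, threshold, and anchor texts below are hypothetical, and LAW's real link profiles are richer than anchor-text keywords (Python):

    def score_link(anchor_text, link_profile):
        """Estimate how interesting a link looks from its anchor text,
        using a learned weight per term."""
        words = anchor_text.lower().split()
        return sum(link_profile.get(w, 0.0) for w in words) / max(len(words), 1)

    def highlight(links, link_profile, threshold=0.3):
        """Return the links on a page worth drawing to the user's
        attention."""
        return [a for a in links if score_link(a, link_profile) >= threshold]

    profile = {"machine": 0.8, "learning": 0.9, "agents": 0.7,
               "contact": 0.0}
    page_links = ["Machine Learning papers", "Contact us",
                  "Learning agents demo"]
    print(highlight(page_links, profile))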

References
EDWARDS, P., BAYER, D., GREEN, C.L., and PAYNE, T.R. Experience with learning agents which manage Internet-based information. Paper presented at AAAI Spring Symposium on Machine Learning in Information Access, March 25-27, 1996, Stanford, California, USA.

University of Arizona, Department of Computer Science, Tucson, Arizona, USA

Project Name
WebGlimpse
Principal Investigator
Udi Manber (udi@cs.arizona.edu)
Project Summary
"WebGlimpse adds search capabilities to [a] WWW site automatically and easily. It attaches a small search box to the bottom of every HTML page, and allows the search to cover the neighborhood of that page or the whole site. With WebGlimpse there is no need to construct separate search pages, and no need to interrupt users from their browsings. All pages remain unchanged except for the extra search capabilities. It is even possible for the search to efficiently cover remote pages linked from [a page].

Neighborhoods can be defined for each page (e.g., all pages within two links ...), and the search can be restricted to the current page's neighborhood. This allows a convenient combination of searching and browsing. WebGlimpse automatically collects and indexes remote pages that are linked from a page." "More complex definitions of neighborhoods, which may depend on semantic analysis, can also be added."

"In summary, WebGlimpse allows any Web site to offer a combination of browsing and searching by automatically analyzing a site, computing neighborhoods, and attaching search interfaces to existing pages."

WebGlimpse uses the Glimpse search engine, a fast and flexible searching tool.
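Computing a simple link-distance neighborhood is a breadth-first traversal. The sketch below (Python, hypothetical site graph) collects everything within two links of a page, the set a neighborhood-restricted search would run against; WebGlimpse's own neighborhood machinery is more general:

    from collections import deque

    def neighborhood(start, links_from, max_hops=2):
        """Collect the pages within max_hops links of start.

        links_from(url) -> list of URLs linked from that page.
        """
        hops = {start: 0}
        queue = deque([start])
        while queue:
            url = queue.popleft()
            if hops[url] == max_hops:
                continue
            for nxt in links_from(url):
                if nxt not in hops:
                    hops[nxt] = hops[url] + 1
                    queue.append(nxt)
        return set(hops)

    # Hypothetical site graph.
    site = {"index": ["docs", "about"], "docs": ["api"],
            "api": ["internals"]}
    print(neighborhood("index", lambda u: site.get(u, [])))
    # {'index', 'docs', 'about', 'api'} -- 'internals' is 3 links away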

Demonstration or Prototype Access
http://www.cs.arizona.edu/webglimpse/

References
MANBER, U., GOPAL, B, and SMITH, M. Combining browsing and searching. Position Paper prepared for Distributed Indexing/Searching Workshop, May 28-29, 1996, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

University of Arizona, Karl Eller Graduate School of Management, Management Information Systems, Tucson, Arizona, USA

Project Name
EntertainmentSpace (ET-Space)
Principal Investigator
Hsinchun Chen (hchen@bpa.arizona.edu)
Project Summary
"EntertainmentSpace is a set of concept-based search tool developed by the Artificial Intelligence Group in the Management Information Systems Department at the University of Arizona."

ET-SPACE "contains a clickable self-organizing map (SOM) and a searchable entertainment concept space (thesaurus) both generated automatically using [a] DEC Alpha workstation. Users can use this server to identify specific areas in Entertainment and/or other relevant search terms when searching Entertainment servers or homepages."

The project is funded mainly by an NSF/CISE "Intelligent Internet Categorization and Search" project (1995-1998) and the NSF/ARPA/NASA Illinois Digital Library Initiative project (1994-1998).

Demonstration or Prototype Access
http://ai.bpa.arizona.edu/Lists/list_demos.html
References
CHEN, H., SCHUFFELS, C. and ORWIG, R. Internet categorization and search: a self-organizing approach. Journal of Visual Communication and Image Representation 7(1), 1996,88-102.

CHEN, H., HOUSTON, A.L., SEWELL, R.R. and SCHATZ, B.R. Internet browsing and searching: user evaluation of category map and concept space techniques. Journal of the American Society for Information Science 49(7), May 1998,582-603.

University of Calgary, Knowledge Science Institute, Alberta, CANADA

Project Name
WebMap
Principal Investigator
Brian R. Gaines (gaines@cpsc.ucalgary.ca)
Project Summary
"Concept maps have long provided visual languages widely used in many different disciplines and application domains. Abstractly, they are sorted graphs visually represented as nodes having a type, name and content, some of which are linked by arcs. Concretely, they are structured diagrams having discipline- and domain-specific interpretations for their user communities and, sometimes, formally defining computer data structures. Concept maps have been used for a wide range of purposes, and it would be useful to make such usage available over the World Wide Web."

WebMap builds on an open-architecture concept mapping tool and makes it available on the Web in a number of ways.

References
GAINES, B.R. and SHAW, M.L.G. Concept maps as hypermedia components. International Journal of Human-Computer Studies 43(3), 1995,323-61.

GAINES, B.R. and SHAW, M.L.G. WebMap: Concept mapping on the Web. Paper presented at The Web Revolution, the Fourth International World Wide Web Conference, December 11-14, 1995, Boston, Massachusetts, USA. World Wide Web Journal 1, December 1995.

University of California, Department of Information and Computer Science, Irvine, California, USA

Project Name
Syskill & Webert
Principal Investigator
Michael Pazzani (pazzani@ics.uci.edu)
Project Summary
Syskill & Webert is a software agent "that learns to rate pages on the World Wide Web (WWW), deciding what pages might interest a user. The user rates explored pages on a three point scale, and Syskill & Webert learns a user profile by analyzing the information on a page. The user profile can be used in two ways. First, it can be used to suggest which links a user would be interested in exploring. Second, it can be used to construct a LYCOS query to find pages that would interest a user."
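Profile learning of this kind is often done with a simple Bayesian classifier over word presence. The sketch below (Python, hypothetical pages and ratings) is in that spirit; the actual system's feature selection and probability estimates are simplified away:

    import math
    from collections import Counter

    def train(pages, labels):
        """Learn word counts for liked ("hot") and disliked pages.

        pages: list of word lists; labels: parallel list, True for
        pages the user liked.
        """
        counts = {True: Counter(), False: Counter()}
        totals = {True: 0, False: 0}
        for words, label in zip(pages, labels):
            counts[label].update(set(words))   # presence, not frequency
            totals[label] += 1
        return counts, totals

    def rate(words, counts, totals):
        """Log-odds that a new page is interesting (add-one smoothing)."""
        score = math.log((totals[True] + 1) / (totals[False] + 1))
        for w in set(words):
            p_hot = (counts[True][w] + 1) / (totals[True] + 2)
            p_cold = (counts[False][w] + 1) / (totals[False] + 2)
            score += math.log(p_hot / p_cold)
        return score

    model = train([["wine", "tasting"], ["wine", "region"],
                   ["golf", "tips"]],
                  [True, True, False])
    print(rate(["wine", "tour"], *model) > 0)   # True: looks interesting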

Demonstration or Prototype Access
No Longer Available

References
PAZZANI, M., MURAMATSU, J. and BILLSUS, D. Syskill & Webert: Identifying interesting web sites. Paper presented at the Thirteenth National Conference on Artificial Intelligence, August 4-8, 1996, Portland, Oregon, USA.

PAZZANI, M., MURAMATSU, J. and BILLSUS, D. Syskill & Webert: Identifying interesting web sites. Paper presented at AAAI Spring Symposium on Machine Learning in Information Access, March 25-27, 1996, Stanford, California, USA.

University of California, School of Information Management and Systems, Berkeley, California, USA

Principal Investigator
Ray Larson (ray@sherlock.sims.berkeley.edu)
Project Summary
"This exploratory study examines the explosive growth and the "bibliometrics" of the World Wide Web based on both analysis of over 30 gigabytes of web pages collected by the Inktomi "Web Crawler" and on the use of the DEC AltaVista search engine for cocitation analysis of a set of Earth Science related WWW sites. The statistical characteristics of web documents and their hypertext links are examined, along with examination of the characteristics of highly cited web documents."
References
LARSON, R. R. Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace. Prepared for Global Complexity: Information, Chaos and Control, the 1996 Annual Meeting of the American Society for Information Science, October 21-26, 1996, Baltimore, Maryland, USA.

University of California, Department of Computer Science, Santa Barbara, California, USA

Project Name
Pharos
Principal Investigator
Ron Dolin (rad@cs.ucsb.edu)
Project Summary
"Pharos is a scalable distributed architecture for locating heterogeneous information sources. The system incorporates a hierarchical metadata structure into a multi-level retrieval system. Queries are resolved through an iterative decision-making process. The first step retrieves coarse-grain metadata, about all sources, stored on local, massively replicated, high-level servers. Further steps retrieve more detailed metadata, about a greatly reduced set of sources, stored on remote, sparsely replicated, topic-based mid-level servers.
References
DOLIN, R., AGRAWAL, D., DILLON, L. and EL ABBADI, A. Pharos: A Scalable Distributed Architecture for Locating Heterogeneous Information Sources. Technical report, TRCS96-05, Computer Science Department, University of California, Santa Barbara, California, July 1996.

University of Chicago, Computer Science Department, Intelligent Information Laboratory, Chicago, Illinois, USA

Project Name
ECHO (Even Chaos Holds Order)
Principal Investigator
Kristian J. Hammond (hammond@cs.uchicago.edu)
Project Summary
ECHO "is an experimental meta-search engine that clusters the results of a general search query into semantically relevant catagories. In other words, ECHO does all the work, so that a user can search using the most basic and general parameters."

Demonstration or Prototype Access
http://infolab.cs.uchicago.edu/echo/echo.html

University of Iowa, School of Library and Information Science, Iowa City, Iowa, USA

Project Name
Sulla
Principal Investigator
David Eichmann (david-eichmann@uiowa.edu)
Project Summary
"Sulla is a user agent that supports long-lived, goal-oriented Web activity. Our current approach to agent interaction entails Sulla mimicking the behavior of a human interacting with each service agent. This approach suffers from the ambiguities of natural language and the limitations of interaction through simplistic query interfaces."

"In particular, Sulla supports the ability to:

"Sulla was the robotic secretary to Harry Domain, General Manager of Rossum's Universal Robots, in Karel Capek's 1921 play R.U.R., where the term 'robot' was first coined." The development of Sulla is supported in part by a grant from Texas Instruments and in part by the Repository Based Software Engineering Project.

References
EICHMANN, D. Interaction protocols for software agents on the World Wide Web. Presented at the Artificial Intelligence-based Tools to Help W3 Users Workshop, May 6, 1996, the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France.

EICHMANN, D. Search and meta-search on a diverse Web. Position Paper prepared for Distributed Indexing/Searching Workshop, May 28-29, 1996, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

EICHMANN, D. and WU, J. Sulla - A User agent for the Web. Poster Paper presented at the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France.

University of Kansas, The Department of Electrical Engineering and Computer Science, Lawrence, Kansas, USA

Project Name
Cooperative Agents for Conceptual Search and Browsing of World Wide Web Resources
Principal Investigators
Susan Gauch (sgauch@ittc.ukans.edu)
Project Summary
Cooperative Agents for Conceptual Search and Browsing of World Wide Web Resources is investigating the application of cooperating distributed searching agents for the Web. The goals of this project are fourfold.

We are evaluating the effectiveness of ontology-based Web browsing agents and studying the feasibility of visualization agents as a mechanism to provide a holistic overview of the conceptual organization of information on the Web as a whole. Finally, we are analyzing a record of changes to the ontology as a method of studying how quickly information changes on the Web.

Cooperating, distributed, intelligent agents are employed to organize the information on the Web. Each Web site has local agents that characterize and provide access to the information at the local site. These local agents in turn communicate with regional agents that characterize and provide access to the information for regions of the Web. There is a regional site for every N local sites, a super-regional site for every N regional sites, and so on.

This project is funded by the National Science Foundation, CAREER/EPSCoR Award number 97-03307.

A Java-enabled browser is required to view the samples.

Demonstration or Prototype Access
http://www.ittc.ukans.edu/~xzhu/wwwAgents/agents/lba/eecs/lba.html

http://www.ittc.ukans.edu/~xzhu/wwwAgents/agents/rba/rba.html

http://www.ittc.ukans.edu/~xzhu/wwwAgents/project.html

References
CASSOLA, E. and GAUCH, S. Intelligent Information Agents for the World Wide Web. Technical report, ITTC-FY97-TR-11100-1, Information and Telecommunication Technology Center, University of Kansas, Lawrence, Kansas, USA, May 1997.

HAVERKAMP, D. and GAUCH, S. Intelligent information agents: review and challenges for distributed information sources. Journal of the American Society for Information Science 49(4), April 1998,304-311.

University of Maryland, Department of Computer Science, Parallel Understanding Systems Group, College Park, Maryland, USA

Project Name
Simple HTML Ontology Extensions (SHOE)
Principal Investigator
Sean Luke (seanl@cs.umd.edu)
Project Summary
SHOE (Simple HTML Ontology Extensions) is a proposed extension to HTML that allows World-Wide Web authors to annotate their pages with formal knowledge-representation semantics.

This set of HTML extensions provides authors with the ability to embed knowledge directly into HTML pages, making it simple for user-agents and robots to retrieve and store this knowledge. This superset of HTML provides a knowledge markup syntax, enabling authors to use HTML to directly classify Web pages and to indicate relationships and semantic attributes in machine-readable form.

A Java applet, the Knowledge Annotator, has been developed to facilitate the annotation of Web pages with SHOE knowledge. It does this by loading SHOE tags from a Web page, displaying them graphically, and permitting editing that results in the appropriate HTML coding.

Demonstration or Prototype Access
http://www.cs.umd.edu/projects/plus/SHOE/spec.html

References
LUKE, S. Creating Ontologies Using SHOE. Technical report, Parallel Understanding Systems Group, Department of Computer Science, University of Maryland at College Park, College Park, Maryland, USA, August 1996.

LUKE, S. SHOE 1.0, Proposed specification, Parallel Understanding Systems Group, Department of Computer Science, University of Maryland at College Park, College Park, Maryland, USA, January 1, 1998.

LUKE, S., SPECTOR, L., and RAGER, D. Ontology-based knowledge discovery on the World-Wide Web. Paper presented at the AAAI-96 Workshop on Internet-based Information Systems, August 5, 1996, Portland, Oregon, USA.

University of Texas at El Paso, Department of Electrical and Computer Engineering, El Paso, Texas, USA

Project Name
Simple Distributed Directory Services (SDDS)
Principal Investigator
Moises E. Hernandez (moises@accugraph.com)
Project Summary
"The goal of this work is to provide a distributed indexing service based upon a transparent naming system, which will provide a contextual framework for the distributed classification, indexing and location of network resources.

The generalized method described in this work addresses the naming of and search for digital objects regardless of their actual physical location in a network. The network-transparent addressing scheme is intended for collections of digital objects stored in a network-like database (such as the WWW), where object attributes can be defined in such a way that any given combination of attributes resolves to the set of matching instances of the requested resources. The naming system presented here is based on a taxonomic classification model."
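The attribute-resolution idea, a combination of attributes resolving to the set of matching objects wherever they are stored, can be sketched in a few lines. The catalog, object names, and attributes below are hypothetical (Python):

    def resolve(query_attrs, catalog):
        """Return every object whose attributes include the requested
        attribute combination, regardless of where the object lives.

        catalog: {object_name: {attribute: value}} -- a toy stand-in
        for a distributed index.
        """
        return [name for name, attrs in catalog.items()
                if all(attrs.get(k) == v for k, v in query_attrs.items())]

    catalog = {
        "ftp://host-a/map1":  {"type": "map", "region": "texas",
                               "year": "1996"},
        "http://host-b/map2": {"type": "map", "region": "texas",
                               "year": "1994"},
        "http://host-c/doc1": {"type": "report", "region": "texas"},
    }
    print(resolve({"type": "map", "region": "texas"}, catalog))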

References
HERNANDEZ, M. E. A Simple Distributed Directory Service for Indexing and Location of Network Resources. Doctoral dissertation proposal presented to the Doctoral Committee of the Department of Electrical and Computer Engineering, the University of Texas at El Paso, July 22, 1996.

Washington State University, School of Electrical Engineering and Computer Science, Pullman, Washington, USA

Project Name
WAVE (Web Analysis and Visualization Environment)
Principal Investigator
Robert E. Kent (rekent@eecs.wsu.edu)
Project Summary
"The goal of the project "Creating a WAVE" is the conceptual organization of a community's information space on the World Wide Web. The project will develop an advanced (Networked Information Discovery and Retrieval) NIDR system called WAVE, which fuses the current NIDR system technology with a mechanism for 'dynamic distributed classification'."

"The project seeks to address the following research question: "What is the appropriate architecture for a digital library?" The research goal of the project is to demonstrate in the distributed context of the World Wide Web that the WAVE system, using both the technique of automatic classification and the notion of conceptual space provides the kernel architecture for a digital library."

This project is funded by the Intel Corporation.

Demonstration or Prototype Access
http://wave.eecs.wsu.edu

References
KENT, R. E. and NEUSS, C. Creating a Web Analysis and Visualization Environment. Paper presented at Mosaic and the Web, The Second International World Wide Web Conference, October 18-20, 1994, Chicago, Illinois, USA. Computer Networks and ISDN Systems 28(1-2), 1995,109-117.

NEUSS, C. and KENT, R. E. Conceptual analysis of resource meta-information. Computer Networks and ISDN Systems 27(6), 1995,973-984.

GENERAL BIBLIOGRAPHY

HERMANS, B. Intelligent Software Agents on the Internet: An Inventory of Currently Offered Functionality in the Information Society & A Prediction of (Near-)Future Developments. Review, Tilburg University, Tilburg, The Netherlands, July 9, 1996.

KOSTER, M. Robots in the Web: threat or treat? ConneXions 9(4), 1995,2-12.

LYNCH, C.A. Networked information resource discovery: an overview of current issues. IEEE Journal of Selected Areas of Communications 13(8), 1995,1502-22.

MAES, P. Agents that reduce work and information overload. Communications of the ACM 37(7), 1994,30-40.

MCKIERNAN, G. Automated categorisation of Web resources: a profile of selected projects, research, products and services. New Review of Information Networking 2, 1996,15-40.

MCKIERNAN, G. Hand-made in Iowa: organizing the Web along the Lincoln Highway. D-Lib Magazine, February 1997.

SCHAEFER, M.T. Project Aristotle & Cyberstacks: automating the virtual Internet library. Information Retrieval & Library Automation 33(9), February 1998,1-3.

Project Aristotle(sm): Automated Categorization of Web Resources is a clearinghouse of projects, research, products and services that investigate or demonstrate the automated categorization, classification or organization of Web resources. A working bibliography of key and significant reports, papers and articles is also provided. Projects and associated publications are arranged by the name of the university, corporation, or other organization with which the principal investigator of a project is affiliated.
Project Aristotle(sm) is compiled and maintained by Gerry McKiernan, Science and Technology Librarian, Science and Technology Department, and Curator, CyberStacks(sm), Iowa State University, Ames, IA 50011.
Scout Report Selection

July 26, 1999

http://www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm