Thomas Erickson and Gitta Salomon
(now at) firstname.lastname@example.org and email@example.com
This paper describes the first phase of a project to create a desktop information system for general users. The approach was to observe the problems, needs, and practices of several groups of information users, and to use these observations to drive the interface design of a prototype. In the first section of the paper, we describe problems which arise in the use of a relevance feedback system for information retrieval. In the second and third sections, we look at the needs and practices of users of both electronic and paper-based information systems. In the final section, we briefly describe the resulting design.
KEYWORDS: information retrieval, human interface, user interface, interactive systems, design process, design methodology, relevance feedback
Today there are hundreds of on-line databases available to anyone with a personal computer and a modem. But it isn't very easy to access them. Each data source has its own interface; the computer often serves as only a terminal emulator. In most cases, while accessing information, users temporarily move into a world which is isolated from the rest of their computer environment. When they return, there are few facilities for working with the retrieved data.
In the future, users will want to move fluidly between numerous remote databases and effectively use the information they collect. Personal computers will need to be part of an integrated information environment.
In the Fall of 1989 we began a research project to explore interface issues related to the creation of just such an environment. Our focus was on problems that arise when general users are given access to a number of large, remote databases through their personal computers. (By "general user," we mean users who are not specialists in information retrieval; rather, they need to obtain information to do their jobs.) One goal of the project, which is still underway, is the creation of a working prototype which will be installed in a real-world environment, and the observation of its use. This prototype will give a group of accountants access to outside news sources and internal company data.
In this paper we discuss some of the interface issues which arose during the initial investigation phase and provide an illustration of how these issues drove an early prototype design. The investigation phase involved studying an existing commercial full-text information retrieval system, called DowQuest, which permits users to create powerful queries using natural language and relevance feedback rather than a sophisticated query language. This phase also involved observation of information users. We interviewed and observed three groups of users: professional on-line searchers; day-to-day users of on-line information sources who were not information professionals; and a group of accountants. While the accountants made little or no use of on-line information sources, they nevertheless accessed and managed large amounts of paper-based information, and are the target group for the interactive prototype.
The remainder of this paper is divided into four sections. After a brief overview of the DowQuest system, we discuss issues concerning its query style. In the second and third sections, we look at the needs and practices of users of both electronic and paper-based information systems. Finally, we discuss a prototype that addresses some of these issues.
Early in the project, we were presented with the opportunity to use the DowQuest retrieval engine in our working prototype. This engine seemed well suited to our target audience of accountants, who generally lacked experience with sophisticated query languages. Before we set out to design an interface to the engine, we examined the already functioning DowQuest implementation.
DowQuest, offered by Dow Jones & Company as part of their Dow Jones News Service, gives users access to over 350 news sources covering, approximately, the previous six months. The system offers a full-text retrieval mechanism based on relevance feedback which is purported to enable ordinary users to conduct powerful searches of large databases. Rather than using a sophisticated query language, DowQuest allows users to first type in a few words, get a list of potential hits, and then say in essence 'get more like that one.'
Figures 1 and 2 depict two phases of the process of constructing a query in DowQuest. In Figure 1, the user has entered a sentence describing the desired information. While DowQuest does not do actual natural language understanding, the user is encouraged to enter text in that manner. In the example shown, the system will drop out the words "tell," "me," "about," "the," and "of," and use the other, lower frequency words to search the database. After the user has entered the initial query, the system returns the titles of the 16 most 'relevant' articles, where 'relevant' is defined algorithmically and is based on a variety of features over which the user has no control (and often no knowledge). While this list frequently contains articles relevant to the user's query, it also usually contains items which appear to the user to be irrelevant. At this point, the user has the option of reading the articles retrieved or continuing to the second phase of the query process.
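The first phase described above can be illustrated with a minimal sketch. It is not DowQuest's algorithm, which the paper does not specify: the stop list contains only the five words named in the example, the scoring is a plain count of matching terms, and the sample query and articles are invented for illustration. Only the cutoff of 16 titles comes from the text.

```python
from collections import Counter

# Hypothetical stop list: just the five words named in the example above.
STOP_WORDS = {"tell", "me", "about", "the", "of"}


def content_terms(text):
    """Lowercase the text, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,?!'\"").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def rank(query, articles, limit=16):
    """Return up to `limit` article titles, best match first.

    The score is a simple count of query-term occurrences; it stands in for
    whatever weighting the real engine applies.
    """
    terms = set(content_terms(query))
    scored = []
    for title, body in articles.items():
        counts = Counter(content_terms(body))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


# Invented sample data.
articles = {
    "A": "Oil price rises as oil supply falls.",
    "B": "The weather was mild.",
    "C": "Price controls debated.",
}
print(content_terms("Tell me about the price of oil"))  # ['price', 'oil']
print(rank("Tell me about the price of oil", articles))  # ['A', 'C']
```

Even in this toy version the user sees only titles and an ordering, with no indication of why one article outranks another.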
Through observation of users, as well as our own experiences with the system, we uncovered a number of interface issues related to DowQuest's method of query specification and use of relevance feedback. A variety of lower level interface problems such as the arbitrary 16 article result set size or the limitations of the teletype-style interaction are discussed in . We discuss two higher level problems which seem of general interest and importance.
New users of DowQuest generally had high expectations of the system's intelligence. There are a variety of possible reasons for this, ranging from the seeming use of natural language, to the system's apparent ability to 'find more like this,' to the general belief in the intelligence of computers. In any event, these expectations were usually dashed when, in response to the first phase of the first query, DowQuest would return a list containing many irrelevant articles. Consequently many users assumed the system was no good, or that no relevant articles existed, and would abandon the query before even trying relevance feedback.
Another negative effect due to the assumption of intelligence occurred in the second phase of the query, when users requested the system to retrieve more articles 'like that one.' The new list of articles returned was ordered by 'relevance,' and, of course, no computer scientist would be surprised to find that an article is most similar to itself. General users, however, lacked this insight, and so when they looked at the new list and discovered that the first, most relevant article was the one they had told the system to find more like, they assumed there was nothing else relevant available and did not inspect the rest of the list. While this assumption was incorrect, in human-human conversations it is conventional to assume that a provider of information will provide new information if it exists.
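The observation that an article is most similar to itself is easy to reproduce. The sketch below assumes cosine similarity over raw word counts, a common textbook measure and not necessarily the one DowQuest uses; the function names and the three sample articles are hypothetical.

```python
import math
from collections import Counter


def vector(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def more_like(seed_title, articles, exclude_seed=False):
    """Rank all article titles by similarity to the seed article."""
    seed = vector(articles[seed_title])
    ranked = sorted(
        articles, key=lambda t: cosine(seed, vector(articles[t])), reverse=True
    )
    if exclude_seed:
        ranked = [t for t in ranked if t != seed_title]
    return ranked


# Invented sample data.
articles = {
    "apple": "apple stock price drops",
    "tech": "technology stock price swings",
    "weather": "rain expected tomorrow",
}
print(more_like("apple", articles))                     # ['apple', 'tech', 'weather']
print(more_like("apple", articles, exclude_seed=True))  # ['tech', 'weather']
```

Dropping the seed from the list, as the `exclude_seed` flag does, is one possible way to honor the conversational expectation that a reply contains new information; the paper does not say whether any system did this.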
Another problem, observed primarily in our own use of DowQuest, was one of undesired generalization. An example of this occurred for the query: 'tell me why Apple Computer stock prices have dropped.' The initial query produced some relevant articles, but after a couple of rounds of feedback, the articles found veered away from Apple stock prices and began to emphasize the fluctuations in high technology stock prices. This occurred because articles discussing Apple's stock price tended to put it in a more general context, and repeated feedback of relevant articles reinforced this context. It is perhaps inaccurate to refer to such generalization as a problem, since it may often be a desired result. Nevertheless, it aptly illustrates the loss of control that results from shielding the user from the complexity of query languages.
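One way to see how feedback can pull a query toward a more general topic is the following sketch. It assumes a deliberately simple scheme in which feedback adds the word counts of the chosen article to the query's term weights; this is an illustrative assumption, not a description of DowQuest's mechanism, and the sample text is invented.

```python
from collections import Counter


def feed_back(query_weights, article_text):
    """Return new term weights with the chosen article's word counts added."""
    updated = Counter(query_weights)
    updated.update(article_text.lower().split())
    return updated


def share(weights, term):
    """Fraction of the total query weight carried by `term`."""
    total = sum(weights.values())
    return weights[term] / total if total else 0.0


# Invented sample data: a relevant article that also discusses the wider sector.
before = Counter("apple stock price".split())
after = feed_back(
    before,
    "apple stock fell as technology stock prices fell across the technology sector",
)
print(round(share(before, "apple"), 2))      # 0.33
print(round(share(after, "apple"), 2))       # 0.13
print(round(share(after, "technology"), 2))  # 0.13
```

After a single round the contextual term carries as much weight as the original one, although the user never typed it.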
While both problems discussed in this section arise in the context of DowQuest, analogs of them seem likely to occur in any system which attempts to use built-in intelligence to shield the user from underlying complexity.
Through interviewing and observing users of both electronic and traditional information, we uncovered a number of issues that need to be addressed in the creation of an integrated desktop information environment. These are discussed below.
Before users can create queries they need metaknowledge about the information in which they're interested. For example, they need to know 1) where to look for the answer to their question, and 2) what constitutes a reasonable question. This knowledge is not typically in the hands of the general user.
There are many databases available on-line. How do users decide where to start looking for desired information? In observing expert on-line searchers at their weekly status meeting, we noted that a remarkable amount of time was spent sharing information about databases: topics included newly available databases, information quality, frequency of updates, timeliness of updates, costs, as well as situations in which a particular database should be consulted. Some of this information was gathered from experience, some gleaned from newsletters written by the database publishers. It became apparent that learning and memorizing database characteristics is a recognized part of the professional searcher's job.
Yet, a casual information user cannot be expected to stay abreast of database attributes in the same way. On the other hand, casual users often hold strong opinions about the quality of various data sources (whether well founded or not), and would likely be opposed to any system that automatically selected 'appropriate' databases. The information access system should, therefore, be designed to offer easy access to descriptive information about the available databases and offer aid in making decisions, when desired.
A related problem is that general users often lack familiarity with the amount or scope of knowledge associated with the information they are seeking. The on-line searchers indicated that it is not uncommon for a client to request, for example, all information about "artificial intelligence." In such situations, the searcher explains the difficulty and, through conversation, narrows the query's breadth. However, if the user addressed the same query to an on-line service, an enormous amount of material would be retrieved, unaccompanied by explanation. In such instances, the information system needs to help users make headway in their search. Various research systems have addressed this problem, and solutions range from providing the user with an example of a retrieved record to assist in query reformulation, to providing mechanisms for guiding the user through the information.
Additional information about these, and a variety of related issues, can be found in  and .
Many databases contain frequently changing information. Bibliographic sources acquire new citations; news databases receive the latest reports. Over time, previously available information may no longer be accessible. For example, due to the large volume of news items and storage limitations, DowQuest offers approximately the last six months of news at any one time. Several interface issues arise because of this dynamic nature of information sources, some of which are discussed in .
From our interviews we expect users will issue two types of queries: ad hoc queries, where they want an answer to a specific question and nothing more; and on-going queries, where they want to be kept up to date on a particular topic. The following examples illustrate problems that can occur in both of these cases.
One day in November of 1989, we issued the ad hoc query "earthquake volcano ashes seismic activity" on the DowQuest database. This query was successful and returned desired articles about the October 1989 California earthquake. However, when we executed the same query at a later date with the intent of quickly re-finding this information, we obtained articles about a newly erupting Alaskan volcano. Because DowQuest returns only 16 results for any query, the new information had taken precedence and the "California Earthquake" articles had slipped below the retrieval threshold. Even if DowQuest had displayed the entire result set, we might not have easily found the desired articles, because their location had changed. Users may find it disconcerting that on a different day the same query may not return the same set of results.
Similarly, a once useful on-going query may eventually become inadequate. For example, an on-going query established ten years ago to track news on portable computers might have performed well for quite some time. Today, the same query would return unmanageable numbers of articles. Furthermore, because terminology has changed, some relevant information might not be returned: machines that were called portable ten years ago might not be called portable today and many subclassifications now exist. In order to be useful again, the old query would have to be refined and narrowed to meet particular interests, in light of new developments. Possibly, several new, specific queries would be required to effectively deal with the information.
These problems are basically the result of a mismatch: a static query cannot remain effective when it is directed at a dynamic database. Therefore, the query interface will need to establish a means of explaining why and how changes have occurred and offer ways for the user to easily alter the query as the available information changes.
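As a rough illustration of what "explaining why and how changes have occurred" could involve, the sketch below compares two ranked result lists for the same query and reports which items are new, which have dropped out, and which have changed position. The function name and the sample titles are hypothetical; the paper does not describe how its prototype would present such changes.

```python
def explain_changes(previous, current):
    """Compare two ranked result lists (best first) for the same query."""
    prev_set, curr_set = set(previous), set(current)
    return {
        "new": [t for t in current if t not in prev_set],
        "dropped": [t for t in previous if t not in curr_set],
        # (title, old rank, new rank), with ranks counted from 1.
        "moved": [
            (t, previous.index(t) + 1, current.index(t) + 1)
            for t in current
            if t in prev_set and previous.index(t) != current.index(t)
        ],
    }


# Invented sample data, loosely modeled on the earthquake example above.
november = ["Quake hits Bay Area", "Bridge repairs begin", "Seismic risk study"]
later = ["Alaskan volcano erupts", "Ash closes airport", "Quake hits Bay Area"]

report = explain_changes(november, later)
print(report["new"])      # ['Alaskan volcano erupts', 'Ash closes airport']
print(report["dropped"])  # ['Bridge repairs begin', 'Seismic risk study']
print(report["moved"])    # [('Quake hits Bay Area', 1, 3)]
```

With a fixed cutoff, items listed as dropped may still exist in the database; a fuller version would need to distinguish "pushed below the threshold" from "no longer stored."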
In our observations of general information users, we noted a number of practices which seemed of importance in their use of information. It seems likely that any successful desktop information system will have to support such practices.
In our study of accountants, we found that whether they were dealing with newspapers, technical papers, or memos, no one ever used the verb "read." These users began by skimming all information they received, often relying on the layout of the information to give them a quick overview. Only rarely did they decide to read the material thoroughly. One accountant subscribed to approximately 20 magazines and journals, but infrequently ventured beyond the table of contents. Similar usage patterns have been noted in other domains.
It is difficult to skim electronically-based information in the same way. One accountant, who had personally implemented part of an electronic database of a standard accounting reference, confessed that he preferred using the hard copy version because it was easier to skim.
One way to facilitate skimming is to provide article summaries. However, it is often not possible to summarize (either automatically or manually) a document because different people will look for different types of information. The accountants we interviewed noted that they often search for information that is implicit or even deliberately concealed (such as bad financial indicators), and would be even less likely to be included in an abstract.
A different tactic is to rely on structure in the document itself. Various designers (e.g., ) have argued that document usability can be enhanced by incorporating the structure of traditional documents into on-line information. Paper-based documents such as magazines employ a variety of visual design techniques which could be used to facilitate skimming in on-line documents. The design challenge here is to support skimming in ways that go beyond adaptation of traditional printed media design and take advantage of the properties of electronic media (e.g., ). For example, one accountant suggested that the system could display the first few sentences of every paragraph and he could choose where to expand to full text.
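The accountant's suggestion can be sketched in a few lines. The version below assumes paragraphs are separated by blank lines and uses a naive sentence splitter (a break after ".", "!" or "?" followed by whitespace), which will mishandle abbreviations; the function name and the "[...]" marker for expandable text are illustrative choices, not part of the prototype.

```python
import re


def skim(document, sentences_per_paragraph=2):
    """Return a preview of each paragraph: its first few sentences only."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    previews = []
    for paragraph in paragraphs:
        # Naive split: break after ., ! or ? when followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        shown = " ".join(sentences[:sentences_per_paragraph])
        hidden = len(sentences) - sentences_per_paragraph
        # Mark paragraphs that have more text available to expand.
        previews.append(shown + (" [...]" if hidden > 0 else ""))
    return previews


sample = "One. Two. Three.\n\nOnly one."
print(skim(sample))  # ['One. Two. [...]', 'Only one.']
```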
Most of the accountants annotated (i.e., added comments or marked-up) the paper-based information they saved. Annotation was used as a memory cue about what aspects of the information were of importance. In addition, annotation was used to add value. For example, annotation facilitated skimming by other people with whom the document was shared. Also, it was used to indicate relationships between the document and other information.
Currently, it's difficult to annotate an electronic document casually. One accountant who maintained information on-line went to great lengths to annotate it. He would import the ASCII text into a word processor and mark it up by changing text styles to bold or underline. More typically, users printed the information they'd found, marked it up by hand, and filed it, thus losing any capacity for electronically managing the retrieved documents. A complete information environment needs to provide users with annotation tools, the means to view documents in both pristine and annotated form, and the ability to search for elements in both the original data and the annotations.
Our interviews with accountants also revealed a way in which annotation may be more important in an electronic environment than in a paper-based one. The accountants themselves are audited by corporate level quality control people who want to make sure that they're performing to the company's standards. Among other things, quality control people look at clipping files to ensure that the accountant is keeping up on the industry and clients. Future systems which automatically retrieve information on particular topics would eliminate this as a source of evidence. In such an instance, the existence of annotations would provide proof that the information had been 'touched by human hands,' evidence that might be welcomed by clients as well as quality controllers.
The accountants discarded all but the most important information; space constraints, as well as the difficulty of deciding which file folder was most appropriate, deterred them from saving more. There was a general feeling that the fewer items saved, the easier it was to re-locate them. One of the few users who maintained information in electronic form saved items into a "scrapbook" file, but rarely revisited anything because this required a sequential scan through the file. These cases indicate that an information management system needs to supply users with tools to organize and reorganize their data, once retrieved.
Such tools need to support full text search on saved items, as well as the ability to search on other criteria. For example, users often remember the approximate date on which the data was found, or the source it came from. Tools provided by the system should allow the use of combinations of such attributes for searching and reorganizing, thus permitting users to create their own idiosyncratic databases with items retrieved from external databases.
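A minimal sketch of such a tool follows, assuming each saved item records its source, the date it was saved, and any annotations the user has added. The `SavedItem` type, the `find` function, and the sample data are hypothetical; text matching is plain case-insensitive substring search over the title, body, and notes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SavedItem:
    title: str
    text: str
    source: str
    saved_on: date
    notes: list = field(default_factory=list)  # the user's annotations


def find(items, words=None, source=None, after=None, before=None):
    """Return the items that match every criterion that was supplied."""
    matches = []
    for item in items:
        # Search the original text and the annotations together.
        searchable = " ".join([item.title, item.text, *item.notes]).lower()
        if words and not all(w.lower() in searchable for w in words):
            continue
        if source and item.source != source:
            continue
        if after and item.saved_on < after:
            continue
        if before and item.saved_on > before:
            continue
        matches.append(item)
    return matches


# Invented sample data.
items = [
    SavedItem("Lease rules", "New lease accounting standard issued.",
              "Trade journal", date(1990, 3, 5), ["check impact on client X"]),
    SavedItem("Chip prices", "Memory chip prices fall.",
              "Newswire", date(1990, 4, 2)),
]
print([i.title for i in find(items, words=["client"])])  # ['Lease rules']
print([i.title for i in find(items, source="Newswire",
                             after=date(1990, 4, 1))])   # ['Chip prices']
```

Because every criterion is optional, a user who remembers only "sometime in April, from the newswire" can still narrow the collection without recalling any words from the text.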
In this section, we briefly describe some of the design elements which resulted from consideration of the issues previously identified. Note that the design does not address all of the issues we have discussed in this paper. Furthermore, we must emphasize that because the system is still being implemented and has yet to be tested on the intended users, we cannot say whether the features we describe will be successful. Readers may wish to look at related systems, such as SuperBook and Concordia, which have already progressed through implementation and testing phases and which address similar issues.
Our prototype interface design has three components: reporters, newspapers, and notebooks.
Reporters are what users interact with to define the type of information they wish to retrieve. Through a form-based dialogue, a user can give a reporter specifications, examine items it retrieves, and use relevance feedback to refine those specifications. Any reporter can be automated so that it will access desired databases on a regular basis.
By using a reporter metaphor, we hope to provide users with a way to understand and contend with a less-than-predictable query mechanism and the dynamic nature of databases. This metaphor allows us to examine an interesting conjecture: anthropomorphism may be useful for representing ignorance, as well as intelligence. Users were often disturbed when initial queries to DowQuest would result in the retrieval of irrelevant articles, and sometimes concluded that "the system" didn't work. Would they be more forgiving of a reporter and expect it to improve with feedback? In addition, real-world reporters embody many of the characteristics of the retrieval mechanism: the ability to use fuzzy information as feedback ('find more like that one'), and the ability to function in a world of changing information (a reporter is not expected to come back with the same information next week).
Typically, a user might create several automated reporters. Because users will want a quick way to determine what's new without having to access each independent reporter, we designed the newspaper component to allow users to skim through all new information. Each reporter is allocated a 'column' in the newspaper. If new information has been retrieved by the reporter since the last edition of the newspaper, the associated column appears in the current newspaper, and contains the titles and brief excerpts of each item found. Reporters that find large amounts of relevant information appear on the front page; progressively less active reporters appear on subsequent pages. A listing of the columns published in the current issue is always available to the user and serves as a navigation device. From the newspaper, the user can either access the full text of an item of interest or call up the reporter. Consequently, if a reporter's column starts to stray from the desired information, the user can easily revise the reporter's assignment.
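The layout rule described above (a column only for reporters with new items, the most active reporters on the front page) can be sketched as follows. The function name, the fixed number of columns per page, and the sample reporters are assumptions made for illustration; the paper does not give these details.

```python
def lay_out(new_items_by_reporter, columns_per_page=3):
    """Group reporters that have new items into pages, most active first.

    Returns a list of pages; each page is a list of reporter names, and the
    first page is the front page.
    """
    active = [
        (name, items) for name, items in new_items_by_reporter.items() if items
    ]
    # Most new items first; ties broken alphabetically by reporter name.
    active.sort(key=lambda pair: (-len(pair[1]), pair[0]))
    pages = []
    for start in range(0, len(active), columns_per_page):
        page = active[start:start + columns_per_page]
        pages.append([name for name, _ in page])
    return pages


# Invented sample data: item titles retrieved since the last edition.
edition = {
    "Clients": ["a", "b", "c"],
    "Tax law": [],
    "Industry": ["d"],
    "Competitors": ["e", "f"],
}
print(lay_out(edition, columns_per_page=2))
# [['Clients', 'Competitors'], ['Industry']]
```

The flattened list of names across all pages could also serve as the listing of columns that the text describes as a navigation device.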
Whether users are interacting with a reporter or a newspaper, if they encounter an article they wish to keep, they may save it into a notebook. Notebooks allow users to create their own customized databases. Figure 3 describes features of a preliminary design which support practices such as browsing, annotation, and organization.
In this paper we've described the investigation phase of a project aimed at creating a desktop information system for general users. We began by describing problems due to inappropriate expectations of intelligence that arise when users employ natural language and relevance feedback to retrieve information. Similar problems may arise in other domains as interfaces grow more intelligent and adaptable. In our prototype, we use a "reporter." This anthropomorphic metaphor might be more suited to the fuzziness and inevitable 'mistakes' that occur in information retrieval.
Our investigation also included observations and interviews of professional searchers, general users of on-line systems, and accountants, which revealed a number of needs and practices that a desktop information system should support. The system should address the need for metaknowledge and offer support for dealing with dynamic information. The current interface prototype addresses these issues only slightly, because the initial implementation will provide its users with access to familiar information sources. In addition, the system should support current practices such as skimming, annotation, and organization. The newspaper and notebook components of the interface prototype illustrate some ways of providing this support.
The next phase of this project includes the implementation of the interface, its installation in an accounting office, and the observation of its use. At a later date, we hope to report on the nature and efficacy of the implemented interface and use our findings to drive the next design phase.
Special thanks to Ruth Ritter for graphic design assistance and to Kevin Tiene for influence throughout. The project discussed is part of a joint effort between Apple Computer, Dow Jones & Co., KPMG Peat Marwick and Thinking Machines Corp. We'd like to thank the following project leaders from each company for their assistance: Charlie Bedard, Clare Hart, Robin Palmer and Brewster Kahle.
1. Allen, R. B. User Models: theory, method, and practice. International Journal of Man-Machine Studies 32, (1990), 511-543.
2. Belkin, N. J. and Vickery, A. Interaction in information systems: a review of research from document retrieval to knowledge-based systems. LIR Report no. 35. London, The British Library, 1985.
3. Daniels, P. J. Developing the User Modelling Function of an Intelligent Interface for Document Retrieval Systems. Ph.D. Thesis, The City University, London, 1987.
4. Dillon, A., Richardson, J. and McKnight, C. Human factors of journal usage and design of electronic texts. Interacting with Computers. 1, 2, (1989), 183-189.
5. Dow Jones & Company, Inc. Dow Jones News/Retrieval User's Guide. 1989.
6. Egan, D.E., Remde, J.R., Gomez L.M., Landauer, T.K., Eberhardt, J., Lochbaum, C.C. Formative Design-Evaluation of SuperBook. ACM Transactions on Information Systems, 7, 1, (January 1989), 30-57.
7. Glushko, R. J. Design Issues for Multi-Document Hypertexts. In Proceedings of Hypertext 1989. ACM Press, November, 1989, pp. 51-60.
8. Grice, H. P. Logic and Conversation. In P. Cole & J.L. Morgan (Eds.), Syntax and Semantics, Volume 3: Speech Acts. New York: Seminar Press, 1975.
9. Meier, E., Minjarez, F., Page, P., Robertson, M. & Roggenstroh, E. Personal communication, 1990.
10. Salomon, G., Oren T. and Kreitman K. Using Guides to Explore Multimedia Databases. In Proceedings of the Twenty-Second Annual Hawaii International Conference on System Science. (Kailua-Kona, Hawaii, Jan. 3-6, 1989), IEEE Computer Society Press, vol. 4, pp. 3-11.
11. Salton, G. and McGill, M. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
12. Stanfill, C. and Kahle, B. Parallel Free-text Search on the Connection Machine System. Communications of the ACM. 29, 12, (Dec. 1986), 1229-1239.
13. Walker, J. Supporting Document Development with Concordia. IEEE Computer. (Jan. 1988), 48-59.
14. Weyer, S. Questing for the "Dao": DowQuest and Intelligent Text Retrieval. Online. 13, 5, (Sept. 1989), 39-48.
15. Williams, M. D. What makes RABBIT run? International Journal of Man-Machine Studies 21, (1984), 333-352.
© Copyright 1991 by Thomas Erickson and Gitta Salomon. All Rights Reserved.