The success of the Web as the main provider of information is indisputable. If a company or government is not on the web, it effectively does not exist. A key to the Web's phenomenal success, intriguingly, is in some respects less the information on it than our ability to find the information it references. Indeed, the main way we access the Web is via that wee box that from a few words seems to read our mind and return a list of links to resources we want. So successful has this approach to finding information become that on the one hand it is difficult to remember how we managed to find any information at all prior to web-based keyword search, and on the other, it's difficult to envision needing or wanting any other tool for information discovery. If we can find it with Google, what more do we need?

Successful paradigms can sometimes constrain our ability to imagine other ways to ask questions that may open up new and more powerful possibilities. The Newtonian model of the universe-as-clockwork, for instance, is still a sound paradigm for explaining a great many physical phenomena. Indeed, one may say it was only some niggling phenomena not well described by that model that raised the question: might there be a better model, a different paradigm? Relativity, a very different way to imagine the behaviours in the manifest world, opened up whole new ways of understanding our universe.
The success of the Google paradigm may be our Newtonian paradigm for the Web. It enables us to do so much information discovery that it is difficult to imagine what we cannot do with the paradigm of continually refining search terms to get to The Result. The approach Google has made ubiquitous, however, does assume that there is An Answer Out There; if we can just specify the query correctly, we can find It.
But how does the Google paradigm help a busy mom find, quickly and effectively, a better job that is a match for her passion and skills? And if that mom could use some extra training to support that skill to get that better job, how would the Google paradigm bring in that highly relevant information that is outside the constraints of the keyword search?
In the Information Retrieval and Information Seeking literature, these kinds of more complex, rich information discovery and knowledge building tasks have been modelled in terms of search strategies and tactics (think Bates and Belkin). In the relatively recent work classed as Exploratory Search (see the Special Issue, CACM, April 2006), the emphasis has been on harmonizing Human Computer Interaction design approaches with models of information seeking to develop new tools that will support these alternative kinds of search and knowledge building.
Examples of such approaches include:
Each of these approaches to knowledge building involves exploration that does, yes, pull together a wide array of information resources, but that has less to do with specific iterative searches for a particular pre-existing answer than with support for the development of a New Answer through the interrogation and association of these sources. To support these different kinds of knowledge building goals, we need to develop the tools that will support these kinds of approaches to exploration. The goal of this article is to consider some of the nascent efforts that have been developed around these non-keyword search paradigms.
Exploratory Search Tools to Date
The pre-history of Exploratory Search can be seen in the raison d'être of hypertext: to support human made associations through knowledge spaces. Nelson, who coined the term "hypertext" in 1965, was inspired by Vannevar Bush's vision, from the close of WWII, of the Memex. The goal of the Memex was to support better knowledge management of a post war Science Explosion by helping scientists build, maintain and share their own paths through the document space. Bush called these paths Trails. He postulated that these human made Trails of associations would be more meaningful for scientific discovery than having to track up and down through library taxonomies of texts. Nelson took Trails and imagined what was to become the key component of the Web: the Link, the ability to "transclude" or connect by reference into a new document both one's own thoughts and others' work to develop a perpetual exchange of ideas. A key attribute of the hypertext link was to support non-linear exploration of information for free form association building. Nelson, an Arts graduate, imagined "A File Structure for the Complex, the Changing, and the Indeterminate" a few years before computer scientist Doug Engelbart first presented the NLS, including the debut of the Mouse for navigating a dynamic file linking system, shared screen collaboration, and hypertext. A critical component of the NLS demo was providing multiple visualizations for the ways files and their associated categorization/hierarchies could be represented or resorted.
Some fifteen years later, prior to the networked web, Trigg's Notecards system (1984) put NLS on steroids via somewhat richer visualizations of the types of linking functions already described in NLS. While most hypertext researchers point to Trigg's formalization of link types as his key contribution, from an HCI perspective it is significant for our purposes that he chose the note card as the metaphor for his system. The card paradigm would later be developed into spatial hypertext (Marshall and Shipman; Bernstein) to support not just a temporal model of seeing one card at a time (a limit of 1984 display systems) but the cognitive model of presenting information akin to the layout and re-organization of cards in the physical world, in order to build new knowledge through the association of this information. Bernstein's Tinderbox is a commercial application that leverages this visualization for information sense making and for building new knowledge as associations emerge. A data mining engine in the software also exposes potential associations on a topic to surface further information possibilities. It is only recently that research projects like VIKI by Dontcheva and Drucker have begun to bring spatial hypertext metaphors to the web, via Web 2.0 protocols. It's early days yet for these projects, but it will be interesting to see how this approach may be used to build, organize and share new knowledge, and what the translation will be between cards-as-notes and documents.
Another related exploratory search thread in the pre-web hypertext research space is adaptive/adaptable hypermedia. As summarized by Brusilovsky, Adaptive Hypermedia sought to blend context awareness with hypertext to deliver the appropriate set of links and trails through a document space. The main scenarios for adaptive hypermedia have been context-aware tour systems and learning programs. The goal of adaptive hypermedia has been, through a user-model, to anticipate the delivery of material that best supports what a person needs to achieve a particular goal, whether that's to get a customized tour of a museum based on one's cultural preferences, or to get the best learning package based on one's current knowledge of a domain. If successful evaluation of these systems has been relatively thin on the ground, they nonetheless expose the challenge, desire and potential to try to refine a search space based on a person's needs and interests, rather than keyword searches alone.
One take-away from these pre-web representations of knowledge building across automated resources (both real and imagined) is that Search as keyword search has been largely absent from the main visions of these systems. Perhaps it was simply assumed as a rudimentary tool/strategy, such as rooting through the various categorizations of a card catalogue, but it seems important to realize that strategies such as recovering the path through a document space from start to goal (Trails) were seen as critical. Likewise, visualizations that privileged non-linear, non-temporally restricted representations of information, such as the operations that can be carried out with notecards - stacking, sorting, selectively displaying, sharing, tagging - were also seen as key parts of information building and communication of that information. And then the Web happened.
This pre-history of current Web-based exploratory search approaches is likewise important because it motivates a kind of recherche du temps perdu - we have been here before, asking how to best enable knowledge discovery - not as fact retrieval but in terms of how to support and enhance that retrieval for building new knowledge. With the astounding success of the Googleverse, however, we occasionally demonstrate a kind of amnesia about what we once sought to achieve. Part of this amnesia may be driven by a similar kind of Newtonian Model success: we've gotten so much out of this approach so far, why not keep digging away at it, push *its* limits? Google demonstrated such envelope pushing by showing how search term patterns correlate to the movement of the flu in the USA.
Early Web Serendipity and Serendipity Redux
One of the celebrated features in the early days of the web - something we have heard less about in the past few years - is the ability to explore a domain. To "surf" the web was a common expression: it meant that we navigated from linked page to linked page - pre the power of search engines - to come upon information serendipitously. The power of the hypertext link was ascendant. This surfing as sense making was something that was not as readily possible in the physical world: books or documents do not have ready links to other documents. While references may be embedded in documents, and one could go from one physical reference, and physically track through a library to another, this took considerable time. The more or less immediate ability to decide to follow one link rather than another and have that linked document returned and displayed caused the notion of serendipitous discovery to be foregrounded as a key value of the web. It made serious and valuable the hours spent surfing that might otherwise be seen as a non-productive use of time. The lack of a powerful search engine made this navigational, hit and miss, buggy approach to information finding on the web a feature rather than a bug in its early days. Indeed, the acceleration of serendipitous discovery from the rare to the frequent demonstrated another power of the web: acceleration of an analogue process once it goes digital begins to change that practice and our expectations of it. We'll come back to the role of acceleration.
So what has happened to web surfing? The scale of the web has grown so profoundly that surfing has been largely replaced by searching interspersed with select sources of mediation, such as blogs, RSS feeds and social networks: we leverage each other's serendipity. We serendip within a smaller set of known resources and search with intent for particular answers. We google so much that it has become a verb that presidential candidates must know to be seen as au fait with the cultural memes about "the internets" and "the google"; those who would serve and who are not current with what is perceived as such basic literacy may become the victims of "google bombs." These bombs are so effective only because this kind of search has become the key way by which we find information.
The Web as a networked model of documents misses some of the key features of document exploration we have had in the physical world. Artefacts like library shelves let someone get a sense of the scale of a domain by looking at the space taken up by a topic. Classification systems meant that related topics could be clustered in physical space and located. Some argue that it's impossible to put shelves/categorization systems on the web. Indeed, early ways of exploring the web were through categorization systems like Yahoo and the Internet Directory Project that seemed to fail at scale. The categories, it seemed, became too brittle for the fluid growth of the Web. One of the early Exploratory Search paradigms has been to revisit the notion of categories as valuable ways to make sense of a domain and see if there mayn't be a role for such an approach within the web. These models have become known as Facetted Search.
Facetted Search: the Metadata is the Message
Whereas a keyword search brings together a list of ranked documents that match those search terms, the goal of a facetted search is to enable a person to explore a domain via its attributes. One of the best known examples of such a browser is Apple's iTunes application, which is an interface to access and play back tracks or sets of tracks from a collection of music files.

The browser to the collection presents three columns, representing three facets of the Music domain: genre, artist, album. Attributes matching these facets are populated into the columns. A selection in any column acts as a filter on the columns to its right. Once a selection is made, and the right column(s) filtered, a list of individual tracks matching those selected is presented in the lowermost browser pane. Keyword search is integrated into iTunes such that the list of data matching the search terms populates the facets in the columns as well as returning a list of individual track results. This layout means that even after the keyword search results are returned, the facets can be operated upon to further explore the collection. If the results returned cover multiple genres, it is easy to highlight those instances that are associated with a given artist, genre or album.
Exploration by facet enables one to make new connections about a domain or its attributes within a domain. One might, for instance, discover that someone perceived to be a Jazz artist has also recorded Country music, which may lead one to explore Country music - something previously thought to be of no interest. This same ability to reconsider a domain via attributes also supports creating new knowledge about the domain: a person may not know that these attributes are a way of interpreting a domain. In online shopping sites it is increasingly common when looking for an item to be presented with facets as a way of refining a query by seeing, visually, in what ways that query can be narrowed. For instance, after doing a search for "sweater" a range of categories to choose from are presented: Category: men's, women's, snow boarding, kids. Feature: on sale, colour, brand or price.
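The left-to-right filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not iTunes' actual implementation: the track data, the `FACETS` ordering and the `column_browser` function are all made up for the example.

```python
# Hypothetical sample collection; all names are invented for illustration.
TRACKS = [
    {"genre": "Jazz", "artist": "Ada Reed", "album": "Blue Hours", "title": "Night Walk"},
    {"genre": "Country", "artist": "Ada Reed", "album": "Dust Roads", "title": "Long Way Home"},
    {"genre": "Jazz", "artist": "Sam Ortiz", "album": "Late Set", "title": "Corner Table"},
    {"genre": "Rock", "artist": "The Lanterns", "album": "Static", "title": "Signal"},
]

FACETS = ["genre", "artist", "album"]  # left-to-right column order


def column_browser(tracks, selection):
    """Return (columns, matching_tracks) for a directional column browser.

    `selection` maps a facet name to the value picked in that column.
    A selection filters only the columns to its right, never those to its left.
    """
    columns = {}
    remaining = list(tracks)
    for facet in FACETS:
        # A column lists the values that survive the selections to its left.
        columns[facet] = sorted({t[facet] for t in remaining})
        chosen = selection.get(facet)
        if chosen is not None:
            remaining = [t for t in remaining if t[facet] == chosen]
    return columns, remaining


columns, tracks = column_browser(TRACKS, {"artist": "Ada Reed"})
print(columns["genre"])   # unfiltered: the genre column sits to the left of the selection
print(columns["album"])   # filtered down to the selected artist's albums
print([t["title"] for t in tracks])
```

Note that picking an artist leaves the genre column untouched, which is exactly the limitation of a directional browser: the columns alone do not reveal that this artist spans two genres.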

Enriched Facets. Another attribute of note in this small commercial example that goes beyond even iTunes is quantity. The facets not only provide the categories of sweater possible, but how many of each there are. In a sense this is reminiscent of seeing the number of books on a shelf for a particular topic: we immediately get a greater sense of the domain from this simple cue.
A facetted browser that has made particular use of representing quantity is the RB++ browser.

Here, several types of information are visually communicated. First, histogram bars against each attribute in a facet show how many documents are associated with that attribute. Hovering over an attribute reduces the histograms accordingly to show clearly which attributes are included in the remaining set if that attribute is selected.

selecting mathematics (above)

then selecting Asia after mathematics (above).
Again, it is informative in and of itself to be able to see that in an education curriculum space regarding mathematics about 25% of the associated information is about Asian curriculum performance, and that the documents are mainly in the K-12 space and available as web pages. In this respect the RB++ browser persistently presents the total documents associated with the space, as well as the effect of selection on the space. These lightweight information markers provide additional attributes on a space that are not available from keyword search alone.
Backwards Highlighting (UIST08) in the mSpace browser is a similar way of showing effects of selection across facets in what is otherwise known as a directional browser like iTunes. In iTunes, a selection in the middle or left column only filters to the right; it does not populate back to the columns to the left of that selection. Picking the artist "Radiohead", in other words, does not show with what Genres that band is associated. Backwards highlighting shows both the filter to the right as well as the possible paths that could be associated with that selection from the left. In the example of a newsfilm space below, where the facets are decade, year, theme, subject and story, a person has picked the 1940s in the leftmost column. The columns to the right are all filtered by that choice. They next choose a Theme in the third column. The effect of this selection is both to filter the remaining columns to the right, and also to highlight two items in the Year column to the left to which the selected third column item is related. The intensity of the highlights also shows a person which attributes were deliberately selected (the bright highlight) and which were calculated (the duller highlight). These simple information guides have been shown to assist both recall and descriptions of information in a domain.
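The two numbers such a histogram needs for each attribute, the total count and the count remaining under a selection, are cheap to compute. The following Python sketch assumes a small invented document set and hypothetical helper names (`facet_counts`, `counts_with_selection`); it is not RB++ code.

```python
from collections import Counter

# Hypothetical document metadata, invented for illustration.
DOCS = [
    {"subject": "mathematics", "region": "Asia", "level": "K-12"},
    {"subject": "mathematics", "region": "Europe", "level": "K-12"},
    {"subject": "mathematics", "region": "Europe", "level": "university"},
    {"subject": "mathematics", "region": "Americas", "level": "K-12"},
    {"subject": "science", "region": "Asia", "level": "K-12"},
    {"subject": "science", "region": "Americas", "level": "university"},
]


def facet_counts(docs, facets):
    """Count how many documents carry each attribute value, per facet."""
    return {f: Counter(d[f] for d in docs) for f in facets}


def counts_with_selection(docs, facets, selection):
    """For every attribute, return (remaining, total) document counts.

    `total` is the full-collection count (the persistent bar);
    `remaining` is the count once `selection` is applied (the reduced bar).
    """
    total = facet_counts(docs, facets)
    subset = [d for d in docs if all(d[f] == v for f, v in selection.items())]
    remaining = facet_counts(subset, facets)
    return {
        f: {value: (remaining[f][value], total[f][value]) for value in total[f]}
        for f in facets
    }


bars = counts_with_selection(DOCS, ["subject", "region", "level"], {"subject": "mathematics"})
for value, (remaining, total) in sorted(bars["region"].items()):
    print(f"{value:10s} {remaining}/{total}")
```

In this toy set, one of the four mathematics documents is tagged Asia, so a reader sees the "about a quarter" proportion at a glance without issuing any query for it.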
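A rough sketch of the idea, in Python, may make the distinction between selected and calculated highlights concrete. The records, the column list and the `backward_highlight` function are assumptions made for this example and do not reflect how mSpace itself is implemented.

```python
# Hypothetical newsfilm records, invented for illustration.
RECORDS = [
    {"decade": "1940s", "year": "1941", "theme": "War"},
    {"decade": "1940s", "year": "1945", "theme": "War"},
    {"decade": "1940s", "year": "1947", "theme": "Sport"},
    {"decade": "1950s", "year": "1953", "theme": "Sport"},
]

COLUMNS = ["decade", "year", "theme"]  # left-to-right order


def backward_highlight(records, columns, selection):
    """Mark values in the columns up to and including the rightmost selection.

    Values the person picked are marked "selected" (the bright highlight);
    values in unselected columns that co-occur with the whole selection are
    marked "related" (the duller, calculated highlight).
    """
    if not selection:
        return {}
    matching = [r for r in records if all(r[c] == v for c, v in selection.items())]
    rightmost = max(columns.index(c) for c in selection)
    marks = {}
    for col in columns[: rightmost + 1]:
        if col in selection:
            marks[col] = {selection[col]: "selected"}
        else:
            marks[col] = {v: "related" for v in sorted({r[col] for r in matching})}
    return marks


print(backward_highlight(RECORDS, COLUMNS, {"decade": "1940s", "theme": "War"}))
```

Here the Year column was never clicked, yet two of its values are marked as related to the chosen Theme, mirroring the two highlighted years in the example described above.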

Making Sense of the Facets themselves. Another sense making attribute that can be associated with an individual item in a facet is a Preview Cue. Preview cues were designed to help users unfamiliar with a domain, whose attributes may be presented at a level of expertise outside the ken of the explorer. For instance, someone unfamiliar with classical music may not find much exploratory help in a list of types like Sonata or Symphony or periods like Classical or Baroque. They can, however, make a judgement about the actual music represented by an attribute and whether or not they like that sound. The preview cue, in the classical music example, associates a set of music samples with that attribute. Once the samples are triggered the person can either step through those samples, or based on the first one played decide if they wish to explore that area of the domain further, or move on.

In the image above, hovering over the Speaker icon has triggered a preview cue for the Baroque Composer Reneau. Three selections by the artist are also cued up in the preview cue. Note also that where Baroque in Period has been selected, a description of the selected facet is presented. Likewise, to help develop an understanding of the domain, when an item associated with a facet is selected, information about that facet is presented.
So far we have seen how small cues associated with static facets can enrich their value for users exploring a domain. mSpace has focused on supporting manipulations of the facets to be presented. mSpace refers to the presentation of facets as a "slice" through a domain space, and enables the facets in the slice to be reordered, as well as enabling other facets to be added to or removed from a slice.

This ability to reorganize a slice according to a person's interests was motivated by the desire to enable a person to explore a domain by what is relevant or known to them: to enable them to have more facility to make sense of a domain in ways that are meaningful to them. In the newsfilm world for instance, one may be more interested to organize a space around the work of a particular reporter than around a particular topic.
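As a small sketch of why reordering matters, the Python below treats a slice as nothing more than an ordered list of facet names. The stories, reporter names and `browse_slice` function are hypothetical, invented for illustration rather than taken from mSpace.

```python
# Hypothetical newsfilm stories, invented for illustration.
STORIES = [
    {"reporter": "R. Hale", "topic": "Elections", "year": "1948"},
    {"reporter": "R. Hale", "topic": "Sport", "year": "1949"},
    {"reporter": "J. Moss", "topic": "Elections", "year": "1948"},
    {"reporter": "K. Lund", "topic": "Sport", "year": "1950"},
]


def browse_slice(items, slice_order, selection):
    """Populate the columns of a slice, filtering left to right.

    `slice_order` is the ordered list of facets currently in the slice;
    changing the order, or adding and removing facets, changes what each
    column can show for the same selection.
    """
    columns = {}
    remaining = list(items)
    for facet in slice_order:
        columns[facet] = sorted({i[facet] for i in remaining})
        if facet in selection:
            remaining = [i for i in remaining if i[facet] == selection[facet]]
    return columns


pick = {"topic": "Elections"}
topic_first = browse_slice(STORIES, ["topic", "reporter", "year"], pick)
reporter_first = browse_slice(STORIES, ["reporter", "topic", "year"], pick)
print(topic_first["reporter"])     # only reporters who covered the chosen topic
print(reporter_first["reporter"])  # every reporter: the column is left of the selection
```

The same selection yields different columns depending on the order of the slice, which is why letting a person put the facet they know best first changes what they can discover.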
Visualizations to Enhance Representations for Knowledge Building
While the above discussion has highlighted the simple ways in which information facets can be decorated to enable rich exploration of a domain, mash ups have also shown us the value of re-presenting those attributes across a variety of visualizations. Exhibit is an example of a tool that provides facetted exploration of data along with visualizing that data against maps and timelines.

The value of these representations is in the questions they foreground. The Presidents facets make it easy to see at a glance that most Presidents were born on the eastern side of the US, and that Cleveland was the last president to hold office completely inside the 19th century (McKinley bridges the 19th and 20th centuries).
Projects like LifeLinesII have taken larger sets of data, such as patients' health records and medical test results, and mashed them up in order to enable medical professionals to align, rank and sort them according to the attributes available on the data. This visualized and parameterized mash up readily facilitates seeing whether and where there might be correlations across populations between, for instance, the timing of a drug and responses to it when other conditions are present. While IBM's Many Eyes shows the value of being able to share visualizations of data quickly for powerful analysis, LifeLinesII, by adding manipulable facets onto the visualization, enables dynamic exploration of many "what if" scenarios and new discoveries to be made through correlations.

Moving from Data Manipulations to Tracking New Ideas
Facetted browsers and tunable visualizations, as we have seen, make it possible both to ask questions not easily expressed in a keyword search and to refine queries rapidly with real time direct manipulation. Spatial layout of the data's attributes for manipulation allows relationships within the data to remain available for rapid comparison. Likewise, mapping data against different kinds of coordinates like quantity, temporal and spatial qualities enables additional information to be communicated without a person actively seeking it, so that this information implicitly informs query manipulation.
Related to actual data manipulation for exploring data and generating new insights is the question of what to do with the information while moving through it - information we may want to return to later, but not now; thoughts we have mid stream that we'd like to capture without leaving our current focus. All these types of interactions are components of enhancing our information seeking and knowledge building practice.
Currently, we have seen the use of tags-as-annotation as one strategy to enhance the personal or social network value of found things: a tag helps gather that artefact into many potentially relevant contexts. Indeed, the popularity of online photo tagging has rather destroyed the credibility of the oft expressed sentiment that people won't add metadata to their data. The social sharing that tags enable, such as a social network being given a set of artefacts from a space tagged specifically for a collaborative project, has high value: someone on the team found this thing relevant to our work. Projects like Folksonomies are considering how more structured taxonomies may emerge from these flat spaces in order to add the value of categories for exploration to these annotations.
Moving beyond tags (single words) to strings, or data that's more recognizable as a note or comment on a document, SparTag.us not only enables notes to be associated with a Web page and shared, but also lets these notes automatically show up anywhere online the document may be cloned. The authors of the technique make the compelling case that much of the Web's available content, from news articles to blog posts, is frequently reprinted verbatim. But what do we do with something we find interesting in the middle of a search? The most common approach is to bookmark or otherwise record the URL for a given post. As work in Hunter Gatherer showed (2002), however, sometimes we don't want the whole document. We want a piece of a document. In Hunter Gatherer, components of Web pages could be captured by highlighting text and pressing a control key. The text was titled, the URL was automatically associated with it, and it was captured in a linear list called a "collection." As mentioned previously, drawing on earlier hypertext ideas and modern graphics processing, work by Dontcheva and Drucker on VIKI takes the collection notion and enables each component captured to be laid out as an individual card (2006). Live Labs' recent version of this project adds machine learning processes so that extracted addresses from a collection can be automatically mapped; books can be explored via extracted author or genre information, and cars by price, engine size, model and so on.
Right now, each of these categories of information extraction - books, cars, addresses, people - has relied on hand-wrapped widgets matched with the machine learning, and deployed at personal scale. It will be interesting to see how the benefits of formally facetted data can be brought to wilder data collections where machine learning techniques can extract these values for richer re-presentations.
Whither the Note Book, History and what i don't know i need to know?
Whither the Note Book , History and what i don't know i need to know?
At a recent NSF workshop on Information Seeking, two of the components that the discussants kept resurfacing as critical tools for exploratory search were History and Note Keeping. An expressed desire was for tools that would help surface things we should know about if and when we're looking at a given topic.
For history currently, we have the History list of our browsers, it's true. But show me someone who has tried to refind something based on History alone and i'll show you a frustrated person. In mSpace, when someone shares an article with another person, they also share the state of the facets used to get to that artefact, so a larger context of discovery is available. Going outside the context of a single application, the Jourknow project (UIST07) proposes being able to use local computer context to associate and recover information across personal facets like location (from wireless mapping and calendar information), date, and applications to support questions like "what pages was i looking at when i was in the cafe last Sunday?" This kind of approach to information seeking does not discriminate between possible search contexts like public, social, private, or application-specific data. The philosophy behind Jourknow is that any process might inform any other process of interrogation and discovery: how can we make them available to each other for exploration? Will this ability to blend personal, social and public data itself surface new knowledge/discoveries?
Such questions lead us back to the question of how we capture and reflect upon the knowledge building we are doing. Right now, the main paradigm for exploration is to "go to the web" - via a browser - to trawl for information. Is this the optimal interaction? It seems there are at least two challenges for knowledge building via information seeking while we are working on our own thoughts, or bluntly, when we are taking notes. We may wish to take notes about something while we're reading it - yet being able to select and annotate web documents, as imagined by Nelson decades ago, is as yet uncommon, and still very much in the research woodshed. But likewise we write notes on our own thoughts. Blogging is a popular demonstration of how well writing notes, thoughts or articles is supported - where we can effortlessly add in links to other information. Indeed, with trackbacks, we can also inform those to whom we've linked that a conversation involving their work is underway. Comments on blogs set up meta conversations around the initial seed of a discussion. Fabulous. But blogging is still largely text based. Sure we can link in photos and YouTube videos, but there are many other kinds of data that we might want to reflect upon and share with others.
For instance, consider a scientist who wants to gather up scientific data generated from an experiment, add some notes, tie in some data about the apparatus, along with several quotations about the informing theory, all to give as a blog to a colleague to ask "why aren't my results what the theory predicted?" On a more casual note, someone has used VIKI thoughtfully to gather considerable data about various digital cameras. In the mix is the camera they've selected to purchase. How would that annotation be captured to be shared? Or the features that were important be easily selected for persistent views? And as the data rapidly goes out of date, how might the person share the attributes of their choice to act as a template for a friend's future choice? Backstory (Venolia 08) is a search tool that has been developed to look at some of these issues within a software developer support group. Gathering up web based sources with local resources and notes on contexts of use, Backstory makes it possible to share local knowledge within a team across data object types. Backstory is a start to taking collections and making the rationale for those collections easier to share, but we are still very light on such wrapping-for-reuse tools. Right now, wrapping knowledge about gathered artefacts for reuse is what Dan Olson would call a highly "viscous" process: the cost of carrying out the process of gathering, organizing, annotating and managing the data may be higher than the perceived benefit, and a knowledge building opportunity is postponed or lost.
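To make the "cafe last Sunday" question concrete, here is a minimal Python sketch of a history log that keeps personal context alongside each page visit. It is not Jourknow's design: the `Visit` record, its fields, the example URLs and the `pages_seen` function are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Visit:
    """One page view plus the personal context it happened in (hypothetical schema)."""
    url: str
    when: datetime
    place: str   # e.g. inferred from wireless mapping or a calendar entry
    app: str


# Hypothetical log entries, invented for illustration.
LOG = [
    Visit("https://example.org/facets", datetime(2008, 11, 16, 10, 30), "cafe", "browser"),
    Visit("https://example.org/trails", datetime(2008, 11, 16, 15, 0), "office", "browser"),
    Visit("https://example.org/memex", datetime(2008, 11, 17, 9, 0), "cafe", "browser"),
]


def pages_seen(log, place, day):
    """Return the URLs visited at `place` on the calendar date `day`."""
    return [v.url for v in log if v.place == place and v.when.date() == day]


# "What pages was i looking at when i was in the cafe last Sunday?"
print(pages_seen(LOG, "cafe", date(2008, 11, 16)))
```

The point of the sketch is that place, date and application behave like any other facets: once they are recorded with the visit, refinding becomes a filter rather than a scroll through an undifferentiated History list.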
If these kinds of data gathering and sharing tasks for enhanced knowledge building were better supported, we can readily imagine that the process of discovery and innovation would accelerate. As we have seen with Google, when a process accelerates, such as finding a phone number or a paper or the answer to a "what is it" question, the activities supported by those processes change. If we can do something quickly, trivially now that used to take days or hours, we can move on more rapidly from information seeking to knowledge building.
Related to this kind of human enhanced, annotated and gathered set of data for another's engagement is what the machine may be able to bring to the table. A repeated demand at the NSF workshop was, "tell me what i don't know i need to know." Such a challenge goes beyond related recommendations of the "people who read this also bought that" kind. Recently we looked at search behaviours of 2000 users looking for information on diets. We saw that people who also found diet forums came to a decision about what diet they wanted to pursue in about half the time of others who did not. We also saw that the forum users' queries were quite distinct from those who had not found the forums. We know from related research that social support for dieting is a significant benefit. This preliminary study seems to indicate that seeing someone search for diet information, and hooking them up with forums where diet support is the topic of the space, would be one of the good things to know that a neophyte would not know they need to know. The design challenges here are significant: how can we surface this kind of valuable associated knowledge that would not show up in a keyword search? How do we reflect back why information of this type was being surfaced? Are there ethical issues around how information is selected to be associated? E.g., people who are interested in explosives might also want to know about off shore suppliers of hydrogen peroxide?
These kinds of challenges are exciting to contemplate. They suggest that there are many more ways in which we already want to be able to find, manipulate, ponder, share and reflect upon information - all with the facility of keyword search, but none of which keyword search addresses. All of which are part of the larger space of "information seeking" beyond simple "search."
So while Google can certainly find data with an increasingly freaky, extrasensory-like ability, there are so many other aspects to our information seeking and knowledge building practices that, if they too were on Google-like steroids, we could return to that initial scenario of a busy mom being able to come to the computer and say "i want a better job" and see a result set that perhaps shows:
Your Interests matched with Current Skills
Needed Additional Skills
Where to Get Training
Where to Apply for Positions
Now, here's a package to send - would you like to amend any details? would you like me to dial the number for you?
Posted by mc at November 24, 2008 3:53 PM