Ignore almost everything … if you can.

As a knowledgebase grows larger, our first requirement is to limit what we look at to those things that are highly relevant. One way to meet this requirement is explicit specification of “context.”

What is a “context”?

Few people would assert that the meaning of most natural-language sentences — especially when they are encountered in isolation — is unambiguous in all circumstances. Even when the meaning of a proposition or assertion is represented more formally — as proposed in this blog — the meaning of the assertion may be ambiguous unless we make the context [more] explicit.

So, what, precisely, is a “context”? First of all, let me avoid the term “precisely,” because it is impossible to recreate all the meaningful aspects of a natural-language assertion in any formalism or even in any natural language. Those aspects may be unknown … and/or unlimited. As friend John Bottoms of FirstStar Systems recently observed, “Everyone has this huge parser in their heads of their current version of truth (for him/her).”

And Pat Hayes noted (in a posting to the Ontolog forum) that anything that imposes constraints on the meaning of an assertion is a “context” — even, for example, an adverb. For me, this seems a bit tautological, but it may be all that needs to be said about how we should understand “context” — except that …

  1. accommodating context in a model for representing practical knowledge is important, and
  2. we don’t need a way of modeling “anything” in order to identify simple, useful ways to express context and use it to dramatically limit the number of choices presented to us in an information repository.

A few examples of representation of context

In a medical knowledgebase, you might find an assertion like the following, which occurs in a government publication.

Most men with prostate cancer are older than 65 years and do not die from the disease.

You might establish three facets for representing the context of this information:

  1. Sex (men vs. women)
  2. Age range
  3. Frequency of fatality

Those same characteristics (expressed as facets) will be useful across a wide range of assertions about disease and health concerns. (But in many other assertions, none of them will be relevant.)
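To make that concrete, here is a minimal sketch of how the assertion above might carry its context as facet values. Everything in it is my assumption for illustration: the Assertion class, the facet names, the wording of the values, and the placeholder second assertion are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assertion:
    """A discrete assertion plus the facet values that describe its context."""
    text: str
    # Facet name -> value. A facet that does not apply is simply absent.
    context: Dict[str, str] = field(default_factory=dict)


# Hypothetical facet names, taken from the list above.
MEDICAL_FACETS = ["sex", "age_range", "frequency_of_fatality"]


def applicable_facets(assertion: Assertion, facets: List[str]) -> List[str]:
    """Return the facets (in the given order) that this assertion uses."""
    return [name for name in facets if name in assertion.context]


prostate = Assertion(
    text="Most men with prostate cancer are older than 65 years "
         "and do not die from the disease.",
    context={
        "sex": "men",
        "age_range": "older than 65",
        # Hypothetical wording for the value; the source only says "most ... do not die".
        "frequency_of_fatality": "minority of cases",
    },
)

# Placeholder for an assertion where none of these facets is relevant.
unrelated = Assertion(text="(an assertion with no medical context)")

print(applicable_facets(prostate, MEDICAL_FACETS))
print(applicable_facets(unrelated, MEDICAL_FACETS))
```

Leaving a facet out, rather than storing an empty value, is one simple way to say "not relevant here."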

Techniques for “rapid information thinning”

As literacy and the number of documents started racing past our ability to organize them by meaning with traditional, rigid filing systems designed for large collections of documents (Dewey Decimal, etc.), Indian mathematician and librarian S. R. Ranganathan formalized, in 1933, a simple but highly effective technique for getting past such rigid single-parent classification systems. We usually call it faceted classification.

The principle is simple: the subject matter of a document can be described by selecting terms from multiple, semantically orthogonal (mutually exclusive) perspectives (facets). In formal faceted classification systems, the facets are usually hierarchies of characteristics. That may sound arcane, but the idea is very old and crops up everywhere. If you are familiar with the parlor game “20 Questions,” you know how to use facets.
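A hierarchical facet can be sketched in a few lines. The example below assumes a hypothetical "place" facet stored as a child-to-parent table; the table and the is_within function are mine, for illustration only. Choosing a broad term should match everything beneath it, much as each answer in "20 Questions" narrows the field.

```python
from typing import Dict, Optional

# Hypothetical "place" facet: each term points to its parent (None = top level).
PARENT: Dict[str, Optional[str]] = {
    "Europe": None,
    "France": "Europe",
    "Paris": "France",
    "Asia": None,
    "Japan": "Asia",
}


def is_within(term: Optional[str], ancestor: str) -> bool:
    """True if term is the ancestor itself or sits below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # climb one level; unknown terms end the walk
    return False


print(is_within("Paris", "Europe"))
print(is_within("Japan", "Europe"))
```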

Faceted classification has several implementations in the world of library and information science, but it still takes a back seat to the Dewey and Library of Congress models. Times have changed. Faceted approaches are a natural fit for the computer environment. Although, to the best of my knowledge, no significant online retrieval implementation builds on Ranganathan’s basic high-level categories (including Energy, Matter, and Personality) or uses his Colon Classification notation, many thousands of web sites (including most high-volume product web sites) now use a faceted classification approach as a complement to full-text search.

You won’t see references to facets on product web sites that use faceted retrieval, but you will see tools that allow you to rapidly reduce the set of items of interest. Just click an item in one or more semantically orthogonal categories (price, color, features, etc.) and the set of items presented is reduced quickly from thousands to a manageable few.
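The thinning itself is little more than a filter. Here is a rough sketch under my own assumptions: a made-up four-item catalog and a hypothetical thin function, where each "click" adds one facet value to the selection. Real product sites are surely more elaborate.

```python
from typing import Dict, List

# Made-up catalog for illustration; facet names follow the examples above.
ITEMS: List[Dict[str, str]] = [
    {"name": "Camera A", "price": "under $200", "color": "black", "feature": "zoom"},
    {"name": "Camera B", "price": "under $200", "color": "silver", "feature": "zoom"},
    {"name": "Camera C", "price": "$200 and up", "color": "black", "feature": "zoom"},
    {"name": "Camera D", "price": "under $200", "color": "black", "feature": "waterproof"},
]


def thin(items: List[Dict[str, str]], selections: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


# Each click adds one facet value and shrinks the remaining set.
selections: Dict[str, str] = {}
for facet, value in [("price", "under $200"), ("color", "black"), ("feature", "zoom")]:
    selections[facet] = value
    remaining = thin(ITEMS, selections)
    print(facet, "=", value, "->", [item["name"] for item in remaining])
```

Because the facets are independent of one another, the order of the clicks does not change the final set.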

How does this apply to retrieval of information from a knowledgebase?

Ranganathan clearly intended his system to handle, in effect, all knowledge, as expressed in documents. But here we are concerned with discrete assertions about reality — Facts and Insights, in particular — not documents. We are also not talking about selecting a new camera from a web site.

You don’t need (and probably don’t want) a high-level set of abstract facets like Ranganathan’s Energy, Matter, and Personality. Michael Crandall, in “Organization of Information,” correlates those characteristics with How, What, Who, but most information-seekers would find even those familiar terms odd or unusable. On the other hand, Where and When will help define context for many assertions.

The key question is: “What characteristics of assertions about reality are relevant to you or your organization?” Chefs don’t care about binary stars. What contexts make sense for you?

  • Persistent factors in your decision-making processes — project names, skills, and goals.
  • Technologies and tools that may apply to problem solving.
  • Relevant technology standards.
  • Any number of other characteristics.

Don’t start a cross-organizational project to identify the “complete” set of facets you need unless you have a Library Science specialist to govern the project. Even then, “facet analysis” — which is part of the formal process for designing a large-scale faceted document-retrieval system — could be a black hole for time and productivity.

You might want to start with a few facets that you know will help in retrieval, but otherwise just let the facets emerge from development and use of the knowledgebase itself. You will not know in advance exactly what additional facets will prove useful and usable. Ask everyone involved to list any suggestions for facets in a simple outliner. (As in all technology choices, confirm that semantic information produced in use of the tool can be transferred to other tools.) As the facets and facet hierarchies emerge, start implementation of tools for putting them to work for information thinning.

Like other aspects of any model for the representation of practical knowledge, the development of facets for describing context should be accommodated in advance, but you must be able to integrate the specific objects and relationships (for example, a hierarchical facet and its elements) without concern for how they affect other aspects of a knowledge representation system. The good news is that faceted classification, whether formal or informal, adapts easily to change. A new facet can be added at any time … as long as it is semantically orthogonal to other facets in use.
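As a sketch of why change is cheap, assume facets live in a plain registry and each assertion carries only the facet values that apply to it. The registry, the add_facet helper, and the sample records below are hypothetical. Note that the code can only catch a name collision; whether a new facet is semantically orthogonal to the others remains a human judgment.

```python
from typing import Dict, Iterable, List, Set


def add_facet(facets: Dict[str, Set[str]], name: str, values: Iterable[str]) -> None:
    """Register a new facet; refuse a name that is already in use."""
    if name in facets:
        raise ValueError(f"facet {name!r} already exists")
    facets[name] = set(values)


# Hypothetical starting point: one facet and one assertion already in use.
facets: Dict[str, Set[str]] = {"sex": {"men", "women"}}
assertions: List[dict] = [
    {"text": "(an existing assertion)", "context": {"sex": "men"}},
]

# A new facet arrives later. Existing assertions are not touched.
add_facet(facets, "where", ["Europe", "Asia"])
assertions.append(
    {"text": "(a newer assertion)", "context": {"sex": "women", "where": "Europe"}}
)

# Retrieval on the new facet simply skips assertions that never used it.
matches = [a for a in assertions if a["context"].get("where") == "Europe"]
print(len(matches), "of", len(assertions), "assertions match where = Europe")
```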

© Copyright 2017 Philip C. Murray
