The equivalence relationship
What is this?
Restroom, bathroom, toilet, loo, facilities, WC, ladies’ room, men’s room, little girls’ room, little boys’ room. . .
Synonymy: Using different words to identify the same concept.
What is mercury?
What is bank?
What is python?
What is java?
Polysemy: Using the same word (morphologically speaking) to identify different concepts.
Java: Island in Indonesia, variety of coffee bean, generic term for coffee, object-oriented programming language.
The White House has been lobbying Congress to support the proposed budget. . .
Freedom of the press is an important value in the United States. . .
I’m tired of taking the bus; I need some new wheels.
Metonymy: Using a related concept to stand for another concept.
Synecdoche: Using the word for part of something to stand for the entire thing.
Is there one obvious name for each thing? No.
Furnas and colleagues asked people (including subject experts) to label a variety of items (recipes, text editing operations, “common content objects”). Surprisingly, there was little agreement among the names participants submitted.
Conclusion: “The idea of an ‘obvious,’ ‘self-evident,’ or ‘natural’ term is a myth! Since even the best possible name is not very useful, it follows that there can exist no rules, guidelines or procedures for choosing a good name, in the sense of ‘accessible to the unfamiliar user.’”
Furnas and colleagues suggest that interface designers:
• Implement unlimited aliasing, letting many alternative names lead to the same object.
• Disambiguate terms that can be used in multiple senses by presenting possibilities to users and asking them to select the appropriate one.
Limitations of the study:
• Participants were asked to label objects, not to describe how they would search for them.
• The study assumes a search interface, not a browsing (or menu-driven) interface. In a search interface, users must recall or guess an object’s name; in a browsing interface, users merely need to recognize the appropriate term.
Designers of information organization systems have long grappled with the ambiguities of language.
Synonymy, polysemy, and related phenomena complicate the goal of collocating, or bringing together, like items in an information system.
In library and information science (LIS), vocabulary control is similar to Furnas’s idea of aliasing: concepts are associated with their synonyms.
One term is designated as preferred: this is the term shown in displays. The other labels associated with the concept serve as entry points for searching.
Example: Search Nordstrom.com for “frock” and get “dresses” instead.
Preferred term: bathroom
Equivalent terms: restroom, loo, toilet, WC, ladies’ room, men’s room, little girls’ room, little boys’ room, ladies room, ladys room, lady’s room, ladie’s room, ladys’ room...
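The equivalence relationship can be sketched as a lookup table from entry terms to one preferred term. This is a minimal Python illustration, not how any particular system stores its vocabulary; the names PREFERRED, ENTRY_TERMS, normalize, and preferred_term are hypothetical, and the normalization rules (lowercasing, straightening curly apostrophes, collapsing spaces) are an assumption about how variant spellings might be matched.

```python
# Hypothetical entry vocabulary: every label routes to the preferred term "bathroom".
PREFERRED = "bathroom"
ENTRY_TERMS = [
    "bathroom", "restroom", "loo", "toilet", "WC",
    "ladies’ room", "men’s room", "little girls’ room", "little boys’ room",
    "ladies room", "ladys room", "lady’s room", "ladie’s room", "ladys’ room",
]

def normalize(label: str) -> str:
    """Assumed matching rules: straighten apostrophes, lowercase, collapse whitespace."""
    return " ".join(label.replace("’", "'").lower().split())

# Built once: normalized entry term -> preferred term.
EQUIVALENTS = {normalize(term): PREFERRED for term in ENTRY_TERMS}

def preferred_term(query: str) -> str:
    """Return the preferred term for a query, or the query unchanged if it is unknown."""
    return EQUIVALENTS.get(normalize(query), query)

print(preferred_term("Loo"))          # bathroom
print(preferred_term("Lady's Room"))  # bathroom
print(preferred_term("kitchen"))      # kitchen
```

Only the preferred term is ever displayed; the other labels exist solely to route a searcher to it.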
Similar concepts may be treated as equivalents; this is a design decision by the vocabulary creator.
Example
Vocabulary includes this preferred term: Beer
These terms are designated as equivalents: ale, porter, stout, pilsner, bock, IPA, malt liquor, barley wine.
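Treating near-synonyms as equivalents has a retrieval consequence that a small sketch makes concrete. Assuming a hypothetical collection of four documents, each described by the term its author might naturally use, indexing every beer style under the preferred term Beer means a search for one style retrieves all of them: recall rises, while the distinction between styles is lost. All names and data below are invented for illustration.

```python
# The design decision from the example: beer styles are equivalents of "Beer".
PREFERRED = "Beer"
EQUIVALENT_STYLES = ["ale", "porter", "stout", "pilsner", "bock", "ipa",
                     "malt liquor", "barley wine"]
EQUIVALENTS = {term: PREFERRED for term in [PREFERRED.lower(), *EQUIVALENT_STYLES]}

# Hypothetical documents and the terms their authors used.
DOCUMENTS = {"doc1": "stout", "doc2": "IPA", "doc3": "pilsner", "doc4": "cider"}

def index_term(term: str) -> str:
    """Replace an equivalent with the preferred term; lowercase anything else."""
    return EQUIVALENTS.get(term.lower(), term.lower())

# Each document is indexed under its controlled term, not the author's term.
INDEX = {doc: index_term(term) for doc, term in DOCUMENTS.items()}

def search(query: str) -> list[str]:
    """Match the controlled form of the query against the controlled index."""
    target = index_term(query)
    return sorted(doc for doc, term in INDEX.items() if term == target)

print(search("stout"))  # ['doc1', 'doc2', 'doc3']
print(search("cider"))  # ['doc4']
```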
Polysemous terms are often identified by adding qualifying terms in parentheses.
Mercury (chemical element)
Mercury (god in Roman mythology)
Search engines may ask users to select the sense they want.
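Qualified headings can drive the kind of disambiguation prompt described above. In this sketch (hypothetical function names, and a vocabulary limited to two headings modeled on the Mercury example), a lookup on the bare word returns every heading that shares it once the parenthetical qualifier is stripped; when there is more than one match, the interface lists the senses instead of guessing.

```python
import re

# Qualified headings; the parenthetical qualifier names the sense.
HEADINGS = [
    "Mercury (chemical element)",
    "Mercury (Roman god)",
]

def base_term(heading: str) -> str:
    """Drop a trailing parenthetical qualifier and lowercase what remains."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", heading).lower()

def senses(word: str) -> list[str]:
    """Return every qualified heading whose unqualified form matches the word."""
    return [h for h in HEADINGS if base_term(h) == word.strip().lower()]

matches = senses("mercury")
if len(matches) > 1:
    # Polysemous word: present the choices and let the user pick.
    print("Which sense do you mean?")
    for heading in matches:
        print(" -", heading)
```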
Library catalogs have three traditional access points: author, title, and subject. In the old card catalog, these were the three ways that users could search.
Each of these access points has associated vocabulary control.
In library cataloging, controlled vocabularies for authors, titles, and subjects are called authority files.
Authority files both disambiguate names that identify multiple people or items and group variations for the same person or item (that is, they deal with polysemy and synonymy).
Headings for Patricia Williams in the UT author authority file:
• Names are disambiguated by using middle initials and dates of birth.
• Cross references are used for some authors.
• There may still be two headings for one person.
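An authority file can be modeled as authorized headings plus cross references from variant forms. The sketch below uses invented names and dates (not records from the UT file or any real authority file) and hypothetical function names. It shows both jobs authority files do: a variant form is routed to its authorized heading (synonymy), and an ambiguous name returns every heading it could match, distinguished by middle initial and birth date (polysemy).

```python
from typing import Optional

# Invented records for illustration only:
# authorized heading -> variant forms that should lead to it.
AUTHORITY = {
    "Doe, Jane A., 1950-": ["Doe, Jane Ann", "Jane Ann Doe"],
    "Doe, Jane Q., 1972-": ["Doe, Jane Quincy"],
}

# Cross references: each variant form points to its authorized heading.
SEE_REFERENCES = {
    variant: heading
    for heading, variants in AUTHORITY.items()
    for variant in variants
}

def authorized_heading(name: str) -> Optional[str]:
    """Return the authorized heading for a heading or known variant, else None."""
    if name in AUTHORITY:
        return name
    return SEE_REFERENCES.get(name)

def candidate_headings(surname: str, forename: str) -> list[str]:
    """List authorized headings beginning 'Surname, Forename' (a crude prefix match)."""
    prefix = f"{surname}, {forename}"
    return [heading for heading in AUTHORITY if heading.startswith(prefix)]

print(authorized_heading("Jane Ann Doe"))  # Doe, Jane A., 1950-
print(candidate_headings("Doe", "Jane"))   # ['Doe, Jane A., 1950-', 'Doe, Jane Q., 1972-']
```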
Fun digression: Pseudonyms in the catalog
The current catalog maintains pseudonymous identities (in older catalogs, everything went under the author’s real name).
For example, “Carolyn Keene,” the name used by multiple people as the author of the Nancy Drew novels, is maintained as an author entity in the authority file.
Thesauri are a type of controlled vocabulary that includes equivalence, hierarchical, and associative relationships. Thesauri can also be faceted, that is, represent multiple aspects of a concept (we will discuss facets in depth later).
Thesauri are often developed to deal with subjects of documents, and we will talk a lot about this beginning in a few weeks.
Dark chocolate
  BT Chocolate
  RT Single-origin chocolate
  UF Semisweet chocolate
  UF Baker’s chocolate
  UF Sweet chocolate
  SN Chocolate without milk solids and with less than 70 percent chocolate mass.
BT: broader term, one level up in a hierarchy
RT: related term, in another facet or hierarchical branch
UF: used for; synonyms or other nonpreferred terms
SN: Scope note; definitions or usage guidelines
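The tagged display maps naturally onto a record with one field per relationship. This sketch uses a hypothetical Entry dataclass (not the data model of any thesaurus standard or software) to store the Dark chocolate entry, print it in the BT/RT/UF/SN layout, and derive the reciprocal USE references that send a searcher from a nonpreferred term to the preferred one.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One preferred term and its relationships (hypothetical structure)."""
    term: str
    bt: list[str] = field(default_factory=list)  # broader terms
    rt: list[str] = field(default_factory=list)  # related terms
    uf: list[str] = field(default_factory=list)  # used for: nonpreferred synonyms
    sn: str = ""                                 # scope note

    def display(self) -> str:
        """Render the entry with one tagged line per relationship."""
        lines = [self.term]
        for tag, terms in (("BT", self.bt), ("RT", self.rt), ("UF", self.uf)):
            lines.extend(f"  {tag} {t}" for t in terms)
        if self.sn:
            lines.append(f"  SN {self.sn}")
        return "\n".join(lines)

def use_references(entries: list[Entry]) -> dict[str, str]:
    """Invert UF: each nonpreferred term points (USE) to its preferred term."""
    return {alt: e.term for e in entries for alt in e.uf}

dark = Entry(
    term="Dark chocolate",
    bt=["Chocolate"],
    rt=["Single-origin chocolate"],
    uf=["Semisweet chocolate", "Baker’s chocolate", "Sweet chocolate"],
    sn="Chocolate without milk solids and with less than 70 percent chocolate mass.",
)

print(dark.display())
print(use_references([dark])["Sweet chocolate"])  # Dark chocolate
```

Storing only UF and computing USE keeps the two directions of the equivalence relationship from drifting out of sync.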
The Medical Subject Headings (MeSH) thesaurus is used to index journal articles in the PubMed database.
Keyword searches in PubMed are automatically expanded with MeSH. Searches can also be explicitly limited to MeSH terms, which can increase precision.
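The two search modes can be sketched as query-string rewrites. This is a simplified illustration, not PubMed's actual term-mapping logic, which is more elaborate: the one-entry mapping table is a hand-written stand-in for MeSH's entry vocabulary, and the function names are hypothetical. The [MeSH Terms] and [All Fields] tags follow PubMed's search field tag syntax. Expansion ORs the keyword with its heading (favoring recall); limiting to the heading alone favors precision.

```python
from typing import Optional

# Hand-written stand-in for MeSH entry terms: lowercase keyword -> MeSH heading.
ENTRY_TERM_TO_MESH = {
    "heart attack": "Myocardial Infarction",
}

def mesh_heading(keyword: str) -> Optional[str]:
    """Look up the heading for a keyword if it is a known entry term."""
    return ENTRY_TERM_TO_MESH.get(keyword.strip().lower())

def expand_query(keyword: str) -> str:
    """Automatic expansion: OR the keyword with its MeSH heading, if any."""
    clauses = [f'"{keyword}"[All Fields]']
    heading = mesh_heading(keyword)
    if heading is not None:
        clauses.insert(0, f'"{heading}"[MeSH Terms]')
    return " OR ".join(clauses)

def mesh_only_query(keyword: str) -> Optional[str]:
    """Explicit limit: search the MeSH heading alone; None if the keyword is unmapped."""
    heading = mesh_heading(keyword)
    return None if heading is None else f'"{heading}"[MeSH Terms]'

print(expand_query("heart attack"))
# "Myocardial Infarction"[MeSH Terms] OR "heart attack"[All Fields]
print(mesh_only_query("heart attack"))
# "Myocardial Infarction"[MeSH Terms]
```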
Compare this with a keyword-based system like Google Scholar, which does not offer MeSH-style controlled-vocabulary searching.
• Controlled vocabularies improve searching: grouping equivalent terms increases recall, and disambiguating polysemous terms increases precision.
• Authority files are a type of controlled vocabulary.
• Thesauri are subject-based controlled vocabularies that include hierarchical and associative relationships in addition to equivalence relationships. Thesauri can also be used as browsing interfaces.