It does not get more amusing the further you go in, as a book should. Yet sadly, I still have to summarise this book and press on with reading it.
Chapter 8: Search Systems
Why does a website need search? Any site's main goal is to keep users on the site as long as possible and give them the information they are looking for. It is extremely hard to find information on a website if proper search systems are not put in place. Imagine trying to build a house from scratch and working out where to drill the holes in the lumber, but with no blueprints.
There are many variables to consider when making a good search engine:
- Determining Search Zones - Search zones are subsets of a website that have been indexed separately from the rest of the site's content. These zones exclude content that is irrelevant to a user's needs. (E.g. when a user searches a search zone, they have already identified that they are interested in that particular information.) Most websites contain two types of pages:
- Navigation - main pages, search pages, and pages that help you browse a site.
- Destination - pages that contain the actual information the user wants.
- Search zones can index information in a number of ways:
- Indexing for specific audiences
- Indexing by topic
- Indexing recent content
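The idea of a search zone can be sketched in a few lines of Python. This is only a rough illustration, not how any particular search engine does it: the `Page` fields, the sample pages, and the `build_zone` helper are all made up for the example, and leaving navigation pages out of the zone is an assumption on my part.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    kind: str        # "navigation" or "destination"
    topic: str
    audience: str
    published: date


# Hypothetical site content, invented for this sketch.
PAGES = [
    Page("/", "navigation", "general", "everyone", date(2020, 1, 1)),
    Page("/products", "navigation", "products", "customers", date(2022, 7, 9)),
    Page("/products/widget", "destination", "products", "customers", date(2024, 5, 1)),
    Page("/careers/apply", "destination", "careers", "job seekers", date(2023, 3, 15)),
]


def build_zone(pages, *, topic=None, audience=None, since=None):
    """Return destination pages that match every filter that was supplied."""
    zone = []
    for page in pages:
        if page.kind != "destination":
            continue  # assumption: navigation pages are left out of the zone
        if topic is not None and page.topic != topic:
            continue
        if audience is not None and page.audience != audience:
            continue
        if since is not None and page.published < since:
            continue
        zone.append(page)
    return zone


product_zone = build_zone(PAGES, topic="products")           # zone by topic
recent_zone = build_zone(PAGES, since=date(2024, 1, 1))      # zone of recent content
job_zone = build_zone(PAGES, audience="job seekers")         # zone by audience
```

Each zone is just a smaller list of pages, so a search run against `product_zone` never has to look at the careers content.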
Search algorithms work under the skin of search engines to help find information. There are about 40 different retrieval algorithms, but the book only describes a few:
- Pattern-Matching Algorithms
- Recall - Best used for finding all documents related to search.
- Precision - Best used for finding only the few documents most relevant to search.
- Cited By - What other papers cite this one?
- Active Bibliography (related documents) - This paper cites others in its bibliography implying a similar type of shared relevance.
- Similar Documents Based on Text - Documents are converted into queries automatically and are used to find similar documents.
- Related Documents from Co-citation - Co-citation assumes that if documents appear together in the bibliographies of other papers, they probably have something in common.
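Recall and precision from the list above have standard information-retrieval definitions that are easy to compute: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. Here is a small sketch; the function name and the sample document IDs are my own.

```python
def precision_and_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# The engine returned four documents; three documents were actually relevant.
p, r = precision_and_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more documents tends to raise recall and lower precision, which is why the two are usually traded off against each other.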
Query builders also affect the result of a search. They are tools that improve a query's performance. Common ones are:
- Spell-checkers
- Phonetic tools - Used when searching for a name.
- Stemming tools - Used to find documents of the search with variant terms. (e.g. Lodge, lodger, lodging)
- Natural language processing tools - Used to examine the syntactic nature of a query, such as whether it starts with "How to" or "Who is".
- Controlled vocabularies and thesauri - Used to expand the semantic nature of a query by automatically including synonyms within the query.
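To make two of these query builders concrete, here is a toy sketch of stemming plus synonym expansion. The suffix list is deliberately crude and is not a real stemming algorithm such as Porter's, and the `SYNONYMS` table is a hypothetical stand-in for a controlled vocabulary.

```python
# Hypothetical synonym table standing in for a controlled vocabulary.
SYNONYMS = {
    "laptop": {"notebook"},
    "notebook": {"laptop"},
}

# Crude suffix stripping, checked in order. Real stemmers are far more careful.
SUFFIXES = ("ing", "er", "e", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query):
    """Return the original words plus their stems and any known synonyms."""
    terms = set()
    for word in query.lower().split():
        stem = crude_stem(word)
        terms.add(word)
        terms.add(stem)
        terms |= SYNONYMS.get(word, set()) | SYNONYMS.get(stem, set())
    return terms


print(crude_stem("lodge"), crude_stem("lodger"), crude_stem("lodging"))
print(sorted(expand_query("Laptops")))
```

With this sketch, "lodge", "lodger" and "lodging" all reduce to the same stem, so a search for one can match documents containing the others.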
Results can be listed in many orders, by:
- Alphabet
- Chronology
- Ranking by Relevance
- Ranking by Popularity
- Ranking by Users'/Experts' ratings
- Ranking by Pay-For-Placement (PFP)
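Several of these orderings come down to sorting the same result set by a different key. A minimal sketch follows, with made-up result records and scores; ratings and pay-for-placement would need extra data that I have left out.

```python
from operator import itemgetter

# Hypothetical search results; the scores and view counts are invented.
RESULTS = [
    {"title": "Zebra care", "date": "2021-04-02", "relevance": 0.41, "views": 900},
    {"title": "Aardvark facts", "date": "2023-11-20", "relevance": 0.87, "views": 120},
    {"title": "Mole habitats", "date": "2019-08-14", "relevance": 0.63, "views": 4500},
]

# Each ordering is a (sort key, reverse?) pair.
ORDERINGS = {
    "alphabet": (itemgetter("title"), False),
    "chronology": (itemgetter("date"), True),      # newest first; ISO dates sort as text
    "relevance": (itemgetter("relevance"), True),  # highest score first
    "popularity": (itemgetter("views"), True),     # most viewed first
}


def order_results(results, by):
    """Return a new list of results sorted by the named ordering."""
    key, reverse = ORDERINGS[by]
    return sorted(results, key=key, reverse=reverse)


def titles(results):
    return [result["title"] for result in results]


for name in ORDERINGS:
    print(name, titles(order_results(RESULTS, name)))
```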
Designing the search interface poses many questions to be answered:
- Level of searching expertise and motivation
- Type of information need
- Type of information being searched
- Amount of information being searched
When designing the search box, we must consider what users expect, since it is being made for them. Common assumptions include: the query works without operators such as AND, OR, and NOT; typing in a term that describes what they want is enough; and the query searches the entire site.
Some sites give an "advanced search" option, which not only gives the user insight into the functionality of the search engine, but also gives them many parameters to work with.
There are many techniques for helping a user refine their search:
- Repeating the search in the results page
- Explaining where the results have come from
- Explaining what the user has done
- Integrating searching with browsing
Chapter 9: Thesauri, Controlled Vocabularies, and Metadata
A single link on a page can simultaneously be part of the site’s structure, organization, labelling, navigation, and searching systems. It’s useful to study these systems separately, yet it is crucial to understand how they interact.
Metadata - In data processing, metadata is definitional data that provides information about or documentation of other data managed within an application or environment. For example, metadata would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.), and data about data (where it is located, how it is associated, ownership, etc.). Metadata may include descriptive information about the context, quality and condition, or characteristics of the data. (e.g. <meta name="keywords" content="information architecture, content management, knowledge management, user experience">)
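As a small illustration of how a program might read that kind of metadata, here is a sketch that pulls the keywords out of a meta tag using Python's standard `html.parser` module. The `KeywordExtractor` class name is my own; the tag is the one from the example above.

```python
from html.parser import HTMLParser


class KeywordExtractor(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "keywords":
            content = attributes.get("content") or ""
            # Split on commas and drop surrounding whitespace and empty entries.
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]


html = (
    '<meta name="keywords" content="information architecture, '
    'content management, knowledge management, user experience">'
)
parser = KeywordExtractor()
parser.feed(html)
print(parser.keywords)
```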
Controlled Vocabularies - These come in many different forms. In its vaguest form, a controlled vocabulary is any defined subset of natural language. In its simplest form, a controlled vocabulary is a list of equivalent terms in the form of a synonym ring, or a list of preferred terms in the form of an authority file.
- Synonym ring - connects a set of words that are defined as equivalent for the purposes of retrieval.
- Authority file - is a list of preferred terms or acceptable values. Authority files have traditionally been used largely by libraries and government agencies to define the proper names for a set of entities within a limited domain. They are synonym rings in which one term has been defined as the preferred term or acceptable value.
[Figure: A synonym ring]
[Figure: An authority file]
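The difference between a synonym ring and an authority file is easy to see as data structures. This is just a sketch with invented terms: in the ring every term is equal, while in the authority file one term is marked as preferred and the rest are variants.

```python
# A synonym ring: a plain set of terms treated as equivalent for retrieval.
# The terms are hypothetical examples.
SYNONYM_RINGS = [
    {"pda", "handheld", "palmtop"},
]

# An authority file: each preferred term maps to its accepted variants.
# Also a hypothetical example, including a common misspelling as a variant.
AUTHORITY_FILE = {
    "CT": {"Connecticut", "Conn", "Conneticut"},
}


def ring_lookup(term, rings):
    """Return every term treated as equivalent to `term` (including itself)."""
    term = term.lower()
    for ring in rings:
        if term in ring:
            return ring
    return {term}


def preferred_term(term, authority_file):
    """Map a variant to its preferred term; return None for unknown terms."""
    needle = term.lower()
    for preferred, variants in authority_file.items():
        if needle == preferred.lower() or needle in {v.lower() for v in variants}:
            return preferred
    return None


print(sorted(ring_lookup("Palmtop", SYNONYM_RINGS)))
print(preferred_term("conn", AUTHORITY_FILE))
```

A search for any term in the ring can be widened to all of them, whereas the authority file always steers towards the single preferred value.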
- Classification Schemes - used to mean a hierarchical arrangement of preferred terms. (e.g. Dewey Decimal Classification (DDC))
- Thesaurus - Different from the familiar reference book, this kind of thesaurus is integrated within a web site or intranet to improve navigation and retrieval. It shares a common heritage with the reference text but has a different form and function. Like the reference book, it is a semantic network of concepts, connecting words to their synonyms, homonyms, antonyms, broader and narrower terms, and related terms.
[Figure: Example of a thesaurus]
A thesaurus can be used in three ways:
- Classic - used at the point of indexing and at the point of searching. Indexers use the thesaurus to map variant terms to preferred terms when performing document-level indexing.
- Indexing - used when you are able to perform controlled vocabulary indexing, but not able to extend the work to the point of searching by mapping users’ variant terms to preferred terms. This approach has some weaknesses.
- Searching - uses a controlled vocabulary at the point of searching but not at the point of indexing. This is used when dealing with third-party content, dynamic information that changes every day, or so much content that manual indexing costs would be astronomical.
What sets a thesaurus apart from the simpler controlled vocabularies is its large array of semantic relationships:
- Equivalence - employed to connect preferred terms and their variants.
- Hierarchical - divides up information space into categories and subcategories, relating broader and narrower concepts through the familiar parent-child relationship.
- Associative - the trickiest, and by necessity usually developed after a good start on the other two relationship types. In thesaurus construction, associative relationships are often defined as strongly implied semantic connections that aren't captured within the equivalence or hierarchical relationships. (e.g. hammer & nail, or straw & milkshake).
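The three relationship types can be sketched as fields on a thesaurus entry. Everything here is illustrative: the `Term` class, the sample entries, and the variants are invented, apart from the hammer and nail pairing mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    preferred: str
    variants: set = field(default_factory=set)   # equivalence
    broader: set = field(default_factory=set)    # hierarchical (parents)
    narrower: set = field(default_factory=set)   # hierarchical (children)
    related: set = field(default_factory=set)    # associative


# Hypothetical entries, keyed by preferred term.
THESAURUS = {
    "hammer": Term("hammer", variants={"hammers", "mallet"},
                   broader={"hand tools"}, related={"nail"}),
    "hand tools": Term("hand tools", narrower={"hammer"}),
    "nail": Term("nail", broader={"fasteners"}, related={"hammer"}),
}


def resolve(term):
    """Equivalence: map a variant to its preferred term, or None if unknown."""
    term = term.lower()
    for entry in THESAURUS.values():
        if term == entry.preferred or term in entry.variants:
            return entry.preferred
    return None


def suggestions(term):
    """Hierarchical and associative links a search page could offer."""
    entry = THESAURUS.get(resolve(term) or "")
    if entry is None:
        return {}
    return {
        "broader": sorted(entry.broader),
        "narrower": sorted(entry.narrower),
        "related": sorted(entry.related),
    }


print(resolve("Mallet"))
print(suggestions("mallet"))
```

The equivalence lookup runs first, so a user who types a variant still gets the broader, narrower, and related terms of the preferred term.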
Proper terminology is critical. The following are some aspects of terminology:
- Term Form - Defining the form of preferred terms is extremely difficult. Some questions that come up when selecting the form are "Use a noun or a verb?", "What is the correct spelling?", "Can an abbreviation be a preferred term?", etc.
- Term Selection - Selecting a preferred term involves not only the form, but choosing the right term to work with in the first place. (e.g. Literary warrant, the occurrence of terms in documents, is the guiding principle for selecting the preferred term.)
- Term Definition - The right meaning has to be recognisable. (e.g. Cells [biology] or Cells [prison])
- Term Specificity - A concept might be expressed as one compound term or as several separate terms, so it is critical to decide which the user will recognise. (e.g. "Knowledge Management Software" could be kept as one term or broken down into several. It could be seen as "Software", "Knowledge of Software", or even as "Programs run by Computer".)
- Polyhierarchy - When a term is cross-referenced under more than one branch of the hierarchy tree. (e.g. "frog" and "toad" could each be listed under "amphibians" as well as under another branch of the tree.)
Thesauri use a standard set of terms and abbreviations:
- Preferred Term (PT)
- Variant Term (VT)
- Broader Term (BT)
- Narrower Term (NT)
- Related Term (RT)
- Use (U)
- Used For (UF)
- Scope Note (SN)
Common facets for classifying content:
- Topic
- Product
- Document type
- Audience
- Geography
- Price
Chapter 12: Design and Documentation
Communicating Visually
As ideas are put to paper, it can be scary to realize there’s no going back. The project is now actively shaping what will become the user experience. Fears and discomforts will be diminished if time and resources have been invested in the research needed to develop a strategy. Sometimes projects are pushed straight into design (which is quite common), which leaves the project team relying uneasily on intuition and "gut" instinct.
There is no ideal solution for diagramming information architecture, let alone an agreed-upon set of diagrams to work with. The field of information architecture is simply too young to have set criteria, but there are some strategies to help present it.
Communicating visually is usually the best way to show the content components and their connections with each other. The diagrams that do this are called blueprints. They show relationships between pages and other content components to visually show organisation, navigation, and labelling systems.
These "blueprints" are usually created with a top-down approach, starting with the main page and spreading through the website and its components. The best way to help clients and team members understand a blueprint is to keep it simple (providing a legend, for example), to detail pages and content (with a unique identification number linking each item to detailed documents relating to that content), and to organise them (sometimes a blueprint is too large to be represented on one page, so splitting it into multiple related blueprints is a good method).
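A blueprint is essentially a tree of pages, each with a unique ID. As a rough sketch of that idea (the `Node` class, the IDs, and the page labels are all invented), the structure can be walked top-down to print an indented outline:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    page_id: str      # unique ID that links to more detailed documents
    label: str
    children: list = field(default_factory=list)


def outline(node, depth=0):
    """Walk the blueprint top-down and return one indented line per page."""
    lines = [f"{'  ' * depth}{node.page_id} {node.label}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical site, starting from the main page.
site = Node("1.0", "Main page", [
    Node("2.0", "Products", [Node("2.1", "Product detail")]),
    Node("3.0", "Search"),
])

print("\n".join(outline(site)))
```

Splitting a large blueprint into several smaller ones would amount to calling `outline` on a subtree such as the "Products" node.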
[Figure: A simple website blueprint.]
[Figure: A visual design with the use of a wireframe implemented.]
Content mapping is where the top-down approach meets the bottom-up one. It is the process of breaking down or combining existing content into chunks that are useful for inclusion in your site. A content chunk is not necessarily a sentence, a paragraph, or a page. It is the most finely grained portion of content that merits or requires individual treatment. For example, it could be information about the search engine, or organised information that is categorised into a list for developers and users.
Content models are “micro” information architectures made up of small chunks of interconnected content. Content models support the critical missing piece in so many sites: contextual navigation that works deep within the site. Why a missing piece? Because it’s too easy for an organization to accumulate blobs of content, but extremely difficult to link those blobs together in a useful way.
Controlled Vocabularies
There are two primary types of work products associated with the development of controlled vocabularies:
- Metadata matrixes that facilitate discussion about the prioritization of vocabularies
- An application that enables you to manage the vocabulary terms and relationships.
An information architect’s job is to help define which vocabularies should be developed, considering priorities, time, and budget constraints.
[Figure: Example of a metadata matrix]
After all these design and documentation concepts have been developed, other stakeholders involved in the site (visual designers, developers, content authors, or managers) will be collaborating more frequently. This is the most challenging step in design, since everybody wants their own ideas to play a role in the final product. Because of this, there are often competing vocabularies and breakdowns in communication.
The best course of action is for everyone to go in with an open mind and collaborate. This gives a shared vision that is more satisfying than a personal one.
Design sketches are an inventive way to pool the knowledge of multiple teams in a project as a first attempt at interface design for the "top-level" pages of a website.
The process is quite simple. Using wireframes as a guide, the designer sketches pages of the site on paper. As the designer sketches each page, questions arise from other members that must be discussed. It is a very cheap and fast approach compared to creating HTML pages with graphics.
Prototypes are used later, after the base design has been agreed upon and all questions have been asked. They show how the site will look and function; they are concrete and aesthetically appealing. Another benefit of prototypes is that they can reveal previously unseen problems or properties related to information architecture.
Point-of-Production IA
It is the information architect's job to be actively involved during production, to make sure the architecture is implemented according to plan and to address any problems that arise. Many decisions must be made during production. Some include:
- Are these content chunks small enough that we can group them together on one page, or should they remain on separate pages?
- Should we add local navigation to this section of the site?
- Can we shorten the label of this page?
Answers to these questions may place a burden on the production team as well as affect the usability of the website. An information architect needs to balance the requests of the client with the sanity of the production team, the budget and timeline, and their vision for the information architecture of the website. They shouldn’t need to make major decisions about the architecture during production because these should have already been made.