Evergreen Search Discussion

Below are specific enhancements to help us reach the MassLNC Vision for Search.


Problem Areas / Possible Improvements

  • Handle apostrophe normalization appropriately for the language of the word being indexed - see How Evergreen Handles Apostrophes
  • Less aggressive stemming, or alternatives to stemming, to minimize hits for words unrelated to the entered search term (e.g. puffins should still be stemmed to puffin, but pacificism should not be stemmed to pacific). Possible solutions:
    • If a less aggressive stemming algorithm is available (e.g. plurals only), allow sites to easily identify the level of stemming that is used (aggressive stemming, light stemming, or no stemming)
    • Consider use of lemmatization or other forms of fuzzy matching instead of stemming.
    • Maintain a list of words that should not be stemmed.
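
As a rough illustration of the light-stemming and do-not-stem ideas above, the Python sketch below strips only common English plural endings and skips any word on a protected list. The function name, the level names, and the word list are hypothetical and are not part of Evergreen; the plural rules are deliberately crude.

```python
# Hypothetical do-not-stem list; a real site would maintain its own.
PROTECTED_WORDS = {"news", "series", "pacificism"}

def light_stem(word, level="light", protected=PROTECTED_WORDS):
    """Return the indexed form of `word` for the chosen stemming level.

    level="none"  -> only lowercase the word
    level="light" -> strip common English plural endings only
    """
    w = word.lower()
    if level == "none" or w in protected:
        return w
    if level != "light":
        raise ValueError(f"unknown stemming level: {level}")
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"          # libraries -> library
    if w.endswith("es") and w[:-2].endswith(("s", "x", "z", "ch", "sh")):
        return w[:-2]                # boxes -> box
    if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]                # puffins -> puffin
    return w
```

Rules this simple will mis-handle some words, which is one reason a site-maintained exception list is useful alongside any stemmer.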

What We Need to Keep

  • The ability to add or remove indexing for any MARC field to any type of keyword search (title, author, subject, keyword, etc.)
  • The ability to turn stemming on or off.
  • Entered search terms joined with an implicit AND operator


Problem Areas / Possible Improvements

  • Faster overall searching - The search results page should load with the top 10 search results within 1 to 2 seconds of performing the search. Overly broad searches should be returned within 3 seconds. See comparison of search retrieval times across systems and web sites.
  • The ability to add fields to search indexes or to adjust relevancy without ultimately seeing a performance hit caused by a proliferation of entries in metabib tables. 
  • No timeouts on broad searches like 'history.' If timeouts occur, the error message should clearly state it is a timeout issue, not that there are no records matching the search.
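
The distinction between a timeout and an empty result set could be surfaced along the lines of the Python sketch below. The function name, the status values, and the messages are assumptions made for illustration; a real implementation would also need to cancel the underlying query rather than just stop waiting for it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as SearchTimeout

def run_search(search_fn, query, timeout_s=3.0):
    """Run search_fn(query) and report a timeout separately from zero hits."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(search_fn, query)
    try:
        hits = future.result(timeout=timeout_s)
    except SearchTimeout:
        return {"status": "timeout", "hits": [],
                "message": "The search took too long to complete. Try narrowing it."}
    finally:
        pool.shutdown(wait=False)  # do not make the caller wait for a slow search
    if not hits:
        return {"status": "no_results", "hits": [],
                "message": "No records matched your search."}
    return {"status": "ok", "hits": hits, "message": ""}

def slow_backend(query):
    """Hypothetical stand-in for an overly broad search."""
    time.sleep(0.2)
    return ["record 1"]

timed_out = run_search(slow_backend, "history", timeout_s=0.05)
empty = run_search(lambda q: [], "zzzz")
```

Here `timed_out` carries the "timeout" status and `empty` carries "no_results", so the results page can show a different message for each case.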


Problem Areas / Possible Improvements

  • Large-print, audiobook, and ebook formats of a title are often ranked ahead of the regular print edition (perhaps because they were entered more recently?). We typically want regular print to rank ahead of these.
  • With keyword searches, shorter records tend to rank higher than larger records due to coverage density settings. The end result is that older records or non-bibliographic materials (equipment, toys) rank ahead of records that are considered more relevant. 
  • Weighting for exact term matches should be stronger. 
  • The ability to identify which fields / subfields should rank more heavily in a keyword search without, at the same time, inflating metabib entries to an extent that search performance takes a hit.
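
To illustrate why length normalization can push short records up the list, here is a toy scoring function in Python. It is a deliberate simplification and not the ranking formula Evergreen or PostgreSQL actually uses; the function name, the normalization labels, and the numbers are made up for the example.

```python
import math

def toy_score(term_hits, record_length, normalization="length"):
    """Score a record from its number of matching terms and its word count."""
    if normalization == "none":
        return float(term_hits)
    if normalization == "log":
        return term_hits / (1 + math.log(record_length))
    if normalization == "length":
        return term_hits / record_length
    raise ValueError(f"unknown normalization: {normalization}")

# A sparse equipment record: 1 matching term in 5 indexed words.
# A full bibliographic record: 3 matching terms in 60 indexed words.
for mode in ("length", "log", "none"):
    short_record = toy_score(1, 5, mode)
    full_record = toy_score(3, 60, mode)
    winner = "short record" if short_record > full_record else "full record"
    print(f"{mode}: {winner} ranks first")
```

Dividing by raw length favors the sparse record, while a logarithmic divisor or no normalization favors the fuller record. Any real tuning would need to be tested against actual catalog data.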

What We Want to Keep

  • The ability to weight specific fields / subfields in a search.
  • The ability to weight titles more heavily based on their activity (under development). Depending on what we find in testing, we may want improvements in how this metric works with bib relevance ranking.


Problem Areas / Possible Improvements

  • A working auto-suggest that doesn't break accessibility, with possible further improvements.
  • "Did you mean?" functionality, based on terms used in bib records, can help lead users to more accurate search results.
  • The ability to leverage rich data in authority records to provide the user with options for expanding and refining all types of searches, including keyword searches.
  • Additional facets that aren't based on MARC indexes (format, copy location, publication date, etc.)
  • A synonym list or thesaurus where site administrators can identify word variations that can be searched along with the user's entered search terms.
  • Allow users to set search preferences that should be used for all searches (e.g. always retrieve adult materials, always retrieve English materials, never retrieve electronic materials, etc.)
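
A "Did you mean?" feature driven by terms that actually occur in bib records could be prototyped with Python's standard difflib module, as sketched below. The function name, the sample vocabulary, and the similarity cutoff are illustrative assumptions; a production system would draw its vocabulary from the indexed records.

```python
import difflib

def did_you_mean(term, bib_vocabulary, max_suggestions=3, cutoff=0.8):
    """Suggest close matches for `term` from words found in bib records."""
    term = term.lower()
    vocab = sorted({word.lower() for word in bib_vocabulary})
    if term in vocab:
        return []  # the term already occurs in the catalog; nothing to suggest
    return difflib.get_close_matches(term, vocab, n=max_suggestions, cutoff=cutoff)

# Hypothetical vocabulary extracted from bib records.
vocabulary = ["Skulduggery", "pleasant", "series", "history"]
suggestions = did_you_mean("skullduggery", vocabulary)
```

Because the suggestions come only from the supplied vocabulary, the user is never pointed at a spelling that retrieves nothing.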

What We Want to Keep

  • Existing facets based on search indexes
  • The ability to easily refine a search from the search results page or from the bib record.
  • The ability to limit by format, copy location, or any of the currently-available limiters.
  • The ability to scope a search to a copy location group, branch, system or consortium. 
  • The ability to scope electronic resource records based on the OU identified in a Located URI
  • The ability to group formats and editions in our search results so that the user's search results page shows one result for each distinct title, which then leads them to all formats and editions of that work.


Problem Areas / Possible Improvements

  • Default indexing by MARC instead of MODS so that site administrators can easily see and explain what is being indexed without consulting stylesheets or external documentation.
  • When the system automatically changes search terms via stemming or other means (e.g. a thesaurus), it should show users how the search was changed. Ideally, the user could click a link if they do not want alternate forms searched. (Example: https://www.google.com/?gws_rd=ssl#q=skullduggery+pleasant+series)
  • The ability to see relevance scores, allowing users to understand why a record ranked where it did. 
  • Highlighting of search terms, allowing users to see why a record was retrieved.
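
Search-term highlighting can be sketched in a few lines of Python, as below. The function name and the use of HTML mark tags are assumptions for the example; it matches whole words case-insensitively and does not account for stemmed or synonym matches, which a real implementation would need to handle.

```python
import re

def highlight(text, terms, open_tag="<mark>", close_tag="</mark>"):
    """Wrap each whole-word occurrence of a search term in highlight tags."""
    terms = [t for t in terms if t]
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Keep the original casing of the matched text inside the tags.
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

highlighted = highlight("A History of Puffins", ["history", "puffins"])
```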


This work is licensed under an Attribution Share Alike Creative Commons license