How do search engines and databases work?

Started by mittens, January 08, 2020, 12:52:55 AM

mittens

I am new here as a poster, though I have read the forum many times over my years as an academic and have learned a great deal.

I have a question about academic search engines and library databases that I haven't seen in previous discussion threads.
What are the metrics by which academic databases and library catalogues order the list of search findings? I know that it can't be based on the date of publication.

I am asking because I am mid-career and applying for a job. The published research work that I am most excited about seems to be buried much later in the list, while things that I would consider inconsequential rise as the number 1 hit on the search. I don't want to bury the lead; I want to optimize the search results so that my recent work will rise to the top of the search list. Any insights into this technical question?

This question is about library databases and catalogues, not really about Google.
Thanks!

Puget

This question doesn't really make sense to me-- in every library database I've ever encountered the user controls the sort order on the dimension they choose-- author, date, title, relevance, etc.

Maybe you are asking in particular about how "relevance" is determined? That really only comes into play when you search by keyword, and while I'm sure it is going to vary across systems it is going to involve how often the keywords occur in the title, abstract, keywords, or sometimes full text of the document, and potentially the number of citations. Other than writing clear descriptive titles and abstracts using the standard terminology for your field in the first place, there's not really any way to "optimize" that, since it will depend on how each user searches.
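
To make the keyword-frequency idea concrete, here is a toy sketch of field-weighted scoring. The field names, the weights, and the citation bonus are all made up for illustration; real systems vary, and none is known to use these numbers.

```python
import math
import re

# Hypothetical weights: a hit in the title counts more than one in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def relevance(record, query, citation_weight=0.5):
    """Toy score: weighted count of query terms per field, plus a citation bonus."""
    terms = tokenize(query)
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(record.get(field, ""))
        score += weight * sum(words.count(term) for term in terms)
    # log1p dampens the effect of very large citation counts.
    score += citation_weight * math.log1p(record.get("citations", 0))
    return score

records = [
    {"title": "Induction in Logic",
     "abstract": "A survey of mathematical induction.",
     "keywords": "induction proof", "citations": 0},
    {"title": "Motor Design",
     "abstract": "Notes on electromagnetic induction in motors.",
     "keywords": "motors", "citations": 0},
]

# Highest score first, as a "sort by relevance" option would show it.
ranked = sorted(records, key=lambda r: relevance(r, "induction"), reverse=True)
for r in ranked:
    print(r["title"], relevance(r, "induction"))
```

Under these assumed weights, the record with the search term in its title and keywords outranks the one that only mentions it in the abstract, regardless of publication date.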

I very much doubt, however, that search committees will be looking up your publications in their library database and using the order of results to decide whether to hire you. Your research statement is the place to describe your publications and their importance in context, and they will of course be listed on your CV. You can also create a Google Scholar profile (which by default will order them by number of citations, but users can re-sort by date or title).
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

Myword

I was a research librarian, assisting students with databases. It depends very much on which database you search, and each one needs a different strategy. It sounds like relevance is what matters most to you, yet the databases do not know what relevance means to you. Each has its own method for determining it, based on title, abstract and keywords. The whole methodology of databases and library catalogs is flawed and often not developed by strong subject specialists. You get loads of false positives and irrelevant material, or records that use a term in a different sense than you intend; "induction," for example, is ambiguous. Some subjects take a lot of patience and time. Subject-specific databases are easier to use. Good luck.

mittens

Thanks for your posts.

Sorry, I think I misused the word because I don't understand the mechanics of the technology. I am interested in understanding how 'relevance' is determined.

I was curious because when I enter my name (first middle last) the list of items is such that my less important work comes right to the top!

I wondered if the 'relevance' was determined by what people read/opened the most? Otherwise, why is the listing not just ordered by date?

My question was really asking more about entering in a person's full name into a database search.

Thanks.

mamselle

So, you really want to know about 'impact,' I suspect.

I think there are online explanations about that, if you Google the word with "citations," "readership," or "publications."

M.
Forsake the foolish, and live; and go in the way of understanding.

Reprove not a scorner, lest they hate thee: rebuke the wise, and they will love thee.

Give instruction to the wise, and they will be yet wiser: teach the just, and they will increase in learning.

trebond98

The research area you're looking for is "bibliometrics" if you want to know more about citations and impact. If you want to know how LIS thinks about relevance, the classic work is "Relevance: A Review of and a Framework for the Thinking on the Notion in Information Science" by Tefko Saracevic, in the Journal of the American Society for Information Science, 1975.

mbelvadi

Quote from: mittens on January 08, 2020, 09:55:22 AM
Thanks for your posts.

Sorry, I think I misused the word because I don't understand the mechanics of the technology. I am interested in understanding how 'relevance' is determined.

I was curious because when I enter my name (first middle last) the list of items is such that my less important work comes right to the top!

I wondered if the 'relevance' was determined by what people read/opened the most? Otherwise, why is the listing not just ordered by date?

My question was really asking more about entering in a person's full name into a database search.

Thanks.
Often, the specific algorithm for determining relevance ranking is considered a proprietary trade secret by the vendor. You can typically assume that if your name matches in the title or subject/thesaurus headings, those records will be highly ranked, and there are often additional factors in the formula for recency of publication date and length of the work (longer gets higher relevance).

If you're not specifying the author field along with your name, you may find that a match on your name in the author field gets lower relevance, because in most keyword searches people don't want to match on the author; they want the topic. This makes more sense if you think about names that are also regular words.

Most library-licensed databases are very likely NOT ranking based on the past selections of other patrons. That's very much an e-commerce concept, not a research one. Google Scholar is one of the few academic search databases that has "cited by" data, so they likely also weigh that in, but they are the most hush-mouthed about their algorithm.

And as another respondent said, it's going to vary a lot by vendor. If you name the particular platform vendor (your ILS? Alma? III? EBSCO? ProQuest? JSTOR?), we librarians might be able to tell you a bit more based on our experience with that platform, and we actually do talk to the vendors about this sometimes too.
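
Since the real formulas are not public, here is only a toy sketch of the difference between a plain keyword search on a name and a search limited to the author field. The weights, the recency bonus, and the sample records are invented for illustration, and the length factor is left out.

```python
# Hypothetical weights for an unfielded keyword search: author matches count least.
KEYWORD_WEIGHTS = {"title": 3.0, "subject": 2.0, "author": 0.5}

def keyword_score(record, query, current_year=2020):
    """Toy keyword relevance: sum the weights of fields containing the query."""
    q = query.lower()
    score = sum(weight for field, weight in KEYWORD_WEIGHTS.items()
                if q in record.get(field, "").lower())
    # Small recency bonus: newer publications score slightly higher.
    age = max(current_year - record["year"], 0)
    return score + 1.0 / (1 + age)

def author_search(records, name):
    """Fielded search: match only the author field, newest first."""
    hits = [r for r in records if name.lower() in r["author"].lower()]
    return sorted(hits, key=lambda r: r["year"], reverse=True)

records = [
    {"title": "The Baker Street Papers", "author": "Jones, T.",
     "subject": "Detective fiction", "year": 2001},
    {"title": "Soil Chemistry Revisited", "author": "Baker, M.",
     "subject": "Soil science", "year": 2019},
]

# Keyword search on a name that is also a regular word.
keyword_ranked = sorted(records, key=lambda r: keyword_score(r, "baker"),
                        reverse=True)
print([r["title"] for r in keyword_ranked])

# Author-field search returns only works by that author.
print([r["title"] for r in author_search(records, "Baker")])
```

In this made-up example the keyword search puts the 2001 work with "Baker" in its title above the 2019 paper actually written by Baker, while the author-field search returns only the latter.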