Search results for historical material

Published on
22 Oct 2014

This is a guest post by Jaspreet Singh, a researcher at the L3S Research Center in Hanover. Jaspreet writes:

When people use a commercial search engine to search for information, they represent their intent using a set of keywords. In most cases this is to quickly look up a piece of information and move on to the next task. For scholars, however, the information intent is usually very different from that of the casual user and is often hard to express as keywords. The popularity of the advanced query feature of the BL’s web archive search engine is strong evidence of this.

By working closely with scholars, though, we can gain better insight into their search intents and design the search engine accordingly. In my master’s thesis I focus specifically on search result ranking when the user’s search intent is historical.

Let us consider the user intent, ‘I want to know the history of Rudolph Giuliani, the ex-mayor of New York City’. We can safely assume that history refers to the important time periods and aspects of Rudolph Giuliani’s life. The user would most likely input the keywords ‘rudolph giuliani’ and expect to see a list of documents that gives a general overview of the historically relevant facts about Giuliani. From here the user can modify the query or filter the results using facets to dig deeper into certain aspects. A standard search engine, however, is unaware of this intent. It only receives keywords as input and tries to serve the most relevant documents to the user.

At the L3S Research Center we have developed a prototype search engine specifically for historical search intents. We use temporal and aspect-based search result diversification techniques to serve users with documents that cover a topic’s most important historical facts within the top n results. For example, when searching for ‘rudolph giuliani’ we try to retrieve documents that cover his election campaigns, his mayoralty, his run for the Senate and his personal life, so that the user gets a quick gist of the important facts. Using our system, the user can explore the results by time using an interactive timeline or modify the query. The prototype showcases various state-of-the-art algorithms for search result diversification as well as our own algorithm, ASPTD. We use the New York Times 1987-2007 news archive as our corpus of study. In the interface we present only the top 30 results at a time.
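To make the idea of diversification a little more concrete, here is a minimal sketch of a generic greedy re-ranker that trades keyword relevance against coverage of new aspects and new time periods. It is not ASPTD and not the prototype's code: the `Doc` fields, the aspect labels, the scoring formula, the `trade_off` parameter and the sample scores are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    relevance: float    # keyword relevance in [0, 1] (assumed to come from a standard ranker)
    year: int           # publication year
    aspects: frozenset  # hypothetical aspect labels attached to the document


def diversify(docs, n, trade_off=0.5):
    """Greedily pick n documents, rewarding unseen aspects and unseen years.

    trade_off = 0.0 ranks by relevance only; higher values favour novelty.
    """
    selected = []
    covered_aspects = set()
    covered_years = set()
    remaining = list(docs)

    def score(doc):
        # Share of the document's aspects that no selected document covers yet.
        new_aspects = doc.aspects - covered_aspects
        aspect_novelty = len(new_aspects) / len(doc.aspects) if doc.aspects else 0.0
        # 1.0 if no selected document comes from the same year, else 0.0.
        time_novelty = 0.0 if doc.year in covered_years else 1.0
        novelty = 0.5 * (aspect_novelty + time_novelty)
        return (1 - trade_off) * doc.relevance + trade_off * novelty

    while remaining and len(selected) < n:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        covered_aspects |= best.aspects
        covered_years.add(best.year)
    return selected


# Invented sample data for the query 'rudolph giuliani'.
DOCS = [
    Doc("a", 0.90, 1993, frozenset({"election campaign"})),
    Doc("b", 0.85, 1993, frozenset({"election campaign"})),
    Doc("c", 0.60, 2000, frozenset({"senate run"})),
    Doc("d", 0.50, 2001, frozenset({"mayoralty"})),
    Doc("e", 0.40, 2007, frozenset({"personal life"})),
]

relevance_only = [d.doc_id for d in diversify(DOCS, 3, trade_off=0.0)]
diversified = [d.doc_id for d in diversify(DOCS, 3, trade_off=0.5)]
print(relevance_only)  # ['a', 'b', 'c']
print(diversified)     # ['a', 'c', 'd']
```

With relevance alone, the top three results include two near-identical 1993 campaign articles. With the novelty term switched on, the second campaign article is pushed down in favour of documents that cover a different aspect and a different year.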

In the future, we plan to test our approach on a much larger news archive such as the 100-year London Times corpus. We also intend to strengthen the algorithm to work with web archives, and to work with the BL to integrate such methods into the current BL web archive search system so that users can explore the archive better.

Link to the system: http://pharos.l3s.uni-hannover.de:7080/ArchiveSearch/starterkit/.
