Data Mining and Information Extraction for CiteSeerX and Friends

Date & Time:
12:00:00 - 13:30:00,
Friday 22 June, 2012

About

Cyberinfrastructure, or e-science, has become crucial in many areas of science where data access often defines scientific progress. Open source (OS) systems have greatly facilitated the design, implementation and support of cyberinfrastructure, permitting the design of specialized integrated search engines and digital libraries. These offer many opportunities for domain-relevant information and knowledge extraction, such as citation extraction, automated indexing and ranking, chemical formulae search, and table indexing.

In this talk, Professor Giles will describe the open source SeerSuite architecture, a modular, extensible system built on successful OS projects such as Lucene/Solr, and will discuss issues in building domain-specific enterprise cyberinfrastructure for the sciences and academia. Because of the large amount of information crawled, many problems arise in information extraction and data mining, such as author and entity disambiguation, data extraction and ranking. Professor Giles will highlight application domains with examples from computer science (CiteSeerX), chemistry (ChemXSeer) and related problem areas.

Because such enterprise systems require unique information extraction approaches, several different machine learning methods (such as conditional random fields, support vector machines, mutual-information-based feature selection and sequence mining) are critical for performance. Professor Giles will draw lessons for other e-science and cyberinfrastructure systems in terms of design and implementation, and discuss future directions, systems and research.

Speakers

  • Name: Professor C. Lee Giles
  • Affiliation: David Reese Professor, College of Information Sciences and Technology,
    Pennsylvania State University
  • URL: http://clgiles.ist.psu.edu/
