Information Retrieval (IR) models. Formal characterization of IR models. Classical and advanced IR. IR evaluation measures. Query languages and query operations. Standard representation of documents and metadata. Indexing systems. Web search engines.
R. Baeza-Yates, B. Ribeiro-Neto, "Modern Information Retrieval", Addison-Wesley
Learning Objectives
Knowledge acquired:
The course aims to provide core knowledge of IR. Particular attention will be paid to: techniques for searching for information on the Web, building Web search engines, techniques for gathering and indexing information, and standard representations of documents.
Competence acquired:
Understanding of information retrieval on the Web and of ranking models. Knowledge of XML languages for document representation. Models for document indexing.
Skills acquired (at the end of the course):
Measuring the relevance of information with respect to users' information needs. Introduction to Semantic Web document standards.
Prerequisites
None
Teaching Methods
CFU: 6
Total hours of the course (including the time spent attending lectures and seminars, in private study, in examinations, etc.): 150
Hours reserved for private study and other individual learning activities: 102
Attendance at lectures, practice sessions and labs: Recommended
Teaching Tools
UniFi E-Learning: http://e-l.unifi.it
Office Hours:
Prof. Francesconi
By appointment
E-mail: enrico.francesconi@unifi.it
Type of Assessment
Oral
Course program
Introduction
Motivations. Basic concepts. The process of retrieval.
Modelling
Information Retrieval (IR) models. Types of Retrieval. Formal characterization of IR models. Classical IR: basic concepts, Boolean Model, Vector Model, Probabilistic Model. Comparison of models. Fuzzy Set Models, Extended Boolean, Generalized Vector Space, Neural Networks.
Evaluation of Retrieval
Retrieval evaluation measures: Precision and Recall. Alternative measures.
Query Language
Keyword-Based Querying, Single-Word Queries, Context Queries, Boolean Queries, Natural Language Query. Pattern Matching. Structural Queries. Query Protocols.
Query Operations
User Relevance Feedback. Query Expansion and Term Reweighting for the Vector Model. Term Reweighting for the Probabilistic Model. Automatic Local Analysis: Query Expansion Through Local Clustering, Query Expansion Through Local Context Analysis. Automatic Global Analysis: Query Expansion Based on a Similarity Thesaurus, Query Expansion Based on a Statistical Thesaurus.
Languages for Content Representation
Metadata. Text: formats, information theory, natural language modelling, model similarity. Markup Languages: SGML, HTML, XML.
Text Operations
Document analysis: lexical analysis, stopwords, stemming, selection of terms for indexing, thesauri. Clustering of documents. Text compression.
Indexing and Searching
Inverted files. Other indexing techniques. Boolean Queries. Sequential searching. Pattern Matching. Structural Queries. Compression.
Distributed IR
Case study: The NIR Project. XML and URN. Applications.
Search on the Web
Characterization of the Web. Search Engines. Browsing. Meta Engines. Web Query Languages and Software Agents.