Formal characterization of the Information Retrieval (IR) problem. IR models: Boolean, Vector and Probabilistic; Fuzzy Set, Extended Boolean, Generalized Vector Space models. Retrieval evaluation. Query languages. Pattern Matching. Query Protocols. Query operations: User Relevance Feedback, Query Expansion and Term Reweighting. Standards of document representation and metadata. Document indexing. Semantic Web principles.
R. Baeza-Yates, B. Ribeiro-Neto, "Modern Information Retrieval", Addison-Wesley.
Learning Objectives
- Knowledge and understanding: This course provides basic knowledge of Information Retrieval models and introduces the principles and standards of the Semantic Web.
- Applying knowledge and understanding: Particular attention is given to applying this knowledge to techniques for information discovery on the Web: assigning a relevance value to a document with respect to a query, building search engines, information gathering and indexing, and document standards.
Oral test. There are no intermediate tests. The oral test verifies the student's knowledge of the IR models used to assign a relevance value to a document with respect to a query, and of how IR models are evaluated.
Course program
Motivations. Basic concepts. The Information Retrieval process.
Modeling
Information Retrieval (IR) models. Formal characterization of the IR problem. Classic models: Boolean Model, Vector Model, Probabilistic Model. Fuzzy Set, Extended Boolean and Generalized Vector Space models.
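As an illustration of how the Vector Model assigns a relevance value to a document with respect to a query, the following is a minimal sketch. It assumes raw term frequency multiplied by a logarithmic inverse document frequency, whitespace tokenization, and cosine similarity; the function names (`tf_idf_vectors`, `cosine`, `rank`) and the sample documents are hypothetical and not taken from the course material.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs sorted by decreasing similarity."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection get weight 0 (an assumption).
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


docs = [
    "information retrieval models",
    "semantic web standards",
    "boolean retrieval",
]
for score, index in rank("retrieval models", docs):
    print(f"{score:.3f}  {docs[index]}")
```

Unlike the Boolean Model, which only says whether a document matches, this scoring yields a graded ranking, so partially matching documents still receive a nonzero score.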
IR evaluation
Precision and Recall. Alternative measures.
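Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch of both measures follows; the function name `precision_recall` and the document identifiers are hypothetical, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two collections of document ids."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant in the collection, two in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(precision, recall)
```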
Query Language
Keyword-Based Querying, Single-Word Queries, Context Queries, Boolean Queries, Natural Language Query. Pattern Matching. Structural Queries. Query Protocols.
Query operations
User Relevance Feedback. Query Expansion and Term Reweighting for the Vector Model. Term Reweighting for the Probabilistic Model. Automatic Local Analysis: Query Expansion Through Local Clustering, Query Expansion Through Local Context Analysis. Automatic Global Analysis: Query Expansion Based on a Keyword Thesaurus, Query Expansion Based on a Statistical Thesaurus.
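Query expansion and term reweighting for the Vector Model are commonly illustrated with the Rocchio formula, which moves the query vector toward the documents the user judged relevant and away from those judged non-relevant. The sketch below assumes sparse vectors stored as dicts; the function name `rocchio`, the default values of `alpha`, `beta` and `gamma`, and the choice to drop non-positive weights are illustrative assumptions rather than prescribed settings.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector from user relevance judgments.

    query: dict mapping term -> weight
    relevant, non_relevant: lists of document vectors (dicts) judged by the user
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    new_query = {}
    for term in terms:
        # Mean weight of the term in each judged set (0 if the set is empty).
        pos = sum(d.get(term, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(term, 0.0) for d in non_relevant) / len(non_relevant) if non_relevant else 0.0
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        if weight > 0.0:  # assumption: terms with non-positive weight are dropped
            new_query[term] = weight
    return new_query


expanded = rocchio(
    query={"retrieval": 1.0},
    relevant=[{"retrieval": 1.0, "index": 1.0}],
    non_relevant=[{"retrieval": 1.0, "fuzzy": 1.0}],
)
print(expanded)
```

In this example the term "index" enters the query because it appears in the relevant document (expansion), while the weight of "retrieval" changes (reweighting).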
Standards for content representation
Metadata. Texts: formats, information theory, natural language modeling and processing. Markup languages: SGML, HTML, XML.