Formal characterization of the Information Retrieval (IR) problem. IR models: Boolean, Vector and Probabilistic; Fuzzy Set, Extended Boolean, Generalized Vector Space models. Retrieval evaluation. Query languages. Pattern Matching. Query Protocols. Query operations: User Relevance Feedback, Query Expansion and Term Reweighting. Standards of document representation and metadata. Document indexing. Semantic Web principles.
R. Baeza-Yates, B. Ribeiro-Neto, "Modern Information Retrieval", Addison-Wesley.
Learning Objectives
This course provides basic knowledge of Information Retrieval models. Particular attention is given to techniques for information discovery on the Web, search engine construction, information gathering and indexing, and document standards.
Motivations. Basic concepts. The Information Retrieval process.
Modeling
Formal characterization of the Information Retrieval (IR) problem. IR models: Boolean Model, Vector Model, Probabilistic Model. Fuzzy Set, Extended Boolean, and Generalized Vector Space models.
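As an illustration of the Vector Model, the sketch below ranks documents against a query by cosine similarity over tf-idf weights. It is a minimal example under stated assumptions, not the formulation used in the course: documents are assumed to be already tokenized, the weighting is raw term frequency times log(N / df), and the function names (`tf_idf_vectors`, `cosine`, `rank`) are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed weighting scheme: idf = log(N / df).
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {term: freq * idf.get(term, 0.0) for term, freq in Counter(query).items()}
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


docs = [
    ["boolean", "model"],
    ["vector", "model", "vector"],
    ["probabilistic", "ranking"],
]
ranking = rank(["vector"], docs)
```

Unlike the Boolean Model, which only separates matching from non-matching documents, this scoring yields a graded ordering, so partial matches can still be ranked.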
IR evaluation
Precision and Recall. Alternative measures.
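Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below computes both for a single query, plus the harmonic mean (F-measure) as one example of an alternative measure. Document identifiers and function names are hypothetical, and returning 0.0 for empty sets is a convention chosen here for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Four documents retrieved, four relevant, two in common.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 6, 8])
f = f_measure(p, r)
```

The harmonic mean is high only when both precision and recall are high, which is why it is often preferred to a plain average of the two.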
Query Language
Keyword-Based Querying, Single-Word Queries, Context Queries, Boolean Queries, Natural Language Queries. Pattern Matching. Structural Queries. Query Protocols.
Query operations
User Relevance Feedback. Query Expansion and Term Reweighting for the Vector Model. Term Reweighting for the Probabilistic Model. Automatic Local Analysis: Query Expansion Through Local Clustering, Query Expansion Through Local Context Analysis. Automatic Global Analysis: Query Expansion based on a Thesaurus of keywords, Query Expansion based on a Statistical Thesaurus.
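Query expansion and term reweighting for the Vector Model can be illustrated with the standard Rocchio formula, q' = alpha * q + beta * mean(relevant) - gamma * mean(non-relevant). The sketch below is one possible rendering under assumptions: vectors are sparse dicts of term weights, the constants 1.0, 0.75 and 0.15 are illustrative defaults rather than values prescribed by the course, and terms whose weight becomes non-positive are dropped.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted, expanded query vector (dict: term -> weight).

    query: sparse vector of the original query.
    relevant / non_relevant: lists of sparse document vectors judged by the user.
    alpha, beta, gamma: illustrative tuning constants (assumed values).
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    new_query = {}
    for term in terms:
        # Centroid component of the judged relevant documents.
        pos = sum(doc.get(term, 0.0) for doc in relevant) / len(relevant) if relevant else 0.0
        # Centroid component of the judged non-relevant documents.
        neg = sum(doc.get(term, 0.0) for doc in non_relevant) / len(non_relevant) if non_relevant else 0.0
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        # Assumed convention: keep only terms with positive weight.
        if weight > 0.0:
            new_query[term] = weight
    return new_query


expanded = rocchio(
    query={"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "car": 1.0}],
    non_relevant=[{"jaguar": 1.0, "cat": 1.0}],
)
```

In this example the term "car" enters the query (expansion), the weight of "jaguar" increases (reweighting), and "cat" is excluded because it only occurs in the non-relevant document.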
Standards for content representation
Metadata. Texts: formats, information theory, natural language modeling and processing. Markup languages: SGML, HTML, XML.