Information Retrieval (IR) models. Formal characterization of IR models. Classical and advanced IR. IR evaluation measures. Query languages and query operations. Standard representations of documents and metadata. Indexing systems. Web search engines.
R. Baeza-Yates, B. Ribeiro-Neto, "Modern Information Retrieval", Addison-Wesley
Learning Objectives
Knowledge:
The course aims to provide key knowledge of IR. Particular attention is given to techniques for searching information on the Web, building Web search engines, gathering and indexing information, and document standards.
Skills:
Information retrieval on the Web and ranking models. Languages for document representation. Models for document indexing.
Acquired skills (at the end of the course):
Measuring the relevance of information with respect to users' information needs. Introduction to Semantic Web document standards.
Further information
Office hours:
Prof. Francesconi
by appointment
E-mail: francesconi@ittig.cnr.it
Type of Assessment
Oral
Course program
Introduction
Motivations. Basic concepts. The process of retrieval.
Modeling
Models of Information Retrieval (IR). Types of Retrieval. Formal characterization of IR models.
Classical IR concepts: Boolean Model, Vector Model, Probabilistic Model. Comparison of models. Fuzzy Set Model, Extended Boolean Model, Generalized Vector Space Model, Neural Network Model.
Evaluation of Retrieval
Measures for evaluating retrieval: Precision and Recall. Alternative measures.
Query Language
Keyword-Based Querying, Single-Word Queries, Context Queries, Boolean Queries, Natural Language Query. Pattern Matching. Structural Queries. Query Protocols.
Query Operations
User Relevance Feedback. Query Expansion and Term Reweighting for the Vector Model. Term Reweighting for the Probabilistic Model. Automatic Local Analysis: Query Expansion Through Local Clustering, Query Expansion Through Local Context Analysis. Automatic Global Analysis: Query Expansion Based on a Similarity Thesaurus, Query Expansion Based on a Statistical Thesaurus.
Content Representation Languages
Metadata. Text formats, information theory, modeling of natural language, similarity models. Markup Languages: SGML, HTML, XML.
Text Processing
Analysis of documents: lexical analysis, stopwords, stemming, selection of terms for indexing, thesauri. Clustering of documents. Text compression.
Indexing and Searching
Inverted files. Other indexing methods. Boolean Queries. Sequential searching. Pattern Matching. Structural Queries. Compression.
Distributed IR
Case study: The NIR project. XML and URN. Applications.
Information Search on the Web
Characterization of Web search engines. Browsing. Meta. Web Query Languages and Software