Thursday, January 16, 2014

Unit 2 muddiest point and Unit 1 reading notes

Unit 2 
Muddiest point:
1.     Why does stemming never lower recall?
2.     Under what conditions does WSD not work?


Unit 1
Reading Notes

FOA section 1.1


Introduces the notion of “finding out about” (FOA), a cognitive activity carried out by asking questions through language or documents. In this process, coming up with a question, answering it, and assessing the answer all play a vital role in finding out about something. The section also says a little about the information retrieval tradition, in which the search engine is the core of information retrieval. The schematic of this search engine is as follows:
A user with an information need formulates a query; the algorithm then matches the descriptive features mentioned in the query against the documents that share those same features.
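
The matching step described above can be sketched in a few lines of Python. This is only a rough illustration, not the book's algorithm: it assumes that "descriptive features" are simply lowercased words, and the names `features`, `match` and the sample documents are made up for the example.

```python
import re

def features(text):
    """Lowercased word tokens, standing in for 'descriptive features' (an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match(query, documents):
    """Return (doc_id, shared_features) for every document sharing a feature with the query."""
    query_features = features(query)
    results = []
    for doc_id, text in documents.items():
        shared = query_features & features(text)
        if shared:
            results.append((doc_id, shared))
    # Documents sharing more features with the query come first; ties break on doc_id.
    results.sort(key=lambda pair: (-len(pair[1]), pair[0]))
    return results

# Hypothetical sample collection.
docs = {
    "d1": "Search engines match queries to documents",
    "d2": "Cooking pasta at home",
    "d3": "Documents and queries share descriptive features",
}
hits = match("descriptive features of documents", docs)
for doc_id, shared in hits:
    print(doc_id, sorted(shared))
```

Here "d3" shares three features with the query and "d1" shares one, so "d3" is listed first and "d2" is not returned at all.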

ES section 1.1 and 1.2

1.1 What Is Information Retrieval?
Introduces what information retrieval is, along with several kinds of search. These include Web search, the most popular and heavily used kind, and desktop and file system search, which is especially effective for files stored on a local hard disk and possibly on disks connected over a local network. There are also digital libraries and other specialized IR systems, which support access to collections of high-quality material, often of a proprietary nature.
1.2 Information Retrieval Systems
Introduces the fundamental terminology and technology of information retrieval systems.
    1.2.1 Basic IR System Architecture
Information need → a query to the IR system → processed by a search engine → which maintains collection statistics associated with the index
     → computes a score → the result list may be subjected to further processing
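
The pipeline above can be sketched as a toy search engine. This is a minimal sketch under stated assumptions, not the architecture from the book: the TF-IDF score, the function names `tokenize`, `build_index` and `search`, and the sample documents are all choices made for this illustration. The index keeps one collection statistic (how many documents contain each term), and the "further processing" is reduced to cutting the result list to the top k.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(query, index, num_docs, top_k=10):
    """Score documents with a simple TF-IDF sum and return the top_k (doc_id, score) pairs."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Collection statistic: document frequency is the length of the postings list.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]  # further processing: keep only the best results

# Hypothetical sample collection.
docs = {
    "d1": "information retrieval systems index documents",
    "d2": "web search engines rank documents",
    "d3": "desktop search finds local files",
}
index = build_index(docs)
results = search("web search", index, num_docs=len(docs))
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

For the query "web search", "d2" contains both terms and ranks above "d3", which contains only "search"; "d1" contains neither and is not returned.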
    1.2.2 Documents and Update
             A document refers to any self-contained unit that can be returned to the user as a search result.
    1.2.3 Performance Evaluation
             Two measures: efficiency (response time) and effectiveness (relevance; see the Probability Ranking Principle)
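
Both measures can be illustrated with a small Python sketch. It assumes the relevance probabilities are already estimated somehow (the numbers below are invented), and the helper names `rank_by_probability` and `timed` are hypothetical. Under the Probability Ranking Principle, documents are presented in decreasing order of their estimated probability of relevance; response time is measured here with a simple wall-clock timer.

```python
import time

def rank_by_probability(relevance_probs):
    """Order doc ids by decreasing estimated probability of relevance."""
    return sorted(relevance_probs, key=lambda d: (-relevance_probs[d], d))

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Invented probability estimates, for illustration only.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
ranking, seconds = timed(rank_by_probability, estimates)
print(ranking, f"{seconds:.6f}s")
```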

MIR sections 1.1-1.4

1.1  Information Retrieval
Introduces the early development of IR, its specialized role in libraries and digital libraries, and its move to center stage with the World Wide Web.
1.2  The IR Problems
The primary goal of an IR system is to retrieve all the documents that are relevant to a user query while retrieving as few non-relevant documents as possible. There are two different user tasks: searching and browsing. The section also covers the differences between information retrieval and data retrieval (in terms of accuracy).
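
The goal stated above is usually quantified with recall (how many of the relevant documents were retrieved) and precision (how many of the retrieved documents are relevant). These two terms are the standard names and are not taken from this section of the notes. A minimal sketch, with a hypothetical `precision_recall` helper and made-up document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents retrieved, three actually relevant, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Retrieving everything in the collection would push recall to 1 but drag precision down, which is why the goal mentions both halves.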
1.3  The IR System
Introduces the software architecture of an IR system and the retrieval and ranking processes, much as in the previous reading.

1.4  The Web
1.4.1 Introduces the history of the Web's development.
1.4.2 Introduces the popularity of the Web and its “free to publish” feature.
1.4.3 The ranking and indexing components of any search engine are fundamental pieces of IR technology. The Web has had several major impacts on search: the document collection itself; the size of the collection and the volume of user queries submitted on a daily basis; the vast size of the document collection; the Web as not just a repository of documents and data but also a medium for doing business; and Web advertising and other economic incentives (with a little about Web spam).
1.4.4 Practical issues on the Web: security, privacy, copyright and patent rights, and scanning and optical character recognition.
