Blog for Class IS2140: Unit1 Muddiest point, Unit2 Reading Notes

Unit 1: Muddiest point

I am a little confused about the “Whole View of System Oriented IR”. What is the role of index processing, and why is it required? It seems to me that the retrieval and ranking process over queries would be enough.

Unit 2: Reading Notes

Section 1.2: A first take at building an inverted index

It introduces the major steps in inverted index construction. After reading it, we should know:

1. How to draw the inverted index built for a document collection.

2. How to draw the term-document incidence matrix for a document collection, and how to draw the inverted index representation of the same collection.
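To make the two drawings concrete, here is a minimal sketch that builds both structures for a tiny collection. The three documents and the helper names (`build_inverted_index`, `incidence_matrix`) are made up for illustration, and tokenization is just lowercasing and splitting on whitespace, which is much cruder than what the book describes.

```python
from collections import defaultdict

# Hypothetical toy collection; docIDs and texts are invented for illustration.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}

def build_inverted_index(docs):
    """Map each term to its postings list: a sorted list of docIDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sort the dictionary terms and each postings list.
    return {term: sorted(ids) for term, ids in sorted(index.items())}

def incidence_matrix(docs):
    """Return (terms, doc_ids, rows); rows[i][j] is 1 if terms[i] occurs in doc_ids[j]."""
    index = build_inverted_index(docs)
    doc_ids = sorted(docs)
    terms = list(index)
    rows = [[1 if d in index[t] else 0 for d in doc_ids] for t in terms]
    return terms, doc_ids, rows

index = build_inverted_index(docs)
terms, doc_ids, rows = incidence_matrix(docs)

print(index["july"])               # [2, 3]
print(rows[terms.index("july")])   # [0, 1, 1]
```

The matrix stores a 0 or 1 for every term-document pair, while the inverted index stores only the pairs that are 1, which is why it stays small when the matrix is sparse.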

Chapter 2: The term vocabulary and postings lists

2.1. How the basic unit of a document is defined, and how the character sequence it comprises is determined.

2.2. How to determine the term vocabulary (tokenization, stop words, normalization, and stemming and lemmatization).

2.3. How to extend the postings list data structure to make it more efficient to use (skip pointers, which are easy to build if the index is static).
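As a rough sketch of the skip pointer idea in 2.3: the postings lists are plain sorted Python lists here, and a skip pointer is assumed to exist at every position that is a multiple of the square root of the list length. The function name `intersect_with_skips` and the list representation are my own assumptions, not the book's data structure.

```python
import math

def intersect_with_skips(p1, p2):
    """Intersect two strictly increasing postings lists using evenly spaced skips."""
    # Assumed heuristic: about sqrt(length) evenly spaced skip pointers per list.
    skip1 = max(1, int(math.sqrt(len(p1))))
    skip2 = max(1, int(math.sqrt(len(p2))))
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Take the skip only if its target does not overshoot p2[j].
            if i % skip1 == 0 and i + skip1 < len(p1) and p1[i + skip1] <= p2[j]:
                i += skip1
            else:
                i += 1
        else:
            if j % skip2 == 0 and j + skip2 < len(p2) and p2[j + skip2] <= p1[i]:
                j += skip2
            else:
                j += 1
    return answer

print(intersect_with_skips([2, 4, 8, 16, 19, 23, 28, 43],
                           [1, 2, 3, 5, 8, 41, 51, 60, 71]))   # [2, 8]
```

A skip is safe because every posting that is jumped over is smaller than the skip target, which in turn is no larger than the docID being looked for on the other list.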

2.4. Biword indexes, positional indexes, and combination schemes (I did not get the point of this section).
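One way to see the point of 2.4 is that a plain inverted index cannot tell whether two query terms are next to each other, so it cannot answer a phrase query. A positional index stores, for each term and document, the positions where the term occurs. The sketch below is a simplified illustration with invented documents and hypothetical helper names; a biword index would instead treat every pair of consecutive words as a single dictionary term.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, with positions counted in tokens from 0."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def phrase_query(index, phrase):
    """Return sorted docIDs in which the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # Term k+1 of the phrase must sit exactly k+1 positions after the first term.
        if any(all(start + k + 1 in later[k] for k in range(len(later)))
               for start in index[terms[0]][doc_id]):
            hits.append(doc_id)
    return hits

# Hypothetical toy collection.
docs = {
    1: "stanford university is in california",
    2: "the university of stanford palo alto",
    3: "palo alto university near stanford university campus",
}
pos_index = build_positional_index(docs)

print(phrase_query(pos_index, "stanford university"))   # [1, 3]
print(phrase_query(pos_index, "palo alto"))              # [2, 3]
```

Document 2 contains both "stanford" and "university" but not as a phrase, so a non-positional index would wrongly return it for the first query.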

Chapter 3: Dictionaries and tolerant retrieval

3.1. Choosing a data structure for looking up terms in the vocabulary of an inverted index (hashing or search trees).

3.2. The idea of a “wildcard query” (such as *a*e*i*o*u*, which seeks documents containing any term that includes all five vowels in sequence).
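A small sketch of one way to answer wildcard queries, using a k-gram index over the vocabulary followed by a post-filtering step. The vocabulary, the choice of k = 2, and the function names are assumptions for illustration, and the pattern is assumed to contain no special character other than `*` (the standard-library `fnmatchcase` used for post-filtering also gives meaning to `?` and `[`).

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def kgrams(s, k=2):
    """All k-grams of s (empty set if s is shorter than k)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def build_kgram_index(vocabulary, k=2):
    """Map each k-gram to the vocabulary terms containing it; '$' marks term boundaries."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams("$" + term + "$", k):
            index[gram].add(term)
    return index

def wildcard_lookup(pattern, vocabulary, index, k=2):
    """Return the sorted vocabulary terms matching a pattern that uses only '*'."""
    # Collect k-grams from the literal pieces between the '*' symbols.
    grams = set()
    for piece in ("$" + pattern + "$").split("*"):
        grams |= kgrams(piece, k)
    candidates = set(vocabulary)
    for gram in grams:
        candidates &= index.get(gram, set())
    # Post-filter: sharing the k-grams does not guarantee a real match.
    return sorted(t for t in candidates if fnmatchcase(t, pattern))

# Hypothetical vocabulary.
vocabulary = ["retrieve", "remove", "receive", "revere", "relieved",
              "facetious", "education"]
kgram_index = build_kgram_index(vocabulary)

print(wildcard_lookup("re*ve", vocabulary, kgram_index))         # ['receive', 'remove', 'retrieve']
print(wildcard_lookup("*a*e*i*o*u*", vocabulary, kgram_index))   # ['facetious']
```

For "re*ve" the term "revere" shares every required bigram but does not end in "ve", so only the post-filter removes it. For the five-vowel query no literal piece is long enough to produce a bigram, so in this sketch the k-gram index gives no pruning and all the work falls on the post-filter.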

3.3. Some techniques for correcting spelling errors in queries.

Two steps for solving the problem: the first based on edit distance, the second on k-gram overlap.

Two basic principles underlie most spelling correction algorithms: choose the nearest correct spelling, and when two candidates are tied, choose the more common one.

Two forms of spelling correction: isolated-term and context-sensitive.

Two techniques for isolated-term correction: edit distance and k-gram overlap.
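Both isolated-term techniques can be sketched in a few lines. The edit distance below is the usual Levenshtein dynamic program with unit costs for insert, delete, and substitute, and k-gram overlap is measured with the Jaccard coefficient over bigrams. The function names are my own, and no boundary symbols are added to the bigrams.

```python
def edit_distance(s, t):
    """Levenshtein distance between s and t, keeping only two rows of the table."""
    prev = list(range(len(t) + 1))      # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]                      # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,           # delete cs
                            curr[j - 1] + 1,       # insert ct
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]

def bigrams(term):
    """Set of character bigrams of term."""
    return {term[i:i + 2] for i in range(len(term) - 1)}

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(edit_distance("cat", "dog"))                    # 3
print(jaccard(bigrams("bord"), bigrams("boardroom")))  # 2/9, about 0.22
```

A corrector could use the cheap k-gram overlap to shortlist vocabulary terms and then rank the shortlist by edit distance, which avoids computing the edit distance to every term in the vocabulary.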

3.4. Phonetic correction: generate a “phonetic hash” for each term, so that similar-sounding terms hash to the same value.
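A minimal sketch of a Soundex-style phonetic hash: keep the first letter, map the remaining letters to digits, collapse runs of identical digits, drop the zeros, and pad to four characters. This is a simplified variant written for these notes. Standard Soundex implementations differ in details such as how H and W are treated and whether the first letter's own code takes part in the collapsing, so the codes here may not match other tools exactly.

```python
# Letter groups and their digit codes; vowels and H, W, Y map to '0' and are dropped later.
CODES = {}
for letters, digit in [("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
                       ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(term):
    """Return a 4-character phonetic code (simplified Soundex variant)."""
    letters = [ch for ch in term.upper() if ch in CODES]
    if not letters:
        return ""
    # Encode everything after the first letter.
    digits = [CODES[ch] for ch in letters[1:]]
    # Collapse runs of consecutive identical digits into one.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Drop zeros, then pad with trailing zeros and keep four characters.
    body = "".join(d for d in collapsed if d != "0")
    return (letters[0] + body + "000")[:4]

print(soundex("Hermann"))   # H655
print(soundex("Herman"))    # H655
```

Because both spellings hash to the same code, a query for one can be matched against index entries for the other.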

Friday, January 10, 2014
