Thursday, January 30, 2014

Unit4 Muddiest Point and Reading Notes

1. Muddiest Point:
How does delta coding work?


2. Reading Notes

1.3 Processing Boolean queries
1. Locate Brutus in the Dictionary

2. Retrieve its postings

3. Locate Calpurnia in the Dictionary

4. Retrieve its postings

5. Intersect the two postings lists.
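The five steps above can be sketched as a merge over two sorted postings lists. This is only a minimal illustration: the toy `index` dictionary and its docIDs are made up for the example, and the function name `intersect` is my own choice.

```python
def intersect(p1, p2):
    """Intersect two postings lists that are sorted by docID."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Same docID in both lists: the document contains both terms.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller docID
        else:
            j += 1
    return answer


# Hypothetical dictionary mapping each term to its postings list.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "calpurnia": [2, 31, 54, 101],
}

# Steps 1-4: look up each term; step 5: intersect the postings.
result = intersect(index["brutus"], index["calpurnia"])
print(result)
```

Because both lists are sorted, each pointer only moves forward, so the work is linear in the total number of postings.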

1.4 The extended Boolean model versus ranked retrieval
A proximity operator is a way of specifying that two terms in a query must occur close to each other in a document.
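One way to picture a proximity check, assuming we already know the word positions of each term inside a document (as a positional index would store them). The function name `within_k` and the choice to measure closeness as a difference in word positions are assumptions for this sketch.

```python
def within_k(positions_a, positions_b, k):
    """Return True if some occurrence of term A lies within k word
    positions of some occurrence of term B in the same document."""
    return any(abs(a - b) <= k for a in positions_a for b in positions_b)


# Hypothetical positions of two terms in one document.
print(within_k([3, 40], [5], 2))   # positions 3 and 5 are 2 apart
print(within_k([3], [10], 2))      # positions 3 and 10 are 7 apart
```

This brute-force version compares every pair of positions; it is meant to show the idea rather than an efficient implementation.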

6.1 Weighted zone scoring
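The weighted zone score is defined to be
$$\sum_{i=1}^{\ell} g_i s_i,$$
where $\ell$ is the number of zones, $g_i$ is the weight of zone $i$ (the weights sum to 1), and $s_i$ is the Boolean score (0 or 1) for a match between the query and zone $i$.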
Weighted zone scoring is sometimes also referred to as ranked Boolean retrieval.
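A small sketch of the weighted zone score as a sum of zone weights times 0/1 match scores. The three zones (author, title, body) and their weights are illustrative values, and `weighted_zone_score` is a name I picked for the example.

```python
def weighted_zone_score(weights, matches):
    """Sum of g_i * s_i over all zones.

    weights: zone weights g_i, expected to sum to 1
    matches: Boolean scores s_i (1 if the query matches zone i, else 0)
    """
    if len(weights) != len(matches):
        raise ValueError("need one match score per zone")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("zone weights should sum to 1")
    return sum(g * s for g, s in zip(weights, matches))


# Assumed zones: author, title, body.
g = [0.2, 0.3, 0.5]
# The query matches the title and body zones but not the author zone.
s = [0, 1, 1]
print(weighted_zone_score(g, s))  # approximately 0.8
```

Since every score lands in [0, 1], documents can be ranked by it, which is where the name "ranked Boolean retrieval" comes from.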
6.2 Term frequency and weighting
We can refine this idea so that we add up not the number of occurrences of each query term t in d, but instead the tf-idf weight of each term in d:
$$\mathrm{Score}(q, d) = \sum_{t \in q} \text{tf-idf}_{t,d}.$$
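A rough sketch of this score, summing tf-idf weights over the query terms. These notes do not spell out idf, so the code assumes the common definition idf_t = log10(N / df_t); the toy documents and the function names are made up for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """Assumed definition: log10(N / df_t); 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / df) if df else 0.0


def tfidf_score(query, doc, docs):
    """Sum over query terms of tf_{t,d} * idf_t."""
    tf = Counter(doc)  # raw term frequencies in this document
    return sum(tf[t] * idf(t, docs) for t in query)


# Hypothetical collection: each document is a list of tokens.
docs = [
    ["brutus", "caesar"],
    ["caesar"],
    ["calpurnia", "caesar"],
]
print(tfidf_score(["brutus"], docs[0], docs))  # rare term, positive score
print(tfidf_score(["caesar"], docs[0], docs))  # term in every doc, idf is 0
```

A term that appears in every document contributes nothing, which is the point of the idf factor.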
6.3 The vector space model for scoring

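The cosine similarity of two documents is the dot product of their unit vectors:
$$\mathrm{sim}(d_1, d_2) = \vec{v}(d_1) \cdot \vec{v}(d_2),$$
where $\vec{v}(d) = \vec{V}(d) / |\vec{V}(d)|$ is the length-normalized document vector.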
The resulting scores can then be used to select the top-scoring documents for a query. Thus we have
$$\mathrm{score}(q, d) = \frac{\vec{V}(q) \cdot \vec{V}(d)}{|\vec{V}(q)|\,|\vec{V}(d)|}.$$
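The cosine score can be sketched in a few lines, treating the query and the document as plain lists of term weights over the same vocabulary. The function name `cosine` and the example vectors are assumptions for illustration, and returning 0.0 for a zero vector is my own convention.

```python
import math


def cosine(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical weight vectors over a two-term vocabulary.
query = [1.0, 0.0]
doc_a = [2.0, 0.0]   # same direction as the query
doc_b = [0.0, 3.0]   # shares no terms with the query
print(cosine(query, doc_a))
print(cosine(query, doc_b))
```

Dividing by the vector lengths removes the effect of document length, so a long document does not win just by repeating terms.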

6.4 Variant tf-idf functions
A common modification is to use instead the logarithm of the term frequency, which assigns a weight given by
$$\mathrm{wf}_{t,d} = \begin{cases} 1 + \log \mathrm{tf}_{t,d} & \text{if } \mathrm{tf}_{t,d} > 0, \\ 0 & \text{otherwise.} \end{cases}$$
In this form, we may replace tf by some other function wf to obtain:
$$\text{wf-idf}_{t,d} = \mathrm{wf}_{t,d} \times \mathrm{idf}_t.$$
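A quick sketch of sublinear tf scaling and the resulting wf-idf weight. The notes do not fix the base of the logarithm, so base 10 is an assumption here, and the idf value in the example is an arbitrary number chosen for illustration.

```python
import math


def wf(tf):
    """1 + log10(tf) when tf > 0, otherwise 0 (log base is an assumption)."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def wf_idf(tf, idf):
    """Sublinearly scaled term frequency multiplied by idf."""
    return wf(tf) * idf


# Going from 1 to 10 occurrences only doubles the weight.
print(wf(0), wf(1), wf(10))
print(wf_idf(10, 1.5))  # hypothetical idf of 1.5
```

The logarithm dampens the effect of repeated occurrences, so twenty occurrences of a term do not count twenty times as much as one.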

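Maximum tf normalization instead scales each term frequency by the largest term frequency in the document, with a smoothing term a between 0 and 1:
$$\mathrm{ntf}_{t,d} = a + (1 - a)\,\frac{\mathrm{tf}_{t,d}}{\mathrm{tf}_{\max}(d)}.$$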
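A minimal sketch of the normalized term frequency ntf = a + (1 - a) * tf / tf_max(d). The default a = 0.4 is a commonly used value that I am assuming here, and the function name and the example counts are made up.

```python
def ntf(tf, tf_max, a=0.4):
    """Maximum tf normalization: a + (1 - a) * tf / tf_max.

    tf:     frequency of the term in the document
    tf_max: largest term frequency of any term in the same document
    a:      smoothing term between 0 and 1 (0.4 is an assumed default)
    """
    if tf_max <= 0:
        raise ValueError("tf_max must be positive")
    return a + (1 - a) * tf / tf_max


# Hypothetical document whose most frequent term occurs 5 times.
print(ntf(5, 5))  # the most frequent term gets a weight close to 1
print(ntf(0, 5))  # the smoothing term a is the floor
```

With this scaling every weight falls between a and 1, regardless of how long the document is.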

