Friday, February 14, 2014

Unit 6 Muddiest Points and Reading notes

Muddiest Point:
I am fine with this class, and I do not have a muddiest point this week.

Reading Notes:

8.1 Information retrieval system evaluation
In this chapter, the author discusses how to measure the effectiveness of IR systems.
8.2 Standard test collections
Here is a list of the standard test collections:
The Cranfield collection: precise relevance judgments, but too small
Text Retrieval Conference (TREC): information needs ("topics") specified in detailed text passages; the largest of these collections, with relatively consistent topics
GOV2: the largest Web collection easily available for research purposes
NTCIR: focused on East Asian languages and cross-language information retrieval
CLEF: concentrated on European languages and cross-language information retrieval
REUTERS: its scale and rich annotation make it a better basis for future research
20 NEWSGROUPS: consists of 1000 articles from each of 20 Usenet newsgroups (the newsgroup name being regarded as the category)
8.3 Evaluation of unranked retrieval sets
precision and recall
A single measure that trades off precision versus recall is the F measure, which is the weighted harmonic mean of precision and recall:
F = ((β² + 1) · P · R) / (β² · P + R), where β² = (1 − α)/α; with β = 1 this is the balanced F1 = 2PR / (P + R).
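Precision, recall, and the F measure for an unranked result set can be sketched as below; the example document-id sets are invented for illustration.

```python
# Sketch: precision, recall, and the weighted F measure (harmonic mean)
# for unranked retrieval sets. The example sets are made up.

def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F_beta) for two sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)                      # relevant docs retrieved
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision == 0 and recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (b2 + 1) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# 2 of 4 retrieved are relevant -> P = 0.5; 2 of 3 relevant found -> R = 2/3
p, r, f1 = precision_recall_f({1, 2, 3, 4}, {2, 4, 5})
```

Setting beta below 1 weights precision more heavily; beta above 1 favors recall.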
8.4 Evaluation of ranked retrieval results
Entire precision-recall curve
The traditional way to summarize the entire precision-recall curve is 11-point interpolated average precision.
Other measures have become more common, notably Mean Average Precision (MAP).
Another approach is measuring precision at fixed low numbers of retrieved results, such as 10 or 30 documents; this is referred to as "precision at k" (P@k).
R-PRECISION: precision at rank R, where R is the number of relevant documents for the query
BREAK-EVEN POINT: the point where precision equals recall
ROC CURVE: plots sensitivity (true positive rate) against 1 − specificity
SENSITIVITY: another name for recall
SPECIFICITY: the fraction of nonrelevant documents that are correctly not retrieved
NDCG: normalized discounted cumulative gain, used with graded relevance judgments
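A few of the ranked-retrieval measures above (precision at k, the per-query average precision that MAP averages, and NDCG) can be sketched as follows; the ranked lists and relevance judgments are invented for illustration.

```python
# Sketch of three ranked-retrieval measures. Document ids and judgments
# are invented; a real evaluation would use a test collection's qrels.
import math

def precision_at_k(ranked, relevant, k):
    """Precision over the top k results of a ranked list."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision values at the rank of each relevant document;
    relevant documents never retrieved contribute zero. MAP is the mean
    of this value over all queries."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, gains, k):
    """NDCG with graded relevance: DCG normalized by the ideal DCG."""
    def dcg(scores):
        return sum(g / math.log2(i + 1) for i, g in enumerate(scores, start=1))
    actual = dcg([gains.get(d, 0) for d in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0

ranked, relevant = ['d1', 'd2', 'd3', 'd4'], {'d1', 'd3', 'd5'}
p2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
ndcg = ndcg_at_k(ranked, {'d1': 3, 'd3': 1, 'd5': 2}, 3)
```

Note that `average_precision` divides by the total number of relevant documents, so missing a relevant document pulls the score down even if everything retrieved was relevant.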
8.5 Assessing relevance
This section discusses how to develop reliable and informative test collections.
8.6 A broader perspective: System quality and user utility
This section covers other system aspects that allow quantitative evaluation, as well as the issue of user utility.
System issues: all the criteria apart from query language expressiveness are straightforwardly measurable; we can quantify speed or size.
User utility: quantifying aggregate user happiness, based on the relevance, speed, and user interface of a system
Refining a deployed system
The most common version of this is A/B testing. The basis of A/B testing is running a series of single-variable tests.
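One way a single-variable A/B test might be judged is with a two-proportion z-test on click-through counts from the two arms; the sketch below uses invented counts, and a real deployment would log clicks and impressions per arm.

```python
# Sketch: comparing click-through rates of a control arm (A) and a test
# arm (B) with a two-proportion z-test. The counts are invented.
from math import sqrt, erf

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two_sided_p) comparing click-through rates of two arms."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)        # pooled CTR under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal tail
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_z(clicks_a=120, n_a=2400, clicks_b=156, n_b=2400)
```

With these invented numbers the test arm's higher click-through rate (6.5% vs 5.0%) is significant at the 5% level; smaller samples would usually not be.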
8.7 Results snippets
static and dynamic
A static summary generally comprises a subset of the document, metadata associated with the document, or both.
Dynamic summaries display one or more “windows” on the document, aiming to present the pieces that have the most utility to the user in evaluating the document with respect to their information need.
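A minimal sketch of how such a window might be chosen: slide a fixed-size window over the document and keep the one containing the most query terms. The function and parameter names here are illustrative, not from the chapter, and real snippet generation would respect sentence boundaries and highlighting.

```python
# Sketch of a dynamic summary: pick the fixed-size word window that
# contains the most query terms. Names are illustrative.

def best_window(doc_text, query_terms, window=10):
    """Return the window of `window` words with the most query-term hits."""
    words = doc_text.split()
    query = {t.lower() for t in query_terms}
    best_score, best_start = -1, 0
    for start in range(max(1, len(words) - window + 1)):
        span = words[start:start + window]
        score = sum(1 for w in span if w.lower().strip('.,') in query)
        if score > best_score:
            best_score, best_start = score, start
    return ' '.join(words[best_start:best_start + window])

snippet = best_window(
    "Evaluation of retrieval systems uses precision and recall. "
    "Dynamic summaries show query terms in context near each other.",
    ["dynamic", "summaries", "query"])
```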
