Wikimarks

Reference evaluation results for a range of baselines.

See also the information on tasks and on how wikimarks are derived from Wikipedia articles.

Evaluation Results for Reference Baselines

Results

Results are continuously updated in this Google sheet.

Columns report results on Simple English Wikipedia (simple) and English Wikipedia (en), each with the benchmarkY1.train and benchmarkY1.test splits.

                       simple                                  en
                       benchmarkY1.train   benchmarkY1.test    benchmarkY1.train   benchmarkY1.test

Paragraph Retrieval [MAP]
  bm25                 0.31+/-0.04         0.29+/-0.03         0.097+/-0.01        0.094+/-0.01
  bm25-rm3             0.29+/-0.04         0.26+/-0.03         0.107+/-0.01        0.101+/-0.01
  QL-rm3               0.25+/-0.04         0.20+/-0.02         0.084+/-0.01        0.076+/-0.01

Entity Ranking [MAP]
  page-bm25            0.03+/-0.005        0.038+/-0.007       0.025+/-0.002       0.026+/-0.003
  page-bm25-rm3        0.05+/-0.007        0.048+/-0.007       0.037+/-0.003       0.038+/-0.004
  paragraph-bm25-ECM   0.23+/-0.03         0.253+/-0.021       0.215+/-0.01        0.21+/-0.01

Cluster [Adj. RAND]
  TF-IDF agglomerative 0.16+/-0.06         0.27+/-0.07         0.15+/-0.01         0.16+/-0.01
  TF-IDF kmeans        0.13+/-0.01         0.12+/-0.01         0.11+/-0.04         0.19+/-0.05
  SBERT kmeans         0.38+/-0.09         0.38+/-0.09         0.23+/-0.02         0.19+/-0.01

Entity Linking [Paragraph-macro F1]
  WAT                  0.44+/-0.01         0.42+/-0.01         0.332+/-0.004       0.310+/-0.003

Baselines

Passage and Entity Retrieval

Baseline implementations are based on Lucene, with code provided online.

Baselines for passage retrieval

bm25:

Lucene’s BM25 method.

bm25-rm3:

RM3 query expansion, then retrieve with BM25.

QL-rm3:

RM3 query expansion, then retrieve with Lucene’s Dirichlet-smoothed query likelihood.
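
The sketch below shows one way to run these three configurations with Pyserini, a Python wrapper around Lucene; it is an illustration, not the project's released code. The index path, query, and parameter values (k1, b, the RM3 feedback settings, mu) are assumptions chosen for the example.

```python
from pyserini.search.lucene import LuceneSearcher

# Hypothetical Lucene index over wikimark paragraphs; the path and all
# parameter values below are illustrative, not the released configuration.
searcher = LuceneSearcher("indexes/wikimarks-paragraphs")
query = "solar eclipse observation"

# bm25: Lucene's BM25 scoring.
searcher.set_bm25(k1=0.9, b=0.4)
bm25_hits = searcher.search(query, k=1000)

# bm25-rm3: RM3 pseudo-relevance feedback on top of BM25.
searcher.set_rm3(fb_terms=10, fb_docs=10, original_query_weight=0.5)
bm25_rm3_hits = searcher.search(query, k=1000)

# QL-rm3: switch the base ranker to Dirichlet-smoothed query likelihood;
# the RM3 expansion configured above stays active.
searcher.set_qld(mu=1000)
ql_rm3_hits = searcher.search(query, k=1000)

for hit in bm25_hits[:5]:
    print(hit.docid, round(hit.score, 3))
```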

Baselines for entity retrieval

page-bm25:

Retrieving Wikipedia pages via BM25.

page-bm25-rm3:

RM3 query expansion, then retrieving pages with BM25.

paragraph-bm25-ECM:

Retrieving paragraphs with BM25, then ranking entities linked in these paragraphs with the entity context model (ECM).
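
As a rough illustration of this pipeline, the sketch below aggregates BM25 paragraph scores over the entities linked in each retrieved paragraph. Summing paragraph scores per entity is only one simple instantiation of an entity context model, and the reference implementation may weight entities differently; the index path and the entity_links lookup (paragraph id to linked entities, built from the corpus annotations) are assumptions.

```python
from collections import defaultdict
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("indexes/wikimarks-paragraphs")  # hypothetical path
searcher.set_bm25()

def rank_entities(query, entity_links, k=100):
    """Rank entities for a query from retrieved paragraphs.

    entity_links is an assumed lookup from paragraph id to the set of
    entities linked in that paragraph.
    """
    scores = defaultdict(float)
    for hit in searcher.search(query, k=k):
        # Every entity linked in a retrieved paragraph accumulates that
        # paragraph's retrieval score as evidence for the entity.
        for entity in entity_links.get(hit.docid, ()):
            scores[entity] += hit.score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```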

Clustering

Based on the default implementations in scikit-learn for TF-IDF, agglomerative clustering, and K-means clustering. We use the packages sklearn.feature_extraction.text and sklearn.cluster from scikit-learn version 1.0.2. A combined sketch of the three baselines follows the descriptions below.

TF-IDF agglomerative:

Each paragraph is represented as a TF-IDF vector; paragraphs are then grouped with agglomerative clustering under Euclidean distance.

TF-IDF kmeans:

Each paragraph is represented as a TF-IDF vector; paragraphs are then grouped with K-means clustering.

SBERT kmeans:

Each paragraph is represented with a Sentence-BERT embedding; paragraphs are then grouped with K-means clustering.

Sentence-BERT [@reimers2019sentence] is a BERT-based embedding model trained for clustering sentences. We are using the bert-base-uncased version provided by the authors.
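
The following is a minimal sketch of the three clustering baselines under the scikit-learn 1.0.2 defaults described above. The paragraphs, gold labels, and the sentence-transformers checkpoint name are illustrative assumptions; substitute the exact bert-base-uncased model released by the Sentence-BERT authors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sentence_transformers import SentenceTransformer

# Illustrative inputs: the paragraphs of one article and their gold clusters
# (e.g. the section each paragraph came from).
paragraphs = [
    "The Moon orbits the Earth.",
    "Lunar phases repeat roughly every month.",
    "Solar eclipses occur at new moon.",
    "During an eclipse the Moon blocks the Sun.",
]
gold = [0, 0, 1, 1]
k = len(set(gold))  # number of clusters taken from the gold partition

# TF-IDF agglomerative: TF-IDF vectors, agglomerative clustering with
# Euclidean distance (the scikit-learn default).
tfidf = TfidfVectorizer().fit_transform(paragraphs).toarray()
agg_labels = AgglomerativeClustering(n_clusters=k).fit_predict(tfidf)

# TF-IDF kmeans: the same vectors, clustered with K-means.
kmeans_labels = KMeans(n_clusters=k).fit_predict(tfidf)

# SBERT kmeans: Sentence-BERT embeddings, clustered with K-means. The
# checkpoint name here is an assumption.
sbert = SentenceTransformer("bert-base-nli-mean-tokens").encode(paragraphs)
sbert_labels = KMeans(n_clusters=k).fit_predict(sbert)

# Score each clustering against the gold partition with the Adjusted Rand
# index, the measure reported in the results table.
for name, pred in [("TF-IDF agglomerative", agg_labels),
                   ("TF-IDF kmeans", kmeans_labels),
                   ("SBERT kmeans", sbert_labels)]:
    print(name, adjusted_rand_score(gold, pred))
```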

Entity Linking

We provide reference results for entity linking with the WAT entity linker [@piccinno2014wat] using its default configuration.
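
WAT is available as a public REST service, and the sketch below shows one way to annotate a paragraph with it. The endpoint follows the service's documentation; the gcube-token must come from a D4Science registration, and the response fields we read (spot, title, rho) are assumptions to verify against that documentation.

```python
import requests

# Public WAT tagging endpoint (per the service documentation).
WAT_URL = "https://wat.d4science.org/wat/tag/tag"

def wat_annotate(text, token):
    """Annotate text with WAT in its default configuration."""
    response = requests.get(WAT_URL, params={
        "gcube-token": token,  # personal D4Science API token
        "text": text,
        "lang": "en",
    })
    response.raise_for_status()
    # Each annotation carries the matched span, the linked Wikipedia title,
    # and a confidence score (field names assumed from the API docs).
    return [(a["spot"], a["title"], a["rho"])
            for a in response.json()["annotations"]]

for spot, title, rho in wat_annotate("Neil Armstrong walked on the Moon.",
                                     "MY_TOKEN"):
    print(f"{spot} -> {title} (confidence {rho:.2f})")
```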