Wikimarks
Reference evaluation results for a range of baselines.
See also the information on tasks and on how wikimarks are derived from Wikipedia articles.
Evaluation Results for Reference Baselines
Results
Results are continuously updated in this Google sheet.
| | simple: benchmarkY1.train | simple: benchmarkY1.test | en: benchmarkY1.train | en: benchmarkY1.test |
|---|---|---|---|---|
| Paragraph Retrieval [MAP] | | | | |
| bm25 | 0.31+/-0.04 | 0.29+/-0.03 | 0.097+/-0.01 | 0.094+/-0.01 |
| bm25-rm3 | 0.29+/-0.04 | 0.26+/-0.03 | 0.107+/-0.01 | 0.101+/-0.01 |
| QL-rm3 | 0.25+/-0.04 | 0.20+/-0.02 | 0.084+/-0.01 | 0.076+/-0.01 |
| Entity Ranking [MAP] | | | | |
| page-bm25 | 0.03+/-0.005 | 0.038+/-0.007 | 0.025+/-0.002 | 0.026+/-0.003 |
| page-bm25-rm3 | 0.05+/-0.007 | 0.048+/-0.007 | 0.037+/-0.003 | 0.038+/-0.004 |
| paragraph-bm25-ECM | 0.23+/-0.03 | 0.253+/-0.021 | 0.215+/-0.01 | 0.21+/-0.01 |
| Cluster [Adj. RAND] | | | | |
| TF-IDF agglomerative | 0.16+/-0.06 | 0.27+/-0.07 | 0.15+/-0.01 | 0.16+/-0.01 |
| TF-IDF kmeans | 0.13+/-0.01 | 0.12+/-0.01 | 0.11+/-0.04 | 0.19+/-0.05 |
| SBERT kmeans | 0.38+/-0.09 | 0.38+/-0.09 | 0.23+/-0.02 | 0.19+/-0.01 |
| Entity Linking [Paragraph-macro F1] | | | | |
| WAT | 0.44+/-0.01 | 0.42+/-0.01 | 0.332+/-0.004 | 0.310+/-0.003 |
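For reference, the retrieval and entity ranking tasks above are scored with mean average precision (MAP). The sketch below shows the standard computation under binary relevance; the run and qrels dictionary shapes are assumptions for illustration, not the official evaluation tooling.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k at each rank k that
    holds a relevant item (binary relevance assumed)."""
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(run, qrels):
    """run: query_id -> ranked list of doc/entity ids (assumed shape);
    qrels: query_id -> set of relevant ids (assumed shape)."""
    aps = [average_precision(run.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps)
```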
Baselines
Passage and Entity Retrieval
Baseline implementations are based on Lucene, with code provided online.
Baselines for passage retrieval
- bm25:
Lucene’s BM25 method.
- bm25-rm3:
RM3 query expansion, then retrieve with BM25.
- QL-rm3:
RM3 query expansion, then retrieve with Lucene’s Dirichlet-smoothed query likelihood.
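To make the retrieve-and-expand pattern behind bm25-rm3 and QL-rm3 concrete, here is a plain-Python sketch of BM25 scoring and an RM3-style expansion step. This is an illustration, not Lucene's implementation; the parameter defaults (k1, b, number of expansion terms, interpolation weight) are common conventions rather than the exact settings of the reference runs.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """BM25 score of one document for one query (textbook formulation,
    not Lucene's exact code).  doc_freq maps term -> document frequency."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf or term not in doc_freq:
            continue
        idf = math.log(1 + (num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score

def rm3_expand(query_terms, feedback_docs, feedback_scores,
               num_expansion_terms=20, orig_weight=0.5):
    """RM3: estimate a relevance model from top-ranked feedback documents,
    then interpolate it with the original query.  Returns term -> weight;
    a second retrieval pass uses these weighted terms."""
    relevance_model = Counter()
    total = sum(feedback_scores)
    for doc_terms, score in zip(feedback_docs, feedback_scores):
        for term, count in Counter(doc_terms).items():
            # P(term | doc), weighted by the document's retrieval score
            relevance_model[term] += (count / len(doc_terms)) * (score / total)
    expansion = dict(relevance_model.most_common(num_expansion_terms))
    norm = sum(expansion.values())
    expanded = Counter()
    for term in query_terms:
        expanded[term] += orig_weight / len(query_terms)
    for term, weight in expansion.items():
        expanded[term] += (1 - orig_weight) * weight / norm
    return expanded
```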
Baselines for entity retrieval
- page-bm25:
Retrieving Wikipedia pages via BM25.
- page-bm25-rm3:
RM3 query expansion, then retrieving pages with BM25.
- paragraph-bm25-ECM:
Retrieving paragraphs with BM25, then ranking entities linked in these paragraphs with the entity context model (ECM).
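The paragraph-bm25-ECM baseline can be read as a score-aggregation step on top of paragraph retrieval. The sketch below shows one simplified aggregation, assuming each retrieved paragraph comes with its BM25 score and the entities linked in it; the actual entity context model may weight or normalize entities differently.

```python
from collections import defaultdict

def rank_entities(retrieved_paragraphs, top_k=100):
    """Simplified entity ranking in the spirit of paragraph-bm25-ECM:
    score each entity by the summed retrieval scores of the top-k
    paragraphs that link to it.  retrieved_paragraphs is a list of
    (bm25_score, linked_entity_ids) pairs sorted by descending score;
    this aggregation is an assumption for illustration."""
    entity_scores = defaultdict(float)
    for score, entities in retrieved_paragraphs[:top_k]:
        for entity in set(entities):  # count each entity once per paragraph
            entity_scores[entity] += score
    return sorted(entity_scores.items(), key=lambda kv: kv[1], reverse=True)
```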
Clustering
Clustering baselines are based on the default implementations in scikit-learn for TF-IDF, agglomerative clustering, and K-means clustering. We use the packages sklearn.feature_extraction.text and sklearn.cluster from scikit-learn version 1.0.2. A sketch of all three pipelines follows the list below.
- TF-IDF agglomerative:
Each paragraph is represented as a TF-IDF vector, then clustered with agglomerative clustering using Euclidean distance.
- TF-IDF kmeans:
Each paragraph is represented as a TF-IDF vector, then clustered with K-means.
- SBERT kmeans:
Each paragraph is represented with Sentence-BERT embeddings, then clustered with K-means.
Sentence-BERT [@reimers2019sentence] is a BERT-based embedding model trained to produce sentence representations suitable for clustering. We use the bert-base-uncased version provided by the authors.
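Here is a minimal sketch of the three clustering pipelines, using the scikit-learn packages named above plus the sentence-transformers library for the SBERT variant. The SBERT model name passed below is a placeholder, not necessarily the checkpoint used for the reference runs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sentence_transformers import SentenceTransformer

def cluster_paragraphs(paragraphs, n_clusters, method):
    """Cluster a list of paragraph texts into n_clusters groups."""
    if method == "tfidf-agglomerative":
        # Dense matrix required by the default Ward/Euclidean linkage
        X = TfidfVectorizer().fit_transform(paragraphs).toarray()
        return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)
    if method == "tfidf-kmeans":
        X = TfidfVectorizer().fit_transform(paragraphs)  # sparse is fine for KMeans
        return KMeans(n_clusters=n_clusters, random_state=0).fit_predict(X)
    if method == "sbert-kmeans":
        # Placeholder model name; see the SBERT authors' model list
        X = SentenceTransformer("bert-base-nli-mean-tokens").encode(paragraphs)
        return KMeans(n_clusters=n_clusters, random_state=0).fit_predict(X)
    raise ValueError(f"unknown method: {method}")

def evaluate_clustering(gold_labels, predicted_labels):
    # The [Adj. RAND] column in the results table
    return adjusted_rand_score(gold_labels, predicted_labels)
```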
Entity Linking
We provide reference results for entity linking with the WAT entity linker [@piccinno2014wat] using its default configuration.
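For orientation, paragraph-macro F1 averages a per-paragraph F1 over all evaluated paragraphs. Below is a minimal sketch of that computation, assuming gold and predicted links are given as sets of entity IDs per paragraph; the official evaluation may differ in edge-case handling.

```python
def paragraph_macro_f1(gold, predicted):
    """Macro-average F1 over paragraphs.  gold and predicted map
    paragraph_id -> set of linked entity IDs (assumed representation)."""
    f1s = []
    for pid, gold_entities in gold.items():
        pred_entities = predicted.get(pid, set())
        tp = len(gold_entities & pred_entities)
        precision = tp / len(pred_entities) if pred_entities else 0.0
        recall = tp / len(gold_entities) if gold_entities else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s) if f1s else 0.0
```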