Wikimarks
Tools, Installation, and Usage
Data Conversion Pipeline
Our pipeline for converting Wikipedias and generating Wikimarks is available in the TREMA-UNH/trec-car-release project. Please follow the installation, configuration, and usage instructions in its README.
Source Code for Conversion Tools
This pipeline builds on the conversion tools in the trec-car-create package, which provides utilities for converting, extracting, inspecting, and filtering Wikipedia data and for generating benchmarks from it. Please follow the installation and compilation instructions in its README.
Language Bindings for CBOR
Language bindings for Java and Python that read the CBOR file formats are provided in the trec-car-tools packages.
Alternatively, the equivalent JSONL format can be read with any JSON parsing package. Because the JSONL files are highly redundant, we distribute them gzip-compressed and recommend reading them directly through a gzip-aware file handler instead of decompressing them first. (See the data model on the main page.)
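Reading the gzipped JSONL files needs nothing beyond the Python standard library. The following is a minimal sketch, assuming one JSON object per line: it opens a `.jsonl.gz` file with `gzip.open` in text mode, so decompression happens on the fly, and parses each line with `json.loads`. The function name `iter_jsonl_gz`, the file name, and the record keys are hypothetical and do not come from the Wikimarks data model.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def iter_jsonl_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzipped JSONL file.

    gzip.open in text mode ("rt") decompresses while reading, so the file
    never has to be unpacked to disk first.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at EOF
                yield json.loads(line)


if __name__ == "__main__":
    # Build a tiny stand-in file. These keys are hypothetical and do not
    # reflect the actual Wikimarks data model.
    records = [{"id": "a", "text": "first"}, {"id": "b", "text": "second"}]
    path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    for rec in iter_jsonl_gz(path):
        print(rec["id"], rec["text"])  # prints "a first", then "b second"
```

Because the generator yields records one at a time, memory use stays bounded by the size of a single record rather than the whole file. In Java, the same pattern is to wrap a `FileInputStream` in a `java.util.zip.GZIPInputStream` and read it line by line.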