Wikimarks

Wikipedia Conversions

Along with the methodology and tools, we provide the raw output of our article conversion pipeline, run on the English, Simple English, and Japanese Wikipedia dumps from 1 January 2022. These datasets, which we call unprocessedAll, include every page of each wiki in machine-readable JSONL or CBOR format.
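
As a rough illustration of how the JSONL variant could be consumed, the sketch below streams one JSON value per line so that a full dump never has to fit in memory. It is a minimal example under stated assumptions, not part of the released tooling: the helper name `iter_jsonl`, the file name in the usage comment, and the optional gzip handling are all hypothetical, and no particular page schema is assumed.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one decoded JSON value per non-empty line of a JSONL file."""
    # Assumption: a dump may be gzip-compressed (".gz" suffix); otherwise
    # it is read as plain UTF-8 text.
    opener = gzip.open if Path(path).suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Hypothetical usage; the file name is a placeholder, not a released artifact:
# for page in iter_jsonl("unprocessedAll.jsonl"):
#     print(type(page))
```

Reading line by line keeps memory use proportional to a single page rather than to the whole dump, which matters when a file contains every page of a wiki.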