Citation

Project Citation: 

Peeters, Ralph, Primpeli, Anna, and Bizer, Christian. Product Datasets from the MWPD2020 Challenge at the ISWC2020 Conference (Task 1). Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-11-26. https://doi.org/10.3886/E127482V1

Persistent URL:  http://doi.org/10.3886/E127482V1

Project Description

Project Title:  Product Datasets from the MWPD2020 Challenge at the ISWC2020 Conference (Task 1)
Summary:  The goal of Task 1 of the Mining the Web of Product Data Challenge (MWPD2020) was to compare the performance of methods for identifying offers for the same product from different e-shops. The datasets provided to the participants of the competition contain product offers from different e-shops in the form of binary product pairs (each labeled "match" or "no match") from the product category computers. The data is available as training, validation, and test sets for machine learning experiments. The training set consists of ~70K product pairs that were automatically labeled using weak supervision from product identifiers marked up on the web. The validation set contains 1,100 manually labeled pairs. The test set, which was used to evaluate the participating systems, consists of 1,500 manually labeled pairs. The test set is intentionally harder than the other sets: it contains more very hard matching cases, and a subset of its pairs poses specific matching challenges, e.g. products that have no training data in the training set or products that have had typos introduced. These pairs can be used to measure the performance of methods on these kinds of matching challenges. The data stems from the WDC Product Data Corpus for Large-Scale Product Matching - Version 2.0, which consists of 26 million product offers originating from 79 thousand websites that mark up their offers with the schema.org vocabulary. For more information and download links for the corpus itself, please follow the links below.
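As an illustration of the binary pair format described in the summary, the following is a minimal sketch of scoring a matcher on labeled product pairs. The field names (`title_left`, `title_right`, `label`), the token-overlap baseline, and the example offers are all assumptions made for illustration; they are not taken from the dataset files. The summary does not state which metric the challenge used, so precision, recall, and F1 on the "match" class appear here only as a common choice for pair classification.

```python
from dataclasses import dataclass


@dataclass
class ProductPair:
    # Hypothetical field names; the actual files may use a different schema.
    title_left: str
    title_right: str
    label: int  # 1 = "match", 0 = "no match"


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two lower-cased titles."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def predict(pair: ProductPair, threshold: float = 0.5) -> int:
    """Naive baseline (an assumption, not a challenge system)."""
    return int(jaccard(pair.title_left, pair.title_right) >= threshold)


def precision_recall_f1(gold, pred):
    """Precision, recall, and F1 for the positive ("match") class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented toy pairs, for illustration only.
pairs = [
    ProductPair("Corsair Vengeance 16GB DDR4 3200",
                "corsair vengeance ddr4 16gb 3200 kit", 1),
    ProductPair("Intel Core i5 9600K", "Intel Core i7 9700K", 0),
    ProductPair("Samsung 970 EVO 500GB SSD",
                "Samsung SSD 970 EVO Plus 500GB", 0),
    ProductPair("WD Blue 1TB HDD",
                "Western Digital WD10EZEX 1 TB hard drive", 1),
]

gold = [p.label for p in pairs]
pred = [predict(p) for p in pairs]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The third and fourth toy pairs show why plain token overlap is a weak baseline: near-identical titles can describe different products, and the same product can be described with almost no shared tokens.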
Original Distribution URL:  https://ir-ischool-uos.github.io/mwpd/

Scope of Project

Subject Terms:  schema.org; product matching; entity matching; identity resolution; record linkage; e-commerce


File Name:  ISWC2020_SWC_MWPD_challenge.zip
Size:  31.1 MB
File Type:  application/zip


This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.