Restaurants (Fodors-Zagats), Augmented Version, Fixed Splits

Name: Restaurants (Fodors-Zagats), Augmented Version, Fixed Splits
Published: 2020-11-23
License: https://creativecommons.org/licenses/by/4.0

Principal Investigator(s): Anna Primpeli, University of Mannheim (Germany); Christian Bizer, University of Mannheim (Germany)

Version: V1

Citation

Project Citation:

Primpeli, Anna, and Bizer, Christian. Restaurants (Fodors-Zagats), Augmented Version, Fixed Splits. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-11-23. https://doi.org/10.3886/E127242V1

Persistent URL: http://doi.org/10.3886/E127242V1

Project Description

Project Title: Restaurants (Fodors-Zagats), Augmented Version, Fixed Splits

Summary: Motivation:
Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, and of correspondence sets that include both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. To improve both, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.

Dataset Description:
An augmented version of the Fodors-Zagats restaurants dataset for benchmarking entity matching/record linkage methods. The original dataset can be found at:
https://hpi.de/en/naumann/projects/data-integration-data-quality-and-data-cleansing/dude.html#c11471

The augmented version adds a fixed set of non-matching pairs to the original dataset. In addition, fixed splits for training, validation, and testing are provided, together with their corresponding feature vectors. The feature vectors are built using data-type-specific similarity metrics.
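To illustrate what a feature vector built from data-type-specific similarity metrics might look like, here is a minimal Python sketch. It is not the depositors' implementation: the attribute names (name, addr, city, phone, type), the choice of metric per attribute, and both sample records are assumptions made up for this example, and only the standard library is used.

```python
import re
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] for short free-text values."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated tokens, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def digits_equal(a: str, b: str) -> float:
    """1.0 if both values contain the same non-empty digit sequence, else 0.0."""
    digits_a, digits_b = re.sub(r"\D", "", a), re.sub(r"\D", "", b)
    return 1.0 if digits_a and digits_a == digits_b else 0.0


# Hypothetical attribute-to-metric mapping (an assumption, not taken from the dataset).
METRICS = {
    "name": string_similarity,
    "addr": token_jaccard,
    "city": string_similarity,
    "phone": digits_equal,
    "type": token_jaccard,
}


def feature_vector(left: dict, right: dict) -> dict:
    """Compare two records attribute by attribute with the metric chosen for each."""
    return {f"{attr}_sim": metric(left[attr], right[attr]) for attr, metric in METRICS.items()}


# Invented records, for illustration only.
fodors_record = {"name": "Example Bistro", "addr": "12 Sample St.", "city": "Springfield",
                 "phone": "555/010-0000", "type": "american"}
zagat_record = {"name": "Example Bistro", "addr": "12 Sample Street", "city": "Springfield",
                "phone": "555-010-0000", "type": "American (new)"}

vector = feature_vector(fodors_record, zagat_record)
print(vector)
```

Each record pair in a split would yield one such vector, which a matching method can then use as input alongside the pair's match/non-match label.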

The dataset contains 533 records describing restaurants from fodors.com, which are matched against 331 restaurant records from zagat.com. The gold standards have manual annotations for 112 matching and 488 non-matching pairs. Five attributes are used to describe the restaurant records, and the attribute density is 100%.
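The counts above imply a few derived figures that are useful when interpreting benchmark results, such as the class balance of the labeled pairs. The short Python calculation below uses only the numbers stated in the description; the variable names are chosen for this example.

```python
# Counts taken from the dataset description.
fodors_records = 533
zagat_records = 331
matching_pairs = 112
non_matching_pairs = 488

# Total number of manually annotated pairs.
labeled_pairs = matching_pairs + non_matching_pairs

# Share of labeled pairs that are matches (class balance).
match_ratio = matching_pairs / labeled_pairs

# Size of the full cross product of both sources, for comparison.
all_candidate_pairs = fodors_records * zagat_records

print(f"labeled pairs: {labeled_pairs}")
print(f"share of matches among labeled pairs: {match_ratio:.1%}")
print(f"all possible record pairs: {all_candidate_pairs}")
```

Matches make up roughly one in five labeled pairs, so class-sensitive measures such as precision, recall, and F1 are likely more informative than plain accuracy on this task.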

The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results.
The dataset is part of the CompERBench repository, which provides 21 complete benchmark tasks for entity matching for public download:
http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html

Project files: restaurants_(Fodors-Zagats)

Published Versions

V1 [2020-11-23]

This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.
