Creating a Dataset for Keyphrase Extraction in Physics Publications and Patents

André Rattinger, Christian Gütl

Publication: Chapter in Book/Report/Conference proceeding › Conference paper › Peer-reviewed


Extracting keyphrases and entities can be an important first step in many Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Many datasets exist for training models on standard entities, but data suited to more domain-specific applications is hard to find.
The types of keyphrases one wants to extract vary enormously between fields, which makes otherwise successful algorithms perform poorly on them. Physics is one such field, specifically the processing of physics publications and patents. In contrast to news articles or social media, typical entities such as Organization, Location, or Person are not helpful for extracting important information from publications or patents. Few annotated datasets exist for specific domains, and those that do are not easily transferable. This work contributes an annotated dataset to facilitate information retrieval and extraction in Physics. The dataset spans both physics patents and publications to enable future work across the two document types, such as tracking inventions from their first emergence in a publication to their adaptation in a patent.
Title: Proceedings of the 3rd International Open Search Symposium #ossym2021
Subtitle: OSSYM 2021
ISBN (electronic): 978-92-9083-633-9
Publication status: Published - 2022
Event: 3rd International Open Search Symposium: OSSYM 2021 - Virtual, Austria
Duration: 11 Oct. 2021 to 13 Oct. 2021


Conference: 3rd International Open Search Symposium
Short title: OSSYM 2021

Fields of Expertise

  • Information, Communication & Computing

