Abstract
Extracting keyphrases and entities can be an important first step in many Natural Language Processing (NLP) and Information Retrieval (IR) tasks. There are many datasets for training models on standard entity types, but it is hard to find data suited to more domain-specific applications.
The types of keyphrases someone wants to extract vary enormously between fields, which makes otherwise successful algorithms perform poorly on them. Physics is one such field, particularly when processing physics publications and patents. Unlike in news articles or social media, the typical entities such as Organization, Location or Person are not helpful for extracting important information from publications or patents. Few annotated datasets exist for specific domains, and even when they do, they are not easily transferable. This work contributes an annotated dataset to facilitate information retrieval and extraction in Physics. The dataset spans both physics patents and publications, so that future work can connect the two document types, for example by tracking an invention from its first appearance in a publication to its adaptation in a patent.
Original language | English |
---|---|
Title of host publication | Proceedings of the 3rd International Open Search Symposium #ossym2021 |
Subtitle of host publication | OSSYM 2021 |
Pages | 45-49 |
ISBN (Electronic) | 978-92-9083-633-9 |
DOIs | |
Publication status | Published - 2022 |
Event | 3rd International Open Search Symposium: OSSYM 2021 - Virtual, Austria. Duration: 11 Oct 2021 → 13 Oct 2021 |
Conference
Conference | 3rd International Open Search Symposium |
---|---|
Abbreviated title | OSSYM 2021 |
Country/Territory | Austria |
City | Virtual |
Period | 11/10/21 → 13/10/21 |
Fields of Expertise
- Information, Communication & Computing