Metadata Modeling and Knowledge Representation
for Research Data

An NKOS/DCMI Regional Workshop at the International Conference for Asian Digital Libraries

Tsukuba, Japan
December 9, 2016


Research data in different disciplinary fields vary greatly in structure, format, and size. Although metadata standards for research data have been established in major disciplinary fields, metadata models and knowledge organization structures often still have to be developed for individual disciplines to meet the special needs and requirements of research data management. Computationally intensive research, for example, relies heavily on writing program code throughout the research lifecycle for data collection, processing, analysis, and management. Ecosystem research data encompass a wide variety of formats and types of data that are often mixed with unstructured notes and/or annotations. Humanities research data are often unstructured text. Proper organization and management of research data is necessary not only for their ongoing management and use but also for long-term preservation and access.

The increasing demand for both metadata and knowledge representation in the research data domain presents a unique niche area for research and application. This half-day workshop will feature research on applying KOS to enhance research data discovery and use, metadata modeling for big science data management, linked data competency training, and more. The speakers will present ongoing projects and practices, giving the audience an opportunity to ask questions, learn about new developments, and interact with speakers and other workshop participants. Unlike previous NKOS workshops, this workshop, the first of its kind, broadens the scope to combine metadata with KOS and focuses on applications of both.

Who should attend

Librarians, information professionals, and graduate students who are interested in metadata modeling and knowledge representation for research data will learn about the important concepts, methods, practices, and current developments through the overview and panel presentations. Researchers and practitioners will be able to share and exchange information and discuss issues in metadata modeling and knowledge representation through questions and answers between presenters and participants. This will also be a great opportunity for educators who teach courses in information organization and related areas to stay in touch with new developments.


Below is the schedule for the workshop (Detailed presentation information will be updated from time to time):

1:30-1:35 Welcome (Jian Qin, Syracuse University, USA and Robert B. Allen, Yonsei University, South Korea)

Introduction of keynote speaker (Jian Qin, Syracuse University, USA)

1:35-2:20 Joint opening keynote for Rich Semantics and NKOS workshops: Noriko Kando, National Institute of Informatics, Japan

Presentation Title: KOS in Information Retrieval Experiments

2:20-3:10 Panel presentations

Presentation title: Development of an Ontology to Model Socio-Demographic Information in the Curation of Questionnaire Data
Guangyuan Sun & Koo Soo Guan, Nanyang Technological University, Singapore

This presentation reports on part of an ongoing data curation project for curating quantitative social science research data for reuse. The focus of this project is on curating research datasets derived from survey questionnaires in the social sciences. These datasets often contain socio-demographic information about questionnaire respondents. We are interested in using the knowledge representation techniques of metadata schemas and ontologies to describe and represent research data for data curation and reuse. An ontology can be used to represent the syntax and structure of datasets, and to assign semantics (i.e. meaning) to the socio-demographic variables and their values that are needed for data interpretation and reuse. Socio-demographic variables and their different value types (i.e. typical combinations of value choices of variables) are collected from two social science data repositories, the UK Data Archive and the Inter-university Consortium for Political and Social Research (ICPSR), and modelled using a two-level representation: the data table variable description and the physical table description. Details of the data table variable description and the physical table description will be presented, and the implications for curating quantitative social science data will be discussed.

Presentation title: Types of Reuse: Methods of Developing a Taxonomy for Data and Software Reuse
Xia Lin, Jane Greenberg, Kai Li, & Xuemei Gong, Drexel University, USA

Given the growing use of open source software and open access data in research projects, it is important to investigate and understand how and why the data and software are used or reused. Previous research on this topic often uses only simple citation counts as an indicator of reuse. In this presentation, we will discuss our efforts in searching for methods to create a taxonomy for data and software reuse. We aim to create automatic or semi-automatic procedures to identify types of reuse from the documents that mention the reuse. The open-source software LAMMPS, which is popular in materials science research, was chosen as a sample. There are 6,142 articles citing this software on Google Scholar. We identified the top 400 citing documents and collected all the available full-text articles as the source for content analysis to look for different types of reuse. For each article, the sentences that cite the software were extracted and analyzed, and the terms in those sentences were collected for content analysis. As a result, eight categories of reuse were identified, each with a list of associated terms that can be used as indicators for the category. We then ran two types of tests on these categories. One applies machine learning algorithms to see how well these categories and associated terms predict the type of reuse in new documents that cite the software. The other asks a domain expert to assess these categories of reuse and evaluate how accurately they were identified. The results, which are both interesting and surprising, will be discussed during the presentation.
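As a rough illustration of the sentence-extraction and indicator-term idea described in this abstract, the sketch below pulls out the sentences that mention the software and assigns categories by simple term matching. It is not the presenters' procedure: the two category names and their indicator terms are hypothetical placeholders (the abstract does not list the eight categories), and plain term matching stands in for the machine learning step.

```python
import re

# Hypothetical categories and indicator terms, invented for illustration only.
INDICATORS = {
    "simulation": {"simulation", "simulations", "simulated"},
    "modification": {"modified", "extended", "implemented"},
}


def citing_sentences(text, software="LAMMPS"):
    """Return the sentences of `text` that mention the software by name."""
    # Naive split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if software.lower() in s.lower()]


def classify(sentence):
    """Return the sorted categories whose indicator terms occur in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(cat for cat, terms in INDICATORS.items() if words & terms)


if __name__ == "__main__":
    sample = (
        "All simulations were performed with LAMMPS. "
        "The results are shown below. "
        "We extended LAMMPS with a new pair style."
    )
    for sentence in citing_sentences(sample):
        print(classify(sentence), sentence)
```

A term-matching baseline like this is mainly useful as a point of comparison for a trained classifier; the sentence splitter is deliberately simple and would need replacing for real full-text articles.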

3:10-3:30 Coffee break

3:30-4:20 Panel presentations

Presentation title: Quality of entity-level data linking and mashing-up based on LOD-enabled KOS vocabularies
Marcia Zeng, Kent State University, USA

In the creation and application of LOD datasets, data linking and mashing-up based on LOD-enabled KOS vocabularies have direct impacts on the entity-level quality of data. The term "entity" refers to identifiable things, which are often seen as named entities such as people, organizations, places, events, and concepts represented in metadata statements, while the values are controlled by KOS vocabularies. In RDF triples (subject-predicate-object) or metadata statements, entities occupy the positions of "subject" and "object". Converting data into, and linking it through, RDF triples does not guarantee the datasets' maturity, trustworthiness, robustness, scalability, and sustainability. This presentation will present the types of quality issues found in current practices, with the aim of increasing awareness of these issues and finding enhancement methods.
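To make the subject-predicate-object structure concrete, the following minimal sketch models triples as plain tuples, collects the entities in subject and object position, and flags object values that point into a KOS namespace but are missing from the vocabulary. All URIs, the vocabulary contents, and the check itself are assumptions made for illustration; they are not taken from the presentation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical KOS namespace and vocabulary (example.org URIs are invented).
KOS_PREFIX = "http://example.org/kos/"
KOS_CONCEPTS = {
    KOS_PREFIX + "place/city-a",
    KOS_PREFIX + "person/author-a",
}

TRIPLES = [
    Triple("http://example.org/data/event-1", "http://example.org/prop/heldIn",
           KOS_PREFIX + "place/city-a"),
    Triple("http://example.org/data/event-1", "http://example.org/prop/speaker",
           KOS_PREFIX + "person/author-a"),
    Triple("http://example.org/data/event-1", "http://example.org/prop/topic",
           KOS_PREFIX + "concept/unknown-topic"),
]


def entities(triples):
    """Collect every value that appears in subject or object position."""
    found = set()
    for t in triples:
        found.add(t.subject)
        found.add(t.obj)
    return found


def uncontrolled_objects(triples, vocabulary, prefix=KOS_PREFIX):
    """Return object URIs in the KOS namespace that the vocabulary does not define."""
    return sorted(
        {t.obj for t in triples if t.obj.startswith(prefix) and t.obj not in vocabulary}
    )


if __name__ == "__main__":
    print(len(entities(TRIPLES)), "entities")
    print("Not in vocabulary:", uncontrolled_objects(TRIPLES, KOS_CONCEPTS))
```

A dangling vocabulary reference of this kind is only one possible entity-level issue; a well-formed triple can still carry a wrong or ambiguous link.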

Presentation title: Modeling metadata with a KOS approach: The case of gravitational wave research data
Jian Qin, Brian Dobreski, & Duncan Brown, Syracuse University, USA

The complexity of computationally intensive scientific research poses great challenges for both research data management and research reproducibility. Determining what metadata and vocabulary need to be captured for tracking, reproducing, and reusing computational results is the starting point in developing metadata models to fulfill these functions of data management. This presentation reports the findings from a study designed to gather user requirements for developing a metadata model with an ontological approach. This case shows that metadata specific to gravitational wave (GW) data, workflows, and outputs tend to differ from those currently available in metadata standards, and that the process of developing the metadata model should capture both structural and semantic vocabularies.

4:20-4:50 Invited talk by Tom Baker

Presentation title: The Global Agricultural Concept Scheme and Agrisemantics

Abstract: This presentation discusses the development of the Global Agricultural Concept Scheme (GACS), in which key concepts from three thesauri about agriculture and nutrition (AGROVOC, CAB Thesaurus, and NAL Thesaurus) have been merged. The respective partner organizations, namely the Food and Agriculture Organization of the UN (FAO), CAB International (CABI), and the USDA National Agricultural Library (NAL), undertook this initiative in 2013 with the goal of facilitating search across databases, improving the semantic reach of their databases by supporting queries that freely draw on terms from any mapped thesaurus, and achieving economies of scale from joint maintenance. The GACS beta release of May 2016 has 15,000 concepts and over 350,000 terms in 28 languages.

4:40-5:00 Discussion and closing (moderator: Jian Qin)



Jian Qin, Syracuse University, Syracuse, New York, USA
Marcia Zeng, Kent State University, Kent, OH, USA
Shigeo Sugimoto, University of Tsukuba, Tsukuba, Ibaraki, Japan
Xia Lin, Drexel University, Philadelphia, PA, USA

NKOS (Networked Knowledge Organization Systems) is an ad hoc work group of more than 300 international experts and implementers of knowledge organization systems. NKOS is devoted to enabling knowledge organization systems/services (KOS), such as classification systems, thesauri, gazetteers, and ontologies, as networked, interactive information services to support the description and retrieval of diverse information resources through the Internet.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
