Paper Accepted and Presented at KDIR 2025 in Marbella, Spain

We are happy to announce that our paper “A Hybrid Approach for Mining the Organizational Structure from University Websites” by Arman Arzani, Theodor Josef Vogl, Marcus Handte, and Pedro José Marrón has been accepted and presented at the International Conference on Knowledge Discovery and Information Retrieval (KDIR 2025). The paper introduces a novel approach to constructing the hierarchy of academic institutes from the content of their websites.

Here is a short summary: To support innovation coaches in scouting activities such as discovering expertise, identifying trends inside a university, and finding potential innovators, we designed INSE, an innovation search engine that automates the data gathering and analysis processes. The primary goal of INSE is to provide comprehensive system support across all stages of innovation scouting, reducing the need for manual data collection and aggregation. To provide innovation coaches with the necessary information on individuals, INSE must first establish the structure of the organization. This includes identifying the associated staff and researchers in order to assess their academic activities. While this could in theory be done manually, the task is error-prone and virtually impossible for large organizations. In this paper, we propose a generic organization mining approach for university websites that combines a rule-based algorithm, LLMs, and a fine-tuned sequence-to-sequence classifier, and that is independent of web technologies, content management systems, and website layout. We implement the approach and evaluate it on four different universities, namely Duisburg-Essen, Münster, Dortmund, and Wuppertal. The evaluation indicates that our approach is generic and enables the identification of university aggregator pages with an F1 score above 85%, and of entity landing pages with F1 scores of 100% for faculties and above 78% for institutes and chairs.