Project Group: Web-based Organization Mining

Tutors: Dr. Marcus Handte, Arman Arzani

An important goal of many universities is to increase the number of startups that turn the university's innovative research results into sustainable businesses. To reach this goal, researchers who have generated promising results must be connected with the university's business advisors and innovation coaches, who help scientists launch a new business successfully.

The goal of this project group is to design and implement a web-based tool that helps the advisors and coaches of a university identify research groups or researchers working on innovative topics. As its primary input, the tool shall process the web pages of a university and automatically extract relevant information. Examples include:

  • The organization and structure of the university (names of the faculties, research groups and researchers, etc.)
  • The research projects of the different research groups (project name and topic, project duration, funding scheme and budget, etc.)
  • The publications of the different researchers (authors, title, type of publication, etc.)
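To make the three kinds of extracted information more concrete, the following is a minimal sketch of how they could be modeled in Python. All class and field names (Faculty, ResearchGroup, budget_eur, and so on) are assumptions made for illustration; the actual data model is up to the project group.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Publication:
    title: str
    authors: List[str]
    publication_type: str  # hypothetical values, e.g. "journal article"


@dataclass
class Project:
    name: str
    topic: str
    # Optional because not every web page will state these details.
    duration_months: Optional[int] = None
    funding_scheme: Optional[str] = None
    budget_eur: Optional[float] = None


@dataclass
class Researcher:
    name: str
    publications: List[Publication] = field(default_factory=list)


@dataclass
class ResearchGroup:
    name: str
    researchers: List[Researcher] = field(default_factory=list)
    projects: List[Project] = field(default_factory=list)


@dataclass
class Faculty:
    name: str
    groups: List[ResearchGroup] = field(default_factory=list)


def researchers_in(faculty: Faculty) -> List[str]:
    """Return the names of all researchers in all groups of a faculty."""
    return [r.name for g in faculty.groups for r in g.researchers]
```

Keeping most project fields optional reflects the expectation that university web pages are incomplete and inconsistent, so the extraction step can fill in only what it actually finds.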

The algorithms implemented as part of the tool shall be generic so that they support data extraction from the websites of different universities. To visualize the data, the project group shall develop a web-based application that enables business advisors and innovation coaches to browse and search the extracted information.

From a technical perspective, the project encompasses the development and integration of a web crawler, a search index, a data mining framework with the associated templates to extract the desired information, and a web application to access the data. For the web crawler and the search index, we currently plan to use Scrapy and Elasticsearch. The students are free to choose the technologies used for the actual data mining.
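As a rough illustration of how a crawler and a search index fit together, the sketch below uses only the Python standard library. It is deliberately not Scrapy or Elasticsearch code: the in-memory page dictionary, the PageParser and InvertedIndex classes, and the crawl function are all hypothetical stand-ins that show the data flow under simplified assumptions (no network access, no ranking, exact token matching).

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the visible text and the link targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each token to the set of URLs whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, text):
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query):
        # A page matches only if it contains every query token.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


def crawl(pages, start_url, index):
    """Breadth-first crawl over an in-memory dict mapping URL to HTML."""
    queue, seen = deque([start_url]), {start_url}
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # link target not available, skip it
            continue
        text, links = parse_page(html)
        index.add(url, text)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

In the planned setup, Scrapy would take over fetching and link following, and Elasticsearch would replace the toy index with tokenization, ranking, and persistent storage.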

From a theoretical perspective, the project group covers fundamental concepts of web search and data mining, both in theory and in practice. This includes web crawling and search as well as data extraction and information integration. In addition, the participants will prepare individual seminar talks and papers on selected research topics related to web search and data mining.

The kickoff meeting for this course will take place on Tuesday, April 4th, 2023, from 10:00 to 14:00 in SA-126.

Admission is managed centrally. If you have any questions, please contact marcus.handte@uni-due.de.