Title: Human-enhanced, machine-driven categorization (tentative)
Mentor: Francesco Lelli
Co-Supervisor: Emiel Caron
Machines are better than humans at executing repetitive and computation-oriented tasks. However, humans are more flexible and are better at conceptualizing and categorizing information. The candidate will investigate how best to combine the two in the domain of categorization algorithms.
You will focus this study on the domain of scientometrics, in particular on the identification of duplicate scientific references in patent databases (see this article for more information).
Strings representing references will be adjusted according to a similarity algorithm that you will help develop. This algorithm will be partially “human driven”.
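To make the idea concrete, here is a minimal sketch of what a partially human-driven similarity check could look like. It is not the project's algorithm: the normalization step, the use of Python's standard `difflib.SequenceMatcher`, the function names, and the `accept` and `reject` thresholds are all assumptions chosen for illustration. Pairs of reference strings that the machine scores with high confidence are decided automatically, while ambiguous pairs are routed to a human (for example through a microwork platform).

```python
import re
from difflib import SequenceMatcher


def normalize(ref: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    ref = re.sub(r"[^\w\s]", " ", ref.lower())
    return " ".join(ref.split())


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1] for two normalized reference strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(a: str, b: str, accept: float = 0.9, reject: float = 0.5) -> str:
    """Decide a pair automatically or hand it to a human.

    The thresholds are hypothetical and would need tuning on real data.
    """
    score = similarity(a, b)
    if score >= accept:
        return "duplicate"
    if score < reject:
        return "distinct"
    return "ask_human"
```

With this design, human effort is spent only on the pairs in the uncertain band between the two thresholds, and the human answers could later be used to tune those thresholds.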
This project does not involve an internship. Instead, it aims for high academic relevance and a theoretical contribution and, depending on the quality of your work, you may be able to publish the results in conference proceedings and scientific journals.
If you are curious and want to know more about the topic, I recommend the following:
- A general Wikipedia article about microwork: https://en.wikipedia.org/wiki/Microwork
- A few keywords that you may want to use in Google Scholar: ESP Game, string similarity, microwork, gamification.
- A few resources for a practical understanding of the tools that may be used: https://www.microworkers.com https://www.mturk.com/
- Zhao, Kangran, Caron, Emiel, & Guner, Stanislaw (2016). Large scale disambiguation of scientific references in patent databases. In Ismael Rafols, Jordi Molas-Gallart, Elena Castro-Martinez, & Richard Woolley (Eds.), Proceedings of 21st International Conference on Science and Technology Indicators (STI 2016): Peripheries, frontiers and beyond (pp. 1404-1410). València (Spain): Editorial Universitat Politècnica de València.
Note: Basic proficiency in a programming language such as Java and/or in SQL, plus the ability to consume Web APIs, may be beneficial.