---
title: README
emoji: 🚀
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
license: mit
---

# 🧠 Open Multi-Label ASJC Classification

We present the first **multi-label classification model** built on the ASJC taxonomy that reliably assigns subject categories to individual documents, including those published in general-science or interdisciplinary journals, using Title, Container Title, and Abstract metadata.

## 👥 Team

- **Michael Gusenbauer** – Johannes Kepler University Linz | ORCID: [https://orcid.org/0000-0001-7768-2351](https://orcid.org/0000-0001-7768-2351)
- **Jochen Endermann** – University of Applied Sciences Kufstein
- **Harald Huber** – University of Applied Sciences Kufstein
- **Simon Strasser** – University of Applied Sciences Kufstein
- **Andreas-Nizar Granitzer** – Norwegian Geotechnical Institute | ORCID: [https://orcid.org/0000-0002-5839-4300](https://orcid.org/0000-0002-5839-4300)
- **Thomas Ströhle** – Universität Innsbruck | ORCID: [https://orcid.org/0000-0002-1954-6412](https://orcid.org/0000-0002-1954-6412)

## 🎯 Purpose

Traditional ASJC classification approaches are limited by incomplete sources, journal-level labels, or single-label assignments.
This project provides:

- **Multi-label classification across 307 subjects** (see the [Google Sheet](https://docs.google.com/spreadsheets/d/1kqmGk2x0msodbaKDYt2RixyyB3MqOGrWS2azRGNsodw) for all labels)
- A fine-tuned **SciBERT model** trained on Crossref metadata
- Methods for **collection-level analysis** (researcher portfolios, institutions, datasets)

## ✨ Features

- High performance
- Works with or without source title metadata
- Open, reproducible, and ready for research use

## 🗂 Content

- Fine-tuned model
- Sample code for model inference

## 📖 Citation

If you use this work, please cite:

```bibtex
@article{Gusenbauer.2025,
  author   = {Gusenbauer, Michael and Endermann, Jochen and Huber, Harald and Strasser, Simon and Granitzer, Andreas-Nizar and Ströhle, Thomas},
  year     = {2025},
  title    = {Fine-tuning SciBERT to enable ASJC-based assessments of the disciplinary orientation of research collections},
  keywords = {All Science Journal Classification;Disciplinary coverage;Fine-tuning;multi-label classification;SciBERT;Transformer-based language models},
  issn     = {0138-9130},
  journal  = {Scientometrics},
  doi      = {10.1007/s11192-025-05490-0},
}
```