This repository contains the implementation and results from the paper:
RBMD: RoBERTa-Based Module Detection in Multi-Programming Language Software Systems
Cite this work: https://doi.org/10.1109/ICWR65219.2025.11006198
The dynamic nature of web and software systems requires modularization methods that can adapt to frequent updates and diverse structures while maintaining scalability and efficiency. In this research, we propose a RoBERTa-based Module Detection (RBMD) framework that leverages transformer models to classify and manage software modules using the semantic content of source code, comments, and related textual data. Unlike traditional dependency graph-based methods, our content-driven approach offers a streamlined, scalable solution for software systems. This repository contains code, datasets, and evaluation results for the RBMD framework.
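As a rough illustration of the content-driven idea, the sketch below turns a source tree into labelled text samples, one per file, keeping code and comments verbatim. It assumes (hypothetically) that each top-level folder is one module and that a file's label is that folder's name. The extension set, the `max_chars` cutoff and the function names `build_samples` and `encode_labels` are illustrative and are not taken from `roberta.py`.

```python
from pathlib import Path

# Hypothetical set of extensions treated as source files in a multi-language system.
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".h", ".java", ".js", ".py", ".rs"}


def build_samples(root, extensions=SOURCE_EXTENSIONS, max_chars=4000):
    """Return (text, label) pairs: each file's raw content (code and comments),
    labelled with the name of the top-level folder it sits under.
    Assumption: top-level folder == module."""
    root = Path(root)
    samples = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # files directly under root have no module folder
        module = rel.parts[0]
        # Keep the text as-is (comments included) and truncate very long files.
        text = path.read_text(encoding="utf-8", errors="ignore")[:max_chars]
        samples.append((text, module))
    return samples


def encode_labels(samples):
    """Map module names to integer class ids, in sorted order."""
    labels = sorted({label for _, label in samples})
    label_to_id = {label: i for i, label in enumerate(labels)}
    return [(text, label_to_id[label]) for text, label in samples], label_to_id
```

The resulting `(text, label_id)` pairs could then be tokenized and passed to a sequence-classification model such as Hugging Face's `RobertaForSequenceClassification`; how `roberta.py` actually builds its inputs may differ.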
This repository provides module detection results for three open-source software systems:
- Chromium
- Mozilla 3.7
- Mozilla 134
Each folder contains the following:
- A `target` folder
- Two Python scripts: `Copy.py` and `roberta.py`
Chromium:
- Source: https://github.com/chromium/chromium
- Results: Achieved 99.70% accuracy and 99.70% F1-score after training.
- Details: The `Copy.py` script filters and organizes data from 10 folders, keeping only specific files.
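The filtering step described above might look roughly like the following sketch. The function name `copy_selected`, its parameters, and the choice to filter by file extension are assumptions for illustration only; the actual folder list and file criteria are defined in `Copy.py`.

```python
import shutil
from pathlib import Path


def copy_selected(source_root, target_root, folders, extensions):
    """Copy files whose extension is in `extensions` from each folder in
    `folders` into target_root/<folder>/, preserving relative paths.
    Returns the number of files copied. (Hypothetical helper.)"""
    source_root, target_root = Path(source_root), Path(target_root)
    copied = 0
    for folder in folders:
        src_dir = source_root / folder
        if not src_dir.is_dir():
            continue  # silently skip folders that do not exist
        for path in src_dir.rglob("*"):
            if path.is_file() and path.suffix.lower() in extensions:
                dest = target_root / folder / path.relative_to(src_dir)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, dest)  # copy contents and metadata
                copied += 1
    return copied
```

For example, `copy_selected("chromium/src", "target", ["base", "net"], {".cc", ".h"})` would copy only C++ sources and headers from the two listed folders; the paths and folder names here are hypothetical picks, not the 10 folders used in the experiments.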
Mozilla 3.7:
- Source: https://ftp.mozilla.org/pub/firefox/releases/devpreview/1.9.3a4/source/
- Source Paper: https://doi.org/10.1016/j.compeleceng.2019.106500
- Results: Achieved 92.55% accuracy and 92.47% F1-score after four epochs.
Mozilla 134:
- Source: https://github.com/mozilla/gecko-dev/tree/master
- Results: Achieved 98.13% accuracy and 98.02% F1-score with rapid convergence and minimal training loss.
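The accuracy and F1 figures above can be recomputed from model predictions with a few lines of Python. The README does not say which F1 averaging was used, so this sketch reports both macro F1 (unweighted mean of per-class F1) and weighted F1 (mean weighted by each class's support), where per-class F1 is 2PR / (P + R). The function names are illustrative; scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` computes the same quantities.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    per_class = {}
    for c in labels:
        prec_den, rec_den = tp[c] + fp[c], tp[c] + fn[c]
        precision = tp[c] / prec_den if prec_den else 0.0
        recall = tp[c] / rec_den if rec_den else 0.0
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted
```

For `y_true = [0, 0, 1, 1, 2]` and `y_pred = [0, 1, 1, 1, 2]`, accuracy is 0.8, macro F1 is 37/45 (about 0.822) and weighted F1 is 59/75 (about 0.787).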
Kargar, Masoud, Shahin Sharbaf Movassaghpour, and Ali Bayani. "RBMD: RoBERTa-Based Module Detection in Multi-Programming Language Software Systems." In 2025 11th International Conference on Web Research (ICWR), pp. 66-73. IEEE, 2025.
© 2025 Masoud Kargar