This project loads a large machine learning model in a Jupyter notebook (Submission.ipynb) and uses it to run inference on PDF documents. The notebook can process multiple PDFs at once and handles text in multiple languages.
- Multiple PDF Handling: The model can process and analyze several PDF documents together, extracting relevant information from each.
- Multilingual Support: The model is designed to process text in multiple languages, which makes it usable in diverse linguistic environments.
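The two features above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes text is extracted with PyMuPDF (`fitz`) and languages are identified with `langdetect`, both of which appear in the dependency list. The names `extract_text`, `detect_language`, and `process_pdfs` are hypothetical, and the batch is processed one file after another rather than in parallel.

```python
from typing import Callable, Dict, Iterable


def extract_text(path: str) -> str:
    """Return the plain text of every page in one PDF (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the module loads without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def detect_language(text: str) -> str:
    """Return a language code such as 'en', or 'unknown' if detection fails."""
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    try:
        return detect(text)
    except LangDetectException:
        return "unknown"


def process_pdfs(
    paths: Iterable[str],
    extract: Callable[[str], str] = extract_text,
    detect: Callable[[str], str] = detect_language,
) -> Dict[str, Dict[str, str]]:
    """Extract text from each PDF and tag it with a detected language.

    The extractor and detector are parameters (an assumption of this sketch)
    so they can be swapped out, for example in tests.
    """
    results: Dict[str, Dict[str, str]] = {}
    for path in paths:
        text = extract(path)
        # Skip detection for empty or whitespace-only documents.
        language = detect(text) if text.strip() else "unknown"
        results[path] = {"language": language, "text": text}
    return results
```

A call such as `process_pdfs(["report_en.pdf", "rapport_fr.pdf"])` (file names are placeholders) would return one entry per file, keyed by path.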
To ensure smooth execution, install the following Python dependencies. It is recommended to use a virtual environment:
- Create and Activate Virtual Environment
python3 -m venv env
source env/bin/activate  # On Windows use: .\env\Scripts\activate
- Install Dependencies
Run the following commands to install the necessary libraries:
pip install torch
pip install transformers
pip install langchain
pip install faiss-cpu
pip install streamlit
pip install PyMuPDF # for fitz
pip install langdetect
pip install sentence-transformers
Alternatively, install all at once using:
pip install torch transformers langchain faiss-cpu streamlit PyMuPDF langdetect sentence-transformers
To execute the notebook:
- Activate your environment:
source env/bin/activate
- Launch Jupyter Notebook:
jupyter notebook Submission.ipynb
- Model Loading Time: The model used in this notebook is substantial and may take a few minutes to load. This delay is normal and expected.
- If Loading Fails: If model loading fails or times out, restart the notebook kernel and re-run the cells. Loading typically succeeds on retry.
- System Resources: Ensure sufficient system resources (RAM and CPU/GPU) are available to manage the model's size and computational demands effectively.
- Efficiently manages and interprets data from complex and large PDF documents.
- Provides reliable multilingual processing, facilitating broader usability.
- Improves accuracy and robustness in text extraction and analysis tasks.
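The retry advice in the loading notes above can also be automated inside a cell. The helper below is a hedged sketch, not part of the notebook: `load_with_retry` is a hypothetical name, and `loader` stands for whatever zero-argument function the notebook uses to load its model.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_with_retry(
    loader: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 5.0,
) -> T:
    """Call `loader` until it succeeds or `attempts` tries are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except Exception as error:  # broad on purpose: any load failure is retried
            last_error = error
            print(f"Load attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)  # short pause before the next try

    raise RuntimeError(f"Model failed to load after {attempts} attempts") from last_error
```

This only helps with transient failures such as timeouts. If loading fails because the machine has run out of memory, restarting the kernel is still the more reliable fix.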
For further issues or troubleshooting, ensure your environment and dependencies are correctly configured as described above.
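One quick way to confirm the environment is to check that each dependency can be imported. The snippet below is a small sketch using only the standard library. The mapping from import name to pip package name is taken from the install commands in this README, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec
from typing import Dict, List

# Import name -> pip package name. Note that PyMuPDF is imported as "fitz",
# faiss-cpu as "faiss", and sentence-transformers as "sentence_transformers".
REQUIRED_MODULES: Dict[str, str] = {
    "torch": "torch",
    "transformers": "transformers",
    "langchain": "langchain",
    "faiss": "faiss-cpu",
    "streamlit": "streamlit",
    "fitz": "PyMuPDF",
    "langdetect": "langdetect",
    "sentence_transformers": "sentence-transformers",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if find_spec(module_name) is None  # None means the module is not installed
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Run it from the activated virtual environment so that it inspects the same interpreter the notebook uses.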