🌟 A full-stack web application for generating image captions, combining the power of modern web technologies with state-of-the-art AI models. Built using React.js (frontend) and FastAPI (backend), this project integrates a custom-trained model leveraging CLIP, a Mapping Network, and GPT-2 for caption generation.
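The description names three parts. CLIP encodes the image, a mapping network turns that embedding into something GPT-2 can consume, and GPT-2 writes the caption. A common way to connect them, used in ClipCap-style prefix captioning, is to project the CLIP embedding into a fixed number of "prefix" vectors in GPT-2's embedding space, which are then placed in front of the caption tokens. The sketch below is written in plain Python with toy sizes and assumes a single linear layer. The function name is hypothetical, and the project's actual mapping network (layer type, prefix length, dimensions) may differ.

```python
import random
from typing import List


def linear_mapping_network(clip_embedding: List[float],
                           weights: List[List[float]],
                           bias: List[float],
                           prefix_len: int,
                           gpt_dim: int) -> List[List[float]]:
    """Map one CLIP embedding to `prefix_len` GPT-2-sized prefix vectors.

    Hypothetical helper: `weights` has shape (prefix_len * gpt_dim, clip_dim)
    and `bias` has length prefix_len * gpt_dim. A single linear layer is an
    assumption made for illustration.
    """
    out_dim = prefix_len * gpt_dim
    if len(weights) != out_dim or len(bias) != out_dim:
        raise ValueError("weights and bias must have prefix_len * gpt_dim rows")
    # y = W x + b, computed one output row at a time.
    flat = [sum(w * x for w, x in zip(row, clip_embedding)) + b
            for row, b in zip(weights, bias)]
    # Reshape the flat output into prefix_len pseudo-token embeddings.
    return [flat[i * gpt_dim:(i + 1) * gpt_dim] for i in range(prefix_len)]


# Toy sizes; real sizes would be e.g. 512 (CLIP ViT-B/32) and 768 (GPT-2 small).
clip_dim, prefix_len, gpt_dim = 4, 3, 5
rng = random.Random(0)
W = [[rng.gauss(0, 0.1) for _ in range(clip_dim)] for _ in range(prefix_len * gpt_dim)]
b = [0.0] * (prefix_len * gpt_dim)
prefix = linear_mapping_network([rng.random() for _ in range(clip_dim)],
                                W, b, prefix_len, gpt_dim)
```

In a trained model, `W` and `b` are learned so that GPT-2, conditioned on the prefix, produces a caption for the image.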
- 🖼️ Image Captioning: Generate captions for images using either greedy search or beam search.
- ⚙️ Interactive Interface: A responsive and intuitive UI powered by React.js.
- 🚀 AI-Powered Backend: FastAPI serves as the backend for efficient model inference.
- 🔧 Custom Training: Train the model on your dataset by making minimal configuration changes.
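Greedy search and beam search differ in how many partial captions they keep while decoding. The sketch below compares them on a hand-written toy distribution that stands in for GPT-2. The names `greedy_decode`, `beam_search`, and `toy_model` are hypothetical, and scores are plain summed log-probabilities without the length normalization that many real implementations add, so the project's decoder may behave differently.

```python
import math
from typing import Callable, Dict, List, Tuple

# Stand-in for GPT-2: maps the tokens generated so far to
# log-probabilities over the next token.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def greedy_decode(next_logprobs: NextTokenFn, eos: str, max_len: int) -> List[str]:
    """Always append the single most likely next token."""
    tokens: List[str] = []
    for _ in range(max_len):
        dist = next_logprobs(tokens)
        best = max(dist, key=dist.get)
        tokens.append(best)
        if best == eos:
            break
    return tokens


def beam_search(next_logprobs: NextTokenFn, eos: str, max_len: int,
                beam_width: int) -> List[str]:
    """Keep the `beam_width` highest-scoring partial captions at each step."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates = [(score + lp, tokens + [tok])
                      for score, tokens in beams
                      for tok, lp in next_logprobs(tokens).items()]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((score, tokens))  # caption is complete
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished beams if max_len was reached
    return max(finished, key=lambda c: c[0])[1]


# Toy next-token probabilities where greedy and beam search disagree.
TOY = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"<eos>": 0.4, "c": 0.3, "d": 0.3},
    ("b",): {"<eos>": 0.9, "c": 0.1},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    probs = TOY.get(tuple(tokens), {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


print(greedy_decode(toy_model, "<eos>", max_len=5))              # ['a', '<eos>']
print(beam_search(toy_model, "<eos>", max_len=5, beam_width=2))  # ['b', '<eos>']
```

Greedy commits to "a" (0.6) and ends with total probability 0.6 × 0.4 = 0.24, while beam search keeps "b" alive and finds "b <eos>" with 0.4 × 0.9 = 0.36. With `beam_width=1`, beam search reduces to greedy search.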
git clone https://github.com/SurAyush/Image-Captioning.git
cd Image-Captioning
Download the pre-trained model from Hugging Face (Trained-Model) and save it in the model/trained_model directory.
Then install the backend dependencies and start the server:
cd server/
pip install -r requirements.txt
fastapi run
In a second terminal, start the frontend from the repository root:
cd client/
npm install
npm run dev
Once everything is set up, you can use the web app to:
- Upload an image.
- Generate captions using greedy search or beam search.
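The frontend talks to the FastAPI backend over HTTP, so the backend can also be called directly. The sketch below builds a multipart/form-data upload with only the standard library. The route `/caption` and the form fields `file` and `decoding` are hypothetical placeholders, so check the server code for the real names. Port 8000 is the default used by `fastapi run`.

```python
import urllib.request
import uuid


def build_caption_request(image_bytes: bytes, filename: str,
                          url: str = "http://127.0.0.1:8000/caption",  # hypothetical route
                          decoding: str = "beam") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying an image and a decoding choice.

    The form field names "file" and "decoding" are hypothetical placeholders.
    """
    boundary = uuid.uuid4().hex
    decoding_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="decoding"\r\n\r\n'
        f"{decoding}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = decoding_part + file_header + image_bytes + closing
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage (requires the backend to be running):
# with open("cat.jpg", "rb") as f:
#     req = build_caption_request(f.read(), "cat.jpg", decoding="greedy")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```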
To train the model yourself:
- Prepare the dataset: make the necessary changes in parse_coco.py and train.py so that they locate your copy of the MS COCO 2017 dataset.
- Generate the intermediate dataset:
python parse_coco.py
- Train the model:
python train.py
Modify Config.py to fine-tune the training hyperparameters.
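MS COCO 2017 stores captions in JSON annotation files such as `annotations/captions_train2017.json`, where `images` lists each image's `id` and `file_name`, and `annotations` lists captions keyed by `image_id`. The sketch below shows only the caption-to-image pairing step with hypothetical helper names; the project's parse_coco.py may do more (for example, pre-computing CLIP image embeddings), so treat this as an illustration of the file format rather than the script itself.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_coco_captions(annotation_json: dict) -> Dict[str, List[str]]:
    """Map each image file name to its list of reference captions."""
    id_to_file = {img["id"]: img["file_name"] for img in annotation_json["images"]}
    captions: Dict[str, List[str]] = defaultdict(list)
    for ann in annotation_json["annotations"]:
        # Strip stray whitespace that appears in some COCO captions.
        captions[id_to_file[ann["image_id"]]].append(ann["caption"].strip())
    return dict(captions)


def load_coco_captions(path: str) -> Dict[str, List[str]]:
    """Read a COCO captions file, e.g. annotations/captions_train2017.json."""
    with open(path, encoding="utf-8") as f:
        return group_coco_captions(json.load(f))
```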
💡 The current trained model shows promising results but is limited by the compute resources available during training. Even so, it generates captions related to the input image and should improve substantially with further training. BLEU has not been evaluated yet: the model is not fully trained, so a score at this stage would not be very meaningful.
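Once training is further along, BLEU could be computed against the reference captions that COCO provides for each image. The sketch below is a minimal sentence-level BLEU-4 (clipped n-gram precisions, uniform weights, brevity penalty, no smoothing) with hypothetical function names. For reported numbers, a standard implementation such as NLTK's `corpus_bleu` or the COCO caption evaluation toolkit would be the safer choice, since tokenization and smoothing choices change the score.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]],
                  max_n: int = 4) -> float:
    if not candidate or not references:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Effective reference length: closest to c, ties broken toward the shorter one.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```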
Inconveniences you might face (wholehearted apologies for these):
- You may need to install additional Python libraries that are required but not listed in requirements.txt.
- You may use a Python virtual environment (venv) if you like. I did not use one because heavy packages such as PyTorch were already installed on my local machine.
For more details, check out my blog: My Blog Post.
Contributions are welcome! Feel free to fork the repository, submit issues, or create pull requests.
This project is licensed under the MIT License.