
Image Captioning Web App

🌟 A full-stack web application for generating image captions, combining the power of modern web technologies with state-of-the-art AI models. Built using React.js (frontend) and FastAPI (backend), this project integrates a custom-trained model leveraging CLIP, a Mapping Network, and GPT-2 for caption generation.
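The caption generator broadly follows a ClipCap-style prefix recipe: a CLIP image embedding is projected by the mapping network into a sequence of "prefix" embeddings that condition GPT-2's decoding. The sketch below is illustrative only, assuming PyTorch and Hugging Face transformers; the class names, dimensions, and prefix length are assumptions, not this repository's actual code.

# Illustrative ClipCap-style wiring (assumed, not this repo's exact code):
# CLIP embedding -> mapping network -> GPT-2 prefix embeddings -> logits.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class MappingNetwork(nn.Module):
    """Maps one CLIP image embedding to prefix_len GPT-2 input embeddings."""
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.gpt_dim, self.prefix_len = gpt_dim, prefix_len
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len, gpt_dim * prefix_len),
        )

    def forward(self, clip_embed):                  # (B, clip_dim)
        prefix = self.mlp(clip_embed)               # (B, gpt_dim * prefix_len)
        return prefix.view(-1, self.prefix_len, self.gpt_dim)

class CaptionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
        self.mapper = MappingNetwork()

    def forward(self, clip_embed, token_ids):
        prefix = self.mapper(clip_embed)                  # (B, L, 768)
        tokens = self.gpt2.transformer.wte(token_ids)     # (B, T, 768)
        embeds = torch.cat([prefix, tokens], dim=1)       # prefix conditions GPT-2
        return self.gpt2(inputs_embeds=embeds).logits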


Features

  • 🖼️ Image Captioning: Generate captions for images using either greedy search or beam search.
  • ⚙️ Interactive Interface: A responsive and intuitive UI powered by React.js.
  • 🚀 AI-Powered Backend: FastAPI serves as the backend for efficient model inference.
  • 🔧 Custom Training: Train the model on your dataset by making minimal configuration changes.

Getting Started

1. Clone the Repository

git clone https://github.com/SurAyush/Image-Captioning.git
cd Image-Captioning

2. Download the Model

Download the pre-trained model from Hugging Face (the Trained-Model link) and save it in the model/trained_model directory.

3. Install Dependencies

Backend

cd server/
pip install -r requirements.txt
fastapi run
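Once the server is running (FastAPI defaults to http://127.0.0.1:8000), you can smoke-test it from Python. The route, upload field, and strategy parameter below are assumptions for illustration; check the FastAPI app in server/ for the actual names.

# Hypothetical client call; "/caption", the "file" field, and the
# "strategy" parameter are assumptions -- see server/ for the real API.
import requests

with open("example.jpg", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/caption",
        files={"file": ("example.jpg", f, "image/jpeg")},
        params={"strategy": "beam"},
    )
print(resp.status_code, resp.json())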

Frontend

cd client/
npm install
npm run dev

4. Try Image Captioning

Once everything is set up, you can use the web app to:

  • Upload an image.
  • Generate captions using greedy search or beam search.
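Greedy search commits to the single most probable token at each step, while beam search keeps the k best partial captions and usually produces more fluent output at extra compute cost. The app ships its own decoders; the snippet below only illustrates the difference, using Hugging Face's generate on a plain GPT-2.

# Illustrative only: greedy vs. beam decoding on vanilla GPT-2 via
# Hugging Face generate; the app implements its own decoding loops.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
ids = tok("A photo of", return_tensors="pt").input_ids

greedy = model.generate(ids, max_new_tokens=20, do_sample=False,
                        pad_token_id=tok.eos_token_id)
beam = model.generate(ids, max_new_tokens=20, num_beams=5,
                      early_stopping=True, pad_token_id=tok.eos_token_id)

print("greedy:", tok.decode(greedy[0], skip_special_tokens=True))
print("beam:  ", tok.decode(beam[0], skip_special_tokens=True))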

Training the Model

  1. Prepare the Dataset:

    • Update the dataset paths in parse_coco.py and train.py so they point to your local copy of the MS COCO 2017 dataset.
  2. Generate Intermediate Dataset (a sketch of this step follows the list):

    python parse_coco.py

  3. Train the Model:

    python train.py

  4. Adjust Hyperparameters:

    • Modify Config.py to fine-tune the training hyperparameters.
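For context, the intermediate-dataset step typically means precomputing one CLIP embedding per image/caption pair so that training never has to run CLIP again. Below is a minimal sketch of that idea, assuming OpenAI's clip package and the standard COCO 2017 layout; parse_coco.py's actual paths, model variant, and output format may differ.

# Sketch of the idea behind parse_coco.py (assumed, not the repo's code):
# precompute a CLIP embedding per COCO caption/image pair and pickle it.
import json
import pickle
import clip      # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

annotations = json.load(open("annotations/captions_train2017.json"))["annotations"]
records = []
for ann in annotations[:1000]:      # small subset, just for illustration
    img_path = f"train2017/{ann['image_id']:012d}.jpg"
    image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        embed = model.encode_image(image).squeeze(0).cpu()   # (512,) for ViT-B/32
    records.append({"clip_embedding": embed, "caption": ann["caption"]})

with open("coco_clip_embeddings.pkl", "wb") as f:
    pickle.dump(records, f)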

Notes

💡 The current trained model shows promising results but was limited by resource constraints during training. Even so, it generates captions relevant to the input image and has significant room for improvement with further training. BLEU has not been evaluated yet: the model is not fully trained, so evaluation would not be very meaningful at this stage.

Inconveniences you might face (whole-hearted apologies for these):

Please install any additional Python libraries that are required but not listed in requirements.txt.

You may use a Python venv if you like (one was not used here because heavy packages such as PyTorch were already installed on my local machine).

For more details, check out my blog: My Blog Post.

Contributions

Contributions are welcome! Feel free to fork the repository, submit issues, or create pull requests.

License

This project is licensed under the MIT License.
