Background Removal and Replacement Using Vision AI and Vision Engineering
100% Increase in Services Offered
50% Increase in Processing Capacity
90% Daily Processing Cost Savings
Customer Overview
Our client, a US-based provider of niche product photography services for the automobile industry, wanted to grow their business through digitization. They provided an application that let auto dealers take professional-grade photos of car interiors and exteriors using 37 compositions (camera angles). Our client then polished these photos through a mix of automated and manual processing to deliver 4K high-definition images to their customers.
Project Overview
The client relied on a third-party background removal and replacement tool and paid high per-image fees to process more than 70,000 images daily. They did not use this tool for videos because every frame would have to be processed and billed, which would raise costs further and make it hard to stay competitive. They wanted to reduce this dependency and build Intellectual Property (IP) by developing a better-performing background removal and replacement tool that could scale photo and video processing.
Challenges
Building a superior background removal solution that delivers 4K results and outperforms a leading tool, while meeting tight deadlines and budget constraints.
- The solution must produce higher-quality results than the third-party tool our client used and deliver 4K high-definition output for all 37 compositions.
- It should perform fine-grained background removal around the narrow edges of car components and the imagery visible through the car's windows and windshield.
- It must generate the car’s drop shadow and manage the visibility (transparency level) of the new background as seen through the car’s windows.
- Building a custom model required labeling and training on a massive dataset of about 20,000 4K images across the 37 car compositions, a significant challenge made harder by the tight deadline.
- We must research and evaluate several foundation models to identify the most suitable one for this solution.
- We must choose between two approaches: fully training a base AI model or fine-tuning a pre-trained model.
- The solution must integrate with our client’s existing infrastructure without major code changes.
Solution
Using a state-of-the-art vision model, vision engineering, large-scale annotation, and high-capacity GPUs, we built an AI-powered tool that removes and replaces image backgrounds accurately and efficiently.
- Using a sample of 4K images, we trained and evaluated two open-source foundation models: the proven U2Net and the newer OSS Segmentation Model. Based on the results, we chose the OSS Segmentation Model.
- We chose to fine-tune a general-purpose pre-trained model instead of fully training a base model because fine-tuning needs a smaller dataset and less compute, and it promised better results sooner.
- We trained the AI model on 20,000 4K images covering all 37 compositions. To annotate every image in under two weeks, we used the CVAT.ai annotation tool and a team of 10 human annotators.
- We rented five A100 GPUs (80 GB VRAM each, CUDA 12.6) from Vast.ai and ran them simultaneously for distributed training of our AI model.
- We used OpenCV-based vision engineering to refine background removal and replacement along the narrow edges of car interiors and to create drop shadows. We also adjusted opacity levels to render window transparency precisely.
- We built a scalable REST API modeled on the one offered by the client's previous third-party tool, so integration required minimal code changes.
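As a rough illustration of the drop-shadow step, the sketch below darkens the new background under a shifted, blurred copy of the car's mask. It is an assumption-laden stand-in rather than the project's code: it uses NumPy only (a plain box blur where OpenCV code might call `cv2.GaussianBlur`), and `shift_mask`, `box_blur`, `add_drop_shadow`, and their offset, radius, and strength defaults are all hypothetical.

```python
import numpy as np


def shift_mask(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2-D mask by (dy, dx) pixels, filling uncovered areas with 0 (no wrap-around)."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted completely out of frame
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out


def box_blur(mask: np.ndarray, radius: int) -> np.ndarray:
    """Separable box blur of a 2-D float array, padding edges by replication."""
    if radius <= 0:
        return mask.copy()
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(mask, radius, mode="edge")
    # Blur along rows, then columns; 'valid' mode trims the padding back off.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def add_drop_shadow(background, car_alpha, offset=(12, 6), radius=8, strength=0.45):
    """Darken `background` (H x W x 3, floats in [0, 1]) under a soft shadow of the car.

    car_alpha is an H x W matte in [0, 1]; offset is (dy, dx) in pixels, with
    positive dy moving the shadow down toward the ground. All defaults are
    hypothetical and would need tuning per camera composition.
    """
    shadow = box_blur(shift_mask(car_alpha.astype(float), *offset), radius)
    shadow = np.clip(shadow * strength, 0.0, 1.0)
    return background * (1.0 - shadow[..., None])
```

The car itself would then be alpha-composited on top (`alpha * car + (1 - alpha) * shadowed_bg`), so the shadow stays visible only where the car does not cover it.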
Benefits
The AI-powered background removal tool we developed delivers tangible benefits, including:
- Developing an in-house solution allowed the client to create valuable Intellectual Property (IP), enhancing their competitive advantage.
- It lowered operational costs by eliminating per-image fees and let the client scale image processing without a matching rise in expenses.
- The solution’s versatility supported photo and video processing, expanding the client’s service offerings to auto dealers.
- It delivers superior image quality, significantly improving outputs and reducing how much manual editing human editors must do.
Technology
- OSS Segmentation Model
- OpenCV
- Python
- Hugging Face
- FastAPI
- Vast.ai
- CVAT.ai
Industry
- Automobile/Automotive

Conclusion
Developing an in-house AI-powered background removal tool enhanced the client’s competitive advantage and reduced operational costs by eliminating per-image fees. The solution supported photo and video processing, expanded service offerings, and delivered superior image quality while lowering the editing workload for human editors. Check out how our AI development services can help you build custom solutions to resolve your business problems.
Frequently asked questions
Why do companies build in-house background removal tools instead of using third-party APIs?
Companies build in-house background removal tools to eliminate per-image API costs, maintain full control over output quality, and own proprietary IP. At high processing volumes, this approach scales better and becomes more cost-effective than third-party APIs.
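A back-of-the-envelope break-even check makes "at high processing volumes" concrete: an in-house system pays off once the per-image fees it avoids exceed its own running cost. This is a simplified model, and every number below is a hypothetical placeholder rather than any real vendor's or this client's pricing; one-time development cost is ignored unless you amortize it into the daily figure.

```python
def breakeven_images_per_day(fixed_cost_per_day: float,
                             api_fee_per_image: float,
                             inhouse_cost_per_image: float) -> float:
    """Daily volume above which in-house processing is cheaper than a per-image API."""
    saving_per_image = api_fee_per_image - inhouse_cost_per_image
    if saving_per_image <= 0:
        return float("inf")  # in-house never breaks even under these inputs
    return fixed_cost_per_day / saving_per_image


# Hypothetical inputs: $600/day for GPUs and upkeep, a $0.10 API fee,
# and a $0.01 marginal in-house cost per image.
print(round(breakeven_images_per_day(600.0, 0.10, 0.01)))  # 6667
```

The same arithmetic shows why per-frame billing makes video expensive: a hypothetical 60-second clip at 30 fps contains 1,800 frames, each billed like a still image.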
How accurate is Vision AI for fine-grained background removal in automotive images?
Vision AI can deliver highly accurate background removal for automotive images when trained on high-resolution, domain-specific datasets. It reliably handles fine details like mirrors, window transparency, narrow edges, and drop shadows that generic tools often miss.
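One common way to quantify accuracy for segmentation is intersection-over-union (IoU) between a predicted mask and an annotated ground-truth mask. Because fine details live near the edges, a second score restricted to a thin boundary band is often more telling. The sketch below is a simplified boundary-band IoU, assuming binary NumPy masks; the function names and the band width `r` are illustrative choices, not metrics reported for this project.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks (1.0 if both are empty)."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def erode(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary erosion with a (2r+1) x (2r+1) square; pixels outside the image count as background."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    # A pixel survives only if every neighbour in the square window is foreground.
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out &= padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out


def boundary_iou(pred, gt, r: int = 2) -> float:
    """IoU restricted to each mask's inner boundary band of width r."""
    def band(m):
        m = np.asarray(m, dtype=bool)
        return m & ~erode(m, r)

    return iou(band(pred), band(gt))
```

A prediction that is slightly off along one edge scores noticeably lower on the boundary metric than on plain IoU, which is why edge-sensitive scores suit thin trim, mirrors, and window frames.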
What makes background removal for cars more complex than for people or objects?
Background removal for cars is more complex because vehicles have reflective paint, transparent glass, thin edges, and natural shadows. Accurately separating these elements requires fine-grained segmentation and vision engineering beyond standard object-removal models.
How does AI handle background replacement while preserving window transparency?
AI preserves window transparency by identifying glass regions separately and applying pixel-level opacity during background replacement. This allows the new background to appear naturally through windows while maintaining realistic reflections and edges.
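Pixel-level opacity for glass can be expressed as standard alpha compositing in which glass pixels receive a reduced alpha. The sketch below assumes the segmentation step yields two inputs, a car alpha matte and a separate glass mask, and applies one uniform glass opacity; `composite_with_glass` and the 0.6 default are hypothetical, and a production pipeline might vary opacity per window or per pixel.

```python
import numpy as np


def composite_with_glass(car_rgb, car_alpha, glass_mask, new_bg, glass_opacity=0.6):
    """Alpha-composite a car over a new background, letting it show through glass.

    car_rgb, new_bg: H x W x 3 float arrays in [0, 1]
    car_alpha:       H x W matte in [0, 1], 1 where the car (including glass) is
    glass_mask:      H x W values in [0, 1], 1 where the pixel is window/windshield glass
    glass_opacity:   share of the original glass pixel kept (0 = fully clear glass)
    """
    car_alpha = np.clip(np.asarray(car_alpha, dtype=float), 0.0, 1.0)
    glass = np.clip(np.asarray(glass_mask, dtype=float), 0.0, 1.0)
    # On glass, lower the car's opacity so the new background bleeds through;
    # keeping some of the original pixel preserves reflections on the glass.
    alpha = car_alpha * (1.0 - glass * (1.0 - glass_opacity))
    a = alpha[..., None]
    return a * car_rgb + (1.0 - a) * new_bg
```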
What role does vision engineering play alongside deep learning models?
Vision engineering turns deep learning outputs into production-ready results by applying rule-based refinements like edge smoothing, shadow generation, and opacity control. This ensures consistent, high-quality visuals that AI models alone can’t reliably deliver.
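As one example of the rule-based refinement described above, a hard 0/1 mask from a segmentation model can be softened into a feathered alpha matte so edges do not look cut out. This minimal NumPy sketch uses a summed-area-table box blur; `box_blur`, `feather_mask`, and the radius are hypothetical, and OpenCV's `cv2.GaussianBlur` would be a typical alternative.

```python
import numpy as np


def box_blur(a: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window via a summed-area table, with edge padding."""
    k = 2 * r + 1
    h, w = a.shape
    p = np.pad(a, r, mode="edge")
    # s[i, j] holds the sum of p[:i, :j], so any window sum costs four lookups.
    s = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    total = s[k:k + h, k:k + w] - s[0:h, k:k + w] - s[k:k + h, 0:w] + s[0:h, 0:w]
    return total / (k * k)


def feather_mask(mask, radius=2):
    """Turn a hard mask into a soft alpha matte.

    Windows lying fully inside or outside the mask average to exactly 1 or 0,
    so only pixels within `radius` of the boundary get fractional alpha.
    """
    hard = (np.asarray(mask) > 0.5).astype(float)
    if radius <= 0:
        return hard
    return np.clip(box_blur(hard, radius), 0.0, 1.0)
```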
Is fine-tuning a pre-trained vision model better than training from scratch?
Yes. Fine-tuning a pre-trained vision model is usually better than training from scratch because it requires less data, trains faster, and achieves high accuracy by adapting proven visual features to domain-specific tasks.
How scalable are AI-based background removal systems for high-volume processing?
AI-based background removal systems are highly scalable because cloud-native pipelines can scale horizontally to process tens of thousands of images or video frames daily without increasing per-image costs.
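Horizontal scaling can be sized with simple throughput arithmetic: workers needed = daily volume × processing time per item ÷ compute seconds each worker provides per day. The helper below is a hypothetical planning sketch; the 70,000-image figure comes from this case study, while the 1.5-second per-image latency and 70% utilization are assumptions.

```python
import math


def workers_needed(items_per_day: int, seconds_per_item: float,
                   utilization: float = 0.7) -> int:
    """Minimum number of parallel workers to clear a daily workload.

    utilization discounts for idle time, retries, and uneven load (assumed value).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_seconds = items_per_day * seconds_per_item
    capacity_per_worker = 86_400 * utilization  # usable seconds per worker per day
    return math.ceil(busy_seconds / capacity_per_worker)


# 70,000 images/day at an assumed 1.5 s per 4K image on one GPU worker.
print(workers_needed(70_000, 1.5))   # 2
# Doubling the volume roughly doubles the worker count; per-image cost stays flat.
print(workers_needed(140_000, 1.5))  # 4
```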
Why is AI-based background removal becoming a competitive advantage in product photography?
AI-based background removal is a competitive advantage because it delivers faster turnaround, consistent visual quality, and lower production costs at scale, while enabling new offerings like high-volume image and video processing without additional manual effort.