At Evolphin, we’re redefining how creative teams search, manage, and collaborate on content. While most companies apply AI to text, we’ve gone further: our platform embeds and indexes actual media files — video, image, design, and audio — enabling true semantic search across time-coded content and millions of creative assets.
Our flagship platform, Zoom MAM, is trusted by global broadcasters, agencies, and top brands including Inter Milan FC, Merck, and Mercedes-Benz to power their visual workflows. We’re now rebuilding our metadata and search architecture from the ground up — using Python-based LLM pipelines, vector embeddings, and a high-performance object store designed for AI-native media search.
This isn’t just “chat with your documents.” It’s AI that understands a scene, shot, logo, or layout, and finds the right clip, version, or asset at the speed of thought.
What You’ll Own
- Build and extend backend services that power AI-driven media search and metadata enrichment
- Develop, integrate, and deploy AI/ML inference pipelines (embeddings, vision/audio models, transcription, background removal, etc.)
- Fine-tune and optimize computer vision and generative models (e.g., U²Net, BiRefNet, CLIP, Whisper, YOLO, diffusion models)
- Work with large datasets (100k–5M images): preprocessing, augmenting, and structuring for training/inference
- Contribute to building pipelines for tasks like background removal, inpainting/outpainting, banner generation, logo/face detection, and multimodal embeddings
- Integrate with vector databases (e.g., FAISS, Pinecone, Weaviate, Qdrant) for similarity and semantic search
- Collaborate with the engineering team to deploy scalable AI inference endpoints (Docker + GPU/EC2/SageMaker)
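As a toy illustration of the embeddings-based similarity search described above, here is a minimal brute-force sketch in NumPy; in production a vector database such as FAISS, Pinecone, Weaviate, or Qdrant replaces this linear scan. All shapes and data here are illustrative stand-ins, not our actual pipeline.

```python
import numpy as np

# Toy brute-force semantic search over media-asset embeddings.
# The vectors are random stand-ins for real CLIP-style embeddings.
rng = np.random.default_rng(0)
dim = 512                                        # CLIP-like embedding size

# 10k indexed "asset" embeddings, L2-normalized so dot product = cosine
assets = rng.standard_normal((10_000, dim)).astype(np.float32)
assets /= np.linalg.norm(assets, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar assets by cosine similarity."""
    q = query / np.linalg.norm(query)
    scores = assets @ q                          # one dot product per asset
    return np.argsort(-scores)[:k]               # top-k, most similar first

hits = search(assets[42])
print(hits[0])  # the query vector itself ranks first -> 42
```

A dedicated vector index (e.g., FAISS `IndexFlatIP` or an HNSW index) computes the same top-k result without scanning every vector, which is what makes this viable across millions of assets.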
What You’ll Bring
- Core Python (Required) – solid programming and debugging skills in production systems
- AI/ML Libraries – hands-on experience with PyTorch and/or TensorFlow, NumPy, OpenCV, Hugging Face Transformers
- Model Training/Fine-Tuning – experience fine-tuning pre-trained models for vision, audio, or multimodal tasks
- Data Handling – preprocessing and augmenting image/video datasets for training and evaluation
- Vector Search – familiarity with FAISS, Pinecone, or similar for embeddings-based search
Nice to Have
- Comfortable with chaining or orchestrating multimodal inference workflows (e.g., image + audio + OCR → unified embedding)
- Have worked with generative models (diffusion, inpainting, or outpainting)
- Understand large-scale media workflows (video, design files, time-coded metadata)
- Enjoy experimenting with new models and pushing them into production
- Care about making AI useful in real-world creative pipelines
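The multimodal chaining mentioned above (image + audio + OCR → unified embedding) could be sketched roughly as follows; the embedding sizes and random projection heads are illustrative assumptions standing in for real modality-specific models and learned fusion layers.

```python
import numpy as np

# Hedged sketch of fusing per-modality embeddings into one unified vector:
# project each to a shared size, normalize, average, and re-normalize.
rng = np.random.default_rng(1)

def l2norm(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Pretend outputs from three modality-specific encoders (sizes assumed)
modalities = {
    "image": rng.standard_normal(512),   # e.g. a CLIP image tower
    "audio": rng.standard_normal(768),   # e.g. an audio encoder
    "text":  rng.standard_normal(384),   # e.g. OCR text -> sentence encoder
}

# Fixed random projections into a common 256-dim space
# (stand-ins for learned projection heads)
shared = 256
proj = {name: rng.standard_normal((vec.shape[0], shared)) / np.sqrt(vec.shape[0])
        for name, vec in modalities.items()}

# Unified embedding: mean of normalized per-modality projections
unified = l2norm(sum(l2norm(vec @ proj[name]) for name, vec in modalities.items()))
print(unified.shape)  # (256,)
```

A real pipeline would train the projection heads jointly (or use a multimodal model end to end), but the shape of the orchestration — per-modality inference, projection, fusion — is the same.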