DeepMind researchers have unveiled a new model, PaliGemma, pushing forward the evolution of vision-language models. The new model successfully integrates the strengths of both the PaLI vision-language model series and the Gemma family of language models. PaliGemma is an example of a sub-3B vision-language model that uses a 400M SigLIP vision model along with a…
