PhotoMaker is an emerging AI tool that challenges the established use of LoRA training and IPAdapter for recreating faces. Unlike LoRA, PhotoMaker requires no training and is touted for how easily it regenerates reference faces. This article looks at how the tool works, the author’s test findings, and some samples for review. The PhotoMaker repo is publicly accessible and can be set up for hands-on learning, although the author ran into installation problems on a Windows 11 PC. A separate fork of the repo was found to cater better to Windows users.
Setting up PhotoMaker on your PC is fairly straightforward once you follow the correct repo. Installation mainly involves installing Python, Git, and the Visual Studio Redistributable, then running the commands listed on the repo. Running the GUI.bat file launches the app. The first launch downloads the models, which slows startup; subsequent launches are faster.
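Before running the repo's commands, it can help to confirm that the prerequisites are visible from the command line. The following is a minimal sketch using only the Python standard library; the helper names `find_tools` and `python_is_recent` are hypothetical, and the minimum Python version shown is an assumption, so check the repo's README for the actual requirement.

```python
import shutil
import sys


def find_tools(tools=("git", "python")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in tools}


def python_is_recent(min_version=(3, 8)):
    """Return True if the running interpreter is at least min_version.

    The (3, 8) default is an assumption, not a documented PhotoMaker requirement.
    """
    return sys.version_info >= min_version


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
    print(f"Python version OK: {python_is_recent()}")
```

A missing entry usually means the tool is either not installed or not added to PATH, which is a common cause of setup failures on Windows.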
To use PhotoMaker, upload several sample images of the face, preferably ones framed mainly on the face itself. Then enter the prompt, remembering to include the trigger/class word “img”. Optional style templates in the Gradio app can extend the prompt to stylize the images. Generation time depends on your GPU; it took the author around 30 seconds on an RTX 4080 16GB PC.
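Because a prompt without the trigger word is an easy mistake to make, a small check before generation can save a wasted run. The sketch below only tests whether “img” appears as a standalone word in the prompt; the function names are hypothetical and this is not part of PhotoMaker itself.

```python
import re

TRIGGER_WORD = "img"  # trigger/class word described in the article


def count_trigger_words(prompt, trigger=TRIGGER_WORD):
    """Count standalone occurrences of the trigger word in the prompt."""
    pattern = rf"\b{re.escape(trigger)}\b"
    return len(re.findall(pattern, prompt))


def has_trigger_word(prompt, trigger=TRIGGER_WORD):
    """Return True if the prompt contains the trigger word at least once."""
    return count_trigger_words(prompt, trigger) > 0


if __name__ == "__main__":
    prompt = "a photo of a man img wearing a hat"
    if not has_trigger_word(prompt):
        raise SystemExit("Prompt is missing the trigger word 'img'.")
    print("Prompt looks OK:", prompt)
```

The word-boundary match means a prompt containing only something like “imgur” does not count as including the trigger word.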
Advanced options allow certain generation parameters to be customized. Once the images are generated, they must be downloaded manually, because there is no default output directory in which they are saved. Images from a previous run may be lost if another generation is started before they are downloaded.
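One way to avoid losing earlier results is to copy each batch of downloaded images into its own timestamped folder. This is a minimal sketch that assumes the generated images already exist as files on disk, for example after downloading them from the app; the function `archive_outputs` and the folder name `photomaker_outputs` are hypothetical.

```python
import shutil
import time
from pathlib import Path


def archive_outputs(image_paths, out_root="photomaker_outputs"):
    """Copy images into a new timestamped subfolder and return the new paths."""
    run_dir = Path(out_root) / time.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, src in enumerate(image_paths):
        src = Path(src)
        # Prefix with an index so files with the same name do not collide.
        dest = run_dir / f"{index:03d}_{src.name}"
        shutil.copy2(src, dest)
        saved.append(dest)
    return saved
```

Calling `archive_outputs(["image_0.png", "image_1.png"])` after each generation keeps every run in a separate folder, so a later run does not replace an earlier one.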
A detailed comparison of the original reference faces with their generated counterparts showed considerable consistency in how the face was interpreted. However, the author noticed that the generated faces resembled the originals without being exact duplicates. After testing with different faces, the author concluded that LoRA training may still be superior, since it captures far more information about a face and so produces closer resemblances.
While PhotoMaker’s training-free approach is a fresh concept, it needs further refinement before it can achieve high-fidelity facial reproduction. Future iterations of the method are expected to improve its capabilities and results.