Researchers at Apple have published a new paper detailing their MLLM-Guided Image Editing (MGIE) AI model, which can edit an image using text prompts. Apple worked with researchers at the University of California, Santa Barbara to develop the model, which is capable of handling a wide range of editing scenarios, from simple colour adjustments to more complex object manipulations.
The MGIE model consists of a Multimodal Large Language Model (MLLM) that expands the user's request and provides "concise expressive instructions" that a diffusion model can use to edit the input image. According to the research paper, this way of editing allows the MGIE model to address "ambiguous human commands to achieve reasonable editing".
For example, given a picture of a pizza and the input "make it more healthy", the MLLM interprets the ambiguous term "healthy" and connects it with "Vegetable toppings on a pizza". The diffusion model then edits the image according to the instructions provided by the MLLM.
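The two-stage flow described above can be illustrated with a toy sketch. This is not Apple's code and does not use the MGIE repository's API: the names `EditRequest`, `expand_instruction`, `apply_edit` and `edit_image` are hypothetical, the image is represented by a plain text description, and both stages are simple stand-ins for the real MLLM and diffusion model.

```python
from dataclasses import dataclass


@dataclass
class EditRequest:
    image_description: str  # hypothetical stand-in for the real input image
    prompt: str             # the user's (possibly ambiguous) text prompt


def expand_instruction(request: EditRequest) -> str:
    """Stand-in for the MLLM stage: turn an ambiguous prompt into an
    explicit instruction, using both the image and the prompt."""
    # Assumption: a tiny lookup table replaces the learned model.
    toy_rules = {
        ("pizza", "make it more healthy"): "add vegetable toppings to the pizza",
    }
    # Fall back to the original prompt when no rule applies.
    return toy_rules.get((request.image_description, request.prompt), request.prompt)


def apply_edit(request: EditRequest, instruction: str) -> str:
    """Stand-in for the diffusion stage: returns a description of the
    edited image instead of actual pixels."""
    return f"{request.image_description} edited with: {instruction}"


def edit_image(request: EditRequest) -> str:
    # Stage 1: derive an expressive instruction; stage 2: apply it.
    instruction = expand_instruction(request)
    return apply_edit(request, instruction)
```

Calling `edit_image(EditRequest("pizza", "make it more healthy"))` passes the expanded instruction, rather than the vague original prompt, to the editing stage, which is the division of labour the paper describes.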
According to the research, existing models such as LLM-Guided Image Editing (LGIE) lack the visual perception of MGIE. The Large Language Model (LLM) is confined to a single modality, while the MLLM, with access to the input image and cross-modal understanding, derives more descriptive instructions. For example, if the user wants the image to be brighter, the MLLM within the MGIE model will let the diffusion model know which regions should be brightened.
MGIE is available as an open-source project on GitHub and can be downloaded with code, data and pre-trained models. According to VentureBeat, the image editing model is also available through a web demo hosted on Hugging Face Spaces. However, Apple has not yet confirmed how it plans to utilise this model beyond research projects.
Earlier this month, during Apple's quarterly earnings call, CEO Tim Cook confirmed that the company is working on AI features for its devices that will be announced later this year. Apple is expected to incorporate gen-AI capabilities such as text summarisation and suggestions into its virtual assistant Siri and its Messages app. Similarly, other services across Apple's platform, such as Apple Music, Pages and Keynote, will likely get the AI treatment, too.