-->
OpenAI CEO Sam Altman has announced the first major upgrade to ChatGPT's image-generation capabilities in over a year. The start-up is rolling out the feature, called "Images in ChatGPT," immediately. It uses GPT-4o to generate images directly within ChatGPT. The initial release is available on the ChatGPT Plus, Pro, Team, and Free tiers and focuses exclusively on image creation.
Altman explained that GPT-4o, equipped with image output, "thinks" a bit longer than DALL-E 3 (OpenAI’s previous image generation model) to produce images that are more accurate and detailed.
The new image generator is built into the same model that generates text and code, because OpenAI trained the system as a whole to understand multiple forms of media at once. DALL-E 3, by contrast, is a traditional diffusion transformer model that generates images from text prompts by denoising pixels.
New Model Capabilities
According to a release from OpenAI, the models were trained on the joint distribution of online text and images, enabling them to learn how language relates to images and how images relate to one another. Intensive post-training has given the final model strong visual fluency, so it produces consistent, context-aware, and useful images.
Users can refine images through natural conversation, with GPT-4o building on the text and images already in the chat to keep results consistent. It can also alter existing images, including those featuring people, by modifying or "inpainting" details such as background and foreground objects.
For example, if you’re designing a video game character, the character’s appearance remains coherent across multiple iterations as you refine and experiment.
GPT-4o follows detailed prompts closely. While many other systems struggle to accurately render scenes with as few as five to eight objects, GPT-4o can handle 10 to 20. Binding objects more tightly to their traits and relations gives users better control over the result. GPT-4o can also analyze user-uploaded photographs and integrate their details into its context to inform the generated image.
Because it natively links its understanding of text and visuals, GPT-4o creates images that are contextually rich. Training on a wide variety of visual styles lets it create or modify images convincingly.