The AI imagery rivalry is heating up. In recent years, neural network-based text-to-image generators have made remarkable progress. The latest, Imagen from Google, follows DALL-E 2, which OpenAI revealed in April.
Both systems turn text prompts into images. Google’s researchers, however, claim that their technology delivers “unprecedented photorealism and deep language understanding.”
The Imagen system employs a large pre-trained language model as a text encoder. A series of diffusion models then converts the user’s words into images.
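The pipeline described above can be sketched in a few lines. This is a deliberately toy illustration of the general shape of such a system, not Google's actual code: the function names, array sizes, and the trivial "denoising" arithmetic are all stand-ins invented for this example.

```python
# Toy sketch of an Imagen-style text-to-image pipeline (hypothetical names
# and shapes, not Google's implementation): a frozen text encoder maps the
# prompt to an embedding, a base diffusion model generates a small image
# from noise, and super-resolution stages upsample it.
import numpy as np

rng = np.random.default_rng(0)

def text_encoder(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a frozen pre-trained language model: deterministically
    maps a prompt to a fixed-size embedding vector."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def diffusion_stage(embedding: np.ndarray, size: int, steps: int = 4) -> np.ndarray:
    """Stand-in for one conditional diffusion model: starts from pure noise
    and iteratively nudges it toward the conditioning signal."""
    img = rng.standard_normal((size, size))
    for _ in range(steps):
        img = 0.5 * img + 0.5 * embedding.mean()  # placeholder denoising step
    return img

def generate(prompt: str) -> np.ndarray:
    emb = text_encoder(prompt)
    img = diffusion_stage(emb, size=8)          # base low-resolution model
    for size in (16, 32):                       # super-resolution cascade
        upsampled = np.kron(img, np.ones((2, 2)))  # naive 2x upsample
        img = 0.5 * upsampled + 0.5 * diffusion_stage(emb, size)
    return img

image = generate("a corgi riding a bicycle")
print(image.shape)  # (32, 32)
```

The cascade structure, a small base image refined by successive super-resolution models, is the key architectural idea; the real models replace each placeholder step with a learned neural network.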
Imagen “significantly outperformed” DALL-E 2 in tests, according to the Google team.
Imagen’s creators have even devised a new benchmark for demonstrating their creation’s superiority.
The benchmark, dubbed DrawBench, collects human assessments of the outputs of various text-to-image generators.
Unsurprisingly, Google’s benchmark gave Google’s system high marks. “With DrawBench, extensive human evaluation shows that Imagen outperforms other recent methods by a significant margin,” the researchers wrote in their paper.
The visuals and data are certainly impressive, but Google hasn’t provided a way to independently verify the results.
You can try various interactive examples on the Imagen website, but you can only select from a limited set of phrases to build a constrained prompt.
Cynics will suspect that Google is cherry-picking its results until the model and code are made public. One difficulty in evaluating these systems is that both companies have declined to offer public demos that would allow researchers and others to test them. Part of the rationale is concern that the AI could be exploited to create deceptive images, or that it could simply produce harmful output.
Google’s reasoning for keeping the model private is similar to OpenAI’s: the system is too risky to reveal.
According to the researchers, generative methods can spread disinformation, incite hostility, and worsen marginalization.
“Our preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes,” the researchers wrote.
The team concludes that Imagen “is not suitable for public use at this time,” although it holds out the possibility of a future release.
I await their update with some trepidation. As someone who creates images for stories on a daily basis, I find the prospect of AI labs vying to deliver better results appealing.
Still, the friendly competition among the large corporations likely means the technology will keep improving quickly, since tools developed by one company can be incorporated into another’s next model. In the past year, for example, diffusion models, in which neural networks learn to reverse the process of adding random pixels to an image, have shown promise in machine learning. Both DALL-E 2 and Imagen rely on diffusion models, which first proved useful in less powerful systems such as OpenAI’s GLIDE image generator.
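The core diffusion idea, corrupting data with noise and training a model to undo the corruption, can be demonstrated at toy scale. This is an assumed simplification for illustration only (a linear model on 1-D vectors with a single noise level), nothing like the deep networks DALL-E 2 or Imagen actually use.

```python
# Toy illustration of the diffusion principle: add random noise to data,
# train a model to predict that noise, then generate/denoise by subtracting
# the prediction. A linear model and single noise level keep it minimal.
import numpy as np

rng = np.random.default_rng(1)

# "Dataset": 1-D signals standing in for images.
data = rng.standard_normal((64, 16))

# A linear model that predicts the added noise from the noisy input.
W = np.zeros((16, 16))
lr = 0.01

for step in range(500):
    x = data[rng.integers(0, len(data))]
    noise = rng.standard_normal(16)
    noisy = x + noise                      # forward process: corrupt the sample
    pred = noisy @ W                       # model's guess at the added noise
    grad = np.outer(noisy, pred - noise)   # gradient of the squared error
    W -= lr * grad                         # learn to reverse the corruption

# Reverse process: subtract the predicted noise from a corrupted sample.
x = data[0]
noisy = x + rng.standard_normal(16)
denoised = noisy - noisy @ W

print(float(np.mean((noisy - x) ** 2)), float(np.mean((denoised - x) ** 2)))
```

Real diffusion models repeat this over many noise levels with a neural network as the denoiser, so that generation can start from pure noise and remove it step by step.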
On the other hand, we wouldn’t want our robot overlords to use algorithms to replace artists.