Google’s AI Parti scales to 20 billion parameters to create photorealistic images

A couple of months ago, Google presented another example of its keen interest, trust, and heavy investment in artificial intelligence (AI). Pathways Autoregressive Text-to-Image (Parti), which Google revealed on June 23, 2022, is the company’s newest text-to-image generator. It scales up to 20 billion parameters to create photorealistic images and can “accurately reflect world knowledge”.

While Imagen and DALL·E 2 are diffusion models, Parti follows in DALL·E’s footsteps as an autoregressive model. Although architecture and training methods may differ, the objective of all these models, including Parti, is to generate detailed images based on the user’s text input.

How does the Parti model work?

Parti first converts a collection of images into sequences of code entries, similar to puzzle pieces. It then translates a given text prompt into these code entries, one after another, and assembles them into a new image.
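The two-step idea above can be illustrated with a toy sketch. This is not Google's code: the codebook size, the number of image tokens, and the `toy_next_token_weights` function are hypothetical stand-ins for a trained model. The sketch only shows the autoregressive loop, in which each new image token is chosen based on the prompt and on every token generated so far.

```python
import random

CODEBOOK_SIZE = 8      # hypothetical; a real image tokenizer would use far more codes
NUM_IMAGE_TOKENS = 16  # hypothetical 4x4 grid of image tokens


def toy_next_token_weights(text_tokens, image_tokens):
    """Stand-in for a trained model: returns one non-negative weight per code.

    The weights depend on the prompt and on all image tokens produced so far,
    which is the defining property of an autoregressive model.
    """
    rng = random.Random(hash((tuple(text_tokens), tuple(image_tokens))))
    return [rng.random() for _ in range(CODEBOOK_SIZE)]


def generate_image_tokens(text_tokens, seed=0):
    """Sample image tokens one at a time, each conditioned on the previous ones."""
    rng = random.Random(seed)
    image_tokens = []
    for _ in range(NUM_IMAGE_TOKENS):
        weights = toy_next_token_weights(text_tokens, image_tokens)
        next_token = rng.choices(range(CODEBOOK_SIZE), weights=weights, k=1)[0]
        image_tokens.append(next_token)
    return image_tokens


if __name__ == "__main__":
    prompt = [3, 1, 4]  # hypothetical token ids for a text prompt
    print(generate_image_tokens(prompt))
```

In a real system, a separate decoder would turn the finished token sequence back into pixels. That step is omitted here.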

Diffusion models such as Imagen work differently. They are trained by adding “noise” to an image until it is obscured, like static on a television screen. The model then learns to decode the static to re-create the original image. As the model improves, it can turn what looks like a series of random dots into an image.
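The noise-adding step described above is the forward half of a diffusion model. A minimal sketch of it follows, assuming the commonly used formulation in which the image is scaled down and Gaussian noise is mixed in. The function name and the `keep_fraction` parameter are illustrative, not taken from any Google release.

```python
import math
import random


def add_noise(pixels, keep_fraction, rng):
    """Blend an image with Gaussian noise.

    keep_fraction is in [0, 1]: 1.0 returns the image unchanged,
    0.0 returns pure noise with no trace of the image.
    """
    signal_scale = math.sqrt(keep_fraction)
    noise_scale = math.sqrt(1.0 - keep_fraction)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    image = [0.9, -0.2, 0.4, 0.0]  # hypothetical pixel values
    rng = random.Random(0)
    for keep in (1.0, 0.5, 0.0):
        print(keep, add_noise(image, keep, rng))
```

During training, a model would be shown the noisy version and asked to recover the original. That learning step is not shown here.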

According to Google, the Parti text-to-image model renders hyperrealistic images after training on large sets of images. It encodes images as sequences of discrete codes, which Google calls “image tokens,” and uses them to construct new images. Parti’s images become more realistic as the model gains more parameters, the values it learns from its training material. The largest version of the model has 20 billion parameters.

Parti uses an autoregressive model that, according to Google, can “benefit from advances in large language models.” Imagen, on the other hand, uses diffusion, where the model learns to convert a pattern of random dots into images.

Researchers created four model sizes of Parti. The models include parameter counts at 350 million, 750 million, 3 billion, and 20 billion. They trained those models using Google Cloud TPUs which were able to easily support the creation of these huge model sizes. Several comparisons between the model sizes are provided on the website.
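To give a sense of scale, the arithmetic below estimates how much memory the weights alone would occupy at each size. It assumes 2 bytes per parameter, which is a hypothetical choice for illustration. The article does not state the numeric precision Google used.

```python
BYTES_PER_PARAMETER = 2  # assumption for illustration, not stated by Google

MODEL_SIZES = {
    "350M": 350_000_000,
    "750M": 750_000_000,
    "3B": 3_000_000_000,
    "20B": 20_000_000_000,
}


def weight_memory_gb(parameter_count, bytes_per_parameter=BYTES_PER_PARAMETER):
    """Return the memory needed to store the weights, in gigabytes (10^9 bytes)."""
    return parameter_count * bytes_per_parameter / 1_000_000_000


if __name__ == "__main__":
    for name, count in MODEL_SIZES.items():
        print(f"{name}: about {weight_memory_gb(count):.1f} GB of weights")
```

Under this assumption the largest model needs about 40 GB for its weights alone, before counting any memory used during training.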

Like other text-to-image generators, Parti struggles in a variety of ways, such as incorrect object counts, blended features, incorrect relational positioning or size, and not handling negation correctly.

Will Parti be available publicly?

No. Google isn’t currently releasing Parti or Imagen to the public because AI data sets carry the risk of bias. Because human beings create the data sets, they can inadvertently lean into stereotypes or misrepresent certain groups. Google says both Parti and Imagen carry bias toward Western stereotypes.

Stating that these models have many limitations, Google writes that neither “can reliably produce specific counts of objects (e.g. “ten apples”), nor place them correctly based on specific spatial descriptions (e.g. “a red sphere to the left of a blue block with a yellow triangle on it”)”.

According to Google, these behaviors are a result of several shortcomings, including lack of explicit training material, limited data representation, and lack of 3D awareness. “We hope to address these gaps through broader representations and more effective integration into the text-to-image generation process”, Google has written.

At its I/O developer conference in May, CEO Sundar Pichai said AI is being used to help Google Translate add languages, create 3D images in Maps and condense documents into quick summaries. “The progress we’ve made is because of our years of investment in advanced technologies, from AI to the technical infrastructure that powers it all”, said Pichai.

Parti and Imagen aren’t the only text-to-image models around. DALL·E, VQ-GAN+CLIP, and Latent Diffusion Models are other non-Google text-to-image models that have recently made headlines. DALL·E Mini is an open-source text-to-image AI that’s available to the public but is trained on smaller datasets.

Investment in AI

Google has invested heavily in artificial intelligence (AI) as a way to improve its services and develop ambient computing, a form of technology so intuitive it becomes part of the background. According to an AI Business report dated Apr 13, 2022, Google planned to invest $9.5 billion in US data centers and offices. In 2021, total global corporate investment in AI reached almost 94 billion U.S. dollars, a significant increase from the previous year.

Amazon and IBM are two of the biggest companies investing in AI. When William Tunstall-Pedoe first built “Evi,” a virtual assistant, at Evi Technologies in 2012, he didn’t know that she would eventually become “Alexa.” One year later, Amazon bought the Cambridge, England-based company for more than $26 million and eventually put its AI to use. IBM, for its part, has been a leader in the field of artificial intelligence since the 1950s. The company has invested extensively in its cloud and AI services, with US$3.3 billion in net capital expenditures.

Conclusion

This shows that the future of artificial intelligence is still an open question, which is why Google has not released Parti or Imagen to the public. Researchers and companies are still looking for ways to make AI more user-friendly and free of bias.

Data will be of immense importance, but it may also raise ethical issues. Research that takes these ethical concerns into account can still be meaningful in tackling these questions, as long as it is done properly and in a safe environment.