Google’s AI Parti scales to 20 billion parameters to create photorealistic images

A couple of months ago, Google presented another example of its keen interest, trust and heavy investment in artificial intelligence (AI). Pathways Autoregressive Text-to-Image (Parti), which Google revealed on June 23, 2022, is Google’s newest text-to-image generator. Its largest version scales up to 20 billion parameters to create photorealistic images, and Google says it can ‘accurately reflect world knowledge’.

While Imagen and DALL·E 2 are diffusion models, Parti follows in DALL·E’s footsteps as an autoregressive model. Although architecture and training method may differ, the objective of all these models, including Parti, is to generate detailed images based on the user’s text input.

How does the Parti model work?


Parti’s approach first converts a collection of images into sequences of code entries, similar to puzzle pieces. It then translates a given text prompt into these code entries and creates a new image.
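The idea of producing an image one code entry at a time can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not Google’s implementation: `CODEBOOK_SIZE`, `GRID`, `encode_text` and `toy_next_token_probs` are hypothetical names, and the "model" is a stand-in that returns made-up probabilities rather than a trained network.

```python
import random

CODEBOOK_SIZE = 8   # hypothetical number of distinct image code entries
GRID = 4            # hypothetical 4x4 grid of code entries per image


def encode_text(prompt):
    """Hypothetical tokenizer: map each word to a small integer id."""
    return [sum(ord(c) for c in word) % 1000 for word in prompt.lower().split()]


def toy_next_token_probs(text_ids, image_ids):
    """Stand-in for a trained model (assumption): returns one probability
    per code entry, derived deterministically from the context so far."""
    context_seed = sum(text_ids) * 31 + len(image_ids) * 7 + sum(image_ids)
    rng = random.Random(context_seed)
    scores = [rng.random() for _ in range(CODEBOOK_SIZE)]
    total = sum(scores)
    return [s / total for s in scores]


def generate_image_tokens(prompt, seed=0):
    """Autoregressive loop: each new code entry is sampled given the
    text prompt and all code entries generated so far."""
    rng = random.Random(seed)
    text_ids = encode_text(prompt)
    image_ids = []
    for _ in range(GRID * GRID):
        probs = toy_next_token_probs(text_ids, image_ids)
        next_id = rng.choices(range(CODEBOOK_SIZE), weights=probs, k=1)[0]
        image_ids.append(next_id)
    return image_ids


if __name__ == "__main__":
    print(generate_image_tokens("a red sphere next to a blue block"))
```

In a real system, a separate decoder would turn the finished grid of code entries back into pixels; that step is omitted here.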

Diffusion models such as Imagen, by contrast, are trained by adding “noise” to an image until it is obscured, like static on a television screen. The model then learns to decode the static to re-create the original image. As the model improves, it can turn what looks like a series of random dots into an image.
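The noising step in diffusion-style training can be sketched in a few lines of Python. This is only an illustration: the function name `add_noise`, the square-root blending rule and the flat list of pixel values are simplifying assumptions, not details taken from Google’s description.

```python
import random


def add_noise(pixels, noise_level, rng):
    """Blend each pixel with Gaussian noise.

    noise_level = 0.0 leaves the image untouched;
    noise_level = 1.0 replaces it with pure static.
    """
    keep = (1.0 - noise_level) ** 0.5   # weight on the original pixel
    mix = noise_level ** 0.5            # weight on the random noise
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.2, 0.8, 0.5, 0.1]        # hypothetical tiny "image"
    for level in (0.0, 0.5, 1.0):
        print(level, add_noise(image, level, rng))
```

A model trained this way sees pairs of noisy and clean images and learns to undo the noise; the learning step itself is beyond this sketch.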

According to Google, the Parti text-to-image model renders hyperrealistic images after training on large image-text datasets. It represents images as sequences of what Google calls “image tokens” and uses them to construct new images. Parti’s images become more realistic as the model is scaled up to more parameters, the learned values it adjusts during training. The largest version of the model has 20 billion parameters.

Parti uses an autoregressive model that, according to Google, can “benefit from advances in large language models.” Imagen, on the other hand, uses diffusion, where the model learns to convert a pattern of random dots into images.

Researchers created four model sizes of Parti, with parameter counts of 350 million, 750 million, 3 billion, and 20 billion. They trained those models using Google Cloud TPUs, which were able to easily support the creation of these huge model sizes. Several comparisons between the model sizes are provided on the website.

Like other text-to-image generators, Parti struggles in a number of familiar ways: incorrect object counts, blended features, incorrect relational positioning or size, mishandled negation, and so on.


As with Imagen, Google has decided not to release Parti’s “models, code, or data for public use without further safeguards in place.” In addition, all images are watermarked in the bottom-right corner.

Current models like Parti are trained on large, often noisy, image-text datasets that are known to contain biases regarding people of different backgrounds. This leads such models, including Parti, to produce stereotypical representations of, for example, people described as lawyers, flight attendants, homemakers, and so on, and to reflect Western biases for events such as weddings.

Google is exploring this area and says tools like these can unlock joint human/computer creativity. Google wrote on its blog, “Our goal is to bring user experiences based on these models to the world in a safe, responsible way that will inspire creativity.”

“Text-to-image models are exciting tools for inspiration and creativity. They also come with risks related to disinformation, bias and safety. We’re having discussions around Responsible AI practices and the necessary steps to safely pursue this technology,” Google added.

Will Parti be available publicly?

No. Google isn’t currently releasing Parti or Imagen to the public because AI data sets carry a risk of bias. Because human beings create the data sets, they can inadvertently lean into stereotypes or misrepresent certain groups. Google says both Parti and Imagen carry bias toward Western stereotypes.

Stating that these models have many limitations, Google writes that neither can “reliably produce specific counts of objects (e.g. ‘ten apples’), nor place them correctly based on specific spatial descriptions (e.g. ‘a red sphere to the left of a blue block with a yellow triangle on it’).”

According to Google, these behaviors are a result of several shortcomings, including lack of explicit training material, limited data representation, and lack of 3D awareness. “We hope to address these gaps through broader representations and more effective integration into the text-to-image generation process,” Google has written.

At its I/O developer conference in May, CEO Sundar Pichai said AI is being used to help Google Translate add languages, create 3D images in Maps and condense documents into quick summaries. “The progress we’ve made is because of our years of investment in advanced technologies, from AI to the technical infrastructure that powers it all,” said Pichai.

Parti and Imagen aren’t the only text-to-image models around. DALL·E, VQ-GAN+CLIP and Latent Diffusion Models are non-Google text-to-image models that have recently made headlines. DALL·E Mini is an open-source text-to-image AI that’s available to the public, but it is trained on smaller datasets.


Investment in AI

Google has invested heavily in artificial intelligence (AI) as a way to improve its services and develop ambient computing, a form of technology so intuitive it becomes part of the background. According to an AI Business report dated Apr 13, 2022, Google planned to invest $9.5 billion in US data centers and offices. In 2021, global corporate investment in AI reached almost 94 billion U.S. dollars, a significant increase from the previous year.

Amazon and IBM are two of the biggest companies investing in AI. When William Tunstall-Pedoe first built “Evi,” a virtual assistant, at Evi Technologies in 2012, he did not know that she would eventually become “Alexa.” One year later, Amazon bought the Cambridge, England-based company for more than $26 million and went on to use its AI. IBM, for its part, has been a leader in the field of artificial intelligence since the 1950s. The company has invested extensively in its cloud and AI services, with US$3.3 billion in net capital expenditures.


All of this shows that the future of artificial intelligence remains an open question, which is why Google has not released Parti or Imagen to the public. Researchers and companies are still looking for ways to make AI more user-friendly and free of bias.

Data from both variables will be of immense importance, but there may be ethical issues as well. Research that takes ethical concerns into account can still be meaningful in tackling these questions, as long as it is done properly in a safe environment.
