Dall-E 3 + GPT-4V may bring an unprecedented wave of application innovation.

Robert’s MetaMask

Dall-E 3 now generates images at a quality comparable to Midjourney (MJ) and Stable Diffusion. Dall-E 2 fell far behind, but version 3 has caught up completely. Moreover, Dall-E 3's understanding of text, its control over details, and, most importantly, its official API support will put it well ahead of MJ, which remains hidden inside Discord.


In the beginning, MJ's choice to build on Discord made sense: it formed a user community through the Discord server. Users can see the images others generate, quickly understand what MJ can do, and pick up more prompts along the way. This heads off the problem of users blaming the tool for their own lack of creativity. But as time has gone on, MJ has stayed inside Discord and still has not opened an API, which is hard to believe!


In previous tests, we found that having an LLM first refine the request and then help write the prompt for the AIGC tool produces better images than entering prompts directly. Bing's integrated Dall-E 3 evidently does the same: it first uses GPT-4 to generate the prompt, and it also searches the web to better understand what the prompt refers to. The results of this process are clearly better.
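
The two-step flow described above can be sketched roughly as follows. This is only a minimal illustration, not how Bing or OpenAI actually implement it: the instruction template, the function names, and the two stand-in model functions are all made up for the example, and in real use `ask_llm` and `generate_image` would wrap whichever LLM and image APIs you have access to.

```python
from typing import Callable

# Hypothetical template; the wording is illustrative, not taken from Bing or OpenAI.
REWRITE_INSTRUCTION = (
    "Rewrite the following idea as a detailed image-generation prompt. "
    "Describe subject, style, lighting and composition.\n\nIdea: {idea}"
)


def generate_with_llm_prompt(
    idea: str,
    ask_llm: Callable[[str], str],
    generate_image: Callable[[str], str],
) -> dict:
    """Two-step flow: the LLM expands the idea, then the image model draws it."""
    # Step 1: let the LLM turn a short idea into a detailed prompt.
    detailed_prompt = ask_llm(REWRITE_INSTRUCTION.format(idea=idea)).strip()
    # Fall back to the raw idea if the LLM returns nothing usable.
    if not detailed_prompt:
        detailed_prompt = idea
    # Step 2: pass the detailed prompt to the image model.
    image_ref = generate_image(detailed_prompt)
    return {"idea": idea, "prompt": detailed_prompt, "image": image_ref}


# Stand-ins (assumptions) so the sketch runs offline; swap in real API clients.
def fake_llm(message: str) -> str:
    idea = message.rsplit("Idea: ", 1)[-1]
    return f"{idea}, watercolor style, soft morning light, wide shot"


def fake_image_model(prompt: str) -> str:
    return f"image://{len(prompt)}-chars"


result = generate_with_llm_prompt(
    "a cat reading a newspaper", fake_llm, fake_image_model
)
print(result["prompt"])
```

Passing the two models in as plain functions keeps the flow itself independent of any particular vendor, so the same sketch would apply whether the image step is Dall-E 3, MJ, or Stable Diffusion.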

GPT-4V, or GPT-4V(ision), is OpenAI's latest model; as the name suggests, it can understand visual input. This is the opposite direction from text-to-image: deriving meaning from images. This capability will be very significant, because we will be working not only with Large Language Models (LLMs) but also with a Large Vision Model. The scope of applications suddenly becomes much broader.

A recent paper from Microsoft ( https://arxiv.org/pdf/2309.17421.pdf ) reports that GPT-4V shows unprecedented capabilities in understanding and processing arbitrary combinations of input images, sub-images, text, scene text, and visual pointers. GPT-4V also works well with techniques familiar from LLMs, including instruction following, chain-of-thought reasoning, and in-context few-shot learning. This opens up enormous possibilities. If the original paper is too long to read, here is an article that gives a detailed introduction: https://mp.weixin.qq.com/s/8FtR6JcEFVcRLWCaANXQ6g .

I see a huge opportunity here: Dall-E 3 + GPT-4V may bring an unprecedented wave of application innovation. The models OpenAI provides are basic building blocks that can be combined into countless new applications. This era is arriving with great momentum, and innovators will have unlimited possibilities. It's so exciting!
