Let AIGC perform "Alone on the River Tower"...

Robert’s MetaMask
September 29, 2023

Another Mid-Autumn Festival has arrived. There is a Mid-Autumn poem that I have loved ever since my high school days:


"Alone on the river tower, thoughts go far away,

Moonlight like water, water like the sky.

Where are the people who come to admire the moon together?

The scenery vaguely resembles last year."


So what happens if we ask AIGC to interpret this poem? And how do the images generated by different AIGC engines differ? Let's experiment.


Input the poem directly

First, I fed the Chinese poem directly to each engine as the prompt. I did not expect the results to be very good: in a test I ran a few months ago, Dall-E understood Chinese the best, while the others lagged behind.


Dall-E 2

image.png

The basic content is all there, but the image completely lacks the mood the poem calls for.


Dall-E 3

Image

Midjourney

Image

As always, MJ's support for Chinese prompts is very poor. Of the four pictures, the first is somewhat close (though without the moon it is basically a failure). The other three, especially the fourth, are complete nonsense.


Stable Diffusion

Generated with Poe's Stable Diffusion XL bot, which was announced just yesterday. I generated four times and selected the best result.

Image

It produced a Chinese painting! The content is satisfactory and captures the distinctive scene of “standing alone on the river pavilion”, but the crucial element of “moonlight like water, water like the sky” is missing.


LLM + AIGC

The approach is quite simple: first use an LLM to understand the text of the poem and generate a descriptive English prompt, then let each model generate from that prompt. (This can also be considered one realization of so-called 'multimodal'.)

I compared two kinds of prompt for the AIGC engines: a direct translation of the poem, and an interpretation produced by an LLM prompt that I wrote to understand the content and generate an AIGC prompt.

The prompt I wrote for understanding the poem is brief, and refining it should improve the results:

Please briefly describe in English the scenery depicted in the following poem sentences.


Literal translation:

Alone on the Chinese riverside pavilion, my thoughts soar like the vast expanse, Moonlight like water, water like the sky. Where are the people who came here to admire the moon? The scenery faintly resembles that of last year.


Interpretation version of the prompt:

The poet is alone on a Chinese river tower, feeling the scenery is ethereal. The moonlight is reflected on the water, making it look like the sky. The poet wonders where the person who came to admire the moon with them is. The scenery seems vaguely reminiscent of last year.
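The two-step flow described above (LLM first, image engines second) can be sketched roughly as follows. This is only a minimal illustration, not the exact setup used for the images in this post: the function names `build_llm_prompt` and `poem_to_images` are hypothetical, and the LLM and the image engines are passed in as plain callables so that no particular vendor API is assumed.

```python
from typing import Callable, Dict

# The instruction quoted above, used verbatim as the LLM prompt prefix.
INSTRUCTION = (
    "Please briefly describe in English the scenery depicted "
    "in the following poem sentences."
)


def build_llm_prompt(poem: str) -> str:
    """Combine the fixed instruction with the poem text."""
    return f"{INSTRUCTION}\n\n{poem.strip()}"


def poem_to_images(
    poem: str,
    llm: Callable[[str], str],
    generators: Dict[str, Callable[[str], object]],
) -> Dict[str, object]:
    """Interpret the poem with an LLM, then feed the result to each engine.

    `llm` and every value in `generators` are assumptions: wrap whichever
    real client you use (chat model, image API) in a one-argument function.
    """
    # Step 1: the LLM turns the poem into an English scene description.
    image_prompt = llm(build_llm_prompt(poem)).strip()
    # Step 2: every image engine receives the same descriptive prompt.
    return {name: generate(image_prompt) for name, generate in generators.items()}


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any network access.
    def stub_llm(prompt: str) -> str:
        return "A poet alone on a river tower; moonlight reflected on the water."

    stub_engines = {"engine_a": lambda p: f"<image for: {p}>"}
    poem = (
        "Alone on the river tower, thoughts go far away,\n"
        "Moonlight like water, water like the sky."
    )
    print(poem_to_images(poem, stub_llm, stub_engines))
```

Keeping the LLM step separate from the generation step makes it easy to swap in the literal translation instead of the interpretation and compare the two, which is what the sections below do.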



Dall-E

Direct translation:

image.png

Interpretation:

image.png


Midjourney

Direct translation:

Image

Interpretation:

Image


Stable Diffusion

I generated four times and picked my favorite.


Direct translation:

Image

Interpretation:

Image


Conclusion

Using an LLM to preprocess the poem before feeding it to AIGC significantly improves the quality of the generated output (even if the LLM is used only for translation, since MJ, SD, and others may be too weak with non-English prompts). That much is beyond question. For OpenAI's AIGC, this step may be less crucial, but the potential for improvement is still evident.


In terms of visual quality, Dall-E is the roughest. MJ has a rather heavy artistic feel: the first time you see it you will be amazed, but after seeing it multiple times you will find it somewhat monotonous and overly artistic. SD's quality is not very stable, but you can pick the results you like.
