Semantic Image and Text Alignment: Storyboard Synthesis

17 Feb 2024

Introduction

In this transformative era of advertising and recognizing the potential for technology to streamline and enhance the ad creation process, Adludio is embarking on an ambitious initiative to automate the end-to-end process of advertising production.

Adludio is a well renowned company in the business of advertisement creation for different customers across the globe. One aspect involves automation of this process so that the generation of potential creative concepts based on the client’s brief. This transformation process aims to visually depict the narrative flow and user interaction within advertisements, making the conceptualization of digital campaigns both more intuitive and impactful.

Our task for this week, as part of this transformative process, is to architect and develop a cutting-edge machine learning solution that automates the conversion of textual advertisement concepts, assets descriptions into visually compelling storyboards.

Overview

Bring the concept to life, we will be using the perks of machine learning and diffusion models broadly. As a general scheme for the workflow, we have divided the project into three tasks for ease of approach. 1) Image Generation and EDA analysis on the given Assets

This task involves the generation of images using a model by adding the
concepts or ideas given as data. These concepts need some adjustment before they are handed out to the LLMs for image generation.
As for the EDA, that is a crucial step to help us identify the positions and
dimensions of images we find on a certain image on a Storyboard(either the landing page or the Ending page). We can use the image search on image to check and examine the segmented images.

2) Image Composition

This will have a composition of the several generated images and integrate one or two images on a single frame.

3) Storyboard Creation

This is the final output as this will depict the final product. A story board is nothing bu s a sequence of narration that will be visualized on a board.

All tasks have different under lists and we will approach all as we dive deeper into the project.

Implementation

Image Generation

For Image generation, we used the Stable Diffusion Model to create a good and quality image from a given prompt.

EDA

The EDA was only performed to analyze the position and dimension of segments on a given image. Particularly, either on the Landing Page and End Frame. The two pages are found on the beginning and the end of the creatives (Ads). This means that they contain all the other images integrated such as Text image, Buttons,…etc.

Prompt Engineering

The better the prompt, the better the quality and realism of the image that will be generated by the Image Generating Model. And so, we were able to give emphasis on this particular activity as this is the heart of the all the creatives to be generated from our description set.

A concise and descriptive prompt can help us achieve this and we were able to make use of the perks of Open-AI for this in helping us automate and generate a better prompt.

Result

The Generated images have improved a lot in-terms of quality and also realism. Because after the altercation of prompts we were able to get more realistic images and they are some how considered an achievement.

Conclusion

The task was very enticing as we got the chance to explore more on Machine Learning as well as Computer vision and Deep learning. These are very interesting concepts. They have so much potential and seem very relevant in the near future for further advancements in the AI field.

We also got the chance to work on prompt engineering and that also is a very lucrative sector as that is now in demand for the betterment of LLMs and their generating application

Due to time constrains, the achievements were very much limited. But given more time and resource a better storyboard that can be generated from a trained model can be implemented for creating captivating creatives.