Time for an update. To take a break from trying to create videos from an existing script (which is still fairly time-consuming because I’m learning the tools at the same time), I decided to play a bit.
I took an initial iClone render as the first frame of the first clip, and then created a series of subsequent clips in which, for the most part, the end frame of one clip is the start frame of the next. I rendered using Kling 3.0 at 720p and scaled up to 4K using Upscayl. For the environment I requested an “urban setting with a cyberpunk vibe”, which gave the look I wanted. Here is the initial frame:
To maintain quality, I upscaled the 720p end frames with Topaz Gigapixel before using them as start frames. Here is the result:
I tried to keep the prompts I gave to the AI relatively simple, adding detail only when the output was unsatisfactory and corrections were needed.
For the initial clip (done with Kling 2.5) the prompt was: “Young woman in short skirt and boots walks around in a fururistic urban area with a cyberpunk vibe. It is evening and there is a drizzle of rain.”
I then learned about character references, which I used from then on.
For the second clip I added some tension: “Young woman @sophia in short skirt and boots walks around in a fururistic urban area with a cyberpunk vibe. It is evening and there is a drizzle of rain. She occasionally looks over her shoulder as if someone is following her.
[expanded prompt] After 6 seconds she quickly turns into a side street and waits. After 8 seconds older man @yuri_cp_2 walks into the frame and starts to look around to find her.”
So in this case I imposed some control on the action. However, I didn’t include extensive camera direction because I wanted to see what the AI came up with.
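A minimal sketch of how such a timed prompt could be assembled in Python. The helper name `timed_prompt` and the (seconds, action) beat format are hypothetical and not part of Kling or any other tool; the result is just plain text to paste into the prompt field.

```python
def timed_prompt(setting, beats):
    """Join a scene description with timed action beats into one prompt string.

    beats is a list of (seconds, action) pairs. This format is an assumption
    made for illustration, not something the video generator requires.
    """
    parts = [setting.strip()]
    # Sort by time so the beats read in the order they should happen.
    for seconds, action in sorted(beats):
        parts.append(f"After {seconds} seconds {action.strip()}")
    return " ".join(parts)


prompt = timed_prompt(
    "Futuristic urban area with a cyberpunk vibe.",
    [
        (8, "older man @yuri_cp_2 walks into the frame and starts to look around to find her."),
        (6, "she quickly turns into a side street and waits."),
    ],
)
print(prompt)
```

Keeping the beats as data makes it easy to shift a timing by a second or two and regenerate the text when a clip needs a retry.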
Prompt for the third clip: “Futuristic urban area with a cyberpunk vibe.
Older man @yuri_cp_2 keeps looking around, then after 4 seconds, he shrugs his shoulders and walks away out of the frame.
At 6 seconds @sophia appears into the frame. She hesitates, looks left and right and then continues her walk.”
The prompt for the next clip is a bit different: “Futuristic urban area with a cyberpunk vibe. Young woman @sophia slowly leaves her hiding place in @Start image . She walks to a colorful main street as in @End image.
She moves past the camera, which slowly pans to reveal the street as in @End image.”
So here I included an End frame because I needed Sophia to end up at a specific place:
This image is a hybrid. The right side is the original end frame of the current clip. On the left side I added the bar and the remainder of the left side of the street. The two halves were then combined in Photoshop. So the effect in the video is that she comes out of her hiding place and then sees the bar.
For the bar, I created a model in iClone and then gave the render a “cyberpunk vibe” to make it match. Quite some work, but also a fun problem to solve.
On to the next clip: “Futuristic urban area with a cyberpunk vibe. There is a drizzle of rain. @sophia steps from the deserted street shown in @Start image onto the sidewalk. @sophia continues walking to a bar called SINNERS located on the left side shown in @End image. @sophia then enters the bar.”
We finally made it into the bar, where Sophia takes a seat and then orders a glass of red wine. As the bar interior started as a 3D model, I provided specific Start and End frames, while describing the action.
Surprisingly, this took many retries. Having Sophia sit was not too hard: “Bar with a cyberpunk vibe as in @Start image. Young woman @sophia walks to the bar counter and sits down on one of the barstools. @casual_m_0009 sits at the end of the barcounter. He looks at @sophia briefly. @bartender addresses @sophia as in @End image to ask what she desires. A young woman sits in a booth in the background. Next to her stands a young African man.”
I can only have three character references, so the background characters have none, but they worked out. What turned out to be a problem was that additional characters or wall screens kept appearing, so I had to include negative prompts.
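The three-reference limit is easy to overlook when a prompt grows. Here is a small Python sketch that counts the distinct `@` handles in a prompt before it is submitted. The function names are hypothetical, the limit of three is taken from the experience described above, and it assumes that `@Start image` and `@End image` are frame references that do not count as characters.

```python
import re

MAX_CHARACTER_REFS = 3          # limit described above; adjust if the tool differs
IMAGE_REFS = {"Start", "End"}   # assumed: "@Start image" / "@End image" are frame references


def character_refs(prompt):
    """Return the distinct @handles in a prompt, in order, excluding frame references."""
    seen = []
    for handle in re.findall(r"@(\w+)", prompt):
        if handle not in IMAGE_REFS and handle not in seen:
            seen.append(handle)
    return seen


def check_prompt(prompt):
    """Raise ValueError if the prompt uses more character references than allowed."""
    refs = character_refs(prompt)
    if len(refs) > MAX_CHARACTER_REFS:
        raise ValueError(f"{len(refs)} character references, limit is {MAX_CHARACTER_REFS}: {refs}")
    return refs


bar_prompt = (
    "Bar with a cyberpunk vibe as in @Start image. Young woman @sophia walks to the "
    "bar counter. @casual_m_0009 sits at the end of the bar counter. "
    "@bartender addresses @sophia as in @End image."
)
print(check_prompt(bar_prompt))
```

Repeated mentions of the same handle, such as `@sophia` here, are counted once, so only the number of different characters matters.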
The final clip is where the bartender serves Sophia a glass of red wine, which she then drinks. That was extremely hard. For some reason the bartender wanted a glass of wine too, even though I gave a negative prompt. In one generation, the bartender took a sip of the wine before giving it to Sophia! So in the end I split up the action: first the bartender places a glass of wine on the counter. Then in a second clip Sophia picks up the glass and drinks the wine.
I added a soundtrack using Krotos Studio Pro, which analyzes the video and then creates a soundtrack split up into stems. It’s good as a start. One thing it has trouble with is footsteps, which are not synced well.