The term “essay”, coined by Michel de Montaigne, derives from the French infinitive essayer: “to try” or “to attempt”. That is literally what this work is: an attempt to create a video using modern Artificial Intelligence algorithms for generating creative assets.
It all started with a short script draft that I wrote and asked ChatGPT to improve. The AI provided its own version of the text, and among the sentences that sounded like “corporate propaganda” I found several interesting formulations and graceful expressions that were included in the final version of the script.
Then I used the ElevenLabs service to generate the voice-over. I was familiar with their parametric system, so it took only 28 takes to generate audio clips suitable for assembling the final VO.
Music was the hardest part, and at some point I was close to giving up on the attempt to generate the track. But after trying many services I finally chose SOUNDRAW. Their algorithms were able to generate professional-sounding compositions ready to be used as background music. With the help of their web interface, which seemed limited but was enough to change the key and layering of the track, I generated a set of compositions that were edited into the final piece in the NLE.
Having the music set, I was able to determine the pace of the video edit, and I used Flim AI to find corresponding scenes in famous video works and movies. The service finds stills from a natural language description and returns them along with the timecode of the clip. I encountered several cases of wrong or missing information in the outputs, but the overall process was pretty straightforward.
Finally, I used Midjourney to generate graphical assets for the motion graphic design: the intro sequence and a couple of video inserts and transitions meant to fill the gaps in the final cut.
The pace of modern advancements in generative neural network algorithms is astonishing! Though every AI system mentioned above can already be used in the professional post-production process, they are still more like advisors, brainstormers and junior artists, in the sense that their output needs significant processing before it gets into the final product. AI algorithms still fail to deliver the detailed changes that make all the difference, the details that distinguish a sketch from a final product. They are good generators, getting better and better at understanding natural language tasks, but the human operator is still the main interpreter and integrator of their outputs.
Most of the promising AIs are versatile, but their ability to go deeper into particular narrow topics is far too limited for industrial use. A generated picture may be promising, but you can’t get the algorithms to improve it further in a certain direction. Or you like a musical theme but can’t achieve the right development or enough variation to create a set of tracks.
As I see it, the next evolutionary step in the development of generative neural networks will happen when AIs are able to provide output in the form of an industry-standard project file. That change could greatly improve the productivity of creative professionals, taking out a lot of technical work and leaving more time for creativity and finishing.
21 Aug 2023