Eye on AI: Platonic banana trouble

Wandering Generalities

The individual clips are remarkable, jaw-dropping demonstrations of the technology, but making use of them depends on understanding whether a shot is generated implicitly or explicitly.

Suppose you ask Sora for a long tracking shot in a kitchen with a banana on a table. It will rely on its implicit understanding of 'banana-ness' to generate a video showing a banana.

Through training data, it has 'learnt' the implicit aspects of banana-ness: 'yellow', 'bent', 'has dark ends', and so on. It has no actual recorded images of bananas and no 'banana stock library' database; it has only a much smaller, compressed hidden representation, or 'latent space', of what a banana is.

Every time it runs, it produces another interpretation of that latent space. Your prompt relies on an implicit understanding of banana-ness.
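A toy sketch can make this concrete. The code below is not Sora's architecture (which is a diffusion model over video latents); it is a minimal, hypothetical illustration of the idea that each run samples a slightly different point near a learned 'banana-ness' region of latent space, so every generation differs while still reading as a banana. The latent dimensions and attribute names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned prototype for 'banana-ness' in a tiny 3-D latent
# space (dimensions chosen arbitrarily: hue, curvature, tip darkness).
banana_prototype = np.array([0.9, 0.7, 0.8])

def decode(z):
    """Toy stand-in for a decoder: map a latent vector to named attributes."""
    hue, curvature, tip = np.clip(z, 0.0, 1.0)
    return {"yellowness": round(float(hue), 2),
            "bend": round(float(curvature), 2),
            "dark_ends": round(float(tip), 2)}

# Each 'run' samples near the prototype rather than looking up a stored
# image, so every output is a fresh interpretation of the same concept.
for run in range(3):
    z = banana_prototype + rng.normal(scale=0.05, size=3)
    print(f"run {run}:", decode(z))
```

The point of the sketch: there is no banana image retrieved anywhere, only a compressed statistical summary that is resampled, and lightly varied, on every run.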

FXguide: Actually Using Sora

“I have that within which passes show / these just the trappings and the suits of banana.”

I'd love to hear your thoughts and recommended resources...