Unsupervised learning of object representations from video sequences using attention over space and time

Grant US12586353B2 (Kind: B2), Mar 24, 2026

Assignee

GDM Holding LLC

Inventors

Rishabh Kabra, Daniel Zoran, Goker Erdogan, Antonia Phoebe Nina Creswell, Loic Matthey-de-l'Endroit, Matthew Botvinick, Alexander Lerchner, Christopher Paul Burgess

Abstract

A computer-implemented video generation neural network system is configured to determine a value for each of a set of object latent variables by sampling from a respective prior object latent distribution for that variable. The system comprises a trained image frame decoder neural network configured to, for each pixel of each generated image frame and for each generated image frame time step: process the determined values of the object latent variables to determine parameters of a pixel distribution for each of the object latent variables; combine the pixel distributions for each of the object latent variables to determine a combined pixel distribution; and sample from the combined pixel distribution to determine a value for the pixel and for the time step.
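The generation procedure in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It assumes standard normal priors, grayscale pixels, Gaussian per-object pixel distributions with a fixed standard deviation, and a softmax mixture as the way the per-object distributions are combined. The decoder here (toy_decoder) is a hypothetical random linear map standing in for the trained image frame decoder neural network, and all function and parameter names are illustrative.

```python
import math
import random


def sample_object_latents(rng, num_objects, latent_dim):
    # Assumption: every prior is a standard normal; the abstract does not
    # name the prior family.
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            for _ in range(num_objects)]


def toy_decoder(latent, x, y, t, weights):
    # Hypothetical stand-in for the trained decoder. For one object latent,
    # one pixel (x, y) and one time step t, it returns the parameters of that
    # object's pixel distribution: a mean in [0, 1] and a mixture logit.
    features = list(latent) + [x, y, t]
    pre_mean = sum(w * f for w, f in zip(weights["mean"], features))
    logit = sum(w * f for w, f in zip(weights["logit"], features))
    return 0.5 * (math.tanh(pre_mean) + 1.0), logit


def softmax(logits):
    # Numerically stable softmax over the per-object logits.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_video(rng, latents, weights, num_steps, height, width,
                   pixel_std=0.05):
    video = []
    for t in range(num_steps):
        frame = []
        for y in range(height):
            row = []
            for x in range(width):
                # 1. Per-object pixel distribution parameters.
                params = [toy_decoder(z, x / width, y / height,
                                      t / num_steps, weights)
                          for z in latents]
                means = [m for m, _ in params]
                # 2. Combine into a mixture over objects (assumed form).
                mix = softmax([l for _, l in params])
                # 3. Sample from the mixture: pick an object, then a value.
                k = rng.choices(range(len(latents)), weights=mix)[0]
                row.append(rng.gauss(means[k], pixel_std))
            frame.append(row)
        video.append(frame)
    return video


rng = random.Random(0)
latents = sample_object_latents(rng, num_objects=3, latent_dim=4)
# Random weights over the 4 latent dimensions plus (x, y, t).
weights = {name: [rng.gauss(0.0, 1.0) for _ in range(7)]
           for name in ("mean", "logit")}
video = generate_video(rng, latents, weights, num_steps=2, height=4, width=5)
```

Because the same sampled object latents are reused at every pixel and every time step, each latent describes one object consistently across the whole generated sequence in this sketch.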

CPC Classifications

G06V 10/771; G06V 10/44; G06V 10/82; G06T 9/00; G06N 3/045; G06N 3/0455; G06N 3/0464; G06N 3/047; G06N 3/0475; G06N 3/0895; G06N 3/092; G06N 3/088

Filing Date

2022-05-27

Application No.

18289171
