It's indeed a sequence of images. Actually, it's one 1024x1024 spritesheet per recording, which I think works best.
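
For anyone wondering how individual frames come out of a sheet like that, here is a rough sketch of the lookup. It is not the actual code from the installation. The 128-pixel frame size and the left-to-right, top-to-bottom ordering are assumptions for illustration, and `frame_rect` is a made-up helper name. Only the 1024x1024 sheet size comes from the description above.

```python
SHEET_SIZE = 1024  # one 1024x1024 spritesheet per recording
FRAME_SIZE = 128   # assumption: the real frame size is not stated


def frame_rect(index, sheet_size=SHEET_SIZE, frame_size=FRAME_SIZE):
    """Return (x, y, width, height) of frame `index` inside the sheet.

    Frames are assumed to be square and laid out left to right,
    top to bottom.
    """
    cols = sheet_size // frame_size      # frames per row
    total = cols * cols                  # frames per sheet
    if not 0 <= index < total:
        raise IndexError(f"frame {index} is outside 0..{total - 1}")
    row, col = divmod(index, cols)
    return (col * frame_size, row * frame_size, frame_size, frame_size)
```

With these assumed numbers a sheet holds 8 x 8 = 64 frames, and playing a recording back is just stepping `index` from 0 to 63 and drawing the matching rectangle.
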
When someone is in the box, they see themselves on the screen, where they follow the basic instructions on when to turn the black and white sheet.
Most of the time, other people are shouting "black", "white".
And, as with every screen, we have "broken" pixels. But all of this creates super nice variation and randomness.