CTWC Stream Transparency/Cropping effect

Brandon Arnold · 18 Oct 2015

Hi all,

I wonder how they set the NES Tetris scene up in the Tetris Championship stream. Here's an example of it on their practice stream video. They cropped out the playfield and stats, and overlaid them on the face cam with a transparency effect. Any idea what kind of production software can be used for that?

-Brandon

Qlex · 18 Oct 2015

If I'm not mistaken, they have a pretty powerful algorithm made by one of the people (Trey) that recognizes the playfield. This way it can actually display any kind of info with the best possible quality, because they know all the necessary information in the game. They could even change the shape of the blocks and whatnot, but they decided to make it classic, because they're the classic TWC I suppose.

Brandon Arnold · 18 Oct 2015

Qlex said:

If I'm not mistaken, they have a pretty powerful algorithm made by one of the people (Trey) that recognizes the playfield. This way it can actually display any kind of info with the best possible quality, because they know all the necessary information in the game. They could even change the shape of the blocks and whatnot, but they decided to make it classic, because they're the classic TWC I suppose.

Huh. I wonder if The Tetris Company is okay with people doing that. I mean, normal people, not the CTWC crew...

-Brandon

Sumez · 23 Oct 2015

Curious how they do it - not so much the algorithm itself, that stuff seems pretty simple to me, but the actual implementation.
Is it a plugin to OBS, or are they just running an external program that's actually able to read the captured image?

t2k · 6 Nov 2018

Hey guys, I developed the software used to produce the CTWC 2015 stream - glad to see the appreciation for it here.

It was built using a modular/node-based programming platform called "Salvation", which I have been developing for the past 15 years (in C++). In that time I've built up a large collection of "modules" in Salvation which handle things like live video capture, image loading, video file loading, output to OpenGL displays, processing with OpenGL GLSL shaders, and a ton of math/control value manipulations. Salvation is also the core/engine behind the "Ai" media server sold by Avolites Media out of the UK - http://www.avolites.com/products/video/media-servers/ex8

All of the additions specific to processing live video from NES Tetris were added in the months leading up to the competition. There are no pixels from the NES that end up in the final composited output - after analysis, all of the graphics and text are regenerated/drawn in OpenGL at varying sizes, depending on the layout (all 4 players or 1 on 1).

In the planning of the tournament we did submit screenshots and demo videos of the layout to Blue Planet and got their approval for it.

I could go into way more detail but I don't know how much you guys want to know =) if you have more questions feel free to ask.

Brandon Arnold · 28 Oct 2015

Love it! Nice work. Thanks for the info. This is a lot of the same stuff I'm getting started on for TGM. Salvation, the platform your Tetris image analysis stuff is written in, is pretty much all proprietary, right?

I'm working on a way to record moves and both display them and perform analysis. My prelim research is to use C++ libraries libVLC or libUVC for capturing the frames (from a UVC stream or a video) and CImg for analyzing the video frames. I've also got a good start on a TGM engine in Scala to calculate the best next move, but I need data to work with, hence the first step of being able to record the moves.

Anything you can give me to help?

-Brandon

t2k · 6 Nov 2018

Yeah it's all proprietary. I'm not familiar with the libraries you mentioned but any open source thing that simplifies the process of accessing live frames is a good idea. In my software (which runs primarily on Windows) I use DirectShow for live video input, which works well enough with the capture thingy I used (Hauppauge Live USB 2) although I did have to add a bit of code to do annoying crap in DirectShow 'graph' land like adding crossbars.. or something, I thankfully don't remember that well hah.

I don't have any code I can share with you but once you get the capture and display stuff going the analysis should be much more interesting to write. TGM I believe has various backgrounds and extracting the game field from the background there may be complicated.. Looking forward to seeing what you come up with, post videos of your progress somewhere =)

zid · 28 Oct 2015

t2k said:

Yeah it's all proprietary. I'm not familiar with the libraries you mentioned but any open source thing that simplifies the process of accessing live frames is a good idea. In my software (which runs primarily on Windows) I use DirectShow for live video input, which works well enough with the capture thingy I used (Hauppauge Live USB 2) although I did have to add a bit of code to do annoying crap in DirectShow 'graph' land like adding crossbars.. or something, I thankfully don't remember that well hah.

I don't have any code I can share with you but once you get the capture and display stuff going the analysis should be much more interesting to write. The nice thing about NES Tetris is that the game is on a black background. TGM I believe has various backgrounds and extracting the game field from the background there may be more complicated.. I'm able to look at a single pixel per block and figure out which block type it is (there are 3 possible blocks in NES Tetris, and the colors change with each level). I imagine you'd probably need to scan a bit more than 1 pixel to be sure you're actually detecting a block, and not falsely detecting something from the background. Looking forward to seeing what you come up with, post videos of your progress somewhere =)

As I mentioned to steadshot, for TGM it'd probably be easiest to rescale the image such that each block ends up as 1 pixel in the resulting image. Then the image processing code remains fixed and simple, and the mess and guts live in the scaler code.
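
A minimal sketch of that rescaling step, assuming the playfield has already been cropped out of the capture and that its size is an exact multiple of the grid. The function name, the frame layout (a list of rows of RGB tuples) and the plain box filter are illustrative assumptions, not anything from an existing tool.

```python
def downscale_to_cells(frame, cols, rows):
    """Box-filter a cropped playfield so each cell becomes one RGB pixel.

    frame: list of rows, each a list of (r, g, b) tuples (assumed layout).
    The frame size is assumed to be an exact multiple of cols x rows.
    """
    cell_h = len(frame) // rows
    cell_w = len(frame[0]) // cols
    pixels_per_cell = cell_h * cell_w
    out = []
    for cy in range(rows):
        out_row = []
        for cx in range(cols):
            sums = [0, 0, 0]
            # Sum every pixel that falls inside this cell.
            for y in range(cy * cell_h, (cy + 1) * cell_h):
                for x in range(cx * cell_w, (cx + 1) * cell_w):
                    for channel in range(3):
                        sums[channel] += frame[y][x][channel]
            out_row.append(tuple(s // pixels_per_cell for s in sums))
        out.append(out_row)
    return out
```

Any other scaler could be swapped in here; the code that classifies the resulting one-pixel-per-block image would not need to change.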

t2k · 28 Oct 2015

zid said:

As I mentioned to steadshot, for TGM it'd probably be easiest to rescale the image such that each block ends up as 1 pixel in the resulting image. Then the image processing code remains fixed and simple, and the mess and guts live in the scaler code.

That seems like it should work if the colors you are matching against vary by an amount that is greater than the error that will be introduced when the edges of neighboring blocks get blurred together during the rescale process. If I were writing this, my gut tells me to do it the same way I did the NES analyzer, but read a chunk of pixels at the block center and take the average value, rather than reading just a single pixel.
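
Roughly what that could look like, as a sketch only: average a small patch at the center of each cell, then pick the nearest reference color. The function names, the patch radius, the palette values and the distance cutoff are all made up for illustration; this is not the actual CTWC analyzer code.

```python
def center_patch_average(frame, cx, cy, cell_w, cell_h, radius=1):
    """Average a (2*radius+1)^2 patch at the center of grid cell (cx, cy).

    frame: list of rows of (r, g, b) tuples (assumed layout).
    The patch is assumed to fit inside the cell, away from its edges.
    """
    mid_x = cx * cell_w + cell_w // 2
    mid_y = cy * cell_h + cell_h // 2
    sums = [0, 0, 0]
    count = 0
    for y in range(mid_y - radius, mid_y + radius + 1):
        for x in range(mid_x - radius, mid_x + radius + 1):
            for channel in range(3):
                sums[channel] += frame[y][x][channel]
            count += 1
    return tuple(s // count for s in sums)


def classify(color, palette, max_distance=60):
    """Return the palette name closest to color, or None if nothing is close."""
    best_name = None
    best_dist = None
    for name, ref in palette.items():
        dist = sum((a - b) ** 2 for a, b in zip(color, ref)) ** 0.5
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_name
    return None
```

Because only the center of the cell is read, pixels from neighboring blocks never enter the average, which is the difference from a whole-image rescale.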

zid · 28 Oct 2015

t2k said:

That seems like it should work if the colors you are matching against vary by an amount that is greater than the error that will be introduced when the edges of neighboring blocks get blurred together during the rescale process. If I were writing this, my gut tells me to do it the same way I did the NES analyzer, but read a chunk of pixels at the block center and take the average value, rather than reading just a single pixel.

But you are reading a chunk of pixels and taking an average, that's what the rescaler does. Except now you can use any rescaler method you like, and the rest of the code stays the same, it's just an architectural trick.

t2k · 28 Oct 2015

zid said:

But you are reading a chunk of pixels and taking an average, that's what the rescaler does. Except now you can use any rescaler method you like, and the rest of the code stays the same, it's just an architectural trick.

A rescaler will blend the edges of your blocks together. Go ahead and do it however you want and let me know how it works, I'm just telling you how I'd do it =)

Qlex · 29 Oct 2015

Thank you for your input. Was about to get to you after CTWC/your maxout video, but I guess I can directly speak here!

As a matter of fact, some other people are working or have worked on a way to capture the field. The TGM1 problem came up, but it's not really problematic since the background is much darker than the actual pieces (with the exception of the dark blue J, sometimes), so it's still possible to do some math here. Also, it is TGM1-specific, TAP and TI don't have backgrounds within the play field, which eases the following problem a bit :

Some of us are mainly considering converting an invisible Tetris session into an actually visible one. The best shots that we have at it are looking at the piece flashing (this happens right when it gets locked), and looking at the color of the blocks when they clear, among other things.

One huge problem with looking at pixels : line clearing doesn't have a nice animation at all. In TGM1, the blocks pop out and fall to the bottom. In TAP and TI, the blocks explode and you have pixels corresponding to their colors for a brief moment. In some potentially fast downstacking sequences (like a center well in Shirase/Death) there could be "dead" pixels to look at which would make the recognition very confused. In @steadshot's (pretty good) algorithm there is a threshold that generally takes care of that, but we didn't get into limit cases like the crazy downstacking I talked about. Do you happen to have an idea about cases like these?

Also, on a related note, I was wondering how you capture text. This one is a tough cookie, because it's pretty difficult to differentiate fast enough, say, an S8 from an S7. A lot of pixels need to be looked at live, and sometimes that even gets conflicted by the line clear pixels. Jago (@K) did a lot of work here, but even then there are limits to the model because of having to know which pixels to make the algorithm look at. I can imagine there are fewer problems in your case, but still they're not very big numbers so they're hard to differentiate. Which solution did you come up with?

Last question (maybe more to come) : With the signal you're receiving, is every block located at the exact same pixel position? Do you need to calibrate?

Thanks in advance for your answers, CTWC was a very entertaining watch and the layout was pretty fantastic

Brandon Arnold · 29 Oct 2015

Qlex said:

One huge problem with looking at pixels : line clearing doesn't have a nice animation at all. In TGM1, the blocks pop out and fall to the bottom. In TAP and TI, the blocks explode and you have pixels corresponding to their colors for a brief moment. In some potentially fast downstacking sequences (like a center well in Shirase/Death) there could be "dead" pixels to look at which would make the recognition very confused. In @steadshot's (pretty good) algorithm there is a threshold that generally takes care of that, but we didn't get into limit cases like the crazy downstacking I talked about. Do you happen to have an idea about cases like these?

Hey there, Qlex--way cool. This is where I think video far exceeds static images. If there is a lot of variation in the past 10 frames of the playfield, you can reject all of them, until you get (say) five frames that have very little variation (during the piece spawn delay).
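
Here is a small sketch of that "wait for quiet frames" idea. It assumes each captured frame has already been reduced to a grid of cell labels; the class name, the five-frame default and the changed-cell tolerance are placeholders to tune, not tested values.

```python
from collections import deque


class StableFrameFilter:
    """Only report a playfield once it has stayed (nearly) unchanged for a while."""

    def __init__(self, needed=5, max_changed_cells=0):
        self.needed = needed
        self.max_changed_cells = max_changed_cells
        self.recent = deque(maxlen=needed)

    def push(self, grid):
        """grid: tuple of rows of cell labels. Returns a stable grid or None."""
        if self.recent and self._diff(self.recent[-1], grid) > self.max_changed_cells:
            # Too much variation: throw away everything seen so far.
            self.recent.clear()
        self.recent.append(grid)
        if len(self.recent) == self.needed:
            return grid
        return None

    @staticmethod
    def _diff(a, b):
        """Count the cells that differ between two grids of equal size."""
        return sum(
            1
            for row_a, row_b in zip(a, b)
            for cell_a, cell_b in zip(row_a, row_b)
            if cell_a != cell_b
        )
```

Feeding it one analyzed grid per frame, anything it returns can be treated as a settled playfield, and `None` means "still animating, keep waiting".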

Qlex said:

Also, on a related note, I was wondering how you capture text. This one is a tough cookie, because it's pretty difficult to differentiate fast enough, say, an S8 from an S7. A lot of pixels need to be looked at live, and sometimes that even gets conflicted by the line clear pixels.

I have been thinking of using something like Tesseract for the timer OCR. For bigger text with fewer variations like the rank (S9, m, etc), you can take a monochrome screen grab of those cropped letters and process them as a "convolution" matrix on the real-time feed.
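
For the rank case, a rough sketch of what I mean, simplified to template matching at one fixed position (a per-pixel agreement score rather than a full sliding convolution). It assumes the rank area is cropped to the same size as the stored templates and already binarized to 0/1; the function names, the tiny templates in the usage note and the score cutoff are made up for illustration.

```python
def match_score(patch, template):
    """Fraction of pixels where a binarized patch agrees with a template."""
    total = 0
    agree = 0
    for row_p, row_t in zip(patch, template):
        for p, t in zip(row_p, row_t):
            total += 1
            if p == t:
                agree += 1
    return agree / total


def recognize(patch, templates, min_score=0.9):
    """Return the best-matching template name, or None if the match is weak."""
    best_name = None
    best_score = 0.0
    for name, template in templates.items():
        score = match_score(patch, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None
```

With one stored template per rank (say `"S7"` and `"S8"`), `recognize(crop, templates)` returns the closest name, or `None` when something like a line clear explosion is covering the text.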

Also interested to hear Trey's perspective.

-Brandon

Muf · 29 Oct 2015

Moving to R&D.

steadshot · 29 Oct 2015

Qlex said:

Some of us are mainly considering converting an invisible Tetris session into an actually visible one. The best shots that we have at it are looking at the piece flashing (this happens right when it gets locked), and looking at the color of the blocks when they clear, among other things.

One huge problem with looking at pixels : line clearing doesn't have a nice animation at all. In TGM1, the blocks pop out and fall to the bottom. In TAP and TI, the blocks explode and you have pixels corresponding to their colors for a brief moment. In some potentially fast downstacking sequences (like a center well in Shirase/Death) there could be "dead" pixels to look at which would make the recognition very confused. In @steadshot's (pretty good) algorithm there is a threshold that generally takes care of that, but we didn't get into limit cases like the crazy downstacking I talked about. Do you happen to have an idea about cases like these?

The method I've thought up so far (but which I haven't had the time to implement yet) works something like this:
You keep a history of the last few frames and you also keep track of the level counter and the next piece. As soon as the level counter changes (either a line clear or just the standard increment by 1) you check those last few frames for the first "usable" one (meaning the algorithm detects the piece and doesn't mistake the locking J for background or whatever). This way you avoid the fireworks from the line clear and you're also not forced to look for the lock flash that might not be there at all if the recording dropped these frames somehow. If it's a line clear, ignore the following increment of the level counter, it's just the next piece coming in. Fast combos aren't hard to deal with, either; when you know both the type of line clear (single, double, triple, tetris) and the piece that caused that line clear, you automatically know the placement, because there's only one possibility. A line clear followed by a very fast placement is the only potential problem, but I can think of a few ways to interpolate that placement regardless, e.g. by subtracting the playfield a few frames after the placement (but still before the placement after that, so the explosion will have faded away) from the playfield after the line clear. During level stops you could track changes in the block count.
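
A bare-bones sketch of the bookkeeping part, since it isn't implemented yet. Everything here is an assumption: the class name, the history length, searching the history newest-first, clearing it after each recorded placement, and the caller supplying a `usable` flag (the analyzer found the piece) and a `line_clear` flag (the counter jumped because of a clear).

```python
from collections import deque


class PlacementRecorder:
    """Record one playfield snapshot per piece, triggered by level counter changes."""

    def __init__(self, history=8):
        self.frames = deque(maxlen=history)  # (grid, usable) pairs, oldest first
        self.last_level = None
        self.skip_next = False
        self.placements = []

    def push(self, level, grid, usable, line_clear=False):
        changed = self.last_level is not None and level != self.last_level
        if changed:
            if self.skip_next:
                # This increment is just the next piece spawning after a clear.
                self.skip_next = False
            else:
                self._record_latest_usable()
                self.skip_next = line_clear
        self.frames.append((grid, usable))
        self.last_level = level

    def _record_latest_usable(self):
        # Walk the history from newest to oldest and keep the first usable frame.
        for old_grid, usable in reversed(self.frames):
            if usable:
                self.placements.append(old_grid)
                break
        self.frames.clear()
```

The interpolation for a very fast placement right after a line clear, and the block-count tracking during level stops, are left out of this sketch.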

Qlex said:

Also, on a related note, I was wondering how you capture text. This one is a tough cookie, because it's pretty difficult to differentiate fast enough, say, an S8 from an S7. A lot of pixels need to be looked at live, and sometimes that even gets conflicted by the line clear pixels. Jago (@K) did a lot of work here, but even then there are limits to the model because of having to know which pixels to make the algorithm look at. I can imagine there are fewer problems in your case, but still they're not very big numbers so they're hard to differentiate. Which solution did you come up with?

As Brandon pointed out this is not particularly difficult, either. My idea is to extract the contours of each digit and then match them with pre-calculated contours (within a certain tolerance) and I think it would easily work fast enough for live streaming, but I'm curious how quick using Tesseract would be.

t2k · 6 Nov 2018

Regarding the fast downstacking question, I think I am not familiar enough with TGM1 to offer any useful advice.. if you point me to a relevant video though I'd take a look.

Thinking more generally, if there is something characteristic about the piece lock down mechanism, you could use that to maintain your own internal representation of the board, and simply process line clears yourself when you detect that lines have been completed. So rather than scanning the field, you just keep track of every piece placement and maintain your own board. Looking at this video, it seems like detecting that white flash would actually be pretty easy.. if that flash always occurs when a piece is placed, this approach could probably be quite reliable. When you detect that piece placements have led to a line clear, you could just turn your detection algorithm off for the appropriate number of frames so that the line clear animation doesn't confuse you... update your internal representation of the board, and carry on. For the actual piece-in-play, you would still need to scan the field but that should be easy enough.
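
To make the "maintain your own board" idea concrete, here is a minimal sketch. The class and method names are hypothetical, the 10x20 default size is an assumption, and it only covers the bookkeeping: detecting the lock flash and finding the locked cells is left to the capture side.

```python
class Board:
    """Internal playfield model, updated from detected piece placements."""

    def __init__(self, cols=10, rows=20):
        self.cols = cols
        self.rows = [[None] * cols for _ in range(rows)]  # row 0 is the top

    def lock(self, cells, color):
        """Place a piece and clear any completed lines.

        cells: iterable of (x, y) positions of the locked piece.
        Returns the number of lines cleared, which could be used to decide
        how many frames to pause detection during the clear animation.
        """
        for x, y in cells:
            self.rows[y][x] = color
        # Keep only rows that still have at least one empty cell.
        kept = [row for row in self.rows if any(cell is None for cell in row)]
        cleared = len(self.rows) - len(kept)
        # Refill from the top so everything above a cleared line drops down.
        self.rows = [[None] * self.cols for _ in range(cleared)] + kept
        return cleared
```

Each detected placement becomes one `lock()` call, and the return value says how many lines the model just cleared without ever looking at the animation.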

Brandon Arnold · 30 Oct 2015

Great information, Trey. Thank you. I won't even try the Tesseract route, then--the numbers and ranks are all we're interested in, which even simplifies this a bit.

-Brandon

Qlex · 1 Nov 2015

Yeah, I'm considering using a kd-tree to store the data and optimize the character recognition algorithm, but from what Jago had said it looks pretty slow to try comparing 20x20 pixels. I'll give it a thorough try!

That was an example of a more extreme case where pixels can be a problem. I thought of looking at neighboring pixels or just taking more pixels into account for the average. Also this is a very extreme case, most of the time it actually is okay

And yeah knowing the state of the field beforehand definitely helps!

t2k · 6 Nov 2018

Hmmm interesting video, I see how it goes very fast and would be hard to detect... the lockdown white flash seems detectable but it doesn't actually do that when you do a line clear. Unfortunately I couldn't frame advance through the youtube video to see exactly what's going on during the line clear but if you want to send me a .mov/.mp4/something I can view offline I'll take a look.

Qlex · 6 Nov 2015

I'll try and give you a video when I get home, but you are indeed correct: during a line clear in TAP, it seems that the blocks of the piece that are not part of the cleared lines flash, while the others are used in the line clear sequence and don't flash. If all the blocks are used (for example in a tetris) there is no flash at all. I should also look up what happens in TGM3.

The only thing that can be useful is the line clear animation : You can distinguish some of the blocks according to their colors (but only one block out of two is used for the animation, it makes a little checker pattern and the rest of the blocks simply disappear)

I'll look into it in more detail. In any case the piece of advice concerning scaling the font can be pretty useful, but I don't know of any algorithms that can do the job. Are they generally fast?

EDIT : A high-quality video I can give off the top of my head : https://www.dropbox.com/s/urp7w5vulgsr6eb/sick_death_recovery_gm.flv?dl=0

... I don't mean to brag, it was the only Death video I had of good quality.
