Strategy: More on Playing Forever.

Thread in 'Discussion' started by Zaphod77, 18 Dec 2014.

  1. Zaphod77

    Zaphod77 Resident Misinformer

  2. I think it might be worth mentioning that it should not be too difficult to obtain a large number of
    partial loops (using exhaustive methods). It is something that should be very interesting to do
    as a project but it doesn't seem like I would have enough free time to do it for quite a while. So I
    might as well describe it here.

    I will try to keep the description very brief and leave large amount of details. The reason for
    leaving some of the details are:
    -- some are simply not relevant to the specific problem
    -- some I have forgotten. I could re-call them but would require sitting down and spelling out the
    details in writing (as I never made any detailed notes about it)
    -- some details might be suited for a later post as this one is already going to be way too long
    -- many of the things I don't know

    The basic idea is that any finite memory version of a game can ofc be described as a state diagram.
    Of course, that's how actually how games run on computers. We don't concern ourselves here with
    the description of the game (which may or may not be finite), but how it actually runs on a computer
    (which would have finite memory in "real" life). Secondly, we shall concern ourselves with the non-deterministic variant of a finite memory game. That is, a unique action could take us into any of a set of states (rather than a unique state).

    Now being done with this there are two main variants one could consider:
    1) games with 'WIN' and 'LOSE' states
    2) games with just a 'LOSE' state
    The first category is more traditional whereas the second category corresponds to "endless modes"
    style of play. It is the second category which we are concerned with here.

    One could then consider some further sub-variations (based on classification of states) in this, but I
    will describe one for each.
    1)
    Terminal States: WIN, LOSE (I capitalized the terminal states to distinguish them from other states
    with same or similar names). We assume that there is always some path to both the WIN and LOSE states
    from any in-game state.

    Ingame States: W (win states), WL (win-lose states)
    W-states describes positions where there exists a strategy that would always guarantee us to win the game
    WL-states describe positions from where we might or might-not win. That is, at least a good strategy will give you some chance to win (depending on non-deterministic behavior).


    There are two "kinds" of states that I have "eaten" up here. One is:
    -- lose states (written as L) where you lose for sure no matter what you do. We just sort of assume this state
    has been amalgamated with LOSE state(terminal one).
    -- neutral states (written as N) where you are stuck. That is, both the WIN and LOSE states have become
    unreachable from here.

    Since we assume a path to both WIN and LOSE states from any in-game states, that's why these states
    won't appear in our game.

    2) These are games which we are concerned with here.
    Terminal States: LOSE

    In-game States: O (orbit state), OL (orbit-lose state)

    Once again we have "eaten" up a kind of state here. That is:
    -- lost state(written as L) -- description given before

    Note that there is no "stuck" state that has been "eaten" here (because that falls into category
    O-orbit state). In general though, we could assume (if we want to) that we always have a path to the
    terminal LOSE state.
    ==========

    Now having described all this in detail we come to the topic why we can't always do this sort of thing
    in practice. Of course, the answer is combinatorial explosion (as with the practical solutions of
    many finite problems). This is a topic I know very little about. However, I am still reasonably
    confident that we should be able to large number of partial solutions to randomizers that are easier
    to play (like 1-bag, 2-bag, and other relatively easy-to-play randomizers). This is something that
    I will try to justify using qualitative descriptions (towards the end of post).

    First some technical stuff about how to actually find all these kind of states that I have been
    talking about.
    1) For the first type of games, how we can give specific well-defined criteria of when a specific state
    is a W-state or WL-state.

    Basically how it works it is that we try all possible sub-sets of the game-states(in-game states). Ofc
    the number of sub-sets are exponential w.r.t. number of game-states.

    A given sub-set of game states-A can be called a Winning Set when:
    -- There exists some choice of inputs, one (or more) action/input for each state (that is for every element of A), that will take us to "some subset of A" UNION {WIN}.

    The main point is:
    -- All members of a winning set are W-states(from where there exists a guaranteed strategy to win).

    Now consider a set of game states-B that we know for sure to be a winning set. Consider some
    specific state(say s). The state "s" is a W-state (or alternatively "B UNION {s}" is a winning set)
    when:
    -- There exists some input/action for state s that will take us to a subset of "B UNION {s}".

    Essentially when we know some states to be W-states, we keep trying more and more states to find
    whether they are W-states. We could also specify criterion for WL-states too (which I leave out for
    this post).

    Our final goal is to find all the W-states and WL-states.

    2) Coming back to these types of games:
    We try all possible sub-sets of the game-states(in-game states).
    A given sub-set of game states-A can be called a Orbit Set when:
    -- There exists some choice of inputs, one (or more) action/input for each state (that is for every element of A), that will take us to a sub-set of "A".
    -- All members of an Orbit set are O-states(from where there exists a guaranteed strategy to loop forever)

    Now consider a set of game states-B that we know for sure to be a Orbit Set. Consider some
    specific state(say s). The state "s" is a O-state (or alternatively "B UNION {s}" is an Orbit Set)
    when:
    -- There exists some input/action for state s that will take us to a subset of "B UNION {s}".

    Essentially when we know some states to be O-states, we keep trying more and more states to find
    whether they are O-states. We could also specify criterion for OL-states too (which I leave out for
    this post).

    Our final goal is to find all the O-states and OL-states.

    ==========

    Now finally trying to comeback to our original problem of loops, and whether we can find many of
    them in a not so unreasonable time for easier randomizers.

    We first specifiy some specific rules to make things more concrete. For example, something like:
    -- 1 preview ahead
    -- no softdrop, only direct harddrop (to make things simpler)
    -- have some simple mapping between action and tetromino rotation+placement

    We begin from the empty stack and keep adding states as we encounter more and more blocks. States that
    lead to LOSE or are close to lose (based on backtracking) are checked for L-states and OL-states to
    get some initial points.

    Opening exhaustively (based on min distance from start) might not be a good idea (given the ridiculous number
    of total states reachable from start). A better idea seems to be open the points that are better
    poised to loop first (see below). On another extreme open points that are better and better poised
    to lead to OL-states and L-states (the ones for which gap increase almost linearly with number of blocks
    placed).

    Now the main point of finding O-states. Well, the thing is that more of orbit states we know at a given
    point we are in a better position than when we know less. For cases like easy randomizers, since we know
    O-states should exist, we try for them in more obvious places first.

    We roughly categorize how good our stack is at some point by some kind of a number (to give us some heuristic guidance). One would note that
    one very good number for our case could be the number of gaps. Any closed shapes (inside stack) count
    as a gap. Similarly shapes that are unreachable via simple harddrop count as gap(even when it isn't
    directly closed by stack). For something as simple as 1-bag, most of the game can be played in 0,1,2 gaps etc.
    For 1-bag, we could further take advantage by putting in the known O-states as hardwired.
    Similarly states that correspond to high number of gaps are better candidates to select for L and OL.

    Other obvious checks also need to be in place. For example, starting from start state(and giving it number 0)
    and then giving the set of (new) states discovered in successive iterations 1,2,3, etc. It is obviously
    useless trying any sub-set of states that are from 1 and 3 as Orbit States.

    Even finding partial solutions will still be taxing. But if you did a continuous computation
    (that could be resumed) with a well-made heuristic and all the basic checks, partial solutions would start
    turning up. I am not certain about the TOTAL number of reachable states though. Whether they would
    just be very very high or just prohibitively large. Finding OL and L points seems of some interest
    if one knows that reachable states are not prohibitively large (which they might be).

    On that point, how should a loop solution be organized. I think a good way would be that
    once a user starts the program, at each turn it asks them to select a block (allowed by rule of randomizer).
    Then the program shows all the highlighted position/rotations to user for those parts where it has confirmed
    a loop. The user selects a particular input corresponding to a specific placement and then the program
    proceeds again asking them to select a block.
    Or possibly, the user could ask the program to show them some random loop.

    Edit: Corrected a mistake and underlined it.
     
    Last edited: 19 Jan 2015
  3. Looking back at the previous post it seems the description is still too vague because I didn't clarify quite a few things like:
    -- how to precisely define a non-deterministic finite memory game
    -- the role of transitions, their relations to states
    -- a "complete" description of a specific category (like the second one in the last post)
    -- the lack of clarification of implicit point that the descriptions are descriptive, not prescriptive. Ofc when one understands the descriptions various algorithms can be made to find the points of a specific type (or possibly a complete solution for smaller games). Undoubtedly some of the algorithms will be better than others in terms of efficiency.

    I will do this later. But a few asides first before I try to describe these.

    Keeping these two points in mind(that are specific to the problem) we note that we are withholding some states(that is in terms of min distance from start) and don't open them up (initially, before we can find other O-states). That is only open states that would lead to say 0 gaps (with 1-bag it seems fairly obvious, unless you start with S,Z,O you would find some way to loop with 0 gaps .... even with 2-bag I think it seems highly plausible -- excluding some initial tetromino strings).

    If this type of opening also gets too taxing, we could give further priority based upon positioning. Like suppose that stack is in a given state with 0 gaps. A particular tetromino (say I) is placed and with both rotational positions of I we can place it successfully (without creating a further gap). But the number of positions can in general be quite high. For example, with I vertical there are going to be like 10 positions, with I horizontal maximum of 7 positions (considering these positions will be multiplied with each tetromino this can get quite high). What we could do is ignore most of these positions initially and try to clear more lines first.
    Maybe the height of stack should be brought in here? If the stack is kept roughly leveled (small deviation from average height) a loop has a higher chance of occurring in less turns. Also, when to clear a line would be a consideration (that is, what priority to give to it). Presumably giving it a high priority is also important because we wanna clear as soon as chance arrives (clearing decreases the stack and hence increases the chance of repetition).

    Others here can have probably have more refined heuristics than this.

    Also, it is worth noting that once you find some Orbit states, the program has to go back and try to do it for more and more states. So the program has to be fairly sophisticated in that sense that it should remember which states were not expanded upon.
    Continuing of calculation(upon any disruption) is another point. We have to save the whole structure before we disrupt the calculation(or something similar). When the calculation is resumed, the resuming also has to be done correctly.

    ==========
    ===Not Related to Specific Topic===
    For some points that are not related to specific topic but might be worth mentioning for general interest (when I use the word "game" I mean short-hand for non-deterministic state based game). I will keep this very brief.
    -- Perfect or Imperfect information can be captured fairly well in the abstraction. Imperfect information is when given a specific string of inputs from start the player has no way (in principle) to determine his state in the game.
    Roughly, every game with imperfect information can be converted (algorithmically) into an "equivalent" one with perfect information. The cost of conversion is probably quite high.

    -- For games that involve continuous quantities, this kind of analysis would be quite forced (and usually not practical to do in real life even for simplest games), while still being technically correct (because of finite memory). For example, any kind of games that involve distance.

    -- Multiplayer games can also be encapsulated in the same model (with different games from pov of each player). Then the non-deterministic behavior that occurs for one player is determined both by the game and the (other player). The model could miss specific subtle points that depend on luck or mind games between players (or relate to execution). But the W-states(winning points), if they do exist for one player, would not be affected by these.

    -- Probabilities can be introduced (and in fact make lot of sense for randomizers like in tetris). However, I haven't thought about their analysis much. They could (potentially) help gain more information.
     
    Last edited: 19 Jan 2015

Share This Page