Fast and Precise: Adjusting Planning Horizon with

Adaptive Subgoal Search




Michał Zawalski*    Michał Tyrolski*   Konrad Czechowski*

Tomasz Odrzygóźdź    Damian Stachura    Piotr Piękos

Yuhuai Wu    Łukasz Kuciński    Piotr Miłoś



ICLR 2023

notable-top-5%

Abstract

Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism is employed to filter out unreachable subgoals swiftly, allowing the search to focus on feasible, more distant subgoals. In this way, AdaSubS combines the efficiency of planning with longer subgoals and the fine control offered by shorter ones, and thus scales well to difficult planning problems.

We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT.

Videos

General audience video (4')

Short technical presentation (8')

AdaSubS @Warsaw.AI (20')

Try Adaptive Subgoal Search yourself!

Play with each part of the algorithm in our walk-through colab.

Environments

Sokoban is a single-player game in which the player controls an agent whose goal is to push boxes onto target locations (scroll down to play). It is difficult due to its combinatorial complexity and the existence of irreversible states. Deciding whether a Sokoban board is solvable is an NP-hard problem.

Rubik’s Cube is a classical 3D combination puzzle. It is particularly challenging because the search space has more than 4.3 × 10¹⁹ configurations.


INT provides a generator of inequalities, which produces mathematical formulas along with their proofs. Proofs are sequences of consecutively applied mathematical axioms. An action in INT is a tuple containing an axiom and its input entities. The action space in this problem can be large, reaching up to 10⁶ elements, which significantly complicates planning.

Adaptive Subgoal Search

AdaSubS utilizes the following key components: subgoal generators, a verifier, a conditional low-level policy (CLLP), and a value function. These components are implemented as trained neural networks. AdaSubS iteratively builds a tree of subgoals reachable from the initial state until the target state is reached or the search budget is depleted. In each iteration, it chooses a node in the tree, which is then expanded by one of the generators. The chosen generator creates a few subgoal candidates, i.e., states expected to be a few steps closer to the target than the current node. For each of them, we use the verifier and the CLLP to check whether they are valid and reachable within a few steps. For the valid subgoals, we compute their values, place them in the search tree, and the next iteration follows.
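The expansion of a single node can be sketched as follows. This is a minimal illustration, not the authors' code: `generator`, `verify`, `cllp_reaches`, and `value` are hypothetical stand-ins for the trained k-generator, the verifier, the CLLP reachability check, and the value network.

```python
def expand(node, k, generator, verify, cllp_reaches, value):
    """Sketch of one AdaSubS expansion step: the k-generator proposes
    candidate subgoals, the verifier filters out unreachable ones quickly,
    the CLLP confirms reachability within a few low-level steps, and the
    surviving subgoals are scored with the value function."""
    accepted = []
    for subgoal in generator(node, k):       # candidates ~k steps closer
        if not verify(node, subgoal):        # swift validity filter
            continue
        if not cllp_reaches(node, subgoal):  # low-level reachability check
            continue
        accepted.append((value(subgoal), subgoal))
    return accepted
```

In the full algorithm, the accepted subgoals are inserted into the search tree and the queue, and the next iteration follows.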

Perhaps the most important design decisions are how to select the next node to be processed and the details of the expansion phase. To this end, AdaSubS uses a priority queue of states with keys being distance-value pairs (k, v), sorted in lexicographical order. After extracting the max node, we use the corresponding k-generator to create subgoals. For each valid subgoal s', we estimate its value v' and add s' to the queue once for each generator distance k', i.e., once under each key (k', v').

Using other queue orderings implements different adaptivity mechanisms. We discuss four possible options in Sec 4.4 of the paper.
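The outer loop with this lexicographic (k, v) ordering can be sketched as follows. All names here (`adasubs_loop`, `expand`, `value`) are illustrative placeholders, not the authors' implementation; `expand(s, k)` stands for the k-generator followed by the verifier/CLLP filtering described above.

```python
import heapq

def adasubs_loop(start, is_target, distances, expand, value, budget):
    """Sketch of the AdaSubS outer loop. The queue is a max-priority queue
    over (k, v) compared lexicographically: longer subgoal distances are
    tried first, with ties broken by the value estimate."""
    # heapq is a min-heap, so both key components are negated.
    queue = [(-k, -value(start), start) for k in distances]
    heapq.heapify(queue)
    seen = {start}
    while queue and budget > 0:
        neg_k, _, state = heapq.heappop(queue)
        if is_target(state):
            return state
        for subgoal in expand(state, -neg_k):  # k-generator + filtering
            if subgoal in seen:
                continue
            seen.add(subgoal)
            budget -= 1
            v = value(subgoal)
            for k2 in distances:  # re-insert once per generator distance
                heapq.heappush(queue, (-k2, -v, subgoal))
    return None  # budget depleted without reaching the target
```

Because every subgoal is re-inserted once per distance, the search first tries to extend it with the longest generator and falls back to shorter ones only when the long jumps fail verification, which is the source of the adaptivity.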

Other variants of adaptive planners

There are a few natural ways to implement adaptivity, but not all work equally well. A detailed discussion and results of all methods we considered can be found in Appendix H in our paper.

Results

The two plots showcase the performance of three methods (AdaSubS, kSubS, and BestFS) as a function of the available budget, i.e., the size of the search graph. The success rates of these methods are evaluated in-distribution and out-of-distribution. The in-distribution performance (left figure) is the success rate on proofs of length 15 after training on the same length. The out-of-distribution performance (right figure) is measured on INT proofs of length 20, with training on proofs of length 15. The plots provide insight into the effectiveness of these three methods, especially AdaSubS, at both low budgets (~100) and high budgets (~1000).

The plot illustrates the out-of-distribution performance of AdaSubS and kSubS on long proofs in INT, with a fixed budget of 5000 nodes. The methods were trained on proofs of length 15 and evaluated on problems with proof lengths of up to 28. The success rates are shown on the bar plot with error bars corresponding to 95% confidence intervals. AdaSubS increasingly outperforms kSubS as proofs grow longer, and its performance decays more slowly. This highlights the strong OOD performance of AdaSubS.

Results in other domains

Adaptive Subgoal Search also masters Sokoban and the Rubik's Cube; check the main results in our paper. See also the OOD experiments in those environments, presented in Appendix K.

Explore subgoals 

Below, we provide an interactive visualization of precomputed subgoals, which mimics the traces of AdaSubS execution on several Sokoban boards. To make the visualization more diverse, for each board we present the 2-, 4-, and 8-step subgoals that are reachable (i.e., those verified by the low-level policy or the verifier). Subgoals for an arbitrary board can be generated in our colab.


Play Sokoban