Only 30 lines of Python code are used to demonstrate the X algorithm

  • 2020-05-05 11:25:04
  • OfStack

If you're interested in singular logarithm solutions, you've probably heard of the exact coverage problem. Given the set Y of X and X subsets, there exists a subset Y* of Y, making Y* constitute a division of X.

Here's an example written by Python.
 


X = {1, 2, 3, 4, 5, 6, 7}
Y = {
  'A': [1, 4, 7],
  'B': [1, 4],
  'C': [4, 5, 7],
  'D': [3, 5, 6],
  'E': [2, 3, 6, 7],
  'F': [2, 7]}

The only solution to this example is ['B', 'D', 'F'].

The exact coverage problem is NP complete. The X algorithm was invented and implemented by Daniel gartner. He came up with an efficient implementation technique called dance chain, which USES a bidirectional linked list to represent the matrix of the problem.

However, the dance chain can be quite cumbersome to implement and difficult to write correctly. Now it's time to show off the Python magic! One day I decided to use Python to write the X algorithm, and I came up with an interesting variation of the dance chain.
algorithm

The main idea is to use dictionaries instead of bidirectional linked lists to represent matrices. We already have Y. From it we can quickly access the column elements of each row. Now we also need to generate a reverse table of rows, in other words, quickly access the row elements from the column. For this purpose, we convert X into a dictionary. In the example above, it should be written as
 


X = {
  1: {'A', 'B'},
  2: {'E', 'F'},
  3: {'D', 'E'},
  4: {'A', 'B', 'C'},
  5: {'C', 'D'},
  6: {'D', 'E'},
  7: {'A', 'C', 'E', 'F'}}

Sharp-eyed readers will notice that this is slightly different from Y. In fact, we need to be able to quickly delete and add rows to each column, which is why we use collections. On the other hand, gartner didn't mention this, and in fact all the rows in the whole algorithm remain the same.

Here is the code for the algorithm.
 


def solve(X, Y, solution=[]):
  if not X:
    yield list(solution)
  else:
    c = min(X, key=lambda c: len(X[c]))
    for r in list(X[c]):
      solution.append(r)
      cols = select(X, Y, r)
      for s in solve(X, Y, solution):
        yield s
      deselect(X, Y, r, cols)
      solution.pop()
 
def select(X, Y, r):
  cols = []
  for j in Y[r]:
    for i in X[j]:
      for k in Y[i]:
        if k != j:
          X[k].remove(i)
    cols.append(X.pop(j))
  return cols
 
def deselect(X, Y, r, cols):
  for j in reversed(Y[r]):
    X[j] = cols.pop()
    for i in X[j]:
      for k in Y[i]:
        if k != j:
          X[k].add(i)

It's really only 30 lines!
format input

Before we can solve the actual problem, we need to convert the input to the format described above. You can do
as easily as this


X = {j: set(filter(lambda i: j in Y[i], Y)) for j in X}

But it's too slow. If the size of X is m and Y is n, the number of iterations is m*n. The sudoku grid in this example is N in size, so that takes N to the fifth power. We have a better way.
 


X = {j: set() for j in X}
for i in Y:
  for j in Y[i]:
    X[j].add(i)

This is still the complexity of O(m*n), but it's the worst case. On average, it performs much better because it doesn't need to traverse all of the whitespace bits. In the case of sudoku, the matrix has exactly four entries per row, regardless of size, so it has the complexity of N^3.
advantages

      simple: there is no need to construct complex data structures, all the structures used are provided by Python.       readability: the first example above was directly transcribed from the Wikipedia example!       flexibility: can be easily extended to solve sudoku.

solve sudoku

What we need to do is to describe sudoku as an exact coverage problem. Here is the complete sudoku solution code, which can handle any size, 3×3, 5×5, even 2×3, all code less than 100 lines, and contains doctest! Thank you Winfried Plappert and David Goodger for their comments and Suggestions


Related articles: