Introduction
You may be familiar with permutations: a rearrangement of a sequence. If you are, you probably know that for a sequence of length \(n\), there are \(n!\) arrangements (if the elements are all distinct – duplicates will reduce the number of unique rearrangements). There are many, many uses of permutations and many specific permutations of interest.
The permutation I am interested in today is the sorted permutation: i.e., the one where the elements are in non-decreasing order. For any two arrangements (permutations) of a sequence, we can define a permutation function that maps the first sequence to the second sequence. In a few moments, I’m going to start referring to permutation-functions as permutations. It should be clear from context, but I’ll make a note if it is unclear. So, here’s an example. \(seq_1=[4,2]\) and \(seq_2=[2,4]\). To mutate \(seq_1 \rightarrow seq_2\), we have to move some elements around. Namely, using 0-based sequence indexing: \(seq_1[0] \rightarrow seq_1[1]\) and \(seq_1[1] \rightarrow seq_1[0]\). This can be expressed as a permutation-function (in terms of the sequence indices): \(\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\). The top row represents the source indices and the bottom row represents the target indices. Thus, the first column says, "move the item at position 0 to position 1". When we apply the entire permutation-function, we’re effectively doing \(seq_1[0] \leftrightarrow seq_1[1]\), and this is a very common task in introductory algorithms. You may recognize it as a swap procedure that updates the underlying sequence: swap(\(seq_1[0], seq_1[1]\)).
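To make the index bookkeeping concrete, here is a minimal sketch (my own illustration, not code we will reuse later) that applies a permutation written as {source: target} pairs by copying each element into its target slot:
# a sketch: apply a permutation, given as {source: target} pairs,
# by placing each element of seq1 at its target position in a new list
seq1 = [4, 2]
perm = {0: 1, 1: 0}        # the two columns of the 2-row notation above
out = [None] * len(seq1)
for src, tgt in perm.items():
    out[tgt] = seq1[src]
print out                  # [2, 4]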
I’m going to expand on these ideas and look at the relationship between permutation-functions, swapping, and sorting. Before we go down the rabbit hole, here’s a bit of motivation. Suppose (1) you have a sequence which you sort using an "ordinary" method and (2) you can easily extract the permutation-function that takes you from \(seq \rightarrow sorted(seq)\). Now, you have another data structure that needs to be modified (with the same permutation), and it can only be modified by pairwise swaps. If we can convert the permutation to a sequence of swaps, we can solve this problem nicely. That’s our goal. I’ll show a specific example where this arises at the end of the post (you might want to jump down to The Original Motivation to see if this motivation works for you).
One other note: an alternative to this use of permutations would be to perform the sorting on the primary data structure above and either (1) whenever that sort causes a swap (assuming an in-place sort), also perform an analogous swap in the secondary data structure, or (2) extract a list of swaps and then apply them to the secondary data structure. We are actually going to do something similar to (2), but we will extract the swaps needed to apply a permutation-function.
A Bit more on Permutations
Here’s another sequence and a permutation-function to sort it: \[\begin{align}
seq_1 &= [50,100,10] \\
sorted(seq_1) &= [10,50,100] \\
&\begin{pmatrix}{}
0 & 1 & 2 \\
1 & 2 & 0
\end{pmatrix}
\end{align}\]
We can think about what it means to apply the permutation to \(seq_1\). Here goes:
- \(seq_1[0] \leftrightarrow seq_1[1]\)
After we perform this, \(seq_1\) has been mutated to \(seq_1 = [100,50,10]\). Position 1 (with value 50) is in its final resting place. It doesn’t need to move anywhere. Position 0 used to have the value 50; now it is 100. Obviously (or maybe not!), performing this swap improved the ordering of \(seq_1\). But it invalidated our permutation-function in two ways: (1) we no longer need to "fix up" position 1 (i.e., we no longer need a target position of 1) and (2) we no longer want to move position 1 somewhere else (which is exactly what the second column says to do).
Let’s update the permutation-function so it can be applied to the rest of \(seq_1\) to sort it. The first thing we’ll do is remove the operation we just applied (we don’t want to do it again): \(\begin{pmatrix}{} 1 & 2 \\ 2 & 0 \end{pmatrix}\). Now, the first column says we want to move: \(seq_1[1] \rightarrow seq_1[2]\). But this can’t be right! Remember, \(seq_1[1]\) is in its final home. What went wrong? We moved what used to be at position 1 to position 0. So, what we really want to do is move \(seq_1[0] \rightarrow seq_1[2]\). Which means we want our permutation to be: \(\begin{pmatrix}{} 0 & 2 \\ 2 & 0 \end{pmatrix}\).
If we use src and tgt to refer to the first and second indices we used for swapping, what we said was: "after swapping at src and tgt, drop their column, then find tgt in the sources – the first row – and replace it with src." That is, instead of coming from tgt, we now come from src (since we swapped them).
We do the same process again with updated structures: \(seq_1 = [100,50,10]\) and \(\begin{pmatrix}{} 0 & 2 \\ 2 & 0 \end{pmatrix}\). You may have noticed that this particular permutation-function has a special structure. And it is related to the fact that if only two items in a list are out of place, you only need one swap to fix both of them (someone once mentioned something about two birds, one stone – but I digress). Let’s see what happens.
- \(seq_1[0] \leftrightarrow seq_1[2]\)
Now, \(seq_1 = [10,50,100]\). Our sequence is sorted, and we could just stop. But let’s fix the permutation anyway. First, we drop the initial column: \(\begin{pmatrix}{} 2 \\ 0 \end{pmatrix}\). Second, in the sources we locate tgt (2) and replace it with src (0). This gives the permutation: \(\begin{pmatrix}{} 0 \\ 0 \end{pmatrix}\). Now, since swapping position 0 with position 0 is a null operation (it has no effect), we don’t even have to apply it. Which gels with our intuition of only needing one swap to fix two (and only two) out-of-place elements.
Some Helper Code
import numpy as np
import itertools as it
def swap(lst, a, b):
    lst[a], lst[b] = lst[b], lst[a]
Apply a Permutation to a List (and use it to Sort)
Now, let’s encode in Python the steps we carried out by hand — since computers are designed for this sort of thing. We’ll make use of two helpers on the way to applying permutations. One is our representation of a permutation. We’ll use a dictionary, so that the permutation \(\begin{pmatrix}0 & 1 & 2 \\ 1 & 2 & 0 \end{pmatrix}\) is stored as the Python dictionary {0:1, 1:2, 2:0}. This is a natural data structure for an explicitly defined mathematical function, and it allows constant-time lookup of a key when we need to "fix it up". The second helper is the decorate-sort-undecorate pattern, which we use to find the sorting permutation. Here is how that works:
lst = [50, 100, 10]
valAndOrigin = sorted((j,i) for i,j in enumerate(lst)) # sort by lst value, carry the index with it
print valAndOrigin
sortedValues, origPos = zip(*valAndOrigin) # zip(*foo) says to "unzip" foo
print sortedValues
print origPos
# we can drop any null operations (i.e., src == tgt)
sortingPermutation = dict((j,i) for i,j in enumerate(origPos) if i != j)
print sortingPermutation
def extract_sorting_permutation(lst):
    _, sortingPerm = zip(*sorted((j,i) for i,j in enumerate(lst)))
    return dict((j, i) for i,j in enumerate(sortingPerm) if i != j)
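As a quick sanity check (my addition), the helper reproduces the permutation we wrote out by hand:
# should match the 2-row permutation (0 1 2 / 1 2 0) from the walkthrough
print extract_sorting_permutation([50, 100, 10])   # expected: {0: 1, 1: 2, 2: 0}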
Now, we can write the code to apply a permutation to a sequence (for us, the sequence is a Python list). One quick note: we explicitly test to see if source and target are the same — and if they are, we skip the update. You might ask, "how can that happen, other than in the last spot?" It turns out that the example permutations we used above had only one cycle. I’ll discuss cycles more below (actually, they are going to get a post of their own), but suffice it to say that each cycle eventually reduces to just two differing positions (one swap needed), and after that swap it leaves behind an identity entry. So, if there are many cycles in the permutation, even though we stop before applying the very last column (of the whole permutation), we will still encounter some identity swaps (null operations) at the ends of the earlier cycles.
def apply_permutation_to(perm, lst):
    src, tgt = perm.popitem()
    while perm:
        # only need test if we haven't decomposed permutation to cycles
        if tgt != src:
            swap(lst, tgt, src)
            # update permutation FROM {tgt:perm[tgt]} TO {src:perm[tgt]}
            perm[src] = perm[tgt]; del perm[tgt]
        src, tgt = perm.popitem()
    return lst
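A quick usage check (my addition): feeding the sorting permutation for [50, 100, 10] back into the list sorts it. Note that apply_permutation_to consumes the permutation dictionary as it works.
lst = [50, 100, 10]
perm = extract_sorting_permutation(lst)
print apply_permutation_to(perm, lst)   # [10, 50, 100]
print perm                              # the permutation has been consumed: {}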
With those pieces, we can sort using the round-about process of decorated sort \(\rightarrow\) permutation \(\rightarrow\) auxiliary sort.
def create_permutation_and_sort(lst):
    sortingPerm = extract_sorting_permutation(lst)
    if not sortingPerm:
        return lst
    return apply_permutation_to(sortingPerm, lst)
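A small illustration (mine): the if not sortingPerm guard matters because apply_permutation_to starts with a popitem, which would raise a KeyError on the empty permutation produced by an already-sorted input.
print create_permutation_and_sort([3, 1, 2])   # [1, 2, 3]
print create_permutation_and_sort([1, 2, 3])   # already sorted: empty permutation, list returned as-is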
And now we can test this by creating all permutations of lists from length 1 to length 8.
def test_perm_sort():
    MAX_LIST_SIZE = 8
    testLists = [range(2,(i+1)*2,2) for i in range(1,MAX_LIST_SIZE+1)]
    for lst in testLists:
        for testCase in it.permutations(lst, len(lst)):
            assert sorted(testCase) == create_permutation_and_sort(list(testCase))

try:
    test_perm_sort()
    print "passed"
except AssertionError:
    print "failed"
Given a Permutation, Convert it to a Series of Swaps (and use it to Sort)
Above, we discussed the need to extract a sequence of swaps from a permutation. It turns out that this is basically the same process as applying the permutation. The only difference is that instead of performing a swap, we record the swap.
def extract_swaps_from_permutation(perm):
    src, tgt = perm.popitem()
    swaps = []
    while perm:
        # print "swap: ", src, tgt
        if tgt != src: # only need test if we haven't decomposed perm to cycles
            swaps.append((tgt, src))
            # update permutation FROM {tgt:perm[tgt]} TO {src:perm[tgt]}
            perm[src] = perm[tgt]; del perm[tgt]
        src, tgt = perm.popitem()
    return swaps

def apply_swaps_to(swaps, lst):
    for s,t in swaps:
        swap(lst, s, t)
    return lst
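For a concrete example (my own), here is the swap list for a permutation made of two separate 2-cycles; each cycle needs exactly one swap, and its trailing identity entry is dropped by the tgt != src test:
perm = {0: 1, 1: 0, 2: 3, 3: 2}                       # two 2-cycles: (0 1) and (2 3)
swaps = extract_swaps_from_permutation(dict(perm))    # pass a copy: the dict gets consumed
print swaps                                           # two swaps, e.g. [(1, 0), (3, 2)] (order may vary)
print apply_swaps_to(swaps, [20, 10, 40, 30])         # [10, 20, 30, 40]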
And now we can do our sort in terms of applied swaps (instead of using the permutation directly, we use the extracted swaps).
def test_perm_swap_sort():
    MAX_LIST_SIZE = 8
    testLists = [range(2,(i+1)*2,2) for i in range(1,MAX_LIST_SIZE)]
    for lst in testLists:
        for testCase in it.permutations(lst, len(lst)):
            # extract the permutation that sorts this particular arrangement
            perm = extract_sorting_permutation(list(testCase))
            if not perm:
                continue
            swaps = extract_swaps_from_permutation(perm)
            # print testCase, sorted(testCase), apply_swaps_to(swaps, list(testCase))
            assert sorted(testCase) == apply_swaps_to(swaps, list(testCase))

try:
    test_perm_swap_sort()
    print "passed"
except AssertionError:
    print "failed"
Factor the Common Code
Since there is such great similarity between applying the permutation and recording the swaps it makes, we can write a single common routine to do both tasks. Essentially, we create the right function to use inside the loop: either a simple swap function (curried to swap only on the given list) or a function that appends to a record of the swaps. Both functions take a tuple (src, tgt) as their argument. Here is a first pass:
def apply_or_extract_permutation(perm, lst=[]):
    if not lst:
        # print "in not lst"
        swaps = []
        innerFunc = swaps.append
        # print "building swaps: ", swaps, innerFunc
    else:
        def innerFunc(args):
            swap(lst, args[0], args[1])
    src, tgt = perm.popitem()
    while perm:
        # print "swap: ", src, tgt
        if tgt != src: # only need test if we haven't decomposed perm to cycles
            innerFunc((tgt, src))
            # print "added swaps (%s,%s): %s" % (tgt, src, swaps)
            # update permutation FROM {tgt:perm[tgt]} TO {src:perm[tgt]}
            perm[src] = perm[tgt]; del perm[tgt]
        src, tgt = perm.popitem()
    return lst if lst else swaps
extract_swaps_from_permutation = apply_or_extract_permutation
apply_permutation_to = apply_or_extract_permutation
test_perm_sort()
test_perm_swap_sort()
And a Cleaner Refactoring
The refactoring to apply_or_extract_permutation ended up being very dirty: there was "a lot" of ugliness in the creation of the two alternate inner functions. Also, the default argument (lst=[]) affects both the semantics of the call and the internal operation. Personally, I prefer one execution path through a function, with the different semantics encapsulated in their own, appropriate, functions.
def walk_permutation(perm, op):
    src, tgt = perm.popitem()
    while perm:
        if tgt != src: # only need test if we haven't decomposed perm to cycles
            op((tgt, src))
            # update permutation FROM {tgt:perm[tgt]} TO {src:perm[tgt]}
            perm[src] = perm[tgt]; del perm[tgt]
        src, tgt = perm.popitem()
    return

def extract_swaps_from_permutation(perm):
    swaps = []; op = swaps.append
    walk_permutation(perm, op)
    return swaps

def apply_permutation_to(perm, lst):
    def op(args):
        swap(lst, args[0], args[1])
    walk_permutation(perm, op)
    return lst
walk_permutation now does just that: it walks across the permutation and keeps it updated, applying the given side-effect operation along the way. The tasks specific to recording and performing swaps are now completely factored into their own functions. We need to make sure we haven’t broken anything.
test_perm_sort()
test_perm_swap_sort()
The Lightbulb
If you recall, we were extracting the swaps so that we could perform some other set of swaps. Well, we can extract the swaps and then apply them one at a time. Or, we can supply a carefully constructed function that performs the required swaps as we encounter them, using walk_permutation’s second argument. Let’s take a look at the problem that inspired this post.
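Before we get to the NumPy case, here is a small sketch (my own, with made-up keys and labels lists) of that idea: sort a list of keys and, through the op passed to walk_permutation, apply exactly the same swaps to a parallel list of labels.
keys   = [50, 100, 10]
labels = ['b', 'c', 'a']      # secondary data that must stay aligned with keys

def swap_both(args):
    # perform the same pairwise swap on both structures
    tgt, src = args
    swap(keys, tgt, src)
    swap(labels, tgt, src)

walk_permutation(extract_sorting_permutation(keys), swap_both)
print keys     # [10, 50, 100]
print labels   # ['a', 'b', 'c']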
The Original Motivation
Suppose we have a NumPy array that also has an explicitly defined domain over which its dimensions are specified. For example, in probability problems we might specify a complete probability distribution over the events in the sample space. If we draw yellow and green balls from Urn I and red and blue balls from Urn II, the distribution might be:
\[\begin{align}
p(I = y) &= .4 \\
p(II = b) &= .7 \\
\\
p(I=y, II=b) &= .4 * .7 = .28 \\
p(I=y, II=r) &= .4 * .3 = .12 \\
p(I=g, II=b) &= .6 * .7 = .42 \\
p(I=g, II=r) &= .6 * .3 = .18
\end{align}\]
And we can represent that as a \(2×2\) NumPy array:
p = np.array([[.28, .12],
              [.42, .18]])
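Since the two draws are independent, that joint table is just the outer product of the two marginal distributions; a quick illustrative check (my addition):
print np.outer([.4, .6], [.7, .3])   # rows: Urn I in (y, g) order; columns: Urn II in (b, r) order
# [[ 0.28  0.12]
#  [ 0.42  0.18]]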
The two dimensions in p (the visual rows and columns) express the fact that there are two random variables (the two Urns). The two entries per row (and per column) mean that each variable has two values. When multiplying two probability tables (really, multiplying two potential functions, but I’ll call them tables to simplify the terminology), the variables that occur in both need to be aligned and given special treatment. We call the set of all the random variables that occur in a table its domain.
Hence, when we multiply (and perform other operations) on two tables, it is handy to have the variables in a sorted order. We can use binary search instead of linear search to locate a variable in the domain of the potential.
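For example (an illustration of mine, not code used below), with a sorted domain we can locate a variable with Python's bisect module instead of a linear scan:
import bisect
sortedDomain = ["A", "B", "E"]                 # a sorted domain
print bisect.bisect_left(sortedDomain, "B")    # binary search for "B": prints 1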
Here are some more complicated tables.
domain0 = ["B", "A"]
p0 = np.array([[.28, .12],
               [.42, .18]])
domain1 = ["E", "B", "A"]
p1 = np.array([[[1-.001, .001],
                [1-.94 , .94 ]],
               [[1-.29 , .29 ],
                [1-.95 , .95 ]]])
domain2 = ["E", "B", "A"]
p2 = np.random.uniform(0,1,(4,3,2))
p2 = p2[:,:] / p2.sum(2)[:,:,np.newaxis]
To sort the domain (and the dimensions of the array), we can:
def apply_swaps_to_array_axes(swaps, arr):
    for s,t in swaps:
        arr = np.swapaxes(arr, s, t)
    return arr
print domain1
print p1.shape
print p1
perm = extract_sorting_permutation(domain1)
swaps = extract_swaps_from_permutation(perm)
result = apply_swaps_to_array_axes(swaps, p1)
print sorted(domain1)
print "shape: ", result.shape
print result
Lightbulb (Part II)
But wait. We can also apply those swaps with walk_permutation. However, we run into a problem: NumPy’s swapaxes routine returns a view of the original array (it doesn’t update it). Because it doesn’t have side-effects, we have to write some very ugly code to force the side-effect. I can’t in good conscience recommend doing this. Thus, I’m going to stick with the solution above (and think about whether there is a nicer way to deal with this).
Two notes: this routine updates the input array (pseudo-in-place, because a copy is made internally). Also, I have to do some ugliness to force the array to update itself (see def swapper).
print domain1
print p1
perm = extract_sorting_permutation(domain1)
print perm
def apply_permutation_on_axes(perm, arr):
    def swapper(args):
        view = np.swapaxes(arr, args[0], args[1])
        shape = view.shape
        arr.flat = view.flatten() # update the array values (from copy)
        arr.shape = shape
    walk_permutation(perm, swapper)
    return arr
res = apply_permutation_on_axes(perm, p1)
print res.shape
print res
The fundamental problem with the answer above is that I’m forcing array mutation (a side-effect) when arrays really want to work with modified views. We can do this if we turn walk_permutation into a version that works by returning values instead of relying on side-effects. Here goes. It is a lot cleaner. twoarg_swapaxes serves to massage the arguments for np.swapaxes; it isn’t necessary – we could make a different op(arr, (tgt, src)) call – but it matches the code above.
def walk_and_update_with_permutation(arr, perm, op):
    src, tgt = perm.popitem()
    while perm:
        if tgt != src: # only need test if we haven't decomposed perm to cycles
            arr = op(arr, (tgt, src))
            # update permutation FROM {tgt:perm[tgt]} TO {src:perm[tgt]}
            perm[src] = perm[tgt]; del perm[tgt]
        src, tgt = perm.popitem()
    return arr

def twoarg_swapaxes(arr, axes):
    return np.swapaxes(arr, axes[0], axes[1])
domain1 = ["E", "B", "A"]
p1 = np.array([[[1-.001, .001],
                [1-.94 , .94 ]],
               [[1-.29 , .29 ],
                [1-.95 , .95 ]]])
print domain1
print p1
perm = extract_sorting_permutation(domain1)
print walk_and_update_with_permutation(p1, perm, twoarg_swapaxes)
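Pulling the pieces together, here is a small wrapper of my own (sort_table is not from the original code) that sorts a domain and permutes the table’s axes to match in one call:
def sort_table(domain, arr):
    # return (sorted domain, array with axes permuted to match)
    perm = extract_sorting_permutation(domain)
    if perm:   # an already-sorted domain yields an empty permutation
        arr = walk_and_update_with_permutation(arr, perm, twoarg_swapaxes)
    return sorted(domain), arr

sortedDomain, sortedP1 = sort_table(domain1, p1)
print sortedDomain      # ['A', 'B', 'E']
print sortedP1.shape    # (2, 2, 2)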
Additional Resources
You can grab a copy of this notebook.
Even better, you can view it using nbviewer.
License
Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.