More Abstracted Algebra in Rust
Or, Does 'Group' distribute over 'Ring'?
As I mentioned previously, I have been reading Alexander Stepanov and Daniel Rose's From Mathematics to Generic Programming. It's actually quite interesting, if you kind of fuzz over the math to get to the programming parts. It's not that I'm against math, heavens no, but I have seen many other, and some better, presentations of abstract algebra (I recommend A Book of Abstract Algebra by Charles Pinter.) and of the history of algebra and number theory.
The best parts of From Mathematics to Generic Programming are about programming, specifically about generic programming using examples from algebra. In fact, I do wish Stepanov and Rose had spent more time on the history and theory behind the STL rather than Galois. I would recommend this book to anyone getting into more advanced C++ or perhaps Java and Ada, or anyone getting into Rust. The generics system is built tightly into Rust and an important part of the language environment.
The topic of this post is solutions to their Exercises 8.7 and 8.8, from the section Matrix Multiplication and Semirings. If you followed the last post on this stuff, you'll already know about generalizing addition, multiplication, and matrix multiplication operations to create a generic power function: a recurrence operation that applies a unary operation to an initial value, then to the resulting value, and so on. In this section of the book, Stepanov and Rose take the matrix multiplication operation from there and generalize it to other underlying operations to compute the transitive closure of a social network graph and the distance between the shortest paths in a network.
To begin with, I have to back off a bit. (Why? Why do I always do this? It's probably a long story.)
Generic algebra
This is sort of like math, so let's start with some basic definitions:
Term | Definition | Rust |
---|---|---|
Magma | An algebraic structure consisting of a set of values and an operation closed over that set. |
|
Associativity | (x `op` y) `op` z == x `op` (y `op` z) |
|
Semigroup | A magma whose operation is associative. |
|
Monoid | A semigroup equipped with an identity element such that x `op` id == id `op` x == x. |
|
Group | A monoid that comes with an inverse operation such that x `op` x.inverse() == x.inverse() `op` x == id. |
|
Commutativity | The arguments to the operation go to work on weekdays, such that x `op` y == y `op` x. |
|
Abelian Group | A group whose operation is commutative. (Who names this stuff? It's ridiculous. Oh, Niels Henrick Abel. Nice.) |
|
Now, if you look back at the previous post, where I had simpler definitions for semigroup and implementations for the primitive Rust types along with the definition of the power operation that used them, you'll see that none of this is actually anywhere near as complicated than it looks. Most of the traits are simply raw, type-level assertions; markers to indicate that addition is associative, for example. (It would be nice to be able to ensure that any operation declared as associative actually satisfied the definition, but I don't think I can do that without dependent types and that would be as complicated as it looks. Or more.
For example, here's the addition and multiplication operations defined for all of the primitive types, roughly the same as before:
pub struct Multiplication<Elt>(PhantomData<Elt>);
pub struct Addition<Elt>(PhantomData<Elt>);
macro_rules! add_mult_operations_for {
( $( $t:ty ),* ) => {
$(
impl Magma for Addition<$t> {
type Elt = $t;
fn op(x: $t, y: $t) -> $t { x + y }
}
impl Magma for Multiplication<$t> {
type Elt = $t;
fn op(x: $t, y: $t) -> $t { x * y }
}
)*
}
}
add_mult_operations_for!(usize,u8,u16,u32,u64,i8,i16,i32,i64,f32,f64);
And, as for what you can do with them, here is the Egyptian-multiplication-algorithm-based implementation of the power operation:
fn power_accumulate_semigroup<N,S>(mut r: S::Elt, mut a: S::Elt, mut n: N) -> S::Elt
where N: Natural,
S: Semigroup
{
loop {
assert!(n > N::zero());
if n.odd() {
r = S::op(r.clone() ,a.clone());
if n == N::one() { return r; }
}
n = n.half();
a = S::op(a.clone(), a.clone());
}
}
pub fn power_semigroup<N: Natural, S: Semigroup>(mut a: S::Elt, mut n: N) -> S::Elt {
assert!(n > N::zero());
while !n.odd() {
a = S::op(a.clone(), a.clone());
n = n.half();
}
if n == N::one() { return a }
let y = S::op(a.clone(), a.clone());
return power_accumulate_semigroup::<N,S>(a, y, (n - N::one()).half());
}
(If you're interested in the Natural traits, see that previous post; I don't want to go into it here.)
The definition above actually makes serious use of associativity to execute the repeated operations in O(log n) rather than O(n) time. Below, I make use of the other algebraic definitions to handle edge cases for the Natural argument.
pub fn power_monoid<N: Natural, M: Monoid>(a: M::Elt, n: N) -> M::Elt {
assert!(n >= N::zero());
if n == N::zero() { M::identity() }
else { power_semigroup::<_,M>(a, n) }
}
pub fn power_group<N: Integral, G: Group>(mut a: G::Elt, mut n: N) -> G::Elt {
if n < N::zero() {
n = -n;
a = G::inverse(a);
}
power_monoid::<N,G>(a, n)
}
So far, what I have is just additional formalization of the things I presented in that previous post. Not very interesting, eh? Heck, I'm already bored. So, let's introduce some new definitions.
Term | Definition | Rust |
---|---|---|
Distribution | One way two operations can interact: x `m` (y `a` z) = (x `m` y) `a` (x `m` z), and (y `a` z) `m` x = (y `m` x) `a` (z `m` x). |
|
Absorption | There is a special, choosen, elemnt which, when used as an argument to an operation, always results in itself: x `op` 0 == 0 `op` x == 0. |
|
Semiring | An algebraic structure with two operations (here called Ad and Mu), both monoids. Ad should be commutative and Mu should have an absorbing element (possibly equal to Ad's identity?) and Mu should distribute across Ad. Also, Ad::identity != Mu::identity. |
|
Ring | A semiring where Ad includes an inverse. (Completely unimportant for the rest of this post, but hey, somebody might need it sometime.) |
|
What we have here, besides a failure to communicate, is the definition of a semiring, the algebraic structure needed to generalize matrix multiplication. If you remember your linear algebra, or the code from the previous post, you know that matrix multiplication requires element-wise multiplication and then addition to compute the values in the result matrix. In fact, normal matrix multiplication uses what I'm going to call the Numerical semiring:
pub struct Numerical<Elt>(PhantomData<Elt>);
macro_rules! numerical_semiring_for {
( $( $t:ty ),* ) => { $(
impl Semiring for Numerical<$t> {
type Elt = $t;
type Ad = Addition<$t>;
type Mu = Multiplication<$t>;
} )*
}
}
numerical_semiring_for!(usize,u8,u16,u32,u64,i8,i16,i32,i64,f32,f64);
Technically, as Stepanov and Rose point out, you don't need a semiring for matrix multiplication; specifically, you don't need the identities for Ad and Mu. But I'm not going through this mess again to get a "weak semiring". In fact, for all of this algebraic nonsense, mathematical structures are only bags of properties; my semiring structure should be replaced by something that defines Ad and Mu in terms of Magmas and the actual requirements (associativity, commutativity, etc.) added on as required by the algorithm. Maybe next time.
Matrix multiplication
The matrix I presented last time was simple; this time I came up with something less simple: I added a special value to act as an identity for multiplication.
#[derive(Debug)]
pub enum Matrix<T> {
Identity,
Matrix {
rows: usize,
cols: usize,
m: Vec<Vec<T>>,
}
}
Matrix multiplication gets much more verbose when defined in terms of a semiring. The basic flow is this: if one of the arguments is the identity value, the result is the other. If neither is the identity, then it creates a new result matrix (an n x m matrix multiplied by a m x p matrix produces a n x p matrix) where the cells from the two matrices are combined as aik `Mu` bkj and the results are summed using the Ad operation with Ad::identity as the zero.
pub fn matrix_multiply<S>(a: Matrix<S::Elt>, b: Matrix<S::Elt>) -> Matrix<S::Elt>
where S: Semiring, S::Elt: Clone
{
match (a,b) {
(Matrix::Identity, b) => b,
(a, Matrix::Identity) => a,
(Matrix::Matrix{rows: ref arows, cols: ref acols, m: ref am},
Matrix::Matrix{rows: ref brows, cols: ref bcols, m: ref bm}) => {
let n = *arows;
let m = *acols;
assert_eq!(m, *brows);
let p = *bcols;
let mut results: Vec<S::Elt> = Vec::new();
for i in 0..n {
for j in 0..p {
let cell = (0..m)
.map( |k| {
<S::Mu as Magma>::op( am[i][k].clone(), bm[k][j].clone() )
})
.fold( <S::Ad as Monoid>::identity(), <S::Ad as Magma>::op );
results.push(cell);
}
}
Matrix::new(n, p, &results)
}
}
}
One thing to note: when you pick out S::Mu::op and S::Ad::identity, you need to specify which specific trait's operation and identity with identifier as trait syntax. I'm not entirely sure why, since I don't think the declarations provided allow for multiple definitions.
Matrix multiplication naturally forms an algebraic operation itself.
pub struct MatrixMultiply<S:Semiring>(PhantomData<S>);
impl<S> Magma for MatrixMultiply<S> where S: Semiring, S::Elt: Clone {
type Elt = Matrix<S::Elt>;
fn op(x: Matrix<S::Elt>, y: Matrix<S::Elt>) -> Matrix<S::Elt> {
matrix_multiply::<S>(x,y)
}
}
impl<S> Associative for MatrixMultiply<S> { }
impl<S:Semiring> Semigroup for MatrixMultiply<S> where S::Elt: Clone { }
Given a semiring S, multiplication of matrices over S::Elts forms a semigroup itself. Which is good, because I'll need that later.
At this point, I could go back through and repeat the Fibonacci example using this generalized matrix multiplication, but I won't. I'm off to new things.
Social networks
Assume you have seven people and the relationship of who is friends with whom among the seven. If you represent the friendship relation as a matrix of boolean values (true if the two people are friends and false otherwise), you can use matrix multiplication on a boolean semiring to compute the transitive closure of the relation, which describes the social network of the group of people.
Here is the friendship matrix:
let friends: Matrix<bool> = Matrix::new(7,7, &vec!(
true, true, false, true, false, false, false,
true, true, false, false, false, true, false,
false, false, true, true, false, false, false,
true, false, true, true, false, true, false,
false, false, false, false, true, false, true,
false, true, false, true, false, true, false,
false, false, false, false, true, false, true,
));
The boolean semiring I need, the {Or,And}-semiring, replaces addition with boolean or and multiplication with boolean and. To start with, I need to define the two boolean operations.
Or is the boolean or operation, implemented by Rust's || operator. It is associative, commutative, and has an identity in the false value.
pub struct Or;
impl Magma for Or {
type Elt = bool;
fn op(x: bool, y: bool) -> bool { x || y }
}
impl Associative for Or { }
impl Semigroup for Or { }
impl Monoid for Or {
fn identity() -> bool { false }
}
impl Commutative for Or { }
And is the boolean and operation, &&. It is associative; has an identity, true; has an absorbing element, false. (The absorbing part isn't mentioned in Stepanov and Rose, although the requirement is present in their defining axioms for the semiring. A lack of formalism, I suspect, led them to overlook it.) And also distributes over Or.
pub struct And;
impl Magma for And {
type Elt = bool;
fn op(x: bool, y: bool) -> bool { x && y }
}
impl Associative for And { }
impl Semigroup for And { }
impl Monoid for And {
fn identity() -> bool { true }
}
impl Absorbing for And {
fn absorbing() -> bool { false }
}
impl DistributesOver<Or> for And { }
The result is the boolean semiring:
pub struct BooleanSemiring;
impl Semiring for BooleanSemiring {
type Elt = bool;
type Ad = Or;
type Mu = And;
}
Given all of the above, the code to actually produce the transitive closure is trivial, using the power_semigroup function.
let result = power_semigroup::<_,MatrixMultiply<BooleanSemiring>> ( friends, 6 );
By the way, the 6 comes from 7, the number of people in the group, minus 1.
The result is:
[
true true true true false true false
true true true true false true false
true true true true false true false
true true true true false true false
false false false false true false true
true true true true false true false
false false false false true false true
]
There are two cliques in this network, one containing Ari(0), Bev(1), Cal(2), Don(3), and Fay(5) and another containing Eva(4) and Gia(6). If the two cliques were not disjoint, the graph would contain all true values, but they are and there is no transitive friendship relation between Eva and Gia and the rest of the group.
Directed graphs
Suppose you have a directed graph with weighted links between some of the nodes. You can represent this graph as a matrix, with the weight between each node and itself as 0 and the weight between unconnected nodes as infinity:
let graph: Matrix<f64> = Matrix::new(7, 7, &vec!(
0.0, 6.0, INF, 3.0, INF, INF, INF,
INF, 0.0, INF, INF, 2.0, 10.0, INF,
7.0, INF, 0.0, INF, INF, INF, INF,
INF, INF, 5.0, 0.0, INF, 4.0, INF,
INF, INF, INF, INF, 0.0, INF, 3.0,
INF, INF, 6.0, INF, 7.0, 0.0, 8.0,
INF, 9.0, INF, INF, INF, INF, 0.0,
));
You can use repeated matrix multiplication over a very peculiar semiring to compute a matrix with the minimum path weight between any node and any other node. The peculiar semiring? It's tropical. (I really have no idea where these names come from. Tropical? Ridiculous. I do not have a daiquiri in my hand. There is no sand here. It's freaking hot outside, but I'm currently in Amarillo, TX, so it's a dry heat with a lot of wind so that I can taste the landscape. Bleah.)
pub struct TropicalSemiring;
impl Semiring for TropicalSemiring {
type Elt = f64;
type Ad = Min;
type Mu = Addition<f64>;
}
In this tropical semiring, Min (as in min(a,b) = a if a < b; b otherwise) takes the place of addition and addition takes the place of multiplication. (That is one of the sources of peculiarity.)
Min itself makes a pretty simple algebraic structure. It is associative, commutative, and has an identity (infinity). Addition distributes over it since a + min(b,c) = min(a+b, a+c).
pub struct Min;
impl Magma for Min {
type Elt = f64;
fn op(x: f64, y: f64) -> f64 { f64::min(x,y) }
}
impl Associative for Min { }
impl Commutative for Min { }
impl Semigroup for Min { }
impl Monoid for Min {
fn identity() -> f64 { f64::INFINITY }
}
impl DistributesOver<Min> for Addition<f64> { }
On the other hand, addition proves to be a problem. It's the standard addition operation described above, but the semiring definition requires whatever replaces the multiplication operation to be absorbing. How do you get an element such that e + x = x + e = e? Zero is not it, one is not it; are there any other interesting numbers? Fortunately, IEEE 754 floating point numbers come to our rescue here: it provides representations for positive and negative infinity, and infinity is almost as sticky as NaN. Cast your eyes at this chart: IEEE 754 Infinity and NaN Arithmetic Rules. In the upper-left quarter is addition; infinity plus any other number (except negative infinity and NaN) is always infinity. (Infinity plus negative infinity is NaN; any operation on NaN is also Not a Number, so infinity under addition is almost as sticky as NaN and, although the axioms are not strictly satisfied for negative infinity and NaN, it is close enough for government work. Especially since it isn't used here.)
/// Only true for addition on floating point numbers; see IEEE754.
macro_rules! absorbing_addition_for {
( $( $t:ty: $v:expr ),* ) => { $(
impl Absorbing for Addition<$t> { fn absorbing() -> Self::Elt { $v } }
)*
}
}
absorbing_addition_for!(f32: f32::INFINITY, f64: f64::INFINITY);
Once I have all of the definitions in place, using the power function to compute the transitive closure of the graph link weights is easy:
let result = power_semigroup::<_,MatrixMultiply<TropicalSemiring>>(graph, 7-1);
The final result is:
[
0 6 8 3 8 7 11
23 0 16 26 2 10 5
7 13 0 10 15 14 18
12 18 5 0 11 4 12
35 12 28 38 0 22 3
13 17 6 16 7 0 8
32 9 25 35 11 19 0
]
So, travelling from A(0) to B(1) (in the top row, second from the left) has weight 6; travelling from B to A has weight 23 (second row, leftmost column).
That is exercise 8.8; exercise 8.9, which calls for not only finding the shortest distance but the shortest paths, should be fairly easy; simply record which nodes you use when you reduce with min. Maybe next time.
Conclusions
Throughout this post, I used semirings even though I did not need all of their properties; in fact, the absorbing thing was a bit of a problem. At one point in the book, Stepanov and Rose write,
Categorical Theories versus STL
For a long time, people believed that only categorical theories [a theory is a set of propositions; in a categorical theory, all models, which define the operations in the theory and make the axioms true, of the theory are isomorphic] were good for programming. When the C++ Standard Template Library (STL) was first proposed, some computer scientists opposed it on the grounds that many of its fundamental concepts, such as Iterator, were underspecified. In fact, it is this underspecification that gives the library its generality. While linked lists and arrays are not computationally isomorphic, many STL algorithms are defined on input iterators and work on both data structures. If you can get by with fewer axioms, you allow for a wider range of implementations.
I seriously hope they are talking about the problem of dealing with bundles of propositions such as semirings, where all of the requirements for the bundle are not used by a given algorithm. In that case, it is easy for a structure to be underspecified, since if a property is not used then it never needs to be mentioned in the code. The alternative, that if it is used but cannot easily be described (like the associativity trait), it may only be implicit, is a dandy source of hard-to-find bugs. Making things like associativity explicit, but trivial, is only a bandaid. It tries to remind the programmer that water is needed, but cannot make them drink.