
Learners and Poly

Brendan Fong, David Spivak and Remy Tuyeras cooked up a vast generalisation of neural networks in their paper Backprop as Functor: A compositional perspective on supervised learning.

Here’s a nice introduction to neural networks for category theorists by Bruno Gavranovic. At 1.49m he tries to explain supervised learning with neural networks in one slide. Learners show up later in the talk.

$\mathbf{Poly}$ is the category of all polynomial functors, that is, things of the form
$$p = \sum_{i \in p(1)} y^{p[i]}~:~\mathbf{Sets} \rightarrow \mathbf{Sets} \qquad S \mapsto \bigsqcup_{i \in p(1)} \mathrm{Maps}(p[i],S)$$
with $p(1)$ and all $p[i]$ sets.
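For finite examples one can play with this formula in code. Here's a minimal sketch in Python (my own toy encoding, not anything from the papers): a finite polynomial functor as a dict sending positions to direction-sets, with $p(S)$ computed exactly as above.

```python
from itertools import product

# Toy encoding: a finite polynomial functor is a dict sending each
# position i in p(1) to its finite set of directions p[i].
p = {'i1': ['a', 'b'], 'i2': ['c']}        # p = y^2 + y

def apply_poly(p, S):
    """Evaluate p(S) = disjoint union over i in p(1) of Maps(p[i], S)."""
    elements = []
    for i, dirs in p.items():
        for values in product(S, repeat=len(dirs)):
            # a map p[i] -> S, tagged with its position i
            elements.append((i, dict(zip(dirs, values))))
    return elements

print(len(apply_poly(p, ['x', 'y', 'z'])))   # 3^2 + 3^1 = 12 elements
```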

Last time I gave Spivak’s ‘corolla’ picture to think about such functors.

I prefer to view $p \in \mathbf{Poly}$ as a horribly discrete 'sheaf' $\mathcal{P}$ over the 'space' $p(1)$ with stalk $p[i]=\mathcal{P}_i$ at the point $i \in p(1)$.



A morphism $p \rightarrow q$ in $\mathbf{Poly}$ is a map $\varphi_1 : p(1) \rightarrow q(1)$, together with, for all $i \in p(1)$, a map $\varphi^{\#}_i : q[\varphi_1(i)] \rightarrow p[i]$.

In the sheaf picture, this gives a map of sheaves over the space $p(1)$ from the inverse image sheaf $\varphi_1^{*} \mathcal{Q}$ to $\mathcal{P}$.



But, unless you dream of sheaves in the night, by all means stick to Spivak’s corolla picture.
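Concretely, in the toy dict encoding from above, a morphism is just this pair of forward/backward data. A hedged sketch (the names `phi1` and `phi_sharp` are my own choices):

```python
# p = y^2 + y as before; take q = y^3 as target
p = {'i1': ['a', 'b'], 'i2': ['c']}
q = {'j1': ['u', 'v', 'w']}

phi1 = {'i1': 'j1', 'i2': 'j1'}            # forward map on positions p(1) -> q(1)
phi_sharp = {                              # backward maps q[phi1(i)] -> p[i]
    'i1': {'u': 'a', 'v': 'b', 'w': 'a'},
    'i2': {'u': 'c', 'v': 'c', 'w': 'c'},
}
```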

A learner $A \rightarrow B$ between two sets $A$ and $B$ is a complicated tuple of things $(P,I,U,R)$ (a toy example in code follows the list):

  • $P$ is a set, a parameter space of some maps from $A$ to $B$.
  • $I$ is the interpretation map $I : P \times A \rightarrow B$ describing the maps in $P$.
  • $U$ is the update map $U : P \times A \times B \rightarrow P$, the learning procedure. The idea is that $U(p,a,b)$ is a map which sends $a$ closer to $b$ than the map $p$ did.
  • $R$ is the request map $R : P \times A \times B \rightarrow A$.
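Here's a hedged toy example in Python of such a tuple: a one-parameter family of scaling maps on the reals, with a gradient-descent style update. This is my own example just to fix ideas; the paper works with arbitrary sets and maps.

```python
# Toy learner A -> B with A = B = floats and parameter set P = floats,
# where the parameter p is interpreted as the scaling map a |-> p*a.
def I(p, a):                 # interpretation  I : P x A -> B
    return p * a

def U(p, a, b, lr=0.1):      # update  U : P x A x B -> P
    # nudge p so that the new map sends a closer to b
    return p - lr * (I(p, a) - b) * a

def R(p, a, b):              # request  R : P x A x B -> A
    # which input the current map p would need to (roughly) produce b
    return b / p if p != 0 else a
```

For a fixed training pair $(a,b)$ with $a \neq 0$ and a small enough learning rate, iterating $U$ drives $p$ towards $b/a$, that is, towards a map sending $a$ to $b$.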

Here’s a nice application of Poly’s set-up:

Morphisms $P y^P \rightarrow \mathrm{Maps}(A,B) \times \mathrm{Maps}(A \times B,A)\, y^{A \times B}$ in $\mathbf{Poly}$ coincide with learners $A \rightarrow B$ with parameter space $P$.

This follows from unpacking the definition of morphism in Poly and the process CT-ers prefer to call Currying.

The space-map $\varphi_1 : P \rightarrow \mathrm{Maps}(A,B) \times \mathrm{Maps}(A \times B,A)$ gives us the interpretation and request-map, whereas the sheaf-map $\varphi^{\#}$ gives us the more mysterious update-map $U : P \times A \times B \rightarrow P$.
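Spelled out (my unpacking, just making the currying explicit): for a parameter $p \in P$ the space-map hands us the pair of maps $p$ determines, and the backward map at $p$, uncurried over $P$, is the update:
$$\varphi_1(p) = \big(I(p,-),\, R(p,-,-)\big) \in \mathrm{Maps}(A,B) \times \mathrm{Maps}(A \times B,A), \qquad \varphi^{\#}_p(a,b) = U(p,a,b)$$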

$\mathbf{Learn}(A,B)$ is the category with objects all the learners $A \rightarrow B$ (for all parameter-sets $P$), and with morphisms defined naturally, that is, maps between the parameter-sets, compatible with the structural maps.

A surprising result from David Spivak’s paper Learners’ Languages is

$\mathbf{Learn}(A,B)$ is a topos. In fact, it is the topos of all set-valued representations of a (huge) directed graph $G_{AB}$.

This will take some time.

Let’s bring some dynamics in. Take any polynomial functor $p \in \mathbf{Poly}$ and fix a morphism in $\mathbf{Poly}$
$$\varphi = (\varphi_1,\varphi[-])~:~p(1) y^{p(1)} \rightarrow p$$
with space-map $\varphi_1$ the identity map.

We form a directed graph:

  • the vertices are the elements of $p(1)$,
  • vertex $i \in p(1)$ is the source vertex of exactly one arrow for every $a \in p[i]$,
  • the target vertex of that arrow is the vertex $\varphi[i](a) \in p(1)$.
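In the toy dict encoding this graph is easy to write down. A hedged sketch (the particular morphism below is made up, just to show the shape of the construction):

```python
# For a morphism p(1) y^{p(1)} -> p with phi1 the identity, the backward
# maps phi_sharp[i] : p[i] -> p(1) directly give the arrows of the graph.
def dynamics_graph(p, phi_sharp):
    """Directed edges (source i, label a, target phi_sharp[i][a])."""
    return [(i, a, phi_sharp[i][a]) for i in p for a in p[i]]

# an arbitrary made-up example with p = y^2 + y:
p = {'i1': ['a', 'b'], 'i2': ['c']}
phi_sharp = {'i1': {'a': 'i2', 'b': 'i1'}, 'i2': {'c': 'i1'}}
print(dynamics_graph(p, phi_sharp))
# -> [('i1', 'a', 'i2'), ('i1', 'b', 'i1'), ('i2', 'c', 'i1')]
```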

Here’s one possibility from Spivak’s paper for $p = 2y^2 + 1$, with the coefficient 2-set {green dot, yellow dot}, and with $1$ the singleton {red dot}.



Start at one vertex and move after a minute along a directed edge to the next (possibly the same) vertex. The potential evolutions in time will then form a tree, with each node given a label in p(1).

If we start at the green dot, we get this tree of potential time-evolutions



There are exactly $\# p[i]$ branches leaving a node labeled $i \in p(1)$, and all subtrees emanating from equally labelled nodes are isomorphic.

If we had started at the yellow dot we would have obtained a labelled tree isomorphic to the subtree emanating here from any yellow dot.

We can do the same thing for any morphism in $\mathbf{Poly}$ of the form
$$\varphi = (\varphi_1,\varphi[-])~:~S y^S \rightarrow p$$
Now we have a directed graph with vertices the elements $s \in S$, with as many edges leaving vertex $s$ as there are elements $a \in p[\varphi_1(s)]$, and with the target vertex of the edge labeled $a$ starting in $s$ being the vertex $\varphi[\varphi_1(s)](a)$.

Once we have this directed graph on $\# S$ vertices we can label vertex $s$ with the label $\varphi_1(s)$ from $p(1)$.

In this way, the time evolutions starting at a vertex $s \in S$ will give us a $p(1)$-labelled rooted tree.
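A hedged sketch of this unrolling in code, for finite truncations only (the full tree is of course infinite, and the names are mine):

```python
# Unroll the time-evolutions of a morphism S y^S -> p starting at vertex s
# into a p(1)-labelled rooted tree, cut off at `depth`.
# phi1[s] is the p(1)-label of s, phi_sharp[s][a] the target of the edge labeled a.
def evolution_tree(s, p, phi1, phi_sharp, depth):
    label = phi1[s]
    if depth == 0:
        return (label, {})
    return (label, {a: evolution_tree(phi_sharp[s][a], p, phi1, phi_sharp, depth - 1)
                    for a in p[label]})
```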

But now, it is possible that two distinct vertices have the same $p(1)$-labeled tree of evolutions. But also, trees corresponding to equally labeled vertices can be different.

Right, I guess we’re ready to define the graph $G_{AB}$ and prove that $\mathbf{Learn}(A,B)$ is a topos.

In the case of learners, we have the target polynomial functor $p = C y^{A \times B}$ with $C = \mathrm{Maps}(A,B) \times \mathrm{Maps}(A \times B,A)$, that is
$$p(1) = C \quad \text{and all} \quad p[i] = A \times B$$

Start with the free rooted tree $T$ having exactly $\# A \times B$ branches growing from each node.

Here’s the directed graph $G_{AB}$:

  • vertices $v_{\chi}$ correspond to the different $C$-labelings of $T$, one $C$-labeled rooted tree $T_{\chi}$ for every map $\chi : vtx(T) \rightarrow C$,
  • arrows $v_{\chi} \rightarrow v_{\omega}$ if and only if $T_{\omega}$ is the rooted $C$-labelled tree isomorphic to the subtree of $T_{\chi}$ rooted at one step from the root.

A learner $A \rightarrow B$ gives a set-valued representation of $G_{AB}$.

We saw that a learner $A \rightarrow B$ is the same thing as a morphism in $\mathbf{Poly}$
$$\varphi = (\varphi_1,\varphi[-])~:~P y^P \rightarrow C y^{A \times B}$$
with $P$ the parameter set of maps.

Here’s what we have to do:

1. Draw the directed graph on vertices $p \in P$ giving the dynamics of the morphism $\varphi$. This graph describes how the learner can cycle through the parameter-set.

2. Use the map $\varphi_1$ to label the vertices with elements from $C$.



3. For each vertex draw the rooted $C$-labeled tree of potential time-evolutions starting in that vertex.

In this example the time-evolutions of the two green vertices are the same, but in general they can be different.



4. Find the vertices in $G_{AB}$ determined by these $C$-labeled trees and note that they span a full subgraph of $G_{AB}$.



5. The vertex-set $P_v$ consists of all elements from $P$ whose ($C$-labeled) vertex has evolution-tree $T_v$. If $v \rightarrow w$ is a directed edge in $G_{AB}$ corresponding to an element $(a,b) \in A \times B$, then the map on the vertex-sets corresponding to this edge is
$$f_{v,(a,b)}~:~P_v \rightarrow P_w \qquad p \mapsto \varphi[\varphi_1(p)](a,b)$$
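In code, and only as a finite-depth approximation (comparing trees up to a chosen depth is my own simplification; the actual construction uses the full infinite trees), steps 3 to 5 amount to grouping parameters by their evolution tree:

```python
# Hedged finite-depth sketch of steps 3-5: group parameters by their
# C-labelled evolution tree, giving the vertex-sets P_v; the edge maps
# f_{v,(a,b)} are then just the update map U restricted to each group.
# Assumes phi1(p) returns a hashable stand-in for the C-label of p.
def vertex_sets(P, phi1, U, pairs, depth=3):
    def tree(p, d):
        if d == 0:
            return (phi1(p),)
        return (phi1(p), tuple(tree(U(p, a, b), d - 1) for (a, b) in pairs))
    groups = {}
    for p in P:
        groups.setdefault(tree(p, depth), []).append(p)
    return groups        # {truncated evolution tree: list of parameters in P_v}
```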



A set-valued representation of $G_{AB}$ gives a learner $A \rightarrow B$.

1. Take a set-valued representation of $G_{AB}$, that is, the finite or infinite collection of vertices $V$ in $G_{AB}$ where the vertex-set $P_v$ is non-empty. Note that these vertices span a full subgraph of $G_{AB}$.

And, for each directed arrow $v \rightarrow w$ in this subgraph, labeled by an element $(a,b) \in A \times B$, we have a map
$$f_{v,(a,b)}~:~P_v \rightarrow P_w$$

2. The parameter set of our learner will be $P = \bigsqcup_v P_v$, the disjoint union of the non-empty vertex-sets.

3. The space-map $\varphi_1 : P \rightarrow C$ will send an element in $P_v$ to the $C$-label of the root of the tree $T_v$. This gives us already the interpretation and request maps
$$I : P \times A \rightarrow B \quad \text{and} \quad R : P \times A \times B \rightarrow A$$

4. The update map $U : P \times A \times B \rightarrow P$ follows from the sheaf-map we can define stalk-wise:
$$\varphi[\varphi_1(p)](a,b) = f_{v,(a,b)}(p)$$
if $p \in P_v$.
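A hedged sketch of steps 2 to 4 in code (all names are mine: `root_label[v]` is the $C$-label at the root of $T_v$, i.e. a pair of maps, `f[(v,(a,b))]` are the edge maps, and `vertex_of[p]` records which vertex-set a parameter lives in):

```python
# Reassemble a learner (I, U, R) from a set-valued representation of G_AB.
def make_learner(root_label, f, vertex_of):
    def I(p, a):                  # interpretation: first component of the C-label
        interp, _ = root_label[vertex_of[p]]
        return interp(a)
    def R(p, a, b):               # request: second component of the C-label
        _, req = root_label[vertex_of[p]]
        return req(a, b)
    def U(p, a, b):               # update: follow the edge map along (a, b)
        return f[(vertex_of[p], (a, b))](p)
    return I, U, R
```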

That’s all folks!

$\mathbf{Learn}(A,B)$ is equivalent to the (covariant) functors $G_{AB} \rightarrow \mathbf{Sets}$.

By changing the directions of all arrows in $G_{AB}$, any covariant functor $G_{AB} \rightarrow \mathbf{Sets}$ becomes a contravariant functor $G_{AB}^{o} \rightarrow \mathbf{Sets}$, making $\mathbf{Learn}(A,B)$ an honest to Groth topos!

Every topos comes with its own logic, so we have a ‘learners’ logic’. (to be continued)
