
Learners’ logic

In the Learners and Poly-post we’ve seen that learners from $A$ to $B$ correspond to set-valued representations of a directed graph $G$ and therefore form a presheaf topos.

Any topos comes with its Mitchell-Benabou language, allowing us to speak of formulas, propositions and their truth values. Two objects play a special role in this: the terminal object $1$, and the subobject classifier $\Omega$. It is a fun exercise to determine these special learners.

$T$ is the free rooted tree with branches sprouting from every node $n \in T_0$, one for each element in $A \times B$. $C$ will be our set of colours, one for each element of $Maps(A,B) \times Maps(A \times B, A)$.

For every map $\lambda : T_0 \rightarrow C$ we get a coloured rooted tree $T_\lambda$, and for each branch $(a,b)$ from the root we get another rooted sub-tree $T_\lambda(a,b)$ which is again of the form $T_\mu$ for a certain map $\mu : T_0 \rightarrow C$.
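A small Python sketch may help fix the combinatorics (the tuple encoding of tree nodes and the names `nodes` and `subtree_colouring` are my own, not from the post): nodes of the free rooted tree are tuples of branch labels, a colouring is any map from nodes to colours, and re-rooting at a branch $(a,b)$ gives the colouring $\mu$ of the sub-tree $T_\lambda(a,b)$.

```python
from itertools import product

# Hypothetical finite data: branches of T are labelled by A x B.
A = [0, 1]
B = [0, 1]
BRANCHES = list(product(A, B))

def nodes(depth):
    """Nodes of the free rooted tree T up to a finite depth, encoded as
    tuples of branch labels; the root is the empty tuple."""
    result = [()]
    for d in range(1, depth + 1):
        result += [tuple(p) for p in product(BRANCHES, repeat=d)]
    return result

def subtree_colouring(lam, branch):
    """The colouring mu of the sub-tree T_lambda(a,b): the node w of
    T_mu is the node (a,b).w of T_lambda."""
    return lambda w: lam((branch,) + w)
```

With $\#A = \#B = 2$ every node has $4$ branches, so up to depth $2$ there are $1 + 4 + 16$ nodes.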

The directed graph $G$ has a vertex $v_\lambda \in V$ for each isomorphism class of coloured rooted trees $T_\lambda$, and a directed edge $v_\lambda \rightarrow v_\mu$ if $T_\mu$ is the isomorphism class of the coloured rooted sub-tree $T_\lambda(a,b)$ for some $(a,b) \in A \times B$.

There are exactly $\#(A \times B)$ directed edges leaving every vertex in $G$, but there may be (many) more incoming edges. We can colour each vertex $v_\lambda$ with the colour of the root of $T_\lambda$.



The coloured directed graph $G$ depicts the learning process of a neural network being trained to find a suitable map $A \rightarrow B$. The colour of a vertex $v_\lambda$ gives a map $f \in Maps(A,B)$ (and a request function). If the network now gives as output $b \in B$ for a given input $a \in A$, we can move on to the end-vertex $v_\mu$ of the directed edge labelled $(a,b)$ out of $v_\lambda$. The colour of $v_\mu$ gives us a new (hopefully improved) map $f_{new} \in Maps(A,B)$ (and a new request function). A new training pair $(a,b)$ brings us to a new vertex and map, and so on.
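This walk on $G$ can be sketched in a few lines of Python (a hypothetical encoding, with my own names `subtree` and `walk`): a vertex $v_\lambda$ is represented by its colouring $\lambda$, the colour at the root packs $(f, \text{request})$, and each training pair moves us along the corresponding edge.

```python
def subtree(lam, branch):
    """Move along the edge labelled (a,b): the colouring of T_lambda(a,b)."""
    return lambda w: lam((branch,) + w)

def walk(lam, stream):
    """Feed training pairs (a,b) one by one and record the colour at
    every vertex visited, i.e. the successive candidate maps."""
    trace = [lam(())]
    for ab in stream:
        lam = subtree(lam, ab)
        trace.append(lam(()))
    return trace
```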

Clearly, some parts of $G$ are more efficient at finding the desired map than others, and the aim of the game is to distinguish efficient from inefficient learners. This is a first hint that Grothendieck topologies and their corresponding sheafifications will turn out to be important.

We’ve seen that a learner, that is, a morphism $Py^P \rightarrow Cy^{A \times B}$ in $\mathbf{Poly}$, assigns a set $P_\lambda$ to every vertex $v_\lambda$ (this set may be empty) and a map $P_\lambda \rightarrow P_\mu$ to every directed edge $v_\lambda \rightarrow v_\mu$ in $G$.

The terminal object $1$ in this setting assigns to each vertex a singleton $\{ \ast \}$, and the obvious maps for each directed edge. In $\mathbf{Poly}$-speak, the terminal object is the morphism
$$1~:~Vy^V \rightarrow Cy^{A \times B}$$
which sends each vertex $v_\lambda \in V$ to its colour $c \in C$, and where the backtrack map $\varphi^{\#}_{v_\lambda}[c]$ maps $(a,b)$ to $v_\mu$ if this is the end-vertex of the edge labelled $(a,b)$ out of $v_\lambda$. That is, $1$ contains all information about the coloured directed graph $G$.
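Concretely, the two components of $1$ amount to a colour lookup and an edge lookup. A minimal sketch on a tiny hand-made graph (all data below is hypothetical, just to make the shapes visible):

```python
# A tiny coloured directed graph G: vertex colours, and for each vertex
# the end-vertex of the edge labelled (a,b) out of it.
colour = {"u": "red", "v": "blue"}
target = {("u", (0, 0)): "v", ("u", (0, 1)): "u",
          ("v", (0, 0)): "u", ("v", (0, 1)): "v"}

def one_positions(vertex):
    """Forward part of 1 : Vy^V -> Cy^{AxB}: send v_lambda to its colour."""
    return colour[vertex]

def one_backtrack(vertex, ab):
    """Backtrack map: the direction (a,b) at the colour of v_lambda pulls
    back to the end-vertex of the edge labelled (a,b) out of v_lambda."""
    return target[(vertex, ab)]
```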

The subobject classifier $\Omega$ assigns to each vertex $v_\lambda$ the set $\Omega(v_\lambda)$ of all subsets $S$ of directed paths in $G$ starting at $v_\lambda$, such that if $p \in S$ then all prolongations of $p$ also belong to $S$. Note that the empty set satisfies this requirement, so it is an element of this vertex set. Another special element in $\Omega(v_\lambda)$ is the set $1_\lambda$ of all directed paths starting at $v_\lambda$.

$\Omega(v_\lambda)$ is a Heyting algebra with $1 = 1_\lambda$ and $0 = \emptyset$, partially ordered via inclusion, with logical operations $\wedge$ (intersection), $\vee$ (union), $\neg$ (where $\neg S$ is the largest $S' \in \Omega(v_\lambda)$ disjoint from $S$) and $\Rightarrow$, where $S \Rightarrow S'$ is the union of all $S'' \in \Omega(v_\lambda)$ such that $S'' \wedge S \subseteq S'$.
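These operations are computable once we truncate the paths at a finite length (a sketch under that assumption; the real $\Omega(v_\lambda)$ contains arbitrarily long paths, and all names below are my own):

```python
from itertools import product

# Truncated sketch of Omega(v_lambda): directed paths out of v_lambda,
# encoded as tuples of edge labels, cut off at length 3.
LABELS = [(0, 0), (0, 1)]          # hypothetical A x B
PATHS = [tuple(p) for n in (1, 2, 3) for p in product(LABELS, repeat=n)]

def prolongs(q, p):
    """q prolongs p (extends it; q == p allowed)."""
    return q[:len(p)] == p

def up(paths):
    """Smallest prolongation-closed set containing `paths`."""
    return frozenset(q for q in PATHS if any(prolongs(q, p) for p in paths))

TOP = frozenset(PATHS)             # 1_lambda
BOT = frozenset()                  # 0

# /\ and \/ are plain intersection and union (both preserve closure).
def neg(S):
    """Largest prolongation-closed set disjoint from S: keep p only if
    no prolongation of p lies in S."""
    return frozenset(p for p in PATHS if not any(prolongs(q, p) for q in S))

def implies(S, S2):
    """S => S2 : union of all S'' with S'' /\ S contained in S2."""
    return frozenset(p for p in PATHS
                     if all(q in S2 for q in S if prolongs(q, p)))
```

For instance, if $S$ is generated by one path of length two, then the length-one prefix of that path lies in neither $S$ nor $\neg S$, so $S \vee \neg S \neq 1$.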



$S \vee \neg S$ is not always equal to $1$. Here, the union misses the left edge from the root. So, we will not be able to prove things by contradiction.

If $v_\lambda \rightarrow v_\mu$ is the directed edge labelled $(a,b)$, then the corresponding map $\Omega(v_\lambda) \rightarrow \Omega(v_\mu)$ takes an $S \in \Omega(v_\lambda)$, drops all paths which do not pass through $v_\mu$, and removes from those that do the initial edge $(a,b)$. If no paths in $S$ pass through $v_\mu$, then $S$ is mapped to $\emptyset \in \Omega(v_\mu)$.
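With the same tuple encoding of paths as above, this restriction map is a one-liner (a sketch; the name `restrict` is mine):

```python
def restrict(S, ab):
    """Omega(v_lambda) -> Omega(v_mu) along the edge labelled (a,b):
    keep the paths of S that pass through v_mu, i.e. start with the
    edge (a,b), and strip that initial edge; if no path in S passes
    through v_mu the image is the empty set."""
    return frozenset(p[1:] for p in S if p[:1] == (ab,))
```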

If $\Omega = \bigsqcup_\lambda \Omega(v_\lambda)$, then the subobject classifier is the morphism in $\mathbf{Poly}$
$$\Omega~:~\Omega y^{\Omega} \rightarrow Cy^{A \times B}$$
sending a set of paths starting in $v_\lambda$ to the colour of $v_\lambda$, and where the backtrack map of $(a,b)$ gives the image of that set under the map $\Omega(v_\lambda) \rightarrow \Omega(v_\mu)$.

Ok, let’s define the Learner’s Mitchell-Benabou language.

We’ll view a learner $Py^P \rightarrow Cy^{A \times B}$ as a set-valued representation $P$ of the directed graph $G$, with the vertex set $P_\lambda$ placed at vertex $v_\lambda$.

A formula $\phi(p)$ of the language with a free variable $p$ is a morphism (of representations of $G$) from a learner $P$ to the subobject classifier
$$\phi~:~P \rightarrow \Omega$$
Such a morphism determines a sub-representation of $P$, which we can denote $\{ p | \phi(p) \}$, with vertex sets
$$\{ p | \phi(p) \}_\lambda = \{ p \in P_\lambda~|~\phi(v_\lambda)(p) = 1_\lambda \}$$
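At a single vertex this is just a filter: keep the $p$ whose truth value is the top element $1_\lambda$. A sketch with hypothetical finite data (the dictionaries and the name `truth_set` are mine, with the string `"1"` standing in for $1_\lambda$):

```python
# P_lambda per vertex, and phi(v_lambda)(p) as an element of
# Omega(v_lambda); "1" stands in for the top element 1_lambda.
P = {"u": {1, 2, 3}, "v": {4}}
phi = {("u", 1): "1", ("u", 2): "S", ("u", 3): "1", ("v", 4): "S"}

def truth_set(vertex):
    """{p | phi(p)}_lambda: the p in P_lambda with phi(v_lambda)(p) = 1_lambda."""
    return {p for p in P[vertex] if phi[(vertex, p)] == "1"}
```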

On formulas we can apply logical connectives to get more formulas. For example, the formula $\phi(p) \wedge \psi(q)$ is the composition
$$P \times Q \xrightarrow{\phi \times \psi} \Omega \times \Omega \xrightarrow{\wedge} \Omega$$

By quantifying all free variables we get a formula without free variables, and those correspond to morphisms $1 \rightarrow \Omega$, that is, to sub-representations of the terminal object $1$.

For example, if $\phi(p)$ is the formula with free variable $p$ corresponding to the morphism $\phi : P \rightarrow \Omega$, then we have
$$\forall p : \phi(p) = \{ v_\lambda \in V~|~\{ p | \phi(p) \}_\lambda = P_\lambda \}$$
and
$$\exists p : \phi(p) = \{ v_\lambda \in V~|~\{ p | \phi(p) \}_\lambda \neq \emptyset \}$$
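Both quantifiers therefore just pick out a set of vertices. A sketch with hypothetical data (note that a vertex with $P_\lambda = \emptyset$ satisfies the universal formula vacuously but not the existential one):

```python
# P_lambda per vertex, and the precomputed truth sets {p | phi(p)}_lambda.
P = {"u": {1, 2}, "v": {3}, "w": set()}
truth = {"u": {1, 2}, "v": set(), "w": set()}

forall_phi = {v for v in P if truth[v] == P[v]}   # every p satisfies phi
exists_phi = {v for v in P if truth[v]}           # some p satisfies phi
```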

Sub-representations of $1$ again form a Heyting algebra in the obvious way, so we can assign a “truth value” to a formula without free variables as that sub-object of $1$.

There’s a lot more to say, so perhaps this will be continued.
