We start from a large data-set
We’re after a set of unknown events

(From Geometry of the space of phylogenetic trees by Billera, Holmes and Vogtmann)
Biology explains these differences by the fact that certain species may have had more recent common ancestors than others. Ideally, the measured distances between DNA-samples form a tree metric. That is, if we can determine the full ancestor-tree of these species, there should be numbers on the edges between ancestor-nodes (measuring their difference in DNA) such that the distance between two existing species is the sum of these numbers over the edges of the unique path in this phylogenetic tree connecting the two species.
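To make the notion of a tree metric concrete, here is a minimal sketch in Python. The tree, its node names (species `A`, `B`, `C`, `D`, ancestor-nodes `p`, `q`) and its edge lengths are made up for illustration and are not taken from the Billera-Holmes-Vogtmann example. The hypothetical helper `tree_metric` simply adds up edge lengths along the unique path between two species.

```python
def tree_metric(edges, leaves):
    """Distances between leaves of a weighted tree.

    edges  : list of (node, node, length) triples describing a tree
    leaves : the existing species (the observed samples)
    Returns a dict {(a, b): d(a, b)} over all ordered pairs of leaves.
    """
    # Build an undirected adjacency list.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def dist_from(src):
        # In a tree there is a unique path to every node, so a plain
        # depth-first walk accumulates the correct path length.
        dist = {src: 0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    d = {}
    for a in leaves:
        from_a = dist_from(a)
        for b in leaves:
            d[(a, b)] = from_a[b]
    return d


# Illustrative (made-up) tree: A and B descend from p, C and D from q.
edges = [("A", "p", 1), ("B", "p", 2), ("p", "q", 3),
         ("C", "q", 1), ("D", "q", 4)]
d = tree_metric(edges, ["A", "B", "C", "D"])
print(d[("A", "B")], d[("A", "C")], d[("B", "D")])  # 3 5 9
```

The hard problem is of course the inverse one: we are only given the numbers `d`, and have to recover the tree.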
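Last time we saw that a necessary and sufficient condition for a tree-metric is that for every quadruple $i,j,k,l$ of samples the maximum of the three sum-distances $d(i,j)+d(k,l)$, $d(i,k)+d(j,l)$, $d(i,l)+d(j,k)$ is attained at least twice.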
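This four-point condition is easy to test by brute force on a small data-set. The sketch below is only an illustration: the function names, the tolerance `tol` and the two toy distance tables are my own assumptions. It also assumes that `d` is already a metric (symmetric, zero on the diagonal, satisfying the triangle inequality), so that only quadruples of distinct points need checking.

```python
from itertools import combinations


def symmetric(pairs):
    """Turn {(a, b): distance} into a table usable in both orders."""
    d = {}
    for (a, b), w in pairs.items():
        d[(a, b)] = d[(b, a)] = w
    return d


def four_point_violations(d, points, tol=1e-9):
    """Return the quadruples for which the largest of the three
    pairwise sums is NOT attained at least twice."""
    bad = []
    for i, j, k, l in combinations(points, 4):
        sums = sorted([d[(i, j)] + d[(k, l)],
                       d[(i, k)] + d[(j, l)],
                       d[(i, l)] + d[(j, k)]])
        # The two largest sums must coincide (up to rounding).
        if sums[2] - sums[1] > tol:
            bad.append((i, j, k, l))
    return bad


# A made-up tree metric on four species: the sums are 8, 14 and 14.
tree_like = symmetric({("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 8,
                       ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5})
print(four_point_violations(tree_like, ["A", "B", "C", "D"]))  # []

# Corners of a 4-cycle with the path metric: the sums are 2, 2 and 4.
square = symmetric({(1, 2): 1, (2, 3): 1, (3, 4): 1, (1, 4): 1,
                    (1, 3): 2, (2, 4): 2})
print(four_point_violations(square, [1, 2, 3, 4]))  # [(1, 2, 3, 4)]
```

The check runs over all quadruples, so it is only practical for small data-sets. With measured distances, a nonzero tolerance gives a crude indication of how far the data is from being tree-like.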
In practice, it rarely happens that the measured distances between DNA-samples are a perfect fit to this condition, but still we would like to compute the most probable phylogenetic tree. In the above example, there will be two such likely trees:

(From Geometry of the space of phylogenetic trees by Billera, Holmes and Vogtmann)
How can we find them? And, if the distances in our data-set do not have such a direct biological explanation, is it still possible to find such trees of events (or perhaps, a forest of event-trees) explaining our distance function?
Well, tracking back these ancestor nodes looks a lot like trying to construct colimits.
By now, every child knows that if their toy category
But then, the child can cobble together too many crazy constructions, and the parents have to call in the Grothendieck police who will impose one of their topologies to keep things under control.
Can we fall back on this standard topos philosophy in order to find these forests of the unconscious?

We have a data-set
Still, we can define the set
which is again a
so
The good news is that
The mental picture of a
But there’s hardly a subobject classifier to speak of, and so no Grothendieck topologies nor internal logic. So, how can we select from the abundance of enriched presheaves, the nodes of our event-forest?
We can look for special properties of the ancestor-nodes in a phylogenetic tree.

For any ancestor node
In other words, for every
Compare this to Stephen Wolfram’s belief that if we looked properly at “what ChatGPT is doing inside, we’d immediately see that ChatGPT is doing something ‘mathematical-physics-simple’ like following geodesics”.
Even if the distance on
Right, now let’s look at a non-tree distance function on
Then again, for every
The simplest non-tree example is
In this case,

If this were a tree-metric,

Let’s say that
For an arbitrary data-set
It is known that
Apart from the Dress-paper mentioned above, I’ve found these papers informative:
- Metric spaces in pure and applied mathematics by Dress, Huber and Moulton
- Computing the bounded subcomplex of an unbounded polyhedron by Hermann, Joswig and Pfetsch
- Tropical convexity by Develin and Sturmfels
- Tight spans, Isbell completions and semi-tropical modules by Willerton
So far, we started from a data-set
Recently, Simon Willerton gave a talk at the African Mathematical Seminar called ‘Looking at metric spaces as enriched categories’:
Willerton is also posting a series(?) on this at the n-Category Café, starting with Metric spaces as enriched categories I.
(tbc?)
Previously in this series: