The reach is an important geometric invariant of submanifolds of Euclidean space. It is a real-valued global invariant incorporating information about the second fundamental form of the embedding and the location of the first critical point of the distance from the submanifold. In the subject of geometric inference, the reach plays a crucial role. I will give a new method of estimating the reach of a submanifold, developed jointly with Clément Berenfeld, Marc Hoffmann and Krishnan Shankar.

# Past Topological Data Analysis Seminar

The talk will introduce two general models of random simplicial complexes which extend the much-studied Erdős–Rényi model for random graphs. These models include the well-known probabilistic models of random simplicial complexes of Costa–Farber, Kahle, and Linial–Meshulam as special cases. The two models turn out to satisfy a pleasing Alexander duality relation, prompting the hope that information can be transferred between them for free. This turns out not quite to be the case when the probability parameters are allowed to vanish, but when all parameters are uniformly bounded the duality relation works a treat. Time permitting, I may talk about the Rado simplicial complex, the unique (with probability one) infinite random simplicial complex.
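As a concrete reference point for the models being generalised here, the Linial–Meshulam random 2-complex is easy to sample: take the full 1-skeleton on n vertices and include each triangle independently with probability p. A minimal sketch (illustrative only, not code from the talk):

```python
import itertools
import random

def linial_meshulam(n, p, seed=None):
    """Sample a Linial–Meshulam random 2-complex Y(n, p): the complete
    1-skeleton on n vertices, with each triangle included independently
    with probability p."""
    rng = random.Random(seed)
    vertices = list(range(n))
    edges = list(itertools.combinations(vertices, 2))
    triangles = [t for t in itertools.combinations(vertices, 3)
                 if rng.random() < p]
    return vertices, edges, triangles

V, E, T = linial_meshulam(6, 0.5, seed=1)
# the 1-skeleton is always complete: 6 vertices, C(6,2) = 15 edges
```

The Erdős–Rényi graph G(n, p) is recovered by instead keeping all vertices and including each *edge* with probability p; the models in the talk generalise this to independent probabilities in every dimension.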

This talk is based on various bits of joint work with Michael Farber, Tahl Nowik, and Lewin Strauss.

Molecules are dynamical systems that can adopt a variety of three-dimensional conformations which, in general, differ in energy and physical properties. The identification of energetically favourable conformations is fundamental in molecular physics and computational chemistry, since it is closely related to important open problems such as the prediction of protein folding and virtual screening for drug design.

In this talk I will present theoretical and data-driven approaches to the study of molecular conformational spaces and their associated energy landscapes. I will show that the topology of the internal molecular conformational space can change after taking its quotient by the action of a discrete group of symmetries. I will also show that geometric and topological tools for data analysis such as Procrustes analysis, local dimensionality reduction, persistent homology and discrete Morse theory provide efficient methods to study the mathematical structures underlying molecular conformational spaces and their energy landscapes.

The goal of topological data analysis is to apply tools from algebraic topology to reveal geometric structures hidden within high-dimensional data. Mapper is among its most widely and successfully applied tools, providing a framework for the geometric analysis of point cloud data. Given a number of input parameters, the Mapper algorithm constructs a graph, giving rise to a visual representation of the structure of the data. The Mapper graph is a topological representation, where the placement of individual vertices and edges is not important, while geometric features such as loops and flares are revealed.
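For readers unfamiliar with the construction, a toy Mapper can be sketched in a few lines: cover the range of a filter function with overlapping intervals, cluster each preimage, and connect clusters that share data points. This sketch is illustrative only (parameter names are mine), and uses single-linkage clustering, which is just one of the choices the talk is concerned with:

```python
import math
from collections import defaultdict

def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.0):
    """Toy Mapper: overlapping-interval cover of the filter range,
    single-linkage clustering at scale eps in each preimage, and an
    edge between any two clusters sharing a data point."""
    vals = [filter_fn(p) for p in points]
    lo, hi = min(vals), max(vals)
    length = (hi - lo) / n_intervals
    clusters = []  # each cluster is a frozenset of point indices
    for i in range(n_intervals):
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        idx = [j for j, v in enumerate(vals) if a <= v <= b]
        # single-linkage clustering = connected components at scale eps,
        # computed with a small union-find structure
        parent = {j: j for j in idx}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for j in idx:
            for k in idx:
                if j < k and math.dist(points[j], points[k]) <= eps:
                    parent[find(j)] = find(k)
        comps = defaultdict(set)
        for j in idx:
            comps[find(j)].add(j)
        clusters.extend(frozenset(c) for c in comps.values())
    edges = {(u, v)
             for u in range(len(clusters))
             for v in range(u + 1, len(clusters))
             if clusters[u] & clusters[v]}
    return clusters, edges
```

Every step here involves a choice (number of intervals, overlap, clustering scale), which is exactly the sensitivity question studied in the abstract below.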

However, Mapper's method is rather ad hoc, and would therefore benefit from a formal approach governing how to make the necessary choices. In this talk I will present joint work with Francisco Belchí, Jacek Brodzki, and Mahesan Niranjan. We study how sensitive the graph returned by the Mapper algorithm is to perturbations of the data, given a particular tuning of parameters, and how this sensitivity depends on the choice of those parameters. Treating Mapper as a generalisation of clustering, we develop a notion of instability of Mapper and study how it is affected by these choices. In particular, we identify concrete causes of high Mapper instability and experimentally demonstrate how instability can be used to select good Mapper outputs.

Our approach directly tackles the inherent instability of the choice of clustering procedure and requires very few assumptions on the specifics of the data or the chosen Mapper construction, making it applicable to any Mapper-type algorithm.

Configuration spaces of points in Euclidean space or on a manifold are well studied in algebraic topology. But what if the points have some positive thickness? This is a natural setting from the point of view of physics, since this is the energy landscape of a hard-spheres system. Such systems are observed experimentally to go through phase transitions, but little is known mathematically.

In this talk, I will focus on two special cases where we have started to learn some things about the homology: (1) hard disks in an infinite strip, and (2) hard squares in a square or rectangle. We will discuss some theorems and conjectures, and also some computational results. We suggest definitions for "homological solid, liquid, and gas" regimes based on what we have learned so far.

This is joint work with Hannah Alpert, Ulrich Bauer, Robert MacPherson, and Kelly Spendlove.

In my talk I will discuss the use of topological methods in the analysis of neural data. I will show how to obtain good state spaces for Head Direction Cells and Grid Cells. Topological decoding shows how neural firing patterns determine behaviour. This is a local-to-global situation which gives rise to some reflections.

Lines and planes can be fitted to data by minimising the sum of squared distances from the data to the geometric object. But what about fitting objects from topology such as simplicial complexes? I will present a method of fitting topological objects to data using a maximum likelihood approach, generalising the sum of squared distances. A simplicial mixture model (SMM) is specified by a set of vertex positions and a weighted set of simplices between them. The fitting process uses the expectation-maximisation (EM) algorithm to iteratively improve the parameters.

Remarkably, if we allow degenerate simplices, then any distribution in Euclidean space can be approximated arbitrarily closely using an SMM with only a small number of vertices. This theorem is proved using a form of kernel density estimation on the n-simplex.
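The generative side of such a mixture is straightforward to sketch: choose a simplex according to the weights, then draw a uniform point in it via barycentric coordinates (normalised exponentials, i.e. a flat Dirichlet). A minimal illustration, assuming made-up vertex positions and weights, and not the authors' fitting code:

```python
import random

def sample_smm(vertices, simplices, weights, n, seed=None):
    """Draw n points from a simplicial mixture model: pick a simplex by
    weight, then a uniform point inside it using Dirichlet(1,...,1)
    barycentric coordinates (normalised exponential variates)."""
    rng = random.Random(seed)
    dim = len(vertices[0])
    pts = []
    for _ in range(n):
        (s,) = rng.choices(simplices, weights=weights)
        coords = [rng.expovariate(1.0) for _ in s]
        total = sum(coords)
        bary = [c / total for c in coords]
        # convex combination of the simplex's vertex positions
        pts.append(tuple(sum(b * vertices[v][d] for b, v in zip(bary, s))
                         for d in range(dim)))
    return pts

# e.g. uniform samples from a single triangle in the plane
pts = sample_smm([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
                 [(0, 1, 2)], [1.0], 200, seed=0)
```

Fitting reverses this picture: the EM algorithm alternates between soft-assigning data points to simplices (and barycentric positions within them) and re-estimating vertex positions and weights.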

In this talk, linear algebra for persistence modules will be introduced, together with a generalization of persistent homology. This theory permits us to handle the Mayer–Vietoris spectral sequence for persistence modules, and to solve any extension problems that might arise. The result of this approach is a distributed algorithm for computing persistent homology: one can break the underlying data into covering subsets, compute the persistent homology of each piece, and join everything together. This approach has the added advantage that one can recover extra geometrical information related to the barcodes, addressing the common complaint that persistent homology barcodes are 'too blind' to the geometry of the data.
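For context, the baseline that any distributed scheme must reproduce is the standard column-reduction algorithm on the boundary matrix over Z/2. A minimal sketch on a filtered triangle (illustrative only, not the spectral-sequence algorithm of the talk):

```python
def persistence_pairs(boundaries):
    """Standard persistence pairing by column reduction over Z/2.
    boundaries[j] is the set of facet indices of simplex j, with
    simplices indexed in filtration order. Returns (birth, death)
    index pairs; unpaired simplices carry essential classes."""
    reduced = []       # reduced columns, as sets of row indices
    low_to_col = {}    # pivot row -> index of the column owning it
    pairs = []
    for j, col in enumerate(boundaries):
        col = set(col)
        while col and max(col) in low_to_col:
            col ^= reduced[low_to_col[max(col)]]  # add earlier column mod 2
        reduced.append(col)
        if col:  # simplex j kills the cycle created by simplex max(col)
            low_to_col[max(col)] = j
            pairs.append((max(col), j))
    return pairs

# filtered triangle: vertices 0,1,2; edges 3=(01), 4=(02), 5=(12); face 6
triangle = [set(), set(), set(), {0, 1}, {0, 2}, {1, 2}, {3, 4, 5}]
# pairs (1,3), (2,4) merge components; (5,6) is the loop filled by the
# face; vertex 0 stays unpaired, the essential connected component
```

The distributed approach described above instead runs such reductions on covering pieces and glues the results via the Mayer–Vietoris spectral sequence, rather than reducing one global matrix.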

We can view the simplest setting of persistence from a functional point of view: given a fixed finite simplicial complex, the barcode function takes a filter function on this complex and returns the corresponding persistence diagram. The bottleneck distance induces a topology on the space of persistence diagrams and makes the barcode function a continuous map: this is a consequence of the stability theorem. In this presentation, I will present ongoing work that seeks to deepen our understanding of the analytic properties of the barcode function, in particular whether it can be said to be smooth. Namely, if we smoothly vary the filter function, do we get smooth changes in the resulting persistence diagram? I will introduce a notion of differentiability/smoothness for barcode-valued maps, and then explain why the barcode function is smooth (but not everywhere) with respect to the choice of filter function. I will finally explain why these notions are of interest in practical optimisation/learning situations.
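The bottleneck distance underlying the stability theorem minimises, over all matchings between two diagrams (with points allowed to match to the diagonal), the largest L-infinity distance of any matched pair. For toy diagrams it can be computed by brute force; a sketch, exponential in the diagram size and for illustration only:

```python
import itertools

def bottleneck(d1, d2):
    """Brute-force bottleneck distance between two small persistence
    diagrams, given as lists of (birth, death) pairs. Points may match
    to the diagonal; the ground metric is L-infinity."""
    def diag_cost(p):          # cost of sending p to the diagonal
        return (p[1] - p[0]) / 2.0
    def dist(p, q):            # L-infinity distance between points
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    # pad each diagram with a diagonal slot for every point of the other
    a = list(d1) + ['diag'] * len(d2)
    b = list(d2) + ['diag'] * len(d1)
    best = float('inf')
    for perm in itertools.permutations(range(len(b))):
        cost = 0.0
        for i, j in enumerate(perm):
            p, q = a[i], b[j]
            if p == 'diag' and q == 'diag':
                c = 0.0
            elif p == 'diag':
                c = diag_cost(q)
            elif q == 'diag':
                c = diag_cost(p)
            else:
                c = dist(p, q)
            cost = max(cost, c)
        best = min(best, cost)
    return best

bottleneck([(0, 4)], [(0, 5)])  # matching the two points costs 1.0
```

The stability theorem bounds this distance by the sup-norm distance between the two filter functions; the talk asks what happens at higher order, when the filter varies smoothly.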

The amount and complexity of biological data has increased rapidly in recent years with the availability of improved biological tools. However, when applying persistent homology to large data sets, many of the currently available algorithms fail due to computational complexity, preventing many interesting biological applications. De Silva and Carlsson (2004) introduced the so-called witness complex, which reduces computational complexity by building simplicial complexes on a small subset of landmark points selected from the original data set. The landmark points are chosen from the data either at random or using the so-called maxmin algorithm. These approaches are not ideal: the random selection tends to favour dense areas of the point cloud, while the maxmin algorithm often selects outliers as landmarks. Both of these problems need to be addressed in order to make the method more applicable to biological data. We study new ways of selecting landmarks from a large data set that are robust to outliers. We further examine the effects of the different subselection methods on the persistent homology of the data.
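The maxmin algorithm referred to above is a greedy farthest-point rule: starting from a seed, repeatedly add the data point farthest from the current landmark set. A short sketch (function and variable names are mine) that also makes its outlier bias visible, since an isolated point is by definition far from everything:

```python
import math

def maxmin_landmarks(points, k, start=0):
    """Greedy maxmin landmark selection: repeatedly add the point whose
    distance to the current landmark set is largest. dmin[i] tracks the
    distance from point i to the nearest landmark chosen so far."""
    landmarks = [start]
    dmin = [math.dist(points[start], p) for p in points]
    while len(landmarks) < k:
        nxt = max(range(len(points)), key=lambda i: dmin[i])
        landmarks.append(nxt)
        for i, p in enumerate(points):
            dmin[i] = min(dmin[i], math.dist(points[nxt], p))
    return landmarks

# an outlier at x = 10 is picked immediately after the seed
maxmin_landmarks([(0.0,), (1.0,), (2.0,), (10.0,)], 2)  # -> [0, 3]
```

Random selection has the opposite bias, oversampling dense regions; the outlier-robust selection schemes studied in this work aim between the two.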