Understanding the mathematical intuition behind the curse of dimensionality
The curse of dimensionality refers to the problems that arise when analyzing high-dimensional data. The dimensionality or dimension of a dataset refers to the number of linearly independent features in that dataset, so a high-dimensional dataset is a dataset with a large number of features. This term was first coined by Bellman in 1961 when he observed that the number of samples required to estimate an arbitrary function with a certain accuracy grows exponentially with respect to the number of parameters that the function takes.
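A simple grid-counting sketch can illustrate the exponential growth Bellman described. It assumes, purely for illustration, that "a certain accuracy" means sampling each parameter at a fixed number of evenly spaced values. The function name `grid_samples` and the choice of 10 values per parameter are hypothetical and not taken from Bellman's work.

```python
def grid_samples(points_per_axis: int, n_params: int) -> int:
    """Count the points in a regular grid over n_params parameters.

    Assumption: each parameter is sampled at the same number of
    evenly spaced values, so the grid has points_per_axis ** n_params points.
    """
    return points_per_axis ** n_params


# Keep the per-parameter resolution fixed and increase the number of parameters.
for n_params in (1, 2, 3, 10):
    print(f"{n_params} parameter(s): {grid_samples(10, n_params)} samples")
```

With 10 values per parameter, a function of 3 parameters needs 1,000 samples, while a function of 10 parameters already needs 10 billion.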
In this article, we take a detailed look at the mathematical problems that arise when analyzing a high-dimensional dataset. Though these problems may look counterintuitive, it’s possible to explain them intuitively. Instead of a purely theoretical discussion, we use Python to create and analyze high-dimensional datasets and see how the curse of dimensionality manifests itself in practice. In this article, all images, unless otherwise noted, are by the author.
Dimension of a dataset
As mentioned before, the dimension of a dataset is defined as the number of linearly independent features that it has. A linearly independent feature can’t be written as a linear combination of the other features in that dataset. Hence, if a feature or column in a dataset is a linear combination of some other features, it won’t add to the dimension of that dataset. For example, Figure 1 shows two datasets. The first one has two linearly independent columns and its dimension is 2. In the second dataset, one column is a multiple of another, hence we only have one independent feature. As the plot of this dataset shows, despite having two features, all the data points lie along a 1-dimensional line. Hence the dimension of this dataset is one.
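Under this definition, the dimension of a dataset equals the rank of its data matrix, so it can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.matrix_rank`. The two synthetic datasets only mimic the situation described above (they are not the data behind Figure 1), and the helper name `dataset_dimension` is an assumption made for this example.

```python
import numpy as np


def dataset_dimension(X: np.ndarray) -> int:
    """Return the number of linearly independent columns (the matrix rank)."""
    return int(np.linalg.matrix_rank(X))


# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)

# First dataset: two unrelated columns.
independent = np.column_stack([x1, x2])

# Second dataset: the second column is a multiple of the first.
dependent = np.column_stack([x1, 2 * x1])

print("Dimension of the first dataset: ", dataset_dimension(independent))
print("Dimension of the second dataset:", dataset_dimension(dependent))
```

Because `matrix_rank` works from the singular values with a numerical tolerance, a column that is only approximately a linear combination of the others (for example, because of measurement noise) will usually still count as independent.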
The effect of dimensionality on volume
The main reason for the curse of dimensionality is the effect of the dimension on volume. Here, we focus on the geometrical interpretation of a dataset. Generally, we…