Overview
The "Gaussian" in "Gaussian mixture model" refers to the Gaussian probability distribution function, whose graph is bell-shaped. In one dimension it is a symmetric curve: the value starts very low, rises slowly, reaches its peak at the center of symmetry, and then gradually decays.
Consider the standard k-means clustering model: it produces a number of clusters, each with a center. One way to view this process is to assume that the data being clustered follow a set of Gaussian probability distributions, with the mean of each distribution serving as the center of a cluster. Each distribution gives, for every point in the space, the probability of belonging to the cluster centered on that distribution's mean. Several Gaussian distributions are given and each one generates a cluster, which is where the name "Gaussian mixture model" comes from.
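The bell shape and the idea of treating each cluster center as the mean of a Gaussian can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names `gaussian_pdf` and `memberships`, the equal component weights, and the example centers 0 and 5 are all hypothetical and do not come from any particular library.

```python
import math

def gaussian_pdf(x, mean=0.0, std=1.0):
    """Density of a one-dimensional Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def memberships(x, components):
    """Relative probability that point x belongs to each component.

    components is a list of (weight, mean, std) tuples; the weights are
    an assumption of this sketch and are expected to sum to 1.
    """
    weighted = [w * gaussian_pdf(x, m, s) for (w, m, s) in components]
    total = sum(weighted)
    return [v / total for v in weighted]

# The bell shape: low in the tails, highest at the center, symmetric.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  density={gaussian_pdf(x):.4f}")

# Hypothetical mixture of two clusters centered at 0 and 5.
components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
for x in (0.0, 2.5, 5.0):
    print(x, [round(p, 3) for p in memberships(x, components)])
```

A point close to one center is assigned almost entirely to that cluster, while a point halfway between the two centers is split evenly. This soft assignment is what distinguishes the probabilistic view from the hard assignment of k-means.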
Problems
Applying the Gaussian distribution to cluster detection raises two problems:
1. The Gaussian distribution is one-dimensional. How can it be extended to two or more dimensions?
2. A Gaussian distribution is defined by its mean and standard deviation. How can a suitable mean and standard deviation be found?
These problems are important, and the ability to solve them is the strength of the Gaussian mixture model.
Multivariate model
The Gaussian bell curve defines the probability distribution of a single variable. The standard normal distribution has a mean of 0 and a standard deviation of 1. The probability distribution obtained after adding a second variable is known in statistics as a joint probability distribution. The resulting probability surface resembles a hat or a symmetric peak.
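As a worked formula, the standard normal density and the joint density of two variables can be written as follows. This assumes the two variables are independent and both standard normal, which the text does not state explicitly; the symbols x, y, and f are notation chosen here for illustration.

```latex
f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^{2}/2},
\qquad
f(x, y) = f(x)\, f(y) = \frac{1}{2\pi} \, e^{-(x^{2} + y^{2})/2}
```

Because the joint density depends only on the distance from the origin, its graph is rotationally symmetric, which gives the hat-like shape.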
For a normal distribution, it is the area under the curve that carries meaning. To find the probability that the variable is negative, calculate the area under the normal curve over all negative values. Because the standard normal curve is symmetric about 0, this region accounts for 50% of the total area.
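The area to the left of a point is the cumulative distribution function, which for a normal distribution can be expressed with the error function. The following is a minimal sketch using Python's standard `math.erf`; the helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Probability that a standard normal variable is negative.
p_negative = normal_cdf(0.0)
print(p_negative)  # 0.5
```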
In two dimensions, the quantity of interest is no longer the area under a curve but the volume under a surface. To find the probability that both variables are negative, calculate the volume under the surface over the region where both are negative. For two independent standard normal variables, the result is 25% of the total volume.
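The 25% figure can be checked numerically. The sketch below estimates the volume by sampling, assuming the two variables are independent and standard normal; the function name `estimate_both_negative`, the sample count, and the seed are arbitrary choices made for illustration.

```python
import random

def estimate_both_negative(n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(x < 0 and y < 0) for two
    independent standard normal variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if x < 0.0 and y < 0.0:
            hits += 1
    return hits / n_samples

# The estimate should be close to 0.5 * 0.5 = 0.25.
print(estimate_both_negative())
```

Under independence the exact value is simply the product of the two one-dimensional probabilities, 0.5 × 0.5. If the variables were correlated, the volume over that quadrant would generally differ from 25%.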