I am looking for a Python implementation of the k-means algorithm, with examples, to cluster and cache my database of coordinates.
SciPy's clustering implementations work well, and they include a k-means implementation. There's also scipy-cluster, which does agglomerative clustering; this has the advantage that you don't need to decide on the number of clusters ahead of time.
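As a rough illustration of both options, here is a minimal sketch using `scipy.cluster.vq` for k-means and `scipy.cluster.hierarchy` for agglomerative clustering. The sample coordinates, the choice of three clusters, and the distance threshold of 2.0 are assumptions made up for the example, not values from the question.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: 150 (x, y) coordinates scattered around three centres.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])

# k-means: the number of clusters (3 here) is chosen up front.
# kmeans() returns the centroids (one per row) and the mean distortion.
codebook, distortion = kmeans(points, 3)

# vq() assigns every observation to its nearest centroid.
labels, distances = vq(points, codebook)

# Agglomerative clustering: build the merge tree first, then cut it at a
# distance threshold instead of fixing the number of clusters in advance.
tree = linkage(points, method="average")
hier_labels = fcluster(tree, t=2.0, criterion="distance")

print(codebook)
print(len(set(hier_labels)), "clusters found by cutting the tree")
```

With agglomerative clustering the threshold plays the role that k plays in k-means, so it still needs some tuning for real data.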
SciPy's kmeans2() has some numerical problems: others have reported error messages such as "Matrix is not positive definite - Cholesky decomposition cannot be computed" in version 0.6.0, and I just encountered the same in version 0.7.1. For now, I would recommend using PyCluster instead. Example usage:
For continuous data, k-means is very easy. You need a list of your means, and for each data point, find the mean it's closest to and average the new data point into it. Your means will represent the recent salient clusters of points in the input data. I do the averaging continuously, so there is no need to keep the old data to obtain the new average. Given the old average
Here is the full code in Python
You could just print the means when all the data has passed through, but it's much more fun to watch them change in real time. I used this on frequency envelopes of 20 ms bits of sound, and after talking to it for a minute or two, it had consistent categories for the short 'a' vowel, the long 'o' vowel, and the 's' consonant. Weird!
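The running-average idea described in this answer can be sketched roughly as follows. This is not the original code, which did not survive; the function name `online_kmeans`, the count-based update rule, and the sample data are assumptions. The update used is the standard incremental mean, new_mean = old_mean + (x - old_mean) / n, where n is the number of points assigned to that mean so far.

```python
import numpy as np

def online_kmeans(stream, initial_means):
    """Update a fixed set of means one data point at a time (hypothetical helper)."""
    means = np.array(initial_means, dtype=float)
    # Treat every initial mean as if it had already absorbed one observation.
    counts = np.ones(len(means))
    for x in stream:
        x = np.asarray(x, dtype=float)
        # Find the mean closest to the new point (Euclidean distance).
        nearest = np.argmin(np.linalg.norm(means - x, axis=1))
        # Fold the point into that mean without storing any old data.
        counts[nearest] += 1
        means[nearest] += (x - means[nearest]) / counts[nearest]
    return means

# Made-up stream: two blobs of 2D points, shuffled to arrive in random order.
rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
rng.shuffle(stream)

means = online_kmeans(stream, initial_means=stream[:2])
print(means)
```

Replacing `1 / counts[nearest]` with a small constant step size would weight recent points more heavily, which may be closer to the "recent salient clusters" behaviour described above.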
From Wikipedia, you could use SciPy's k-means clustering and vector quantization. Or, you could use a Python wrapper for OpenCV, ctypes-opencv. Or you could use OpenCV's new Python interface and its kmeans implementation.
(Years later) this kmeans.py under is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means is straightforward and reasonably fast; it uses any of the 20-odd metrics in scipy.spatial.distance.
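To show what plugging in a different metric might look like, here is a minimal sketch built on `scipy.spatial.distance.cdist`. It is not the linked kmeans.py; the function name `kmeans_metric`, its parameters, and the sample points are assumptions for illustration. Note that averaging the members is only the exact minimiser for squared Euclidean distance, so with other metrics this is a heuristic.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_metric(points, k, metric="euclidean", n_iter=100, seed=0):
    """Lloyd-style k-means with a pluggable distance metric (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest centre under the chosen metric.
        labels = cdist(points, centres, metric=metric).argmin(axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    labels = cdist(points, centres, metric=metric).argmin(axis=1)
    return centres, labels

# Made-up coordinates forming two obvious groups, clustered with Manhattan distance.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centres, labels = kmeans_metric(points, k=2, metric="cityblock")
print(centres, labels)
```

Any metric name accepted by `cdist` (for example `"cosine"` or `"chebyshev"`) can be passed the same way.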
You can also use GDAL, which has many functions for working with spatial data.
SciKit Learn's KMeans() is the simplest way to apply k-means clustering in Python. Fitting clusters is as simple as:
This code snippet shows how to store centroid coordinates and predict clusters for an array of coordinates.
(courtesy of SciKit Learn's documentation, linked above)
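The snippets this answer refers to are missing here, so the following is a stand-in sketch of the same workflow with `sklearn.cluster.KMeans`: fit, keep the centroid coordinates, and predict clusters for new coordinates. The sample array, `n_clusters=2`, `n_init=10`, and `random_state=0` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up coordinates: two groups, around x = 1 and x = 10.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Fit the model; n_init and random_state are set explicitly for repeatability.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Store the centroid coordinates and the cluster index of every input row.
centroids = model.cluster_centers_
labels = model.labels_

# Predict the cluster for coordinates the model has not seen before.
new_labels = model.predict(np.array([[0.0, 0.0], [12.0, 3.0]]))

print(centroids)
print(labels, new_labels)
```

The `centroids` array is what you would cache; assigning later points then only needs `predict()`, not a refit.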
Python's Pycluster and pyplot can be used for k-means clustering and for visualization of 2D data. A recent blog post, Stock Price/Volume Analysis Using Python and PyCluster, gives an example of clustering stock data with PyCluster.
*This code: K-Means with Python*
Read ...