## Implementing simple K-Means clustering in Python

Write a simple K-Means clustering algorithm in Python.

The method itself is quite simple:

- Generate several clusters of random data points
- Randomly generate K center points
- For each point, find the nearest center; that center is the point's cluster
- Take the mean of the data points in each cluster as the new center coordinates
- If every new center is within a certain threshold distance of its old center, clustering is complete; otherwise, iterate again
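The assignment and update steps above can be sketched as one vectorized iteration. This is a minimal illustration, not the full program below; the function and variable names are my own:

```python
import numpy as np

def one_kmeans_step(points, centers):
    # distance from every point to every center: shape (n_points, n_centers)
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # each point's cluster is the index of its nearest center
    labels = d.argmin(axis=1)
    # each new center is the mean of the points assigned to it
    new_centers = np.array([points[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, new_centers

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.5], [9.0, 9.0]])
labels, new_centers = one_kmeans_step(points, centers)
print(labels)       # [0 0 1 1]
print(new_centers)  # [[ 0.   0.5] [10.  10.5]]
```

Repeating this step until the centers stop moving is exactly the loop the full program implements.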

```
import random
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from icecream import ic
from matplotlib.colors import BASE_COLORS


def random_centers(k, points):
    # randomly generate k center points within the range of existing coordinates
    for _ in range(k):
        yield random.choice(points[:, 0]), random.choice(points[:, 1])


def mean(points):
    # return the centroid (mean x, mean y) of a set of points
    all_x, all_y = [x for x, y in points], [y for x, y in points]
    return np.mean(all_x), np.mean(all_y)


def distance(p1, p2):
    # Euclidean distance between two points
    x1, y1 = p1
    x2, y2 = p2
    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def draw_points(centers, centers_neighbor, colors):
    # traverse each center point
    for i, c in enumerate(centers):
        # get the set of points assigned to this center
        _points = centers_neighbor[c]
        all_x, all_y = [x for x, y in _points], [y for x, y in _points]
        # draw those points in this center's color
        plt.scatter(all_x, all_y, c=colors[i])
    plt.show()


def kmeans(k, points, centers=None):
    # get a list of color values for plotting
    colors = list(BASE_COLORS.values())
    # if no centers are given, generate them randomly
    if not centers:
        centers = list(random_centers(k=k, points=points))
    ic(centers)  # easy to debug
    for i, c in enumerate(centers):
        # mark each center with a star in its cluster's color
        plt.scatter([c[0]], [c[1]], s=90, marker='*', c=colors[i])
    plt.scatter(*zip(*points), c='black')
    # defaultdict(set): looking up a missing key returns an empty set
    # instead of raising KeyError
    centers_neighbor = defaultdict(set)
    for p in points:
        # min() returns the center closest to point p
        closest_c = min(centers, key=lambda c: distance(p, c))
        # add the point to the nearest center's set
        centers_neighbor[closest_c].add(tuple(p))
    draw_points(centers, centers_neighbor, colors)
    new_centers = []
    for c in centers:
        # the mean of all points assigned to a center becomes the new center;
        # keep the old center if no points were assigned to it
        cluster = centers_neighbor[c]
        new_centers.append(mean(cluster) if cluster else c)
    threshold = 0.1
    distances_old_and_new = [distance(c_old, c_new)
                             for c_old, c_new in zip(centers, new_centers)]
    if all(d < threshold for d in distances_old_and_new):
        return centers_neighbor
    else:
        return kmeans(k, points, new_centers)


if __name__ == '__main__':
    # randomly generate four clusters of data
    points0 = np.random.normal(loc=1, size=(100, 2))
    points1 = np.random.normal(loc=2, size=(100, 2))
    points2 = np.random.normal(loc=4, size=(100, 2))
    points3 = np.random.normal(loc=5, size=(100, 2))
    points = np.concatenate([points0, points1, points2, points3])
    kmeans(3, points=points, centers=None)
```
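The recursive call at the end of `kmeans` works for small data, but Python caps recursion depth, so the iterate-until-converged logic is often written as a `while` loop instead. A minimal sketch of that loop, skipping the plotting; the function name and structure here are my own, under the same assumptions as above (random initial centers, threshold 0.1):

```python
import random
from collections import defaultdict

import numpy as np

def kmeans_loop(k, points, threshold=0.1, seed=0):
    rng = random.Random(seed)
    # pick k initial centers from the observed coordinate values
    centers = [(rng.choice(list(points[:, 0])), rng.choice(list(points[:, 1])))
               for _ in range(k)]
    while True:
        # assignment step: each point goes to its nearest center
        clusters = defaultdict(set)
        for p in points:
            c = min(centers, key=lambda c: (c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2)
            clusters[c].add(tuple(p))
        # update step: each new center is the mean of its cluster;
        # a center with no points stays where it is
        new_centers = [tuple(np.mean(list(clusters[c]), axis=0)) if clusters[c] else c
                       for c in centers]
        # stop once no center moved farther than the threshold
        moved = max(np.hypot(a[0] - b[0], a[1] - b[1])
                    for a, b in zip(centers, new_centers))
        if moved < threshold:
            return clusters
        centers = new_centers
```

The loop version also makes the stopping condition explicit in one place, rather than spreading it between a base case and a recursive call.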


Effect pictures (images omitted): one scatter plot per iteration, first through fifth, after which clustering is complete!