## Implementing simple K-Means clustering in Python

A simple K-Means clustering implemented in Python.

The method is straightforward:

1. Generate several groups of random data points.
2. Randomly generate K centers.
3. Assign each point to its nearest center; that assignment is its cluster.
4. Take the mean of the points in each cluster as the new center coordinates.
5. If every new center is less than a threshold distance from its old position, the clustering is complete; otherwise, iterate.
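The steps above can be sketched compactly with plain NumPy, without the plotting code (the function name and signature here are illustrative, not from the original):

```python
import numpy as np

def kmeans_sketch(points, k, threshold=0.1, rng=None):
    # points: (n, 2) array; pick k distinct random points as initial centers
    rng = np.random.default_rng(rng)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # Step 3: distance from every point to every center -> nearest center index
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: new center = mean of the points assigned to it
        # (a center with no assigned points stays where it is)
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # Step 5: stop once every center moved less than the threshold
        if np.all(np.linalg.norm(new_centers - centers, axis=1) < threshold):
            return new_centers, labels
        centers = new_centers
```

The broadcasting trick `points[:, None, :] - centers[None, :, :]` produces an `(n, k, 2)` difference array, so all point-to-center distances are computed in one shot.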
```python
import random
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
from icecream import ic
from matplotlib.colors import BASE_COLORS


def random_centers(k, points):
    # Randomly pick k center coordinates from the range of existing coordinates
    for _ in range(k):
        yield random.choice(points[:, 0]), random.choice(points[:, 1])


def mean(points):
    # all_x, all_y are lists; return the mean x and mean y of the points
    all_x, all_y = [x for x, y in points], [y for x, y in points]
    return np.mean(all_x), np.mean(all_y)


def distance(p1, p2):
    # Euclidean distance between two points
    x1, y1 = p1
    x2, y2 = p2
    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def draw_points(centers, centers_neighbor, colors):
    # For each center, plot the set of points it covers in that center's color
    for i, c in enumerate(centers):
        _points = centers_neighbor[c]
        all_x, all_y = [x for x, y in _points], [y for x, y in _points]
        plt.scatter(all_x, all_y, c=colors[i])
    plt.show()


def kmeans(k, points, centers=None):
    # A list of color values, one per cluster
    colors = list(BASE_COLORS.values())
    # If no centers were passed in, randomly generate k of them
    if not centers:
        centers = list(random_centers(k=k, points=points))
    ic(centers)  # handy for debugging
    for i, c in enumerate(centers):
        # Draw each center as a star in its cluster color
        plt.scatter([c[0]], [c[1]], s=90, marker='*', c=colors[i])

    plt.scatter(*zip(*points), c='black')
    # defaultdict(set): looking up a missing key returns an empty set
    # instead of raising KeyError
    centers_neighbor = defaultdict(set)

    for p in points:
        # min() returns the coordinates of the closest center
        closest_c = min(centers, key=lambda c: distance(p, c))
        # Add the point to its nearest center's set
        centers_neighbor[closest_c].add(tuple(p))

    draw_points(centers, centers_neighbor, colors)

    new_centers = []
    # Iterate in the order of `centers` so old/new centers pair up below
    for c in centers:
        # The mean of the points in each cluster becomes the new center;
        # a center with no assigned points stays where it is
        if centers_neighbor[c]:
            new_centers.append(mean(centers_neighbor[c]))
        else:
            new_centers.append(c)

    threshold = 0.1
    distances_old_and_new = [distance(c_old, c_new)
                             for c_old, c_new in zip(centers, new_centers)]
    if all(d < threshold for d in distances_old_and_new):
        return centers_neighbor
    else:
        return kmeans(k, points, new_centers)


if __name__ == '__main__':
    # Randomly generate four groups of data
    points0 = np.random.normal(loc=1, size=(100, 2))
    points1 = np.random.normal(loc=2, size=(100, 2))
    points2 = np.random.normal(loc=4, size=(100, 2))
    points3 = np.random.normal(loc=5, size=(100, 2))

    points = np.concatenate([points0, points1, points2, points3])

    kmeans(3, points=points, centers=None)
```
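For a quick sanity check on the same data, scikit-learn's `KMeans` can produce an equivalent clustering (this assumes scikit-learn is installed; it is not used by the implementation above):

```python
import numpy as np
from sklearn.cluster import KMeans

# Same four groups of data as in the script above
points = np.concatenate([
    np.random.normal(loc=1, size=(100, 2)),
    np.random.normal(loc=2, size=(100, 2)),
    np.random.normal(loc=4, size=(100, 2)),
    np.random.normal(loc=5, size=(100, 2)),
])

km = KMeans(n_clusters=3, n_init=10).fit(points)
print(km.cluster_centers_)  # final center coordinates, shape (3, 2)
print(km.labels_[:10])      # cluster index for the first 10 points
```

Unlike the recursive version above, scikit-learn restarts from `n_init` different random initializations and keeps the best result, which makes it much less sensitive to a bad initial choice of centers.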

Result plots (one per iteration): first, second, third, fourth, and fifth iterations, after which the clustering is complete.

#### Tags: Python · K-Means · machine learning · artificial intelligence