knn算法解析笔记
Knn算法实例解析
Knn 算法,个人理解是就是计算点的距离,找出与当前点最近的一些点。
前提知识
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2],
[0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]],
[[0, 1, 2, 0, 1, 2]]])
实例解析
classify0 inX dataSet, labels, k 参数输入 inX [0,0] dataSet array([[ 1. , 1.1], [ 1. , 1. ], [ 0. , 0. ], [ 0. , 0.1]]) labels [‘A’, ‘A’, ‘B’, ‘B’]
def classify0(inX, dataSet, labels, k):
dataSetSize = dataSet.shape[0]
diffMat = tile(inX, (dataSetSize,1)) - dataSet
diffMat = array([[-1. , -1.1],
# [-1. , -1. ],
# [ 0. , 0. ],
# [ 0. , -0.1]])
sqDiffMat = diffMat**2
# array([[ 1. , 1.21],
# [ 1. , 1. ],
# [ 0. , 0. ],
# [ 0. , 0.01]])
sqDistances = sqDiffMat.sum(axis=1)
# array([ 2.21, 2. , 0. , 0.01])
distances = sqDistances**0.5
# array([ 1.48660687, 1.41421356, 0. , 0.1 ])
sortedDistIndicies = distances.argsort()
# array([2, 3, 1, 0])
classCount={}
for i in range(k):
voteIlabel = labels[sortedDistIndicies[i]]
classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
# classCount = {'A': 1, 'B': 2}
sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
# [('B', 2), ('A', 1)]
return sortedClassCount[0][0]
参考
机器学习实践