kNN Algorithm Notes

kNN Algorithm: A Worked Example

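The kNN (k-nearest neighbors) algorithm, as I understand it, computes the distance from the current point to every point in the data set and finds the k points closest to it. For classification, the current point is then given the most common label among those k neighbors.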
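
For reference, the distance computed by the classify0 code in these notes is the ordinary Euclidean distance. A sketch of the formula, where x is the query point, y is one training point, and n is the number of features (these symbols are introduced here for illustration and do not appear in the original notes):

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```

For the query [0, 0] and the training point [1, 1.1] from the example in these notes, this gives sqrt(1 + 1.21) = sqrt(2.21), roughly 1.4866.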

Prerequisites


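np.tile(a, reps) builds a new array by repeating a the number of times given by reps along each axis:

>>> import numpy as np
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]],

       [[0, 1, 2, 0, 1, 2]]])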
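
To connect np.tile to how classify0 uses it, here is a minimal sketch showing that tiling a query point once per row and then subtracting gives the same result as letting NumPy broadcasting do the repetition implicitly. The names query and points are illustrative, not from the original notes; the values are the same sample data used in the worked example.

```python
import numpy as np

# Illustrative names; the values mirror the sample data in these notes.
query = np.array([0.0, 0.0])
points = np.array([[1.0, 1.1],
                   [1.0, 1.0],
                   [0.0, 0.0],
                   [0.0, 0.1]])

# Explicit repetition: stack `query` once per row of `points`.
tiled = np.tile(query, (points.shape[0], 1))
diff_tiled = tiled - points

# Broadcasting: NumPy stretches the 1-D `query` across the rows implicitly.
diff_broadcast = query - points

same = np.array_equal(diff_tiled, diff_broadcast)
print(tiled.shape)  # (4, 2)
print(same)         # True
```

Both forms produce the same difference matrix; the tiled version just makes the repetition visible.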


Worked Example

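Inputs to classify0(inX, dataSet, labels, k):

inX: [0, 0]
dataSet: array([[1. , 1.1], [1. , 1. ], [0. , 0. ], [0. , 0.1]])
labels: ['A', 'A', 'B', 'B']
k: not stated in these notes; the vote counts shown in the code comments ({'A': 1, 'B': 2}) correspond to k = 3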


import operator

import numpy as np


def classify0(inX, dataSet, labels, k):
    # Number of training samples (rows of dataSet)
    dataSetSize = dataSet.shape[0]
    # Repeat inX once per row, then subtract to get the per-feature differences
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    # array([[-1. , -1.1],
    #        [-1. , -1. ],
    #        [ 0. ,  0. ],
    #        [ 0. , -0.1]])
    sqDiffMat = diffMat**2
    # array([[1.  , 1.21],
    #        [1.  , 1.  ],
    #        [0.  , 0.  ],
    #        [0.  , 0.01]])
    # Sum the squared differences across each row
    sqDistances = sqDiffMat.sum(axis=1)
    # array([2.21, 2.  , 0.  , 0.01])
    distances = sqDistances**0.5
    # array([1.48660687, 1.41421356, 0.        , 0.1       ])
    # Indices of the samples, ordered from nearest to farthest
    sortedDistIndicies = distances.argsort()
    # array([2, 3, 1, 0])
    classCount = {}
    # Count the labels of the k nearest samples
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # with k = 3: classCount = {'B': 2, 'A': 1}
    # Python 3: dict.items() replaces the Python 2 dict.iteritems()
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    # [('B', 2), ('A', 1)]
    return sortedClassCount[0][0]
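
As a cross-check on the steps above, the same idea can be sketched more compactly with NumPy broadcasting and collections.Counter. This is an alternative sketch, not the code from the book: the function name knn_classify and its parameter names are made up for illustration, and k = 3 is assumed because it matches the vote counts shown in the comments above.

```python
from collections import Counter

import numpy as np


def knn_classify(query, points, labels, k):
    """Return the majority label among the k points nearest to `query`."""
    # Broadcasting subtracts `query` from every row, so np.tile is not needed.
    distances = np.sqrt(((points - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first.
    nearest = distances.argsort()[:k]
    # Tally the labels of those k neighbors.
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the top-voted label.
    return votes.most_common(1)[0][0]


# Same sample data as in the worked example; k = 3 is an assumption.
points = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
result = knn_classify(np.array([0.0, 0.0]), points, labels, k=3)
print(result)  # B
```

The three nearest points are indices 2, 3, and 1, with labels B, B, and A, so the vote goes to B. Neither version does anything special about ties; both simply return whichever tied label their sort or counter happens to put first.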

References

Machine Learning in Action (机器学习实践)

numpy.tile
