Introduction to Machine Learning: the k-Nearest Neighbors Algorithm

        if classifierResult != datingLabels[i]:
            errorCount += 1.0  # record each misclassification
    print "the total error rate is: %f" % (errorCount / float(numTestVecs))
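This snippet is only the tail of the test harness. For context, here is a sketch of what the full datingClassTest function might look like, assuming the file2matrix, autoNorm, and classify0 helpers defined earlier in kNN.py (the 10% hold-out ratio is an assumption):

def datingClassTest():
    hoRatio = 0.10  # hold out 10% of the rows as test vectors (assumed)
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        # classify test vector i against the remaining 90% of the data
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                     datingLabels[numTestVecs:m], 3)
        print "the classifier came back with: %d, the real answer is: %d" \
              % (classifierResult, datingLabels[i])
        if classifierResult != datingLabels[i]:
            errorCount += 1.0  # record each misclassification
    print "the total error rate is: %f" % (errorCount / float(numTestVecs))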

We then return to the Python shell and run

reload(kNN)

to reload the kNN.py module, and then call

kNN.datingClassTest()

which gives the following output:

the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
...
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is: 0.050000

So we see that the error rate on this data set is 5%. This figure will vary somewhat from run to run, because the randomly selected test data can differ.
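To get a more stable estimate, one option is to average the error rate over several random splits. The helper below is my own addition (the name averagedErrorRate and the use of numpy.random.permutation are assumptions; it reuses classify0 and the normalized data from above):

from numpy import random

def averagedErrorRate(normMat, labels, trials=10, hoRatio=0.10, k=3):
    m = normMat.shape[0]
    numTest = int(m * hoRatio)
    total = 0.0
    for _ in range(trials):
        perm = random.permutation(m)  # a fresh random split each trial
        shuffled = normMat[perm]
        shuffledLabels = [labels[i] for i in perm]
        errors = sum(1 for i in range(numTest)
                     if classify0(shuffled[i, :], shuffled[numTest:m, :],
                                  shuffledLabels[numTest:m], k) != shuffledLabels[i])
        total += errors / float(numTest)
    return total / trials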

  • Using the algorithm

Now we use the classifier built above to put together a small working system: given the three feature values as input, it predicts how much she will like the person. Let's write the code:

def classifyPerson():
    resultList = ['not', 'small doses', 'large doses']
    percentTats = float(raw_input("percent of time spent>"))
    miles = float(raw_input("flier miles per year?"))
    ice = float(raw_input("liters of ice-cream?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([miles, percentTats, ice])
    # normalize the input vector the same way as the training data
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print "you will like this person: ", resultList[classifierResult - 1]

Most of this code should be familiar by now; the only addition is raw_input, used to read the feature values from the user. Let's look at the result:

>>> reload(kNN)
<module 'kNN' from 'kNN.py'>
>>> kNN.classifyPerson()
percent of time spent>10
flier miles per year?10000
liters of ice-cream?0.5
you will like this person:  small doses

Notice that while building this k-nearest-neighbors system we never had a training step: there is nothing to train, because the algorithm simply computes distances against the stored examples at query time.
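That query-time computation is the whole algorithm. As a reference, here is a minimal sketch of a classify0-style routine matching the call signature used above (the body is a reconstruction, not necessarily the exact code in kNN.py):

import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every stored training vector
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndices = distances.argsort()
    # majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistIndices[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]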

This also exposes a problem: the k-nearest-neighbors algorithm is slow, because every query must compute a distance to every training vector. How can this be optimized?
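One standard answer is to index the training vectors with a space-partitioning structure such as a k-d tree, so each query only examines a small fraction of the data. Here is a sketch using scipy.spatial.cKDTree (SciPy and the helper names build_index and classify_fast are my own assumptions, not part of kNN.py):

from collections import Counter
import numpy as np
from scipy.spatial import cKDTree

def build_index(normMat):
    # one-time O(n log n) build over the normalized training vectors
    return cKDTree(normMat)

def classify_fast(tree, labels, inX, k=3):
    # roughly O(log n) per query instead of n full distance computations
    _, idx = tree.query(inX, k=k)
    votes = Counter(labels[i] for i in np.atleast_1d(idx))
    return votes.most_common(1)[0][0]  # majority vote, as in classify0

For the thousand rows of the dating data this is overkill, but the same pattern keeps query time manageable as the training set grows.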