where y' is the true distribution and y is the predicted distribution, i.e.:
y_ = tf.placeholder("float", [None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
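Taking the log of the softmax output directly can underflow when a predicted probability is very close to zero. As a hedged sketch, a more numerically stable variant would compute the loss from the raw logits (here assuming x, W, b, and y_ as defined in the full script below; note that reduce_mean averages over the batch, whereas the original uses reduce_sum):
# Sketch (assumption): compute cross entropy from raw logits instead of log(softmax(...))
logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))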
Use gradient descent to optimize the Variables defined above:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
where 0.01 is the learning rate, i.e., how large a correction is applied to the variables at each step.
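For intuition, minimize() is roughly equivalent to computing the gradients of the loss with respect to W and b and stepping each variable against its gradient. A rough hand-written sketch of one such update (an illustration of the idea, not what the script actually runs) might look like:
# Sketch (assumption): a manual gradient-descent step, equivalent in spirit to
# tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
learning_rate = 0.01
grad_W, grad_b = tf.gradients(cross_entropy, [W, b])    # dLoss/dW, dLoss/db
train_step = tf.group(
    tf.assign_sub(W, learning_rate * grad_W),           # W <- W - lr * dLoss/dW
    tf.assign_sub(b, learning_rate * grad_b))           # b <- b - lr * dLoss/db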
Following the approach above, the final implementation, digital_recognition.py, is as follows:
# coding:utf-8
import sys
reload(sys)
sys.setdefaultencoding("utf-8")   # Python 2 only: make stdout handle UTF-8

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Command-line flag: where to store (or find) the MNIST data files
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('data_dir', './', 'Directory for storing data')

# Load the MNIST dataset (downloaded automatically if not present)
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

# Model: a single softmax layer, y = softmax(xW + b)
x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 input images
W = tf.Variable(tf.zeros([784, 10]))          # weights
b = tf.Variable(tf.zeros([10]))               # biases
y = tf.nn.softmax(tf.matmul(x, W) + b)        # predicted class probabilities

# Loss: cross entropy between the true labels y_ and the predictions y
y_ = tf.placeholder("float", [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()   # deprecated alias of tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

# Train: 1000 iterations, 100 samples per mini-batch
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate: fraction of test images whose predicted class matches the true label
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
Running it gives:
$ python digital_recognition.py
Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
0.9039
A few notes on the code:
flags.DEFINE_string('data_dir', './', 'Directory for storing data')
means that the current directory is used to store the training data; if the training and test data have not already been downloaded, the program downloads them to ./ automatically.
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
This line uses the data reader already implemented in the library to load the training data, so there is no need to parse the files ourselves.
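The object returned by read_data_sets bundles the train, validation, and test splits as NumPy arrays. A quick, illustrative sketch to inspect them (shapes shown are those of the standard MNIST download with one_hot=True):
# Quick inspection of the loaded dataset (illustrative)
print(mnist.train.images.shape)       # (55000, 784)  flattened 28x28 images
print(mnist.train.labels.shape)       # (55000, 10)   one-hot labels
print(mnist.validation.images.shape)  # (5000, 784)
print(mnist.test.images.shape)        # (10000, 784)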
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
These lines loop 1000 times, drawing 100 samples from the training set on each iteration to train on; by changing these numbers you can observe how the running speed changes. A sketch of a loop that also reports progress follows.
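To watch how accuracy evolves as more batches are consumed, the training loop can be extended to evaluate periodically. A minimal sketch, assuming the accuracy node from the end of the script has been defined before the loop, and using an arbitrary reporting interval of 100 iterations:
# Sketch (assumption): report validation accuracy every 100 iterations
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if i % 100 == 0:
        acc = accuracy.eval({x: mnist.validation.images, y_: mnist.validation.labels})
        print("step %d, validation accuracy %.4f" % (i, acc))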
The last few lines print the prediction accuracy. If you adjust the number of iterations, you will see that the more training samples the model sees in total, the higher the accuracy tends to be.
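The accuracy expression works by taking the index of the largest entry in each prediction row (tf.argmax), comparing it with the index of the true label, and averaging the resulting booleans as floats. The same computation in plain NumPy, on a toy batch of two examples (illustrative values only):
import numpy as np

# Toy example (illustrative): 2 samples, 3 classes
y_pred = np.array([[0.1, 0.7, 0.2],    # predicted class 1
                   [0.6, 0.3, 0.1]])   # predicted class 0
y_true = np.array([[0.0, 1.0, 0.0],    # true class 1 -> correct
                   [0.0, 0.0, 1.0]])   # true class 2 -> wrong

correct = np.equal(np.argmax(y_pred, 1), np.argmax(y_true, 1))
print(correct.astype(np.float32).mean())   # 0.5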