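Background: before deep learning, CRF was one of the mainstream methods for sequence modeling in NLP. Even now that deep learning dominates, CRF still shows up, for example in BiLSTM+CRF and BERT+CRF. Many resources cover the theory behind CRF; this post gives a brief overview of how to use CRF in TensorFlow.
Main CRF APIs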
crf_log_likelihood
```python
def crf_log_likelihood(inputs,
                       tag_indices,
                       sequence_lengths,
                       transition_params=None):
    """Computes the log-likelihood of tag sequences in a CRF.

    Args:
      inputs: A [batch_size, max_seq_len, num_tags] tensor of unary potentials
        to use as input to the CRF layer.
      tag_indices: A [batch_size, max_seq_len] matrix of tag indices for which we
        compute the log-likelihood.
      sequence_lengths: A [batch_size] vector of true sequence lengths.
      transition_params: A [num_tags, num_tags] transition matrix, if available.

    Returns:
      log_likelihood: A [batch_size] `Tensor` containing the log-likelihood of
        each example, given the sequence of tag indices.
      transition_params: A [num_tags, num_tags] transition matrix. This is either
        provided by the caller or created in this function.
    """
```
Inputs:
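- inputs: the unary potential scores, i.e. a score for every tag at every word. These are unnormalized scores and do not have to be probabilities. Per sentence this is a <sentence length, number of tags> tensor; with batching the full shape is <batch size, max sentence length, number of tags>.
- tag_indices: the gold tag sequence, one tag index per word. Per sentence this is a vector of length <sentence length>; with batching the shape is <batch size, max sentence length>, not <sentence length, number of tags>.
- sequence_lengths: the true (unpadded) length of each sequence. It is a single value per sentence, so the full tensor is a vector of size <batch size>.
- transition_params: the tag transition matrix of size <number of tags, number of tags>. It is a learned parameter matrix, and it can also be supplied in advance.
Outputs:
- log_likelihood: the log-likelihood of each gold tag sequence. It is computed per sequence rather than per word, so the shape is <batch size>.
- transition_params: the transition matrix, either the one passed in or the one created inside the function, which is then updated during training.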
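To make the log_likelihood output more concrete, here is a minimal sketch of the quantity a linear-chain CRF computes for one sentence: the score of the gold path minus the log of the summed scores of all paths. It is written in plain Python so it runs without TensorFlow. The helper names (`sequence_score`, `log_partition`, `sequence_log_likelihood`) and the toy numbers are assumptions made for illustration, not part of the TensorFlow API, and the library's internals may differ in detail.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(unary, tags, trans):
    """Unnormalized path score: unary scores along the path plus transitions."""
    score = sum(unary[t][tag] for t, tag in enumerate(tags))
    score += sum(trans[a][b] for a, b in zip(tags, tags[1:]))
    return score


def log_partition(unary, trans):
    """Forward algorithm: log of the summed exp-scores over all tag paths."""
    num_tags = len(unary[0])
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][j]
            + logsumexp([alpha[i] + trans[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def sequence_log_likelihood(unary, tags, trans):
    """Log-likelihood of one tag sequence (one entry of the per-batch output)."""
    return sequence_score(unary, tags, trans) - log_partition(unary, trans)


# Toy input (hypothetical values): 3 words, 2 tags.
unary = [[2.0, 0.5], [0.2, 1.5], [0.3, 0.4]]
trans = [[0.5, -0.2], [0.1, 0.3]]

ll = sequence_log_likelihood(unary, [0, 1, 1], trans)
print(ll)

# Sanity check: the probabilities of all 2**3 tag paths should sum to 1.
total_prob = sum(
    math.exp(sequence_log_likelihood(unary, list(path), trans))
    for path in itertools.product(range(2), repeat=3)
)
print(total_prob)
```

Because the value is a log-probability of a whole tag sequence, it is always negative or zero, and training maximizes it (or minimizes its negative).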
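Two APIs are available for decoding:
- tf.contrib.crf.viterbi_decode(tf_unaryscores, tf_transition_params): decodes a single sequence outside the graph, on NumPy arrays
- tf.contrib.crf.crf_decode(unary_scores, transition_params, sequence_lengths): decodes a whole batch inside the TensorFlow graph
A concrete example: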
```python
import numpy as np
import tensorflow as tf

# Data settings.
num_examples = 10
num_words = 20
num_features = 100
num_tags = 5

# Random features.
x = np.random.rand(num_examples, num_words, num_features).astype(np.float32)

# Random tag indices representing the gold sequence.
y = np.random.randint(num_tags, size=[num_examples, num_words]).astype(np.int32)

# All sequences in this example have the same length, but they can be variable in a real model.
sequence_lengths = np.full(num_examples, num_words - 1, dtype=np.int32)

# Train and evaluate the model.
with tf.Graph().as_default():
    with tf.Session() as session:
        # Add the data to the TensorFlow graph.
        x_t = tf.constant(x)
        y_t = tf.constant(y)
        sequence_lengths_t = tf.constant(sequence_lengths)

        # Compute unary scores from a linear layer.
        weights = tf.get_variable("weights", [num_features, num_tags])
        matricized_x_t = tf.reshape(x_t, [-1, num_features])
        matricized_unary_scores = tf.matmul(matricized_x_t, weights)
        unary_scores = tf.reshape(matricized_unary_scores,
                                  [num_examples, num_words, num_tags])

        # Compute the log-likelihood of the gold sequences and keep the transition
        # params for inference at test time.
        log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
            unary_scores, y_t, sequence_lengths_t)

        # Compute the viterbi sequence and score.
        viterbi_sequence, viterbi_score = tf.contrib.crf.crf_decode(
            unary_scores, transition_params, sequence_lengths_t)

        # Add a training op to tune the parameters.
        loss = tf.reduce_mean(-log_likelihood)
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

        session.run(tf.global_variables_initializer())

        # Mask out padded positions when counting correct labels.
        mask = (np.expand_dims(np.arange(num_words), axis=0) <
                np.expand_dims(sequence_lengths, axis=1))
        total_labels = np.sum(sequence_lengths)

        # Train for a fixed number of iterations.
        for i in range(1000):
            tf_viterbi_sequence, _ = session.run([viterbi_sequence, train_op])
            if i % 100 == 0:
                correct_labels = np.sum((y == tf_viterbi_sequence) * mask)
                accuracy = 100.0 * correct_labels / float(total_labels)
                print("Accuracy: %.2f%%" % accuracy)
```
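CRF encoding/training logic
Get the representation: take the raw output of the underlying model, for example the LSTM or BERT representation, a <sentence length, hidden size> tensor.
Compute the per-word scores (the global unary potential): add a projection layer that scores every tag for every word. Its output has size <sentence length, number of tags>.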
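Train: use crf_log_likelihood for maximum-likelihood training. The tag transition matrix is a parameter matrix that has to be learned here. The call returns log_likelihood and the transition matrix (which is needed later for decoding).
```python
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, gold_tags, sequence_lengths, trans)
```
Compute the loss as the mean negative log-likelihood over the sequences in the batch:
```python
loss = tf.reduce_mean(-log_likelihood)
```
Pick a concrete optimization algorithm for learning:
```python
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```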
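The projection step in the training logic above is just a linear map from hidden states to tag scores. The following is a minimal sketch in plain Python of that shape change, <sentence length, hidden size> to <sentence length, number of tags>. The function name `project`, the sizes, and the random values are all hypothetical; in a real model this would be a dense layer inside the graph.

```python
import random


def project(hidden, weights, bias):
    """Map [seq_len][hidden_size] states to [seq_len][num_tags] unary scores."""
    columns = list(zip(*weights))  # one weight column per tag
    return [
        [sum(h * w for h, w in zip(state, column)) + b
         for column, b in zip(columns, bias)]
        for state in hidden
    ]


random.seed(0)
seq_len, hidden_size, num_tags = 4, 6, 3  # hypothetical sizes

# Stand-ins for the encoder output and the projection parameters.
hidden = [[random.uniform(-1, 1) for _ in range(hidden_size)]
          for _ in range(seq_len)]
weights = [[random.uniform(-1, 1) for _ in range(num_tags)]
           for _ in range(hidden_size)]
bias = [0.0] * num_tags

unary_scores = project(hidden, weights, bias)
print(len(unary_scores), len(unary_scores[0]))  # 4 3
```

The resulting scores are what gets passed as the first argument of crf_log_likelihood and of the decoding functions.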
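CRF decoding/testing logic
There are two ways to decode: one uses crf_decode, which runs as TensorFlow code inside the graph; the other uses viterbi_decode, which runs as NumPy code.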
NumPy-style decoding:
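As a rough illustration of what decoding one sentence outside the graph involves, here is a minimal Viterbi sketch in plain Python. It takes the same kind of inputs as `tf.contrib.crf.viterbi_decode` (per-word tag scores and a transition matrix) and returns the best path and its score, but `viterbi_path` is a hypothetical helper written for this post, not the library function, and the toy numbers are made up.

```python
def viterbi_path(score, transition_params):
    """Best tag path for one sequence.

    score: [seq_len][num_tags] unary scores.
    transition_params: [num_tags][num_tags] transition scores.
    Returns (best_path, best_score).
    """
    num_tags = len(score[0])
    trellis = [list(score[0])]
    backpointers = []
    for t in range(1, len(score)):
        row, ptr = [], []
        for j in range(num_tags):
            # Score of reaching tag j at step t from each previous tag i.
            candidates = [trellis[-1][i] + transition_params[i][j]
                          for i in range(num_tags)]
            best_prev = max(range(num_tags), key=candidates.__getitem__)
            row.append(candidates[best_prev] + score[t][j])
            ptr.append(best_prev)
        trellis.append(row)
        backpointers.append(ptr)

    # Follow the backpointers from the best final tag.
    last = max(range(num_tags), key=trellis[-1].__getitem__)
    best_score = trellis[-1][last]
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best_score


# Toy input (hypothetical values): 3 words, 2 tags.
score = [[2.0, 0.5], [0.2, 1.5], [0.3, 0.4]]
transition_params = [[0.5, -0.2], [0.1, 0.3]]

path, best = viterbi_path(score, transition_params)
print(path)  # [0, 1, 1]
print(round(best, 6))  # 4.0
```

In the NumPy style, the unary scores and the learned transition matrix are first fetched from the session as arrays, and each sentence is then decoded on its own after cutting off the padded positions.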
TF style:
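In TF style the decoding is a single graph op, called as in the full example earlier: `tf.contrib.crf.crf_decode(unary_scores, transition_params, sequence_lengths)`. The sketch below mimics in plain Python what a batched decode has to do with padded input: decode each sentence only up to its true length. `decode_batch` and `viterbi_path` are hypothetical helpers, the use of tag 0 for padded positions is a choice made in this sketch, and every sequence is assumed to have length at least 1.

```python
def viterbi_path(score, trans):
    """Best tag path and its score for one unpadded sequence."""
    num_tags = len(trans)
    best = list(score[0])
    backpointers = []
    for row in score[1:]:
        # For each current tag j, pick the best previous tag i.
        ptr = [max(range(num_tags), key=lambda i, j=j: best[i] + trans[i][j])
               for j in range(num_tags)]
        best = [best[ptr[j]] + trans[ptr[j]][j] + row[j]
                for j in range(num_tags)]
        backpointers.append(ptr)
    last = max(range(num_tags), key=best.__getitem__)
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


def decode_batch(unary_scores, trans, sequence_lengths, pad_tag=0):
    """Decode each padded sequence up to its true length.

    unary_scores: [batch_size][max_seq_len][num_tags]
    sequence_lengths: [batch_size]
    Returns (tags [batch_size][max_seq_len], scores [batch_size]).
    """
    max_len = len(unary_scores[0])
    all_tags, all_scores = [], []
    for scores, length in zip(unary_scores, sequence_lengths):
        path, best = viterbi_path(scores[:length], trans)
        all_tags.append(path + [pad_tag] * (max_len - length))
        all_scores.append(best)
    return all_tags, all_scores


# Toy batch (hypothetical values): 2 sentences, max length 3, 2 tags.
unary_scores = [
    [[2.0, 0.5], [0.2, 1.5], [0.3, 0.4]],
    [[0.1, 1.0], [0.9, 0.2], [0.0, 0.0]],  # last position is padding
]
trans = [[0.5, -0.2], [0.1, 0.3]]
sequence_lengths = [3, 2]

tags, scores = decode_batch(unary_scores, trans, sequence_lengths)
print(tags)  # [[0, 1, 1], [1, 0, 0]]
```

The practical difference between the two styles is where the work happens: the TF op stays inside the graph and handles the whole batch at once, while the NumPy style loops over sentences in Python.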
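Main logic:
Obtain the tag transition parameters transition_params from the CRF forward pass.
```python
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, gold_tags, sequence_lengths)
```
Then use either of the decoding styles introduced above. Both take transition_params and the unary potential scores unary_scores as input and return the decoded tag sequence.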
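Key points:
1. transition_params always comes from the second output of crf_log_likelihood.
2. unary_scores holds the score of every tag for every input word (the unary potential scores), with size <sentence length, number of tags>. It usually comes from a projection layer / fully connected layer, for example:
- BiLSTM: the output is <sentence length, 2 x hidden size>, so a projection layer of size <2 x hidden size, number of tags> is needed to turn it into per-word scores of size <sentence length, number of tags>.
- BERT: the output is <sentence length, hidden size 768>, so a projection layer of size <hidden size, number of tags> is needed to turn it into per-word scores of size <sentence length, number of tags>.
3. Both training and testing/decoding need the true sentence lengths as a parameter, because the inputs are padded to the longest length. The padded positions must be excluded when computing test statistics.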
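Point 3 can be sketched as follows: when scoring predictions, only the first `length` positions of each sentence count, and everything after that is padding. This is a minimal plain-Python version of the masking done with NumPy in the full example earlier. The function name `masked_accuracy` and the toy tag values are assumptions for illustration.

```python
def masked_accuracy(gold, predicted, sequence_lengths):
    """Token accuracy over real (non-padded) positions only."""
    correct = 0
    total = 0
    for gold_row, pred_row, length in zip(gold, predicted, sequence_lengths):
        # Slice off the padded tail before comparing tags.
        correct += sum(g == p
                       for g, p in zip(gold_row[:length], pred_row[:length]))
        total += length
    return correct / total


# Toy batch (hypothetical values): the first sentence has 2 real words
# and 2 padded positions, the second has 4 real words.
gold = [[1, 2, 0, 0], [3, 1, 2, 4]]
predicted = [[1, 2, 3, 3], [3, 0, 2, 4]]
sequence_lengths = [2, 4]

accuracy = masked_accuracy(gold, predicted, sequence_lengths)
print(accuracy)  # 5 correct out of 6 real positions
```

Without the slicing, the two padded positions of the first sentence would be counted as errors and the accuracy would be understated.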