BERT Research Paper PDF Download
匿名網(wǎng)友發(fā)布于:2026-01-17 11:11:25
(侵權(quán)舉報)
(If the download link doesn't respond when clicked, refresh the page a couple of times and it will work!)

[Figure 1: BERT research paper PDF download preview]

Document contents:

2 Related Work

There is a long history of pre-training general language representations, and we briefly review the most widely used approaches in this section.

2.1 Unsupervised Feature-based Approaches

Learning widely applicable representations of words has been an active area of research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods. Pre-trained word embeddings are an integral part of modern NLP systems, offering significant improvements over embeddings learned from scratch (Turian et al., 2010). To pre-train word embedding vectors, left-to-right language modeling objectives have been used (Mnih and Hinton, 2009), as well as objectives to discriminate correct from incorrect words in left and right context (Mikolov et al., 2013).
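
The discrimination objective mentioned in the last sentence is the negative-sampling idea popularized by word2vec (Mikolov et al., 2013): the model learns to score an observed (word, context) pair higher than randomly drawn fake pairs. Below is a minimal NumPy sketch of skip-gram with negative sampling along those lines; the toy corpus, embedding dimension, window size, negative-sample count, and learning rate are all illustrative assumptions, not values from the paper.

    # Minimal sketch: skip-gram with negative sampling (toy values throughout).
    import numpy as np

    rng = np.random.default_rng(0)

    corpus = "the quick brown fox jumps over the lazy dog".split()
    vocab = sorted(set(corpus))
    word2id = {w: i for i, w in enumerate(vocab)}
    V, D = len(vocab), 16          # vocabulary size, embedding dimension

    W_in = rng.normal(scale=0.1, size=(V, D))   # "input" (word) embeddings
    W_out = rng.normal(scale=0.1, size=(V, D))  # "output" (context) embeddings

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    lr, window, k = 0.05, 2, 3     # learning rate, context window, negatives

    for epoch in range(200):
        for i, w in enumerate(corpus):
            center = word2id[w]
            # true (word, context) pairs come from the surrounding window
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j == i:
                    continue
                context = word2id[corpus[j]]
                # one positive example plus k randomly sampled negatives
                targets = [context] + list(rng.integers(0, V, size=k))
                labels = [1.0] + [0.0] * k
                for t, y in zip(targets, labels):
                    v = W_in[center].copy()    # copy so both updates use
                    u = W_out[t].copy()        # the pre-update vectors
                    grad = sigmoid(v @ u) - y  # d(logistic loss)/d(score)
                    W_in[center] -= lr * grad * u
                    W_out[t] -= lr * grad * v

    # After training, rows of W_in serve as the pre-trained word vectors.
    print(W_in[word2id["fox"]][:4])

In practice, implementations draw negatives from a smoothed unigram distribution (frequency raised to the 3/4 power) and skip negatives that collide with the true context word; the uniform sampling above just keeps the sketch short.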