TY - JOUR
T1 - Linearized Relative Positional Encoding
AU - Qin, Zhen
AU - Sun, Weixuan
AU - Lu, Kaiyue
AU - Deng, Hui
AU - Li, Dongxu
AU - Han, Xiaodong
AU - Dai, Yuchao
AU - Kong, Lingpeng
AU - Zhong, Yiran
N1 - Publisher Copyright:
© 2023, Transactions on Machine Learning Research. All rights reserved.
PY - 2023/9/1
Y1 - 2023/9/1
AB - Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we unify a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encodings for various applications. Experiments show that, compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Meanwhile, it emphasizes a general paradigm for designing a broader class of relative positional encoding methods that are applicable to linear transformers.
UR - http://www.scopus.com/inward/record.url?scp=86000158982&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:86000158982
SN - 2835-8856
VL - 2023
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -