TY - JOUR
T1 - Linearized Relative Positional Encoding
AU - Qin, Zhen
AU - Sun, Weixuan
AU - Lu, Kaiyue
AU - Deng, Hui
AU - Li, Dongxu
AU - Han, Xiaodong
AU - Dai, Yuchao
AU - Kong, Lingpeng
AU - Zhong, Yiran
N1 - Publisher Copyright:
© 2023, Transactions on Machine Learning Research. All rights reserved.
PY - 2023/9/1
Y1 - 2023/9/1
AB - Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we unify a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encodings for various applications. Experiments show that, compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Meanwhile, it emphasizes a general paradigm for designing a broader class of relative positional encoding methods that are applicable to linear transformers.
UR - http://www.scopus.com/inward/record.url?scp=86000158982&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:86000158982
SN - 2835-8856
VL - 2023
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -