Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal. A variant of the technique has shown to be useful in several natural language processing (NLP) applications, such as Google's SentencePiece, and OpenAI's GPT-3.
Attributes | Values |
---|---|
rdfs:label |
|
rdfs:comment |
|
sameAs | |
dbp:wikiPageUsesTemplate | |
Subject | |
gold:hypernym | |
prov:wasDerivedFrom | |
Wikipage page ID |
|
page length (characters) of wiki page |
|
Wikipage revision ID |
|
Link from a Wikipage to another Wikipage | |
has abstract |
|
foaf:isPrimaryTopicOf | |
is Wikipage redirect of | |
is Link from a Wikipage to another Wikipage of | |
is foaf:primaryTopic of |