About: Bitap algorithm     Goto   Sponge   NotDistinct   Permalink

An Entity of Type : dbo:Island, within Data Space : el.dbpedia.org associated with source document(s)

The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance – if the substring and pattern are within a given distance k of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast.

AttributesValues
rdf:type
rdfs:label
  • Bitap algorithm (en)
rdfs:comment
  • The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance – if the substring and pattern are within a given distance k of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast. (en)
sameAs
dbp:wikiPageUsesTemplate
Subject
Link from a Wikipage to an external page
gold:hypernym
prov:wasDerivedFrom
Wikipage page ID
page length (characters) of wiki page
Wikipage revision ID
Link from a Wikipage to another Wikipage
has abstract
  • The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance – if the substring and pattern are within a given distance k of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast. The bitap algorithm is perhaps best known as one of the underlying algorithms of the Unix utility agrep, written by Udi Manber, , and . Manber and Wu's original paper gives extensions of the algorithm to deal with fuzzy matching of general regular expressions. Due to the data structures required by the algorithm, it performs best on patterns less than a constant length (typically the word length of the machine in question), and also prefers inputs over a small alphabet. Once it has been implemented for a given alphabet and word length m, however, its running time is completely predictable – it runs in O(mn) operations, no matter the structure of the text or the pattern. The bitap algorithm for exact string searching was invented by Bálint Dömölki in 1964 and extended by R. K. Shyamasundar in 1977, before being reinvented by Ricardo Baeza-Yates and Gaston Gonnet in 1989 (one chapter of first author's PhD thesis) which also extended it to handle classes of characters, wildcards, and mismatches. In 1991, it was extended by Manber and to handle also insertions and deletions (full fuzzy string searching). This algorithm was later improved by Baeza-Yates and Navarro in 1996. (en)
foaf:isPrimaryTopicOf
is Wikipage redirect of
is Link from a Wikipage to another Wikipage of
is foaf:primaryTopic of
Faceted Search & Find service v1.17_git151 as of Feb 20 2025


Alternative Linked Data Documents: ODE     Content Formats:   [cxml] [csv]     RDF   [text] [turtle] [ld+json] [rdf+json] [rdf+xml]     ODATA   [atom+xml] [odata+json]     Microdata   [microdata+json] [html]    About   
This material is Open Knowledge   W3C Semantic Web Technology [RDF Data] Valid XHTML + RDFa
OpenLink Virtuoso version 07.20.3240 as of Nov 11 2024, on Linux (x86_64-ubuntu_focal-linux-gnu), Single-Server Edition (71 GB total memory, 1 GB memory in use)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2025 OpenLink Software