# CYK algorithm

The**Cocke-Younger-Kasami (CYK) algorithm**(alternatively called CKY) determines whether a

string can be generated by a given

The standard version of CYK recognizes languages defined by context-free grammars written in

The worst case

^{3}), where "n" is the length of the parsed string. This makes it one of the most efficient (in those terms) algorithms for recognizing any context-free language. However, there are other algorithms that will perform better for certain subsets of the context-free languages.

**The algorithm**

The CYK algorithm is a bottom up algorithm and is important theoretically, since it can be used to constructively prove that the

The CYK algorithm for the membership problem is as follows:

**Let**the input string consist of "n" letters, "a"

_{1}... "a"

_{"n"}.

**Let**the grammar contain "r" terminal and nonterminal symbols "R"

_{1}... "R"

_{"r"}. This grammar contains the subset R

_{s}which is the set of start symbols.

**Let**P [n,n,r] be an array of booleans. Initialize all elements of P to false.

**For each**i = 1 to n

**For each**unit production R

_{j}-> a

_{i},

**set**P [i,1,j] = true.

**For each**i = 2 to n -- Length of span

**For each**j = 1 to n-i+1 -- Start of span

**For each**k = 1 to i-1 -- Partition of span

**For each**production R

_{A}-> R

_{B}R

_{C}

**If**P [j,k,B] and P [j+k,i-k,C]

**then**set P [j,i,A] = true

**If**any of P [1,n,x] is true (x is iterated over the set s, where s are all the indices for R

_{s})

**Then**string is member of language

**Else**string is not member of language

**Informally**

In informal terms, this algorithm considers every possible subsequence of the sequence of words and sets P [i,j,k] to be true if the subsequence of words starting from i of length j can be generated from R

_{k}. Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts, and checks to see if there is some production P → Q R such that Q matches the first part and R matches the second part. If so, it records P as matching the whole subsequence. Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire sentence is matched by the start symbol.

**Algorithm extension**

It is simple to extend the above algorithm to not only determine if a sentence is in a language, but to also construct a

It is also possible to extend the CYK algorithm to parse strings using weighted and

## No comments:

## Post a Comment