SHANTI - Sciences, Humanities, and Arts Network of Technological Initiatives

CLAWS (Constituent Likelihood Automatic Word-tagging System) / Profile

Part-of-speech (POS) tagging, also called grammatical tagging, is the most common form of corpus annotation and was the first form of annotation developed by UCREL at Lancaster. UCREL’s POS tagging software for English text, CLAWS (the Constituent Likelihood Automatic Word-tagging System), has been under continuous development since the early 1980s. The latest version of the tagger, CLAWS4, was used to POS-tag c. 100 million words of the British National Corpus (BNC). In tests, CLAWS consistently achieves 96-97% accuracy, with the precise figure varying according to the type of text. Judged in terms of major categories, the system has an error rate of only 1.5% within the BNC, with c. 3.3% ambiguities unresolved.
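As a rough illustration of how figures of this kind can be derived, the sketch below scores a tagger's output against a hand-checked reference, counting correct tags, wrong tags, and tokens left with an unresolved ambiguity tag. It is not UCREL's evaluation procedure: the function name score_tagging, the toy tag sequences, and the convention that a hyphenated tag such as "VVD-VVN" marks an unresolved ambiguity are all assumptions made for the example.

```python
def score_tagging(predicted, gold):
    """Return (accuracy, error_rate, ambiguity_rate) for aligned tag lists.

    Assumption for this sketch: a predicted tag containing a hyphen
    (e.g. "VVD-VVN") is an unresolved ambiguity and is counted
    separately rather than as correct or wrong.
    """
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must be aligned token by token")
    if not gold:
        raise ValueError("need at least one token")

    correct = wrong = ambiguous = 0
    for pred_tag, gold_tag in zip(predicted, gold):
        if "-" in pred_tag:
            ambiguous += 1      # tagger could not decide between two tags
        elif pred_tag == gold_tag:
            correct += 1
        else:
            wrong += 1

    total = len(gold)
    return correct / total, wrong / total, ambiguous / total


# Toy data, invented for illustration only (ten tokens).
gold = ["AT0", "NN1", "VVD", "PRP", "AT0", "NN1", "AJ0", "NN2", "VVB", "AV0"]
pred = ["AT0", "NN1", "VVD-VVN", "PRP", "AT0", "NN1", "AJ0", "NN2", "NN1", "AV0"]

accuracy, error_rate, ambiguity_rate = score_tagging(pred, gold)
print(f"accuracy={accuracy:.1%} error={error_rate:.1%} ambiguous={ambiguity_rate:.1%}")
```

Reporting the error rate and the unresolved-ambiguity rate separately, as the description above does for the BNC, distinguishes tokens the tagger got wrong from tokens it declined to decide.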

Tool Type

Annotation-Commenting Tools

Interface Languages
English