A parser is a compiler or interpreter component that breaks data into smaller elements so they can be translated into another language. A parser takes input in the form of a sequence of tokens or program instructions and usually builds a data structure in the form of a parse tree or an abstract syntax tree.
A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens -- this token is a number, that token is a string literal, this other token is an equality operator. A parser takes the stream of tokens from the lexer and turns it into an abstract syntax tree ...
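To make the tokenizer/lexer distinction concrete, here is a minimal sketch of a lexer that attaches a kind to each token. The token categories and the `Token`/`lex` names are my own for illustration; they are not taken from any particular tool.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str   # e.g. "NUMBER", "STRING", "EQ", "NAME"
    text: str

def lex(source):
    """Toy lexer: split on whitespace, then attach a kind to each token."""
    tokens = []
    for word in source.split():
        if word.isdigit():
            tokens.append(Token("NUMBER", word))
        elif word == "==":
            tokens.append(Token("EQ", word))
        elif word.startswith('"') and word.endswith('"'):
            tokens.append(Token("STRING", word))
        else:
            tokens.append(Token("NAME", word))
    return tokens

print(lex('x == 42'))
# [Token(kind='NAME', text='x'), Token(kind='EQ', text='=='), Token(kind='NUMBER', text='42')]
```

A parser would then consume this token list and build a tree; a plain tokenizer would stop at the splitting step without assigning kinds.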
A parser is a syntactic analyzer. Its job is to read input data that follows certain specific rules - generally text that is readable by humans - and build a structure describing how that input is composed.
Implementing the parser is usually delegated to some framework (a parser generator). Implementing something like that by hand, and efficiently, is the kind of exercise that takes the better part of a semester in a university course.
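To give a sense of what "by hand" looks like, here is a minimal recursive-descent parser for a toy grammar of additions and subtractions over integers. The grammar and the nested-tuple AST shape are assumptions made up for this sketch, not any framework's output.

```python
import re

def parse(source):
    """Parse 'expr := term (('+'|'-') term)*; term := NUMBER' into a nested tuple AST."""
    tokens = re.findall(r"\d+|[+\-]", source)
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return ("num", int(tok))

    def expr():
        nonlocal pos
        node = term()
        # Fold left: each operator combines the tree so far with the next term.
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree

print(parse("1 + 2 - 3"))
# ('-', ('+', ('num', 1), ('num', 2)), ('num', 3))
```

A real hand-written parser adds error recovery, operator precedence, and many more grammar rules, which is where the effort goes.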
A parser for that language would accept the AABB input and reject the AAAB input. That is what a parser does. In addition, during this process a data structure could be created for further processing. In my previous example, it could, for instance, store the AA and BB in two separate stacks.
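A small sketch of that idea, assuming the language in question is strings of A's followed by an equal number of B's; the two-stack bookkeeping is just one possible data structure, as the answer suggests.

```python
def parse_ab(text):
    """Accept strings of A's followed by an equal number of B's,
    keeping the characters on two stacks for later processing.
    (The empty string is treated as rejected here - an assumption.)"""
    a_stack, b_stack = [], []
    i = 0
    while i < len(text) and text[i] == "A":
        a_stack.append(text[i])
        i += 1
    while i < len(text) and text[i] == "B":
        b_stack.append(text[i])
        i += 1
    accepted = i == len(text) and len(a_stack) == len(b_stack) and a_stack
    return bool(accepted), a_stack, b_stack

print(parse_ab("AABB"))   # (True, ['A', 'A'], ['B', 'B'])
print(parse_ab("AAAB"))   # (False, ['A', 'A', 'A'], ['B'])
```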
Are lexers and parsers really that different in theory? It seems fashionable to hate regular expressions: Coding Horror and another blog post. However, popular lexing-based tools such as Pygments, geshi, and ...
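For what it's worth, a regex-driven lexer takes only a few lines. The sketch below uses named groups with `re.finditer`; the token names and patterns are my own, not those of Pygments or geshi.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<EQ>==)
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def regex_lex(source):
    """Yield (kind, text) pairs; whitespace is matched but dropped."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            yield kind, match.group()

print(list(regex_lex("count == 10")))
# [('NAME', 'count'), ('EQ', '=='), ('NUMBER', '10')]
```

Whether this counts as "just regexes" or as a lexer is mostly a question of how much context you layer on top of the matches.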