From Lazarus wiki
Revision as of 10:41, 15 December 2020 by E-ric (talk | contribs) (Scanner/Tokenizer)


The scanner and tokenizer build the input stream of tokens that feeds the parser. It is at this stage that preprocessing is performed, that every compiler directive read modifies the internal state variables of the compiler, and that any illegal character found in the input stream raises an error.

For information on how macros work, see Macro internals.


The general architecture of the scanner is shown in the following figure:

[Figure: general architecture of the scanner]

Several kinds of items can be read from the input stream: a string, handled by readstring; a numeric value, handled by readnumeric; comments; and preprocessor and compiler directives.

Input stream

(last updated for fpc version 1.0.x)

Input data is handled in the same way as all other I/O in the compiler: through a hook (do_openinputfile) that can be overridden in comphook.pas when a different I/O method is needed.

The default hook uses a non-buffered DOS stream contained in files.pas.
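
The idea of an overridable I/O hook can be pictured as a procedural variable that initially points at a default routine and can be reassigned. The sketch below is illustrative only: the type TOpenInputHook, the variable OpenInputHook and both opener functions are hypothetical names chosen for this example, not the declarations found in comphook.pas.

```pascal
program HookSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical hook signature: returns the text of the named source. }
  TOpenInputHook = function(const FileName: string): string;

{ Stand-in for a default opener; a real one would read from disk. }
function DefaultOpen(const FileName: string): string;
begin
  Result := 'disk:' + FileName;
end;

{ Replacement opener, for example for sources held in memory. }
function MemoryOpen(const FileName: string): string;
begin
  Result := 'memory:' + FileName;
end;

var
  { Hypothetical hook variable; every caller goes through it. }
  OpenInputHook: TOpenInputHook;

begin
  OpenInputHook := @DefaultOpen;        { install the default hook }
  WriteLn(OpenInputHook('test.pas'));
  OpenInputHook := @MemoryOpen;         { override the hook }
  WriteLn(OpenInputHook('test.pas'));
end.
```

Because callers only ever go through the variable, replacing the I/O method needs no change on the calling side.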


Preprocessor

(last updated for fpc version 1.0.x)

The scanner resolves all preprocessor directives and passes only the visible parts of the code to the parser (for example, the branches selected by conditional compilation). Compiler switches and directives are also saved in global variables while in the preprocessor, so this part is completely independent of the parser.

Conditional compilation (scanner.pas)

(last updated for fpc version 1.0.x)

Conditional compilation is handled via a preprocessor stack: each directive is pushed onto the stack when it is encountered and popped when it is resolved. The stack is implemented as a linked list of preprocessor directive items.
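
A stack kept as a linked list might look like the following minimal sketch. The record layout, the field names and the Push, Pop and Visible routines are assumptions made for illustration; they are not taken from scanner.pas.

```pascal
program PreprocStackSketch;
{$mode objfpc}{$H+}

type
  PPreprocItem = ^TPreprocItem;
  { Hypothetical stack item: one open conditional directive. }
  TPreprocItem = record
    Name: string;        { directive name, e.g. 'IFDEF' }
    Accept: Boolean;     { True if code in this branch is visible }
    Next: PPreprocItem;  { item below this one on the stack }
  end;

var
  Top: PPreprocItem = nil;   { top of the stack; nil when empty }

{ Push a directive when it is encountered. }
procedure Push(const AName: string; AAccept: Boolean);
var
  Item: PPreprocItem;
begin
  New(Item);
  Item^.Name := AName;
  { A branch nested inside a skipped branch stays skipped. }
  Item^.Accept := AAccept and ((Top = nil) or Top^.Accept);
  Item^.Next := Top;
  Top := Item;
end;

{ Pop the directive when its closing directive resolves it. }
procedure Pop;
var
  Item: PPreprocItem;
begin
  if Top = nil then
  begin
    WriteLn('error: closing directive without an opening one');
    Exit;
  end;
  Item := Top;
  Top := Item^.Next;
  Dispose(Item);
end;

{ True if the code at the current position should reach the parser. }
function Visible: Boolean;
begin
  Result := (Top = nil) or Top^.Accept;
end;

begin
  Push('IFDEF', True);    { outer condition holds }
  Push('IFDEF', False);   { inner condition fails }
  WriteLn(Visible);       { inside the inner branch }
  Pop;
  WriteLn(Visible);       { back in the outer branch }
  Pop;
end.
```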

Compiler switches (switches.pas)

(last updated for fpc version 1.0.x)

Compiler switches are handled via a lookup table that is searched linearly. A second lookup table then sets the appropriate bit flags and variables in the switches for the current compilation.
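
The two-step lookup can be sketched as below. This is a simplified illustration, not the code in switches.pas: the TSwitch enumeration, the SwitchTable contents and the HandleSwitch routine are hypothetical, and the second table is collapsed into a Pascal set that plays the role of the bit flags.

```pascal
program SwitchLookupSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical switch identifiers; not the compiler's real set. }
  TSwitch = (swNone, swRangeCheck, swOverflowCheck, swIOCheck);
  TSwitches = set of TSwitch;

  TSwitchEntry = record
    Letter: Char;
    Switch: TSwitch;
  end;

const
  { Lookup table mapping a directive letter to a switch identifier. }
  SwitchTable: array[0..2] of TSwitchEntry = (
    (Letter: 'R'; Switch: swRangeCheck),
    (Letter: 'Q'; Switch: swOverflowCheck),
    (Letter: 'I'; Switch: swIOCheck)
  );

var
  ActiveSwitches: TSwitches = [];   { the bit flags for this compilation }

{ Linear search through the table; swNone if the letter is unknown. }
function FindSwitch(C: Char): TSwitch;
var
  I: Integer;
begin
  Result := swNone;
  for I := Low(SwitchTable) to High(SwitchTable) do
    if SwitchTable[I].Letter = UpCase(C) then
      Exit(SwitchTable[I].Switch);
end;

{ Set or clear the flag that corresponds to the switch letter. }
procedure HandleSwitch(C: Char; Enable: Boolean);
var
  S: TSwitch;
begin
  S := FindSwitch(C);
  if S = swNone then
  begin
    WriteLn('unknown switch: ', C);
    Exit;
  end;
  if Enable then
    Include(ActiveSwitches, S)
  else
    Exclude(ActiveSwitches, S);
end;

begin
  HandleSwitch('R', True);
  HandleSwitch('q', True);
  HandleSwitch('R', False);
  WriteLn(swOverflowCheck in ActiveSwitches);
  WriteLn(swRangeCheck in ActiveSwitches);
end.
```

A linear search is adequate here because the table is small and switches are rare compared with ordinary tokens.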

Scanner interface

(last updated for fpc version 1.0.x)

The parser receives only tokens as its input, where a token is an enumeration value that indicates the type of the token: a reserved word, a special character, an operator, a numeric constant, a string, or an identifier.

A string is resolved into a token by a lookup that searches the string table for the equivalent token. The search uses a binary search algorithm over the string table.
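
The binary search over a sorted string table can be sketched as follows. The reduced TToken enumeration, the TokenTable contents and the Lookup function are hypothetical and only illustrate the technique; the compiler's real table is far larger.

```pascal
program TokenLookupSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical, much reduced token set. }
  TToken = (tkIdentifier, tkBegin, tkEnd, tkIf, tkThen, tkVar);

  TTokenEntry = record
    Keyword: string;
    Token: TToken;
  end;

const
  { Must stay sorted by Keyword for the binary search to work. }
  TokenTable: array[0..4] of TTokenEntry = (
    (Keyword: 'BEGIN'; Token: tkBegin),
    (Keyword: 'END';   Token: tkEnd),
    (Keyword: 'IF';    Token: tkIf),
    (Keyword: 'THEN';  Token: tkThen),
    (Keyword: 'VAR';   Token: tkVar)
  );

{ Returns the token for a reserved word, or tkIdentifier otherwise. }
function Lookup(const S: string): TToken;
var
  Left, Right, Mid: Integer;
  Key: string;
begin
  Key := UpCase(S);               { Pascal keywords are case-insensitive }
  Left := Low(TokenTable);
  Right := High(TokenTable);
  while Left <= Right do
  begin
    Mid := (Left + Right) div 2;
    if Key = TokenTable[Mid].Keyword then
      Exit(TokenTable[Mid].Token)
    else if Key < TokenTable[Mid].Keyword then
      Right := Mid - 1            { continue in the lower half }
    else
      Left := Mid + 1;            { continue in the upper half }
  end;
  Result := tkIdentifier;         { not found: treat as an identifier }
end;

begin
  WriteLn(Lookup('begin') = tkBegin);
  WriteLn(Lookup('Var') = tkVar);
  WriteLn(Lookup('counter') = tkIdentifier);
end.
```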

For identifiers and constants (including numeric values), the text is returned in the pattern string variable, together with the appropriate token value. Numeric values are returned as unconverted strings, with any special prefix included. For operators and reserved words, only the token itself can be assumed to be preserved; the input string that was read is assumed to be lost.

The interface with the parser therefore consists of the ReadToken routine and the Pattern variable.
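
To make the contract between the scanner and the parser concrete, here is a toy scanner that follows the same convention: ReadToken sets Token, and fills Pattern only for identifiers and numbers, keeping any prefix such as '$'. Everything apart from the names ReadToken, Token and Pattern is an assumption for this sketch (the reduced token set, the fixed Source string, the Position index), and the number handling is deliberately simplified.

```pascal
program ReadTokenSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical reduced token set. }
  TToken = (tkEOF, tkIdentifier, tkNumber, tkPlus);

var
  Source: string = 'count + $1F';  { hypothetical input }
  Position: Integer = 1;           { next character to read }
  Token: TToken;                   { last token read }
  Pattern: string;                 { text of identifiers and constants }

procedure ReadToken;
var
  Start: Integer;
begin
  Pattern := '';
  while (Position <= Length(Source)) and (Source[Position] = ' ') do
    Inc(Position);
  if Position > Length(Source) then
  begin
    Token := tkEOF;
    Exit;
  end;
  Start := Position;
  case Source[Position] of
    'A'..'Z', 'a'..'z', '_':
      begin
        while (Position <= Length(Source)) and
              (Source[Position] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) do
          Inc(Position);
        Token := tkIdentifier;
        Pattern := Copy(Source, Start, Position - Start);
      end;
    '$', '0'..'9':
      begin
        Inc(Position);  { consume the first digit or the '$' prefix }
        { Simplification: hex digits are accepted with or without '$'. }
        while (Position <= Length(Source)) and
              (Source[Position] in ['0'..'9', 'A'..'F', 'a'..'f']) do
          Inc(Position);
        Token := tkNumber;
        { The unconverted text is kept, prefix included. }
        Pattern := Copy(Source, Start, Position - Start);
      end;
    '+':
      begin
        Inc(Position);
        Token := tkPlus;  { Pattern stays empty for operators }
      end;
  else
    begin
      WriteLn('illegal character: ', Source[Position]);
      Inc(Position);
      Token := tkEOF;     { stop scanning in this sketch }
    end;
  end;
end;

begin
  { The parser side: call ReadToken, then inspect Token and Pattern. }
  repeat
    ReadToken;
    if Token in [tkIdentifier, tkNumber] then
      WriteLn('pattern = ', Pattern);
  until Token = tkEOF;
end.
```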



Declaration: procedure ReadToken;
Description: Sets the global variable token to the current token read, and sets the pattern variable appropriately (if required).



Declaration: var Token: TToken;
Description: Contains the token which was last read by a call to ReadToken.
See also: ReadToken


Declaration: var Pattern: String;
Description: Contains the string of the last pattern read by a call to ReadToken
See also: ReadToken

Assembler parser interface

(last updated for fpc version 1.0.x)

The inline assembler parser is completely separate from the Pascal parser, so its scanning process is also completely independent. The scanner only takes care of the preprocessor part and comments; everything else is passed character by character to the assembler parser via the AsmGetChar() scanner routine.



Declaration: function AsmGetChar: Char;
Description: Returns the next character in the input stream.
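
A character-by-character feed of this kind could be sketched as below. Only the name and signature of AsmGetChar come from the declaration above; the Source buffer, the Position index, the use of #0 as an end marker and the comment handling are assumptions for this example. The sketch skips brace comments only and does not process directives.

```pascal
program AsmGetCharSketch;
{$mode objfpc}{$H+}

var
  { Hypothetical buffer standing in for the scanner's input stream. }
  Source: string = 'mov eax, 1 { load one } ret';
  Position: Integer = 1;

{ Returns the next character, skipping brace comments.
  In this sketch, #0 signals the end of the input. }
function AsmGetChar: Char;
begin
  { Skip any number of consecutive brace comments. }
  while (Position <= Length(Source)) and (Source[Position] = '{') do
  begin
    while (Position <= Length(Source)) and (Source[Position] <> '}') do
      Inc(Position);
    Inc(Position);  { step over the closing brace }
  end;
  if Position > Length(Source) then
    Exit(#0);
  Result := Source[Position];
  Inc(Position);
end;

var
  C: Char;
  Seen: string = '';

begin
  { The assembler parser side: pull characters until the input ends. }
  C := AsmGetChar;
  while C <> #0 do
  begin
    Seen := Seen + C;
    C := AsmGetChar;
  end;
  WriteLn(Seen);  { the comment text is gone, the rest is unchanged }
end.
```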

Next chapter: The parse tree