Codetools

From Lazarus wiki
Revision as of 15:44, 28 July 2010 by Mattias2 (How the codetools parses sources, difference to a compiler)


What are the codetools

The codetools are a Lazarus package providing tools to parse, explore, edit and refactor Pascal sources. They form a module of their own and are licensed under the GPL. Many examples of how to use the codetools in your own programs can be found under components/codetools/examples.

svn:

Using the codetools without the IDE

You can use the codetools without the IDE, for example to test a new tool. A simple example is

 <lazarusdir>/components/codetools/examples/methodjumping.lpi

To test find declaration, the codetools need to parse sources, especially the RTL and FCL sources. The examples use the following environment variables:

  • FPCDIR: path to the FPC sources, the default is ~/freepascal/fpc.
  • PP: path to the compiler executable (/usr/bin/fpc or /usr/bin/ppc386 or C:\lazarus\ppc386.exe). The codetools need to ask the compiler for the settings. The default is to search for 'fpc' via the PATH variable.
  • FPCTARGETOS: tell the codetools to scan for another operating system (cross compiling). For example: linux, freebsd, darwin, win32, win64, wince
  • FPCTARGETCPU: when scanning for another CPU. For example: i386, powerpc, x86_64, arm, sparc
  • LAZARUSDIR: path of the lazarus sources. Only needed if you want to scan them.
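The following is a minimal sketch of how a standalone test program could read these variables with the defaults listed above. It is not taken from the codetools examples. EnvOrDefault and DefaultCompilerName are hypothetical names, and the real examples also ask the compiler for its settings. Only standard SysUtils routines are used.

```pascal
program CodetoolsEnvSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // Compiler name searched in PATH when PP is not set (hypothetical constant)
  DefaultCompilerName = {$IFDEF MSWINDOWS}'fpc.exe'{$ELSE}'fpc'{$ENDIF};

// Returns the environment variable Name, or Fallback if it is unset or empty
function EnvOrDefault(const Name, Fallback: string): string;
begin
  Result := GetEnvironmentVariable(Name);
  if Result = '' then
    Result := Fallback;
end;

begin
  // FPCDIR defaults to ~/freepascal/fpc
  WriteLn('FPCDIR       = ', EnvOrDefault('FPCDIR',
    IncludeTrailingPathDelimiter(GetUserDir) + 'freepascal' + PathDelim + 'fpc'));
  // PP defaults to searching the compiler via the PATH variable
  WriteLn('PP           = ', EnvOrDefault('PP',
    FileSearch(DefaultCompilerName, GetEnvironmentVariable('PATH'))));
  // Empty values mean: use the compiler's default target
  WriteLn('FPCTARGETOS  = ', GetEnvironmentVariable('FPCTARGETOS'));
  WriteLn('FPCTARGETCPU = ', GetEnvironmentVariable('FPCTARGETCPU'));
  // Only needed if the Lazarus sources should be scanned
  WriteLn('LAZARUSDIR   = ', GetEnvironmentVariable('LAZARUSDIR'));
end.
```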

FPC is a very complex project with lots of search paths, include files and macros. The codetools need to know all these paths and macros in order to parse this jungle. To set all this up easily, the codetools contain predefined templates for FPC, Lazarus, Delphi and Kylix source directories. For a find declaration example, see

 <lazarusdir>/components/codetools/examples/finddeclaration.lpi

Because the FPC sources contain multiple versions of some units and change often, the codetools do not use a fixed path table. Instead, they first scan the whole FPC directory structure and apply a set of rules to decide which source is the right one for the current TargetOS and TargetCPU. This scan may take a while, depending on your disk speed. All examples save the result in codetools.config, so that the scan is skipped on the next start.

Whenever the FPC sources have moved or a unit has been renamed, just delete the file codetools.config. The Lazarus IDE has its own config file and rescans whenever the compiler executable has changed or the user forces a rescan via 'Environment > Rescan FPC source directory'.

Using the codetools in the IDE with the IDEIntf

See the package <lazarusdir>/examples/idequickfix/quickfixexample.lpk. It demonstrates:

  • How to write an IDE package. When installed, it registers a Quick Fix item.
  • How to write a Quick Fix item for the compiler message 'Parameter "Sender" not used'
  • How to use the codetools for
     * parsing a unit
     * converting Filename,Line,Column to a codetools source position
     * finding a codetools node at a cursor position
     * finding a procedure node and its begin..end node
     * creating a nice insertion position for a statement at the beginning of
       the begin..end block
     * getting the indentation of a line, so that the new line will
       also work in a sub procedure
     * inserting code with the codetools
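The indentation step from the list above can be pictured with plain strings. This is only a sketch under stated assumptions. The quick fix example itself works on codetools nodes and source positions, whereas GetLineIndent and IndentStatement here are hypothetical helpers that count leading spaces (no tabs) and assume an indentation step of 2.

```pascal
program IndentSketch;

{$mode objfpc}{$H+}

// Counts the leading spaces of Line (tabs are not handled in this sketch)
function GetLineIndent(const Line: string): Integer;
begin
  Result := 0;
  while (Result < Length(Line)) and (Line[Result + 1] = ' ') do
    Inc(Result);
end;

// Indents Statement one step deeper than the line holding 'begin',
// so it also lines up inside nested sub procedures
function IndentStatement(const BeginLine, Statement: string;
  IndentStep: Integer): string;
begin
  Result := StringOfChar(' ', GetLineIndent(BeginLine) + IndentStep) + Statement;
end;

begin
  WriteLn('[', IndentStatement('begin', 'DoSomething;', 2), ']');     // [  DoSomething;]
  WriteLn('[', IndentStatement('    begin', 'DoSomething;', 2), ']'); // [      DoSomething;]
end.
```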

Codetools rules for FPC sources

When the codetools search for the source of an FPC ppu file, they use a set of rules. You can write your own rules, but normally you will use the standard rules, which are defined in the include file components/codetools/fpcsrcrules.inc. You can test the rules with the command line utility components/codetools/examples/testfpcsrcunitrules.lpi.

Usage of testfpcsrcunitrules

Usage: lazarus/components/codetools/examples/testfpcsrcunitrules -h

  -c <compiler file name>, --compiler=<compiler file name>
         Default is to use environment variable PP.
         If this is not set, search for fpc

  -T <target OS>, --targetos=<target OS>
         Default is to use environment variable FPCTARGET.
         If this is not set, use the default of the compiler.

  -P <target CPU>, --targetcpu=<target CPU>
         Default is to use environment variable FPCTARGETCPU.
         If this is not set, use the default of the compiler.

  -F <FPC source directory>, --fpcsrcdir=<FPC source directory>
         Default is to use environment variable FPCDIR.
         There is no default.

  -u <unit name>, --checkunit=<unit name>
         Write a detailed report about this unit.

Example for testfpcsrcunitrules

Open testfpcsrcunitrules.lpi in the IDE and compile it. Then run the utility in a terminal/console:

./testfpcsrcunitrules -F ~/fpc/sources/2.5.1/fpc/

This tells you which compiler is used, which compiler is executed, and which config files were tested and parsed. It also warns about duplicate units in the FPC search path and about duplicate source files for the same unit.

Duplicate source files

Suppose you find out that for target wince/arm the codetools open the wrong source of the unit mmsystem. Run the tool with the -u parameter:

./testfpcsrcunitrules -F ~/fpc/2.5.1/fpc/ -T wince -P arm -u mmsystem

This gives you a detailed report of where this unit was found and what score each source file got. For example:

Unit report for mmsystem
  WARNING: mmsystem is not in PPU search path
GatherUnitsInFPCSources UnitName=mmsystem File=packages/winunits-base/src/mmsystem.pp Score=11
GatherUnitsInFPCSources UnitName=mmsystem File=packages/winceunits/src/mmsystem.pp Score=11 => duplicate

This means there are two source files with the same score, so the codetools took the first one. The second one, in winceunits, is for target wince; the first one is for win32 and win64.
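The tie rule described here (highest score wins, and on a tie the first file wins) fits in a few lines. PickSource is a hypothetical helper written for illustration, not the codetools implementation, and it assumes that Files and Scores have the same length.

```pascal
program PickSourceSketch;

{$mode objfpc}{$H+}

// Returns the file with the highest score. A later file only wins if its
// score is strictly higher, so on a tie the first file is kept.
function PickSource(const Files: array of string;
  const Scores: array of Integer): string;
var
  i, Best: Integer;
begin
  Result := '';
  if Length(Files) = 0 then
    Exit;
  Best := 0;
  for i := 1 to High(Files) do
    if Scores[i] > Scores[Best] then
      Best := i;
  Result := Files[Best];
end;

begin
  // Both candidates scored 11, so the first one is picked
  WriteLn(PickSource(
    ['packages/winunits-base/src/mmsystem.pp',
     'packages/winceunits/src/mmsystem.pp'],
    [11, 11])); // packages/winunits-base/src/mmsystem.pp
end.
```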

Now open the rules file fpcsrcrules.inc.

Rules work like this:

<Delphi>
Score:=10;
Targets:='wince';
Add('packages/winceunits');
</Delphi>

Add creates a rule for all files whose path begins with 'packages/winceunits'; the rule adds a score of 10 to each of these files. Targets is a comma-separated list of target operating systems and/or target processors. For example, Targets:='wince,linux,i386' means: apply this rule when the TargetOS is wince or linux, and for every target whose TargetCPU is i386.
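To make the scoring concrete, here is a sketch that models a rule as a score, a target list and a path prefix. TSrcRule, MakeRule, RuleMatchesTarget and ScoreFile are hypothetical names. The sketch makes three assumptions: a rule applies when either the TargetOS or the TargetCPU appears in Targets, an empty Targets applies to every target, and the scores of all matching rules are added up.

```pascal
program FPCSrcRulesSketch;

{$mode objfpc}{$H+}

uses
  Classes;

type
  // Hypothetical record holding what Score, Targets and Add() set up
  TSrcRule = record
    Score: Integer;
    Targets: string; // comma-separated OS/CPU names; '' = every target (assumption)
    Prefix: string;  // the path prefix passed to Add()
  end;

function MakeRule(AScore: Integer; const ATargets, APrefix: string): TSrcRule;
begin
  Result.Score := AScore;
  Result.Targets := ATargets;
  Result.Prefix := APrefix;
end;

// True if TargetOS or TargetCPU appears in the comma-separated Targets list
function RuleMatchesTarget(const Targets, TargetOS, TargetCPU: string): Boolean;
var
  List: TStringList;
begin
  if Targets = '' then
    Exit(True);
  List := TStringList.Create;
  try
    List.StrictDelimiter := True;
    List.Delimiter := ',';
    List.DelimitedText := Targets;
    Result := (List.IndexOf(TargetOS) >= 0) or (List.IndexOf(TargetCPU) >= 0);
  finally
    List.Free;
  end;
end;

// Adds up the scores of all rules whose prefix starts FileName
// and whose Targets match the current TargetOS/TargetCPU
function ScoreFile(const Rules: array of TSrcRule;
  const FileName, TargetOS, TargetCPU: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(Rules) to High(Rules) do
    if (Pos(Rules[i].Prefix, FileName) = 1) and
       RuleMatchesTarget(Rules[i].Targets, TargetOS, TargetCPU) then
      Inc(Result, Rules[i].Score);
end;

var
  Rules: array of TSrcRule;
begin
  SetLength(Rules, 1);
  // Score:=10; Targets:='wince'; Add('packages/winceunits');
  Rules[0] := MakeRule(10, 'wince', 'packages/winceunits');
  WriteLn(ScoreFile(Rules, 'packages/winceunits/src/mmsystem.pp', 'wince', 'arm'));    // 10
  WriteLn(ScoreFile(Rules, 'packages/winunits-base/src/mmsystem.pp', 'wince', 'arm')); // 0
end.
```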

How the codetools parses sources, difference to a compiler

A compiler is optimized to parse code linearly and to load the needed units and include files as soon as it parses a uses section or a directive. The codetools are optimized to parse only parts of the code. For example, jumping from a method declaration to the method body only needs the unit and its include files. When a codetool searches for a declaration, it searches backwards: it starts with the local variables and then moves upwards through the implementation section. When it reaches a uses section, it searches for the identifier in the interface sections of the used units. As soon as the identifier is found, the search stops. The result and some intermediate steps are cached. Because often only some interface sections need to be parsed, a single identifier is found fast.
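This backwards search can be modelled as a toy: the scopes are visited from the innermost outwards, and the search stops at the first scope that declares the identifier. TScope, MakeScope and FindDeclarationScope are hypothetical. The real codetools walk their node tree and cache results, which this sketch leaves out. The order of the two unit scopes assumes `uses Classes, SysUtils;`, where the later unit is checked first.

```pascal
program BackwardSearchSketch;

{$mode objfpc}{$H+}

uses
  Classes;

type
  // Hypothetical scope: a label plus the identifiers declared in it
  TScope = record
    Name: string;
    Identifiers: string; // comma-separated, for brevity
  end;

function MakeScope(const AName, AIdentifiers: string): TScope;
begin
  Result.Name := AName;
  Result.Identifiers := AIdentifiers;
end;

// True if Scope declares Identifier. TStringList.IndexOf ignores case by
// default, matching Pascal's case-insensitive identifiers.
function Declares(const Scope: TScope; const Identifier: string): Boolean;
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.CommaText := Scope.Identifiers;
    Result := List.IndexOf(Identifier) >= 0;
  finally
    List.Free;
  end;
end;

// Visits Scopes in order (innermost first) and stops at the first scope
// that declares Identifier; returns '' if there is none
function FindDeclarationScope(const Scopes: array of TScope;
  const Identifier: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := Low(Scopes) to High(Scopes) do
    if Declares(Scopes[i], Identifier) then
      Exit(Scopes[i].Name);
end;

var
  Scopes: array[0..3] of TScope;
begin
  Scopes[0] := MakeScope('local variables', 'i,Count');
  Scopes[1] := MakeScope('implementation', 'Helper,Count');
  // For "uses Classes, SysUtils;" the later unit is checked first
  Scopes[2] := MakeScope('interface of SysUtils', 'IntToStr');
  Scopes[3] := MakeScope('interface of Classes', 'TStringList');
  WriteLn(FindDeclarationScope(Scopes, 'count'));       // local variables
  WriteLn(FindDeclarationScope(Scopes, 'TStringList')); // interface of Classes
end.
```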

Unlike the compiler, the codetools do not parse a source in one step but in several steps, depending on what the current function needs:

  • First, a source file is loaded into a TCodeBuffer. The IDE uses this step to convert the encoding to UTF-8. The files are kept in memory and are only reloaded if the modification date changes or a file is manually reverted. Several tools and functions work directly on the buffer.
  • The next level is parsing a unit or include file. A unit must be parsed from the beginning, so the codetools try to find the main file, i.e. the first file of the unit. They do this by looking for a directive like {%MainUnit ../lclintf.pp} in the first line. If there is none, they search the includelink cache. The IDE saves this cache to disk, so it learns over time.
  • After the main file has been found, a TLinkScanner parses the source. It handles compiler directives such as include directives and if-else directives. The scanner can be given a range, so that it can, for instance, parse only the interface of a unit. The scanner creates the clean source, which is assembled from all include files with all skipped code (for example inactive else parts) removed. It also creates a list of links that maps between the clean source and the real source files. The clean source can now be parsed as Pascal. Note: there are also tools that scan a single source for all directives and create a tree of directives.
  • After the clean source has been created, a TCodeTool parses it and creates a tree of TCodeTreeNode objects. It can also be given a range. This parser skips a few parts, for example class members, begin..end blocks and parameter lists, because many tools don't need them; these sub nodes are created on demand. A TCodeTreeNode has a range StartPos..EndPos of clean positions, that is, positions in the clean source. There are only nodes for the important parts: creating nodes for every detail would need more memory than the source itself and is seldom needed. Plenty of functions are available to find out the details, for example whether a function has the calling convention 'cdecl'.
  • When searching for an identifier, the search stores the base types it finds and creates caches for all identifiers in the interface section.
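The main file lookup in the second step starts with the first line of the include file. The following sketch extracts the file name from a {%MainUnit ...} directive. ExtractMainUnit is a hypothetical helper, the case-insensitive comparison is an assumption, and the fallback to the includelink cache is not modelled.

```pascal
program MainUnitSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  MainUnitTag = '{%MainUnit';

// Returns the file name of a MainUnit directive at the start of FirstLine,
// or '' if the line does not start with such a directive
function ExtractMainUnit(const FirstLine: string): string;
var
  Line: string;
  CloseBrace: Integer;
begin
  Result := '';
  Line := TrimLeft(FirstLine);
  // Case-insensitive comparison is an assumption of this sketch
  if not SameText(Copy(Line, 1, Length(MainUnitTag)), MainUnitTag) then
    Exit;
  CloseBrace := Pos('}', Line);
  if CloseBrace = 0 then
    Exit;
  // Text between the tag and the closing brace, without surrounding spaces
  Result := Trim(Copy(Line, Length(MainUnitTag) + 1,
    CloseBrace - Length(MainUnitTag) - 1));
end;

begin
  WriteLn(ExtractMainUnit('{%MainUnit ../lclintf.pp}')); // ../lclintf.pp
  WriteLn('[', ExtractMainUnit('{$DEFINE Flag}'), ']');  // []
end.
```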

Every level has its own caches, which need to be checked and updated before a function is called. Many high-level functions accessible via the CodeToolBoss do this automatically; for the others it is the caller's responsibility.
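The tree of TCodeTreeNode objects with clean StartPos..EndPos ranges can be modelled with a much simplified node class. This shows how a clean position leads to the deepest node that contains it. TNode and DeepestNodeAt are hypothetical and are not the codetools classes. The sketch assumes that children lie within their parent's range and treats EndPos as exclusive. The positions in the main block are invented.

```pascal
program NodeAtPosSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical, much simplified stand-in for TCodeTreeNode:
  // a description plus a StartPos..EndPos range of clean positions
  TNode = class
    Desc: string;
    StartPos, EndPos: Integer;
    FirstChild, NextBrother: TNode;
    constructor Create(const ADesc: string; AStartPos, AEndPos: Integer);
    destructor Destroy; override;
    procedure AddChild(Child: TNode);
  end;

constructor TNode.Create(const ADesc: string; AStartPos, AEndPos: Integer);
begin
  inherited Create;
  Desc := ADesc;
  StartPos := AStartPos;
  EndPos := AEndPos;
end;

destructor TNode.Destroy;
begin
  FirstChild.Free;  // frees the children, which free their brothers
  NextBrother.Free;
  inherited Destroy;
end;

// Appends Child as the last child of this node
procedure TNode.AddChild(Child: TNode);
var
  Last: TNode;
begin
  if FirstChild = nil then
    FirstChild := Child
  else
  begin
    Last := FirstChild;
    while Last.NextBrother <> nil do
      Last := Last.NextBrother;
    Last.NextBrother := Child;
  end;
end;

// Returns the deepest node with StartPos <= CleanPos < EndPos,
// or nil if CleanPos lies outside Root
function DeepestNodeAt(Root: TNode; CleanPos: Integer): TNode;
var
  Child: TNode;
begin
  Result := nil;
  if (Root = nil) or (CleanPos < Root.StartPos) or (CleanPos >= Root.EndPos) then
    Exit;
  Result := Root;
  Child := Root.FirstChild;
  while Child <> nil do
    if (CleanPos >= Child.StartPos) and (CleanPos < Child.EndPos) then
    begin
      Result := Child;             // go one level deeper
      Child := Child.FirstChild;
    end
    else
      Child := Child.NextBrother;  // try the next sibling
end;

var
  Root, Proc: TNode;
begin
  Root := TNode.Create('unit', 1, 200);
  Root.AddChild(TNode.Create('interface', 15, 60));
  Proc := TNode.Create('procedure', 70, 190);
  Root.AddChild(Proc);
  Proc.AddChild(TNode.Create('begin..end', 120, 185));
  WriteLn(DeepestNodeAt(Root, 130).Desc); // begin..end
  WriteLn(DeepestNodeAt(Root, 100).Desc); // procedure
  Root.Free;
end.
```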

Example:

unit1.pas:

<Delphi>
unit Unit1;
{$I settings.inc}
interface
uses
  {$IFDEF Flag}
  unix,
  {$ELSE}
  windows,
  {$ENDIF}
  Classes;
implementation
end.
</Delphi>

settings.inc:

<Delphi>
{%MainUnit unit1.pas}
{$DEFINE Flag}
</Delphi>

clean source:

<Delphi>
unit Unit1;
{$I settings.inc}{%MainUnit unit1.pas}
{$DEFINE Flag}
interface
uses
  {$IFDEF Flag}
  unix,
  {$ELSE}{$ENDIF}
  Classes;
implementation
end.
</Delphi>
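The list of links created by the scanner can be sketched as a sorted array. Each entry records where a piece starts in the clean source and which file and position it was copied from. TSourceLink, TSourcePos and CleanPosToSource are hypothetical. The positions below are invented rather than computed from the example files, and the stripped {$ELSE} branch would need a further link, which is omitted here.

```pascal
program CleanPosLinksSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical link: the clean source from CleanStart up to the next link
  // was copied from FileName, starting at position SrcStart
  TSourceLink = record
    CleanStart: Integer;
    FileName: string;
    SrcStart: Integer;
  end;

  TSourcePos = record
    FileName: string; // '' if CleanPos lies before the first link
    SrcPos: Integer;
  end;

function MakeLink(ACleanStart: Integer; const AFileName: string;
  ASrcStart: Integer): TSourceLink;
begin
  Result.CleanStart := ACleanStart;
  Result.FileName := AFileName;
  Result.SrcStart := ASrcStart;
end;

// Maps a clean position to a file and position, using the last link that
// starts at or before CleanPos. Links must be sorted by CleanStart.
function CleanPosToSource(const Links: array of TSourceLink;
  CleanPos: Integer): TSourcePos;
var
  i: Integer;
begin
  Result.FileName := '';
  Result.SrcPos := 0;
  for i := High(Links) downto Low(Links) do
    if Links[i].CleanStart <= CleanPos then
    begin
      Result.FileName := Links[i].FileName;
      Result.SrcPos := Links[i].SrcStart + (CleanPos - Links[i].CleanStart);
      Exit;
    end;
end;

var
  Links: array[0..2] of TSourceLink;
  Found: TSourcePos;
begin
  // Invented positions: start of unit1.pas, then the content of
  // settings.inc, then the rest of unit1.pas after the include directive
  Links[0] := MakeLink(1, 'unit1.pas', 1);
  Links[1] := MakeLink(30, 'settings.inc', 1);
  Links[2] := MakeLink(67, 'unit1.pas', 30);
  Found := CleanPosToSource(Links, 40);
  WriteLn(Found.FileName, ' ', Found.SrcPos); // settings.inc 11
end.
```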

CleanPos and CursorPos

There are several ways to define a position in the codetools.

Absolute positions refer to a source as one string and start at 1. For example, a TCodeBuffer holds the file content as one string in the property Source. Caret or cursor positions are given as X,Y, where Y is the line number starting at 1 and X is the column number starting at 1. A TCodeBuffer provides the member functions LineColToPosition and AbsoluteToLineCol to convert between the two.

When working with multiple source files, such as a unit that consists of several include files, the clean position is the absolute position in the stripped code Src, which is a string; clean positions also start at 1. Cursor positions are specified as TCodeXYPosition (Code,X,Y). A TCodeTool provides the functions CaretToCleanPos and CleanPosToCaret to convert between the two.
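For a single buffer, the conversion between absolute positions and line/column pairs can be sketched as follows. LineColToPos and PosToLineCol are hypothetical stand-ins that mimic what LineColToPosition and AbsoluteToLineCol are described to do; they are not their implementation. Only #10 is treated as a line end, and columns past the end of a line are not checked.

```pascal
program LineColSketch;

{$mode objfpc}{$H+}

type
  TLineCol = record
    Y, X: Integer; // line and column, both starting at 1
  end;

// Converts line Y, column X (both 1-based) to a 1-based absolute position.
// Returns -1 if Src has fewer than Y lines. Only #10 ends a line here.
function LineColToPos(const Src: string; Y, X: Integer): Integer;
var
  Line, i: Integer;
begin
  Line := 1;
  i := 1;
  // Skip forward until the start of line Y
  while (Line < Y) and (i <= Length(Src)) do
  begin
    if Src[i] = #10 then
      Inc(Line);
    Inc(i);
  end;
  if Line < Y then
    Exit(-1);
  Result := i + X - 1;
end;

// Converts a 1-based absolute position back to line and column
function PosToLineCol(const Src: string; Position: Integer): TLineCol;
var
  i: Integer;
begin
  Result.Y := 1;
  Result.X := 1;
  for i := 1 to Position - 1 do
    if (i <= Length(Src)) and (Src[i] = #10) then
    begin
      Inc(Result.Y);
      Result.X := 1;
    end
    else
      Inc(Result.X);
end;

const
  Src = 'unit Unit1;'#10'interface'#10;
var
  P: Integer;
  LC: TLineCol;
begin
  P := LineColToPos(Src, 2, 3);
  LC := PosToLineCol(Src, P);
  WriteLn(P, ' -> line ', LC.Y, ', column ', LC.X); // 15 -> line 2, column 3
end.
```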