Pure functions

From Lazarus wiki
Revision as of 16:20, 8 July 2018 by CuriousKit (talk | contribs) (Test cases)
Jump to navigationJump to search

This page has been set up as a collaboration and design specification for the proposal and support of 'pure' functions, which allows for massive speed gains in situations where constant parameters are passed into such subroutines.

CuriousKit (talk) 16:45, 8 July 2018 (CEST)

Definition

A pure function is a subroutine that is entirely self-contained and contains no side-effects or dependencies on external values (e.g. global variables). As such, a pure function is deterministic and will always return the same value(s) for a given input, analogous to a mathematical function. As such, if a function is called with constant arguments, the result can be pre-computed by the compiler and the entire call replaced with the return value.

Theoretical examples of pure functions

  • Trigonometric sine
  • Returning the larger of two values (i.e. the Max function)
  • The factorial function
  • The bitwise functions or, and, not and xor.
  • Converting an integer into a string (though a programmer could easily replace, say, IntToStr(1) with just '1', another pure function that calls IntToStr with a variable argument might be able to take advantage of IntToStr's purity if data flow analysis determines that the argument is deterministic)
  • A hash function

Examples of impure functions

  • A random number generator (relies on an external seed)
  • GetMem (output is non-deterministic and might raise an exception that is highly dependent on the system state)
  • Reading from a file (has no control over the IO state or the contents of the file in question)

Design Proposals

While it is possible to treat all functions as pure until proven otherwise, this is prohibitively slow for a compiler that might have to deal with many hundreds of functions and thousands of calls in one sitting. Furthermore, most functions will either be impure or contain a non-deterministic parameter (this includes the value of Self in object methods). As such, to take pressure off the compiler, I propose the optional keyword "pure" that can be specified after the function header, as a hint to the compiler that the function should be checked for purity.

Proposed Compiler Warnings and Errors

There will be cases where functions are hinted to be pure but are really not, either due to programmer error or deliberate malice (or stress testing). If analysis of the function determines that the function is impure (which might only occur when it tries to replace a call to that function), then I propose the compiler returning a warning to say that the function is not pure and to remove any flags that say it is pure, so it doesn't attempt any further optimisation on that front. This would definitely be a warning that the programmer should fix because there may be function calls that are successfully optimised prior to the detection of something that makes the function impure (due to a different branch being taken in the function), and marking the function as impure at this point will prevent future optimisations that would otherwise be valid.

List of possible warnings:

  • "Procedure marked as pure accesses external variable"
  • "Procedure marked as pure calls an impure subroutine"
  • "Procedure marked as pure dereferences a pointer"
  • "Procedure is too complex to be pure, or contains an infinite loop" (See Extreme Conditions below)

There is one exception though where I propose an error instead of a warning, and that's in situations where a particular call to a pure function would raise an exception. The reason why I propose an error in these situations is because many of these functions are perfectly valid mathematical constructs and are often useful (e.g. division, which raises an exception if the denominator is zero, or the natural logarithm, which raises an exception if the input is negative), and to disable their purity for something that is likely to be a programmer error would be unfair.

Other scenarios can raise errors where pure would be invalid, such as on object methods, but this can reuse the procedure directive error that appears when virtual and inline are paired, for example.

Possible error message:

  • "Call to pure function would raise exception"

Extreme Conditions

There are cases where a function is pure but which could easily cause the compiler to take a prohibitively long time to evaluate it. Infinite loops are an example of a malicious case, but there are situations where the function will complete, but will take far too long. A constructed example is the Ackermann function, which has a very deep level of recursion. As a result, it is only sensible to include an upper limit in how complex a function can be before the compiler decides it shouldn't be treated as pure. In this case, a warning should be thrown if it reaches a node count or stack depth limit, which will also catch infinite loops.

Development Test Cases

The first test is a simple function with a single conditional check. It is also a function that can easily be calculated by hand for given values a and b to determine compiler correctness.

function Max(a, b: Double): Double; inline; pure;
begin
  if a > b then
    Result := a
  else
    Result := b;
end;

Once Max and similar functions are tested, I propose a more complex function that relies on recursion, hence cannot be inlined, but which is still pure:

function Factorial(n: QWord): QWord; pure;
begin
  if n <= 1 then
    Result := 1
  else
    Result := n * Factorial(n - 1);
end;

As a final test case, I propose the Ackermann Function. This will stress-test the compiler to see how it handles functions with deep recursion but which don't have infinite loops and will eventually finish but maybe in a prohibitively long time.

function Ackermann(m, n: QWord): QWord; pure;
begin
  if (m = 0) then
    Result := n + 1
  else if (n = 0) then
    Result := Ackermann(m - 1, 1)
  else
    Result := Ackermann(m - 1, Ackermann(m, n - 1)); 
end;

See Also