<div>Authors: Neil Mitchell, Tom Shackell, Dimitry Golubovsky, Matthew Naylor<br />
<br />
This is a draft of the Yhc TMR article, deadline April 13th. It isn't intended as a wiki article to be edited beyond the listed authors (although if you want to fix some spelling, we don't mind!). If you are interested in helping, email the Yhc list.<br />
<br />
== The beginning ==<br />
<br />
In the beginning there was the nhc compiler, which had a number of issues. We fixed some of them.<br />
<br />
Author: Tom/Neil/Andrew<br />
<br />
How we started up Yhc; this is the section that would have been in the History of Haskell paper if they had done a Yhc section :)<br />
<br />
Include the transition from CVS -> York Darcs -> haskell.org Darcs<br />
<br />
== Portability concerns ==<br />
<br />
From the beginning, portability was a prime concern: while the original nhc only ran on Linux v old.old, and never on Windows, Yhc was fully portable by design.<br />
<br />
Author: Tom, Andrew<br />
<br />
Why portability is such a concern, details of our ports system. Include our scons architecture, buildbot system etc. Mention that Yhc runs under Hugs, and indeed some of the developers use Hugs.<br />
<br />
<br />
<br />
== Yhc.Core ==<br />
<br />
Yhc.Core is one of our most successful libraries to date. The original nhc compiler used an intermediate core language called PosLambda – a basic lambda calculus extended with positional information. Although the language was very similar to a subset of Haskell, it wasn’t the same. In particular there were unusual constructs such as FatBar, and all names were stored in a symbol table.<br />
<br />
Rather than attempt to change the PosLambda language, a task that would have been decidedly painful, we chose instead to write a Core language from scratch, being inspired by both PosLambda and GHC Core. We then wrote a converter from PosLambda to Yhc.Core.<br />
<br />
In particular our idealised Core language differs from GHC Core in a number of ways:<br />
<br />
* Untyped – originally this was a restriction of PosLambda, but now some of us see this as a feature (others still see this as a bug).<br />
* Syntactically a subset of Haskell.<br />
* Minimal name mangling.<br />
<br />
All these features combine to create a Core language which resembles Haskell much more than Core languages in other Haskell compilers. This very low barrier to entry means that with merely the Haddock description of our abstract syntax tree, most Haskell programmers can feel at home with relatively little effort.<br />
<br />
Having reduced the barrier to entry for our Core language, the number of projects depending on it has increased. Consistently we have attempted to add facilities to the libraries for common tasks, rather than duplicating them separately in projects. As a result the Core library now has facilities for dealing with primitives, removing recursive lets, reachability analysis, strictness analysis, simplification, inlining and more.<br />
<br />
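The reachability analysis, for instance, amounts to a transitive closure over the call graph. The following self-contained sketch illustrates the idea (it is not the library's actual implementation, and the toy <tt>CallGraph</tt> type is ours):<br />
<br />
<haskell><br />
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A toy call graph: each function name maps to the names it mentions.
type CallGraph = Map.Map String [String]

-- All functions reachable from a root, by transitive closure.
reachable :: CallGraph -> String -> Set.Set String
reachable g root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (f:fs)
      | f `Set.member` seen = go seen fs
      | otherwise = go (Set.insert f seen)
                       (Map.findWithDefault [] f g ++ fs)
</haskell><br />
<br />
Functions outside the reachable set can then simply be deleted.<br />
<br />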
One of the first features we added to Core was whole program linking – any Haskell program, regardless of the number of modules, can be collapsed into one single Yhc.Core module. While this breaks separate compilation, it allows many types of analysis and transformation to be performed in a simpler manner. Naturally, if a technique turns out to be successful, breaking the dependence on whole program compilation is a worthy goal – but this approach allows developers to pay that cost only when it is needed.<br />
<br />
=== A Small Example ===<br />
<br />
To give a flavour of what Core looks like, it is easiest to start with a small program:<br />
<br />
<haskell><br />
head2 (x:xs) = x<br />
<br />
map2 f [] = []<br />
map2 f (x:xs) = f x : map2 f xs<br />
<br />
test x = map2 head2 x<br />
</haskell><br />
<br />
Compiling this with <tt>yhc -showcore Sample.hs</tt> generates:<br />
<br />
<haskell><br />
Sample.head2 v220 =<br />
case v220 of<br />
(:) v221 v222 -> v221<br />
_ -> Prelude.error Sample._LAMBDA228<br />
<br />
Sample._LAMBDA228 =<br />
"Sample: Pattern match failure in function at 9:1-9:15."<br />
<br />
Sample.map2 v223 v224 =<br />
case v224 of<br />
[] -> []<br />
(:) v225 v226 -> (:) (v223 v225) (Sample.map2 v223 v226)<br />
<br />
Sample.test v227 = Sample.map2 Sample.head2 v227<br />
</haskell><br />
<br />
The generated Core looks very much like Haskell, and can be treated as such. It is the simplest form of Haskell, with many restrictions:<br />
<br />
* Case statements only examine their outermost constructor<br />
* No type classes<br />
* No where<br />
* No nested functions (only top level)<br />
* All names are fully qualified<br />
<br />
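For example, a definition using a nested pattern is compiled into a chain of one-level case expressions. The following illustration (hand-written here, with names of our own choosing) shows both forms as ordinary Haskell:<br />
<br />
<haskell><br />
-- Ordinary Haskell, using a nested pattern:
second :: [a] -> a
second (_:x:_) = x
second _       = error "pattern match failure"

-- Core-style equivalent: each case examines only the outermost constructor.
second2 :: [a] -> a
second2 v =
  case v of
    (:) _ v1 ->
      case v1 of
        (:) x _ -> x
        _ -> error "pattern match failure"
    _ -> error "pattern match failure"
</haskell><br />
<br />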
=== Yhc.Core.Overlay ===<br />
<br />
Describe the overlay here<br />
<br />
=== Semantics of Yhc Core ===<br />
<br />
In this section an evaluator for Yhc Core programs is presented in the<br />
form of a literate Haskell program. The aim is to define the meaning<br />
of Core programs while demonstrating a full, albeit simple,<br />
application of the <tt>Yhc.Core</tt> library.<br />
<br />
<haskell><br />
> module Eval where<br />
<br />
> import Yhc.Core<br />
</haskell><br />
<br />
Our evaluator is based around the function <tt>whnf</tt> that takes a<br />
core program (of type <tt>Core</tt>) along with a core expression (of<br />
type <tt>CoreExpr</tt>) and reduces that expression until it has the<br />
form:<br />
<br />
<ul><br />
<br />
<li>a data constructor with unevaluated arguments, or</li><br />
<br />
<li>an unapplied lambda expression.</li><br />
<br />
</ul><br />
<br />
In general, data values in Haskell are tree-shaped. The function<br />
<tt>whnf</tt> is often said to "reduce an expression to head normal<br />
form" because it reveals the head (or root) of a value's tree and no<br />
more. Strictly speaking, when the result of reduction could be a<br />
functional value (i.e. a lambda expression), and the body of that<br />
lambda is left unevaluated, then the result is said to be in "weak<br />
head normal form" -- this explains the strange acronym WHNF!<br />
<br />
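The distinction is observable in ordinary Haskell: <hask>seq</hask> forces its first argument only as far as the outermost constructor. A small illustration (not part of the evaluator):<br />
<br />
<haskell><br />
-- WHNF stops at the outermost constructor, so the undefined component
-- of the pair is never touched when the pair itself is forced.
pairWHNF :: (Int, Int)
pairWHNF = (1, undefined)

headOfPair :: Int
headOfPair = pairWHNF `seq` fst pairWHNF   -- forcing to WHNF is harmless here
</haskell><br />
<br />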
The type of <tt>whnf</tt> is:<br />
<br />
<haskell><br />
> whnf :: Core -> CoreExpr -> CoreExpr<br />
</haskell><br />
<br />
Defining it is a process of taking each kind of core expression in<br />
turn, and asking "how do I reduce this to weak head normal form?" As<br />
usual, it makes sense to define the base cases first, namely<br />
constructors and lambda expressions:<br />
<br />
<haskell><br />
> whnf p (CoreCon c) = CoreCon c<br />
> whnf p (CoreApp (CoreCon c) as) = CoreApp (CoreCon c) as<br />
> whnf p (CoreLam (v:vs) e) = CoreLam (v:vs) e<br />
</haskell><br />
<br />
Notice that a constructor may take one of two forms:<br />
standalone with no arguments, or as function application to a list of<br />
arguments. Also, because of the way our evaluator is designed, we may<br />
encounter lambda expressions with no arguments. Hence, only lambdas<br />
with arguments represent a base-case. For the no-arguments case, we<br />
just shift the focus of reduction to the body:<br />
<br />
<haskell><br />
> whnf p (CoreLam [] e) = whnf p e<br />
</haskell><br />
<br />
Currently, lambda expressions do not occur in the Core output of Yhc.<br />
They are part of the Core syntax because they are useful conceptually,<br />
particularly when manipulating (and evaluating) higher-order functions.<br />
<br />
Moving on to case-expressions, we first reduce the case subject, then<br />
match it against each pattern in turn, and finally reduce the body of<br />
the chosen alternative. In Core, we can safely assume that patterns<br />
are at most one constructor deep, so reduction of the subject to WHNF<br />
is sufficient.<br />
<br />
<haskell><br />
> whnf p (CoreCase e as) = whnf p (match (whnf p e) as)<br />
</haskell><br />
<br />
We leave the definition of <tt>match</tt> until later.<br />
<br />
To reduce a let-expression, we apply the let-bindings as a<br />
substitution to the body of the let. This is easily done using the<br />
Core function <tt>replaceFreeVars</tt>. Of course, let-expressions in<br />
Core are recursive, but before evaluating a core program we transform<br />
all recursive-lets away (see below). Notice that we are in no way<br />
trying to preserve the sharing implied by let-expressions, although we<br />
have done so in more complex variants of the evaluator.<br />
Strictly-speaking, Haskell evaluators are not obliged to implement<br />
sharing -- this is why it is more correct to term Haskell non-strict<br />
than lazy.<br />
<br />
<haskell><br />
> whnf p (CoreLet bs e) = whnf p (replaceFreeVars bs e)<br />
</haskell><br />
<br />
When we encounter an unapplied function we call <tt>coreFunc</tt> to<br />
look up its definition (i.e. its arguments and its right-hand side), and<br />
construct a corresponding lambda expression:<br />
<br />
<haskell><br />
> whnf p (CoreFun f) = whnf p (CoreLam bs body)<br />
> where<br />
> CoreFunc _ bs body = coreFunc p f<br />
</haskell><br />
<br />
This means that when reducing function applications, we know that<br />
reduction of the function part will yield a lambda:<br />
<br />
<haskell><br />
> whnf p (CoreApp f []) = whnf p f<br />
> whnf p (CoreApp f (a:as)) =<br />
> case whnf p f of<br />
> CoreLam [] e -> whnf p (CoreApp e (a:as))<br />
> CoreLam (b:bs) e -> whnf p (CoreLet [(b,a)]<br />
> (CoreApp (CoreLam bs e) as))<br />
</haskell><br />
<br />
Core programs may contain positional information which we just ignore:<br />
<br />
<haskell><br />
> whnf p (CorePos _ e) = whnf p e<br />
</haskell><br />
<br />
And the final, fall-through case covers primitive literals and<br />
functions which we are not concerned with here:<br />
<br />
<haskell><br />
> whnf p e = e<br />
</haskell><br />
<br />
Now, for the sake of completeness, we return to our <tt>match</tt><br />
function. It takes the evaluated case subject and tries to match it<br />
against each case-alternative (a pattern-expression pair) in order of<br />
appearance. We use the "failure as a list of successes" technique to<br />
model the fact that matching may fail.<br />
<br />
<haskell><br />
> type Alt = (CoreExpr, CoreExpr)<br />
<br />
> match :: CoreExpr -> [Alt] -> CoreExpr<br />
> match e as = head (concatMap (try e) as)<br />
</haskell><br />
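<br />
The technique is worth seeing in isolation: each rule returns the empty list on failure or a singleton on success, and <hask>concatMap</hask> collects the successes in order. A standalone illustration (the rule functions are ours):<br />
<br />
<haskell><br />
-- The first rule that applies wins; [] everywhere means no match.
firstMatch :: [a -> [b]] -> a -> Maybe b
firstMatch rules x =
  case concatMap ($ x) rules of
    (y:_) -> Just y
    []    -> Nothing

evenRule, positiveRule :: Int -> [String]
evenRule n     = [ "even"     | even n ]
positiveRule n = [ "positive" | n > 0 ]
</haskell><br />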
<br />
Before defining <tt>try</tt>, it is useful to have a function that<br />
turns the two possible constructor forms into a single, normal form.<br />
This greatly reduces the number of cases we need to consider in the<br />
definition of <tt>try</tt>.<br />
<br />
<haskell><br />
> norm :: CoreExpr -> CoreExpr<br />
> norm (CoreCon c) = CoreApp (CoreCon c) []<br />
> norm x = x<br />
</haskell><br />
<br />
Hopefully by now the definition of <tt>try</tt> will be<br />
self-explanatory:<br />
<br />
<haskell><br />
> try :: CoreExpr -> Alt -> [CoreExpr]<br />
> try e (pat, rhs) =<br />
> case (norm pat, norm e) of<br />
> (CoreApp (CoreCon f) as, CoreApp (CoreCon g) bs)<br />
> | f == g -> [CoreLet (zip (vars as) bs) rhs]<br />
> (CoreVar v, e) -> [CoreLet [(v, e)] rhs]<br />
> _ -> []<br />
> where<br />
> vars = map fromCoreVar<br />
</haskell><br />
<br />
This completes the definition of <tt>whnf</tt>. However, we would<br />
like to be able to fully evaluate expressions -- to what we simply<br />
call "normal form" -- so that the resulting value's tree is computed<br />
in its entirety. Our <tt>nf</tt> function repeatedly applies<br />
<tt>whnf</tt> at progressively deeper nodes in the growing tree:<br />
<br />
<haskell><br />
> nf :: Core -> CoreExpr -> CoreExpr<br />
> nf p e =<br />
> case whnf p e of<br />
> CoreCon c -> CoreCon c<br />
> CoreApp (CoreCon c) es -> CoreApp (CoreCon c) (map (nf p) es)<br />
> e -> e<br />
</haskell><br />
<br />
All that remains is to turn our evaluator into a program by giving it<br />
a sensible <tt>main</tt> function. We first load the core file using<br />
<tt>loadCore</tt> and then apply <tt>removeRecursiveLet</tt>, as<br />
discussed earlier, before evaluating the expression <tt>CoreFun<br />
"main"</tt> to normal form and printing it. (The imports below belong,<br />
of course, with the module header at the top of the file.)<br />
<br />
<haskell><br />
> import System<br />
> import Monad<br />
<br />
> main :: IO ()<br />
> main = liftM head getArgs<br />
>        >>= liftM removeRecursiveLet . loadCore<br />
>        >>= print . flip nf (CoreFun "main")<br />
</haskell><br />
<br />
In future we hope to use a variant of this evaluator (with sharing) in<br />
a property-based testing framework. This will let us check that<br />
various program analyses and transformations that we have developed<br />
are semantics-preserving. As part of another project, we have<br />
successfully extended the evaluator to support various functional-logic<br />
evaluation strategies, including residuation and narrowing.<br />
<br />
<br />
== Javascript backend ==<br />
<br />
The Javascript backend is a unique feature of Yhc, something which our lightweight approach makes easier.<br />
<br />
Author: Dimitry<br />
<br />
The idea to write a converter from Haskell to Javascript has been floating around for a while ([http://www.haskell.org/haskellwiki/Hajax Hajax]). Many people expressed interest in such a feature, but no practical implementation was visible.<br />
<br />
Initial goals of this subproject were:<br />
<br />
* To develop a conversion program that converts the [http://haskell.org/haskellwiki/Yhc/API/Core Yhc Core] to Javascript, thus making it possible to execute arbitrary Haskell code within a web browser<br />
<br />
* To develop an unsafe interface layer for quick access to Javascript objects with ability to wrap arbitrary Javascript code into a Haskell-callable function<br />
<br />
* To develop a typesafe interface layer on top of the unsafe interface layer for access to the Document Object Model (DOM) available to Javascript executed in a web browser<br />
<br />
* To develop or adopt an existing GUI library or toolkit working on top of the typesafe DOM layer, for actual development of client-side Web applications.<br />
<br />
=== General concepts ===<br />
<br />
The Javascript backend converts a linked and optimized Yhc Core file into a piece of Javascript code to be embedded in an XHTML document. The Javascript code generator attempts to translate Core expressions to Javascript expressions one-to-one, with minor optimizations of its own, taking advantage of Javascript's ability to pass functions around as values.<br />
<br />
Three kinds of functions are present in the Javascript backend:<br />
<br />
* Unsafe functions that embed pieces of Javascript directly into the generated code: these functions pay no respect to the types of the arguments passed, and may force evaluation of their arguments if needed.<br />
<br />
* Typesafe wrappers that provide type signatures for unsafe functions. Such wrappers are either handwritten, or automatically generated from external interface specifications (such as the Document Object Model interface).<br />
<br />
* Regular library functions. These either come unmodified from the standard packages shipped with Yhc, or are substituted by the Javascript backend using the Core overlay technique. An example is the <hask>toUpper</hask> function, which is hooked up to a Javascript implementation supporting Unicode (the original library function currently works correctly only for the Latin-1 range of characters).<br />
<br />
==== Unsafe interfaces ====<br />
<br />
The core part of the unsafe interface to Javascript (in other words, the Javascript FFI) is a pseudo-function <hask>unsafeJS</hask>.<br />
<br />
The function has a type signature: <br />
<br />
<haskell><br />
foreign import primitive unsafeJS :: String -> a<br />
</haskell><br />
<br />
This means that it takes a string; the type of the return value does not matter, because the function itself is never executed. Its applications are detected by the Yhc Core to Javascript conversion program at the time of Javascript generation. <br />
<br />
The unsafeJS function should be called with a string literal: neither an explicitly constructed (with (:)) list of characters nor a concatenation of two or more strings will work. The converter will report an error in this situation. <br />
<br />
A valid example of using unsafeJS is shown below: <br />
<br />
<haskell><br />
global_YHC'_Primitive'_primIntSignum :: Int -> Int<br />
<br />
global_YHC'_Primitive'_primIntSignum a = unsafeJS<br />
"var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;"<br />
</haskell><br />
<br />
This is a Javascript overlay (in the sense that it overlays the default Prelude definition of the <hask>signum</hask> function) of a function that returns the sign of an <hask>Int</hask> value. <br />
<br />
The string literal <hask>unsafeJS</hask> is applied to is the Javascript code to be wrapped. <br />
<br />
Below is the Javascript representation of this function found in generated code. <br />
<br />
<pre><br />
strIdx["F_hy"] = "YHC.Primitive.primIntSignum";<br />
...<br />
var F_hy=new HSFun("F_hy", 1, function(a){<br />
var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;});<br />
</pre><br />
<br />
==== Typesafe wrappers ====<br />
<br />
These functions add type safety on top of the unsafe interface to Javascript. Sometimes they are defined within the same module as the unsafe interfaces themselves, thus avoiding the exposure of unsafe interfaces to programmers.<br />
<br />
An example of a handwritten wrapper is a function to create a new <hask>JSRef</hask> (a mechanism similar to <hask>IORef</hask>, but specific to Javascript).<br />
<br />
<haskell><br />
data JSRef a<br />
<br />
newJSRef :: a -> CPS b (JSRef a)<br />
<br />
newJSRef a = toCPE (newJSRef' a)<br />
newJSRef' a = unsafeJS "return {_val:a};"<br />
</haskell><br />
<br />
Technically, a <hask>JSRef</hask> is a Javascript object with a property named ''_val'' that holds a persistent reference to some value. On the unsafe side, invoking a constructor for such an object would be sufficient. It is, however, desirable that:<br />
<br />
* calls to functions creating such persistent references are properly sequenced with calls to functions using these references, and<br />
<br />
* the types of the values referred to are known to the Haskell compiler.<br />
<br />
The unsafe part is implemented by the function <hask>newJSRef'</hask>, which merely calls <hask>unsafeJS</hask> with the proper Javascript constructor. The wrapper <hask>newJSRef</hask> turns the unsafe function into a CPS-style function, and is given a proper type signature, so the compiler is better informed.<br />
<br />
In some cases, such typesafe wrappers may be generated automatically, using some external interface specifications provided by third parties for their APIs.<br />
<br />
As an example of such API, the W3C DOM interface may be taken. For instance, this piece of OMG IDL:<br />
<br />
<pre><br />
interface Text : CharacterData {<br />
Text splitText(in unsigned long offset)<br />
raises(DOMException);<br />
};<br />
</pre><br />
<br />
is converted into:<br />
<br />
<haskell><br />
data TText = TText<br />
<br />
...<br />
<br />
instance CText TText<br />
<br />
instance CCharacterData TText<br />
<br />
instance CNode TText<br />
<br />
...<br />
<br />
splitText :: (CText this, CText zz) => this -> Int -> CPS c zz<br />
splitText a b = toCPE (splitText' a b)<br />
splitText' a b<br />
= unsafeJS "return((exprEval(a)).splitText(exprEval(b)));"<br />
</haskell><br />
<br />
again giving the Haskell compiler better control over the types of this function's (initially type-agnostic) arguments.<br />
<br />
=== Usage of Continuation Passing Style ===<br />
<br />
Initially, an attempt was made to build a monadic framework. The <hask>JS</hask> monad was designed to play the same role as the <hask>IO</hask> monad plays in "regular" Haskell programming. There were, however, arguments in favor of using [http://haskell.org/haskellwiki/Continuation Continuation Passing Style] (CPS):<br />
<br />
* CPS involves less overhead: each expression passes its continuation itself, instead of going through <hask>bind</hask>, which takes the expression and invokes the continuation.<br />
<br />
* CPS results in Javascript patterns that are easy to detect and optimize for (this is one of the future plans).<br />
<br />
* The [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets] GUI library internals are written in CPS, so taking CPS as the general approach to programming is believed to make the adoption of Fudgets easier.<br />
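<br />
For intuition, here is what the CPS style looks like in plain Haskell (a minimal sketch of the idea; this is not the backend's actual <hask>CPS</hask> type):<br />
<br />
<haskell><br />
-- A computation producing an `a` takes its continuation as an argument.
type CPS r a = (a -> r) -> r

ret :: a -> CPS r a
ret x k = k x

-- Sequencing without a monad: the first computation runs, and its
-- result is fed to the function producing the second.
andThen :: CPS r a -> (a -> CPS r b) -> CPS r b
andThen m f k = m (\a -> f a k)

addThenDouble :: Int -> CPS r Int
addThenDouble n = ret (n + 1) `andThen` (\m -> ret (m * 2))
</haskell><br />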
<br />
=== Integration with DOM ===<br />
<br />
The [http://www.w3.org Web Consortium] provides [http://www.omg.org/gettingstarted/omg_idl.htm OMG IDL] files describing the API to use with the [http://www.w3.org/DOM/ Document Object Model] (DOM). A utility was designed, based on [http://www.haskell.org/hdirect/ HaskellDirect], to parse these files and convert them to a set of Haskell modules. The way interface inheritance is reflected differs from the original HaskellDirect approach: in HaskellDirect this was achieved by declaring "nested" algebraic data types, while the Javascript backend utility takes advantage of Haskell type classes, representing DOM types with phantom types and declaring them instances of the appropriate class(es).<br />
<br />
=== Unicode support ===<br />
<br />
Despite the fact that all modern Web browsers support Unicode, this is not the case with Javascript: no access to Unicode characters' properties is provided. At the same time, it is impossible for a Haskell application running in a browser to do without such information. The approach used is the same as in [http://www.haskell.org/hugs Hugs] and [http://www.haskell.org/ghc GHC]: the Unicode character database file from the [http://www.unicode.org Unicode Consortium] was converted into a set of Javascript arrays, each array entry representing a range of character code values, or a case conversion rule for a range (for this implementation, Unicode support was limited to character category and simple case conversions). First, a range is found by character code using binary search; then the character category and case conversion distances (values to add to a character code to convert between upper and lower cases) are retrieved from the range entry. The whole set of arrays adds about 70 kilobytes to the web page size, if embedded inside a &lt;script&gt; tag.<br />
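<br />
The lookup can be sketched in Haskell as follows (a hypothetical table fragment and names of our own; the real tables are generated from the Unicode database and live in Javascript):<br />
<br />
<haskell><br />
import Data.Array

-- Each entry: (low code, high code, category, case-conversion distance).
-- A tiny hypothetical fragment of the generated table, sorted by code.
ranges :: Array Int (Int, Int, String, Int)
ranges = listArray (0, 2)
  [ (0x41, 0x5A, "Lu",  32)   -- 'A'..'Z': add 32 to get lower case
  , (0x61, 0x7A, "Ll", -32)   -- 'a'..'z': add -32 to get upper case
  , (0xC0, 0xD6, "Lu",  32)
  ]

-- Binary search for the range containing a given code point.
findRange :: Int -> Maybe (Int, Int, String, Int)
findRange c = go lo hi
  where
    (lo, hi) = bounds ranges
    go l h
      | l > h     = Nothing
      | c < a     = go l (m - 1)
      | c > b     = go (m + 1) h
      | otherwise = Just r
      where
        m = (l + h) `div` 2
        r@(a, b, _, _) = ranges ! m
</haskell><br />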
<br />
Using the Core overlay technique, Haskell character functions (like <hask>toUpper</hask>, <hask>isAlpha</hask>, etc.) were hooked up to the Javascript implementations supporting Unicode. This did not result in considerable slowdown; rather, some browsers even showed a minor speedup in heavy functions like <hask>read :: String -> Int</hask>.<br />
<br />
=== Examples of code generation ===<br />
<br />
The examples below show the conversion of some real-life functions from Haskell via Core to Javascript. It is worth mentioning that the Javascript code generator changes over time as the Javascript backend evolves, so these examples describe the situation at the time of writing.<br />
<br />
This function:<br />
<br />
<haskell><br />
fromRoman = foldr fromNumeral 0 . maxmunch . map toUpper<br />
</haskell><br />
<br />
converted to Yhc Core:<br />
<br />
<haskell><br />
Roman.fromRoman =<br />
Prelude._LAMBDA27191<br />
(Prelude._LAMBDA27191<br />
(Prelude.map Data.Char.toUpper)<br />
(Roman.maxmunch Prelude.Prelude.Num.Prelude.Int))<br />
(Prelude.foldr<br />
(Roman.fromNumeral<br />
Prelude.Prelude.Num.Prelude.Int<br />
Prelude.Prelude.Ord.Prelude.Int)<br />
0)<br />
<br />
-- The LAMBDA is similar to the composition function (.), only with<br />
-- inverted order of application: _LAMBDA27191 f g x = g (f x)<br />
<br />
Prelude._LAMBDA27191 v22167 v22166 v2007 = v22166 (v22167 v2007)<br />
</haskell><br />
<br />
converted to Javascript:<br />
<br />
<pre><br />
<br />
/* fromRoman, code was formatted manually */<br />
<br />
var F_g8=new HSFun("F_g8", 0, function(){<br />
return (F_e9)._ap([(F_e9)._ap([new HSFun("F_gz", 0, <br />
function(){<br />
return (F_gz)._ap([F_Z]);<br />
}), new HSFun("F_g9", 0, <br />
function(){<br />
return (F_g9)._ap([F_dC]);<br />
})]), new HSFun("F_gp", 0, <br />
function(){<br />
return (F_gp)._ap([new HSFun("F_g7", 0, <br />
function(){<br />
return (F_g7)._ap([F_dC, F_d1]);<br />
}), 0]);<br />
})]);<br />
});<br />
<br />
/* _LAMBDA27191 */<br />
<br />
var F_e9=new HSFun("F_e9", 3, function(_b3, _b2, _bO){return (_b2)._ap([(_b3)._ap([_bO])]);});<br />
</pre><br />
<br />
During the conversion to Javascript, all identifiers found in Yhc Core are renamed to much shorter ones consisting only of alphanumeric characters, and thus guaranteed to be valid in Javascript (identifiers in Yhc Core are often very long, or contain special characters, etc.)<br />
<br />
While it is really hard to understand anything from the Javascript for the <hask>fromRoman</hask> function (other than that the Javascript backend already makes a good obfuscator), something may be seen in the Javascript for the composition function: it builds an application of its first argument to the third, then the application of the second to the previous application, and returns the latter.<br />
<br />
Another example of a function whose implementation was replaced via the Overlay technique is the <hask>isSpace</hask> function:<br />
<br />
<haskell><br />
global_Data'_Char'_isSpace = f . ord<br />
where f a = unsafeJS "return uIsSpace(exprEval(a));"<br />
</haskell><br />
<br />
<haskell><br />
Data.Char.isSpace =<br />
Prelude._LAMBDA27191<br />
Data._CharNumeric.ord<br />
StdOverlay.StdOverlay.Prelude.287.f<br />
</haskell><br />
<br />
<pre><br />
var F_W=new HSFun("F_W", 0, function(){return (F_e9)._ap([F_bh, F_hk]);});<br />
</pre><br />
<br />
In the Haskell code, the <hask>global_Data'_Char'_isSpace</hask> identifier tells the Core Overlay engine that the function with the qualified name <hask>Data.Char.isSpace</hask> is to be replaced with a new implementation. In the Yhc Core, the previously reviewed reversed composition function can be seen: it composes the <hask>ord</hask> function with an inner function that actually invokes the Javascript function which, in turn, performs the Unicode properties lookup for a given character code.<br />
<br />
=== Browser compatibility ===<br />
<br />
Compatibility has been observed with major browsers such as Netscape/Mozilla/Firefox, Microsoft Internet Explorer, and Opera. Compatibility with Safari has not yet been achieved.<br />
<br />
=== Future plan: Fudgets ===<br />
<br />
It is planned to port some portion of [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets], so that it becomes possible to write Web applications using this library. Several experiments showed that the Stream Processors (SP) and some parts of the Fudget Kernel layers work within a Javascript application. More problems are expected when porting the top-level widgets, due to the differences in many concepts between a Web browser and the X Window System, for which the Fudgets library was originally developed.<br />
<br />
== Wacky features ==<br />
<br />
Yhc is going in many interesting directions. Some of these directions are likely to become very important in the future, some are likely to fade away. Yhc is a genuine research bed for brand new ideas.<br />
<br />
Author: All<br />
<br />
When you don't spend all your time on wacky type systems, you have a lot more time left to work on wacky other stuff. Include the Java interpreter, .NET back end, Javascript back end, Python interpreter, Hat debugging, yhi-stack, whole program optimisation. Lots of these things are breeding grounds for various useful technologies, and most are marching towards genuine usefulness.<br />
<br />
== The Future ==<br />
<br />
The original structure of nhc was as one big set of modules – some were broadly divided into type checking/parsing etc, but the overall structure and grouping was weaker than in other compilers – particularly GHC which has a very well defined structure. One of our first actions was to split up the code into hierarchical modules, introducing Type.*, Parse.* etc to better divide the components.<br />
<br />
Some of the sections are shared – for example both hpc and Hat share the nhc Parse modules, and could equally share the Yhc parse modules. This direction is obviously attractive – by opening up more parts of the compiler for reuse in other projects we gain in many ways:<br />
<br />
* The authors of other projects benefit from having libraries to build upon, not needlessly replicating the functionality already present in Yhc.<br />
* We benefit from having more people focused on, and dependent on, our code, resulting in a bigger community.<br />
* We get additional bug reports.<br />
* By structuring code as a library, “hacks” that before might have been tolerated quickly take on a whole new aura of ugliness.<br />
* By isolating each component, we minimize the amount of whole compiler knowledge required.<br />
<br />
This direction attracts us, and we see this as the future direction of our compiler.<br />
<br />
Interestingly, at almost the same time, the GHC developers introduced a GHC API. We do not wish to replicate the GHC API; in particular, their API is more complex than ours. We also wish to design an API first, and then construct some future version of Yhc around the API we develop – rather than exporting an API from the existing compiler.<br />
<br />
For the next major Yhc version we hope to rewrite the parser, type checker, renamer and desugarer – leaving only the new ByteCode and Core aspects intact. This is clearly a lot of work, and we view it as a very long-term plan. For the moment we hope to push the Core and ByteCode features of Yhc into new areas.<br />
<br />
In the meantime, things like library compatibility and Haskell' loom closer on the horizon. Our progress towards these goals relies on the help of volunteers.<br />
<br />
== Acknowledgements ==<br />
<br />
Thanks to everyone who has submitted a patch, run a buildbot, reported bugs or done anything else to benefit the Yhc project. We've put together a list of most of the people involved (if we've missed you, please shout, and we'll add your name in future versions of this document!)<br />
<br />
Andrew Wilkinson, Bernie Pope, Bob Davie, Brian Alliet, Christopher Lane Hinson, Dimitry Golubovsky, Gabor Greif, Goetz Isenmann, Isaac Dupree, Kartik Vaddadi, Krasimir Angelov, Malcolm Wallace, Michal Palka, Mike Dodds, Neil Mitchell, Robert Dockins, Samuel Bronson, Stefan O'Rear, Thorkil Naur, Tom Shackell</div>
<br />
One of the first features we added to Core was whole program linking – any Haskell program, regardless of the number of modules, can be collapsed into a single Yhc.Core module. While this breaks separate compilation, it allows many types of analysis and transformation to be performed in a simpler manner. Naturally, if a technique turns out to be successful, breaking the dependence on whole program compilation is a worthy goal – but this approach allows developers to pay that cost only when it is needed.<br />
<br />
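As a taste of the library in use, the sketch below loads a linked Core file, removes recursive lets, and counts the top-level function definitions. <tt>loadCore</tt> and <tt>removeRecursiveLet</tt> are the library functions used later in this article; <tt>coreFuncs</tt> is assumed here to be the accessor for a program's function definitions, so treat this as a sketch rather than a definitive reference.<br />
<br />
<haskell><br />
module CoreStats where<br />
<br />
import Yhc.Core<br />
import System<br />
<br />
-- Sketch: coreFuncs is assumed to return the list of top-level<br />
-- function definitions of a Core program.<br />
main :: IO ()<br />
main = do [file] <- getArgs<br />
          core <- loadCore file<br />
          let core' = removeRecursiveLet core<br />
          putStrLn (show (length (coreFuncs core')) ++ " functions")<br />
</haskell><br />
<br />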
=== A Small Example ===<br />
<br />
To give a flavour of what Core looks like, it is easiest to start with a small program:<br />
<br />
<haskell><br />
head2 (x:xs) = x<br />
<br />
map2 f [] = []<br />
map2 f (x:xs) = f x : map2 f xs<br />
<br />
test x = map2 head2 x<br />
</haskell><br />
<br />
Compiling this with <tt>yhc -showcore Sample.hs</tt> generates:<br />
<br />
<haskell><br />
Sample.head2 v220 =<br />
case v220 of<br />
(:) v221 v222 -> v221<br />
_ -> Prelude.error Sample._LAMBDA228<br />
<br />
Sample._LAMBDA228 =<br />
"Sample: Pattern match failure in function at 9:1-9:15."<br />
<br />
Sample.map2 v223 v224 =<br />
case v224 of<br />
[] -> []<br />
(:) v225 v226 -> (:) (v223 v225) (Sample.map2 v223 v226)<br />
<br />
Sample.test v227 = Sample.map2 Sample.head2 v227<br />
</haskell><br />
<br />
The generated Core looks very much like Haskell, and can be treated as such. It is, however, the simplest form of Haskell, with many restrictions:<br />
<br />
* Case statements only examine their outermost constructor<br />
* No type classes<br />
* No where<br />
* No nested functions (only top level)<br />
* All names are fully qualified<br />
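<br />
To illustrate these restrictions, here is a hand-written sketch (not actual yhc output, which uses numbered variables) of how a definition with a nested pattern and a class method might look after translation – the nested pattern becomes two one-level cases, and <hask>(+)</hask> receives its <hask>Num</hask> dictionary as an explicit first argument, here assumed to default to <hask>Int</hask>:<br />
<br />
<haskell><br />
-- Source Haskell:<br />
firstTwo (x:y:_) = x + y<br />
<br />
-- Hand-written sketch of the corresponding Core:<br />
Sample.firstTwo v1 =<br />
    case v1 of<br />
        (:) v2 v3 -><br />
            case v3 of<br />
                (:) v4 v5 -> Prelude.+ Prelude.Prelude.Num.Prelude.Int v2 v4<br />
                _ -> Prelude.error Sample._LAMBDA1<br />
        _ -> Prelude.error Sample._LAMBDA1<br />
</haskell><br />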
<br />
=== Yhc.Core.Overlay ===<br />
<br />
Describe the overlay here<br />
<br />
=== Semantics of Yhc Core ===<br />
<br />
In this section an evaluator for Yhc Core programs is presented in the<br />
form of a literate Haskell program. The aim is to define the meaning<br />
of Core programs while demonstrating a full, albeit simple,<br />
application of the <tt>Yhc.Core</tt> library.<br />
<br />
<haskell><br />
> module Eval where<br />
<br />
> import Yhc.Core<br />
</haskell><br />
<br />
Our evaluator is based around the function <tt>whnf</tt> that takes a<br />
core program (of type <tt>Core</tt>) along with a core expression (of<br />
type <tt>CoreExpr</tt>) and reduces that expression until it has the<br />
form:<br />
<br />
<ul><br />
<br />
<li>a data constructor with unevaluated arguments, or</li><br />
<br />
<li>an unapplied lambda expression.</li><br />
<br />
</ul><br />
<br />
In general, data values in Haskell are tree-shaped. The function<br />
<tt>whnf</tt> is often said to "reduce an expression to head normal<br />
form" because it reveals the head (or root) of a value's tree and no<br />
more. Strictly speaking, when the result of reduction could be a<br />
functional value (i.e. a lambda expression) whose body is left<br />
unevaluated, the result is said to be in "weak head normal form" --<br />
this explains the strange acronym WHNF!<br />
<br />
The type of <tt>whnf</tt> is:<br />
<br />
<haskell><br />
> whnf :: Core -> CoreExpr -> CoreExpr<br />
</haskell><br />
<br />
Defining it is a process of taking each kind of core expression in<br />
turn, and asking "how do I reduce this to weak head normal form?" As<br />
usual, it makes sense to define the base cases first, namely<br />
constructors and lambda expressions:<br />
<br />
<haskell><br />
> whnf p (CoreCon c) = CoreCon c<br />
> whnf p (CoreApp (CoreCon c) as) = CoreApp (CoreCon c) as<br />
> whnf p (CoreLam (v:vs) e) = CoreLam (v:vs) e<br />
</haskell><br />
<br />
Notice that there are two forms in which a constructor may appear:<br />
alone with no arguments, or as a function application to a list of<br />
arguments. Also, because of the way our evaluator is designed, we may<br />
encounter lambda expressions with no arguments. Hence, only lambdas<br />
with arguments represent a base-case. For the no-arguments case, we<br />
just shift the focus of reduction to the body:<br />
<br />
<haskell><br />
> whnf p (CoreLam [] e) = whnf p e<br />
</haskell><br />
<br />
Currently, lambda expressions do not occur in the Core output of Yhc.<br />
They are part of the Core syntax because they are useful conceptually,<br />
particularly when manipulating (and evaluating) higher-order functions.<br />
<br />
Moving on to case-expressions, we first reduce the case subject, then<br />
match it against each pattern in turn, and finally reduce the body of<br />
the chosen alternative. In Core, we can safely assume that patterns<br />
are at most one constructor deep, so reduction of the subject to WHNF<br />
is sufficient.<br />
<br />
<haskell><br />
> whnf p (CoreCase e as) = whnf p (match (whnf p e) as)<br />
</haskell><br />
<br />
We leave the definition of <tt>match</tt> until later.<br />
<br />
To reduce a let-expression, we apply the let-bindings as a<br />
substitution to the body of the let. This is easily done using the<br />
Core function <tt>replaceFreeVars</tt>. Of course, let-expressions in<br />
Core are recursive, but before evaluating a core program we transform<br />
all recursive-lets away (see below). Notice that we are in no way<br />
trying to preserve the sharing implied by let-expressions, although we<br />
have done so in more complex variants of the evaluator.<br />
Strictly-speaking, Haskell evaluators are not obliged to implement<br />
sharing -- this is why it is more correct to term Haskell non-strict<br />
than lazy.<br />
<br />
<haskell><br />
> whnf p (CoreLet bs e) = whnf p (replaceFreeVars bs e)<br />
</haskell><br />
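<br />
To see what this substitution gives up, consider a binding used twice – a hand-evaluated illustration using the constructors above, not part of the evaluator itself:<br />
<br />
<haskell><br />
-- In<br />
--   CoreLet [("x", e)] (CoreApp (CoreFun "f") [CoreVar "x", CoreVar "x"])<br />
-- replaceFreeVars substitutes e for both occurrences, giving<br />
--   CoreApp (CoreFun "f") [e, e]<br />
-- so e may now be reduced twice.  The answer is unchanged – only the<br />
-- amount of work differs – which is why this is still a correct<br />
-- non-strict semantics.<br />
</haskell><br />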
<br />
When we encounter an unapplied function we call <tt>coreFunc</tt> to<br />
look up its definition (i.e. its arguments and its right-hand-side), and<br />
construct a corresponding lambda expression:<br />
<br />
<haskell><br />
> whnf p (CoreFun f) = whnf p (CoreLam bs body)<br />
> where<br />
> CoreFunc _ bs body = coreFunc p f<br />
</haskell><br />
<br />
This means that when reducing function applications, we know that<br />
reduction of the function part will yield a lambda:<br />
<br />
<haskell><br />
> whnf p (CoreApp f []) = whnf p f<br />
> whnf p (CoreApp f (a:as)) =<br />
> case whnf p f of<br />
> CoreLam [] e -> whnf p (CoreApp e (a:as))<br />
> CoreLam (b:bs) e -> whnf p (CoreLet [(b,a)]<br />
> (CoreApp (CoreLam bs e) as))<br />
</haskell><br />
<br />
Core programs may contain positional information, which we simply ignore:<br />
<br />
<haskell><br />
> whnf p (CorePos _ e) = whnf p e<br />
</haskell><br />
<br />
And the final, fall-through case covers primitive literals and<br />
functions, which we are not concerned with here:<br />
<br />
<haskell><br />
> whnf p e = e<br />
</haskell><br />
<br />
Now, for the sake of completeness, we return to our <tt>match</tt><br />
function. It takes the evaluated case subject and tries to match it<br />
against each case-alternative (a pattern-expression pair) in order of<br />
appearance. We use the "failure as a list of successes" technique to<br />
model the fact that matching may fail.<br />
<br />
<haskell><br />
> type Alt = (CoreExpr, CoreExpr)<br />
<br />
> match :: CoreExpr -> [Alt] -> CoreExpr<br />
> match e as = head (concatMap (try e) as)<br />
</haskell><br />
<br />
Before defining <tt>try</tt>, it is useful to have a function that<br />
turns the two possible constructor forms into a single normal form.<br />
This greatly reduces the number of cases we need to consider in the<br />
definition of <tt>try</tt>.<br />
<br />
<haskell><br />
> norm :: CoreExpr -> CoreExpr<br />
> norm (CoreCon c) = CoreApp (CoreCon c) []<br />
> norm x = x<br />
</haskell><br />
<br />
Hopefully by now the definition of <tt>try</tt> will be<br />
self-explanatory:<br />
<br />
<haskell><br />
> try :: CoreExpr -> Alt -> [CoreExpr]<br />
> try e (pat, rhs) =<br />
> case (norm pat, norm e) of<br />
> (CoreApp (CoreCon f) as, CoreApp (CoreCon g) bs)<br />
> | f == g -> [CoreLet (zip (vars as) bs) rhs]<br />
> (CoreVar v, e) -> [CoreLet [(v, e)] rhs]<br />
> _ -> []<br />
> where<br />
> vars = map fromCoreVar<br />
</haskell><br />
<br />
This completes the definition of <tt>whnf</tt>. However, we would<br />
like to be able to fully evaluate expressions -- to what we simply<br />
call "normal form" -- so that the resulting value's tree is computed<br />
in its entirety. Our <tt>nf</tt> function repeatedly applies<br />
<tt>whnf</tt> at progressively deeper nodes in the growing tree:<br />
<br />
<haskell><br />
> nf :: Core -> CoreExpr -> CoreExpr<br />
> nf p e =<br />
> case whnf p e of<br />
> CoreCon c -> CoreCon c<br />
> CoreApp (CoreCon c) es -> CoreApp (CoreCon c) (map (nf p) es)<br />
> e -> e<br />
</haskell><br />
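<br />
As a small worked example, here is a hand-traced reduction of an application of <tt>Sample.head2</tt> from the earlier Core program (<tt>CoreInt</tt> is assumed to be the Core constructor for integer literals):<br />
<br />
<haskell><br />
-- nf p (CoreApp (CoreFun "Sample.head2")<br />
--         [CoreApp (CoreCon "(:)") [CoreInt 1, CoreCon "[]"]])<br />
--<br />
-- CoreFun is expanded to a lambda via coreFunc, the application<br />
-- becomes a let binding the lambda's argument, and the case then<br />
-- matches the (:) alternative, selecting the head:<br />
--<br />
--   ==> CoreInt 1<br />
</haskell><br />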
<br />
All that remains is to turn our evaluator into a program by giving it<br />
a sensible <tt>main</tt> function. The two imports below belong with<br />
the <tt>import Yhc.Core</tt> at the top of the literate file. We first<br />
load the core file using <tt>loadCore</tt> and then apply<br />
<tt>removeRecursiveLet</tt>, as discussed earlier, before evaluating<br />
the expression <tt>CoreFun "main"</tt> to normal form and printing it.<br />
<br />
<haskell><br />
> import System<br />
> import Monad<br />
<br />
> main :: IO ()<br />
> main = liftM head getArgs<br />
> >>= liftM removeRecursiveLet . loadCore<br />
> >>= print . flip nf (CoreFun "main")<br />
</haskell><br />
<br />
In future we hope to use a variant of this evaluator (with sharing) in<br />
a property-based testing framework. This will let us check that<br />
various program analyses and transformations that we have developed<br />
are semantics-preserving. As part of another project, we have<br />
successfully extended the evaluator to support various functional-logic<br />
evaluation strategies, including residuation and narrowing.<br />
<br />
<br />
== Javascript backend ==<br />
<br />
The Javascript backend is a unique feature of Yhc, something which our lightweight approach makes easier.<br />
<br />
Author: Dimitry<br />
<br />
The idea to write a converter from Haskell to Javascript has been floating around for a while ([http://www.haskell.org/haskellwiki/Hajax Hajax]). Many people expressed interest in such a feature, but no practical implementation was visible.<br />
<br />
Initial goals of this subproject were:<br />
<br />
* To develop a conversion program that converts the [http://haskell.org/haskellwiki/Yhc/API/Core Yhc Core] to Javascript, thus making it possible to execute arbitrary Haskell code within a web browser<br />
<br />
* To develop an unsafe interface layer for quick access to Javascript objects with ability to wrap arbitrary Javascript code into a Haskell-callable function<br />
<br />
* To develop a typesafe interface layer on top of the unsafe interface layer for access to the Document Object Model (DOM) available to Javascript executed in a web browser<br />
<br />
* To develop or adopt an existing GUI library or toolkit working on top of the typesafe DOM layer for actual development of client-side Web applications.<br />
<br />
=== General concepts ===<br />
<br />
The Javascript backend converts a linked and optimized Yhc Core file into a piece of Javascript code to be embedded in an XHTML document. The Javascript code generator attempts to translate Core expressions to Javascript expressions one-to-one, with minor optimizations of its own, taking advantage of Javascript's capability to pass functions around as values.<br />
<br />
Three kinds of functions are present in the Javascript backend:<br />
<br />
* Unsafe functions that embed pieces of Javascript directly into the generated code: these functions pay no respect to types of arguments passed, and may force evaluation of their arguments if needed.<br />
<br />
* Typesafe wrappers that provide type signatures for unsafe functions. Such wrappers are either handwritten, or automatically generated from external interface specifications (such as the Document Object Model interface)<br />
<br />
* Regular library functions. These either come unmodified from the standard packages that come with Yhc, or are substituted by the Javascript backend using the Core overlay technique. An example of such a function is the <hask>toUpper</hask> function which is hooked up to the Javascript implementation supporting Unicode (the original library function currently works correctly only for the Latin1 range of characters).<br />
<br />
==== Unsafe interfaces ====<br />
<br />
The core part of the unsafe interface to Javascript (or, in other words, the Javascript FFI) is the pseudo-function <hask>unsafeJS</hask>.<br />
<br />
The function has a type signature: <br />
<br />
<haskell><br />
foreign import primitive unsafeJS :: String -> a<br />
</haskell><br />
<br />
This means that it takes a string. The type of the return value does not matter: the function itself is never executed. Its applications are detected by the Yhc Core to Javascript conversion program at the time of Javascript generation.<br />
<br />
The <hask>unsafeJS</hask> function should be called with a string literal. Neither an explicitly constructed (with (:)) list of characters nor a concatenation of two or more strings will work: the converter will report an error in either situation.<br />
<br />
A valid example of using unsafeJS is shown below: <br />
<br />
<haskell><br />
global_YHC'_Primitive'_primIntSignum :: Int -> Int<br />
<br />
global_YHC'_Primitive'_primIntSignum a = unsafeJS<br />
"var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;"<br />
</haskell><br />
<br />
This is a Javascript overlay (in the sense that it overlays the default Prelude definition of the <hask>signum</hask> function) of a function that returns the sign of an <hask>Int</hask> value.<br />
<br />
The string literal <hask>unsafeJS</hask> is applied to is the Javascript code to be wrapped. <br />
<br />
Below is the Javascript representation of this function found in generated code. <br />
<br />
<pre><br />
strIdx["F_hy"] = "YHC.Primitive.primIntSignum";<br />
...<br />
var F_hy=new HSFun("F_hy", 1, function(a){<br />
var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;});<br />
</pre><br />
<br />
==== Typesafe wrappers ====<br />
<br />
These functions add type safety on top of unsafe interface to Javascript. Sometimes they are defined within the same module as unsafe interfaces themselves, thus avoiding the exposure of unsafe interfaces to programmers.<br />
<br />
An example of a handwritten wrapper is a function to create a new <hask>JSRef</hask> (a mechanism similar to <hask>IORef</hask>, but specific to Javascript).<br />
<br />
<haskell><br />
data JSRef a<br />
<br />
newJSRef :: a -> CPS b (JSRef a)<br />
<br />
newJSRef a = toCPE (newJSRef' a)<br />
newJSRef' a = unsafeJS "return {_val:a};"<br />
</haskell><br />
<br />
Technically, a <hask>JSRef</hask> is a Javascript object with a property named ''_val'' that holds a persistent reference to some value. On the unsafe side, invoking a constructor for such an object would be sufficient. It is however desired that:<br />
<br />
* calls to functions creating such persistent references are properly sequenced with calls to functions using these references, and<br />
<br />
* the types of the values referred to are known to the Haskell compiler.<br />
<br />
The unsafe part is implemented by the function <hask>newJSRef'</hask>, which merely calls <hask>unsafeJS</hask> with the proper Javascript constructor. The wrapper <hask>newJSRef</hask> wraps the unsafe function into a CPS-style function and is given a proper type signature, so the compiler is better informed.<br />
<br />
In some cases, such typesafe wrappers may be generated automatically, using some external interface specifications provided by third parties for their APIs.<br />
<br />
The W3C DOM interface is an example of such an API. For instance, this piece of OMG IDL:<br />
<br />
<pre><br />
interface Text : CharacterData {<br />
Text splitText(in unsigned long offset)<br />
raises(DOMException);<br />
};<br />
</pre><br />
<br />
is converted into:<br />
<br />
<haskell><br />
data TText = TText<br />
<br />
...<br />
<br />
instance CText TText<br />
<br />
instance CCharacterData TText<br />
<br />
instance CNode TText<br />
<br />
...<br />
<br />
splitText :: (CText this, CText zz) => this -> Int -> CPS c zz<br />
splitText a b = toCPE (splitText' a b)<br />
splitText' a b<br />
= unsafeJS "return((exprEval(a)).splitText(exprEval(b)));"<br />
</haskell><br />
<br />
again, giving the Haskell compiler better control over the types of this function's (initially type-agnostic) arguments.<br />
<br />
=== Usage of Continuation Passing Style ===<br />
<br />
Initially it was attempted to build a monadic framework. The <hask>JS</hask> monad was designed to play the same role as the <hask>IO</hask> monad plays in "regular" Haskell programming. There were however arguments in favor of using [http://haskell.org/haskellwiki/Continuation Continuation Passing Style] (CPS):<br />
<br />
* CPS involves less overhead, as each expression passes its result to its continuation itself, instead of relying on a <hask>bind</hask> which takes the expression and invokes the continuation<br />
<br />
* CPS results in Javascript patterns that are easy to detect and optimize for (this is one of the future plans).<br />
<br />
* The [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets] GUI library internals are written in CPS, so taking CPS as general approach to programming is believed to make adoption of Fudgets easier.<br />
<br />
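To illustrate the style itself, here are minimal, self-contained definitions – a model of CPS, not the backend's actual types. A computation producing an <hask>a</hask> takes the rest of the program as an explicit argument:<br />
<br />
<haskell><br />
module CPSSketch where<br />
<br />
-- A minimal model of the style, not the backend's definitions.<br />
type CPS r a = (a -> r) -> r<br />
<br />
-- Each step passes its result directly to the next continuation,<br />
-- with no intervening bind.<br />
addOne :: Int -> CPS r Int<br />
addOne n k = k (n + 1)<br />
<br />
double :: Int -> CPS r Int<br />
double n k = k (n * 2)<br />
<br />
-- Chaining: addOne 3 feeds its result to double, which feeds the<br />
-- final continuation; applying example to id yields 8.<br />
example :: CPS r Int<br />
example k = addOne 3 (\x -> double x k)<br />
</haskell><br />
<br />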
=== Integration with DOM ===<br />
<br />
The [http://www.w3.org Web Consortium] provides [http://www.omg.org/gettingstarted/omg_idl.htm OMG IDL] files to describe the API to use with the [http://www.w3.org/DOM/ Document Object Model] (DOM). A utility was designed, based on [http://www.haskell.org/hdirect/ HaskellDirect], to parse these files and convert them to a set of Haskell modules. The way interface inheritance is reflected differs from the original HaskellDirect approach: in HaskellDirect this was achieved by declaring "nested" algebraic data types, while the Javascript backend utility takes advantage of Haskell typeclasses, representing DOM types with phantom types and declaring them instances of the appropriate class(es).<br />
<br />
=== Unicode support ===<br />
<br />
Despite the fact that all modern Web browsers support Unicode, this is not the case with Javascript: no access to Unicode characters' properties is provided. At the same time, it is impossible for a Haskell application running in a browser to do without such information. The approach used is the same as in [http://www.haskell.org/hugs Hugs] and [http://www.haskell.org/ghc GHC]: the Unicode characters database file from the [http://www.unicode.org Unicode Consortium] was converted into a set of Javascript arrays, each array entry representing a range of character code values, or a case conversion rule for a range (in this implementation, Unicode support is limited to character category and simple case conversions). First, a range is found by character code using binary search; then the character category and case conversion distances (values to add to a character code to convert between upper and lower cases) are retrieved from the range entry. The whole set of arrays adds about 70 kilobytes to the web page size, if embedded inside a &lt;script&gt; tag.<br />
<br />
Using the Core overlay technique, Haskell character functions (like <hask>toUpper</hask>, <hask>isAlpha</hask>, etc.) were hooked up to the Javascript implementations supporting Unicode. This did not result in considerable slowdowns; rather, some browsers even showed a minor speedup in heavy functions like <hask>read :: String -> Int</hask>.<br />
<br />
=== Examples of code generation ===<br />
<br />
The examples below show some real-life functions' conversion from Haskell via Core to Javascript. It is worth mentioning that the Javascript code generator changes over time as the Javascript backend evolves, so these examples describe the situation as of the moment this article was written.<br />
<br />
This function:<br />
<br />
<haskell><br />
fromRoman = foldr fromNumeral 0 . maxmunch . map toUpper<br />
</haskell><br />
<br />
converted to Yhc Core:<br />
<br />
<haskell><br />
Roman.fromRoman =<br />
Prelude._LAMBDA27191<br />
(Prelude._LAMBDA27191<br />
(Prelude.map Data.Char.toUpper)<br />
(Roman.maxmunch Prelude.Prelude.Num.Prelude.Int))<br />
(Prelude.foldr<br />
(Roman.fromNumeral<br />
Prelude.Prelude.Num.Prelude.Int<br />
Prelude.Prelude.Ord.Prelude.Int)<br />
0)<br />
<br />
-- The LAMBDA is similar to the composition function (.), only with<br />
-- inverted order of application: _LAMBDA27191 f g x = g (f x)<br />
<br />
Prelude._LAMBDA27191 v22167 v22166 v2007 = v22166 (v22167 v2007)<br />
</haskell><br />
<br />
converted to Javascript:<br />
<br />
<pre><br />
<br />
/* fromRoman, code was formatted manually */<br />
<br />
var F_g8=new HSFun("F_g8", 0, function(){<br />
return (F_e9)._ap([(F_e9)._ap([new HSFun("F_gz", 0, <br />
function(){<br />
return (F_gz)._ap([F_Z]);<br />
}), new HSFun("F_g9", 0, <br />
function(){<br />
return (F_g9)._ap([F_dC]);<br />
})]), new HSFun("F_gp", 0, <br />
function(){<br />
return (F_gp)._ap([new HSFun("F_g7", 0, <br />
function(){<br />
return (F_g7)._ap([F_dC, F_d1]);<br />
}), 0]);<br />
})]);<br />
});<br />
<br />
/* _LAMBDA27191 */<br />
<br />
var F_e9=new HSFun("F_e9", 3, function(_b3, _b2, _bO){return (_b2)._ap([(_b3)._ap([_bO])]);});<br />
</pre><br />
<br />
During the conversion to Javascript, all identifiers found in Yhc Core are renamed to much shorter ones consisting only of alphanumeric characters, and thus guaranteed valid for Javascript (identifiers in Yhc Core are often very long, contain special characters, etc.)<br />
<br />
While it is really hard to understand anything from the Javascript for the <hask>fromRoman</hask> function (other than that the Javascript backend already makes a good obfuscator), something may be seen in the Javascript for the composition function. It builds an application of its first argument to the third, then the application of the second argument to the previous application, and returns the latter.<br />
<br />
Another example of a function whose implementation was replaced via the Overlay technique is the <hask>isSpace</hask> function:<br />
<br />
<haskell><br />
global_Data'_Char'_isSpace = f . ord<br />
where f a = unsafeJS "return uIsSpace(exprEval(a));"<br />
</haskell><br />
<br />
<haskell><br />
Data.Char.isSpace =<br />
Prelude._LAMBDA27191<br />
Data._CharNumeric.ord<br />
StdOverlay.StdOverlay.Prelude.287.f<br />
</haskell><br />
<br />
<pre><br />
var F_W=new HSFun("F_W", 0, function(){return (F_e9)._ap([F_bh, F_hk]);});<br />
</pre><br />
<br />
In the Haskell code, the <hask>global_Data'_Char'_isSpace</hask> identifier tells the Core Overlay engine that the function with qualified name <hask>Data.Char.isSpace</hask> is to be replaced with a new implementation. In the Yhc Core, the reversed composition function reviewed previously can be seen: it composes the <hask>ord</hask> function with an inner function that actually invokes the Javascript function, which in turn performs the Unicode properties lookup for the given character's numeric code.<br />
<br />
=== Browsers compatibility ===<br />
<br />
Compatibility with the major browsers – Netscape/Mozilla/Firefox, Microsoft Internet Explorer, and Opera – has been observed. Compatibility with Safari has not been achieved so far.<br />
<br />
=== Future plan: Fudgets ===<br />
<br />
It is planned to port some portion of [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets], so it becomes possible to write Web applications using this library. Several experiments showed that the Stream Processors (SP) and some parts of the Fudget Kernel layers work within a Javascript application. More problems are expected with porting the top-level widgets, due to differences in many concepts between a Web browser and the X Window System, for which the Fudgets library was originally developed.<br />
<br />
== Wacky features ==<br />
<br />
Yhc is going in many interesting directions. Some of these directions are likely to become very important in the future, some are likely to fade away. Yhc is a genuine research bed for brand new ideas.<br />
<br />
Author: All<br />
<br />
When you don't spend all the time on wacky type systems, you get a lot more time left to work on wacky other stuff. Include Java interpreter, .NET back end, Javascript back end, Python interpreter, Hat debugging, yhi-stack, whole program optimisation. Lots of these things are breeding grounds for various useful technologies, and most are marching towards genuine usefulness.<br />
<br />
== The Future ==<br />
<br />
The original structure of nhc was as one big set of modules – some were broadly divided into type checking/parsing etc, but the overall structure and grouping was weaker than in other compilers – particularly GHC which has a very well defined structure. One of our first actions was to split up the code into hierarchical modules, introducing Type.*, Parse.* etc to better divide the components.<br />
<br />
Some of the sections are shared – for example both hpc and Hat share the nhc Parse modules, and could equally share the Yhc parse modules. This direction is obviously attractive – by opening up more parts of the compiler for reuse in other projects we gain in many ways:<br />
<br />
* The authors of other projects benefit from having libraries to build upon, not needlessly replicating the functionality already present in Yhc.<br />
* We benefit from having more people focused and dependent on our code, resulting in a bigger community.<br />
* We get additional bug reports.<br />
* By structuring code as a library, “hacks” that before might have been tolerated quickly take on a whole new aura of ugliness.<br />
* By isolating each component, we minimize the amount of whole compiler knowledge required.<br />
<br />
This direction attracts us, and we see this as the future direction of our compiler.<br />
<br />
Interestingly, almost at the same time the GHC developers have introduced a GHC API. We do not wish to replicate the GHC API; in particular, their API is more complex than ours. We also wish to design an API first, and then construct some future version of Yhc around the API we develop – rather than merely exporting an API from the existing compiler.<br />
<br />
For the next major Yhc version we would hope to rewrite the parser, type checker, renamer and desugarer – leaving only the new ByteCode and Core aspects still intact. This is clearly a lot of work, and we view this as a very long term plan. For the moment we hope to push the Core and ByteCode features of Yhc into new areas.<br />
<br />
In the meantime, things like library compatibility and Haskell' loom ever closer on the horizon. Our progress towards these goals relies on the help of volunteers.<br />
<br />
== Acknowledgements ==<br />
<br />
Thanks to everyone who has submitted a patch, become a buildbot, reported bugs or done anything else to benefit the Yhc project. We've put together a list of most of the people (if we've missed you, please shout, and we'll add your name in future versions of this document!)<br />
<br />
Andrew Wilkinson, Bernie Pope, Bob Davie, Brian Alliet, Christopher Lane Hinson, Dimitry Golubovsky, Gabor Greif, Goetz Isenmann, Isaac Dupree, Kartik Vaddadi, Krasimir Angelov, Malcolm Wallace, Michal Palka, Mike Dodds, Neil Mitchell, Robert Dockins, Samuel Bronson, Stefan O'Rear, Thorkil Naur, Tom Shackell</div>
<hr />
<div>Authors: Neil Mitchell, Tom Shackell, Dimitry Golubovsky, Matthew Naylor<br />
<br />
This is a draft of the Yhc TMR article, deadline April 13th. It isn't intended as a wiki article beyond the listed authors (although if you want to fix some spelling, we don't mind!). If you are interested in helping email the Yhc list.<br />
<br />
== The beginning ==<br />
<br />
In the beginning there was the nhc compiler, which had a number of issues. We fixed some of them.<br />
<br />
Author: Tom/Neil/Andrew<br />
<br />
How we started up Yhc, this is the section that would have been in the History of Haskell paper if they had done a Yhc section :)<br />
<br />
Include the transition from CVS -> york Darcs -> haskell.org Darcs<br />
<br />
== Portability concerns ==<br />
<br />
From the beginning portability was a prime concern, while the original nhc was only running on Linux v old.old, and never Windows, Yhc was fully portable by design.<br />
<br />
Author: Tom, Andrew<br />
<br />
Why portability is such a concern, details of our ports system. Include our scons architecture, buildbot system etc. Mention that Yhc runs under Hugs, and indeed some of the developers use Hugs.<br />
<br />
<br />
<br />
== Yhc.Core ==<br />
<br />
Yhc.Core is one of our most successful libraries to date. The original nhc compiler used an intermediate core language called PosLambda – a basic lambda calculus extended with positional information. Although the language was very similar to a subset of Haskell, it wasn’t the same. In particular there were unusual constructs such as FatBar, and all names were stored in a symbol table.<br />
<br />
Rather than attempt to change the PosLambda language, a task that would have been decidedly painful, we chose instead to write a Core language from scratch, being inspired by both PosLambda and GHC Core. We then wrote a converter from PosLambda to Yhc.Core.<br />
<br />
In particular our idealised Core language differs from GHC Core in a number of ways:<br />
<br />
* Untyped – originally this was a restriction of PosLambda, but now some of us see this as a feature (others still see this as a bug).<br />
* Syntactically a subset of Haskell.<br />
* Minimal name mangling.<br />
<br />
All these features combine to create a Core language which resembles Haskell much more than Core languages in other Haskell compilers. This very low barrier to entry means that with merely the Haddock description of our abstract syntax tree, most Haskell programmers can feel at home with relatively little effort.<br />
<br />
Having reduced the barrier to entry for our Core language, the number of projects depending on it has increased. Consistently we have attempted to add facilities to the libraries for common tasks, rather than duplicating them separately in projects. As a result the Core library now has facilities for dealing with primitives, removing recursive lets, reachability analysis, strictness analysis, simplification, inlining and more.<br />
<br />
One of the first features we added to Core was whole program linking – any Haskell program, regardless of the number of modules, can be collapsed into one single Yhc.Core module. While this breaks separate compilation, it allows many types of analysis and transformation to be performed in a simpler manner. Naturally, if a technique turns out to be successful, breaking the dependence on whole program compilation is a worthy goal – but this approach allows developers to pay that cost only when it is needed.<br />
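A sketch of how these facilities combine in client code – the function names are meant to be indicative of the <tt>Yhc.Core</tt> API rather than a precise reference to it:<br />
<br />
<haskell><br />
import Yhc.Core<br />
<br />
-- Load an already-linked Core file, keep only the functions<br />
-- reachable from main, simplify, and save the result.<br />
shrink :: FilePath -> IO ()<br />
shrink file = do<br />
    core <- loadCore file<br />
    saveCore "Shrunk.ycr" (coreSimplify (coreReachable ["main"] core))<br />
</haskell><br />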
<br />
=== A Small Example ===<br />
<br />
To give a flavour of what Core looks like, it is easiest to start with a small program:<br />
<br />
<haskell><br />
head2 (x:xs) = x<br />
<br />
map2 f [] = []<br />
map2 f (x:xs) = f x : map2 f xs<br />
<br />
test x = map2 head2 x<br />
</haskell><br />
<br />
Compiling this with <tt>yhc -showcore Sample.hs</tt> generates:<br />
<br />
<haskell><br />
Sample.head2 v220 =<br />
case v220 of<br />
(:) v221 v222 -> v221<br />
_ -> Prelude.error Sample._LAMBDA228<br />
<br />
Sample._LAMBDA228 =<br />
"Sample: Pattern match failure in function at 9:1-9:15."<br />
<br />
Sample.map2 v223 v224 =<br />
case v224 of<br />
[] -> []<br />
(:) v225 v226 -> (:) (v223 v225) (Sample.map2 v223 v226)<br />
<br />
Sample.test v227 = Sample.map2 Sample.head2 v227<br />
</haskell><br />
<br />
The generated Core looks very much like Haskell, and can be treated as such. It is essentially the simplest form of Haskell, with many restrictions:<br />
<br />
* Case statements only examine their outermost constructor<br />
* No type classes<br />
* No where<br />
* No nested functions (only top level)<br />
* All names are fully qualified<br />
<br />
=== Yhc.Core.Overlay ===<br />
<br />
The overlay technique allows individual Core functions to be replaced at link time: a separate "overlay" module is compiled to Core, and any of its bindings whose name encodes a fully qualified function name (such as <hask>global_Data'_Char'_isSpace</hask> for <hask>Data.Char.isSpace</hask>) is substituted for the original definition. The Javascript backend makes heavy use of this facility to swap pure Haskell library functions for Javascript-aware replacements.<br />
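A flavour of the overlay technique, with a toy replacement body (the real overlay for this function, shown in the Javascript backend section, defers to a Unicode-aware Javascript routine):<br />
<br />
<haskell><br />
-- The binding's name encodes the fully qualified name of the<br />
-- function it replaces; at Core link time this body is<br />
-- substituted for Data.Char.isSpace.<br />
global_Data'_Char'_isSpace :: Char -> Bool<br />
global_Data'_Char'_isSpace c = c `elem` " \t\n\f\r\v"<br />
</haskell><br />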
<br />
=== Evaluator: Semantics of Yhc Core ===<br />
<br />
In this section an evaluator for Yhc Core programs is presented in the<br />
form of a literate Haskell program. The aim is to define the meaning<br />
of Core programs while demonstrating a full, albeit simple,<br />
application of the <tt>Yhc.Core</tt> library.<br />
<br />
<haskell><br />
> module Eval where<br />
<br />
> import Yhc.Core<br />
</haskell><br />
<br />
Our evaluator is based around the function <tt>whnf</tt> that takes a<br />
core program (of type <tt>Core</tt>) along with a core expression (of<br />
type <tt>CoreExpr</tt>) and reduces that expression until it has the<br />
form:<br />
<br />
<ul><br />
<br />
<li>a data constructor with unevaluated arguments, or</li><br />
<br />
<li>an unapplied lambda expression.</li><br />
<br />
</ul><br />
<br />
In general, data values in Haskell are tree-shaped. The function<br />
<tt>whnf</tt> is often said to "reduce an expression to head normal<br />
form" because it reveals the head (or root) of a value's tree and no<br />
more. Strictly speaking, when the result of reduction could be a<br />
functional value (i.e. a lambda expression), and the body of that<br />
lambda is left unevaluated, then the result is said to be in "weak<br />
head normal form" -- this explains the strange acronym WHNF!<br />
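For instance, the following value is already in weak head normal form, even though demanding its tail would fail: reduction stops as soon as the (:) constructor is revealed.<br />
<br />
<haskell><br />
whnfExample :: [Int]<br />
whnfExample = 1 : error "the tail is never demanded"<br />
</haskell><br />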
<br />
The type of <tt>whnf</tt> is:<br />
<br />
<haskell><br />
> whnf :: Core -> CoreExpr -> CoreExpr<br />
</haskell><br />
<br />
Defining it is a process of taking each kind of core expression in<br />
turn, and asking "how do I reduce this to weak head normal form?" As<br />
usual, it makes sense to define the base cases first, namely<br />
constructors and lambda expressions:<br />
<br />
<haskell><br />
> whnf p (CoreCon c) = CoreCon c<br />
> whnf p (CoreApp (CoreCon c) as) = CoreApp (CoreCon c) as<br />
> whnf p (CoreLam (v:vs) e) = CoreLam (v:vs) e<br />
</haskell><br />
<br />
Notice that there are two forms in which a constructor may appear:<br />
alone with no arguments, or as function application to a list of<br />
arguments. Also, because of the way our evaluator is designed, we may<br />
encounter lambda expressions with no arguments. Hence, only lambdas<br />
with arguments represent a base-case. For the no-arguments case, we<br />
just shift the focus of reduction to the body:<br />
<br />
<haskell><br />
> whnf p (CoreLam [] e) = whnf p e<br />
</haskell><br />
<br />
Currently, lambda expressions do not occur in the Core output of Yhc.<br />
They are part of the Core syntax because they are useful conceptually,<br />
particularly when manipulating (and evaluating) higher-order functions.<br />
<br />
Moving on to case-expressions, we first reduce the case subject, then<br />
match it against each pattern in turn, and finally reduce the body of<br />
the chosen alternative. In Core, we can safely assume that patterns<br />
are at most one constructor deep, so reduction of the subject to WHNF<br />
is sufficient.<br />
<br />
<haskell><br />
> whnf p (CoreCase e as) = whnf p (match (whnf p e) as)<br />
</haskell><br />
<br />
We leave the definition of <tt>match</tt> until later.<br />
<br />
To reduce a let-expression, we apply the let-bindings as a<br />
substitution to the body of the let. This is easily done using the<br />
Core function <tt>replaceFreeVars</tt>. Of course, let-expressions in<br />
Core are recursive, but before evaluating a core program we transform<br />
all recursive-lets away (see below). Notice that we are in no way<br />
trying to preserve the sharing implied by let-expressions, although we<br />
have done so in more complex variants of the evaluator.<br />
Strictly-speaking, Haskell evaluators are not obliged to implement<br />
sharing -- this is why it is more correct to term Haskell non-strict<br />
than lazy.<br />
<br />
<haskell><br />
> whnf p (CoreLet bs e) = whnf p (replaceFreeVars bs e)<br />
</haskell><br />
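Substituting bindings without sharing can duplicate work; a small example of what this equation does:<br />
<br />
<haskell><br />
-- Before reduction:          after substituting the binding:<br />
--   let x = f y in x + x  ~>   f y + f y<br />
-- The call f y may now be evaluated twice; a sharing<br />
-- implementation would evaluate it once.<br />
</haskell><br />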
<br />
When we encounter an unapplied function we call <tt>coreFunc</tt> to<br />
look up its definition (i.e. its arguments and its right-hand side), and<br />
construct a corresponding lambda expression:<br />
<br />
<haskell><br />
> whnf p (CoreFun f) = whnf p (CoreLam bs body)<br />
> where<br />
> CoreFunc _ bs body = coreFunc p f<br />
</haskell><br />
<br />
This means that when reducing function applications, we know that<br />
reduction of the function part will yield a lambda:<br />
<br />
<haskell><br />
> whnf p (CoreApp f []) = whnf p f<br />
> whnf p (CoreApp f (a:as)) =<br />
> case whnf p f of<br />
> CoreLam [] e -> whnf p (CoreApp e (a:as))<br />
> CoreLam (b:bs) e -> whnf p (CoreLet [(b,a)]<br />
> (CoreApp (CoreLam bs e) as))<br />
</haskell><br />
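As a small trace of the second equation, applying a two-argument lambda to a single argument peels off one binding and leaves a one-argument lambda behind:<br />
<br />
<haskell><br />
--   whnf p (CoreApp (CoreLam ["x","y"] e) [a])<br />
-- reduces, via the CoreLam (b:bs) case, to<br />
--   whnf p (CoreLet [("x",a)] (CoreApp (CoreLam ["y"] e) []))<br />
</haskell><br />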
<br />
Core programs may contain positional information, which we just ignore:<br />
<br />
<haskell><br />
> whnf p (CorePos _ e) = whnf p e<br />
</haskell><br />
<br />
And the final, fall-through case covers primitive literals and<br />
functions which we are not concerned with here:<br />
<br />
<haskell><br />
> whnf p e = e<br />
</haskell><br />
<br />
Now, for the sake of completeness, we return to our <tt>match</tt><br />
function. It takes the evaluated case subject and tries to match it<br />
against each case-alternative (a pattern-expression pair) in order of<br />
appearance. We use the "failure as a list of successes" technique to<br />
model the fact that matching may fail.<br />
<br />
<haskell><br />
> type Alt = (CoreExpr, CoreExpr)<br />
<br />
> match :: CoreExpr -> [Alt] -> CoreExpr<br />
> match e as = head (concatMap (try e) as)<br />
</haskell><br />
<br />
Before defining <tt>try</tt>, it is useful to have a function that<br />
turns the two possible constructor forms into a single, normal form.<br />
This greatly reduces the number of cases we need to consider in the<br />
definition of <tt>try</tt>.<br />
<br />
<haskell><br />
> norm :: CoreExpr -> CoreExpr<br />
> norm (CoreCon c) = CoreApp (CoreCon c) []<br />
> norm x = x<br />
</haskell><br />
<br />
Hopefully by now the definition of <tt>try</tt> will be<br />
self-explanatory:<br />
<br />
<haskell><br />
> try :: CoreExpr -> Alt -> [CoreExpr]<br />
> try e (pat, rhs) =<br />
> case (norm pat, norm e) of<br />
> (CoreApp (CoreCon f) as, CoreApp (CoreCon g) bs)<br />
> | f == g -> [CoreLet (zip (vars as) bs) rhs]<br />
> (CoreVar v, e) -> [CoreLet [(v, e)] rhs]<br />
> _ -> []<br />
> where<br />
> vars = map fromCoreVar<br />
</haskell><br />
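As a worked example (constructor qualification omitted for brevity), matching an evaluated subject built with <hask>Just</hask> against the alternatives of a case over <hask>Maybe</hask>:<br />
<br />
<haskell><br />
--   try (CoreApp (CoreCon "Just") [x]) (CoreCon "Nothing", e1)<br />
--     = []                                 -- constructors differ<br />
--   try (CoreApp (CoreCon "Just") [x])<br />
--       (CoreApp (CoreCon "Just") [CoreVar "v"], CoreVar "v")<br />
--     = [CoreLet [("v",x)] (CoreVar "v")]  -- bind v to the argument<br />
</haskell><br />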
<br />
This completes the definition of <tt>whnf</tt>. However, we would<br />
like to be able to fully evaluate expressions -- to what we simply<br />
call "normal form" -- so that the resulting value's tree is computed<br />
in its entirety. Our <tt>nf</tt> function repeatedly applies<br />
<tt>whnf</tt> at progressively deeper nodes in the growing tree:<br />
<br />
<haskell><br />
> nf :: Core -> CoreExpr -> CoreExpr<br />
> nf p e =<br />
> case whnf p e of<br />
> CoreCon c -> CoreCon c<br />
> CoreApp (CoreCon c) es -> CoreApp (CoreCon c) (map (nf p) es)<br />
> e -> e<br />
</haskell><br />
<br />
All that remains is to turn our evaluator into a program by giving it<br />
a sensible <tt>main</tt> function. We first load the core file using<br />
<tt>loadCore</tt> and then apply <tt>removeRecursiveLet</tt>, as<br />
discussed earlier, before evaluating the expression <tt>CoreFun<br />
"main"</tt> to normal form and printing it.<br />
<br />
<haskell><br />
> module Main where<br />
<br />
> import System<br />
> import Monad<br />
> import Eval<br />
> import Yhc.Core<br />
<br />
> main :: IO ()<br />
> main = liftM head getArgs<br />
> >>= liftM removeRecursiveLet . loadCore<br />
> >>= print . flip nf (CoreFun "main")<br />
</haskell><br />
<br />
In future we hope to use a variant of this evaluator (with sharing) in<br />
a property-based testing framework. This will let us check that<br />
various program analyses and transformations that we have developed<br />
are semantics-preserving. As part of another project, we have<br />
successfully extended the evaluator to support various functional-logic<br />
evaluation strategies, including residuation and narrowing.<br />
<br />
<br />
== Javascript backend ==<br />
<br />
The Javascript backend is a unique feature of Yhc, something which our lightweight approach makes easier.<br />
<br />
Author: Dimitry<br />
<br />
The idea of writing a converter from Haskell to Javascript has been floating around for a while ([http://www.haskell.org/haskellwiki/Hajax Hajax]). Many people expressed interest in such a feature, but no practical implementation was visible.<br />
<br />
The initial goals of this subproject were:<br />
<br />
* To develop a conversion program that converts the [http://haskell.org/haskellwiki/Yhc/API/Core Yhc Core] to Javascript, thus making it possible to execute arbitrary Haskell code within a web browser<br />
<br />
* To develop an unsafe interface layer for quick access to Javascript objects with ability to wrap arbitrary Javascript code into a Haskell-callable function<br />
<br />
* To develop a typesafe interface layer on top of the unsafe interface layer for access to the Document Object Model (DOM) available to Javascript executed in a web browser<br />
<br />
* To develop or adopt an existing GUI library or toolkit working on top of the typesafe DOM layer for actual development of client-side Web applications.<br />
<br />
=== General concepts ===<br />
<br />
The Javascript backend converts a linked and optimized Yhc Core file into a piece of Javascript code to be embedded in an XHTML document. The Javascript code generator attempts to translate Core expressions to Javascript expressions one-to-one, with minor optimizations of its own, taking advantage of Javascript's capability to pass functions around as values.<br />
<br />
Three kinds of functions are present in the Javascript backend:<br />
<br />
* Unsafe functions that embed pieces of Javascript directly into the generated code: these functions pay no respect to types of arguments passed, and may force evaluation of their arguments if needed.<br />
<br />
* Typesafe wrappers that provide type signatures for unsafe functions. Such wrappers are either handwritten, or automatically generated from external interface specifications (such as the Document Object Model interface)<br />
<br />
* Regular library functions. These either come unmodified from the standard packages shipped with Yhc, or are substituted by the Javascript backend using the Core overlay technique. An example of the latter is <hask>toUpper</hask>, which is hooked up to a Javascript implementation supporting Unicode (the original library function currently works correctly only for the Latin-1 range of characters).<br />
<br />
==== Unsafe interfaces ====<br />
<br />
The core part of the unsafe interface to Javascript (in other words, the Javascript FFI) is the pseudo-function <hask>unsafeJS</hask>.<br />
<br />
The function has a type signature: <br />
<br />
<hask><br />
foreign import primitive unsafeJS :: String -> a<br />
</hask><br />
<br />
This means that it takes a string; the type of the return value does not matter, because the function itself is never executed. Its applications are detected by the Yhc Core to Javascript conversion program at the time of Javascript generation. <br />
<br />
The unsafeJS function should be called with a string literal. Neither an explicitly constructed (with (:)) list of characters nor a concatenation of two or more strings will work: the converter will report an error in either situation. <br />
<br />
A valid example of using unsafeJS is shown below: <br />
<br />
<haskell><br />
global_YHC'_Primitive'_primIntSignum :: Int -> Int<br />
<br />
global_YHC'_Primitive'_primIntSignum a = unsafeJS<br />
"var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;"<br />
</haskell><br />
<br />
This is a Javascript overlay (in the sense that it overlays the default Prelude definition of the <hask>signum</hask> function) of a function that returns the sign of an <hask>Int</hask> value. <br />
<br />
The string literal <hask>unsafeJS</hask> is applied to is the Javascript code to be wrapped. <br />
<br />
Below is the Javascript representation of this function, as found in the generated code. <br />
<br />
<pre><br />
strIdx["F_hy"] = "YHC.Primitive.primIntSignum";<br />
...<br />
var F_hy=new HSFun("F_hy", 1, function(a){<br />
var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;});<br />
</pre><br />
<br />
==== Typesafe wrappers ====<br />
<br />
These functions add type safety on top of the unsafe interface to Javascript. Sometimes they are defined within the same module as the unsafe interfaces themselves, thus avoiding exposing the unsafe interfaces to programmers.<br />
<br />
An example of a handwritten wrapper is a function to create a new <hask>JSRef</hask> (a mechanism similar to <hask>IORef</hask>, but specific to Javascript).<br />
<br />
<haskell><br />
data JSRef a<br />
<br />
newJSRef :: a -> CPS b (JSRef a)<br />
<br />
newJSRef a = toCPE (newJSRef' a)<br />
newJSRef' a = unsafeJS "return {_val:a};"<br />
</haskell><br />
<br />
Technically, a <hask>JSRef</hask> is a Javascript object with a property named ''_val'' that holds a persistent reference to some value. On the unsafe side, invoking a constructor for such an object would be sufficient. It is, however, desired that:<br />
<br />
* calls to functions creating such persistent references are properly sequenced with calls to functions using these references, and<br />
<br />
* the types of the values referred to are known to the Haskell compiler.<br />
<br />
The unsafe part is implemented by the function <hask>newJSRef'</hask>, which merely calls <hask>unsafeJS</hask> with the proper Javascript constructor. The wrapper <hask>newJSRef</hask> wraps the unsafe function into a CPS-style function and is given a proper type signature, so the compiler is better informed.<br />
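Companion operations can be wrapped in the same style. The definitions below are an illustrative sketch rather than the backend's actual code (in particular, how the unit value is represented on the Javascript side is a detail of the runtime):<br />
<br />
<haskell><br />
readJSRef :: JSRef a -> CPS b a<br />
readJSRef r = toCPE (readJSRef' r)<br />
readJSRef' r = unsafeJS "return exprEval(r)._val;"<br />
<br />
writeJSRef :: JSRef a -> a -> CPS b ()<br />
writeJSRef r a = toCPE (writeJSRef' r a)<br />
writeJSRef' r a = unsafeJS "exprEval(r)._val = a; return true;"<br />
</haskell><br />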
<br />
In some cases, such typesafe wrappers may be generated automatically, using some external interface specifications provided by third parties for their APIs.<br />
<br />
As an example of such an API, consider the W3C DOM interface. For instance, this piece of OMG IDL:<br />
<br />
<pre><br />
interface Text : CharacterData {<br />
Text splitText(in unsigned long offset)<br />
raises(DOMException);<br />
};<br />
</pre><br />
<br />
is converted into:<br />
<br />
<haskell><br />
data TText = TText<br />
<br />
...<br />
<br />
instance CText TText<br />
<br />
instance CCharacterData TText<br />
<br />
instance CNode TText<br />
<br />
...<br />
<br />
splitText :: (CText this, CText zz) => this -> Int -> CPS c zz<br />
splitText a b = toCPE (splitText' a b)<br />
splitText' a b<br />
= unsafeJS "return((exprEval(a)).splitText(exprEval(b)));"<br />
</haskell><br />
<br />
again, giving the Haskell compiler better control over types of this function's (initially type-agnostic) arguments.<br />
<br />
=== Usage of Continuation Passing Style ===<br />
<br />
Initially it was attempted to build a monadic framework. The <hask>JS</hask> monad was designed to play the same role as the <hask>IO</hask> monad plays in "regular" Haskell programming. There were however arguments in favor of using [http://haskell.org/haskellwiki/Continuation Continuation Passing Style] (CPS):<br />
<br />
* CPS involves less overhead, as each expression passes its continuation itself, instead of <hask>bind</hask> taking the expression and invoking the continuation<br />
<br />
* CPS results in Javascript patterns that are easy to detect and optimize for (this is one of the future plans).<br />
<br />
* The [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets] GUI library internals are written in CPS, so taking CPS as a general approach to programming is believed to make adoption of Fudgets easier.<br />
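The flavour of CPS programming, reduced to its essentials (this is a generic sketch; the backend's actual <hask>CPS</hask> type may be dressed up differently):<br />
<br />
<haskell><br />
type CPS c a = (a -> c) -> c<br />
<br />
-- Each step hands its result straight to the continuation;<br />
-- chaining is plain function application, with no intermediate<br />
-- bind closure being constructed.<br />
add :: Int -> Int -> CPS c Int<br />
add x y k = k (x + y)<br />
<br />
-- Computes (1 + 2) + 3 and passes the result on.<br />
example :: CPS c Int<br />
example k = add 1 2 (\s -> add s 3 k)<br />
</haskell><br />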
<br />
=== Integration with DOM ===<br />
<br />
The [http://www.w3.org Web Consortium] provides [http://www.omg.org/gettingstarted/omg_idl.htm OMG IDL] files describing the API to use with the [http://www.w3.org/DOM/ Document Object Model] (DOM). A utility was designed, based on [http://www.haskell.org/hdirect/ HaskellDirect], to parse these files and convert them to a set of Haskell modules. The way interface inheritance is reflected differs from the original HaskellDirect approach: in HaskellDirect this was achieved by declaring "nested" algebraic data types, while the Javascript backend utility takes advantage of Haskell type classes, representing DOM types with phantom types and declaring them instances of the appropriate class(es).<br />
<br />
=== Unicode support ===<br />
<br />
Despite the fact that all modern Web browsers support Unicode, this is not the case with Javascript: no access to Unicode characters' properties is provided. At the same time, it is impossible for a Haskell application running in a browser to do without such information. The approach used is the same as in [http://www.haskell.org/hugs Hugs] and [http://www.haskell.org/ghc GHC]: the Unicode character database file from the [http://www.unicode.org Unicode Consortium] was converted into a set of Javascript arrays, each array entry representing a range of character code values, or a case conversion rule for a range (in this implementation, Unicode support is limited to character category and simple case conversions). First, a range is found by character code using binary search; then the character category and case conversion distances (values to add to a character code to convert between upper and lower case) are retrieved from the range entry. The whole set of arrays adds about 70 kilobytes to the web page size, if embedded inside a &lt;script&gt; tag.<br />
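The lookup itself is straightforward. An illustrative sketch, not the actual generated tables or code:<br />
<br />
<pre><br />
// Each entry: [first code, last code, category,<br />
//              distance to upper case, distance to lower case].<br />
// The real tables are generated from the Unicode character database.<br />
var uTable = [ [65, 90, "Lu", 0, 32], [97, 122, "Ll", -32, 0] /* ... */ ];<br />
<br />
function uLookup(c) {<br />
  var lo = 0, hi = uTable.length - 1;<br />
  while (lo <= hi) {                 // binary search over the ranges<br />
    var mid = (lo + hi) >> 1, e = uTable[mid];<br />
    if (c < e[0]) hi = mid - 1;<br />
    else if (c > e[1]) lo = mid + 1;<br />
    else return e;                   // range found<br />
  }<br />
  return null;                       // unassigned code point<br />
}<br />
</pre><br />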
<br />
Using the Core overlay technique, Haskell character functions (such as <hask>toUpper</hask>, <hask>isAlpha</hask>, etc.) were hooked up to the Javascript implementations supporting Unicode. This did not result in a considerable slowdown; rather, some browsers even showed a minor speedup in heavy functions like <hask>read :: String -> Int</hask>.<br />
<br />
=== Examples of code generation ===<br />
<br />
The examples below show the conversion of some real-life functions from Haskell via Core to Javascript. Note that the Javascript code generator changes over time as the Javascript backend evolves, so these examples describe the situation as of the time of writing.<br />
<br />
This function:<br />
<br />
<haskell><br />
fromRoman = foldr fromNumeral 0 . maxmunch . map toUpper<br />
</haskell><br />
<br />
converted to Yhc Core:<br />
<br />
<haskell><br />
Roman.fromRoman =<br />
Prelude._LAMBDA27191<br />
(Prelude._LAMBDA27191<br />
(Prelude.map Data.Char.toUpper)<br />
(Roman.maxmunch Prelude.Prelude.Num.Prelude.Int))<br />
(Prelude.foldr<br />
(Roman.fromNumeral<br />
Prelude.Prelude.Num.Prelude.Int<br />
Prelude.Prelude.Ord.Prelude.Int)<br />
0)<br />
<br />
-- The LAMBDA is similar to the composition function (.), only with<br />
-- inverted order of application: _LAMBDA27191 f g x = g (f x)<br />
<br />
Prelude._LAMBDA27191 v22167 v22166 v2007 = v22166 (v22167 v2007)<br />
</haskell><br />
<br />
converted to Javascript:<br />
<br />
<pre><br />
<br />
/* fromRoman, code was formatted manually */<br />
<br />
var F_g8=new HSFun("F_g8", 0, function(){<br />
return (F_e9)._ap([(F_e9)._ap([new HSFun("F_gz", 0, <br />
function(){<br />
return (F_gz)._ap([F_Z]);<br />
}), new HSFun("F_g9", 0, <br />
function(){<br />
return (F_g9)._ap([F_dC]);<br />
})]), new HSFun("F_gp", 0, <br />
function(){<br />
return (F_gp)._ap([new HSFun("F_g7", 0, <br />
function(){<br />
return (F_g7)._ap([F_dC, F_d1]);<br />
}), 0]);<br />
})]);<br />
});<br />
<br />
/* _LAMBDA27191 */<br />
<br />
var F_e9=new HSFun("F_e9", 3, function(_b3, _b2, _bO){return (_b2)._ap([(_b3)._ap([_bO])]);});<br />
</pre><br />
<br />
During the conversion to Javascript, all identifiers found in Yhc Core are renamed to much shorter ones consisting only of alphanumeric characters, and thus surely valid for Javascript (identifiers in Yhc Core are often very long, contain special characters, etc.)<br />
<br />
While it is really hard to understand anything from the Javascript for the <hask>fromRoman</hask> function (other than that the Javascript backend already makes a good obfuscator), something may be seen in the Javascript for the composition function. It builds an application of its first argument to the third, then the application of the second to the previous application, and returns the latter.<br />
<br />
Another example of a function whose implementation was replaced via the Overlay technique is the <hask>isSpace</hask> function:<br />
<br />
<haskell><br />
global_Data'_Char'_isSpace = f . ord<br />
where f a = unsafeJS "return uIsSpace(exprEval(a));"<br />
</haskell><br />
<br />
<haskell><br />
Data.Char.isSpace =<br />
Prelude._LAMBDA27191<br />
Data._CharNumeric.ord<br />
StdOverlay.StdOverlay.Prelude.287.f<br />
</haskell><br />
<br />
<pre><br />
var F_W=new HSFun("F_W", 0, function(){return (F_e9)._ap([F_bh, F_hk]);});<br />
</pre><br />
<br />
In the Haskell code, the <hask>global_Data'_Char'_isSpace</hask> identifier tells the Core Overlay engine that the function with qualified name <hask>Data.Char.isSpace</hask> is to be replaced with a new implementation. In the Yhc Core, the previously reviewed reversed composition function can be seen; it composes the <hask>ord</hask> function with an inner function that actually invokes the Javascript function, which in turn performs the Unicode property lookup for a given character's numeric code.<br />
<br />
=== Browser compatibility ===<br />
<br />
Compatibility with major browsers such as Netscape/Mozilla/Firefox and Microsoft Internet Explorer, and also Opera, has been observed. Compatibility with Safari has not been achieved so far.<br />
<br />
=== Future plan: Fudgets ===<br />
<br />
It is planned to port some portion of [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets], so that it becomes possible to write Web applications using this library. Several experiments showed that the Stream Processors (SP) layer, and some parts of the Fudget Kernel layer, work within a Javascript application. More problems are expected with porting the top-level widgets, due to the many conceptual differences between a Web browser and the X Window System, for which the Fudgets library was originally developed.<br />
<br />
== Wacky features ==<br />
<br />
Yhc is going in many interesting directions. Some of these directions are likely to become very important in the future, some are likely to fade away. Yhc is a genuine research bed for brand new ideas.<br />
<br />
Author: All<br />
<br />
When you don't spend all your time on wacky type systems, you get a lot more time left to work on other wacky stuff. Include Java interpreter, .NET back end, Javascript back end, Python interpreter, Hat debugging, yhi-stack, whole program optimisation. Lots of these things are breeding grounds for various useful technologies, and most are marching towards genuine usefulness.<br />
<br />
== The Future ==<br />
<br />
The original structure of nhc was as one big set of modules – some were broadly divided into type checking/parsing etc, but the overall structure and grouping was weaker than in other compilers – particularly GHC which has a very well defined structure. One of our first actions was to split up the code into hierarchical modules, introducing Type.*, Parse.* etc to better divide the components.<br />
<br />
Some of the sections are shared – for example both hpc and Hat share the nhc Parse modules, and could equally share the Yhc parse modules. This direction is obviously attractive – by opening up more parts of the compiler for reuse in other projects we gain in many ways:<br />
<br />
* The authors of other projects benefit from having libraries to build upon, not needlessly replicating the functionality already present in Yhc.<br />
* We benefit from having more people focused and dependent on our code, resulting in a bigger community.<br />
* We get additional bug reports.<br />
* By structuring code as a library, “hacks” that before might have been tolerated quickly take on a whole new aura of ugliness.<br />
* By isolating each component, we minimize the amount of whole compiler knowledge required.<br />
<br />
This direction attracts us, and we see this as the future direction of our compiler.<br />
<br />
Interestingly, at almost the same time the GHC developers introduced a GHC API. We do not wish to replicate the GHC API; in particular, their API is more complex than ours. We also wish to design an API first, and then construct some future version of Yhc around the API we develop – rather than exporting an API from the existing compiler.<br />
<br />
For the next major Yhc version we would hope to rewrite the parser, type checker, renamer and desugarer – leaving only the new ByteCode and Core aspects still intact. This is clearly a lot of work, and we view this as a very long term plan. For the moment we hope to push the Core and ByteCode features of Yhc into new areas.<br />
<br />
In the meantime, things like library compatibility and Haskell' loom closer on the horizon. Our progress towards these goals relies on the help of volunteers.<br />
<br />
== Acknowledgements ==<br />
<br />
Thanks to everyone who has submitted a patch, become a buildbot, reported bugs or done anything else to benefit the Yhc project. We've put together a list of most of the people (if we've missed you, please shout, and we'll add your name in future versions of this document!)<br />
<br />
Andrew Wilkinson, Bernie Pope, Bob Davie, Brian Alliet, Christopher Lane Hinson, Dimitry Golubovsky, Gabor Greif, Goetz Isenmann, Isaac Dupree, Kartik Vaddadi, Krasimir Angelov, Malcolm Wallace, Michal Palka, Mike Dodds, Neil Mitchell, Robert Dockins, Samuel Bronson, Stefan O'Rear, Thorkil Naur, Tom Shackell</div>
<hr />
<div>Authors: Neil Mitchell, Tom Shackell, Dimitry Golubovsky<br />
<br />
This is a draft of the Yhc TMR article, deadline April 13th. It isn't intended as a wiki article beyond the listed authors (although if you want to fix some spelling, we don't mind!). If you are interested in helping email the Yhc list.<br />
<br />
== The beginning ==<br />
<br />
In the beginning there was the nhc compiler, which had a number of issues. We fixed some of them.<br />
<br />
Author: Tom/Neil/Andrew<br />
<br />
How we started up Yhc, this is the section that would have been in the History of Haskell paper if they had done a Yhc section :)<br />
<br />
Include the transition from CVS -> york Darcs -> haskell.org Darcs<br />
<br />
== Portability concerns ==<br />
<br />
From the beginning portability was a prime concern, while the original nhc was only running on Linux v old.old, and never Windows, Yhc was fully portable by design.<br />
<br />
Author: Tom, Andrew<br />
<br />
Why portability is such a concern, details of our ports system. Include our scons architecture, buildbot system etc. Mention that Yhc runs under Hugs, and indeed some of the developers use Hugs.<br />
<br />
<br />
<br />
== Yhc.Core ==<br />
<br />
Yhc.Core is one of our most successful libraries to date. The original nhc compiler used an intermediate core language called PosLambda – a basic lambda calculus extended with positional information. Although the language was very similar to a subset of Haskell, it wasn’t the same. In particular there were unusual constructs such as FatBar, and all names were stored in a symbol table.<br />
<br />
Rather than attempt to change the PosLambda language, a task that would have been decidedly painful, we chose instead to write a Core language from scratch, being inspired by both PosLambda and GHC Core. We then wrote a converter from PosLambda to Yhc.Core.<br />
<br />
In particular our idealised Core language differs from GHC Core in a number of ways:<br />
<br />
* Untyped – originally this was a restriction of PosLambda, but now some of us see this as a feature (others still see this as a bug).<br />
* Syntactically a subset of Haskell.<br />
* Minimal name mangling.<br />
<br />
All these features combine to create a Core language which resembles Haskell much more than Core languages in other Haskell compilers. This very low barrier to entry means that with merely the Haddock description of our abstract syntax tree, most Haskell programmers can feel at home with relatively little effort.<br />
<br />
Having reduced the barrier to entry for our Core language, the number of projects depending on it has increased. Consistently we have attempted to add facilities to the libraries for common tasks, rather than duplicating them separately in projects. As a result the Core library now has facilities for dealing with primitives, removing recursive lets, reachability analysis, strictness analysis, simplification, inlining and more.<br />
<br />
One of the first features we added to Core was whole program linking – any Haskell program, regardless of the number of modules, can be collapsed into one single Yhc.Core module. While this breaks separate compilation, if allows many types of analysis and transformation to be performed in a simpler manner. Naturally, if a technique turns out to be successful breaking the dependence on whole program compilation is a worthy goal – but this approach allows developers to pay that cost only when it is needed.<br />
<br />
=== A Small Example ===<br />
<br />
To give a flavour of what Core looks like, it is easiest to start with a small program:<br />
<br />
<haskell><br />
head2 (x:xs) = x<br />
<br />
map2 f [] = []<br />
map2 f (x:xs) = f x : map2 f xs<br />
<br />
test x = map2 head2 x<br />
</haskell><br />
<br />
Compiling this with <tt>yhc -showcore Sample.hs</tt> generates:<br />
<br />
<haskell><br />
Sample.head2 v220 =<br />
case v220 of<br />
(:) v221 v222 -> v221<br />
_ -> Prelude.error Sample._LAMBDA228<br />
<br />
Sample._LAMBDA228 =<br />
"Sample: Pattern match failure in function at 9:1-9:15."<br />
<br />
Sample.map2 v223 v224 =<br />
case v224 of<br />
[] -> []<br />
(:) v225 v226 -> (:) (v223 v225) (Sample.map2 v223 v226)<br />
<br />
Sample.test v227 = Sample.map2 Sample.head2 v227<br />
</haskell><br />
<br />
The generated Core looks very much like Haskell, and can be treated as such. It is, however, the simplest form of Haskell, with many restrictions:<br />
<br />
* Case statements only examine their outermost constructor<br />
* No type classes<br />
* No where<br />
* No nested functions (only top level)<br />
* All names are fully qualified<br />
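The first restriction deserves a brief illustration: a nested pattern in source Haskell is compiled into a chain of case expressions, each inspecting only one constructor. The sketch below is hand-written in the style of the generated Core, not actual Yhc output:<br />

```haskell
-- Source:   firstTwo (x:y:_) = (x, y)
-- In Core style, the nested pattern becomes two one-deep cases:
firstTwo :: [a] -> (a, a)
firstTwo v1 =
  case v1 of
    (:) x v2 ->
      case v2 of
        (:) y _ -> (x, y)
        _       -> error "Pattern match failure"
    _ -> error "Pattern match failure"

main :: IO ()
main = print (firstTwo "abc")
```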
<br />
=== Yhc.Core.Overlay ===<br />
<br />
Describe the overlay here<br />
<br />
=== Evaluator ===<br />
=== Semantics of Yhc Core ===<br />
<br />
In this section an evaluator for Yhc Core programs is presented in the<br />
form of a literate Haskell program. The aim is to define the meaning<br />
of Core programs while demonstrating a full, albeit simple,<br />
application of the <tt>Yhc.Core</tt> library.<br />
<br />
<haskell><br />
> module Eval where<br />
<br />
> import Yhc.Core<br />
</haskell><br />
<br />
Our evaluator is based around the function <tt>whnf</tt> that takes a<br />
core program (of type <tt>Core</tt>) along with a core expression (of<br />
type <tt>CoreExpr</tt>) and reduces that expression until it has the<br />
form:<br />
<br />
* a data constructor with unevaluated arguments, or<br />
* an unapplied lambda expression.<br />
<br />
In general, data values in Haskell are tree-shaped. The function<br />
<tt>whnf</tt> is often said to "reduce an expression to head normal<br />
form" because it reveals the head (or root) of a value's tree and no<br />
more. Strictly speaking, when the result of reduction could be a<br />
functional value (i.e. a lambda expression), and the body of that<br />
lambda is left unevaluated, then the result is said to be in "weak<br />
head normal form" -- this explains the strange acronym WHNF!<br />
<br />
The type of <tt>whnf</tt> is:<br />
<br />
<haskell><br />
> whnf :: Core -> CoreExpr -> CoreExpr<br />
</haskell><br />
<br />
Defining it is a process of taking each kind of core expression in<br />
turn, and asking "how do I reduce this to weak head normal form?" As<br />
usual, it makes sense to define the base cases first, namely<br />
constructors and lambda expressions:<br />
<br />
<haskell><br />
> whnf p (CoreCon c) = CoreCon c<br />
> whnf p (CoreApp (CoreCon c) as) = CoreApp (CoreCon c) as<br />
> whnf p (CoreLam (v:vs) e) = CoreLam (v:vs) e<br />
</haskell><br />
<br />
Notice that there are two forms in which a constructor may appear:<br />
alone with no arguments, or as function application to a list of<br />
arguments. Also, because of the way our evaluator is designed, we may<br />
encounter lambda expressions with no arguments. Hence, only lambdas<br />
with arguments represent a base-case. For the no-arguments case, we<br />
just shift the focus of reduction to the body:<br />
<br />
<haskell><br />
> whnf p (CoreLam [] e) = whnf p e<br />
</haskell><br />
<br />
Currently, lambda expressions do not occur in the Core output of Yhc.<br />
They are part of the Core syntax because they are useful conceptually,<br />
particularly when manipulating (and evaluating) higher-order functions.<br />
<br />
Moving on to case-expressions, we first reduce the case subject, then<br />
match it against each pattern in turn, and finally reduce the body of<br />
the chosen alternative. In Core, we can safely assume that patterns<br />
are at most one constructor deep, so reduction of the subject to WHNF<br />
is sufficient.<br />
<br />
<haskell><br />
> whnf p (CoreCase e as) = whnf p (match (whnf p e) as)<br />
</haskell><br />
<br />
We leave the definition of <tt>match</tt> until later.<br />
<br />
To reduce a let-expression, we apply the let-bindings as a<br />
substitution to the body of the let. This is easily done using the<br />
Core function <tt>replaceFreeVars</tt>. Of course, let-expressions in<br />
Core are recursive, but before evaluating a core program we transform<br />
all recursive-lets away (see below). Notice that we are in no way<br />
trying to preserve the sharing implied by let-expressions, although we<br />
have done so in more complex variants of the evaluator.<br />
Strictly-speaking, Haskell evaluators are not obliged to implement<br />
sharing -- this is why it is more correct to term Haskell non-strict<br />
than lazy.<br />
<br />
<haskell><br />
> whnf p (CoreLet bs e) = whnf p (replaceFreeVars bs e)<br />
</haskell><br />
<br />
When we encounter an unapplied function we call <tt>coreFunc</tt> to<br />
look up its definition (i.e. its arguments and its right-hand side), and<br />
construct a corresponding lambda expression:<br />
<br />
<haskell><br />
> whnf p (CoreFun f) = whnf p (CoreLam bs body)<br />
> where<br />
> CoreFunc _ bs body = coreFunc p f<br />
</haskell><br />
<br />
This means that when reducing function applications, we know that<br />
reduction of the function part will yield a lambda:<br />
<br />
<haskell><br />
> whnf p (CoreApp f []) = whnf p f<br />
> whnf p (CoreApp f (a:as)) =<br />
> case whnf p f of<br />
> CoreLam [] e -> whnf p (CoreApp e (a:as))<br />
> CoreLam (b:bs) e -> whnf p (CoreLet [(b,a)]<br />
> (CoreApp (CoreLam bs e) as))<br />
</haskell><br />
<br />
Core programs may contain positional information which we just ignore:<br />
<br />
<haskell><br />
> whnf p (CorePos _ e) = whnf p e<br />
</haskell><br />
<br />
And the final, fall-through case covers primitive literals and<br />
functions which we are not concerned with here:<br />
<br />
<haskell><br />
> whnf p e = e<br />
</haskell><br />
<br />
Now, for the sake of completeness, we return to our <tt>match</tt><br />
function. It takes the evaluated case subject and tries to match it<br />
against each case-alternative (a pattern-expression pair) in order of<br />
appearance. We use the "failure as a list of successes" technique to<br />
model the fact that matching may fail.<br />
<br />
<haskell><br />
> type Alt = (CoreExpr, CoreExpr)<br />
<br />
> match :: CoreExpr -> [Alt] -> CoreExpr<br />
> match e as = head (concatMap (try e) as)<br />
</haskell><br />
<br />
Before defining <tt>try</tt>, it is useful to have a function that<br />
turns the two possible constructor forms into a single, normal form.<br />
This greatly reduces the number of cases we need to consider in the<br />
definition of <tt>try</tt>.<br />
<br />
<haskell><br />
> norm :: CoreExpr -> CoreExpr<br />
> norm (CoreCon c) = CoreApp (CoreCon c) []<br />
> norm x = x<br />
</haskell><br />
<br />
Hopefully by now the definition of <tt>try</tt> will be<br />
self-explanatory:<br />
<br />
<haskell><br />
> try :: CoreExpr -> Alt -> [CoreExpr]<br />
> try e (pat, rhs) =<br />
> case (norm pat, norm e) of<br />
> (CoreApp (CoreCon f) as, CoreApp (CoreCon g) bs)<br />
> | f == g -> [CoreLet (zip (vars as) bs) rhs]<br />
> (CoreVar v, e) -> [CoreLet [(v, e)] rhs]<br />
> _ -> []<br />
> where<br />
> vars = map fromCoreVar<br />
</haskell><br />
<br />
This completes the definition of <tt>whnf</tt>. However, we would<br />
like to be able to fully evaluate expressions -- to what we simply<br />
call "normal form" -- so that the resulting value's tree is computed<br />
in its entirety. Our <tt>nf</tt> function repeatedly applies<br />
<tt>whnf</tt> at progressively deeper nodes in the growing tree:<br />
<br />
<haskell><br />
> nf :: Core -> CoreExpr -> CoreExpr<br />
> nf p e =<br />
> case whnf p e of<br />
> CoreCon c -> CoreCon c<br />
> CoreApp (CoreCon c) es -> CoreApp (CoreCon c) (map (nf p) es)<br />
> e -> e<br />
</haskell><br />
<br />
All that remains is to turn our evaluator into a program by giving it<br />
a sensible <tt>main</tt> function, placed in its own module. We first load the core file using<br />
<tt>loadCore</tt> and then apply <tt>removeRecursiveLet</tt>, as<br />
discussed earlier, before evaluating the expression <tt>CoreFun<br />
"main"</tt> to normal form and printing it.<br />
<br />
<haskell><br />
> module Main where<br />
<br />
> import Yhc.Core<br />
> import Eval<br />
> import System<br />
> import Monad<br />
<br />
> main :: IO ()<br />
> main = liftM head getArgs<br />
> >>= liftM removeRecursiveLet . loadCore<br />
> >>= print . flip nf (CoreFun "main")<br />
</haskell><br />
<br />
In future we hope to use a variant of this evaluator (with sharing) in<br />
a property-based testing framework. This will let us check that<br />
various program analyses and transformations that we have developed<br />
are semantics-preserving. As part of another project, we have<br />
successfully extended the evaluator to support various functional-logic<br />
evaluation strategies, including residuation and narrowing.<br />
<br />
<br />
== Javascript backend ==<br />
<br />
The Javascript backend is a unique feature of Yhc, something our lightweight approach makes easier.<br />
<br />
Author: Dimitry<br />
<br />
The idea of writing a converter from Haskell to Javascript has been floating around for a while ([http://www.haskell.org/haskellwiki/Hajax Hajax]). Many people expressed interest in such a feature, but no practical implementation was visible.<br />
<br />
Initial goals of this subproject were:<br />
<br />
* To develop a conversion program that converts the [http://haskell.org/haskellwiki/Yhc/API/Core Yhc Core] to Javascript, thus making it possible to execute arbitrary Haskell code within a web browser<br />
<br />
* To develop an unsafe interface layer for quick access to Javascript objects with ability to wrap arbitrary Javascript code into a Haskell-callable function<br />
<br />
* To develop a typesafe interface layer on top of the unsafe interface layer for access to the Document Object Model (DOM) available to Javascript executed in a web browser<br />
<br />
* To develop or adopt an existing GUI library or toolkit working on top of the typesafe DOM layer for actual development of client-side Web applications.<br />
<br />
=== General concepts ===<br />
<br />
The Javascript backend converts a linked and optimized Yhc Core file into a piece of Javascript code to be embedded in an XHTML document. The Javascript code generator attempts to translate Core expressions to Javascript expressions one-to-one, with minor optimizations of its own, taking advantage of Javascript's capability to pass functions around as values.<br />
<br />
Three kinds of functions are present in the Javascript backend:<br />
<br />
* Unsafe functions that embed pieces of Javascript directly into the generated code: these functions pay no respect to types of arguments passed, and may force evaluation of their arguments if needed.<br />
<br />
* Typesafe wrappers that provide type signatures for unsafe functions. Such wrappers are either handwritten, or automatically generated from external interface specifications (such as the Document Object Model interface)<br />
<br />
* Regular library functions. These either come unmodified from the standard packages that come with Yhc, or are substituted by the Javascript backend using the Core overlay technique. An example of such a function is the <hask>toUpper</hask> function which is hooked up to the Javascript implementation supporting Unicode (the original library function currently works correctly only for the Latin1 range of characters).<br />
<br />
==== Unsafe interfaces ====<br />
<br />
The core part of unsafe interface to Javascript (or, in other words, Javascript FFI) is a pseudo-function <hask>unsafeJS</hask>.<br />
<br />
The function has a type signature: <br />
<br />
<haskell><br />
foreign import primitive unsafeJS :: String -> a<br />
</haskell><br />
<br />
This means that it takes a string; the type of the return value does not matter, as the function itself is never executed. Its applications are detected by the Yhc Core to Javascript conversion program at the time of Javascript generation.<br />
<br />
The <hask>unsafeJS</hask> function should be called with a string literal. Neither an explicitly coded (with (:)) list of characters nor a concatenation of two or more strings will work; the converter will report an error in these situations.<br />
<br />
A valid example of using unsafeJS is shown below: <br />
<br />
<haskell><br />
global_YHC'_Primitive'_primIntSignum :: Int -> Int<br />
<br />
global_YHC'_Primitive'_primIntSignum a = unsafeJS<br />
"var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;"<br />
</haskell><br />
<br />
This is a Javascript overlay (in the sense that it overlays the default Prelude definition of the <hask>signum</hask> function) of a function that returns the sign of an <hask>Int</hask> value. <br />
<br />
The string literal to which <hask>unsafeJS</hask> is applied is the Javascript code to be wrapped. <br />
<br />
Below is the Javascript representation of this function found in generated code. <br />
<br />
<pre><br />
strIdx["F_hy"] = "YHC.Primitive.primIntSignum";<br />
...<br />
var F_hy=new HSFun("F_hy", 1, function(a){<br />
var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;});<br />
</pre><br />
<br />
==== Typesafe wrappers ====<br />
<br />
These functions add type safety on top of the unsafe interface to Javascript. Sometimes they are defined within the same module as the unsafe interfaces themselves, thus avoiding exposing the unsafe interfaces to programmers.<br />
<br />
An example of a handwritten wrapper is a function to create a new <hask>JSRef</hask> (a mechanism similar to <hask>IORef</hask>, but specific to Javascript).<br />
<br />
<haskell><br />
data JSRef a<br />
<br />
newJSRef :: a -> CPS b (JSRef a)<br />
<br />
newJSRef a = toCPE (newJSRef' a)<br />
newJSRef' a = unsafeJS "return {_val:a};"<br />
</haskell><br />
<br />
Technically, a <hask>JSRef</hask> is a Javascript object with a property named ''_val'' that holds a persistent reference to some value. On the unsafe side, invoking a constructor for such an object would be sufficient. It is however desired that:<br />
<br />
* calls to functions creating such persistent references are properly sequenced with calls to functions using these references, and<br />
<br />
* the types of the values referred to are known to the Haskell compiler.<br />
<br />
The unsafe part is implemented by the function <hask>newJSRef'</hask>, which merely calls <hask>unsafeJS</hask> with the proper Javascript constructor. The wrapper <hask>newJSRef</hask> turns the unsafe function into a CPS-style function and gives it a proper type signature, so the compiler is better informed.<br />
<br />
In some cases, such typesafe wrappers may be generated automatically, using some external interface specifications provided by third parties for their APIs.<br />
<br />
As an example of such API, the W3C DOM interface may be taken. For instance, this piece of OMG IDL:<br />
<br />
<pre><br />
interface Text : CharacterData {<br />
Text splitText(in unsigned long offset)<br />
raises(DOMException);<br />
};<br />
</pre><br />
<br />
is converted into:<br />
<br />
<haskell><br />
data TText = TText<br />
<br />
...<br />
<br />
instance CText TText<br />
<br />
instance CCharacterData TText<br />
<br />
instance CNode TText<br />
<br />
...<br />
<br />
splitText :: (CText this, CText zz) => this -> Int -> CPS c zz<br />
splitText a b = toCPE (splitText' a b)<br />
splitText' a b<br />
= unsafeJS "return((exprEval(a)).splitText(exprEval(b)));"<br />
</haskell><br />
<br />
again giving the Haskell compiler better control over the types of this function's (initially type-agnostic) arguments.<br />
<br />
=== Usage of Continuation Passing Style ===<br />
<br />
Initially we attempted to build a monadic framework. The <hask>JS</hask> monad was designed to play the same role as the <hask>IO</hask> monad plays in "regular" Haskell programming. There were, however, arguments in favor of using [http://haskell.org/haskellwiki/Continuation Continuation Passing Style] (CPS):<br />
<br />
* CPS involves less overhead, as each expression passes its continuation itself, instead of going through <hask>bind</hask>, which takes the expression and invokes the continuation<br />
<br />
* CPS results in Javascript patterns that are easy to detect and optimize for (this is one of the future plans).<br />
<br />
* The [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets] GUI library internals are written in CPS, so taking CPS as a general approach to programming is believed to make the adoption of Fudgets easier.<br />
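The article does not show the definition of the <hask>CPS</hask> type itself, so the following uses the conventional encoding of CPS as a function expecting a continuation – an illustrative assumption, as are the stand-in functions (<hask>newRef</hask>, <hask>readRef</hask>, <hask>demo</hask>), which merely show how sequencing works without <hask>bind</hask>:<br />

```haskell
-- Illustrative assumption: the conventional CPS encoding.  The real
-- CPS type used by the Javascript backend may differ.
type CPS r a = (a -> r) -> r

-- Hypothetical stand-in for a reference-creating function such as
-- newJSRef: it hands its result straight to the continuation.
newRef :: a -> CPS r (Int, a)
newRef x k = k (0, x)

-- Hypothetical stand-in for a function reading a reference back.
readRef :: (Int, a) -> CPS r a
readRef (_, x) k = k x

-- Sequencing is done by nesting continuations, with no 'bind' step
-- between the calls.
demo :: CPS r Int
demo = \k -> newRef 42 (\ref -> readRef ref k)

main :: IO ()
main = demo print
```

Each call receives the rest of the computation as its final argument, which is the regular Javascript pattern the code generator can recognise and optimise.<br />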
<br />
=== Integration with DOM ===<br />
<br />
The [http://www.w3.org Web Consortium] provides [http://www.omg.org/gettingstarted/omg_idl.htm OMG IDL] files describing the API for the [http://www.w3.org/DOM/ Document Object Model] (DOM). A utility was designed, based on [http://www.haskell.org/hdirect/ HaskellDirect], to parse these files and convert them to a set of Haskell modules. The way interface inheritance is reflected differs from the original HaskellDirect approach: in HaskellDirect this was achieved by declaring "nested" algebraic data types, while the Javascript backend's utility takes advantage of Haskell type classes, representing DOM types with phantom types and declaring them instances of the appropriate class(es).<br />
<br />
=== Unicode support ===<br />
<br />
Despite the fact that all modern Web browsers support Unicode, this is not the case with Javascript: no access to Unicode characters' properties is provided. At the same time, a Haskell application running in a browser cannot do without such information. The approach used is the same as in [http://www.haskell.org/hugs Hugs] and [http://www.haskell.org/ghc GHC]: the Unicode character database file from the [http://www.unicode.org Unicode Consortium] was converted into a set of Javascript arrays, each array entry representing a range of character code values, or a case conversion rule for a range (in this implementation, Unicode support is limited to character category and simple case conversions). First, a range is found by character code using binary search; then the character category and case conversion distances (values to add to a character code to convert between upper and lower case) are retrieved from the range entry. The whole set of arrays adds about 70 kilobytes to the web page size, if embedded inside a &lt;script&gt; tag.<br />
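The lookup scheme just described can be sketched as follows, using a tiny illustrative table rather than the real generated Unicode data: ranges are kept sorted by code value, a binary search finds the enclosing range, and the entry carries the category name and the case-conversion distances.<br />

```haskell
-- Sketch of range-based Unicode lookup.  The table below is a tiny
-- illustrative subset, not the real generated data.
import Data.Array

-- (lo, hi, category, distance to upper case, distance to lower case)
type Range = (Int, Int, String, Int, Int)

ranges :: Array Int Range
ranges = listArray (0, 2)
  [ (0x30, 0x39, "Nd", 0, 0)     -- '0'..'9': decimal digits
  , (0x41, 0x5A, "Lu", 0, 32)    -- 'A'..'Z': upper case, +32 to lower
  , (0x61, 0x7A, "Ll", -32, 0)   -- 'a'..'z': lower case, -32 to upper
  ]

-- Binary search for the range containing a character code.
findRange :: Int -> Maybe Range
findRange c = go lo hi
  where
    (lo, hi) = bounds ranges
    go l h
      | l > h = Nothing
      | otherwise =
          let m = (l + h) `div` 2
              r@(a, b, _, _, _) = ranges ! m
          in if c < a then go l (m - 1)
             else if c > b then go (m + 1) h
             else Just r

-- Case conversion via the stored distance.
toUpperC :: Char -> Char
toUpperC ch =
  case findRange (fromEnum ch) of
    Just (_, _, _, d, _) -> toEnum (fromEnum ch + d)
    Nothing              -> ch

-- Category lookup; "Cn" (unassigned) when no range matches.
category :: Char -> String
category ch = maybe "Cn" (\(_, _, cat, _, _) -> cat)
                    (findRange (fromEnum ch))

main :: IO ()
main = putStrLn (map toUpperC "abc123")
```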
<br />
Using the Core overlay technique, Haskell character functions (like <hask>toUpper</hask>, <hask>isAlpha</hask>, etc.) were hooked up to the Javascript implementations supporting Unicode. This did not result in any considerable slowdown; indeed, some browsers even showed a minor speedup in heavy functions like <hask>read::String -> Int</hask>.<br />
<br />
=== Examples of code generation ===<br />
<br />
The examples below show the conversion of some real-life functions from Haskell via Core to Javascript. It is worth mentioning that the Javascript code generator changes over time as the backend evolves, so these examples describe the situation at the time of writing.<br />
<br />
This function:<br />
<br />
<haskell><br />
fromRoman = foldr fromNumeral 0 . maxmunch . map toUpper<br />
</haskell><br />
<br />
converted to Yhc Core:<br />
<br />
<haskell><br />
Roman.fromRoman =<br />
Prelude._LAMBDA27191<br />
(Prelude._LAMBDA27191<br />
(Prelude.map Data.Char.toUpper)<br />
(Roman.maxmunch Prelude.Prelude.Num.Prelude.Int))<br />
(Prelude.foldr<br />
(Roman.fromNumeral<br />
Prelude.Prelude.Num.Prelude.Int<br />
Prelude.Prelude.Ord.Prelude.Int)<br />
0)<br />
<br />
-- The LAMBDA is similar to the composition function (.), only with<br />
-- inverted order of application: _LAMBDA27191 f g x = g (f x)<br />
<br />
Prelude._LAMBDA27191 v22167 v22166 v2007 = v22166 (v22167 v2007)<br />
</haskell><br />
<br />
converted to Javascript:<br />
<br />
<pre><br />
<br />
/* fromRoman, code was formatted manually */<br />
<br />
var F_g8=new HSFun("F_g8", 0, function(){<br />
return (F_e9)._ap([(F_e9)._ap([new HSFun("F_gz", 0, <br />
function(){<br />
return (F_gz)._ap([F_Z]);<br />
}), new HSFun("F_g9", 0, <br />
function(){<br />
return (F_g9)._ap([F_dC]);<br />
})]), new HSFun("F_gp", 0, <br />
function(){<br />
return (F_gp)._ap([new HSFun("F_g7", 0, <br />
function(){<br />
return (F_g7)._ap([F_dC, F_d1]);<br />
}), 0]);<br />
})]);<br />
});<br />
<br />
/* _LAMBDA27191 */<br />
<br />
var F_e9=new HSFun("F_e9", 3, function(_b3, _b2, _bO){return (_b2)._ap([(_b3)._ap([_bO])]);});<br />
</pre><br />
<br />
During the conversion to Javascript, all identifiers found in Yhc Core are renamed to much shorter ones consisting only of alphanumeric characters, and thus guaranteed valid in Javascript (identifiers in Yhc Core are often very long, contain special characters, etc.)<br />
<br />
While it is hard to understand much from the Javascript for the <hask>fromRoman</hask> function (other than that the Javascript backend already makes a good obfuscator), something can be seen in the Javascript for the composition function: it builds an application of its first argument to the third, then an application of the second to that result, and returns the latter.<br />
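For reference, the reversed composition helper generated by the compiler behaves like <hask>flip (.)</hask>. A directly runnable rendering, with hypothetical stand-in stages in place of <hask>maxmunch</hask> and <hask>fromNumeral</hask>:<br />

```haskell
-- The generated helper, written out in plain Haskell:
--   _LAMBDA27191 f g x = g (f x)
-- i.e. composition with the order of application inverted.
lambda27191 :: (a -> b) -> (b -> c) -> a -> c
lambda27191 f g x = g (f x)

-- A fromRoman-style pipeline built from it, with stand-in stages:
-- the first argument is applied first, then the second.
pipeline :: String -> String
pipeline = lambda27191 (map succ) reverse

main :: IO ()
main = putStrLn (pipeline "abc")
```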
<br />
Another example of a function whose implementation was replaced via the Overlay technique is the <hask>isSpace</hask> function:<br />
<br />
<haskell><br />
global_Data'_Char'_isSpace = f . ord<br />
where f a = unsafeJS "return uIsSpace(exprEval(a));"<br />
</haskell><br />
<br />
<haskell><br />
Data.Char.isSpace =<br />
Prelude._LAMBDA27191<br />
Data._CharNumeric.ord<br />
StdOverlay.StdOverlay.Prelude.287.f<br />
</haskell><br />
<br />
<pre><br />
var F_W=new HSFun("F_W", 0, function(){return (F_e9)._ap([F_bh, F_hk]);});<br />
</pre><br />
<br />
In the Haskell code, the <hask>global_Data'_Char'_isSpace</hask> identifier tells the Core Overlay engine that the function with qualified name <hask>Data.Char.isSpace</hask> is to be replaced with a new implementation. In the Yhc Core, the previously reviewed reversed composition function can be seen; it composes the <hask>ord</hask> function with an inner function that invokes the Javascript function, which in turn performs the Unicode property lookup for the given character's numeric code.<br />
<br />
=== Browser compatibility ===<br />
<br />
Compatibility has been observed with major browsers such as Netscape/Mozilla/Firefox, Microsoft Internet Explorer, and Opera. Compatibility with Safari has not yet been achieved.<br />
<br />
=== Future plan: Fudgets ===<br />
<br />
It is planned to port a portion of [http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/ Fudgets], making it possible to write Web applications using this library. Several experiments showed that the Stream Processors (SP) and some parts of the Fudget Kernel layers worked within a Javascript application. More problems are expected with porting the top-level widgets, due to the many conceptual differences between a Web browser and the X Window System, for which the Fudgets library was originally developed.<br />
<br />
== Wacky features ==<br />
<br />
Yhc is going in many interesting directions. Some of these directions are likely to become very important in the future, some are likely to fade away. Yhc is a genuine research bed for brand new ideas.<br />
<br />
Author: All<br />
<br />
When you don't spend all your time on wacky type systems, you get a lot more time to work on other wacky stuff. Include Java interpreter, .NET back end, Javascript back end, Python interpreter, Hat debugging, yhi-stack, whole program optimisation. Lots of these things are breeding grounds for various useful technologies, and most are marching towards genuine usefulness.<br />
<br />
== The Future ==<br />
<br />
The original structure of nhc was as one big set of modules – some were broadly divided into type checking/parsing etc, but the overall structure and grouping was weaker than in other compilers – particularly GHC which has a very well defined structure. One of our first actions was to split up the code into hierarchical modules, introducing Type.*, Parse.* etc to better divide the components.<br />
<br />
Some of the sections are shared – for example both hpc and Hat share the nhc Parse modules, and could equally share the Yhc parse modules. This direction is obviously attractive – by opening up more parts of the compiler for reuse in other projects we gain in many ways:<br />
<br />
* The authors of other projects benefit from having libraries to build upon, not needlessly replicating the functionality already present in Yhc.<br />
* We benefit from having more people focused and dependent on our code, resulting in a bigger community.<br />
* We get additional bug reports.<br />
* By structuring code as a library, “hacks” that before might have been tolerated quickly take on a whole new aura of ugliness.<br />
* By isolating each component, we minimize the amount of whole compiler knowledge required.<br />
<br />
This direction attracts us, and we see this as the future direction of our compiler.<br />
<br />
Interestingly, at almost the same time, the GHC developers introduced a GHC API. We do not wish to replicate the GHC API; in particular, their API is more complex than ours. We also wish to design an API first and then construct some future version of Yhc around it – rather than simply exporting an existing interface.<br />
<br />
For the next major Yhc version we would hope to rewrite the parser, type checker, renamer and desugarer – leaving only the new ByteCode and Core aspects still intact. This is clearly a lot of work, and we view this as a very long term plan. For the moment we hope to push the Core and ByteCode features of Yhc into new areas.<br />
<br />
In the meantime, things like library compatibility and Haskell' loom ever closer on the horizon. Our progress towards these goals relies on the help of volunteers.<br />
<br />
== Acknowledgements ==<br />
<br />
Thanks to everyone who has submitted a patch, become a buildbot, reported bugs or done anything else to benefit the Yhc project. We've put together a list of most of the people (if we've missed you, please shout, and we'll add your name in future versions of this document!)<br />
<br />
Andrew Wilkinson, Bernie Pope, Bob Davie, Brian Alliet, Christopher Lane Hinson, Dimitry Golubovsky, Gabor Greif, Goetz Isenmann, Isaac Dupree, Kartik Vaddadi, Krasimir Angelov, Malcolm Wallace, Michal Palka, Mike Dodds, Neil Mitchell, Robert Dockins, Samuel Bronson, Stefan O'Rear, Thorkil Naur, Tom Shackell</div>Mfn