Difference between revisions of "shared library"

From Lazarus wiki
(Added explanation of what shared library is; some grammar/spelling/text cleanup.)
Revision as of 12:45, 1 April 2012


A shared library is a compiled piece of code that can be shared and used by various programs. It provides functions and procedures that other programs can call. It is different from a static library (that is linked into an executable and becomes part of it) or an executable. When I say shared library here I mean both Linux .so and Windows .dll, unless I explicitly specify it, like "unix shared library" or "dll".

Currently there are a lot of shared library bug reports. However, to my knowledge there is no real documentation about how shared libraries work in combination with FPC. That is: how shared libraries work now, how they should work, and how Delphi treats them.

Let's start with a simple sketch of what forms Delphi supports, because we will of course try to be as compatible as reasonable (limited by the requirement to work on multiple platforms).

This article does not show Kylix details; please feel free to add Kylix details if that is still deemed relevant.


Delphi

Delphi to my knowledge knows three (or four) ways of dynamic linking.

  1. Create a standalone shared library (compiling a library unit without "runtime packages" selected).
    • This means the RTL will be linked into the shared library.
    • This also means the memory manager will be its own island. Using automated types in functions that communicate with it is not possible.
    • Use of classes is not possible: both program and shared library have their own copies of the VMT, which breaks e.g. the is and as operators.
  2. Create a standalone shared library (compiling a library unit without "runtime packages" selected), but while using unit sharemem.
    • Comparable with the previous case, but the memory manager is switched to a COM-compatible one. This means other components/programs that also switch their memory manager to sharemem can call functions that use automated types, because it doesn't matter who returns the block to the COM memory management system.
    • Note that AFAIK the COM memory manager is quite slow. This road is probably not desired unless you really want to mess with COM, or your componentization is more important than speed.
  3. Library packages. These are shared libraries for which all dependencies on the Pascal level are known (which units they contain and depend on), and which can be treated as parts of the main program that reside in a DLL.
    • The RTL is in a separate package (DLL), and both the main program and the package use it. Therefore there is only one copy of any unit, including system.
    • This also means there is only one memory manager, at least for the main program and the packages it uses. IOW shared libs that are not a package can still use their own RTL and memory manager.
    • A package can only depend on units in its own package or in other packages (separate compilation requirement).
    • Because no units are duplicated, there is only one copy of each VMT, making use of classes transparent.
    • Probably packages can also switch to sharemem, making them compatible with other systems using sharemem. Some way must be found to initialise sharemem as early as possible though. (Does Delphi do this? A possible test is to pass an ansistring created in an initialization section of a unit in a package to a different sharemem-using shared lib that is not a package.) To be determined.

Besides these, Delphi can also generate DLLs that are ActiveX components. (To be described/determined.)

Topics about library packages are mostly moved from this page to the separate packages article, to avoid confusing the discussion.

Linker namespaces

One of the big problems with porting dynamic libraries from Windows (e.g. DLLs) to Linux and FreeBSD is that Windows and OS X have a linker namespace per module (as in, per shared lib or binary), with export tables carefully governing exported symbols, whereas Linux and FreeBSD have only one single linker namespace.

This is a problem especially in multi-language projects.

I got some feedback on this from a FreeBSD hacker, who recommended looking into the ELF visibility attribute and/or API versioning:

  • http://people.freebsd.org/~deischen/symver/freebsd_versioning.txt

Note that while the last URL is about FreeBSD, it mentions Linux doing the same.
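
To make the ELF visibility suggestion a bit more concrete, here is a minimal sketch. It is written in C rather than Pascal because the attribute lives at the toolchain level, and it assumes GCC or Clang on an ELF platform. The macro and function names (API_EXPORT, API_LOCAL, mylib_...) are made up for illustration and do not come from FPC or from the article.

```c
/* Sketch only: assumes GCC or Clang. All names here are hypothetical. */
#if defined(__GNUC__)
#  define API_EXPORT __attribute__((visibility("default")))
#  define API_LOCAL  __attribute__((visibility("hidden")))
#else
#  define API_EXPORT
#  define API_LOCAL
#endif

/* Hidden: callable from every object file inside this library, but not
   exported from the shared object, so it cannot clash with a same-named
   symbol in another module sharing the single linker namespace. */
API_LOCAL int mylib_internal_scale(int x)
{
    return x * 2;
}

/* Default visibility: part of the library's public interface. */
API_EXPORT int mylib_double_plus_one(int x)
{
    return mylib_internal_scale(x) + 1;
}
```

When such a library is built with -fvisibility=hidden, hidden becomes the default and only the symbols marked API_EXPORT are exported, which comes close to the per-module export table that Windows provides.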

Sharemem implementation details

As said, sharemem switches the memory manager to a global one. Under Windows this is AFAIK the COM manager. On *nix a similar memory manager doesn't always exist (the Gnome and KDE component architectures might have something similar), so this is not guaranteed.

One could probably simply have all programs use cmem, which could be a "level 0" implementation for everything that runs in this process.

I explicitly mention this because it might be necessary to impose a custom initialisation order (independent of the OS shared lib initialisation) to allow the main program to initialise units, e.g. a different memory manager, right after the RTL (system unit) initialization but before other libs.

VMT duplication

The basic problem of VMT duplication is mostly support for the is operator and similar methods of TObject and TClass, like InheritsFrom. This is only solved for packages; IOW for the other two types named above, the is operator doesn't work across libraries/binaries.

In earlier discussions about packages, there was some confusion about this topic. People assumed that packages would somehow tap into the RTL VMTs in the main binary. I'm however pretty sure this is not the case, at least not in Delphi, since if you use packages, the RTL is always a package, too. Also, every dependency of a unit in a package must be in the same package or in a package it depends on. This means a unit (and thus the VMTs declared in it) exists only once in the greater program (main program + its packages).
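
Why a duplicated VMT breaks the is operator can be shown with a small model. This is a simplified sketch in C, assuming that an is check boils down to walking the parent chain and comparing VMT addresses. The struct layout and all names are invented for the illustration and are not the real FPC or Delphi VMT layout.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a VMT. */
typedef struct vmt {
    const char       *name;
    const struct vmt *parent;
} vmt;

typedef struct {
    const vmt *cls;   /* every object points at the VMT of its class */
} object;

/* Model of an "is" check: walk up the parent chain and compare addresses. */
int is_instance(const object *obj, const vmt *target)
{
    const vmt *c;
    for (c = obj->cls; c != NULL; c = c->parent)
        if (c == target)
            return 1;
    return 0;
}

/* Program and library each link their own RTL, so the "same" class
   exists twice, at two different addresses. */
static const vmt exe_TStream     = { "TStream", NULL };
static const vmt dll_TStream     = { "TStream", NULL };
static const vmt dll_TFileStream = { "TFileStream", &dll_TStream };

/* Object created in the library, checked against the program's copy. */
int cross_module_is(void)
{
    object o = { &dll_TFileStream };
    return is_instance(&o, &exe_TStream);
}

/* Same object, checked against the library's own copy. */
int same_module_is(void)
{
    object o = { &dll_TFileStream };
    return is_instance(&o, &dll_TStream);
}
```

In this model cross_module_is() yields 0 although the class names match, while same_module_is() yields 1. With packages there is only one copy of each unit, so only one address per class exists and the check works everywhere.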

Initialization and finalization sections and RTL

(The initialization order of units in packages is moved to the packages page).

The problem with libraries is usually that libraries are initialized as a whole. This can lead to problems with plugin units that modify RTL behaviour (cwstrings, cmem, FV drivers etc.).

For stand-alone libraries with internal copies of RTL to function, the RTL needs to be initialized, and the initialization section of the library needs to be called as well.

Roughly there are two options here:

  • Use (e.g. on ELF platforms) the ABI .init and .fini sections or similar constructs in other binary formats.
  • Leave the above initializers mostly empty sections, and enforce an own order using a set of own initializers and finalizers. The ELF init and fini sections merely register the real initializers.

Mixed forms are also possible, e.g. initialize standalone shared libraries via the initializer sections, but with packages use an own definition.
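
The second option can be sketched roughly as follows. This is a C model, assuming GCC or Clang, whose constructor attribute makes the loader call a function before main. The registry, the priorities and the three initializers are hypothetical and only stand in for what an RTL might do.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

struct init_entry {
    int     priority;   /* lower value runs earlier */
    init_fn fn;
};

#define MAX_INITS 16
static struct init_entry registry[MAX_INITS];
static size_t registry_len = 0;

/* Records the order in which the real initializers actually ran. */
static int trace[MAX_INITS];
static size_t trace_len = 0;

static void register_init(int priority, init_fn fn)
{
    if (registry_len < MAX_INITS) {
        registry[registry_len].priority = priority;
        registry[registry_len].fn = fn;
        registry_len++;
    }
}

/* The real initializers (hypothetical examples). */
static void init_memmgr(void)     { trace[trace_len++] = 0; }
static void init_widestring(void) { trace[trace_len++] = 1; }
static void init_plugin(void)     { trace[trace_len++] = 2; }

/* Loader-run hooks: they do no real work, they only register. */
__attribute__((constructor)) static void hook_plugin(void)     { register_init(2, init_plugin); }
__attribute__((constructor)) static void hook_memmgr(void)     { register_init(0, init_memmgr); }
__attribute__((constructor)) static void hook_widestring(void) { register_init(1, init_widestring); }

/* Called by the runtime at a moment of its own choosing: runs the
   registered initializers in ascending priority, whatever order the
   loader ran the hooks in. Returns the number of initializers run. */
int run_registered_inits(void)
{
    size_t i, j;
    for (i = 1; i < registry_len; i++) {          /* insertion sort */
        struct init_entry e = registry[i];
        for (j = i; j > 0 && registry[j - 1].priority > e.priority; j--)
            registry[j] = registry[j - 1];
        registry[j] = e;
    }
    for (i = 0; i < registry_len; i++)
        registry[i].fn();
    return (int)registry_len;
}

/* Which initializer ran at position i, or -1 if none did. */
int trace_at(size_t i)
{
    return i < trace_len ? trace[i] : -1;
}
```

The point of the design is that the order in which the OS loader fires the hooks no longer matters: the memory manager initializer runs first because the runtime says so, not because of link or load order.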

Shared Exe Memory manager

Lars says: regarding a single memory manager for BPL style packages: what about using the executable and exporting its memory manager? I have had problems trying to use CMEM but have successfully shared the same memory manager using SetMemoryManager/GetMemoryManager tricks.

See http://www.freepascal.org/contrib/delete.php3?ID=543 for an example of sharing memory between exe and dll without using CMEM. The Powtils CGI library also uses this trick for dynpwu.pas so that ansistrings can be used without any sharemem or cmem unit. The memory manager is exported from the executable or from a single library and shared with all the other DLLs/EXEs that connect to that single module.
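
The idea behind the SetMemoryManager/GetMemoryManager trick can be modelled in a few lines. This is not the code from the linked demo. It is a C sketch in which a memory manager is a record of function pointers (in FPC the real record type is TMemoryManager), and every name below is invented for the illustration. Each module counts the live blocks on its own heap, so a block that is allocated by one manager and released by another shows up as unbalanced books.

```c
#include <stdlib.h>

/* Hypothetical model of a memory manager record. */
typedef struct {
    void *(*get_mem)(size_t size);
    void  (*free_mem)(void *p);
} mem_manager;

/* Live block counters, one per "heap". */
static int host_live = 0;
static int lib_live  = 0;

static void *host_get(size_t n) { host_live++; return malloc(n); }
static void  host_free(void *p) { host_live--; free(p); }
static void *lib_get(size_t n)  { lib_live++;  return malloc(n); }
static void  lib_free(void *p)  { lib_live--;  free(p); }

/* By default every module has its own manager: its own island. */
static mem_manager host_mgr = { host_get, host_free };
static mem_manager lib_mgr  = { lib_get,  lib_free  };

/* Exported by the executable. */
void host_get_memory_manager(mem_manager *out)      { *out = host_mgr; }
/* Called inside the library when it is set up. */
void lib_set_memory_manager(const mem_manager *mgr) { lib_mgr = *mgr; }

/* The library hands out a block that the host releases later,
   as happens when a string crosses the module boundary. */
void *lib_make_buffer(size_t n) { return lib_mgr.get_mem(n); }
void  host_release(void *p)     { host_mgr.free_mem(p); }

/* Returns 1 if both heaps are balanced after one round trip. */
int balanced_after_round_trip(int share)
{
    mem_manager lib_default = { lib_get, lib_free };
    mem_manager m;
    void *p;

    host_live = 0;
    lib_live  = 0;
    lib_mgr   = lib_default;

    if (share) {
        host_get_memory_manager(&m);   /* export from the executable */
        lib_set_memory_manager(&m);    /* install in the library */
    }
    p = lib_make_buffer(16);
    host_release(p);
    return host_live == 0 && lib_live == 0;
}
```

Without sharing, balanced_after_round_trip(0) reports an imbalance (the library heap thinks a block is still live and the host heap has released one it never handed out). After the library installs the host's manager, balanced_after_round_trip(1) reports balanced books.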

Marco's answer: First, library packages have more features than just shared memory managers: unique VMTs (try to use the is operator in your example on a class created in the other lib/exe) and no need to handcraft proper initialization. One just groups units into libs, and the compiler does the rest. No special code required. So what must be done for (library) packages is different from (and not competing with) the more general shared library case. And that is what this page is about.

From this general "shared library" case, a subdivision can be separated that has a shared memory manager.

Note that there can be a bunch of memory managers used for this, and multiple units that work like sharemem. Nearly any memory manager applies. However it is logical to reserve the "sharemem" unit name for the memory manager that is the most compatible for inter-process work on the given platform (COM on Windows, cmem on Unix). So, I'm talking about the default case here, which doesn't mean that other people can't roll their own.

Also, in-compiler (RTL) support must be as universal as possible, and also work for libraries that are not loadlibrary-ed, but autoloaded at binary startup, and not require handcoding e.g. initialization. Also it mustn't stand in the way of handcoded solutions (like the one you reference), in other words it must be overridable, since otherwise in-compiler (RTL) support would stand in the way of what people build on top of FPC.

It would be useful, though, if somebody investigated whether it is possible to call exported symbols of the binary from an autoloaded library (for the Tier 1 targets: Linux/FreeBSD, Windows, OS X), because then a library could maybe indeed plug into the mother binary.

In short the first objective is to simply have a default scheme that works like Delphi's sharemem to put in the "sharemem" unit.

Marco, added later after some new insights

Moreover a difference is also if the units in the dll are accessed directly, or only over a separately (additionally) defined interface.

Lars says: Well, another idea: instead of exporting the memory manager from the exe, I was thinking about making an fpsharemem.dll that exports one single common FPC memory manager on each platform for all the programs and DLLs to use, instead of using CMEM, i.e. all units put fpsharemem.dll in their uses clause. In fact, that's actually what the demo does, but instead of a separate fpsharemem.dll it is just inside the single DLL included with the demo, along with all the other code. I.e. why use CMEM if we can use the Free Pascal memory manager? The Free Pascal memory manager is available on all platforms, isn't it, as long as a Free Pascal DLL is created for each platform?

I guess to answer my own questions it must have to do with COM/IPC (inter process communication) issues/compatibilities. Well anyway, still a very interesting discussion. Doing it by hand the way I did was just a demo to show the concept in action and to prove to people that ansistrings and automated types can be used in regular old DLLs.

Marco: If you have to make one memory manager mandatory, make it cmem, because otherwise you might not be able to interface with some shared libs. If you can somehow make it user selectable, it would be better. But keep in mind that it is already hard to pack two precompiled sets of units into a release (increasing to 200MB installed size or so), let alone more.

Roughly this scheme is what sharemem does, communication over COM/IPC to make sure that a second instance of a library doesn't instantiate its state twice. It avoids the DLL with the memmanager itself by using COM as memory manager. (Something that wouldn't work on Unix, since there is no such global memory manager).

I've thought about this in the past too, but once you start adding a few more things (like is operator support, and something end-user supportable for people that can't write a header to a DLL), you effectively end up with packages again. It is really not that much more at runtime, just a few tables to govern proper unit initialization.
