fpvectorial - Text Document Support

From Lazarus wiki

(This page is currently under construction. I can't find how to hide this page while I'm working on it, so if you stumbled across it, move along, nothing to see here :-))

FPVectorial - Document Support

Background

As the name suggests, FP Vectorial was originally created as a vector-based image library. Document support was added because the existing architecture already had the concept of a Document containing Pages and was easy to extend. It sits beside the existing image handling classes. Vector-based images are differentiated from office documents by the type of page added to TvVectorialDocument. At this time, two different page types can be added to TvVectorialDocument:

  • TvPageSequence: This is used by all office document writers.
  • TvPageVectorial: This is used for Vector Image writers, and is currently ignored by the office document writers.

Currently there are two office document writers.

  • TODTVectorialWriter: for producing .odt files suitable for opening in OpenOffice and LibreOffice
  • TDOCXVectorialWriter: for producing .docx files suitable for opening in Microsoft Office 2007 onwards

Instead of focusing on the requirements of each individual file format, Office Document support inside FP Vectorial was added by creating and implementing a Document Class Hierarchy. It is then up to each individual reader and writer to interpret this class hierarchy.

If you want to produce an ODT file by concentrating on the File Format, then the ODT Support implemented by dgaparry and made available on the forums is recommended. This allows fine control over any resulting ODT file. <TODO: Include Forum Link, get Gary's name correct, and get his library name correct>

Roadmap

Functionality             | OpenDocument (ODT)                   | Office Open XML (DOCX)
File Version              | ODT 1.2 with extensions              | ECMA-376 1st Edition (2006)
Supported Style Types     | Paragraph and Text-Span              | Paragraph and Text
Table Support             | Yes                                  | Yes
List Support              | Partially Implemented (Bullets only) | Partially Implemented (Bullets only)
Multiple Headers/Footers  | Not yet                              | Yes
Tables in Header/Footer   | Not yet                              | Not tested
Image Support             | Not yet                              | Not yet
Images in Header/Footer   | Not yet                              | Not yet
FPVectorial Image Support | Not yet                              | Not yet
Tab Stops                 | Not yet                              | Not yet
Document Fields           | Not yet                              | Not yet
Meta Data                 | Partially Implemented                | Not yet
Table of contents         | Not yet                              | Not yet
Footnotes                 | Not planned                          | Not planned
Review/Revision           | Not planned                          | Not planned
Bookmarks / Hyperlinks    | Not planned                          | Not planned
Comments                  | Not planned                          | Not planned

Hello World Example

The following code will produce a "Hello World" document. <TODO:Write the code>

Basic Concepts:

  • A single FPVectorial Document consists of a series of Page Sequences.
  • Each Page Sequence can have its own Header and Footer, and its own Page Setup (size, orientation).
  • Text, Tables and Images are added to the Page Sequence. In this way the document is built up.
  • FPVectorial has no concept of how many pages are in a document, only the number of Page Sequences. A large multipage document may have only a single Page Sequence.
  • A single Page Sequence can have multiple Paragraphs added.
  • Text is added inside Paragraphs. Paragraphs consist of multiple text runs. A single Paragraph can have a single Paragraph Style. Each text run can have an optional Text Style applied (allowing, for instance, the bolding of individual words in a paragraph).
  • All Paragraph and Text Styles must be defined before being used. FP Vectorial supports style inheritance. Microsoft Office, OpenOffice and LibreOffice allow styles to be only partially defined, but each office suite applies its own set of defaults for any missing properties. If it is critical that the document look identical in each of the office suites, then the Styles should be fully defined.
  • A default set of Styles can be added to the FP Vectorial Document by calling AddDefaultStyles.
  • Headings are Paragraph Styles, with additional properties covering Heading Level and numbering.
  • Tables consist of a collection of rows.
  • Table rows consist of a collection of cells.
  • Tables can add optional column information. This must be provided if merged cells are being used.
  • Any Table Cell can support any document object, including multiple paragraphs, images and even nested Tables.
  • Headers and Footers are built up in an identical manner to a Page Sequence.
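The containment and style-inheritance rules in the Basic Concepts above can be modelled in a few lines. The following is a conceptual sketch in Python rather than Free Pascal. None of the names in it (Style, TextRun, Paragraph, PageSequence, Document, add_text and so on) belong to FPVectorial; they are hypothetical and only illustrate how the pieces relate.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Style:
    """Hypothetical style: unset properties fall back to the parent style."""
    name: str
    parent: Optional["Style"] = None
    props: dict = field(default_factory=dict)

    def resolve(self, key, default=None):
        # Walk up the inheritance chain until some style defines the property.
        style = self
        while style is not None:
            if key in style.props:
                return style.props[key]
            style = style.parent
        return default


@dataclass
class TextRun:
    text: str
    style: Optional[Style] = None  # optional text style, e.g. bold


@dataclass
class Paragraph:
    style: Style  # a paragraph carries exactly one paragraph style
    runs: list = field(default_factory=list)

    def add_text(self, text, style=None):
        run = TextRun(text, style)
        self.runs.append(run)
        return run


@dataclass
class PageSequence:
    orientation: str = "portrait"
    header: list = field(default_factory=list)  # built up like the body
    footer: list = field(default_factory=list)
    body: list = field(default_factory=list)

    def add_paragraph(self, style):
        para = Paragraph(style)
        self.body.append(para)
        return para


@dataclass
class Document:
    styles: dict = field(default_factory=dict)
    sequences: list = field(default_factory=list)

    def add_style(self, style):
        # Styles are registered up front, before any content uses them.
        self.styles[style.name] = style
        return style

    def add_page_sequence(self):
        seq = PageSequence()
        self.sequences.append(seq)
        return seq


doc = Document()
body = doc.add_style(Style("Body", props={"font": "Arial", "size": 10}))
heading = doc.add_style(Style("Heading 1", parent=body,
                              props={"size": 16, "level": 1}))
bold = doc.add_style(Style("Bold", props={"bold": True}))

seq = doc.add_page_sequence()
seq.add_paragraph(heading).add_text("Hello World")
para = seq.add_paragraph(body)
para.add_text("Plain text and ")
para.add_text("bold text", bold)
```

In this model the heading style defines only its size and level, so its font is inherited from the body style. A writer that leaves such gaps unresolved hands the decision to each office suite's defaults, which is why fully defined styles are recommended when identical rendering matters.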


Code Examples

Styles

Complex Paragraph

Bullet Lists

Numbered Lists

Simple Table

Table with Merged Cells


Known Issues

  • List support is currently experimental.
  • Table support in the ODT writer results in large file sizes, because a style is created for each individual cell even if multiple cells are identically formatted. The same applies to row styles and column styles. To resolve this, Cell, Row and Column Styles could be normalised. Alternatively, the entire table formatting architecture could be rewritten to force the end user to create and apply the styles themselves. That would require DOCX table support to be rewritten for the new architecture, with specific code added to the DOCX writer to interpret the new FP Vectorial Table Styles.
  • ODT Writer produces files that cannot be opened in Word 2010. The cause still needs investigating.
  • File MIMETYPE in ODTDocument should not be compressed. Currently the clNone compression type is ignored by TZipper (a possible cause of Word 2010 rejecting existing ODT files). A patch exists, and it needs investigating why it has not been added to FPC trunk.
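For reference, the OpenDocument package format expects the mimetype entry to be the first entry in the zip archive and to be stored without compression. The sketch below shows that layout using Python's standard zipfile module. It is not TZipper and not the ODT writer's code: write_odt_package is a hypothetical helper, and the archive it produces is not a complete ODT file (it has no manifest or real content).

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def write_odt_package(target, parts):
    """Hypothetical helper: write a zip whose first entry is a stored mimetype.

    `parts` maps entry names to bytes; these entries are deflated as usual.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and is stored, not deflated.
        zf.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in parts.items():
            zf.writestr(name, data)  # falls back to the archive default


# Build the archive in memory and inspect how each entry was written.
buffer = io.BytesIO()
write_odt_package(buffer, {"content.xml": b"<placeholder/>" * 50})

with zipfile.ZipFile(buffer) as zf:
    first = zf.infolist()[0]
    content_info = zf.getinfo("content.xml")
```

Checking first.compress_type against zipfile.ZIP_STORED on a file produced by the ODT writer is one quick way to confirm whether the clNone setting was honoured.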


TODO

  • Comprehensive testing, including opening the files in as many word processors and office suites as possible.
  • DOCX and ODT Readers (volunteers required)
  • Add PDF reader/writer (current PDF support is for Vector Image support only)
  • Add HTML reader/writer
  • Add RTF reader/writer
  • Produce well documented examples and store in FPVectorial/Documents
  • Code Documentation for distributing with Lazarus

The following needs adding to FP Vectorial, then implementing in each writer:

  • Raster Image Support
  • Tab Stops
  • Document Fields/Meta Data (Page Count, Page Number, Date Created, Author)
  • Table of contents

The following needs revisiting:

  • List support is only partially implemented in FP Vectorial, ODT Writer and DOCX Writer. The design needs finishing and then re-implementing in the two writers.

The following needs implementing in ODT Writer:

  • Support for Headers/Footers and for Tables within Headers/Footers. This requires a significant refactoring of ODT Writer. Header and Footer content is stored in styles.xml, not in content.xml, and all current text and table support can only produce content in content.xml.
  • Add Vectorial Image support to the writers. It seems silly to use the same architecture as a powerful vector image library and then not allow vector images to be added to the resulting office documents.