Introduction

About ulif.openoffice

ulif.openoffice is a Python package to support document conversions using LibreOffice/OpenOffice.org (LO).

It provides components to interact with a running LO-server for document conversions from office-type documents like .doc or .odt to HTML or PDF (to be extended). Using ulif.openoffice you can trigger such conversions from the command line, programmatically from Python, or via HTTP.

Furthermore, it provides a caching server that stores every converted document and delivers the stored result when the same document is requested again. Depending on your needs this can speed things up by a factor of 10 or more.
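The caching idea can be illustrated with a minimal sketch. This is not the cache manager shipped with ulif.openoffice; the names ConversionCache and convert_cached are hypothetical, and the sketch simply assumes that a result can be keyed by the source content plus the conversion options.

```python
import hashlib
import shutil
from pathlib import Path


class ConversionCache:
    """Toy file cache keyed by source content plus conversion options.

    Hypothetical illustration only, not the ulif.openoffice cache manager.
    """

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _key(self, src_path, options):
        # Hash the file content and the sorted options so that the same
        # document converted with different settings gets its own entry.
        digest = hashlib.sha256(Path(src_path).read_bytes())
        digest.update(repr(sorted(options.items())).encode("utf-8"))
        return digest.hexdigest()

    def get(self, src_path, options):
        """Return the cached result path, or None on a cache miss."""
        candidate = self.cache_dir / self._key(src_path, options)
        return candidate if candidate.exists() else None

    def store(self, src_path, options, result_path):
        """Copy a conversion result into the cache and return its path."""
        target = self.cache_dir / self._key(src_path, options)
        shutil.copyfile(result_path, target)
        return target


def convert_cached(cache, src_path, options, convert):
    """Run `convert` only if no cached result exists for this input."""
    cached = cache.get(src_path, options)
    if cached is not None:
        return cached
    result_path = convert(src_path, options)
    return cache.store(src_path, options, result_path)
```

Here `convert` stands for any callable that performs the real (slow) conversion and returns the path of the result; a second request for the same document and options never reaches it.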

Finally, a daemon (oooctl) is included that starts the LO server in the background and restarts it if it crashes.
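The restart-on-crash behaviour can be pictured roughly as follows. This is a simplified sketch and not the oooctl implementation: the function name supervise and its parameters are made up for illustration, and a non-zero exit status is assumed to mean "crashed".

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run `cmd` and restart it whenever it exits with a non-zero status.

    Hypothetical helper for illustration. Returns the number of times
    the process was started (initial start plus restarts).
    """
    starts = 0
    while True:
        starts += 1
        returncode = subprocess.call(cmd)
        if returncode == 0:
            # Clean shutdown: nothing to restart.
            break
        if starts > max_restarts:
            # Give up instead of restarting forever.
            break
        time.sleep(delay)
    return starts
```

A real daemon would additionally detach from the terminal and write a PID file; the loop above only shows the monitoring part.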

Sources

ulif.openoffice is hosted on:

where you can get the latest released versions.

Development can be tracked on GitHub:

The documentation can be browsed on:

Requirements

ulif.openoffice requires the unoconv executable to do the actual conversions. Current Debian-based distributions normally provide unoconv as an installable package.
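To check whether unoconv is usable on a system, something like the following sketch can help. It is not how ulif.openoffice calls unoconv internally; the helper names are hypothetical, and only the `-f` (output format) option of unoconv is assumed.

```python
import shutil
import subprocess


def build_unoconv_command(src_path, fmt="pdf", executable="unoconv"):
    """Return the argument list for a plain unoconv conversion."""
    return [executable, "-f", fmt, src_path]


def convert_with_unoconv(src_path, fmt="pdf"):
    """Convert `src_path` with unoconv, failing early if it is missing."""
    executable = shutil.which("unoconv")
    if executable is None:
        raise RuntimeError("unoconv executable not found on PATH")
    # check_call raises CalledProcessError if unoconv reports a failure.
    subprocess.check_call(build_unoconv_command(src_path, fmt, executable))
```

Calling `convert_with_unoconv("sample.doc")` on a machine without unoconv raises a RuntimeError instead of failing later with a less obvious message.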

ulif.openoffice is tested on Debian-based systems, most notably Ubuntu. It will probably fail miserably on Windows and there are no plans to change that.

The package is designed for server-based deployments. While the LO-server is running, you cannot use the office suite on your desktop (at least at the time of writing). This is a limitation of LO itself.

Overview

ulif.openoffice mainly provides six different components, of which four merely act as ‘frontends’ for the core functionality: a cmdline client, a RESTful WSGI application, a WSGI based XMLRPC application, and the respective API calls for use from Python programmes.

  • In addition to plain LibreOffice conversions, we provide a set of filters to modify office documents on the fly. We call these filters document processors. They can unzip incoming docs, zip results, extract CSS stylesheets from generated HTML into separate files, brush up generated HTML and much more. For each conversion you can specify which filters to apply and in what order.

    You can even register your own document processors and they will appear in the frontends (cmdline client, WSGI app, API calls).

  • An oooctl server that runs in the background, starts a local LO-server and monitors its status. If the LO server process dies, it is restarted by oooctl.

  • An oooclient commandline tool to trigger conversions.

    oooclient also supports use of a cache manager that stores converted documents and delivers the stored result when a converted version already exists.

  • A DocumentConverter WSGI application that acts as a REST server. You can send it documents via HTTP and get the converted documents back.

    The DocumentConverter also supports use of a cache manager that stores converted documents and delivers the stored result when a converted version already exists.

  • A WSGIXMLRPC application that is also a WSGI application but provides XMLRPC services. You can use it for instance via the standard Python xmlrpclib library (xmlrpc.client in Python 3).

  • A Python API to perform conversions from your own Python programmes.
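The idea of registering document processors and applying them in a chosen order can be sketched like this. The registry, decorator and processor names below are invented for illustration and do not reflect the actual ulif.openoffice API; the toy processors work on plain strings instead of office documents.

```python
# Hypothetical registry mapping processor names to callables.
PROCESSORS = {}


def register(name):
    """Decorator that makes a processor available under `name`."""
    def decorator(func):
        PROCESSORS[name] = func
        return func
    return decorator


@register("strip")
def strip_blank_lines(text):
    """Drop empty lines from the document text."""
    return "\n".join(line for line in text.splitlines() if line.strip())


@register("html")
def wrap_in_html(text):
    """Stand-in for a conversion step: wrap the text in minimal HTML."""
    return "<html><body><pre>%s</pre></body></html>" % text


def run_pipeline(text, order):
    """Apply the named processors to `text` in the given order."""
    for name in order:
        try:
            processor = PROCESSORS[name]
        except KeyError:
            raise ValueError("unknown processor: %r" % name)
        text = processor(text)
    return text
```

Because the order is given per call, `run_pipeline(doc, ["strip", "html"])` and `run_pipeline(doc, ["html", "strip"])` can yield different results, which is why the order of filters matters.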

The components play together roughly as shown in the following figure:

_images/overview.png

Fig. 1: Overview of ulif.openoffice components

The black arrows show the path of a source document (in .doc format) to the LibreOffice server and the return path of the converted document (PDF).

Use of the client API, the oooctl server and the cache is optional.

The LibreOffice server can run on a remote machine.