Converting Docs via XMLRPC

One of the included WSGI apps provides access to unoconv and the filters in this package via XMLRPC. More specifically, we provide a WSGI app that can be served by HTTP servers and then talks to XMLRPC clients, optionally caching result docs.

Setting Up the XMLRPC App With Paste

To run the included XMLRPC doc converter WSGI app we can use Paste. The required paster script can be installed locally with:

(py27) $ pip install PasteScript

Then we need a PasteDeploy compatible config file like the following xmlrpc.ini:

# xmlrpc.ini
# A sample config to run WSGI XMLRPC app with paster
[app:main]
use = egg:ulif.openoffice#xmlrpcapp
cache_dir = /tmp/mycache

[server:main]
use = egg:Paste#http
host = localhost
port = 8008

In the [app:main] section we tell Paste to serve the ulif.openoffice WSGI app xmlrpcapp. We also set a directory where cached documents may be stored. This entry (cache_dir) is optional: leave it out if you do not want result docs to be cached.

The [server:main] section tells Paste to start an HTTP server on localhost, port 8008. host can be set to any local hostname or IP address. Set it to 0.0.0.0 to make the server reachable on all IP addresses assigned to the current machine (but read the security notes below first!).
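If you generate the config from a script (for example in deployment tooling), a minimal Python 3 sketch with the standard configparser module could look like this. make_xmlrpc_ini is a hypothetical helper name, not part of ulif.openoffice; it simply reproduces the xmlrpc.ini shown above and drops cache_dir when no cache is wanted.

```python
import configparser
import io


def make_xmlrpc_ini(cache_dir=None, host="localhost", port=8008):
    """Return the text of a PasteDeploy config like xmlrpc.ini above.

    cache_dir is optional: pass None to omit the entry and disable caching.
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    app = {"use": "egg:ulif.openoffice#xmlrpcapp"}
    if cache_dir is not None:
        app["cache_dir"] = cache_dir
    config["app:main"] = app
    config["server:main"] = {
        "use": "egg:Paste#http",
        "host": host,
        "port": str(port),
    }
    # ConfigParser.write() needs a file-like object, so collect into a string.
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()
```

Writing the result to disk, e.g. with open('xmlrpc.ini', 'w') as f: f.write(make_xmlrpc_ini('/tmp/mycache')), gives a file that paster serve can use.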

You can now start an XMLRPC conversion server:

(py27) $ paster serve xmlrpc.ini

and start converting real office documents via XMLRPC on the configured host and port (here: localhost:8008).

While we use the Paste HTTP server here for demonstration, you are not bound to this choice. You can use any HTTP server capable of serving WSGI apps, including Apache and nginx (with the appropriate modules loaded).
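As a minimal sketch of serving the app without paster, assuming PasteDeploy is installed, the app can be loaded from xmlrpc.ini with paste.deploy.loadapp and handed to the standard library's wsgiref reference server. load_xmlrpc_app and make_wsgi_server are hypothetical helper names. wsgiref is single-threaded and meant for testing, so prefer Apache, nginx or a similar server in production.

```python
import os
from wsgiref.simple_server import make_server


def load_xmlrpc_app(ini_path):
    """Load the WSGI app defined in [app:main] of a PasteDeploy config."""
    # Imported lazily so this module can be loaded without PasteDeploy installed.
    from paste.deploy import loadapp
    return loadapp("config:" + os.path.abspath(ini_path))


def make_wsgi_server(app, host="localhost", port=8008):
    """Wrap any WSGI app in the single-threaded reference server from wsgiref."""
    return make_server(host, port, app)


if __name__ == "__main__":
    server = make_wsgi_server(load_xmlrpc_app("xmlrpc.ini"))
    server.serve_forever()
```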

Securing the XMLRPC App (optional)

The same security considerations that apply to the RESTful document converter also apply to the ulif.openoffice XMLRPC app. See Securing the Document Converter (optional) for details.

Converting Documents via XMLRPC

Once the server is running, we can start converting docs via XMLRPC. With the standard Python xmlrpclib module this is easy:

>>> server = ServerProxy('http://localhost:8008')

The ServerProxy class can be imported from xmlrpclib (Python 2.x) or from xmlrpc.client (Python 3.x).
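If your client code must run on both Python versions, a common pattern is to try the Python 3 module first and fall back to the Python 2 one. A minimal sketch; connect is a hypothetical helper name and the default URL matches the xmlrpc.ini example above.

```python
try:  # Python 3
    from xmlrpc.client import ServerProxy
except ImportError:  # Python 2
    from xmlrpclib import ServerProxy


def connect(url="http://localhost:8008"):
    """Return a proxy for the ulif.openoffice XMLRPC app at url."""
    # Creating the proxy does not open a connection yet; the first method
    # call (e.g. server.system.listMethods()) does.
    return ServerProxy(url)
```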

The ulif.openoffice XMLRPC server provides the following methods:

>>> server.system.listMethods()     
['convert_locally', 'get_cached', 'system.listMethods',
 'system.methodHelp', 'system.methodSignature']
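Because the server exposes the standard introspection methods, a client can also ask it for per-method documentation. A minimal sketch; describe_methods is a hypothetical helper, and what system.methodHelp() actually returns for each method depends on the server.

```python
def describe_methods(proxy):
    """Map each non-introspection method name to its help text."""
    docs = {}
    for name in proxy.system.listMethods():
        if name.startswith("system."):
            continue  # skip the generic introspection helpers
        docs[name] = proxy.system.methodHelp(name)
    return docs
```

Calling describe_methods(server) with the proxy created above returns a dict keyed by names such as convert_locally and get_cached.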

If the server is running on the same machine as the client, i.e. both components can access the same filesystem, then convert_locally() is the fastest method to convert documents via XMLRPC.

convert_locally() takes two arguments: the path to a source document and a dictionary of options:

>>> from pprint import pprint
>>> with open('sample.txt', 'w') as fd:
...     num = fd.write('Some Content')
>>> result = server.convert_locally('sample.txt', {})
>>> pprint(result)              
['/.../sample.html.zip',
 '78138d2003f1a87043d65c692fb3a64b_1_1',
 {'error': False, 'oocp_status': 0}]

The result is a list of three items: the result path, a cache key and a dict with metadata, i.e. [<PATH>, <CACHE_KEY>, <METADATA>].

The result path will be in a newly created directory.
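To keep client code readable, the three items can be unpacked into named fields. A minimal sketch; ConversionResult and convert_file are hypothetical names. It also passes an absolute source path, on the assumption that the server resolves relative paths against its own working directory, which may differ from the client's.

```python
import os
from collections import namedtuple

# Hypothetical container for the [<PATH>, <CACHE_KEY>, <METADATA>] triple.
ConversionResult = namedtuple("ConversionResult", ["path", "cache_key", "metadata"])


def convert_file(proxy, path, options=None):
    """Convert a local file via convert_locally() and unpack the result."""
    # Absolute path, in case the server's working directory differs (assumption).
    result = proxy.convert_locally(os.path.abspath(path), options or {})
    return ConversionResult(*result)
```

With the proxy from above, convert_file(server, 'sample.txt').path gives the result path and .cache_key the cache key.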

Note

It is up to you to remove the result directory after usage.
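A minimal sketch of that cleanup, assuming you want to keep a copy of the result and discard everything else: copy the file somewhere you control, then delete the directory that was created for it. fetch_and_clean is a hypothetical helper name. Use it only for paths returned by convert_locally(), never for paths returned by get_cached().

```python
import os
import shutil


def fetch_and_clean(result_path, dest_dir):
    """Copy a convert_locally() result into dest_dir, then remove its result directory.

    Only for paths returned by convert_locally(): paths from get_cached()
    live inside the cache and must never be deleted.
    """
    target = os.path.join(dest_dir, os.path.basename(result_path))
    shutil.copy2(result_path, target)
    # The result file sits in a newly created directory of its own, so
    # removing its parent directory removes everything the conversion produced.
    shutil.rmtree(os.path.dirname(result_path))
    return target
```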

Here the result is a ZIP file that bundles the HTML document with any generated CSS stylesheets, images, etc. You can retrieve a non-zipped version by setting options to something like:

{'oocp-out-fmt': 'html', 'meta-procord': 'oocp'}

which tells the converter to run only the core converter (no post processing, etc.) and to generate HTML output.
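If you keep the default zipped output instead, the standard zipfile module can unpack it. A minimal sketch; unpack_html_zip is a hypothetical helper name, and it assumes the HTML files inside the archive end in .html.

```python
import os
import zipfile


def unpack_html_zip(zip_path, dest_dir):
    """Extract a zipped HTML result (HTML plus CSS, images, ...) into dest_dir.

    Returns the paths of the extracted .html files.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # extractall() strips absolute paths and ".." components from member names.
        zf.extractall(dest_dir)
        names = zf.namelist()
    return [os.path.join(dest_dir, n) for n in names if n.endswith(".html")]
```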

The cache key is None if the XMLRPC server was configured without a cache, i.e. without a cache_dir entry in xmlrpc.ini.

The metadata dict mainly contains information about errors that happened during processing. You can normally ignore it, as failed conversions are signalled by an xmlrpclib.Fault (xmlrpc.client.Fault in Python 3).
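In client code this means wrapping the call in a try/except for Fault. A minimal sketch; try_convert is a hypothetical helper name, and the exact faultCode and faultString values the ulif.openoffice server sends for a failed conversion are not documented here.

```python
try:  # Python 3
    from xmlrpc.client import Fault
except ImportError:  # Python 2
    from xmlrpclib import Fault


def try_convert(proxy, path, options=None):
    """Return the convert_locally() result, or None if the server reports a Fault."""
    try:
        return proxy.convert_locally(path, options or {})
    except Fault as err:
        # faultString is set by the server; log it rather than parsing it.
        print("conversion failed: %s" % err.faultString)
        return None
```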

To produce different results, you can pass in a different options dict. In the example above we simply used the default (an empty dict), but we can also produce a PDF file:

>>> options = {'oocp-out-fmt': 'pdf', 'meta-procord': 'oocp'}
>>> result = server.convert_locally('sample.txt', options)
>>> pprint(result)             
['/.../sample.pdf',
 '78138d2003f1a87043d65c692fb3a64b_1_2',
 {'error': False, 'oocp_status': 0}]

Here we used the options oocp-out-fmt and meta-procord. The former tells LibreOffice to produce PDF output and the latter tells the converter to run only the oocp processor.

See ulif.openoffice.processor for the names and options of the different document processors. You can also run the command-line client:

(py27) $ oooclient --help

to get a list of all supported options. Please note that option keys must be provided without leading dashes.
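If you already have options in command-line form, turning them into an options dict is a matter of stripping the dashes. A minimal sketch; cli_to_options is a hypothetical helper, and whether oooclient spells a given option with one or two dashes should be checked against oooclient --help.

```python
def cli_to_options(pairs):
    """Turn command-line style (key, value) pairs into an XMLRPC options dict.

    Option keys are sent without their leading dashes, e.g.
    ('--oocp-out-fmt', 'pdf') becomes {'oocp-out-fmt': 'pdf'}.
    """
    return {key.lstrip("-"): value for key, value in pairs}
```

For example, cli_to_options([('--oocp-out-fmt', 'pdf'), ('--meta-procord', 'oocp')]) yields the options dict used in the PDF example above.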

Retrieving Cached Docs via XMLRPC

Besides converting new docs, we can also retrieve already cached docs via XMLRPC using the get_cached() method. For this we need the cache key returned in a conversion result.

>>> result = server.get_cached('78138d2003f1a87043d65c692fb3a64b_1_2')
>>> result                      
'/.../sample.pdf'

Of course, this works only if the XMLRPC server runs on the same machine as the client, but the lookup is considerably faster than a new conversion.
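A client can combine both methods: try the cache first and convert only on a miss. A minimal sketch; get_or_convert is a hypothetical helper, and it assumes get_cached() returns a false value or raises a Fault for unknown keys, which you should verify against your server. It also reports where the path came from, because a cached file must not be modified while a fresh result directory must be removed by the caller.

```python
try:  # Python 3
    from xmlrpc.client import Fault
except ImportError:  # Python 2
    from xmlrpclib import Fault


def get_or_convert(proxy, path, options, cache_key=None):
    """Return (result_path, cache_key, from_cache), trying the cache first."""
    if cache_key:
        try:
            cached = proxy.get_cached(cache_key)
        except Fault:
            cached = None  # assumed: unknown keys may be reported as a Fault
        if cached:
            # Lives inside the cache: copy it, never modify or delete it.
            return cached, cache_key, True
    result_path, new_key, _metadata = proxy.convert_locally(path, options)
    # Lives in a fresh directory that the caller must remove after use.
    return result_path, new_key, False
```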

Note

The result path is located inside the cache! The result file is therefore part of the cache and should not be modified. Instead, copy the file to a location outside the cache, or your cache will get corrupted.
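A minimal sketch of taking such a copy with the standard library; copy_from_cache is a hypothetical helper name. Only the copy should be edited afterwards.

```python
import os
import shutil


def copy_from_cache(cached_path, dest_dir):
    """Copy a get_cached() result out of the cache so it can be modified safely."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    target = os.path.join(dest_dir, os.path.basename(cached_path))
    # copy2 copies the file data plus timestamps; the cached original stays untouched.
    shutil.copy2(cached_path, target)
    return target
```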