Annotate Server Installation Guide - Part 2: optional modules.

This chapter describes how to install the optional modules for Annotate, and should be read after completing the basic install (with PDF support). Note that installing these modules is more complex than installing the basic Annotate server. You can install all, none or just a subset of these modules, depending on your requirements.

1. File conversion with Aspose

1.1 Download the Aspose module from this link

1.2 Create a new 'aspose' user

% adduser aspose

1.3 Create a new directory /var/lib/opus2/jetty-aspose owned by the aspose user

% mkdir -p /var/lib/opus2/jetty-aspose
% chown aspose:aspose -R /var/lib/opus2

1.4 Unzip the Aspose module into /var/lib/opus2/jetty-aspose, change ownership to the aspose user, then copy the init script to register the new systemd service and start it:

% chown aspose:aspose /var/lib/opus2/jetty-aspose/jetty-aspose-prod-1.3.zip
% cp init/jetty-aspose.service /lib/systemd/system
% systemctl enable jetty-aspose.service
% systemctl start jetty-aspose.service

1.5 To check it started correctly run:

% journalctl -f -u jetty-aspose.service

1.6 Copy 'scripts/oojettyconv.sh' into your scripts folder in the Annotate install dir

1.7 Edit the phpconfig.inc file and set "$ooshcommand" to point at the "oojettyconv.sh" script

2. Enabling the Apache user to run programs

Typically PHP scripts are run as the apache user, which is a restricted account with no home directory. On Ubuntu Linux, the default apache user is www-data; one way to find out which user apache runs as on your system is to type ps aux | grep httpd and see who owns the cluster of 'httpd' processes (on Ubuntu the processes are named 'apache2').
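As a rough sketch of that check, the function below tallies the non-root owners of 'httpd' or 'apache2' processes and prints the most frequent one. The function name apache_user_from_ps is made up for this example, and the usage line assumes a procps-style ps that accepts -eo user=,comm=.

```shell
# Hypothetical helper (not part of Annotate): print the most common
# non-root owner of httpd/apache2 processes.
# Expects "user command" pairs on stdin, one per line.
apache_user_from_ps() {
  awk '
    ($2 == "httpd" || $2 == "apache2") && $1 != "root" { count[$1]++ }
    END {
      best = ""
      for (user in count)
        if (best == "" || count[user] > count[best]) best = user
      if (best != "") print best
    }'
}

# On a live server (assumes a procps-style ps):
#   ps -eo user=,comm= | apache_user_from_ps
```

The root-owned parent process is skipped on purpose, since the worker processes are the ones that run PHP scripts.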

If you want to be able to run openoffice or firefox as the apache user (to support uploading Word documents via OpenOffice), you will need to create a home directory where these applications can store their profile settings.

# As root: first check if the apache user
# already has a home directory that it can write to:
#   N.B. if apache runs as a different user
#   e.g. 'www-data' you should replace 'apache' with 'www-data'
#   throughout this guide, e.g.
#     su www-data
#   
  % su apache
  $ cd
  $ touch tmp.tmp

# If this works, and the apache home directory is writable by
# the apache user, you can skip the rest of this section.
# If this fails, you need to set up a home directory
# for the apache user - the example below sets it
# to '/var/www/ahome':-

  % cd /var/www
  % mkdir ahome
  % chown apache ahome
  % chgrp apache ahome

# Enable home directory for apache:
  % vi /etc/passwd

# edit the entry for the apache user to allow logins
# and set the home dir, e.g.:
  apache:x:48:48:Apache:/var/www/ahome:/bin/bash

# Check you can su to the apache user now:
  % su apache
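
To confirm the passwd entry looks right, you can read it back. The sketch below extracts the home directory and login shell for a user from passwd-format lines; the function name passwd_home_shell is hypothetical, and on a live system the input would typically come from getent passwd.

```shell
# Hypothetical helper: print "home shell" for USER from passwd-format
# lines on stdin (colon-separated; home is field 6, shell is field 7).
passwd_home_shell() {
  awk -F: -v user="$1" '$1 == user { print $6, $7 }'
}

# On a live server:
#   getent passwd apache | passwd_home_shell apache
```

Where it is available, usermod -d /var/www/ahome -s /bin/bash apache makes the same change without editing /etc/passwd by hand.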
                    

At this point you should also change the settings in the configuration file in annotate/scripts/bashconfig.inc, which is included by the various scripts for running openoffice and firefox. A sample is provided in bashconfig-sample.inc which you need to copy to bashconfig.inc and then edit:

  % su annotate
  $ cd /var/www/html/annotate/scripts
  $ cp bashconfig-sample.inc bashconfig.inc
  $ vi bashconfig.inc

# as the annotate user, edit the settings in annotate/scripts/bashconfig.inc:

  APACHE_USER=apache
  APACHE_HOME=/var/www/ahome
                    

3. File conversion with OpenOffice

Install openoffice 4.1.4: www.openoffice.org. It will install itself to somewhere like /opt/openoffice4/program.

Many Linux distributions only offer older versions of openoffice via their standard install mechanism, so it is worth downloading directly from the openoffice download page. To fetch the openoffice 4.1.4 binary directly onto the server, you can follow the steps below:-

# go to: http://download.openoffice.org/other.html
# right-click on the Download link for the version you want (e.g. English-US, Linux RPM)
# and 'copy link location' - it is an HTTP redirect to a download link.
# Use this as the argument to curl to fetch soffice.tgz as root on the server:-

# Login as root...
  % cd /mnt/install/downloads          # or your chosen download directory
  % curl -L "https://sourceforge.net/projects/openofficeorg.mirror/files/4.1.4/binaries/en-US/Apache_OpenOffice_4.1.4_Linux_x86_install-rpm_en-US.tar.gz/download" -o soffice.tgz

# it's about 150 MB, so it might take a while to download...

  % tar xvfz soffice.tgz
  % cd en-US/RPMS     # 4.1.x archives unpack to en-US/; older 3.x archives used names like OOO300_m9_native_packed-1_en-US.9358
  % rpm -Uvih *.rpm

# - you may need to install gnome-vfs2 (on fedora: 'yum install gnome-vfs2')
# The default install location is:
#   /opt/openoffice4/program/soffice
                    

3.1 Optional: installing openoffice in a non-standard location

If you want to install openoffice in another directory rather than on top of any existing installation, you can use the steps described on: wiki.services.openoffice.org/wiki/Run_OOo_versions_parallel. You can also use this approach if you do not usually use the RPM package system (e.g. on Debian / Ubuntu).

# ==== Optional ====
# e.g. in your home directory:
  % sudo apt-get install rpm
  % mkdir oo 
  % cd oo
  % curl -L "https://sourceforge.net/projects/openofficeorg.mirror/files/4.1.4/binaries/en-US/Apache_OpenOffice_4.1.4_Linux_x86_install-rpm_en-US.tar.gz/download" -o soffice.tgz
  % mkdir TEMP
  % cd TEMP 
  % tar xvfz ../soffice.tgz
  % cd en-US/RPMS/    # 4.1.x archives unpack to en-US/; older 3.x archives used names like OOO300_m9_native_packed-1_en-US.9358
  % mkdir TEMP_ROOT
  % cd TEMP_ROOT
# extract the RPMs...  will make an opt/ subdir
  % for i in ../o*.rpm; do rpm2cpio "$i" | cpio -id; done

  % mv opt ~    # or where you want the installed version
# run it using (e.g.) 
  % ~/opt/openoffice4/program/soffice &
                    

3.2 Testing your openoffice installation:

# Check you can run openoffice as a normal user:
  % su annotate
  $ vi .bashrc
  export PATH=/opt/openoffice4/program:$PATH
  $ source .bashrc

# On your local machine, run 'xhost +' and check that your firewall
# can accept X connections on the normal TCP port (6000)
  $ export DISPLAY={your ip}:0.0
  $ soffice &

# on Ubuntu, the executable could be called 'ooffice' not 'soffice'
                    

3.3 Configuring Annotate to use openoffice:

There is a test document in annotate/scripts which you can try, but first you need to check the settings in annotate/scripts/bashconfig.inc (a sample is provided in bashconfig-sample.inc):

# As the annotate user, check the paths in the config file: scripts/bashconfig.inc:
  % su annotate
  $ cd /var/www/html/annotate/scripts          # ... where you installed annotate
  $ cp bashconfig-sample.inc bashconfig.inc    # ... if bashconfig.inc not present
  $ vi bashconfig.inc
  OOPATH=/opt/openoffice4/program
  OOEXE=/opt/openoffice4/program/soffice
  OOEXENAME=soffice.bin
  OOPYTHON=/opt/openoffice4/program/python

# If any of these are not correct, edit the bashconfig.inc file.
# The OOPYTHON setting points to the version of python which is
# bundled with the installation of openoffice from openoffice.org.
# For Ubuntu, you can change this to your standard python install
#  (e.g. /usr/bin/python).  You will need the python-uno package  
#  installed for calling openoffice.


# The following command should convert 'sample.doc' to '/tmp/sample.pdf'
  $ ./ooconv.sh sample.doc /tmp/sample.pdf
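
A quick way to confirm that the conversion produced a real PDF is to look at the first bytes of the output file, since PDF files begin with the signature "%PDF-". The helper below is a minimal sketch; the name is_pdf is an assumption made for this example and is not part of the Annotate scripts.

```shell
# Hypothetical helper: succeed if FILE exists and starts with the
# PDF signature "%PDF-".
is_pdf() {
  [ -f "$1" ] && [ "$(head -c 5 "$1")" = "%PDF-" ]
}

# Example:
#   ./ooconv.sh sample.doc /tmp/sample.pdf
#   is_pdf /tmp/sample.pdf && echo "conversion OK"
```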
                    

3.4 Running openoffice in server mode

The test conversion above started openoffice, converted a document, then killed the openoffice process. This can take a few seconds for each document. You can avoid the openoffice startup time by running it in server mode, listening to a socket for incoming documents. This also has the advantage that you can run openoffice as a separate user from the Apache one (e.g. you could create a new user just to run openoffice).

# Create the user you want to run openoffice as:
# (as root)
  % adduser openoffice

  % su openoffice

# (as the 'openoffice' user)
  $ cd /var/www/html/annotate/scripts
  $ ./oocron.sh                 # this starts up openoffice

# Check that the 'soffice.bin' process is running:
  $ ps aux | grep soffice

# Try converting a test file again a couple of times, running
# as the apache user:
# (as root)
  % su apache

  $ cd /var/www/html/annotate/scripts
  $ ./ooconv.sh sample.doc /tmp/test5.pdf
  $ ./ooconv.sh sample.doc /tmp/test6.pdf

# All being well, the second time should have been much
# faster, as you avoid the startup time of openoffice.

# You need to keep the openoffice process alive all the time,
# e.g. using a cron job run as your chosen openoffice user.
# Add an entry like the one below:
# as root...
  % su openoffice
  $ crontab -e
* * * * * bash /var/www/html/annotate/scripts/oocron.sh >/dev/null 2>&1
                    

While openoffice is running in server mode, the conversions from office formats should be much faster.
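
The keep-alive idea behind the cron entry can be sketched as follows. This is not the contents of oocron.sh, which ships with Annotate; it is a generic illustration, and the function name ensure_running and the example check and start commands are assumptions.

```shell
# Generic keep-alive sketch (not the real oocron.sh):
# run CHECK_CMD; if it fails, run START_CMD.
# Both arguments are command strings passed to sh -c.
ensure_running() {
  check_cmd="$1"
  start_cmd="$2"
  if sh -c "$check_cmd" >/dev/null 2>&1; then
    echo "already running"
  else
    echo "starting"
    sh -c "$start_cmd"
  fi
}

# Example (assumes pgrep is installed; the start command is a placeholder):
#   ensure_running "pgrep -x soffice.bin" "echo would start openoffice here"
```

Running a check like this every minute from cron is cheap, because the start command only runs when the check fails.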

3.5 Troubleshooting OpenOffice installs on Fedora, RedHat and CentOS

If you are installing on RedHat, Fedora or CentOS, check this blog post for a solution to a known bug with the yum installation system for openoffice, which can break the openoffice install if automatic updates are switched on.

If the cron job above is not starting the openoffice process properly, check /var/log/cron for messages; if you see entries like 'Error: PAM Access Problems', you may need to explicitly enable the cron daemon to run tasks as the openoffice user, with a line in /etc/security/access.conf.

3.6 Updating the php/phpconfig.inc file to enable openoffice support

To enable support for the office formats when you upload a document to your annotate server, edit your php/phpconfig.inc file as follows:

# Edit the setting in php/phpconfig.inc to point to the ooconv.sh script:
  % su annotate
  $ cd /var/www/html/annotate/php        # ... or your install directory
  $ vi phpconfig.inc
  $ooshcommand="/bin/bash /var/www/html/annotate/scripts/ooconv.sh";

# Test it out by uploading a short Word / openoffice file on your
# documents.php page.
                    

3.7 Using openoffice to convert uploaded images to PDF [new Dec 2009]

You can configure openoffice to convert uploaded image files to PDF and then use the same annotation interface as text documents (by default, image files are shown using the HTML annotation interface, in a separate frame). To set this up, add the line below to your phpconfig.inc file:

// Optional: Uncomment to convert uploaded images to pdf using OO 
  $convertUploadedImagesToPDF = 1;
                    

3.8 Installing Windows Fonts for OpenOffice on Linux

By default, an openoffice installation on Linux will not have access to the standard Windows fonts (Arial, Verdana etc), which can cause problems with the Word to PDF conversion for documents created on a Microsoft operating system. Unlike PDF files, Word files do not include the fonts they depend on, and assume the recipient has the relevant fonts installed. However, it is possible to install the Windows standard fonts on Linux which greatly improves the quality of generated PDFs from Word files.

# Install microsoft truetype fonts on Ubuntu / debian:
  % sudo apt-get install msttcorefonts
                    

This blog entry has details on installing TrueType fonts on Linux; another blog entry has notes on using the new MS Vista fonts on Linux. The basic steps for installing TrueType fonts and making them available to applications (including openoffice) are outlined below. On Windows systems, your fonts will be installed to a path like: C:\WINDOWS\Fonts\*.ttf. You will have to restart openoffice after installing fonts.

# Check you have the standard PostScript Type1 fonts installed:
# (e.g. on Fedora:)
  % yum install ghostscript-fonts

# Steps for installing additional TrueType fonts on Linux
  % cd /usr/share/fonts/truetype
  % mkdir myfonts
  % cd myfonts
# ... copy the *.ttf files to myfonts/
  % mkfontdir
  % fc-cache
                    

4. Enabling 'export PDF with notes'

There is Java code included to generate a PDF with the notes attached (from the Tools > Export PDF menu option). To enable this, you need to have installed Java on your server:

# as root...
(on ubuntu)
  % sudo apt-get install openjdk-8-jre-headless

(other linux distributions will have different package names)
                    

If 'java' isn't installed to the standard path, you can set the version of java to use with the $javaexe setting in php/phpconfig.inc:

// e.g. ... in phpconfig.inc:
  $javaexe = "/opt/jre1.8.0/bin/java";
                    

5. Set the initial tags available to new users

Each user account maintains a list of tags which have been used by that user, and these are used to populate the tags chooser for new notes. You can initialise this list for new user accounts by editing the text file 'php/inittags.txt' - the format is plain text, one line per tag.

  cd php
  vi inittags.txt
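
If you prefer to generate the file rather than edit it by hand, a minimal sketch is shown below. The function name write_init_tags and the tag names are placeholders chosen for this example.

```shell
# Write one tag per line to FILE, replacing its previous contents.
# The tag names used in the example are placeholders.
write_init_tags() {
  out_file="$1"
  shift
  printf '%s\n' "$@" > "$out_file"
}

# Example:
#   write_init_tags php/inittags.txt todo question typo
```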
                    

6. Enabling email notifications

To enable email notifications on the server (so users get sent an email when someone adds a comment to a document), you need to set up a regular cron job to check for news. There is a PHP script php/sendEmailNotifications.php in your installation which you can run by viewing it in your browser. To set up a cron job that fetches this URL every 10 minutes:

# as root...
  % su annotate
  $ crontab -e
*/10 * * * * /usr/bin/curl "http://www.yoursite.com/annotate/php/sendEmailNotifications.php" -o - >/dev/null 2>&1
                    

Note that your users will have to choose to switch on email notifications for their account - there is a link on the home page, and the account page lets you control detailed settings (e.g. for immediate, hourly or daily updates).

7. Advanced phpconfig.inc configuration settings

A number of installation settings are present in the phpconfig.inc file which can be used to change the standard behaviour of Annotate, and use your own logo / branding / messages. The basic settings are below; see your phpconfig.inc file for details of all the options.

// Optional: Change the default note edit/delete/content settings.
// $authorOnlyDelete = 1;   // Uncomment so doc owner can't delete others' comments.
// $authorOnlyEdit = 1;     // Uncomment so doc owner can't edit others' comments.
// $fixOnReply = 1;         // Uncomment to stop notes with replies being deleted.
// $anyEdit = 1;               // Uncomment to allow any viewer to edit others' comments.
// $allowJavascriptNotes = 1;  // Uncomment to allow javascript: urls in notes

// Optional: Customize the welcome message in the banner of home.php
// $todaysMessage = "Welcome to Annotate and hello world";

// Optional: Override the Annotate logo displayed in the 
// top left with your own logo. You can include html;
// use an absolute URL for images, e.g.:
// $customBannerLogo = "<img border='0' src='http://www.textensor.com/textensor-200.png' />";

// Optional: Don't send users emails on creating accounts.
// $noNewAccountEmail = 1;

// Optional: Don't give users a welcome document.
// $noSampleDocument = 1;  

// Optional: Customize the footer used when exporting PDFs with notes.
// The default footer just has the page number of the orig document.
//
// For a footer like this one uncomment the settings below:
//   "Page 1. {document title} - generated by user123 - notes by [joe,jill] - visit http://yoursite.com"
//
// $pdffooter_title       = 1; // add document title too.
// $pdffooter_generatedby = 1; // add who it was generated by.
// $pdffooter_annotators  = 1; // add annotators too.
// $pdffooter = " - visit http://yoursite.com"; 
                    

8. Support large document uploads

Annotate doesn't impose any file size limit for uploads itself - but there will be limits set in your "php.ini" apache/php configuration file. You can find what they are set to on your system, and where your php.ini config file is, by pointing your browser at a 'phpinfo.php' file which includes the line:

 <?php phpinfo(); ?>
                    

Relevant php.ini settings are: file_uploads, upload_max_filesize, max_input_time, memory_limit, max_execution_time, post_max_size. You may want to increase the default settings, e.g. to:

# Sample settings for php.ini:
  post_max_size=100M
  upload_max_filesize=100M
  max_execution_time=300
  max_input_time=60
                    

You will need to restart your web server for any changes to take effect - you can view a phpinfo.php file to check their values.
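
When choosing values, note that the PHP documentation advises that post_max_size be larger than upload_max_filesize, because the whole upload request has to fit inside the POST limit. The sketch below converts php.ini shorthand sizes to bytes so two settings can be compared; both function names are assumptions made for this example.

```shell
# Convert a php.ini shorthand size (e.g. 100M, 512K, 2G) to bytes.
php_size_to_bytes() {
  value="$1"
  number="${value%[KkMmGg]}"
  case "$value" in
    *[Kk]) echo $((${number} * 1024)) ;;
    *[Mm]) echo $((${number} * 1024 * 1024)) ;;
    *[Gg]) echo $((${number} * 1024 * 1024 * 1024)) ;;
    *)     echo "$number" ;;
  esac
}

# Succeed if POST_MAX (first argument) is at least UPLOAD_MAX (second).
post_limit_covers_upload() {
  [ "$(php_size_to_bytes "$1")" -ge "$(php_size_to_bytes "$2")" ]
}

# Example:
#   post_limit_covers_upload 8M 100M || echo "post_max_size is too small"
```

If you read the live values with the PHP command-line tool, bear in mind that it may load a different php.ini from the one used by apache, so the phpinfo.php page remains the reliable check.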

9. Backing up your documents and notes

All documents are stored in the docs/ folder; all notes are stored in the private/ folder. You should take regular backups of these folders, e.g. by running a cron job which uses the rsync tool to make an incremental remote copy on another server.
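
A minimal sketch of such a backup is shown below, assuming rsync is installed on both machines. The function name backup_annotate, the host name and the paths are placeholders; setting RSYNC=echo prints the commands instead of running them.

```shell
# Sketch of an rsync backup of the docs/ and private/ folders.
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the commands.
backup_annotate() {
  src="$1"     # Annotate install directory
  dest="$2"    # backup target: a local path or user@host:path
  "${RSYNC:-rsync}" -az "$src/docs/" "$dest/docs/" &&
  "${RSYNC:-rsync}" -az "$src/private/" "$dest/private/"
}

# Example (host name and paths are placeholders):
#   backup_annotate /var/www/html/annotate backup@backuphost:/backups/annotate
```

The -a option preserves permissions and timestamps and only transfers files that have changed, and -z compresses data in transit. The target directory (here /backups/annotate) needs to exist before the first run.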

10. Apache cache and cookie settings

Annotate has been designed to make use of client web browser caches to minimize the number of server requests for pages and notes. For Apache, a sample htaccess-cache.txt file is supplied which should be copied to annotate/.htaccess. This uses the mod_expires apache module to add an HTTP header to allow browsers to cache static content (such as page images). You need to make sure that the optional mod_expires module is enabled, so uncomment the lines below in your httpd.conf apache config file and restart the web server:

LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
                    

On Ubuntu, optional apache modules are enabled by creating links in /etc/apache2/mods-enabled/ that point to the files in /etc/apache2/mods-available/:

$ cd /etc/apache2/mods-enabled
$ ln -s ../mods-available/rewrite.load rewrite.load
$ ln -s ../mods-available/headers.load headers.load
$ ln -s ../mods-available/expires.load expires.load
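
To see which of these modules still need enabling, you can check for the .load links directly. The helper below is a sketch; the name missing_apache_mods is an assumption made for this example.

```shell
# Hypothetical helper: list the modules from the arguments that have
# no .load link in the given mods-enabled directory.
missing_apache_mods() {
  enabled_dir="$1"
  shift
  for mod in "$@"; do
    [ -e "$enabled_dir/$mod.load" ] || echo "$mod"
  done
}

# Example:
#   missing_apache_mods /etc/apache2/mods-enabled rewrite headers expires
```

On Debian-based systems the a2enmod command (e.g. a2enmod expires) creates these links for you; restart apache afterwards.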
                    

You can force a page reload at any time from a browser using shift-reload (on Firefox) or ctrl-reload (on IE), or clear your browser cache and then reload. The default .htaccess file is below:

ExpiresActive On
ExpiresDefault "access plus 1 week"
                    

10.1 Enabling compression

Configuring your apache server to serve up compressed versions of html and javascript speeds up the annotate server significantly as transfers of notes and code to the browser will be faster. You need to enable the mod_deflate apache module and edit your .htaccess file or httpd.conf settings as below: (sample provided in htaccess-cache-gzip.txt)

ExpiresActive On
ExpiresDefault "access plus 1 week"

<Files *.js>
SetOutputFilter DEFLATE
</Files>

<Files *.css>
SetOutputFilter DEFLATE
</Files>

<Files *.html>
SetOutputFilter DEFLATE
</Files>

<Files *.txt>
SetOutputFilter DEFLATE
</Files>
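
One way to confirm that compression is active is to request a file while advertising gzip support and look for a Content-Encoding header in the response. The helper below is a sketch that inspects response headers on stdin; the name is_compressed_response and the URL in the usage comment are placeholders.

```shell
# Hypothetical helper: succeed if the HTTP response headers on stdin
# include a gzip or deflate Content-Encoding.
is_compressed_response() {
  tr -d '\r' | grep -Eqi '^content-encoding:.*(gzip|deflate)'
}

# Example (replace the URL with a .js or .css file on your own server):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' \
#     "http://www.yoursite.com/annotate/path/to/file.js" \
#     | is_compressed_response && echo "compression enabled"
```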
                    

10.2 Cookies and embedding an iframe with IE

If you are embedding an Annotate panel in another application which is hosted on a different site from your Annotate server, you may encounter login problems with Internet Explorer, which by default blocks 3rd party cookies (such as the PHP session cookie) needed for logins to Annotate to work.

Solutions to this are:

  • (1) run annotate on the same server as your external web application
  • (2) run a proxy server to make it look to the browser like annotate is running on the same server (e.g. using mod_proxy and mod_rewrite)
  • (3) get your users to enable cookies from your annotate server on IE (they can do this by double-clicking on the 'no entry' icon in the browser footer);
  • (4) add a P3P http header to every message your web server returns.

For option (4), adding the P3P header, you need to enable the mod_headers optional module in httpd.conf (and restart your web server), and add the line below to your .htaccess file (a sample is included in htaccess-cookies.txt). The Microsoft IIS support site has instructions if you use the IIS web server.

Header append P3P: 'CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"'
                    

11. Creating a robots.txt file for search engines

Search engines will not index any of the documents uploaded to your annotate server unless you post a public link to the document on a website. You can also create a robots.txt file to prevent search engines indexing your content even if a link is published to the web. A sample is provided in robots-sample.txt, which you can edit and copy to the root of your web directory so it can be found as http://yoursite.com/robots.txt. Its contents are shown below:

User-agent: *
Disallow: /annotate/
                    

12. Custom storage locations for documents and notes

By default, documents are stored in the docs/ folder, and notes in the private/ folder of your annotate installation. It is possible to configure these, so documents are stored in any path on your system using the docsdir and privatedir phpconfig settings. This can be useful if you want to store on a network drive, or just separately from the rest of the annotate install. These must also be specified if running using the Quercus java servlet implementation of PHP rather than regular apache-php.

The one complexity is that if you move the docs/ folder, you also need to edit your web server configuration to ensure that static content from http://yoursite.com/annotate/docs/ is served from the new folder too.

# 
$docsdir    = "c:/test/resin-4.0.9/webapps/ROOT/annotate/docs/";
$privatedir = "c:/test/resin-4.0.9/webapps/ROOT/annotate/private/";

# ... or on linux:
# $docsdir    = "/var/disk123/docs/";
# $privatedir = "/var/disk123/private/";

# NB you also need to configure your web server to serve
# static content from http://yoursite.com/annotate/docs/
# from the docsdir.
                    

13. HTTPS install notes

You need to configure your web server with a certificate (e.g. for testing see this external guide). After this, you just need to set the path in phpconfig.inc to use https (see below), and access the server through an https: URL.

The HTTPS support has been tested on Linux with Apache; contact us if you run into any issues on other configurations.

# sample config for https: 
  $nnotatepath="https://yourserver.com/annotate";
                    

Questions / problems:

Please contact us at support@annotate.com.