Updated - Best Practices - Backup and Recovery

Introduction

This document introduces best practices for backing up and restoring Pentaho BI and DI Servers. Because environments vary widely, it covers only a general approach.

 

Software    Version(s)
Pentaho     5.4, 6.x, 7.x

 

Back Up the Pentaho or BA Server

This section explains the recommended backup procedure for a single Pentaho or BA Server. The server is composed mainly of three configuration and deployment containers (the web application files, the pentaho-solutions folder, and the database repositories), plus some additional files and settings. If you have several servers, apply the procedure to each one.

  • For web application files using Tomcat:
    • tomcat/webapps/pentaho
    • tomcat/webapps/pentaho-style
    • tomcat/webapps/sw-style
  • For web application files using JBoss:
    • jboss-xxx/standalone/deployments/pentaho.war
    • jboss-xxx/standalone/deployments/pentaho-style.war
    • jboss-xxx/standalone/deployments/sw-style

The Pentaho installation contains system configuration, plugins, files, and folders that can be modified during Pentaho use. The default folder’s name is pentaho-solutions.

There are three Pentaho database repositories that store operational data and system data:

  • Jackrabbit
  • Hibernate
  • Quartz

Pentaho’s Jackrabbit database requires synchronization with local files stored in the pentaho-solutions/system/jackrabbit/repository folder.

There are other tools, software, and scripts that also require a backup, in addition to the server operational files:

  • License-installer tool and the .installedLicenses.xml file
  • Pentaho Start and Stop scripts

Pentaho stores some information in the user's home directory by default. This location can be changed at installation time by setting environment variables. By default, the hidden folders .pentaho, .kettle, and .pentaho-meta are created in the user's home directory; include them in the backup.

Note: We recommend stopping the server before doing a backup. This guarantees that the information in the database is in sync with the local files stored in the pentaho-solutions repository folders.

If you are unable to stop the server, back up the database repositories before you back up anything in the pentaho-solutions folder. Pentaho does not provide any backup scripts or software for this operation.
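As a quick pre-backup check, the file locations named in this section can be verified with a small shell function. This is only a minimal sketch, assuming a Tomcat-based install in which one server directory contains both tomcat and pentaho-solutions. The function name and its two arguments are illustrative and not part of Pentaho.

```shell
# Report which of the file locations named in this section exist.
# Assumptions: $1 is the server directory that contains both "tomcat" and
# "pentaho-solutions"; $2 is the home directory of the user running Pentaho.
list_backup_paths() {
    server_dir=$1
    user_home=$2
    for p in \
        "$server_dir/tomcat/webapps/pentaho" \
        "$server_dir/tomcat/webapps/pentaho-style" \
        "$server_dir/tomcat/webapps/sw-style" \
        "$server_dir/pentaho-solutions" \
        "$user_home/.pentaho" \
        "$user_home/.kettle" \
        "$user_home/.pentaho-meta"
    do
        if [ -e "$p" ]; then
            echo "found   $p"
        else
            echo "MISSING $p"
        fi
    done
}
```

A path reported as MISSING may simply mean the install uses a customized location, so treat the output as a prompt to check rather than as an error.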

Incremental Backup for the Pentaho or BA Server

An incremental backup of the BA Server covers the database repositories and the files in the pentaho-solutions folder.

Note: We recommend daily incremental backups for Pentaho solutions and Pentaho database repositories, and weekly full backups of the Pentaho web application, Pentaho solutions, and database repositories.

Here are the steps for an incremental backup:

  1. Stop the Server.
  2. Perform an incremental backup for the Pentaho database repositories.
  3. Perform an incremental backup for the pentaho-solutions folder. If the system is installed as a cluster, back up the pentaho-solutions folder on each node.
  4. Start the Server.
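The stop, back up, and start sequence above could be scripted roughly as follows. This is a sketch under several assumptions: GNU tar is available (its --listed-incremental option tracks changes between runs), the server directory contains stop-pentaho.sh, start-pentaho.sh, and pentaho-solutions, and the database step is left as a placeholder because it depends on the database product. All function names are illustrative. The script only prints commands unless RUN=1 is set.

```shell
# Dry-run by default: commands are printed, not executed. Set RUN=1 to execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Placeholder (hypothetical): replace with your database's own incremental
# backup tooling for the repository databases.
backup_databases() {
    echo "+ (database-specific incremental backup goes here)"
}

# $1 = server directory (assumed to hold stop-pentaho.sh, start-pentaho.sh,
#      and pentaho-solutions); $2 = directory that receives the archives.
incremental_backup() {
    server_dir=$1
    backup_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)

    run "$server_dir/stop-pentaho.sh" || return 1
    backup_databases
    # GNU tar records file state in the .snar snapshot file, so each later
    # run archives only what changed since the previous run.
    run tar --listed-incremental="$backup_dir/solutions.snar" \
        -czf "$backup_dir/solutions-$stamp.tar.gz" \
        -C "$server_dir" pentaho-solutions
    run "$server_dir/start-pentaho.sh"
}
```

On the first run there is no snapshot file yet, so GNU tar produces a full archive; deleting the snapshot file before the weekly run is one way to start a new full-plus-incremental cycle.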

Full Backup for the Pentaho or BA Server

Full backups require the backup of the Pentaho database repositories, Pentaho solutions folder, and Pentaho web application folder.

Keep in mind that the server stores temporary information in pentaho-solutions/tmp that needs to be backed up. The article Manual Cleanup of the /tmp Directory has tips for this process.

Here are the steps for a full backup:

  1. Stop the Server.
  2. Back up the Pentaho database repositories, using the backup utilities of the database that hosts them.
  3. Back up the Pentaho solutions folder.
  4. Back up the Pentaho web application folder.
  5. Back up the other Pentaho folders: the license files, the start and stop scripts, and the hidden folders in the user home.
  6. Start the Server.

Note: If the system is installed as a cluster, back up the pentaho-solutions folder and the Pentaho web application folder on each node.

The default locations are the pentaho-solutions directory and the tomcat/webapps/pentaho directory, but they may have been customized during installation.
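The file-based part of a full backup can be captured in a single archive. The sketch below assumes the default Tomcat layout and the default hidden folders described earlier, takes absolute paths, and should be run while the server is stopped. The database repositories are not included and still need the database's own utilities. The function name and arguments are illustrative.

```shell
# Archive the file-based parts of a full backup into one tar.gz.
# $1 = server directory, $2 = home directory of the Pentaho user,
# $3 = directory that receives the archive. Use absolute paths.
full_file_backup() {
    server_dir=$1
    user_home=$2
    backup_dir=$3
    archive="$backup_dir/pentaho-full-$(date +%Y%m%d).tar.gz"

    # Build the tar argument list, skipping anything this install lacks.
    set --
    for rel in pentaho-solutions tomcat/webapps/pentaho \
               tomcat/webapps/pentaho-style tomcat/webapps/sw-style; do
        if [ -e "$server_dir/$rel" ]; then
            set -- "$@" -C "$server_dir" "$rel"
        fi
    done
    for rel in .pentaho .kettle .pentaho-meta; do
        if [ -e "$user_home/$rel" ]; then
            set -- "$@" -C "$user_home" "$rel"
        fi
    done

    if [ "$#" -eq 0 ]; then
        echo "nothing to back up" >&2
        return 1
    fi
    # Print the archive path on success so callers can record it.
    tar -czf "$archive" "$@" && echo "$archive"
}
```

For a cluster, the same function can be called once per node with that node's directories.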

Back Up the DI Server (Pentaho 5.4 or 6.x only)

This section explains the recommended backup procedure for a single DI Server.

Note: You can still set up separate DI-only servers with Pentaho 7.x, but the repositories are then named jackrabbit, hibernate, and quartz, without the di_ prefix.

The DI Server is composed mainly of three configuration containers and three deployment containers. If you have several servers, apply the procedure to each one.

  • For web application files using Tomcat:
    • tomcat/webapps/pentaho-di
    • tomcat/webapps/pentaho-style
  • For web application files using JBoss:
    • jboss-xxx/standalone/deployments/pentaho-di.
    • jboss-xxx/standalone/deployments/pentaho-style.war

The Pentaho installation contains system configuration, plugins, files, and folders that can be modified during Pentaho use. The default folder’s name is pentaho-solutions.

There are three Pentaho database repositories that store operational data and system data:

  • di_jackrabbit
  • di_hibernate
  • di_quartz

Note: Pentaho’s di_jackrabbit database requires synchronization with local files stored in the pentaho-solutions/system/jackrabbit/repository folder.
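Because the repository names depend on the server type and version, a backup script may want to derive them instead of hard-coding them. The following sketch encodes only the naming rule stated in this document (di_ prefix for DI Servers on 5.4 and 6.x, no prefix otherwise). The function name and argument convention are illustrative.

```shell
# Print the repository database names for a server type and Pentaho version.
# $1 = "ba" or "di"; $2 = version string such as 5.4, 6.1, or 7.0.
repository_databases() {
    prefix=""
    case "$1:$2" in
        di:5.*|di:6.*) prefix="di_" ;;   # DI Server on 5.4 or 6.x
    esac
    for name in jackrabbit hibernate quartz; do
        echo "${prefix}${name}"
    done
}
```

For example, `repository_databases di 6.1` prints the three di_ names, while `repository_databases di 7.0` prints the unprefixed names.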

There are other tools, software, and scripts that also require a backup, in addition to server operational files:

  • License-installer tool and the .installedLicenses.xml file
  • Pentaho Start and Stop scripts

Pentaho stores some information in the user's home directory by default. This location can be changed at installation time by setting environment variables. By default, the hidden folders .pentaho, .kettle, and .pentaho-meta are created in the user's home directory; include them in the backup.

Note: We recommend stopping the DI Server before doing a backup. This guarantees that the information in the database is in sync with the local files stored in the pentaho-solutions repository folders.

If you are unable to stop the server, back up the database repositories before you back up anything in the pentaho-solutions folder. Pentaho does not provide any backup scripts or software for this operation.

Incremental Backup for the DI Server

An incremental backup of the DI Server covers the database repositories and the files in the pentaho-solutions folder.

Note: We recommend daily incremental backups for Pentaho solutions and Pentaho database repositories, and weekly full backups of Pentaho web applications, Pentaho solutions, and database repositories.

Note: Make sure all external clients using the repository are stopped before you stop the DI Server. All Carte servers and PDI client tools must be disconnected first.

Here are the steps for an incremental backup:

  1. Stop the DI Server.
  2. Perform an incremental backup for the Pentaho database repositories.
  3. Perform an incremental backup for the pentaho-solutions folder. If the system is installed as a cluster, back up the pentaho-solutions folder on each node.
  4. Start the DI Server.

Full Backup for the DI Server

Full backups require the backup of Pentaho database repositories, Pentaho solutions folder, and Pentaho web applications folder.

Note: Make sure all external clients using the repository are stopped before you stop the DI Server. All Carte servers and PDI client tools must be disconnected first.

Here are the steps for a full backup:

  1. Stop the DI Server.
  2. Back up the Pentaho database repositories, using the backup utilities of the database that hosts them.
  3. Back up the Pentaho solutions folder.
  4. Back up the Pentaho web application folder.
  5. Back up the other Pentaho folders: the license files, the start and stop scripts, and the hidden folders in the user home.
  6. Start the DI Server.

Note: If the system is installed as a cluster, back up the pentaho-solutions folder and the Pentaho web application folder on each node.

The default locations are the pentaho-solutions directory and the tomcat/webapps/pentaho-di directory, but they may have been customized during installation.

Restore Pentaho Servers

This section explains the recommendations for restoring a backup for the Pentaho/BA and DI servers. Restoring either server from a backup is as simple as recovering the Pentaho solutions, Pentaho web applications, and the Pentaho databases.

Note: It is important that the restore locations, IP addresses, and DNS names remain the same, because they are fixed in the platform's configuration files.

If any of this information has changed, edit the configuration files so that they point to the new values, such as IP addresses, users, or passwords. Refer to the server's manual installation procedure to review the locations of all configuration files before starting the server.
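When an address or user has changed, one way to find the configuration files that still carry the old value is a literal text search over the restored folders. This is a minimal sketch; the function name is illustrative, and which directories to search depends on your installation.

```shell
# List files that still mention a value that changed between backup and
# restore (host name, IP address, user name, and so on).
# $1 = the old value, matched literally; remaining args = directories.
find_stale_references() {
    old_value=$1
    shift
    grep -rlF -- "$old_value" "$@" 2>/dev/null | sort
}
```

For example, `find_stale_references old-db.example.com pentaho-solutions tomcat/webapps/pentaho` lists every file under those two folders that still contains the (hypothetical) old database host name.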

Command Line Notes

This section explains some other useful commands that can be used for backup and restore.

Command Line Arguments for Pentaho

Use the following commands to back up or restore the file repository of a Pentaho/BA Server. The Pentaho documentation has a more in-depth reference for the command line arguments.

Note: On Linux, use import-export.sh instead of import-export.bat.

To back up:

import-export.bat --backup --url=http://localhost:8080/pentaho --username=admin --password=password --file-path=c:/home/Downloads/backup.zip --logfile=c:/temp/logfile.log

To restore:

import-export.bat --restore --url=http://localhost:8080/pentaho --username=admin --password=password --file-path=c:/home/Downloads/backup.zip --overwrite=true --logfile=c:/temp/logfile.log
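On Linux, the same two commands can be assembled by a small helper so that the backup and restore variants stay consistent. The sketch below uses only the arguments shown above. The Linux paths in the example, the function name, and the PENTAHO_USER and PENTAHO_PASSWORD variables are assumptions made for illustration. It prints the command instead of running it.

```shell
# Print the import-export.sh command line for a backup or a restore.
# $1 = "backup" or "restore"; $2 = server URL; $3 = archive path;
# $4 = log file. Credentials come from PENTAHO_USER and PENTAHO_PASSWORD
# (hypothetical variable names), defaulting to the sample values above.
import_export_command() {
    mode=$1; url=$2; archive=$3; logfile=$4
    case "$mode" in
        backup)  extra="" ;;
        restore) extra=" --overwrite=true" ;;
        *) echo "mode must be backup or restore" >&2; return 1 ;;
    esac
    echo "./import-export.sh --$mode --url=$url" \
         "--username=${PENTAHO_USER:-admin} --password=${PENTAHO_PASSWORD:-password}" \
         "--file-path=$archive$extra --logfile=$logfile"
}
```

For example, `import_export_command backup http://localhost:8080/pentaho /home/pentaho/backup.zip /tmp/logfile.log` prints the Linux form of the backup command. Because the printed line contains the password, avoid writing it to shared logs.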

Command Line Arguments for Postgres Database

Refer to the PostgreSQL documentation for your version for the backup and restore command line arguments.
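As an illustration only, the standard PostgreSQL client tools pg_dump and pg_restore can dump and restore each repository database. The sketch below prints one command per database so that it can be reviewed first. The host, user, and output directory are hypothetical values you supply, the function names are illustrative, and the exact options should be checked against the documentation for your PostgreSQL version.

```shell
# Print pg_dump commands (custom archive format) for each repository database.
# $1 = host, $2 = database user, $3 = output directory, rest = database names.
dump_commands() {
    host=$1; user=$2; out_dir=$3
    shift 3
    for db in "$@"; do
        echo "pg_dump -h $host -U $user -F c -f $out_dir/$db.dump $db"
    done
}

# Print the matching pg_restore commands. --clean drops existing database
# objects before recreating them from the dump.
restore_commands() {
    host=$1; user=$2; out_dir=$3
    shift 3
    for db in "$@"; do
        echo "pg_restore -h $host -U $user -d $db --clean $out_dir/$db.dump"
    done
}
```

For example, `dump_commands localhost postgres /backups jackrabbit hibernate quartz` prints three pg_dump commands, one per repository.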

Import to the Repository from the DI Server

Follow the instructions below to import the repository. You must be already logged into the repository in Spoon before you can perform this task.

  1. In Spoon, go to Tools > Repository > Import Repository.
  2. Locate the export (XML) file that contains the solution repository contents.
  3. Click Open. The Directory Selection dialog box will appear.
  4. Select the directory in which you want to import the repository.
  5. Click OK.
  6. Enter a comment, if applicable.
  7. Wait for the import process to complete.
  8. Click Close.

Export from Command Line

kitchen.bat /file:C:\Pentaho_samples\repository\repository_export.kjb "/param:rep_name=PDI2000" "/param:rep_user=joe" "/param:rep_password=password" "/param:rep_folder=/public/dev" "/param:target_filename=C:\Pentaho_samples\repository\export\dev.xml"
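On Linux, Kitchen is started with kitchen.sh and takes its options in the -file= and -param:name=value form. The sketch below prints an equivalent of the Windows command above. The Linux paths in the example, the function name, and the REP_PASSWORD variable are assumptions made for illustration; the parameter names are the ones used by the export job shown above.

```shell
# Print a kitchen.sh command that runs the export job with its parameters.
# $1 = job file, $2 = repository name, $3 = repository user,
# $4 = repository folder to export, $5 = target XML file.
# The password is read from REP_PASSWORD (a hypothetical variable name),
# defaulting to the sample value used above.
kitchen_export_command() {
    job=$1; rep_name=$2; rep_user=$3; rep_folder=$4; target=$5
    echo "./kitchen.sh -file=$job" \
         "-param:rep_name=$rep_name" \
         "-param:rep_user=$rep_user" \
         "-param:rep_password=${REP_PASSWORD:-password}" \
         "-param:rep_folder=$rep_folder" \
         "-param:target_filename=$target"
}
```

For example, `kitchen_export_command /opt/samples/repository_export.kjb PDI2000 joe /public/dev /opt/samples/export/dev.xml` prints the full command line. Quote any value that contains spaces before running the printed command.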

