Best Practices - Big Data - Parsing XML on PDI

Software    Version
Pentaho     5.2+, 6.x
Hadoop      Cloudera 5.x, HortonWorks 2.x

Overview

This document collects best practice recommendations for processing and parsing XML files stored in a Hadoop cluster with Pentaho Data Integration (PDI).

Keep these Pentaho Architecture principles in mind while you are working through this document:

  1. Architecture is important, above all else.
  2. Platforms are always evolving: sometimes you will have to think creatively.

Topics covered include selecting the best parsing method for your use case and implementation details for each method.

 

   -  Best Practices - Big Data - XML Parsing in PDI
