|Software|Version(s)|
|Pentaho Data Integration|6.x, 7.x|
|MapR Converged Data Platform|4.x, 5.x|
|Apache Drill|1.6 or later (1.8 is the latest available)|
Apache Drill is a schema-free SQL-on-Hadoop tool that lets you run SQL queries against data sets in your Hadoop filesystem, whatever their format or source: JSON, CSV, Parquet, HBase, and others. Combining Pentaho Data Integration (PDI) with Apache Drill lets you query those data sets with SQL from within your PDI data integration work.
This document covers configuring Apache Drill for Pentaho Data Integration and connecting PDI to Drill, and it links to recommended settings and best practices.
We assume that you have administrator permissions on the cluster, that a MapR Converged Data Platform is running with Apache Drill installed, and that Apache ZooKeeper is running in replicated mode.
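Because ZooKeeper runs in replicated mode, a client such as PDI normally reaches Drill through the ZooKeeper quorum rather than through a single Drillbit. The sketch below shows how such a Drill JDBC URL could be assembled. The helper name `drill_jdbc_url`, the host names, the port `5181` (the usual MapR ZooKeeper port), and the `drill`/`drillbits1` path are assumptions for illustration only; check your own cluster's values before using them.

```python
def drill_jdbc_url(zk_hosts, zk_port=5181, directory="drill", cluster_id="drillbits1"):
    """Build a Drill JDBC URL that connects through a ZooKeeper quorum.

    All defaults are assumptions: 5181 is the port MapR typically uses for
    ZooKeeper, and "drill"/"drillbits1" are placeholder directory and
    cluster ID values that may differ on your cluster.
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    # List every quorum member so the client can still connect if one node is down.
    quorum = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return f"jdbc:drill:zk={quorum}/{directory}/{cluster_id}"


# Hypothetical three-node ZooKeeper quorum.
url = drill_jdbc_url(["zk1.example.com", "zk2.example.com", "zk3.example.com"])
print(url)
```

A URL of this shape, together with the driver class `org.apache.drill.jdbc.Driver`, is what you would supply when defining a JDBC connection to Drill.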