Integrating Pentaho with MapR using Apache Drill


Software Versions
Pentaho Data Integration 6.x, 7.x
MapR Converged Data Platform 4.x, 5.x
Apache Drill 1.6 or later (1.8 was the latest release at the time of writing)

Overview

Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against data stored in your Hadoop file system in formats such as JSON, CSV, and Parquet, as well as against HBase tables. Combining Pentaho Data Integration (PDI) with Apache Drill lets you use PDI to perform data integration work on the data that Drill exposes through SQL.
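The following Java sketch illustrates what "schema-free" means in practice: a Drill query can name a file directly by its path through the `dfs` storage plugin, without a prior CREATE TABLE or schema definition. The helper class `DrillFileQuery` and the file paths are hypothetical, and the sketch assumes the default `dfs` storage plugin is enabled and points at the cluster file system. HBase tables are queried through Drill's separate `hbase` storage plugin rather than `dfs`.

```java
/**
 * Hypothetical helper that builds a Drill SQL statement reading a file in
 * place through the default "dfs" storage plugin. No table or schema is
 * declared beforehand; Drill reads the file's structure at query time.
 */
final class DrillFileQuery {

    private DrillFileQuery() {}

    static String selectFromFile(String path, int limit) {
        // Reject empty paths and backticks, which would break the quoted identifier.
        if (path == null || path.isEmpty() || path.indexOf('`') >= 0) {
            throw new IllegalArgumentException("path must be non-empty and must not contain backticks");
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        // Drill quotes file paths with backticks: dfs.`/path/to/file.json`
        return "SELECT * FROM dfs.`" + path + "` LIMIT " + limit;
    }

    public static void main(String[] args) {
        // Placeholder paths: the same statement shape applies to JSON, CSV, or Parquet files.
        System.out.println(selectFromFile("/tmp/sample.json", 10));
        System.out.println(selectFromFile("/tmp/sample.parquet", 10));
    }
}
```

A statement like this could then be used as the SQL of a PDI Table Input step that reads through a Drill database connection.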

Note: Pentaho Data Integration’s support for Drill is limited and is provided through its support for JDBC 3 and JDBC 4 drivers. Support for the Apache Drill JDBC driver itself is provided by MapR.

This document covers configuring Apache Drill for Pentaho Data Integration and connecting PDI to Drill, and provides links to recommended settings and best practices.

We assume that you have administrator permissions on the cluster, that a MapR Converged Data Platform cluster is running with Apache Drill installed, and that Apache ZooKeeper is running in replicated mode.
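Because Drill registers its Drillbits in ZooKeeper, a JDBC client can locate the Drill cluster through the ZooKeeper quorum rather than through a single Drillbit host. The Java sketch below assembles a URL in Drill's documented `jdbc:drill:zk=<host>:<port>[,...]/<directory>/<cluster-id>` format, alongside the driver class name `org.apache.drill.jdbc.Driver`; these are the values a PDI Generic database connection typically asks for. The helper `DrillJdbcUrl`, the host names, and the cluster ID are placeholders, and port 5181 assumes MapR's default ZooKeeper port. Take the real values from `drill.exec.zk.connect` and `drill.exec.cluster-id` in `drill-override.conf`.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper that assembles an Apache Drill JDBC URL for a
 * ZooKeeper quorum:
 *   jdbc:drill:zk=<host>:<port>[,<host>:<port>...]/<directory>/<cluster-id>
 */
final class DrillJdbcUrl {

    /** Drill JDBC driver class name. */
    static final String DRIVER_CLASS = "org.apache.drill.jdbc.Driver";

    /** Drill's default ZooKeeper root directory (drill.exec.zk.root). */
    static final String DEFAULT_ZK_ROOT = "drill";

    private DrillJdbcUrl() {}

    static String build(List<String> zkHostPorts, String zkRoot, String clusterId) {
        if (zkHostPorts == null || zkHostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host:port is required");
        }
        if (clusterId == null || clusterId.isEmpty()) {
            throw new IllegalArgumentException("cluster ID is required");
        }
        // Listing every quorum member lets the ZooKeeper client connect
        // through any available node in a replicated ensemble.
        return "jdbc:drill:zk=" + String.join(",", zkHostPorts)
                + "/" + zkRoot + "/" + clusterId;
    }

    public static void main(String[] args) {
        // Placeholder hosts and cluster ID; port 5181 assumes MapR's default ZooKeeper port.
        String url = build(
                Arrays.asList("zk1.example.com:5181", "zk2.example.com:5181", "zk3.example.com:5181"),
                DEFAULT_ZK_ROOT,
                "my.cluster.com-drillbits");
        System.out.println("Driver class: " + DRIVER_CLASS);
        System.out.println("URL:          " + url);
    }
}
```

Listing all three ZooKeeper nodes matches the replicated-mode assumption: if one node is down, the connection can still be established through the others.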


   -  Best Practice - Integrating Pentaho with MapR using Apache Drill
