Configuration
This guide presents basic configuration tasks for the Pentaho Server, data connections, the Pentaho design tools, and Hadoop cluster connections so you can get started creating ETL solutions and data analytics. This guide assumes you have installed the Pentaho software.
Tools: You can perform these configuration tasks through the Pentaho User Console (PUC), the Pentaho Data Integration (PDI) client, or by editing shell scripts and property files.
Login Credentials: A Pentaho administrator user name and password are required to perform configuration tasks through the user console.
These tasks are for IT and Pentaho administrators as described in the following definitions:
- An IT administrator installs, configures, and upgrades the Pentaho Server. An IT administrator knows where the data is stored, how to connect to it, details about the computing environment, and how to use the command line on Microsoft Windows or Linux.
- A Pentaho administrator creates and manages users and roles, and manages workstations, so that the ETL specialists and business analysts can create, publish, and share content.
IT Administrator Tasks
As an IT administrator, you need to configure the Pentaho Server and define what security to use. If your team is working with Big Data, you will also need to set up a connection to a Hadoop cluster.
Configure the Pentaho Server and Security
Basic server tasks include starting and stopping the Pentaho Server, increasing the server's memory limit, and specifying data connections. These IT administrator tasks prepare the system for more specific Pentaho administrator configuration tasks, like defining connections and managing users and roles.
- Start and Stop the Pentaho Server
- Increase the Pentaho Server Memory Limit
- Specify Data Connections for BA Design Tools
- Specify Data Connections for the Pentaho Server
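Increasing the server's memory limit generally means raising the JVM maximum heap size in the startup options. The Python sketch below rewrites that value in a line of JVM options. It assumes the limit is expressed with the standard `-Xmx` flag; the function name `set_max_heap` and the sample option string are illustrative and are not taken from a Pentaho script.

```python
import re


def set_max_heap(java_opts: str, new_limit: str) -> str:
    """Return java_opts with its -Xmx value replaced by new_limit (e.g. '4096m')."""
    if not re.fullmatch(r"\d+[kKmMgG]?", new_limit):
        raise ValueError(f"not a valid JVM heap size: {new_limit!r}")
    flag = f"-Xmx{new_limit}"
    # Replace every existing -Xmx flag, if any is present...
    updated, count = re.subn(r"-Xmx\d+[kKmMgG]?", flag, java_opts)
    if count:
        return updated
    # ...otherwise append the flag to the existing options.
    return f"{java_opts} {flag}".strip()


# Hypothetical option string, for illustration only.
opts = "-Xms2048m -Xmx2048m -Dfile.encoding=utf8"
print(set_max_heap(opts, "4096m"))
```

In practice you would apply the same change by hand in the startup script and restart the server for it to take effect.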
Specifying data connections for the Pentaho Server includes tasks for setting up shared connections so that your users can select the native database connection they need from a list.
The native database connections in the Pentaho Suite are based on JDBC (Java Database Connectivity).
The following articles show you how to set up various types of JDBC data connections for the Pentaho Server:
- Set Up Native (JDBC) or OCI Data Connections for the Pentaho Server
- Set Up JNDI Connections for the Pentaho Server
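Because native connections are JDBC-based, each one ultimately resolves to a JDBC URL built from a host, port, and database name. The following Python sketch shows how such a URL is composed for two common databases. The `ConnectionSettings` class, the `jdbc_url` function, and the host and database names are hypothetical; only the PostgreSQL and MySQL URL formats and default ports are standard.

```python
from dataclasses import dataclass
from typing import Optional

# URL prefixes and default ports for two widely used JDBC drivers.
_DRIVERS = {
    "postgresql": ("jdbc:postgresql", 5432),
    "mysql": ("jdbc:mysql", 3306),
}


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical container for the values a connection dialog collects."""
    db_type: str
    host: str
    database: str
    port: Optional[int] = None  # None means "use the driver's default port"


def jdbc_url(settings: ConnectionSettings) -> str:
    """Compose a JDBC URL of the form prefix://host:port/database."""
    try:
        prefix, default_port = _DRIVERS[settings.db_type]
    except KeyError:
        raise ValueError(f"unsupported database type: {settings.db_type!r}")
    port = settings.port if settings.port is not None else default_port
    return f"{prefix}://{settings.host}:{port}/{settings.database}"


# Example values are placeholders, not a real server.
print(jdbc_url(ConnectionSettings("postgresql", "db.example.com", "sales")))
```

A shared connection defined once on the server saves each user from entering these values individually.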
You also need to establish a security plan for your Pentaho system. Pentaho supports two different security options: Pentaho Security and advanced security providers, such as LDAP, Single Sign-On, or Microsoft Active Directory. The following task assists you in defining your security plan:
For more information on setting up security for your Pentaho system, particularly implementing advanced security options, refer to our Security guide in the Administration section.
Set Up Pentaho to Connect to a Hadoop Cluster
If you are an IT administrator for a team working with Big Data, you will need to configure Pentaho to connect to a Hadoop cluster.
Pentaho can connect to Cloudera Distribution for Hadoop (CDH), Hortonworks Data Platform (HDP), Amazon Elastic MapReduce (EMR), or MapR. Pentaho also supports many related services such as HDFS, HBase, Oozie, ZooKeeper, and Spark. You can connect to clusters and services from these Pentaho components: Spoon, the Pentaho Server, Analyzer, Pentaho Interactive Reporting, Pentaho Report Designer (PRD), and Pentaho Metadata Editor (PME).
The Pentaho Server can be configured to connect to a Hadoop cluster through an adaptive big-data layer referred to as a shim. You must modify shim properties and configuration files before you can connect to a Hadoop cluster. Pentaho regularly develops and releases shims, even in between releases, so that customers can easily keep abreast of the latest technological developments. To see which shims are supported for this version of Pentaho, see the Component Reference.
If the Hadoop distribution that you want to use is not listed, visit Configuring Pentaho for your Hadoop Distro and Version. A previous version of our software might support older Hadoop distributions.
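Modifying shim properties amounts to editing Java-style `key=value` property files. As a rough sketch of that kind of edit, the Python function below updates a property if it exists and appends it otherwise, leaving comments untouched. The function name `set_property` and the key `example.shim.setting` are hypothetical, and the sketch handles only the `=` separator, not every form the Java properties format allows.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return property-file text with key set to value.

    Existing assignments of key are replaced in place; if the key is
    absent, a new line is appended. Comment lines are preserved.
    """
    lines = text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments (# or !) and lines that are not key=value pairs.
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical file content and key, for illustration only.
sample = "# shim settings\nexample.shim.setting=old\nother=1\n"
print(set_property(sample, "example.shim.setting", "new"), end="")
```

Keep a backup of the original file before editing, since the actual property names depend on the shim and the Pentaho version in use.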
To learn how to configure a shim for a specific distribution, click one of the following links:
Pentaho Administrator Tasks
As a Pentaho administrator, you need to configure data connections, manage the Pentaho Server, and set up the BA (Business Analytics) or PDI (Pentaho Data Integration) design tools.
Configure Data Connections
Data connection tasks include establishing data connections for the Pentaho Server and the Pentaho Repository, as well as managing permissions for the users who access those connections.
- Define Data Connections for the Pentaho Server
- Create a Connection to the Pentaho Repository
- Assign Permissions to Use or Manage Database Connections
Data connections are specified and set up by your IT administrator.
Manage the Pentaho Server
The Pentaho administrator is responsible for creating and managing users and workstations in the organization so the ETL specialists and business analysts can create, publish, and share content. Depending on the size and needs of your organization, the administrator's duties can include updating licenses, managing users and roles, creating and modifying data sources, and scheduling reports.
If you are using basic Pentaho Security, the Pentaho Administrator may be tasked with creating and managing users and roles, including assigning permissions to allow users to access the content they need.
Set Up the Design Tools and Utilities
Before using design tools and utilities, you need to perform configuration tasks for each workstation running these tools. Depending on how these tools and utilities were installed, they might be located on machines other than the one hosting the Pentaho Server.
BA Design Tools
The following table describes the BA design tools and their uses:
Design Tool | What It Does |
---|---|
Aggregation Designer | Optimizes the multidimensional Mondrian data model. |
Metadata Editor | Creates relational data models and refines the models created with the Data Access Wizard. |
Report Designer | Interactive Reporting is a web-based design interface which is used to create both simple and on-demand operational reports without depending on IT or report developers. |
Schema Workbench | Creates multidimensional Mondrian data models and refines the models created with the Data Access Wizard. |