Installing Pepperdata (CDP Public Cloud)
To install Pepperdata via a parcel for Cloudera Data Platform (CDP) Public Cloud, first ensure that you have the license file that we emailed you and that your environment meets all the prerequisites (components, versions, and permissions). Next, obtain the Pepperdata installation artifacts, and move them to a location that the Data Hub cluster can access. Then create the Pepperdata-enabled cluster definition and Pepperdata-enabled cluster template. Finally, create the Pepperdata-enabled cluster in the same way that you’d create any cluster.
Prerequisites
-
From the license file that we emailed you, get the license string and the cluster name. You’ll need this information when you create the Pepperdata-enabled cluster template (later in this procedure).
-
Before you begin installing Pepperdata, ensure that your environment meets all the listed prerequisites.
-
Perform the administrative tasks that are necessary to set up a CDP environment; refer to Getting started as an admin.
-
There must be a running Data Lake, to which your Data Hub clusters are attached. (Pepperdata is installed on Data Hub clusters, not on Data Warehouses.)
- The installed CDP Runtime is supported by Pepperdata for CDP Public Cloud environments; any of the following:
- CDP Runtime 7.2.x
- CDP Runtime 7.1.x
- CDP Runtime 7.0.x
-
The installed Cloudera Management Console is compatible with the installed CDP Runtime.
The Console-related procedures in this documentation are for the Cloudbreak v2.49 Cloudera Management Console. If you’re using a different version of Cloudbreak or a different console altogether, consult your console’s documentation for details about how to perform steps such as accessing a cluster template or cluster definition, and registering a cluster template.
- CDP client is installed and working; refer to Install CDP client on Linux.
-
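The runtime prerequisite above can be checked mechanically before you start. The following is a minimal sketch, not part of the Pepperdata tooling: the function name `is_supported_runtime` is hypothetical, and the prefix list simply mirrors the three release lines named in this section.

```python
# Release lines listed in the prerequisites above (7.2.x, 7.1.x, 7.0.x).
SUPPORTED_RUNTIME_PREFIXES = ("7.2.", "7.1.", "7.0.")


def is_supported_runtime(version: str) -> bool:
    """Return True if a CDP Runtime version string is in a listed release line."""
    # str.startswith accepts a tuple and matches any of its entries.
    return version.strip().startswith(SUPPORTED_RUNTIME_PREFIXES)
```

For example, `is_supported_runtime("7.2.10")` is true, while a version such as `7.3.1` is rejected.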
Task 1: Obtain the Pepperdata Installation Artifacts
Obtain the required Pepperdata artifacts (packages, installation files, and so on), and place them where they can be accessed by the Data Hub cluster.
- Download the following Pepperdata artifacts from the Downloads page to any local location:
  - The appropriate PepperdataSupervisor-X.Y.Z...tgz installation file for your distro
  - The latest PEPPERDATA-X.Y.Z.jar CSD (custom service descriptor) for Supervisor 8.1
  - The PepperdataSupervisor-X.Y.Z-manifest.json JSON manifest for parcels
-
Extract the parcel and hash files from the PepperdataSupervisor-X.Y.Z...tgz installation file. Be sure to substitute the actual installation file name for the <your-filename.tgz> placeholder.
tar xvf <your-filename.tgz>
-
Rename the JSON manifest for parcels file to manifest.json. Be sure to perform this step. If you do not rename the file to manifest.json, the Pepperdata deployment will fail because Cloudera looks specifically for the manifest.json file.
-
Move the extracted and renamed files to a location that the Data Hub cluster can access: a local host, an Amazon S3 bucket, or any accessible remote location.
-
Typically a temporary remote repository is used to deploy packages on a one-time basis; refer to Creating a Temporary Internal Repository.
-
Use any utility to move the files to their required locations; for example, secure copy protocol (SCP).
-
For the extracted parcel and hash files, we recommend creating a parcels subdirectory in your internal repo; for example, /var/www/html/parcels, but you can use any location. Just keep track of it so that you can use it to replace the YOUR-SUPERVISOR-URL placeholder in the Pepperdata-enabled cluster definition (later in the procedure).
-
For the downloaded PEPPERDATA-X.Y.Z.jar CSD file, we recommend creating a csd subdirectory in your internal repo; for example, /var/www/html/csd, but you can use any location. Just keep track of it so that you can use it to replace the YOUR-CSD-URL placeholder in the Pepperdata-enabled cluster definition (later in the procedure).
-
Move the manifest.json file to the same location as the extracted parcel and hash files.
-
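The extract, rename, and move steps of Task 1 can be scripted when the repository directory is on the local file system. The sketch below is one possible arrangement, not a Pepperdata utility: the function name `stage_artifacts` and its parameters are hypothetical, it assumes the parcel and hash files sit at the top level of the .tgz archive, and it copies rather than uploads (for a remote host or an S3 bucket you would still use SCP or a similar tool).

```python
import shutil
import tarfile
from pathlib import Path


def stage_artifacts(tgz: Path, manifest: Path, csd_jar: Path, repo_root: Path) -> dict:
    """Lay out the Pepperdata artifacts under repo_root/parcels and repo_root/csd."""
    parcels_dir = repo_root / "parcels"
    csd_dir = repo_root / "csd"
    parcels_dir.mkdir(parents=True, exist_ok=True)
    csd_dir.mkdir(parents=True, exist_ok=True)

    # Same effect as running `tar xvf <your-filename.tgz>` inside parcels/.
    with tarfile.open(tgz) as archive:
        archive.extractall(parcels_dir)

    # The manifest must be named exactly manifest.json and live next to the parcel.
    shutil.copy(manifest, parcels_dir / "manifest.json")

    # The CSD JAR keeps its original filename.
    shutil.copy(csd_jar, csd_dir / csd_jar.name)

    return {"parcels": parcels_dir, "csd": csd_dir}
```

The two returned directories correspond to the locations that later replace the YOUR-SUPERVISOR-URL and YOUR-CSD-URL placeholders, once they are served over HTTP.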
Task 2: Create the Pepperdata-Enabled Cluster Definition and Pepperdata-Enabled Cluster Template
To create the Pepperdata-enabled cluster definition and Pepperdata-enabled cluster template, you begin with the base cluster definition and base cluster template that Cloudera provides for the cluster type that you want to use. Then configure the Pepperdata-enabled cluster template by adding the Pepperdata services and roles, and registering the cluster template. Finally, finish configuring the Pepperdata-enabled cluster definition by adding the Pepperdata-enabled template and Pepperdata-related objects such as the target Data Hub and environment names and the repository specifications that the cluster will use.
Procedure
-
Create the initial cluster definition for the Pepperdata-enabled cluster by copying the Data Lake’s base cluster definition—the Cloudera Manager cluster definition as it exists before adding Pepperdata—to a new Pepperdata-enabled cluster definition file.
-
From the Cloudera Management Console’s left-navigation, select Data Lakes, and in the resulting display’s Name column, select the Data Lake you’ll use for Pepperdata.
The console shows the environment in which the Data Lake is running.
-
In the page’s navigation, select Cluster Definitions; in the Names column, select a Pepperdata-supported cluster type:
- Data Engineering clusters
- Streams Messaging clusters
- Streaming Analytics clusters
- Data Discovery and Exploration clusters
-
Copy the contents of the RAW VIEW, and paste them into a text editor.
-
Find the cluster template that is used for this cluster definition.
Search the editor’s text for blueprintName (it’s part of the cluster object), and make note of its value. You’ll need this information when you create the Pepperdata-enabled cluster template (later in this procedure).
-
Save the file with a filetype of .json.
-
You can use any filename; in this documentation, we’ve used pepperdata-cluster-definition.json as the Pepperdata-enabled cluster definition.
-
You can store the file anywhere that the CDP client can access. Make note of the location; you’ll need to specify it when you create the cluster (later in this procedure).
-
-
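Instead of searching the editor text by eye, the template name from step 1 can be read out of the saved file. This is a small sketch with a hypothetical function name, `blueprint_name`; it assumes only what the step describes, namely that blueprintName is a child of the top-level cluster object.

```python
import json


def blueprint_name(cluster_definition_text: str) -> str:
    """Return cluster.blueprintName from a cluster definition's raw JSON text."""
    definition = json.loads(cluster_definition_text)
    return definition["cluster"]["blueprintName"]
```

A typical call would pass the contents of your saved definition file, for example `blueprint_name(open("pepperdata-cluster-definition.json").read())`. A `KeyError` means the file does not have the expected cluster object.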
-
Create the initial cluster template for the Pepperdata-enabled cluster by copying the cluster definition’s base template definition—the Cloudera Manager cluster template as it exists before adding Pepperdata—to a new Pepperdata-enabled cluster template file.
-
From the Cloudera Management Console’s left-navigation, select Shared Resources > Cluster Templates, and in the resulting display’s Name column, select the cluster template that you noted earlier in step 1.d: the value of the cluster.blueprintName object.
-
Copy the contents of the RAW VIEW, and paste them into a text editor.
-
Save the file with a filetype of .json. You can use any filename, but make a note of it. You’ll need to add more items to this file, and you’ll need to know its filename when you register the cluster template later in this procedure.
In this documentation, we’ve used pepperdata-cluster-template.json as the Pepperdata-enabled cluster template.
-
-
Add the Pepperdata services to the Pepperdata-enabled cluster template, and register the cluster template.
-
If you closed the Pepperdata-enabled cluster template that you created in step 2, re-open it.
-
Locate the services object, add the following snippet to it, and substitute your actual values for the YOUR-* placeholders.
-
You can add the snippet before or after the services that are already defined in the cluster template.
-
Although not required, we recommend editing the indentation to match what is already in the cluster template.
-
Replace the YOUR-PD-LICENSE placeholder (in the serviceConfigs.pepperdata_license_specification object) with the actual license string from the license file that we emailed you.
-
Replace the YOUR-PD-REALM placeholder (in the serviceConfigs.pepperdata_dashboard_realm object) with the actual cluster name from the license file that we emailed you.
{
  "refName": "pepperdata",
  "serviceType": "PEPPERDATA",
  "serviceConfigs": [
    {
      "name": "pepperdata_license_specification",
      "value": "data://# YOUR-PD-LICENSE"
    },
    {
      "name": "pepperdata_dashboard_realm",
      "value": "YOUR-PD-REALM"
    }
  ],
  "roleConfigGroups": [
    {
      "refName": "pepperdata-PEPPERDATA_AGENT-BASE",
      "roleType": "PEPPERDATA_AGENT",
      "base": true
    },
    {
      "refName": "pepperdata-PEPPERDATA_SUPERVISOR-BASE",
      "roleType": "PEPPERDATA_SUPERVISOR",
      "base": true
    },
    {
      "refName": "pepperdata-PEPPERDATA_COLLECTOR-BASE",
      "roleType": "PEPPERDATA_COLLECTOR",
      "base": true
    }
  ]
}
-
-
Locate the hostTemplates object, and add the Pepperdata roles to the applicable node types.
- Add the pepperdata-PEPPERDATA_COLLECTOR-BASE role to all node types.
- Add the pepperdata-PEPPERDATA_AGENT-BASE role to all node types.
- Add the pepperdata-PEPPERDATA_SUPERVISOR-BASE role to the node types that contain the yarn-RESOURCEMANAGER-BASE role (typically the master host template).
-
Save your changes, and close the file.
-
Register the Pepperdata-enabled cluster template.
-
From the Cloudera Management Console’s left-navigation, select Shared Resources > Cluster Templates; in the resulting display’s top-nav, select Register Template.
-
Enter any name; for example, pepperdata-cluster-template. Make note of the name you specify; you’ll need to know this later.
-
For the Cluster Template Source, select File > Upload JSON File, and locate the pepperdata-cluster-template.json file.
-
Click Register.
The cluster template appears in the Cluster Templates list.
-
-
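The template edits in step 3 (adding the Pepperdata service and assigning its roles to host templates) can also be applied programmatically before you register the file. The following is a hedged sketch rather than a supported tool. The function names are hypothetical, and the code assumes that services and hostTemplates are top-level lists and that each host template keeps its roles in a roleConfigGroupsRefNames list; confirm this against your own template's RAW VIEW before relying on it.

```python
import json


def pepperdata_service(license_string: str, realm: str) -> dict:
    """Build the Pepperdata service entry shown in step 3.b."""
    roles = ("AGENT", "SUPERVISOR", "COLLECTOR")
    return {
        "refName": "pepperdata",
        "serviceType": "PEPPERDATA",
        "serviceConfigs": [
            # Only the YOUR-PD-LICENSE part is replaced; the prefix stays.
            {"name": "pepperdata_license_specification",
             "value": "data://# " + license_string},
            {"name": "pepperdata_dashboard_realm", "value": realm},
        ],
        "roleConfigGroups": [
            {"refName": f"pepperdata-PEPPERDATA_{role}-BASE",
             "roleType": f"PEPPERDATA_{role}",
             "base": True}
            for role in roles
        ],
    }


def add_pepperdata(template: dict, license_string: str, realm: str) -> dict:
    """Add the service and its roles to a cluster template (modified in place)."""
    template.setdefault("services", []).append(
        pepperdata_service(license_string, realm))

    for host in template.get("hostTemplates", []):
        # Assumed key name for the per-host role list; see the lead-in.
        refs = host.setdefault("roleConfigGroupsRefNames", [])
        wanted = ["pepperdata-PEPPERDATA_COLLECTOR-BASE",
                  "pepperdata-PEPPERDATA_AGENT-BASE"]
        # The Supervisor goes only where the YARN ResourceManager runs.
        if "yarn-RESOURCEMANAGER-BASE" in refs:
            wanted.append("pepperdata-PEPPERDATA_SUPERVISOR-BASE")
        for ref in wanted:
            if ref not in refs:
                refs.append(ref)
    return template
```

After calling `add_pepperdata`, writing the result with `json.dump(template, f, indent=2)` produces a file with consistent indentation and correctly placed delimiter commas.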
-
Finish configuring the Pepperdata-enabled cluster definition.
-
If you closed the Pepperdata-enabled cluster definition that you created in step 1, re-open it.
-
Locate the cluster.blueprintName object, and change its value from its original name to the name that you used to register the Pepperdata-enabled cluster template (in step 3.e, above).
-
Copy and paste the following snippet, which includes the image object, into the Pepperdata-enabled cluster definition, and substitute your values for the placeholders.
-
You can add it anywhere; each object is a top-level object (that is, at the same hierarchy level as the cluster object). If you add it to the end, be sure to appropriately adjust the delimiter commas.
-
To determine the values for YOUR-CATALOG-NAME and YOUR-CATALOG-ID, use the Cloudera Management Console’s left-nav to navigate to Data Lakes; in the resulting page, select the location where the Data Hub clusters are installed; in the resulting page’s horizontal-navigation, select Image Details.
- You can use any value for YOUR-CLUSTER-DEFINITION-NAME; make note of it because you’ll need to know it when you create the cluster (later in this procedure).
- For YOUR-ENVIRONMENT-NAME, use the environment name of the Data Lake that you’re using.
- A typical value for YOUR-CATALOG-NAME is cdp-default.
- A typical value for YOUR-CATALOG-ID is 016d3502-8c21-460e-a6bc-1113ccda2e30.
"name": "YOUR-CLUSTER-DEFINITION-NAME",
"environmentName": "YOUR-ENVIRONMENT-NAME",
"image": {
  "catalog": "YOUR-CATALOG-NAME",
  "id": "YOUR-CATALOG-ID"
},
-
-
Locate the cluster object, and add the following snippet to it.
-
You can add the snippet before or after the other child objects in the cluster object.
-
Although not required, we recommend editing the indentation to match what is already in the cluster definition.
"cm": {
  "enableAutoTls": true,
  "repository": {
    "baseUrl": "YOUR-CM-BASE-URL",
    "version": "YOUR-CM-VERSION"
  },
  "products": [
    {
      "name": "CDH",
      "version": "YOUR-CDH-VERSION",
      "parcel": "YOUR-CDH-URL"
    },
    {
      "name": "PEPPERDATA_SUPERVISOR",
      "version": "YOUR-SUPERVISOR-VERSION",
      "parcel": "YOUR-SUPERVISOR-URL",
      "csd": ["YOUR-CSD-URL"]
    }
  ]
}
-
-
Replace the following placeholders with your actual values.
Tip: If you do not know the URLs and versions, you can use the Cloudera Management Console to simulate creating a cluster for a predefined cluster definition. From the left-nav, select Data Hub Clusters, select Create > Advanced Options, and view the resulting repository specification information.
- Cloudera Manager repository specification:
  - YOUR-CM-BASE-URL; for example, https://archive.cloudera.com/p/cm-public/patch/7.4.2-15633910/redhat7/yum/.
  - YOUR-CM-VERSION; for example, 7.4.2.
- Cloudera Runtime repository specification:
  - YOUR-CDH-VERSION; for example, 7.2.10-1.cdh7.2.10.p2.16564568.
  - YOUR-CDH-URL; for example, https://archive.cloudera.com/p/cdp-public/7.2.10.2/parcels/.
- Pepperdata Supervisor repository specification:
  - YOUR-SUPERVISOR-VERSION: the first part of the installation filename, without any parcel name or filetype; for example, 6.5.23-H30_YARN3.
  - YOUR-SUPERVISOR-URL: the fully-qualified location of the Pepperdata parcel and hash files (which you chose in Task 1, step 4). For example, http://cdp-datalake-7210-master0.cdp-envi.ztoh-5shg.cloudera.site:8900/parcels/.
  - YOUR-CSD-URL: the fully-qualified path of the PEPPERDATA-X.Y.Z.jar CSD JAR file (which you chose in Task 1, step 4). For example, http://cdp-datalake-7210-master0.cdp-envi.ztoh-5shg.cloudera.site:8900/csd/PEPPERDATA-3.1.2.jar.
-
Save your changes and close the file.
-
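Because a leftover YOUR-* placeholder in the cluster definition only shows up as a failure at cluster-creation time, it can help to scan the finished file first. The sketch below is an illustrative check, not part of the CDP client; the function names `set_template_name` and `remaining_placeholders` are hypothetical, and the placeholder pattern assumes the uppercase, hyphen-separated naming used throughout this procedure.

```python
import json
import re

# Matches placeholders such as YOUR-CM-BASE-URL or YOUR-CSD-URL.
PLACEHOLDER = re.compile(r"YOUR-[A-Z]+(?:-[A-Z]+)*")


def set_template_name(definition: dict, registered_name: str) -> dict:
    """Point cluster.blueprintName at the registered Pepperdata-enabled template."""
    definition.setdefault("cluster", {})["blueprintName"] = registered_name
    return definition


def remaining_placeholders(definition: dict) -> list:
    """Return the sorted, de-duplicated placeholders still present anywhere."""
    return sorted(set(PLACEHOLDER.findall(json.dumps(definition))))
```

An empty list from `remaining_placeholders` means every placeholder was replaced. Loading the file with `json.load` at all also confirms that the delimiter commas were adjusted correctly.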
Task 3: Create the Pepperdata-Enabled Cluster
To create the Pepperdata-enabled cluster, you’ll first start the internal server on the Data Lake node that is hosting the Pepperdata parcel and manifests. Then you’ll use the CDP client to create the cluster as you configured it in the Pepperdata-enabled cluster template and Pepperdata-enabled cluster definition.
Procedure
-
On the Data Lake node that is hosting the Pepperdata parcel and manifests, start the internal server in the usual manner.
For example, if you are following the Creating a Temporary Internal Repository instructions, run the start command on the node that is hosting the Pepperdata files.
-
On the host where the CDP client is installed, create the Pepperdata-enabled cluster.
-
Be sure to use the fully-qualified path and filename of the Pepperdata-enabled cluster definition to replace the <your-path-to-cluster-definition> placeholder.
-
Be sure to include the file:// prefix before the <your-path-to-cluster-definition> value.
-
You can assign any name you want for the <your-data-hub-cluster-name> placeholder.
-
Use the command for your cloud provider.
-
For AWS clusters:
cdp datahub create-aws-cluster --request-template file://<your-path-to-cluster-definition> --cluster-name <your-data-hub-cluster-name>
-
For Azure clusters:
cdp datahub create-azure-cluster --request-template file://<your-path-to-cluster-definition> --cluster-name <your-data-hub-cluster-name>
-
-
-
(Optional) After the cluster starts, you can close the internal server connection.
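If you script cluster creation, the two commands in step 2 differ only in the provider name, so the argument list can be assembled in one place. This is a minimal sketch under that assumption: the function name `create_cluster_command` is hypothetical, and it uses only the subcommands and options shown in this procedure. It builds the command without running it.

```python
import os


def create_cluster_command(provider: str, definition_path: str, cluster_name: str) -> list:
    """Build the `cdp datahub create-<provider>-cluster` argument list."""
    if provider not in ("aws", "azure"):
        raise ValueError("provider must be 'aws' or 'azure'")

    # The definition must be given as a fully-qualified path with a file:// prefix.
    absolute_path = os.path.abspath(definition_path)
    return [
        "cdp", "datahub", f"create-{provider}-cluster",
        "--request-template", f"file://{absolute_path}",
        "--cluster-name", cluster_name,
    ]
```

On the host where the CDP client is installed, the returned list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.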
Next: Configuring Pepperdata