Installing Pepperdata (CDP Public Cloud)

To install Pepperdata via a parcel for Cloudera Data Platform (CDP) Public Cloud, first ensure that you have the license file that we emailed you and that your environment meets all the prerequisites (components, versions, and permissions). Next, obtain the Pepperdata installation artifacts, and move them to a location that the Data Hub cluster can access. Then create the Pepperdata-enabled cluster definition and Pepperdata-enabled cluster template. Finally, create the Pepperdata-enabled cluster in the same way that you’d create any cluster.

Prerequisites

  1. From the license file that we emailed you, get the license string and the cluster name. You’ll need this information when you create the Pepperdata-enabled cluster template (later in this procedure).

  2. Before you begin installing Pepperdata, ensure that your environment meets all the listed prerequisites.

    • Perform the administrative tasks that are necessary to set up a CDP environment; refer to Getting started as an admin.

    • There must be a running Data Lake, to which your Data Hub clusters are attached. (Pepperdata is installed on Data Hub clusters, not on Data Warehouses.)

    • The installed CDP Runtime is supported by Pepperdata for CDP Public Cloud environments; any of the following:
      • CDP Runtime 7.2.x
      • CDP Runtime 7.1.x
      • CDP Runtime 7.0.x
    • The installed Cloudera Management Console is compatible with the installed CDP Runtime.

      The Console-related procedures in this documentation are for the Cloudbreak v2.49 Cloudera Management Console. If you’re using a different version of Cloudbreak or a different console altogether, consult your console’s documentation for details about how to perform steps such as accessing a cluster template or cluster definition, and registering a cluster template.
    • CDP client is installed and working; refer to Install CDP client on Linux.
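
A quick way to confirm the tool-related prerequisites is to check that the commands used later in this procedure are on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name require_commands is hypothetical and is not part of Pepperdata or the CDP client.

```shell
# Hypothetical helper: succeed only if every named command is on PATH.
# Prints one message per missing command to standard error.
require_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (cdp and tar are both used later in this procedure):
#   require_commands cdp tar || echo "install the missing tools first"
```

This checks only that the commands exist; it does not verify that the CDP client is configured with valid credentials.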

Task 1: Obtain the Pepperdata Installation Artifacts

Obtain the required Pepperdata artifacts (packages, installation files, and so on), and place them where they can be accessed by the Data Hub cluster.

  1. Download the following Pepperdata artifacts from the Downloads page to any local location:
    • The appropriate PepperdataSupervisor-X.Y.Z...tgz installation file for your distro
    • The latest PEPPERDATA-X.Y.Z.jar CSD (custom service descriptor) for Supervisor 8.1
    • The PepperdataSupervisor-X.Y.Z-manifest.json JSON manifest for parcels
  2. Extract the parcel and hash files from the PepperdataSupervisor-X.Y.Z...tgz installation file.

    Be sure to substitute the actual installation file name for the <your-filename.tgz> placeholder.

    tar xvf <your-filename.tgz>

  3. Rename the JSON manifest for parcels file to manifest.json.

    Be sure to perform this step. If you do not rename the file to manifest.json, the Pepperdata deployment will fail because Cloudera looks specifically for the manifest.json file.
  4. Move the extracted and renamed files to a location that the Data Hub cluster can access: a local host, an Amazon S3 bucket, or any accessible remote location.

    • Typically a temporary remote repository is used to deploy packages on a one-time basis; refer to Creating a Temporary Internal Repository.

    • Use any utility to move the files to their required locations; for example, secure copy protocol (SCP).

    • For the extracted parcel and hash files, we recommend creating a parcels subdirectory in your internal repo; for example, /var/www/html/parcels, but you can use any location. Just keep track of it so that you can use it to replace the YOUR-SUPERVISOR-URL placeholder in the Pepperdata-enabled cluster definition (later in the procedure).

    • For the PEPPERDATA-X.Y.Z.jar CSD file that you downloaded, we recommend creating a csd subdirectory in your internal repo; for example, /var/www/html/csd, but you can use any location. Just keep track of it so that you can use it to replace the YOUR-CSD-URL placeholder in the Pepperdata-enabled cluster definition (later in the procedure).

    • Move the manifest.json file to the same location as the extracted parcel and hash files.
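
The extract, rename, and move steps above can be combined into one small script that you run on the host that serves the repository. This is a sketch under stated assumptions, not a Pepperdata-provided tool: the function name stage_artifacts is hypothetical, the archive is assumed to unpack the parcel and hash files at its top level, and the parcels and csd subdirectory names follow the recommendation above.

```shell
# Hypothetical helper: stage downloaded Pepperdata artifacts under a repo root.
# Arguments: <installation-tgz> <manifest-json> <csd-jar> <repo-root>
stage_artifacts() {
  tgz=$1 manifest=$2 csd=$3 repo=$4

  mkdir -p "$repo/parcels" "$repo/csd" || return 1

  # Extract the parcel and hash files straight into the parcels directory.
  # Assumption: the archive has no enclosing top-level directory.
  tar xzf "$tgz" -C "$repo/parcels" || return 1

  # Copy the manifest under the exact name manifest.json, next to the
  # parcel and hash files.
  cp "$manifest" "$repo/parcels/manifest.json" || return 1

  # Place the CSD JAR in its own directory.
  cp "$csd" "$repo/csd/" || return 1
}

# Example usage, with /var/www/html as the repository root:
#   stage_artifacts <your-filename.tgz> <your-manifest.json> <your-csd.jar> /var/www/html
```

If the files are downloaded on a different machine, copy them to the repository host first (for example, with SCP), and then run the function there.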

Task 2: Create the Pepperdata-Enabled Cluster Definition and Pepperdata-Enabled Cluster Template

To create the Pepperdata-enabled cluster definition and Pepperdata-enabled cluster template, you begin with the base cluster definition and base cluster template that Cloudera provides for the cluster type that you want to use. Then configure the Pepperdata-enabled cluster template by adding the Pepperdata services and roles, and registering the cluster template. Finally, finish configuring the Pepperdata-enabled cluster definition by adding the Pepperdata-enabled template and Pepperdata-related objects such as the target Data Hub and environment names and the repository specifications that the cluster will use.

Procedure

  1. Create the initial cluster definition for the Pepperdata-enabled cluster by copying the Data Lake’s base cluster definition—the Cloudera Manager cluster definition as it exists before adding Pepperdata—to a new Pepperdata-enabled cluster definition file.

    1. From the Cloudera Management Console’s left-navigation, select Data Lakes, and in the resulting display’s Name column, select the Data Lake you’ll use for Pepperdata.

      The console shows the environment in which the Data Lake is running.

    2. In the page’s navigation, select Cluster Definitions; in the Names column, select a Pepperdata-supported cluster type:

      • Data Engineering clusters
      • Streams Messaging clusters
      • Streaming Analytics clusters
      • Data Discovery and Exploration clusters
    3. Copy the contents of the RAW VIEW, and paste them into a text editor.

    4. Find the cluster template that is used for this cluster definition.

      Search the editor’s text for blueprintName (it’s part of the cluster object), and make note of its value. You’ll need this information when you create the Pepperdata-enabled cluster template (later in this procedure).

    5. Save the file with a filetype of .json.

      • You can use any filename; in this documentation, we’ve used pepperdata-cluster-definition.json as the Pepperdata-enabled cluster definition.

      • You can store the file anywhere that the CDP client can access. Make note of the location; you’ll need to specify it when you create the cluster (later in this procedure).

  2. Create the initial cluster template for the Pepperdata-enabled cluster by copying the cluster definition’s base template definition—the Cloudera Manager cluster template as it exists before adding Pepperdata—to a new Pepperdata-enabled cluster template file.

    1. From the Cloudera Management Console’s left-navigation, select Shared Resources > Cluster Templates, and in the resulting display’s Name column, select the cluster template that you noted earlier in step 1.d: the value of the cluster.blueprintName object.

    2. Copy the contents of the RAW VIEW, and paste them into a text editor.

    3. Save the file with a filetype of .json.

      You can use any filename, but make a note of it. You’ll need to add more items to this file, and you’ll need to know its filename when you register the cluster template later in this procedure.

      In this documentation, we’ve used pepperdata-cluster-template.json as the Pepperdata-enabled cluster template.

  3. Add the Pepperdata services to the Pepperdata-enabled cluster template, and register the cluster template.

    1. If you closed the Pepperdata-enabled cluster template that you created in step 2, re-open it.

    2. Locate the services object, add the following snippet to it, and substitute your actual values for the YOUR-* placeholders.

      • You can add the snippet before or after the services that are already defined in the cluster template.

      • Although not required, we recommend editing the indentation to match what is already in the cluster template.

      • Replace the YOUR-PD-LICENSE placeholder (in the serviceConfigs.pepperdata_license_specification object) with the actual license string from the license file that we emailed you.

      • Replace the YOUR-PD-REALM placeholder (in the serviceConfigs.pepperdata_dashboard_realm object) with the actual cluster name from the license file that we emailed you.

      {
         "refName": "pepperdata",
         "serviceType": "PEPPERDATA",
         "serviceConfigs": [
            {
               "name": "pepperdata_license_specification",
               "value": "data://# YOUR-PD-LICENSE"
            },
            {
               "name": "pepperdata_dashboard_realm",
               "value": "YOUR-PD-REALM"
            }
         ],
         "roleConfigGroups": [
            {
               "refName": "pepperdata-PEPPERDATA_AGENT-BASE",
               "roleType": "PEPPERDATA_AGENT",
               "base": true
            },
            {
               "refName": "pepperdata-PEPPERDATA_SUPERVISOR-BASE",
               "roleType": "PEPPERDATA_SUPERVISOR",
               "base": true
            },
            {
               "refName": "pepperdata-PEPPERDATA_COLLECTOR-BASE",
               "roleType": "PEPPERDATA_COLLECTOR",
               "base": true
            }
         ]
      }
      
    3. Locate the hostTemplates object, and add the Pepperdata roles to the applicable node types.

      • Add the pepperdata-PEPPERDATA_COLLECTOR-BASE to all node types.
      • Add the pepperdata-PEPPERDATA_AGENT-BASE to all node types.
      • Add the pepperdata-PEPPERDATA_SUPERVISOR-BASE to the node types that contain the yarn-RESOURCEMANAGER-BASE role (typically the master host template).
    4. Save your changes, and close the file.

    5. Register the Pepperdata-enabled cluster template.

      1. From the Cloudera Management Console’s left-navigation, select Shared Resources > Cluster Templates; in the resulting display’s top-nav, select Register Template.

      2. Enter any name; for example, pepperdata-cluster-template. Make note of the name you specify; you’ll need to know this later.

      3. For the Cluster Template Source, select File > Upload JSON File, and locate the pepperdata-cluster-template.json file.

      4. Click Register.

      The cluster template appears in the Cluster Templates list.

  4. Finish configuring the Pepperdata-enabled cluster definition.

    1. If you closed the Pepperdata-enabled cluster definition that you created in step 1, re-open it.

    2. Locate the cluster.blueprintName object, and change its value from its original name to the name that you used to register the Pepperdata-enabled cluster template (in step 3.e, above).

    3. Copy-and-paste the following image object snippet to add it to the Pepperdata-enabled cluster definition, and substitute your values for the placeholders.

      • You can add it anywhere; each object is a top-level object (that is, at the same hierarchy level as the cluster object). If you add it to the end, be sure to appropriately adjust the delimiter commas.

      • To determine the values for YOUR-CATALOG-NAME and YOUR-CATALOG-ID, use the Cloudera Management Console’s left-nav to navigate to Data Lakes; in the resulting page, select the location where the Data Hub clusters are installed; in the resulting page’s horizontal-navigation, select Image Details.

        • You can use any value for YOUR-CLUSTER-DEFINITION-NAME; make note of it because you’ll need to know it when you create the cluster (later in this procedure).
        • For YOUR-ENVIRONMENT-NAME, use the environment name of the Data Lake that you’re using.
        • A typical value for YOUR-CATALOG-NAME is cdp-default.
        • A typical value for YOUR-CATALOG-ID is 016d3502-8c21-460e-a6bc-1113ccda2e30.
      "name": "YOUR-CLUSTER-DEFINITION-NAME",
      "environmentName": "YOUR-ENVIRONMENT-NAME",
      "image": {
        "catalog": "YOUR-CATALOG-NAME",
        "id": "YOUR-CATALOG-ID"
      },
      
    4. Locate the cluster object, and add the following snippet to it.

      • You can add the snippet before or after the other child objects in the cluster object.

      • Although not required, we recommend editing the indentation to match what is already in the cluster definition.

      "cm": {
        "enableAutoTls": true,
        "repository": {
          "baseUrl": "YOUR-CM-BASE-URL",
          "version": "YOUR-CM-VERSION"
        },
        "products": [
          {
            "name": "CDH",
            "version": "YOUR-CDH-VERSION",
            "parcel": "YOUR-CDH-URL"
          },
          {
            "name": "PEPPERDATA_SUPERVISOR",
            "version": "YOUR-SUPERVISOR-VERSION",
            "parcel": "YOUR-SUPERVISOR-URL",
            "csd": ["YOUR-CSD-URL"]
          }
        ]
      }
      
    5. Replace the following placeholders with your actual values.

      • Cloudera Manager repository specification:
        • YOUR-CM-BASE-URL; for example, https://archive.cloudera.com/p/cm-public/patch/7.4.2-15633910/redhat7/yum/.
        • YOUR-CM-VERSION; for example, 7.4.2.
      • Cloudera Runtime repository specification:
        • YOUR-CDH-VERSION; for example, 7.2.10-1.cdh7.2.10.p2.16564568.
        • YOUR-CDH-URL; for example, https://archive.cloudera.com/p/cdp-public/7.2.10.2/parcels/.
      • Pepperdata Supervisor repository specification:
        • YOUR-SUPERVISOR-VERSION—the first part of the installation filename, without any parcel name or filetype; for example, 6.5.23-H30_YARN3
        • YOUR-SUPERVISOR-URL—the fully-qualified location of the Pepperdata parcel and hash files (which you chose in Task 1, step 4). For example, http://cdp-datalake-7210-master0.cdp-envi.ztoh-5shg.cloudera.site:8900/parcels/.
        • YOUR-CSD-URL—the fully-qualified path of the PEPPERDATA-X.Y.Z.jar CSD JAR file (which you chose in Task 1, step 4). For example: http://cdp-datalake-7210-master0.cdp-envi.ztoh-5shg.cloudera.site:8900/csd/PEPPERDATA-3.1.2.jar.
    6. Save your changes and close the file.
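
Because the cluster template and cluster definition are both edited by hand, a leftover YOUR-* placeholder is easy to miss. The following is a minimal sketch of a check that you could run on each file before registering the template or creating the cluster. It assumes a POSIX shell with grep; the function name find_placeholders is hypothetical.

```shell
# Hypothetical helper: report any YOUR-* placeholders left in an edited file.
# Prints each offending line with its line number.
# Returns 0 when no placeholders remain, 1 when at least one is found.
find_placeholders() {
  file=$1
  if grep -n 'YOUR-[A-Z]' "$file"; then
    return 1
  fi
  return 0
}

# Example usage, with the filenames used in this documentation:
#   find_placeholders pepperdata-cluster-template.json
#   find_placeholders pepperdata-cluster-definition.json
```

The check looks only for unreplaced placeholders; it does not confirm that the files are well-formed JSON or that the substituted values are correct.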

Task 3: Create the Pepperdata-Enabled Cluster

To create the Pepperdata-enabled cluster, you’ll first start the internal server on the Data Lake node that is hosting the Pepperdata parcel and manifests. Then you’ll use the CDP client to create the cluster as you configured it in the Pepperdata-enabled cluster template and Pepperdata-enabled cluster definition.

Procedure

  1. On the Data Lake node that is hosting the Pepperdata parcel and manifests, start the internal server in the usual manner.

    For example, if you are following the Creating a Temporary Internal Repository instructions, run the start command on the node that is hosting the Pepperdata files.

  2. On the host where the CDP client is installed, create the Pepperdata-enabled cluster.

    • Be sure to use the fully-qualified path and filename of the Pepperdata-enabled cluster definition to replace the <your-path-to-cluster-definition> placeholder.

    • Be sure to include the file:// prefix before the <your-path-to-cluster-definition> value.

    • You can assign any name you want for the <your-data-hub-cluster-name> placeholder.

    • Use the command for your cloud provider.

      • For AWS clusters:

        cdp datahub create-aws-cluster --request-template file://<your-path-to-cluster-definition> --cluster-name <your-data-hub-cluster-name>

      • For Azure clusters:

        cdp datahub create-azure-cluster --request-template file://<your-path-to-cluster-definition> --cluster-name <your-data-hub-cluster-name>
        
  3. (Optional) After the cluster starts, you can close the internal server connection.
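
The two most common mistakes in step 2 are a path that is not fully qualified and a missing file:// prefix. The following sketch assembles the command from its parts and prints it so that you can review it before running it. It assumes a POSIX shell; the function name datahub_create_command is hypothetical, and the command and options are the ones shown in step 2.

```shell
# Hypothetical helper: print the cluster-creation command without running it.
# Arguments: <aws|azure> <fully-qualified-definition-path> <cluster-name>
datahub_create_command() {
  provider=$1 definition=$2 name=$3

  # Only the two providers shown in step 2 are handled.
  case "$provider" in
    aws|azure) ;;
    *) echo "unsupported provider: $provider" >&2; return 1 ;;
  esac

  # Require a fully-qualified path so that file:// is followed by a leading slash.
  case "$definition" in
    /*) ;;
    *) echo "definition path must be fully qualified: $definition" >&2; return 1 ;;
  esac

  printf 'cdp datahub create-%s-cluster --request-template file://%s --cluster-name %s\n' \
    "$provider" "$definition" "$name"
}

# Example usage:
#   datahub_create_command aws /home/me/pepperdata-cluster-definition.json my-pd-cluster
```

Printing the command instead of executing it keeps the helper safe to run repeatedly; copy the output to the shell when it looks right. Paths or names that contain spaces would need quoting before you run the printed command.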
