Qubole QDS: Install Pepperdata

Supported versions: See the Qubole® QDS entries for Pepperdata 9.0.x in the table of Supported Platforms by Pepperdata Version

To activate Pepperdata (that is, to inject the necessary instrumentation) on an existing, running cluster, activate Pepperdata on the already-running hosts and add the activation commands to the cluster’s existing bootstrap configuration. To activate Pepperdata on a new Qubole QDS cluster, add the Pepperdata bootstrap script to the cluster’s configuration.

On This Page

Create New Cluster with Pepperdata
Add Pepperdata to Existing/Running Cluster

Create New Cluster with Pepperdata

This procedure configures Pepperdata activation for a cluster that does not yet exist, so that Pepperdata is activated on every host as the new cluster creates it. If the cluster already exists or is running, use the procedure for an existing/running cluster instead.

Assumptions

This procedure assumes that you will not need to leverage any custom cluster management functions, such as certificate management. If such additional (non-Pepperdata) functions are needed, you should create a “helper bootstrap” script to invoke those functions and call the Pepperdata bootstrap script. In this case, upload the helper bootstrap script to the cluster configuration folder, and use its location and filename for the Node Bootstrap File field in the procedure.
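A helper bootstrap of the kind described above might be organized as in the following minimal sketch. Only the S3 path layout and the two Pepperdata commands come from this page; the function names, the placeholder body of run_custom_setup, and the choice to run the custom setup before Pepperdata are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper bootstrap. Function names are illustrative only.

# Site-specific setup that is unrelated to Pepperdata (for example,
# certificate management). Replace the body with your own commands.
run_custom_setup() {
  echo "custom setup: nothing to do"
}

# Fetch and run the Pepperdata bootstrap script, using the same two
# commands that this page gives for a single-step bootstrap file.
run_pepperdata_bootstrap() {
  source /usr/lib/hustler/bin/read_from_s3.sh
  read_value_from_s3_location_into_file \
    "s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap" /tmp/bootstrap
  sudo bash /tmp/bootstrap
}

# Run the custom setup first (an assumed order) and stop if it fails,
# so that a broken setup step is not hidden by a later success.
helper_main() {
  run_custom_setup || return 1
  run_pepperdata_bootstrap
}
```

To use the sketch as a bootstrap file, end the script with a call to helper_main, and substitute the <my-bucket> and <my-base-directory> placeholders as described in the procedure.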

Prerequisites

An AWS access key is required for creating the bootstrap script.

If you already have an access key, you can use that.
Otherwise, use the Qubole interface to access the My Security Credentials page, and create a new access key.

Procedure

In your cloud environment (such as AWS), create the folder for the bootstrap script.
1. In the Qubole interface, navigate to My Security Credentials > Account Settings > Storage Settings.
2. Provide your AWS Access Key.
3. Note your Default Location, which should be a root-level folder named to match your cluster.
  
  For example, for a cluster named <my-cluster>, the default location would be s3://<my-cluster>.
4. Create a scripts/hadoop directory under the default location.
In the Qubole interface, add Pepperdata to the bootstrap process.
1. Use the Node Bootstrap Editor in Qubole.
  - If a bootstrap file already exists for the cluster, open it.
  - Otherwise, create a new bootstrap file named bootstrap in the scripts/hadoop directory that you previously created under the default location.
2. Add the Pepperdata bootstrapping commands to the bootstrap script.
  - For an existing bootstrap file with a multi-step bootstrapping process—with pre_task_start() and post_start() functions:
    
    Be sure to substitute your bucket name and the applicable installation directory for the <my-bucket> and <my-base-directory> placeholders.
    - <my-bucket> is the name of the bucket that you created at the beginning of the installation process.
    - <my-base-directory> is the name of the directory into which the installation TGZ archive was extracted; for example, supervisor-6.3.13-H26_YARN2_A.
    1. If the read_from_s3.sh and qubole-bash-lib.sh scripts are not already sourced in the bootstrap file, add the following source commands before the pre_task_start() and post_start() functions:
      source /usr/lib/hustler/bin/read_from_s3.sh
      source /usr/lib/hustler/bin/qubole-bash-lib.sh
    2. Add the following snippet to the pre_task_start() function:
      read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
      sudo bash /tmp/bootstrap
    3. Add the following snippet to the post_start() function:
      is_master=`nodeinfo is_master`
      if [[ "$is_master" == "1" ]]; then
        read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
        sudo bash /tmp/bootstrap
      fi
  - For an existing bootstrap file without a multi-step bootstrapping process (without pre_task_start() and post_start() functions):
    
    Be sure to substitute your bucket name and the applicable installation directory for the <my-bucket> and <my-base-directory> placeholders.
    - <my-bucket> is the name of the bucket that you created at the beginning of the installation process.
    - <my-base-directory> is the name of the directory into which the installation TGZ archive was extracted; for example, supervisor-6.3.13-H26_YARN2_A.
    1. If the read_from_s3.sh script is not already sourced in the bootstrap file, add the following source command:
      source /usr/lib/hustler/bin/read_from_s3.sh
    2. Add the following snippet to the end of the bootstrap script:
      read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
      sudo bash /tmp/bootstrap
    After you configure the cluster and create it, be sure to restart the ResourceManager, as shown in the last step of this procedure.
  - For a new bootstrap file, add the following multi-step bootstrapping code to the bootstrap script.
    
    Be sure to substitute your bucket name and the applicable installation directory for the <my-bucket> and <my-base-directory> placeholders, as explained after the code snippet.
    #!/bin/bash
    source /usr/lib/hustler/bin/read_from_s3.sh
    source /usr/lib/hustler/bin/qubole-bash-lib.sh
    function pre_task_start() {
      read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
      sudo bash /tmp/bootstrap
    }
    function post_start() {
      is_master=`nodeinfo is_master`
      if [[ "$is_master" == "1" ]]; then
        read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
        sudo bash /tmp/bootstrap
      fi
    }
    Where
    - <my-bucket> is the name of the bucket that you created at the beginning of the installation process.
    - <my-base-directory> is the name of the directory into which the installation TGZ archive was extracted; for example, supervisor-6.3.13-H26_YARN2_A.
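The placeholder substitution can also be scripted rather than done by hand. The following is a minimal sketch; the function name fill_pd_placeholders is hypothetical, and it assumes that neither value contains the characters |, &, or \, which would need escaping for sed.

```shell
#!/bin/bash
# Replace the <my-bucket> and <my-base-directory> placeholders in text read
# from standard input and write the result to standard output.
#   $1: bucket name
#   $2: base directory, for example supervisor-6.3.13-H26_YARN2_A
fill_pd_placeholders() {
  local bucket="$1" base_dir="$2"
  # "|" is used as the sed delimiter because the values may contain "/".
  sed -e "s|<my-bucket>|${bucket}|g" \
      -e "s|<my-base-directory>|${base_dir}|g"
}
```

For example, if the snippet from this page is saved unchanged in a file named bootstrap.template (a hypothetical name), `fill_pd_placeholders my-bucket-name supervisor-6.3.13-H26_YARN2_A < bootstrap.template > bootstrap` writes a copy with both placeholders filled in.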
In the Qubole interface, configure your cluster.
- For the Node Bootstrap File, enter the name of the bootstrap script that you created or edited in the previous step.
- In the Advanced Configuration > HADOOP CLUSTER SETTINGS > Override Hadoop Configuration Values section, enter the required information.
  
  Be sure to substitute your bucket name and cluster name for the <my-bucket> and <my-cluster> placeholders, respectively.
  - com.pepperdata.bucket.name=<my-bucket>
  - com.pepperdata.realm.name=<my-cluster>
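If you keep the bucket and cluster names in shell variables, the two override lines can be generated and then pasted into the Override Hadoop Configuration Values field. This is a small sketch; the function name pd_hadoop_overrides is hypothetical and is not a Qubole or Pepperdata command.

```shell
#!/bin/bash
# Print the two Pepperdata override lines, one per line.
#   $1: bucket name  (value for <my-bucket>)
#   $2: cluster name (value for <my-cluster>)
pd_hadoop_overrides() {
  local bucket="$1" cluster="$2"
  printf 'com.pepperdata.bucket.name=%s\n' "$bucket"
  printf 'com.pepperdata.realm.name=%s\n' "$cluster"
}
```

Calling `pd_hadoop_overrides <my-bucket> <my-cluster>` with your own values prints the two property lines in the same form as shown above.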
Complete the steps as prompted by the Create Cluster wizard.

The cluster is created, the Pepperdata software is installed, and the Pepperdata services are automatically started.
(Only for an existing bootstrap file without a multi-step bootstrapping process)

On the master node, restart the ResourceManager.

sudo monit restart resourcemanager

Add Pepperdata to Existing/Running Cluster

This procedure is for configuring Pepperdata activation on hosts in an already-running cluster: existing (already-running) hosts and hosts that will be created in the future.

Prerequisites

Every currently-running host in the cluster must already have an initialization (bootstrap) script.

If there is no initialization script, you must destroy the cluster and re-create it so that every host has an initialization script. The script can be empty or you can follow the procedure for activating Pepperdata on a new cluster.
Be sure that you’ve installed Pepperdata on every already-running host.

Procedure

You can perform the procedure steps manually on every already-running host. However, if you have many hosts, we recommend that you save time by using an existing automation framework or by creating a shell script that runs the required commands.
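One way to script the per-host activation is sketched below. It assumes passwordless SSH from the machine where you run it, the aws CLI on every host, and a text file with one hostname per line; none of these are stated by this page, and the function names are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Build the command string that is executed on each host.
#   $1: S3 URI of the Pepperdata bootstrap script
pd_remote_command() {
  printf 'aws s3 cp %s /tmp/bootstrap && sudo bash /tmp/bootstrap' "$1"
}

# Run the activation command on every host listed in a file.
#   $1: hosts file (one hostname per line; blank lines are skipped)
#   $2: S3 URI of the Pepperdata bootstrap script
activate_pd_on_hosts() {
  local hosts_file="$1" script_uri="$2" host cmd
  cmd="$(pd_remote_command "$script_uri")"
  # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'ssh %s %s\n' "$host" "$cmd"
    else
      # -n stops ssh from consuming the rest of the hosts file on stdin.
      ssh -n "$host" "$cmd" || printf 'activation failed on %s\n' "$host" >&2
    fi
  done < "$hosts_file"
}
```

A first pass with `DRY_RUN=1` followed by `activate_pd_on_hosts hosts.txt s3://<pd-bootstrap-script-from-install-packages>` lets you review the exact commands before any host is touched.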

Beginning with any already-running host, log in to the host.
Activate Pepperdata on the already-running host.
1. From the command line, copy the Pepperdata bootstrap script that you extracted from the Pepperdata package from its local location to any location in your Qubole Data Service (QDS) environment.
  
  For example:
  
  aws s3 cp s3://<pd-bootstrap-script-from-install-packages> /tmp/bootstrap
2. Run the Pepperdata bootstrap script; for example:
  
  sudo bash /tmp/bootstrap
  
  The script finishes with a Pepperdata installation succeeded message.
3. Repeat the two preceding steps on every already-running host.
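When the activation is scripted, the success message can be checked automatically instead of by eye. The sketch below assumes that the script output contains the literal text "Pepperdata installation succeeded"; the exact wording may differ in your version, and the function name is hypothetical.

```shell
#!/bin/bash
# Return 0 if the captured bootstrap output contains the success message,
# and 1 otherwise.
#   $1: captured output of the Pepperdata bootstrap script
pd_install_succeeded() {
  case "$1" in
    *'Pepperdata installation succeeded'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, capture the output with `output="$(sudo bash /tmp/bootstrap 2>&1)"` and then test it with `pd_install_succeeded "$output"` before moving on to the next host.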
Add bootstrap actions to activate Pepperdata on new hosts as they’re created.

Important: For existing/running clusters, do not try to change the configured pointer/name of the bootstrap script to the Pepperdata bootstrap script. Doing so will not result in activating Pepperdata on new hosts as they’re created.
1. Download a copy of your existing cluster bootstrap script (not the Pepperdata bootstrap script) to a location where you can edit it.
2. Open the script for editing.
3. Add the Pepperdata bootstrapping commands to the bootstrap script.
  - For an existing bootstrap file with a multi-step bootstrapping process—with pre_task_start() and post_start() functions:
    
    Be sure to substitute your bucket name and the applicable installation directory for the <my-bucket> and <my-base-directory> placeholders.
    - <my-bucket> is the name of the bucket that you created at the beginning of the installation process.
    - <my-base-directory> is the name of the directory into which the installation TGZ archive was extracted; for example, supervisor-6.3.13-H26_YARN2_A.
    1. If the read_from_s3.sh and qubole-bash-lib.sh scripts are not already sourced in the bootstrap file, add the following source commands before the pre_task_start() and post_start() functions:
      source /usr/lib/hustler/bin/read_from_s3.sh
      source /usr/lib/hustler/bin/qubole-bash-lib.sh
    2. Add the following snippet to the pre_task_start() function:
      read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
      sudo bash /tmp/bootstrap
    3. Add the following snippet to the post_start() function:
      is_master=`nodeinfo is_master`
      if [[ "$is_master" == "1" ]]; then
        read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
        sudo bash /tmp/bootstrap
      fi
  - For an existing bootstrap file without a multi-step bootstrapping process (without pre_task_start() and post_start() functions):
    
    Be sure to substitute your bucket name and the applicable installation directory for the <my-bucket> and <my-base-directory> placeholders.
    - <my-bucket> is the name of the bucket that you created at the beginning of the installation process.
    - <my-base-directory> is the name of the directory into which the installation TGZ archive was extracted; for example, supervisor-6.3.13-H26_YARN2_A.
    1. If the read_from_s3.sh script is not already sourced in the bootstrap file, add the following source command:
      source /usr/lib/hustler/bin/read_from_s3.sh
    2. Add the following snippet to the end of the bootstrap script:
      read_value_from_s3_location_into_file s3://<my-bucket>/install-packages/<my-base-directory>/qubole/bootstrap /tmp/bootstrap
      sudo bash /tmp/bootstrap
4. Save your changes and close the file.
5. Upload the revised file to overwrite the original cluster bootstrap script.
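For a bootstrap script without pre_task_start() and post_start() functions, the edit can be applied to the downloaded copy with a small script so that running it twice does not add the commands twice. This is a sketch under assumptions: the function name and the marker comment are hypothetical, and the source line is appended just before the Pepperdata commands rather than at the top of the file.

```shell
#!/bin/bash
# Append the Pepperdata commands to a local copy of a single-step
# bootstrap script, unless a previous run already added them.
#   $1: path to the downloaded bootstrap script
#   $2: bucket name
#   $3: base directory of the extracted installation archive
append_pd_commands() {
  local file="$1" bucket="$2" base_dir="$3"
  local marker='# Pepperdata bootstrap'
  # Nothing to do if the marker from an earlier run is present.
  if grep -qF "$marker" "$file"; then
    return 0
  fi
  # Make sure the file ends with a newline before appending to it.
  if [ -n "$(tail -c1 "$file")" ]; then
    echo >> "$file"
  fi
  # Source the S3 helper only if the script does not already do so.
  if ! grep -qF 'source /usr/lib/hustler/bin/read_from_s3.sh' "$file"; then
    printf 'source /usr/lib/hustler/bin/read_from_s3.sh\n' >> "$file"
  fi
  {
    printf '%s\n' "$marker"
    printf 'read_value_from_s3_location_into_file s3://%s/install-packages/%s/qubole/bootstrap /tmp/bootstrap\n' "$bucket" "$base_dir"
    printf 'sudo bash /tmp/bootstrap\n'
  } >> "$file"
}
```

After checking the edited copy, upload it over the original cluster bootstrap script as described in the last step.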