Tez Recommendations

Pepperdata recommendations for Tez applications are generated for mappers by the Application Profiler (which must be configured, enabled, and running before an app begins in order for recommendations to be generated). The Tez tile in the Recommendations section of the Pepperdata dashboard shows how many recommendations were made during the last 24 hours, along with their severity levels. (If there are no recommendations for Tez apps, the Tez tile does not appear on the dashboard.)

Recommendations information is shown in several places in the Pepperdata dashboard:

  • To see a table of all the Tez apps that received recommendations at a given severity level, click the linked severity text in the Tez tile.

  • To see the recommendations’ severity levels for all recently run Tez applications, show the Applications Overview page by using the left-nav menu to select App Spotlight > Applications (and filter by Tez apps).

  • To view the Application Profiler report, click the title of the Tez tile, or use the left-nav menu to select App Spotlight > Application Profiler (and filter by Tez apps).

Although heuristics and recommendations are closely related, the terms are not interchangeable.

Heuristics are the rules and triggering/firing thresholds against which Pepperdata compares the actual metric values for your applications. When a threshold is crossed, Pepperdata analyzes the data and provides relevant recommendations.

For example, a single heuristic might have a low and a high threshold, from which Pepperdata can provide distinct recommendations such as "Too long average task runtime" and "Too short average task runtime". That is, there is not a 1:1 correspondence of heuristics to recommendations.

Prerequisites for Receiving Tez Recommendations

To receive recommendations for a Tez application, the following prerequisites must be met before the application begins:

  • YARN Timeline Server must be running, and Tez monitoring must be enabled by Pepperdata; contact Pepperdata Support.
  • Application Profiler must be configured, enabled, and running.

Distro-Based Limitations

Recommendations for Tez apps are available only in clusters that run Amazon EMR. Other distros do not provide the data that’s necessary for analysis and recommendation generation.

Recommendations

The table describes the Pepperdata recommendations for Tez applications: each recommendation’s name, the phase for which it’s generated (mapper and/or reducer), its type (general guidance or specific tuning values to change), the cause that triggered the recommendation, the text of the actual recommendation, and notes that provide additional information.

Because Pepperdata is continually improving the recommendations as more and more applications and queries are profiled, the name, cause, and/or recommendation text might be slightly different from what’s shown in this documentation.

For details about how the recommendations appear in an application’s detail page, see Recommendations Tab.
Tez Recommendations

Average physical memory (MB)

Cause: Excessive wasted physical memory. <N> mappers each asked for <N> GB of memory, but used an average of only <N> GB each. (Firing threshold, which is the ratio of a mapper’s average memory used to its requested memory, is <= <N>.)

Recommendation: To decrease wasted memory: Decrease the container size by decreasing the value of hive.tez.container.size. Also, set the value of hive.tez.java.opts to <N>% of the new container size.

Notes:

  • Pepperdata evaluates a task’s memory usage by calculating the ratio of the total memory consumed by all tasks (where each task’s memory use is its average over its runtime) to the requested container memory.

  • This recommendation might not be given for an app even when the app wasted a lot of memory on average. The recommendation is based on the average of the app containers’ (or tasks’) peak memory use, but the wasted memory value (as shown in the Resource Usage tab of the App Details page) is calculated from the average memory use over the app’s entire runtime.
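As a sketch of how such a change might look in practice (the sizes and the 80% heap fraction below are illustrative assumptions, not values computed by Pepperdata), the settings can be applied per Hive session:

```sql
-- Illustrative values only; substitute the sizes from the actual recommendation.
SET hive.tez.container.size=2048;    -- container size in MB (example: smaller than the original request)
SET hive.tez.java.opts=-Xmx1638m;    -- heap at roughly 80% of the container size (assumed fraction)
```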

Task GC/CPU ratio

Cause: Excessive time spent on garbage collection (GC). <N> mappers each spent an average of <N>% of their execution time on GC. (Firing threshold >= <N>%.)

Recommendations: To speed up your app:

  • Reduce the size of large hash tables that are used for map joins by decreasing the value of hive.auto.convert.join.noconditionaltask.size.

  • If the job is inserting a large number of columns, reduce the value of hive.exec.orc.default.buffer.size.

  • If the job is inserting multiple partitions, set hive.optimize.sort.dynamic.partition=true.

Notes: Code optimization can generally reduce the time spent on GC.
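For illustration (all values here are assumptions, not recommended defaults), the three adjustments could be made as Hive session settings:

```sql
SET hive.auto.convert.join.noconditionaltask.size=100000000;  -- bytes; smaller value caps map-join hash table size (example)
SET hive.exec.orc.default.buffer.size=65536;                  -- bytes; reduce when inserting many columns (example)
SET hive.optimize.sort.dynamic.partition=true;                -- for jobs that insert multiple partitions
```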

Median task speed

Cause: Mappers spent an excessive time ingesting data. <N> mappers each ingested a median of <N> MB, at a median rate of <N> MB/sec. (Firing threshold <= <N> MB/sec.)

Recommendations: To speed up your app:

  • Increase the number of mappers by decreasing the value of tez.grouping.split-count. Also, decrease the values of tez.grouping.min-size and tez.grouping.max-size.

  • If your jobs are spending a lot of time reading the input splits, reduce the value of hive.auto.convert.join.noconditionaltask.size.

Notes: This recommendation identifies tasks that are ingesting data too slowly.
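As a hedged example of the split-grouping changes (the sizes are illustrative; the right values depend on your data volume):

```sql
SET tez.grouping.min-size=16777216;     -- 16 MB (example; smaller groups yield more, smaller mappers)
SET tez.grouping.max-size=134217728;    -- 128 MB (example)
```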

Imbalanced work across tasks

Cause: Imbalanced work across mappers. One group (<N> tasks that worked on an average of <N> MB of data each) worked on >= the firing threshold of <N> times more data than the other group (<N> tasks that worked on an average of <N> MB of data each).

Recommendations: To speed up your app:

  • Decrease the number of mappers by increasing the value of tez.grouping.split-count. Also, increase the values of tez.grouping.min-size and tez.grouping.max-size.

  • If there are multiple small files that need to be combined, set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.

Imbalanced time spent across tasks

Cause: Imbalanced time spent across mappers. One group (<N> tasks that each spent an average of <N> sec) spent >= the firing threshold of <N> times more time than the other group (<N> tasks that each spent an average of <N> sec).

Recommendations: To speed up your app:

  • Decrease the number of mappers by increasing the value of tez.grouping.split-count. Also, increase the values of tez.grouping.min-size and tez.grouping.max-size.

  • If there are multiple small files that need to be combined, set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.
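A sketch of both adjustments as Hive session settings (the grouping sizes are illustrative assumptions):

```sql
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;  -- combine many small input files
SET tez.grouping.min-size=268435456;    -- 256 MB (example; larger groups yield fewer mappers)
SET tez.grouping.max-size=1073741824;   -- 1 GB (example)
```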

Too short average task runtime

Cause: Runtimes too short for mappers. <N> mappers took on average <= the threshold of <N> min.

Recommendation: To speed up your app: Tune the number of mappers by increasing the values of tez.grouping.min-size and tez.grouping.max-size.

Too long average task runtime

Cause: Runtimes too long for mappers. <N> mappers took on average >= the threshold of <N> min.

Recommendation: To speed up your app: Tune the number of mappers by decreasing the values of tez.grouping.min-size and tez.grouping.max-size.

Ratio of spilled records to output records

Cause: Excessive mapper spill. <N> mappers averaged <N> spills/record. (Firing threshold >= <N> spills/record.)

Recommendations: To speed up your app:

  • Increase the size of the in-memory sort buffer by increasing the value of tez.runtime.io.sort.mb.

  • Increase the buffer spill percentage by increasing the value of tez.runtime.sort.spill.percent.
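For example (the values below are illustrative assumptions, not tuned settings):

```sql
SET tez.runtime.io.sort.mb=512;           -- MB; larger in-memory sort buffer (example)
SET tez.runtime.sort.spill.percent=0.9;   -- fraction of the buffer filled before records spill (example)
```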