Presto/Trino Recommendations

Pepperdata recommendations for Presto/Trino queries are generated by the Query Profiler, which you must enable for Presto/Trino monitoring when you configure Query Spotlight. The Presto tile in the Recommendations section of the Pepperdata dashboard shows how many recommendations were made during the last 24 hours, along with their severity levels.

Query Spotlight support for Presto/Trino queries is in Technical Preview, and we welcome your feedback.
Supervisor v6.5.24 or later is required to receive recommendations for Presto/Trino queries.

Recommendations information is shown in several places in the Pepperdata dashboard:

  • To see a table of all the Presto/Trino queries that received recommendations at a given severity level, click the linked severity text in the Presto tile.

  • To see the recommendations’ severity levels for all recently run queries, open the Queries Overview page: in the left-nav menu, select Query Spotlight > Queries.

  • To view the Query Profiler report, click the title of the Presto tile, or use the left-nav menu to select Query Spotlight > Query Profiler.

The following table describes the Pepperdata recommendations for Presto/Trino queries: each recommendation’s name, its type (general guidance or specific tuning values to change), what triggered the recommendation (the cause), the text of the actual recommendation, and notes that provide additional information.

Because Pepperdata continually improves the recommendations as more applications and queries are profiled, the name, cause, or recommendation text might differ slightly from what’s shown in this documentation.

For details about how the recommendations appear in an application’s detail page, see Recommendations Tab.
Table: Presto/Trino Recommendations
Columns: Name, Type (Guidance or Tuning), Cause, Recommendation, Notes

Name: Too large a result set from Presto Cross join
Cause: The result set of the query’s Cross join is greater than or equal to <N>.
Recommendation: Rewrite the query to add join conditions and eliminate Cross joins.
Notes: Cross joins of large tables can be resource-intensive.
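To illustrate why adding a join condition shrinks the result set, here is a minimal sketch. It uses Python's built-in sqlite3 module as a local stand-in for a Presto/Trino cluster, and the `orders` and `customers` tables are hypothetical. The SQL shown is standard enough that the same rewrite should apply to a Presto/Trino query, but treat it as an illustration rather than a tuned example.

```python
import sqlite3

# SQLite is only a local stand-in here; table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i % 10) for i in range(100)]
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer_{i}") for i in range(10)],
)

# Cross join: every order is paired with every customer.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM orders CROSS JOIN customers"
).fetchone()[0]

# Join with a condition: each order is paired only with its own customer.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id"
).fetchone()[0]

print(f"cross join rows: {cross_rows}, conditioned join rows: {joined_rows}")
```

The cross join produces the product of the two row counts, while the conditioned join produces one row per matching pair.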

Name: Too many joins and too much data processed by joins
Cause: The query has more than 5 join operations. The data processed by the join operations exceeds <N SIZE>.
Recommendation: Denormalize tables to reduce or eliminate the need for joins.
Notes: Many joins that process large amounts of data can be resource-intensive.
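One way to picture denormalization is to materialize the joined data once and then query the wide table without joins. The sketch below assumes hypothetical `orders`, `customers`, and `products` tables and again uses sqlite3 as a local stand-in. It uses only two joins to stay short, and whether a `CREATE TABLE ... AS SELECT` approach suits a particular Presto/Trino catalog is an assumption to verify for your environment.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for the query engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         product_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE products (product_id INTEGER, category TEXT);
    INSERT INTO orders VALUES (1, 1, 1, 10.0), (2, 1, 2, 20.0), (3, 2, 1, 30.0);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west');
    INSERT INTO products VALUES (1, 'books'), (2, 'games');
""")

# Normalized form: every report query repeats the joins.
normalized = conn.execute("""
    SELECT c.region, p.category, SUM(o.amount)
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN products p ON o.product_id = p.product_id
    GROUP BY c.region, p.category
    ORDER BY c.region, p.category
""").fetchall()

# Denormalized form: pay for the joins once, when building the wide table.
conn.execute("""
    CREATE TABLE orders_wide AS
    SELECT o.order_id, o.amount, c.region, p.category
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN products p ON o.product_id = p.product_id
""")

# Report queries against the wide table need no joins.
denormalized = conn.execute("""
    SELECT region, category, SUM(amount)
    FROM orders_wide
    GROUP BY region, category
    ORDER BY region, category
""").fetchall()

print(normalized == denormalized)
```

The trade-off is that the wide table must be refreshed whenever the source tables change.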

Name: Query missing LIMIT clause
Cause: ORDER BY is used without a LIMIT clause.
Recommendation: ORDER BY clause returns results of a query in sort order. This could cause memory pressure on the worker that is sorting the results, which could result in long execution times or failed queries.
Notes: Sorting a large data set can be resource-intensive.
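As a rough illustration of the difference a LIMIT clause makes, the following sketch sorts a hypothetical `events` table with and without a limit. It runs against sqlite3 rather than Presto/Trino, so it shows only the shape of the query and the size of the result, not the memory behavior of a worker.

```python
import sqlite3

# Hypothetical table; durations are distinct so the sort order is unambiguous.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, duration_ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, (i * 37) % 1000) for i in range(1000)],
)

# Without LIMIT: the whole table is sorted and returned.
all_sorted = conn.execute(
    "SELECT event_id, duration_ms FROM events ORDER BY duration_ms DESC"
).fetchall()

# With LIMIT: only the ten longest events are returned.
top_10 = conn.execute(
    "SELECT event_id, duration_ms FROM events "
    "ORDER BY duration_ms DESC LIMIT 10"
).fetchall()

print(len(all_sorted), len(top_10))
```

When only the first few rows are needed, adding the limit lets the engine avoid returning, and typically avoid fully sorting, the rest.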

Name: Query selecting all columns
Cause: All columns are selected in the query.
Recommendation: When running queries, limit the final SELECT statement to only the required columns (not all columns). Trimming the number of columns reduces the data that needs to be processed through the query execution pipeline. This especially helps when querying tables that have a lot of columns that are string-based, and when performing multiple joins or aggregations.
Notes: Selecting all columns can be resource-intensive.
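The sketch below contrasts `SELECT *` with an explicit column list on a hypothetical `page_views` table that has several string columns. It uses sqlite3 as a local stand-in, so it demonstrates only which columns flow through the query, not the actual savings on a Presto/Trino cluster.

```python
import sqlite3

# Hypothetical wide table with several string columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (view_id INTEGER, user_id INTEGER, "
    "url TEXT, user_agent TEXT, referrer TEXT)"
)
conn.execute(
    "INSERT INTO page_views VALUES "
    "(1, 42, '/home', 'ExampleBrowser/1.0', '/landing')"
)

# SELECT * carries every column through the query.
all_columns = conn.execute("SELECT * FROM page_views")
all_names = [column[0] for column in all_columns.description]

# Naming only the required columns drops the unused string columns early.
needed_columns = conn.execute("SELECT view_id, url FROM page_views")
needed_names = [column[0] for column in needed_columns.description]

print(all_names)
print(needed_names)
```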

Name: Hive tables missing statistics
Cause: The following tables are missing relevant table and/or column statistics: <table1>, <table2>.
Recommendation: Run compute stats on the following tables: <table1>, <table2>.
Notes: The Hive query planner uses table and column statistics to generate effective plans. Missing statistics can result in inaccurate plans and poor query performance.
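The recommendation text does not name a specific statement. As a sketch, the hypothetical helper below builds statistics-collection statements for a list of tables, assuming that Trino's `ANALYZE <table>` or Hive's `ANALYZE TABLE <table> COMPUTE STATISTICS` is what you would run. The helper name, its parameters, and the table names are illustrative only; confirm the right statement for your engine and connector before running anything.

```python
def analyze_statements(tables, engine="trino"):
    """Build statistics-collection statements (hypothetical helper).

    Table names are inserted as-is, so pass only trusted identifiers.
    """
    statements = []
    for table in tables:
        if engine == "hive":
            # Hive collects table-level and column-level statistics separately.
            statements.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS")
            statements.append(
                f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS"
            )
        else:
            # Assumed default: the Presto/Trino ANALYZE statement.
            statements.append(f"ANALYZE {table}")
    return statements


# Hypothetical table names standing in for <table1>, <table2>.
for statement in analyze_statements(["sales.orders", "sales.customers"]):
    print(statement)
```

Generating the statements from the list of tables in the recommendation makes it easier to review them before running them in your SQL client.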