AWS Redshift useful admin queries

AWS Redshift has proven to be a highly scalable and reliable data warehouse solution.  Like any other database/data warehouse system, we need to constantly monitor it.  Some systems are more sophisticated than others, some offer better management consoles than others.

In the case of AWS Redshift the console will give you infrastructure or cluster related info related to health, configuration, etc.  But, we also need to monitor things like data health, query execution, space available.  There’s an extensive list of system tables and views that you can query to get this information.  They are easy to identify  because their names are prefixed with STL, STV, SVL or SVV.

I have made my own catalog of useful admin queries I constantly run in order to have a better idea of where I stand.

System Tables

STL – tables are generated from logs that have been persisted to disk to provide a history of the system.

More information, here.

STV – tables are virtual tables that contain snapshots of the current system data.

More information, here.

System Views

SVV – View that reference STV tables.

SVL – Views that reference SVL table.

More information, here,

AWS Redshift useful admin queries  

I have made a list of the queries I use most on a regular basis (which I keep updating constantly):

 

List of current sessions/connections


SELECT *
FROM stv_sessions;

List of locked tables


SELECT *
FROM stv_locks;

Terminate or kill a session/connection


SELECT pg_terminate_backend(PID);

Cancel a running query


select pg_cancel_backend(PID);

Error detail for queries that fail


SELECT *
FROM stl_load_errors
ORDER BY starttime DESC;

Get list of DDL statements

DDL statements include:

CREATE SCHEMA, TABLE, VIEW
DROP SCHEMA, TABLE, VIEW
ALTER SCHEMA, TABLE


SELECT xid,
starttime,
SEQUENCE,
SUBSTRING(TEXT,1,200) AS TEXT
FROM stl_ddltext
ORDER BY xid DESC,
SEQUENCE;

List of tables by user

SELECT *
FROM pg_tables
WHERE schemaname = 'USER_NAME';

Summary data for a given query

SELECT *
FROM svl_query_summary
WHERE query = QUERY_ID
ORDER BY stm, seg, step;

List of queries currently ‘Running’


SELECT pid,
TRIM(user_name),
starttime,
query,
SUBSTRING(query,1,20),
status
FROM stv_recents
WHERE status = 'Running';

Disk space summary

SELECT SUM(capacity) / 1024 AS capacity_gbytes,
SUM(used) / 1024 AS used_gbytes,
(SUM( capacity) - SUM(used)) / 1024 AS free_gbytes
FROM stv_partitions
WHERE part_begin = 0;

Disk space summary by table


SELECT TRIM(pgdb.datname) AS DATABASE,
TRIM(pgn.nspname) AS SCHEMA,
TRIM(a.name) AS TABLE,
b.mbytes,
a.rows
FROM (SELECT db_id, id, name, SUM(ROWS) AS ROWS FROM stv_tbl_perm a GROUP BY db_id, id, name) AS a
JOIN pg_class AS pgc ON pgc.oid = a.id
JOIN pg_namespace AS pgn ON pgn.oid = pgc.relnamespace
JOIN pg_database AS pgdb ON pgdb.oid = a.db_id
JOIN (SELECT tbl, COUNT(*) AS mbytes FROM stv_blocklist GROUP BY tbl) b ON a.id = b.tbl
ORDER BY mbytes DESC, a.db_id, a.name;

Table definition: fields summary including sortkey and distkey


-- Show which schemas are included in the search
SHOW search_path;
-- Add all necessary schemas to include in search path
SET search_path TO '$user','public','SCHEMA_1','SCHEMA_2';
-- Show table definitions
SELECT *
FROM pg_table_def
WHERE schemaname = 'SCHEMA_NAME' -- optional
AND tablename = 'TABLE_NAME'; -- optional

Show query text for specific query


SELECT query,
TRIM(querytxt) AS sqlquery
FROM stl_query
WHERE query = 295235
ORDER BY query DESC LIMIT 5;

 

For more information about AWS Redshift System Tables and Views go here.

I hope this information is as useful for you as it is for me!

Leave a Reply

Your email address will not be published. Required fields are marked *