Friday, June 15, 2012

Data skew in Greenplum

Running this on a Greenplum DB 4.2 system will show you where you have data skew > 10%.

From the Greenplum admin guide gp_skew_idle_fractions:

select * from gp_toolkit.gp_skew_idle_fractions where siffraction > 0.1;

sifoid: objectid of the table
sifnamespace: schema name
sifrelname: table name
siffraction: The percentage of the system that is idle during a table scan, which is an indicator of uneven data distribution or query processing skew. For example, a value of 0.1 indicates 10% skew, a value of 0.5 indicates 50% skew, and so on. Tables that have more than 10% skew should have their distribution policies evaluated.