Posts

Showing posts from June, 2012

Data skew in Greenplum

Running this on a Greenplum DB 4.2 system will show you where you have data skew > 10%. From the Greenplum admin guide gp_skew_idle_fractions: select * from gp_toolkit.gp_skew_idle_fractions where siffraction > 0.1; sifoid: objectid of the table sifnamespace: schema name sifrelname: table name siffraction: The percentage of the system that is idle during a table scan, which is an indicator of uneven data distribution or query processing skew. For example, a value of 0.1 indicates 10% skew, a value of 0.5 indicates 50% skew, and so on. Tables that have more than 10% skew should have their distribution policies evaluated.