So let's say you have 1 million rows of data grouped into 200 categories (e.g., by Product Code), and you are only interested in the top 100 product codes, with the remaining 100 bucketed under "Other". You can implement this easily with multiple data streams. The SAQL below uses two data streams, labeled q and q1. The limit, or top "X", is set by a toggle whose selection is injected into the SAQL through bindings. Here is the SAQL in non-compact form.
--first data stream
q = load \"zDSForecastXProductLineItem\";
q = group q by {{column(static_1.selection,[\"Val\"]).asObject()}};
q = foreach q generate {{column(static_1.selection,[\"Val\"]).asObject()}} as '{{column(static_1.selection,[\"Val\"]).asObject()}}', sum('Total_Value__c') as 'TotValue';
q = order q by 'TotValue' desc;
--top N as a binding
q = limit q {{column(static_2.selection,[\"lmt\"]).asObject()}};

--2nd data stream
q1 = load \"zDSForecastXProductLineItem\";
q1 = group q1 by {{column(static_1.selection,[\"Val\"]).asObject()}};
q1 = foreach q1 generate {{column(static_1.selection,[\"Val\"]).asObject()}} as '{{column(static_1.selection,[\"Val\"]).asObject()}}', sum('Total_Value__c') as 'TotValue';
--the "OFFSET" excludes the top N from the 1st data stream
q1 = order q1 by 'TotValue' desc;
q1 = offset q1 {{column(static_2.selection,[\"lmt\"]).asObject()}};
--these next statements take the "others" and bucket them into 1 group, i.e. "all", then reproject
q1 = group q1 by all;
q1 = foreach q1 generate \"Other\" as '{{column(static_1.selection,[\"Val\"]).asObject()}}', sum('TotValue') as 'TotValue';

--the last statement generates a data stream called "final" which unions both data streams
final = union q, q1;
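If you want to sanity-check what the two streams should return, here is a rough sketch of the same top-N-plus-"Other" logic in plain Python. It is not SAQL and not part of the dashboard. The function name top_n_with_other and the sample rows are made up for illustration, and skipping the "Other" row when nothing is left over is a choice made in this sketch, not a claim about how SAQL behaves.

```python
from collections import defaultdict


def top_n_with_other(rows, n, other_label="Other"):
    """Return the top n categories by total value, plus one 'Other' row.

    rows is an iterable of (category, value) pairs.
    This is a hypothetical helper that mirrors the two SAQL streams.
    """
    # Same idea as "group by" + sum(): total value per category.
    totals = defaultdict(float)
    for category, value in rows:
        totals[category] += value

    # Same idea as "order by 'TotValue' desc".
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    # Stream q: "limit n" keeps the top n categories.
    top = ranked[:n]

    # Stream q1: "offset n" skips the top n, then "group by all"
    # collapses whatever is left into a single row.
    rest = ranked[n:]
    if rest:  # assumption of this sketch: no "Other" row if nothing is left
        top.append((other_label, sum(value for _, value in rest)))

    # Same idea as "union q, q1".
    return top


# Made-up sample data: four product codes, top 2 requested.
rows = [("A", 50), ("B", 30), ("C", 10), ("A", 20), ("D", 5)]
result = top_n_with_other(rows, 2)
print(result)  # [('A', 70.0), ('B', 30.0), ('Other', 15.0)]
```

The total across all rows stays the same (115 here), which is a quick check you can also run against the dashboard: the sum of 'TotValue' in "final" should match the ungrouped total.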