Function percentile_approx(column, percentage, accuracy=10000) returns the approximate percentile of the numeric column `column` at the given percentage.
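| Parameter | Description |
|---|---|
| column | Numeric column |
| percentage | Percentile to be calculated, between 0.0 and 1.0 (a single value or an array of values) |
| accuracy | Optional. Controls the approximation algorithm; a higher value gives better accuracy (default: 10000) |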
The value of `percentage` must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of `accuracy` yields better accuracy; `1.0/accuracy` is the relative error of the approximation, so the default of 10000 corresponds to a relative error of 0.0001. When `percentage` is an array, each value in the array must be between 0.0 and 1.0. In this case, the function returns an array of approximate percentiles of `column`, one for each value in the `percentage` array.
```sql
-- Aliases that start with a digit are backquoted so the query parses reliably.
select
  percentile_approx(value, 0.1) as `10th`,
  percentile_approx(value, 0.5) as `50th`,
  percentile_approx(value, 0.9) as `90th`
from temperature_stream
```
```sql
-- Passing an array of percentages returns an array of percentiles.
select
  percentile_approx(value, array(0.1, 0.5, 0.9)) as quantiles
from temperature_stream
```
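The queries above rely on the default `accuracy` of 10000. As a sketch, the third argument can be passed explicitly to trade accuracy for memory. The table `temperature_stream` and column `value` are the same names used in the examples above; the accuracy values 100 and 1000000 are arbitrary choices for illustration, not recommendations.

```sql
-- Median of the same column at three accuracy settings.
-- accuracy = 100     -> relative error of about 1/100 = 0.01 (less memory)
-- default (10000)    -> relative error of about 0.0001
-- accuracy = 1000000 -> relative error of about 0.000001 (more memory)
select
  percentile_approx(value, 0.5, 100)     as median_coarse,
  percentile_approx(value, 0.5)          as median_default,
  percentile_approx(value, 0.5, 1000000) as median_fine
from temperature_stream
```

The three columns may or may not differ on a given dataset; comparing them is simply a quick way to see how much precision a smaller `accuracy` gives up.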
Platforms: WhereOS, Spark, Hive
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
More functions can be added to WhereOS via Python or R bindings, or as Java and Scala UDF (user-defined function), UDAF (user-defined aggregation function) and UDTF (user-defined table generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.