Introduction
This is a list of built-in functions of WhereOS, based on Spark and Hive functions and third-party libraries. More functions can be added to WhereOS via Python or R bindings, or as Java and Scala UDF (user-defined function), UDAF (user-defined aggregation function) and UDTF (user-defined table-generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.
Function: !
! expr – Logical not.
Class: org.apache.spark.sql.catalyst.expressions.Not
Function: %
expr1 % expr2 – Returns the remainder after `expr1`/`expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Remainder
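For example, the remainder of 10 divided by 3 (a sketch; result per standard Spark SQL semantics):
SELECT 10 % 3;
1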
Function: &
expr1 & expr2 – Returns the result of bitwise AND of `expr1` and `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.BitwiseAnd
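For example, 3 (binary 011) AND 5 (binary 101) share only the lowest bit:
SELECT 3 & 5;
1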
Function: *
expr1 * expr2 – Returns `expr1`*`expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Multiply
Function: +
expr1 + expr2 – Returns `expr1`+`expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Add
Function: -
expr1 - expr2 – Returns `expr1`-`expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Subtract
Function: /
expr1 / expr2 – Returns `expr1`/`expr2`. It always performs floating point division.
Class: org.apache.spark.sql.catalyst.expressions.Divide
Function: <
expr1 < expr2 - Returns true if `expr1` is less than `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.LessThan
Function: <=
expr1 <= expr2 - Returns true if `expr1` is less than or equal to `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.LessThanOrEqual
Function: <=>
expr1 <=> expr2 – Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null and false if one of them is null.
Class: org.apache.spark.sql.catalyst.expressions.EqualNullSafe
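A sketch contrasting null-safe and plain equality (results per Spark SQL semantics):
SELECT 2 <=> 2, 1 <=> NULL, NULL <=> NULL;
true false true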
Function: =
expr1 = expr2 – Returns true if `expr1` equals `expr2`, or false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.EqualTo
Function: ==
expr1 == expr2 – Returns true if `expr1` equals `expr2`, or false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.EqualTo
Function: >
expr1 > expr2 – Returns true if `expr1` is greater than `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.GreaterThan
Function: >=
expr1 >= expr2 – Returns true if `expr1` is greater than or equal to `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual
Function: ^
expr1 ^ expr2 – Returns the result of bitwise exclusive OR of `expr1` and `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.BitwiseXor
Function: abs
abs(expr) – Returns the absolute value of the numeric value.
Class: org.apache.spark.sql.catalyst.expressions.Abs
Function: acos
acos(expr) – Returns the inverse cosine (a.k.a. arc cosine) of `expr`, as if computed by `java.lang.Math.acos`.
Class: org.apache.spark.sql.catalyst.expressions.Acos
Function: add_bias
add_bias(feature_vector in array) – Returns features with a bias in array
Class: hivemall.ftvec.AddBiasUDF
Function: add_days
Class: brickhouse.udf.date.AddDaysUDF
Function: add_feature_index
add_feature_index(ARRAY[DOUBLE]: dense feature vector) – Returns a feature vector with feature indices
Class: hivemall.ftvec.AddFeatureIndexUDF
Function: add_field_indices
add_field_indices(array features) – Returns arrays of string that field indices (:)* are augmented
Class: hivemall.ftvec.trans.AddFieldIndicesUDF
Function: add_field_indicies
add_field_indicies(array features) – Returns arrays of string that field indices (:)* are augmented
Class: hivemall.ftvec.trans.AddFieldIndicesUDF
Function: add_months
add_months(start_date, num_months) – Returns the date that is `num_months` after `start_date`.
Class: org.apache.spark.sql.catalyst.expressions.AddMonths
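For example, the result is clamped to the last day of the target month (per the Spark SQL reference):
SELECT add_months('2016-08-31', 1);
2016-09-30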
Function: aggregate
aggregate(expr, start, merge, finish) – Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
Class: org.apache.spark.sql.catalyst.expressions.ArrayAggregate
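For example, folding an array into its sum with an initial state of 0 (per the Spark SQL reference):
SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
6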
Function: amplify
amplify(const int xtimes, *) – amplify the input records x-times
Class: hivemall.ftvec.amplify.AmplifierUDTF
Function: and
expr1 and expr2 – Logical AND.
Class: org.apache.spark.sql.catalyst.expressions.And
Function: angular_distance
angular_distance(ftvec1, ftvec2) – Returns an angular distance of the given two vectors
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
angular_distance(l.features, r.features) as distance,
distance2similarity(angular_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;
doc1 doc2 distance similarity
1 3 0.31678355 0.75942624
1 2 0.33333337 0.75
2 3 0.09841931 0.91039914
2 1 0.33333337 0.75
3 2 0.09841931 0.91039914
3 1 0.31678355 0.75942624
Class: hivemall.knn.distance.AngularDistanceUDF
Function: angular_similarity
angular_similarity(ftvec1, ftvec2) – Returns an angular similarity of the given two vectors
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
angular_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;
doc1 doc2 similarity
1 3 0.68321645
1 2 0.6666666
2 3 0.9015807
2 1 0.6666666
3 2 0.9015807
3 1 0.68321645
Class: hivemall.knn.similarity.AngularSimilarityUDF
Function: append_array
Class: brickhouse.udf.collect.AppendArrayUDF
Function: approx_count_distinct
approx_count_distinct(expr[, relativeSD]) – Returns the estimated cardinality by HyperLogLog++. `relativeSD` defines the maximum estimation error allowed.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
Function: approx_percentile
approx_percentile(col, percentage [, accuracy]) – Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
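A sketch with a percentage array (example from the Spark SQL reference):
SELECT approx_percentile(10.0, array(0.5, 0.4, 0.1), 100);
[10.0,10.0,10.0]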
Function: argmin_kld
argmin_kld(float mean, float covar) – Returns mean or covar that minimize a KL-distance among distributions
The returned value is (1.0 / (sum(1.0 / covar))) * (sum(mean / covar))
Class: hivemall.ensemble.ArgminKLDistanceUDAF
Function: array
array(expr, …) – Returns an array with the given elements.
Class: org.apache.spark.sql.catalyst.expressions.CreateArray
Function: array_append
array_append(array arr, T elem) – Append an element to the end of an array
SELECT array_append(array(1,2),3);
1,2,3
SELECT array_append(array('a','b'),'c');
"a","b","c"
Class: hivemall.tools.array.ArrayAppendUDF
Function: array_avg
array_avg(array) – Returns an array in which each element is the mean of a set of numbers
WITH input as (
select array(1.0, 2.0, 3.0) as nums
UNION ALL
select array(2.0, 3.0, 4.0) as nums
)
select
array_avg(nums)
from
input;
["1.5","2.5","3.5"]
Class: hivemall.tools.array.ArrayAvgGenericUDAF
Function: array_concat
array_concat(array x1, array x2, ..) – Returns a concatenated array
SELECT array_concat(array(1),array(2,3));
[1,2,3]
Class: hivemall.tools.array.ArrayConcatUDF
Function: array_contains
array_contains(array, value) – Returns true if the array contains the value.
Class: org.apache.spark.sql.catalyst.expressions.ArrayContains
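For example:
SELECT array_contains(array(1, 2, 3), 2);
true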
Function: array_distinct
array_distinct(array) – Removes duplicate values from the array.
Class: org.apache.spark.sql.catalyst.expressions.ArrayDistinct
Function: array_except
array_except(array1, array2) – Returns an array of the elements in array1 but not in array2, without duplicates.
Class: org.apache.spark.sql.catalyst.expressions.ArrayExcept
Function: array_flatten
array_flatten(array>) – Returns an array with the elements flattened.
SELECT array_flatten(array(array(1,2,3),array(4,5),array(6,7,8)));
[1,2,3,4,5,6,7,8]
Class: hivemall.tools.array.ArrayFlattenUDF
Function: array_hash_values
array_hash_values(array values, [string prefix [, int numFeatures], boolean useIndexAsPrefix]) returns hash values in array
Class: hivemall.ftvec.hashing.ArrayHashValuesUDF
Function: array_index
Class: brickhouse.udf.collect.ArrayIndexUDF
Function: array_intersect
array_intersect(array1, array2) – Returns an array of the elements in the intersection of array1 and array2, without duplicates.
Class: org.apache.spark.sql.catalyst.expressions.ArrayIntersect
Function: array_join
array_join(array, delimiter[, nullReplacement]) – Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.
Class: org.apache.spark.sql.catalyst.expressions.ArrayJoin
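For example (per the Spark SQL reference):
SELECT array_join(array('hello', 'world'), ' ');
hello world
SELECT array_join(array('hello', null, 'world'), ' ', ',');
hello , world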
Function: array_max
array_max(array) – Returns the maximum value in the array. NULL elements are skipped.
Class: org.apache.spark.sql.catalyst.expressions.ArrayMax
Function: array_min
array_min(array) – Returns the minimum value in the array. NULL elements are skipped.
Class: org.apache.spark.sql.catalyst.expressions.ArrayMin
Function: array_position
array_position(array, element) – Returns the (1-based) index of the first element of the array as long.
Class: org.apache.spark.sql.catalyst.expressions.ArrayPosition
Function: array_remove
array_remove(array, element) – Remove all elements that equal to element from array.
Class: org.apache.spark.sql.catalyst.expressions.ArrayRemove
Function: array_repeat
array_repeat(element, count) – Returns the array containing element count times.
Class: org.apache.spark.sql.catalyst.expressions.ArrayRepeat
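For example:
SELECT array_repeat('123', 2);
["123","123"]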
Function: array_slice
array_slice(array values, int offset [, int length]) – Slices the given array by the given offset and length parameters.
SELECT
array_slice(array(1,2,3,4,5,6),2,4),
array_slice(
array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'),
0, -- offset
2 -- length
),
array_slice(
array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'),
6, -- offset
3 -- length
),
array_slice(
array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'),
6, -- offset
10 -- length
),
array_slice(
array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'),
6 -- offset
),
array_slice(
array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'),
-3 -- offset
),
array_slice(
array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'),
-3, -- offset
2 -- length
);
[3,4]
["zero","one"]
["six","seven","eight"]
["six","seven","eight","nine","ten"]
["six","seven","eight","nine","ten"]
["eight","nine","ten"]
["eight","nine"]
Class: hivemall.tools.array.ArraySliceUDF
Function: array_sort
array_sort(array) – Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
Class: org.apache.spark.sql.catalyst.expressions.ArraySort
Function: array_sum
array_sum(array) – Returns an array in which each element is summed up
WITH input as (
select array(1.0, 2.0, 3.0) as nums
UNION ALL
select array(2.0, 3.0, 4.0) as nums
)
select
array_sum(nums)
from
input;
["3.0","5.0","7.0"]
Class: hivemall.tools.array.ArraySumUDAF
Function: array_to_str
array_to_str(array arr [, string sep=',']) – Convert array to string using a separator
SELECT array_to_str(array(1,2,3),'-');
1-2-3
Class: hivemall.tools.array.ArrayToStrUDF
Function: array_union
array_union(array1, array2) – Returns an array of the elements in the union of array1 and array2, without duplicates.
Class: org.apache.spark.sql.catalyst.expressions.ArrayUnion
Function: arrays_overlap
arrays_overlap(a1, a2) – Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.ArraysOverlap
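For example, the arrays share the element 3:
SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
true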
Function: arrays_zip
arrays_zip(a1, a2, …) – Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Class: org.apache.spark.sql.catalyst.expressions.ArraysZip
Function: ascii
ascii(str) – Returns the numeric value of the first character of `str`.
Class: org.apache.spark.sql.catalyst.expressions.Ascii
Function: asin
asin(expr) – Returns the inverse sine (a.k.a. arc sine) of `expr`, as if computed by `java.lang.Math.asin`.
Class: org.apache.spark.sql.catalyst.expressions.Asin
Function: assert
Throws an assertion error if the boolean input is false; optionally asserts with a message if an input string is provided. assert(boolean) assert(boolean, string)
Class: brickhouse.udf.sanity.AssertUDF
Function: assert_equals
Class: brickhouse.udf.sanity.AssertEqualsUDF
Function: assert_less_than
Class: brickhouse.udf.sanity.AssertLessThanUDF
Function: assert_true
assert_true(expr) – Throws an exception if `expr` is not true.
Class: org.apache.spark.sql.catalyst.expressions.AssertTrue
Function: atan
atan(expr) – Returns the inverse tangent (a.k.a. arc tangent) of `expr`, as if computed by `java.lang.Math.atan`
Class: org.apache.spark.sql.catalyst.expressions.Atan
Function: atan2
atan2(exprY, exprX) – Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`exprX`, `exprY`), as if computed by `java.lang.Math.atan2`.
Class: org.apache.spark.sql.catalyst.expressions.Atan2
Function: auc
auc(array rankItems | double score, array correctItems | int label [, const int recommendSize = rankItems.size ]) – Returns AUC
Class: hivemall.evaluation.AUCUDAF
Function: average_precision
average_precision(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) – Returns MAP
Class: hivemall.evaluation.MAPUDAF
Function: avg
avg(expr) – Returns the mean calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Average
Function: base64
base64(bin) – Converts the argument from a binary `bin` to a base 64 string.
Class: org.apache.spark.sql.catalyst.expressions.Base64
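For example:
SELECT base64('Spark SQL');
U3BhcmsgU1FM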
Function: base91
base91(BINARY bin) – Convert the argument from binary to a BASE91 string
SELECT base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
AA+=kaIM|WTt!+wbGAA
Class: hivemall.tools.text.Base91UDF
Function: bbit_minhash
bbit_minhash(array<> features [, int numHashes]) – Returns a b-bits minhash value
Class: hivemall.knn.lsh.bBitMinHashUDF
Function: bigint
bigint(expr) – Casts the value `expr` to the target data type `bigint`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: bin
bin(expr) – Returns the string representation of the long value `expr` represented in binary.
Class: org.apache.spark.sql.catalyst.expressions.Bin
Function: binarize_label
binarize_label(int/long positive, int/long negative, …) – Returns positive/negative records that are represented as (…, int label) where label is 0 or 1
Class: hivemall.ftvec.trans.BinarizeLabelUDTF
Function: binary
binary(expr) – Casts the value `expr` to the target data type `binary`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: bit_length
bit_length(expr) – Returns the bit length of string data or number of bits of binary data.
Class: org.apache.spark.sql.catalyst.expressions.BitLength
Function: bits_collect
bits_collect(int|long x) – Returns a bitset in array
Class: hivemall.tools.bits.BitsCollectUDAF
Function: bits_or
bits_or(array b1, array b2, ..) – Returns a logical OR given bitsets
SELECT unbits(bits_or(to_bits(array(1,4)),to_bits(array(2,3))));
[1,2,3,4]
Class: hivemall.tools.bits.BitsORUDF
Function: bloom
Constructs a BloomFilter by aggregating a set of keys bloom(string key)
Class: brickhouse.udf.bloom.BloomUDAF
Function: bloom_and
Returns the logical AND of two bloom filters; representing the intersection of values in both bloom1 AND bloom2 bloom_and(string bloom1, string bloom2)
Class: brickhouse.udf.bloom.BloomAndUDF
Function: bloom_contains
Returns true if the referenced bloom filter contains the key. bloom_contains(string key, string bloomfilter)
Class: brickhouse.udf.bloom.BloomContainsUDF
Function: bloom_contains_any
bloom_contains_any(string bloom, string key) or bloom_contains_any(string bloom, array keys) – Returns true if the bloom filter contains any of the given keys
WITH data1 as (
SELECT explode(array(1,2,3,4,5)) as id
),
data2 as (
SELECT explode(array(1,3,5,6,8)) as id
),
bloom as (
SELECT bloom(id) as bf
FROM data1
)
SELECT
l.*
FROM
data2 l
CROSS JOIN bloom r
WHERE
bloom_contains_any(r.bf, array(l.id))
Class: hivemall.sketch.bloom.BloomContainsAnyUDF
Function: bloom_not
Returns the logical NOT of a bloom filter; representing the set of values NOT in bloom1 bloom_not(string bloom)
Class: brickhouse.udf.bloom.BloomNotUDF
Function: bloom_or
Returns the logical OR of two bloom filters; representing the union of values in either bloom1 OR bloom2 bloom_or(string bloom1, string bloom2)
Class: brickhouse.udf.bloom.BloomOrUDF
Function: boolean
boolean(expr) – Casts the value `expr` to the target data type `boolean`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: bpr_sampling
bpr_sampling(int userId, List posItems [, const string options]) – Returns a relation consisting of
Class: hivemall.ftvec.ranking.BprSamplingUDTF
Function: bround
bround(expr, d) – Returns `expr` rounded to `d` decimal places using HALF_EVEN rounding mode.
Class: org.apache.spark.sql.catalyst.expressions.BRound
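For example, HALF_EVEN rounding ties to the nearest even neighbor:
SELECT bround(2.5, 0);
2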
Function: build_bins
build_bins(number weight, const int num_of_bins[, const boolean auto_shrink = false]) – Return quantiles representing bins: array
Class: hivemall.ftvec.binning.BuildBinsUDAF
Function: call_kone_elevator
Class: com.whereos.udf.KONEElevatorCallUDF
Function: cardinality
cardinality(expr) – Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
Class: org.apache.spark.sql.catalyst.expressions.Size
Function: cast
cast(expr AS type) – Casts the value `expr` to the target data type `type`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: cast_array
Class: brickhouse.udf.collect.CastArrayUDF
Function: cast_map
Class: brickhouse.udf.collect.CastMapUDF
Function: categorical_features
categorical_features(array featureNames, feature1, feature2, .. [, const string options]) – Returns a feature vector array
Class: hivemall.ftvec.trans.CategoricalFeaturesUDF
Function: cbrt
cbrt(expr) – Returns the cube root of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Cbrt
Function: ceil
ceil(expr) – Returns the smallest integer not smaller than `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Ceil
Function: ceiling
ceiling(expr) – Returns the smallest integer not smaller than `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Ceil
Function: changefinder
changefinder(double|array x [, const string options]) – Returns outlier/change-point scores and decisions using ChangeFinder. It will return a tuple
Class: hivemall.anomaly.ChangeFinderUDF
Function: char
char(expr) – Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)
Class: org.apache.spark.sql.catalyst.expressions.Chr
Function: char_length
char_length(expr) – Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
Class: org.apache.spark.sql.catalyst.expressions.Length
Function: character_length
character_length(expr) – Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
Class: org.apache.spark.sql.catalyst.expressions.Length
Function: chi2
chi2(array> observed, array> expected) – Returns chi2_val and p_val of each columns as , array>
Class: hivemall.ftvec.selection.ChiSquareUDF
Function: chr
chr(expr) – Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)
Class: org.apache.spark.sql.catalyst.expressions.Chr
Function: coalesce
coalesce(expr1, expr2, …) – Returns the first non-null argument if exists. Otherwise, null.
Class: org.apache.spark.sql.catalyst.expressions.Coalesce
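For example, the first non-null argument is returned:
SELECT coalesce(NULL, 1, NULL);
1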
Function: collect
collect(x) – Returns an array of all the elements in the aggregation group
Class: brickhouse.udf.collect.CollectUDAF
Function: collect_list
collect_list(expr) – Collects and returns a list of non-unique elements.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.CollectList
Function: collect_max
collect_max(x, val, n) – Returns a map of the max N numeric values in the aggregation group
Class: brickhouse.udf.collect.CollectMaxUDAF
Function: collect_set
collect_set(expr) – Collects and returns a set of unique elements.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.CollectSet
Function: combine
combine(a,b) – Returns a combined list of two lists, or a combined map of two maps
Class: brickhouse.udf.collect.CombineUDF
Function: combine_hyperloglog
combine_hyperloglog(x) – Combines two HyperLogLog++ binary blobs.
Class: brickhouse.udf.hll.CombineHyperLogLogUDF
Function: combine_previous_sketch
combine_previous_sketch(grouping, map) – Returns a map of the combined keys of previous calls to this
Class: brickhouse.udf.sketch.CombinePreviousSketchUDF
Function: combine_sketch
combine_sketch(x) – Combine two sketch sets.
Class: brickhouse.udf.sketch.CombineSketchUDF
Function: combine_unique
combine_unique(x) – Returns an array of all distinct elements of all lists in the aggregation group
Class: brickhouse.udf.collect.CombineUniqueUDAF
Function: concat
concat(col1, col2, …, colN) – Returns the concatenation of col1, col2, …, colN.
Class: org.apache.spark.sql.catalyst.expressions.Concat
Function: concat_array
concat_array(array x1, array x2, ..) – Returns a concatenated array
SELECT array_concat(array(1),array(2,3));
[1,2,3]
Class: hivemall.tools.array.ArrayConcatUDF
Function: concat_ws
concat_ws(sep, [str | array(str)]+) – Returns the concatenation of the strings separated by `sep`.
Class: org.apache.spark.sql.catalyst.expressions.ConcatWs
Function: conditional_emit
conditional_emit(a,b) – Emit features of a row according to various conditions
Class: brickhouse.udf.collect.ConditionalEmit
Function: conv
conv(num, from_base, to_base) – Convert `num` from `from_base` to `to_base`.
Class: org.apache.spark.sql.catalyst.expressions.Conv
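For example, converting 100 from base 2 to base 10 (per the Spark SQL reference):
SELECT conv('100', 2, 10);
4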
Function: conv2dense
conv2dense(int feature, float weight, int nDims) – Return a dense model in array
Class: hivemall.ftvec.conv.ConvertToDenseModelUDAF
Function: convert_label
convert_label(const int|const float) – Convert from -1|1 to 0.0f|1.0f, or from 0.0f|1.0f to -1|1
Class: hivemall.tools.ConvertLabelUDF
Function: convert_to_sketch
convert_to_sketch(x) – Truncate a large array of strings, and return a list of strings representing a sketch of those items
Class: brickhouse.udf.sketch.ConvertToSketchUDF
Function: corr
corr(expr1, expr2) – Returns Pearson coefficient of correlation between a set of number pairs.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Corr
Function: cos
cos(expr) – Returns the cosine of `expr`, as if computed by `java.lang.Math.cos`.
Class: org.apache.spark.sql.catalyst.expressions.Cos
Function: cosh
cosh(expr) – Returns the hyperbolic cosine of `expr`, as if computed by `java.lang.Math.cosh`.
Class: org.apache.spark.sql.catalyst.expressions.Cosh
Function: cosine_distance
cosine_distance(ftvec1, ftvec2) – Returns a cosine distance of the given two vectors
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
cosine_distance(l.features, r.features) as distance,
distance2similarity(cosine_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;
doc1 doc2 distance similarity
1 3 0.45566893 0.6869694
1 2 0.5 0.6666667
2 3 0.04742068 0.95472616
2 1 0.5 0.6666667
3 2 0.04742068 0.95472616
3 1 0.45566893 0.6869694
Class: hivemall.knn.distance.CosineDistanceUDF
Function: cosine_similarity
cosine_similarity(ftvec1, ftvec2) – Returns a cosine similarity of the given two vectors
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
cosine_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;
doc1 doc2 similarity
1 3 0.5443311
1 2 0.5
2 3 0.9525793
2 1 0.5
3 2 0.9525793
3 1 0.5443311
Class: hivemall.knn.similarity.CosineSimilarityUDF
Function: cot
cot(expr) – Returns the cotangent of `expr`, as if computed by `1/java.lang.Math.tan`.
Class: org.apache.spark.sql.catalyst.expressions.Cot
Function: count
count(*) – Returns the total number of retrieved rows, including rows containing null. count(expr[, expr…]) – Returns the number of rows for which the supplied expression(s) are all non-null. count(DISTINCT expr[, expr…]) – Returns the number of rows for which the supplied expression(s) are unique and non-null.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Count
Function: count_min_sketch
count_min_sketch(col, eps, confidence, seed) – Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a `CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.CountMinSketchAgg
Function: covar_pop
covar_pop(expr1, expr2) – Returns the population covariance of a set of number pairs.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.CovPopulation
Function: covar_samp
covar_samp(expr1, expr2) – Returns the sample covariance of a set of number pairs.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.CovSample
Function: crc32
crc32(expr) – Returns a cyclic redundancy check value of the `expr` as a bigint.
Class: org.apache.spark.sql.catalyst.expressions.Crc32
Function: cube
cube([col1[, col2 ..]]) – create a multi-dimensional cube using the specified columns so that we can run aggregation on them.
Class: org.apache.spark.sql.catalyst.expressions.Cube
Function: cume_dist
cume_dist() – Computes the position of a value relative to all values in the partition.
Class: org.apache.spark.sql.catalyst.expressions.CumeDist
Function: current_database
current_database() – Returns the current database.
Class: org.apache.spark.sql.catalyst.expressions.CurrentDatabase
Function: current_date
current_date() – Returns the current date at the start of query evaluation.
Class: org.apache.spark.sql.catalyst.expressions.CurrentDate
Function: current_timestamp
current_timestamp() – Returns the current timestamp at the start of query evaluation.
Class: org.apache.spark.sql.catalyst.expressions.CurrentTimestamp
Function: date
date(expr) – Casts the value `expr` to the target data type `date`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: date_add
date_add(start_date, num_days) – Returns the date that is `num_days` after `start_date`.
Class: org.apache.spark.sql.catalyst.expressions.DateAdd
Function: date_format
date_format(timestamp, fmt) – Converts `timestamp` to a value of string in the format specified by the date format `fmt`.
Class: org.apache.spark.sql.catalyst.expressions.DateFormatClass
Function: date_range
date_range(a,b,c) – Generates a range of dates from start date a to end date b, incremented by c days, as multiple rows
Class: brickhouse.udf.date.DateRangeUDTF
Function: date_sub
date_sub(start_date, num_days) – Returns the date that is `num_days` before `start_date`.
Class: org.apache.spark.sql.catalyst.expressions.DateSub
Function: date_trunc
date_trunc(fmt, ts) – Returns timestamp `ts` truncated to the unit specified by the format model `fmt`. `fmt` should be one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"]
Class: org.apache.spark.sql.catalyst.expressions.TruncTimestamp
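For example (per the Spark SQL reference):
SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359');
2015-01-01 00:00:00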
Function: datediff
datediff(endDate, startDate) – Returns the number of days from `startDate` to `endDate`.
Class: org.apache.spark.sql.catalyst.expressions.DateDiff
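For example:
SELECT datediff('2009-07-31', '2009-07-30');
1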
Function: dateseries
Class: com.whereos.udf.DateSeriesUDF
Function: day
day(date) – Returns the day of month of the date/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.DayOfMonth
Function: dayofmonth
dayofmonth(date) – Returns the day of month of the date/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.DayOfMonth
Function: dayofweek
dayofweek(date) – Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, …, 7 = Saturday).
Class: org.apache.spark.sql.catalyst.expressions.DayOfWeek
Function: dayofyear
dayofyear(date) – Returns the day of year of the date/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.DayOfYear
Function: decimal
decimal(expr) – Casts the value `expr` to the target data type `decimal`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: decode
decode(bin, charset) – Decodes the first argument using the second argument character set.
Class: org.apache.spark.sql.catalyst.expressions.Decode
Function: deflate
deflate(TEXT data [, const int compressionLevel]) – Returns a compressed BINARY object by using Deflater. The compression level must be in range [-1,9]
SELECT base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
AA+=kaIM|WTt!+wbGAA
Class: hivemall.tools.compress.DeflateUDF
Function: degrees
degrees(expr) – Converts radians to degrees.
Class: org.apache.spark.sql.catalyst.expressions.ToDegrees
Function: dense_rank
dense_rank() – Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence.
Class: org.apache.spark.sql.catalyst.expressions.DenseRank
Function: dimsum_mapper
dimsum_mapper(array row, map colNorms [, const string options]) – Returns column-wise partial similarities
Class: hivemall.knn.similarity.DIMSUMMapperUDTF
Function: distance2similarity
distance2similarity(float d) – Returns 1.0 / (1.0 + d)
Class: hivemall.knn.similarity.Distance2SimilarityUDF
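A sketch following directly from the formula, since 1.0 / (1.0 + 1.0) = 0.5:
SELECT distance2similarity(1.0);
0.5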
Function: distcache_gets
distcache_gets(filepath, key, default_value [, parseKey]) – Returns map|value_type
Class: hivemall.tools.mapred.DistributedCacheLookupUDF
Function: distributed_bloom
Loads a bloom filter from a file in the distributed cache and makes it available as a named bloom. distributed_bloom(string filename) distributed_bloom(string filename, boolean returnEncoded)
Class: brickhouse.udf.bloom.DistributedBloomUDF
Function: distributed_map
Class: brickhouse.udf.dcache.DistributedMapUDF
Function: double
double(expr) – Casts the value `expr` to the target data type `double`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: e
e() – Returns Euler’s number, e.
Class: org.apache.spark.sql.catalyst.expressions.EulerNumber
Function: each_top_k
each_top_k(int K, Object group, double cmpKey, *) – Returns top-K values (or tail-K values when k is less than 0)
Class: hivemall.tools.EachTopKUDTF
Function: element_at
element_at(array, index) – Returns element of array at given (1-based) index. If index < 0, accesses elements from the last to the first. Returns NULL if the index exceeds the length of the array. element_at(map, key) - Returns value for given key, or NULL if the key is not contained in the map
Class: org.apache.spark.sql.catalyst.expressions.ElementAt
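The 1-based and negative indexing on arrays can be sketched as follows (a Python illustration of the documented semantics, not the Spark code):

```python
def element_at(arr, index):
    # 1-based indexing; negative indices count from the last element
    if index == 0:
        raise ValueError("index must not be 0 (indexing is 1-based)")
    if abs(index) > len(arr):
        return None  # index beyond the array length -> NULL
    return arr[index - 1] if index > 0 else arr[index]

a = ["a", "b", "c"]
print(element_at(a, 1))   # a
print(element_at(a, -1))  # c
print(element_at(a, 4))   # None
```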
Function: elt
elt(n, input1, input2, …) – Returns the `n`-th input, e.g., returns `input2` when `n` is 2.
Class: org.apache.spark.sql.catalyst.expressions.Elt
Function: encode
encode(str, charset) – Encodes the first argument using the second argument character set.
Class: org.apache.spark.sql.catalyst.expressions.Encode
Function: estimated_reach
estimated_reach(x) – Estimate reach from a sketch set of Strings.
Class: brickhouse.udf.sketch.EstimatedReachUDF
Function: euclid_distance
euclid_distance(ftvec1, ftvec2) – Returns the square root of the sum of the squared differences: sqrt(sum((x - y)^2))
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
euclid_distance(l.features, r.features) as distance,
distance2similarity(euclid_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;
doc1 doc2 distance similarity
1 2 2.4494898 0.28989795
1 3 2.6457512 0.2742919
2 3 1.0 0.5
2 1 2.4494898 0.28989795
3 2 1.0 0.5
3 1 2.6457512 0.2742919
Class: hivemall.knn.distance.EuclidDistanceUDF
Function: euclid_similarity
euclid_similarity(ftvec1, ftvec2) – Returns a Euclidean-distance-based similarity, which is `1.0 / (1.0 + distance)`, of the given two vectors
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
euclid_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;
doc1 doc2 similarity
1 2 0.28989795
1 3 0.2742919
2 3 0.5
2 1 0.28989795
3 2 0.5
3 1 0.2742919
Class: hivemall.knn.similarity.EuclidSimilarity
Function: exists
exists(expr, pred) – Tests whether a predicate holds for one or more elements in the array.
Class: org.apache.spark.sql.catalyst.expressions.ArrayExists
Function: exp
exp(expr) – Returns e to the power of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Exp
Function: explode
explode(expr) – Separates the elements of array `expr` into multiple rows, or the elements of map `expr` into multiple rows and columns.
Class: org.apache.spark.sql.catalyst.expressions.Explode
Function: explode_outer
explode_outer(expr) – Separates the elements of array `expr` into multiple rows, or the elements of map `expr` into multiple rows and columns.
Class: org.apache.spark.sql.catalyst.expressions.Explode
Function: explodegeometry
Class: com.whereos.udf.ExplodeGeometryUDTF
Function: explodemultipolygon
Class: com.whereos.udf.ExplodeMultiPolygonUDTF
Function: expm1
expm1(expr) – Returns exp(`expr`) – 1.
Class: org.apache.spark.sql.catalyst.expressions.Expm1
Function: extract_feature
extract_feature(feature_vector in array) – Returns features in array
Class: hivemall.ftvec.ExtractFeatureUDF
Function: extract_weight
extract_weight(feature_vector in array) – Returns the weights of features in array
Class: hivemall.ftvec.ExtractWeightUDF
Function: extractframes
Class: com.whereos.udf.ExtractFramesUDTF
Function: extractpixels
Class: com.whereos.udf.ExtractPixelsUDTF
Function: f1score
f1score(array[int], array[int]) – Returns the F1 score
Class: hivemall.evaluation.F1ScoreUDAF
Function: factorial
factorial(expr) – Returns the factorial of `expr`. `expr` is [0..20]. Otherwise, null.
Class: org.apache.spark.sql.catalyst.expressions.Factorial
Function: feature
feature(feature, value) – Returns a feature string
Class: hivemall.ftvec.FeatureUDF
Function: feature_binning
feature_binning(array features, map quantiles_map) – returns a binned feature vector as an array
feature_binning(number weight, array quantiles) – returns a bin ID as int
WITH extracted as (
select
extract_feature(feature) as index,
extract_weight(feature) as value
from
input l
LATERAL VIEW explode(features) r as feature
),
mapping as (
select
index,
build_bins(value, 5, true) as quantiles -- 5 bins with auto bin shrinking
from
extracted
group by
index
),
bins as (
select
to_map(index, quantiles) as quantiles
from
mapping
)
select
l.features as original,
feature_binning(l.features, r.quantiles) as features
from
input l
cross join bins r
> ["name#Jacob","gender#Male","age:20.0"] ["name#Jacob","gender#Male","age:2"]
> ["name#Isabella","gender#Female","age:20.0"] ["name#Isabella","gender#Female","age:2"]
Class: hivemall.ftvec.binning.FeatureBinningUDF
Function: feature_hashing
feature_hashing(array features [, const string options]) – returns a hashed feature vector in array
select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-libsvm');
["4063537:1.0","4063537:1","8459207:2.0"]
select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-features 10');
["7:1.0","7","1:2.0"]
select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-features 10 -libsvm');
["1:2.0","7:1.0","7:1"]
Class: hivemall.ftvec.hashing.FeatureHashingUDF
Function: feature_index
feature_index(feature_vector in array) – Returns feature indices in array
Class: hivemall.ftvec.FeatureIndexUDF
Function: feature_pairs
feature_pairs(feature_vector in array [, const string options]) – Returns a relation
Class: hivemall.ftvec.pairing.FeaturePairsUDTF
Function: ffm_features
ffm_features(const array featureNames, feature1, feature2, .. [, const string options]) – Takes categorical variables and returns a feature vector array in a libffm format
Class: hivemall.ftvec.trans.FFMFeaturesUDF
Function: filter
filter(expr, func) – Filters the input array using the given predicate.
Class: org.apache.spark.sql.catalyst.expressions.ArrayFilter
Function: find_in_set
find_in_set(str, str_array) – Returns the index (1-based) of the given string (`str`) in the comma-delimited list (`str_array`). Returns 0, if the string was not found or if the given string (`str`) contains a comma.
Class: org.apache.spark.sql.catalyst.expressions.FindInSet
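The comma edge case above is easy to miss; as a sketch of the documented semantics (not the Spark implementation):

```python
def find_in_set(s, str_array):
    # The needle itself containing a comma can never match a list element,
    # so the documented behavior is to return 0 in that case.
    if "," in s:
        return 0
    parts = str_array.split(",")
    return parts.index(s) + 1 if s in parts else 0  # 1-based, 0 if absent

print(find_in_set("ab", "abc,b,ab,c,def"))  # 3
print(find_in_set("a,b", "abc,b,ab"))       # 0 (needle contains a comma)
```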
Function: first
first(expr[, isIgnoreNull]) – Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.First
Function: first_element
first_element(x) – Returns the first element in an array
SELECT first_element(array('a','b','c'));
a
SELECT first_element(array());
NULL
Class: hivemall.tools.array.FirstElementUDF
Function: first_index
first_index(x) – First value in an array
Class: brickhouse.udf.collect.FirstIndexUDF
Function: first_value
first_value(expr[, isIgnoreNull]) – Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.First
Function: flatten
flatten(arrayOfArrays) – Transforms an array of arrays into a single array.
Class: org.apache.spark.sql.catalyst.expressions.Flatten
Function: float
float(expr) – Casts the value `expr` to the target data type `float`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: float_array
float_array(nDims) – Returns an array of nDims elements
Class: hivemall.tools.array.AllocFloatArrayUDF
Function: floor
floor(expr) – Returns the largest integer not greater than `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Floor
Function: fmeasure
fmeasure(array|int|boolean actual, array|int|boolean predicted [, const string options]) – Returns the F-measure (f1score is the special case with beta=1.0)
Class: hivemall.evaluation.FMeasureUDAF
Function: format_number
format_number(expr1, expr2) – Formats the number `expr1` like '#,###,###.##', rounded to `expr2` decimal places. If `expr2` is 0, the result has no decimal point or fractional part. `expr2` also accepts a user-specified format. This is supposed to function like MySQL's FORMAT.
Class: org.apache.spark.sql.catalyst.expressions.FormatNumber
Function: format_string
format_string(strfmt, obj, …) – Returns a formatted string from printf-style format strings.
Class: org.apache.spark.sql.catalyst.expressions.FormatString
Function: from_camel_case
from_camel_case(a) – Converts a string in CamelCase to one containing underscores.
Class: brickhouse.udf.json.ConvertFromCamelCaseUDF
Function: from_json
from_json(jsonStr, schema[, options]) – Returns a struct value with the given `jsonStr` and `schema`.
Class: org.apache.spark.sql.catalyst.expressions.JsonToStructs
Function: from_unixtime
from_unixtime(unix_time, format) – Returns `unix_time` in the specified `format`.
Class: org.apache.spark.sql.catalyst.expressions.FromUnixTime
Function: from_utc_timestamp
from_utc_timestamp(timestamp, timezone) – Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, ‘GMT+1’ would yield ‘2017-07-14 03:40:00.0’.
Class: org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp
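The documented example can be reproduced in plain Python (a sketch only; 'GMT+1' is modeled here as a fixed one-hour offset, which ignores the DST handling a real time zone database would apply):

```python
from datetime import datetime, timezone, timedelta

# Interpret a naive timestamp as UTC, then render it in GMT+1.
ts = datetime.strptime("2017-07-14 02:40:00", "%Y-%m-%d %H:%M:%S")
ts_utc = ts.replace(tzinfo=timezone.utc)
gmt1 = ts_utc.astimezone(timezone(timedelta(hours=1)))
print(gmt1.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-07-14 03:40:00
```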
Function: generate_series
generate_series(const int|bigint start, const int|bigint end) – Generates a series of values from start to end. A similar function to PostgreSQL's [generate_series](https://www.postgresql.org/docs/current/static/functions-srf.html)
SELECT generate_series(2,4);
2
3
4
SELECT generate_series(5,1,-2);
5
3
1
SELECT generate_series(4,3);
(no return)
SELECT date_add(current_date(),value),value from (SELECT generate_series(1,3)) t;
2018-04-21 1
2018-04-22 2
2018-04-23 3
WITH input as (
SELECT 1 as c1, 10 as c2, 3 as step
UNION ALL
SELECT 10, 2, -3
)
SELECT generate_series(c1, c2, step) as series
FROM input;
1
4
7
10
10
7
4
Class: hivemall.tools.GenerateSeriesUDTF
Function: generateheatmap
Class: com.whereos.udf.HeatmapGenerateUDTF
Function: geocode
Class: com.whereos.udf.GeocodingUDTF
Function: geokeyradius
Class: com.whereos.udf.GeoKeyRadiusUDTF
Function: geokeys
Class: com.whereos.udf.GeoKeysUDTF
Function: get_json_object
get_json_object(json_txt, path) – Extracts a json object from `path`.
Class: org.apache.spark.sql.catalyst.expressions.GetJsonObject
Function: greatest
greatest(expr, …) – Returns the greatest value of all parameters, skipping null values.
Class: org.apache.spark.sql.catalyst.expressions.Greatest
Function: group_count
A sequence id for all rows with the same value for a specific grouping
Class: brickhouse.udf.collect.GroupCountUDF
Function: grouping
grouping(col) – Indicates whether a specified column in a GROUP BY is aggregated or not; returns 1 for aggregated or 0 for not aggregated in the result set.
Class: org.apache.spark.sql.catalyst.expressions.Grouping
Function: grouping_id
grouping_id([col1[, col2 ..]]) – returns the level of grouping, equals to `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)`
Class: org.apache.spark.sql.catalyst.expressions.GroupingID
Function: guess_attribute_types
guess_attribute_types(ANY, …) – Returns attribute types
select guess_attribute_types(*) from train limit 1;
Q,Q,C,C,C,C,Q,C,C,C,Q,C,Q,Q,Q,Q,C,Q
Class: hivemall.smile.tools.GuessAttributesUDF
Function: hamming_distance
hamming_distance(integer A, integer B) – Returns Hamming distance between A and B
select
hamming_distance(0,3) as c1,
hamming_distance("0","3") as c2 -- 0=0b00, 3=0b11
;
c1 c2
2 2
Class: hivemall.knn.distance.HammingDistanceUDF
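For integer arguments, the Hamming distance is the number of differing bits, i.e. the popcount of the XOR. A minimal sketch (not the Hivemall implementation, which also accepts string arguments):

```python
def hamming_distance(a, b):
    # Number of differing bits = population count of a XOR b
    return bin(a ^ b).count("1")

print(hamming_distance(0, 3))  # 2  (3 = 0b11 differs from 0 in two bits)
print(hamming_distance(0, 4))  # 1  (4 = 0b100 differs in one bit)
```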
Function: hash
hash(expr1, expr2, …) – Returns a hash value of the arguments.
Class: org.apache.spark.sql.catalyst.expressions.Murmur3Hash
Function: hash_md5
Class: brickhouse.udf.sketch.HashMD5UDF
Function: haversine_distance
haversine_distance(double lat1, double lon1, double lat2, double lon2, [const boolean mile=false])::double – Returns the distance between two locations in km [or miles] using the `haversine` formula
Usage: select haversine_distance(lat1, lon1, lat2, lon2) from …
Class: hivemall.geospatial.HaversineDistanceUDF
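The haversine formula itself is standard; a Python sketch follows (the Earth-radius constant and the mile conversion factor are assumptions, not values taken from the Hivemall source):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_distance(lat1, lon1, lat2, lon2, mile=False):
    # Great-circle distance via the haversine formula; km by default.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    d = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return d * 0.621371 if mile else d

# Tokyo to Osaka, roughly 400 km
print(round(haversine_distance(35.68, 139.76, 34.69, 135.50), 1))
```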
Function: hbase_balanced_key
hbase_balanced_key(keyStr,numRegions) – Returns an HBase key balanced evenly across regions
Class: brickhouse.hbase.GenerateBalancedKeyUDF
Function: hbase_batch_get
hbase_batch_get(table,key,family) – Do a single HBase Get on a table
Class: brickhouse.hbase.BatchGetUDF
Function: hbase_batch_put
hbase_batch_put(config_map, key, value) – Perform batch HBase updates of a table
Class: brickhouse.hbase.BatchPutUDAF
Function: hbase_cached_get
hbase_cached_get(configMap,key,template) – Returns a cached object, given an HBase config, a key, and a template object used to interpret JSON
Class: brickhouse.hbase.CachedGetUDF
Function: hbase_get
hbase_get(table,key,family) – Do a single HBase Get on a table
Class: brickhouse.hbase.GetUDF
Function: hbase_put
string hbase_put(config, map key_value) – string hbase_put(config, key, value) – Do a HBase Put on a table. Config must contain zookeeper quorum, table name, column, and qualifier. Example of usage: hbase_put(map('hbase.zookeeper.quorum', 'hb-zoo1,hb-zoo2', 'table_name', 'metrics', 'family', 'c', 'qualifier', 'q'), 'test.prod.visits.total', '123456')
Class: brickhouse.hbase.PutUDF
Function: hex
hex(expr) – Converts `expr` to hexadecimal.
Class: org.apache.spark.sql.catalyst.expressions.Hex
Function: hitrate
hitrate(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) – Returns HitRate
Class: hivemall.evaluation.HitRateUDAF
Function: hivemall_version
hivemall_version() – Returns the version of Hivemall
SELECT hivemall_version();
Class: hivemall.HivemallVersionUDF
Function: hll_est_cardinality
hll_est_cardinality(x) – Estimate reach from a HyperLogLog++.
Class: brickhouse.udf.hll.EstimateCardinalityUDF
Function: hour
hour(timestamp) – Returns the hour component of the string/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.Hour
Function: hyperloglog
hyperloglog(x, [b]) – Constructs a HyperLogLog++ estimator to estimate reach for large values, with optional bit parameter for specifying precision (b must be in [4,16]). Default is b = 6. Returns a binary value that represents the HyperLogLog++ data structure.
Class: brickhouse.udf.hll.HyperLogLogUDAF
Function: hypot
hypot(expr1, expr2) – Returns sqrt(`expr1`**2 + `expr2`**2).
Class: org.apache.spark.sql.catalyst.expressions.Hypot
Function: if
if(expr1, expr2, expr3) – If `expr1` evaluates to true, then returns `expr2`; otherwise returns `expr3`.
Class: org.apache.spark.sql.catalyst.expressions.If
Function: ifnull
ifnull(expr1, expr2) – Returns `expr2` if `expr1` is null, or `expr1` otherwise.
Class: org.apache.spark.sql.catalyst.expressions.IfNull
Function: in
expr1 in(expr2, expr3, …) – Returns true if `expr1` equals any of `expr2`, `expr3`, ….
Class: org.apache.spark.sql.catalyst.expressions.In
Function: indexed_features
indexed_features(double v1, double v2, …) – Returns a list of features as array: [1:v1, 2:v2, ..]
Class: hivemall.ftvec.trans.IndexedFeatures
Function: infinity
infinity() – Returns the constant representing positive infinity.
Class: hivemall.tools.math.InfinityUDF
Function: inflate
inflate(BINARY compressedData) – Returns a decompressed STRING by using Inflater
SELECT inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
aaaaaaaaaaaaaaaabbbbccc
Class: hivemall.tools.compress.InflateUDF
Function: initcap
initcap(str) – Returns `str` with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.
Class: org.apache.spark.sql.catalyst.expressions.InitCap
Function: inline
inline(expr) – Explodes an array of structs into a table.
Class: org.apache.spark.sql.catalyst.expressions.Inline
Function: inline_outer
inline_outer(expr) – Explodes an array of structs into a table.
Class: org.apache.spark.sql.catalyst.expressions.Inline
Function: input_file_block_length
input_file_block_length() – Returns the length of the block being read, or -1 if not available.
Class: org.apache.spark.sql.catalyst.expressions.InputFileBlockLength
Function: input_file_block_start
input_file_block_start() – Returns the start offset of the block being read, or -1 if not available.
Class: org.apache.spark.sql.catalyst.expressions.InputFileBlockStart
Function: input_file_name
input_file_name() – Returns the name of the file being read, or empty string if not available.
Class: org.apache.spark.sql.catalyst.expressions.InputFileName
Function: instr
instr(str, substr) – Returns the (1-based) index of the first occurrence of `substr` in `str`.
Class: org.apache.spark.sql.catalyst.expressions.StringInstr
Function: int
int(expr) – Casts the value `expr` to the target data type `int`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: intersect_array
intersect_array(array1, array2, …) – Returns the intersection of a set of arrays
Class: brickhouse.udf.collect.ArrayIntersectUDF
Function: is_finite
is_finite(x) – Determine if x is finite.
SELECT is_finite(333), is_finite(infinity());
true false
Class: hivemall.tools.math.IsFiniteUDF
Function: is_infinite
is_infinite(x) – Determine if x is infinite.
Class: hivemall.tools.math.IsInfiniteUDF
Function: is_nan
is_nan(x) – Determine if x is not-a-number.
Class: hivemall.tools.math.IsNanUDF
Function: is_stopword
is_stopword(string word) – Returns whether the given word is an English stopword or not
Class: hivemall.tools.text.StopwordUDF
Function: isnotnull
isnotnull(expr) – Returns true if `expr` is not null, or false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.IsNotNull
Function: isnull
isnull(expr) – Returns true if `expr` is null, or false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.IsNull
Function: isochronedistanceedges
Class: com.whereos.udf.IsochroneDistanceEdgesUDTF
Function: isochronedistancepolygons
Class: com.whereos.udf.IsochroneDistancePolygonsUDTF
Function: isochronedurationedges
Class: com.whereos.udf.IsochroneDurationEdgesUDTF
Function: isochronedurationpolygons
Class: com.whereos.udf.IsochroneDistancePolygonsUDTF
Function: item_pairs_sampling
item_pairs_sampling(array pos_items, const int max_item_id [, const string options]) – Returns a relation consisting of
Class: hivemall.ftvec.ranking.ItemPairsSamplingUDTF
Function: jaccard_distance
jaccard_distance(integer A, integer B [,int k=128]) – Returns Jaccard distance between A and B
select
jaccard_distance(0,3) as c1,
jaccard_distance("0","3") as c2, -- 0=0b00, 3=0b11
jaccard_distance(0,4) as c3
;
c1 c2 c3
0.03125 0.03125 0.015625
Class: hivemall.knn.distance.JaccardDistanceUDF
Function: jaccard_similarity
jaccard_similarity(A, B [,int k]) – Returns Jaccard similarity coefficient of A and B
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
jaccard_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;
doc1 doc2 similarity
1 2 0.14285715
1 3 0.0
2 3 0.6
2 1 0.14285715
3 2 0.6
3 1 0.0
Class: hivemall.knn.similarity.JaccardIndexUDF
Function: java_method
java_method(class, method[, arg1[, arg2 ..]]) – Calls a method with reflection.
Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Function: jobconf_gets
jobconf_gets() – Returns the value from JobConf
Class: hivemall.tools.mapred.JobConfGetsUDF
Function: jobid
jobid() – Returns the value of mapred.job.id
Class: hivemall.tools.mapred.JobIdUDF
Function: join_array
Class: brickhouse.udf.collect.JoinArrayUDF
Function: json_map
json_map(json) – Returns a map of key-value pairs from a JSON string
Class: brickhouse.udf.json.JsonMapUDF
Function: json_split
json_split(json) – Returns an array of JSON strings from a JSON Array
Class: brickhouse.udf.json.JsonSplitUDF
Function: json_tuple
json_tuple(jsonStr, p1, p2, …, pn) – Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.
Class: org.apache.spark.sql.catalyst.expressions.JsonTuple
Function: kld
kld(double mu1, double sigma1, double mu2, double sigma2) – Returns KL divergence between two distributions
Class: hivemall.knn.distance.KLDivergenceUDF
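For two univariate Gaussians the KL divergence has a closed form; a sketch of that formula follows (an illustration of the math, not necessarily the exact convention Hivemall uses):

```python
import math

def gaussian_kld(mu1, sigma1, mu2, sigma2):
    # KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in closed form:
    # ln(s2/s1) + (s1^2 + (mu1-mu2)^2) / (2*s2^2) - 1/2
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

print(gaussian_kld(0.0, 1.0, 0.0, 1.0))  # 0.0 for identical distributions
```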
Function: kpa_predict
kpa_predict(@Nonnull double xh, @Nonnull double xk, @Nullable float w0, @Nonnull float w1, @Nonnull float w2, @Nullable float w3) – Returns a prediction value in Double
Class: hivemall.classifier.KPAPredictUDAF
Function: kurtosis
kurtosis(expr) – Returns the kurtosis value calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Kurtosis
Function: l1_normalize
l1_normalize(ftvec string) – Returns an L1-normalized value
Class: hivemall.ftvec.scaling.L1NormalizationUDF
Function: l2_norm
l2_norm(double x) – Returns the L2 norm of the given input x.
WITH input as (
select generate_series(1,3) as v
)
select l2_norm(v) as l2norm
from input;
3.7416573867739413 = sqrt(1^2+2^2+3^2)
Class: hivemall.tools.math.L2NormUDAF
Function: l2_normalize
l2_normalize(ftvec string) – Returns an L2-normalized value
Class: hivemall.ftvec.scaling.L2NormalizationUDF
Function: lag
lag(input[, offset[, default]]) – Returns the value of `input` at the `offset`th row before the current row in the window. The default value of `offset` is 1 and the default value of `default` is null. If the value of `input` at the `offset`th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), `default` is returned.
Class: org.apache.spark.sql.catalyst.expressions.Lag
Function: last
last(expr[, isIgnoreNull]) – Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Function: last_day
last_day(date) – Returns the last day of the month which the date belongs to.
Class: org.apache.spark.sql.catalyst.expressions.LastDay
Function: last_element
last_element(x) – Returns the last element in an array
SELECT last_element(array('a','b','c'));
c
Class: hivemall.tools.array.LastElementUDF
Function: last_index
last_index(x) – Last value in an array
Class: brickhouse.udf.collect.LastIndexUDF
Function: last_value
last_value(expr[, isIgnoreNull]) – Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Function: lat2tiley
lat2tiley(double lat, int zoom)::int – Returns the tile number of the given latitude and zoom level
Class: hivemall.geospatial.Lat2TileYUDF
Function: lcase
lcase(str) – Returns `str` with all characters changed to lowercase.
Class: org.apache.spark.sql.catalyst.expressions.Lower
Function: lda_predict
lda_predict(string word, float value, int label, float lambda[, const string options]) – Returns a list which consists of
Class: hivemall.topicmodel.LDAPredictUDAF
Function: lead
lead(input[, offset[, default]]) – Returns the value of `input` at the `offset`th row after the current row in the window. The default value of `offset` is 1 and the default value of `default` is null. If the value of `input` at the `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), `default` is returned.
Class: org.apache.spark.sql.catalyst.expressions.Lead
Function: least
least(expr, …) – Returns the least value of all parameters, skipping null values.
Class: org.apache.spark.sql.catalyst.expressions.Least
Function: left
left(str, len) – Returns the leftmost `len` characters from the string `str` (`len` can be string type). If `len` is less than or equal to 0, the result is an empty string.
Class: org.apache.spark.sql.catalyst.expressions.Left
Function: length
length(expr) – Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
Class: org.apache.spark.sql.catalyst.expressions.Length
Function: levenshtein
levenshtein(str1, str2) – Returns the Levenshtein distance between the two given strings.
Class: org.apache.spark.sql.catalyst.expressions.Levenshtein
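Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The classic dynamic-programming computation, as a sketch (not the Spark implementation):

```python
def levenshtein(s1, s2):
    # Row-by-row DP over the edit-distance matrix, keeping only the previous row.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (c1 != c2)))   # substitution (free on match)
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```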
Function: like
str like pattern – Returns true if str matches pattern, null if any arguments are null, false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.Like
Function: ln
ln(expr) – Returns the natural logarithm (base e) of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Log
Function: locate
locate(substr, str[, pos]) – Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based.
Class: org.apache.spark.sql.catalyst.expressions.StringLocate
Function: log
log(base, expr) – Returns the logarithm of `expr` with `base`.
Class: org.apache.spark.sql.catalyst.expressions.Logarithm
Function: log10
log10(expr) – Returns the logarithm of `expr` with base 10.
Class: org.apache.spark.sql.catalyst.expressions.Log10
Function: log1p
log1p(expr) – Returns log(1 + `expr`).
Class: org.apache.spark.sql.catalyst.expressions.Log1p
Function: log2
log2(expr) – Returns the logarithm of `expr` with base 2.
Class: org.apache.spark.sql.catalyst.expressions.Log2
Function: logloss
logloss(double predicted, double actual) – Returns the Logarithmic Loss
Class: hivemall.evaluation.LogarithmicLossUDAF
Function: logress
logress(array features, float target [, constant string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Class: hivemall.regression.LogressUDTF
Function: lon2tilex
lon2tilex(double lon, int zoom)::int – Returns the tile number of the given longitude and zoom level
Class: hivemall.geospatial.Lon2TileXUDF
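lat2tiley (above) and lon2tilex follow the standard OpenStreetMap "slippy map" tiling scheme; a Python sketch of those formulas (assumed to match the Hivemall functions, since both target the same tile URLs):

```python
import math

def lon2tilex(lon, zoom):
    # Linear mapping of longitude [-180, 180) onto 2^zoom tiles
    return int((lon + 180.0) / 360.0 * (1 << zoom))

def lat2tiley(lat, zoom):
    # Web-Mercator projection of latitude onto 2^zoom tiles
    rad = math.radians(lat)
    return int((1.0 - math.asinh(math.tan(rad)) / math.pi) / 2.0 * (1 << zoom))

# Tile containing Helsinki (60.17 N, 24.94 E) at zoom 10
print(lon2tilex(24.94, 10), lat2tiley(60.17, 10))
```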
Function: lower
lower(str) – Returns `str` with all characters changed to lowercase.
Class: org.apache.spark.sql.catalyst.expressions.Lower
Function: lpad
lpad(str, len, pad) – Returns `str`, left-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters.
Class: org.apache.spark.sql.catalyst.expressions.StringLPad
Function: lr_datagen
lr_datagen(options string) – Generates a logistic regression dataset
WITH dual AS (SELECT 1) SELECT lr_datagen('-n_examples 1k -n_features 10') FROM dual;
Class: hivemall.dataset.LogisticRegressionDataGeneratorUDTF
Function: ltrim
ltrim(str) – Removes the leading space characters from `str`. ltrim(trimStr, str) – Removes from `str` the leading characters contained in the trim string `trimStr`.
Class: org.apache.spark.sql.catalyst.expressions.StringTrimLeft
Function: mae
mae(double predicted, double actual) – Returns the Mean Absolute Error
Class: hivemall.evaluation.MeanAbsoluteErrorUDAF
Function: manhattan_distance
manhattan_distance(list x, list y) – Returns sum(|x - y|)
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
manhattan_distance(l.features, r.features) as distance,
distance2similarity(angular_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;
doc1 doc2 distance similarity
1 2 4.0 0.75
1 3 5.0 0.75942624
2 3 1.0 0.91039914
2 1 4.0 0.75
3 2 1.0 0.91039914
3 1 5.0 0.75942624
Class: hivemall.knn.distance.ManhattanDistanceUDF
Function: map
map(key0, value0, key1, value1, …) – Creates a map with the given key/value pairs.
Class: org.apache.spark.sql.catalyst.expressions.CreateMap
Function: map_concat
map_concat(map, …) – Returns the union of all the given maps
Class: org.apache.spark.sql.catalyst.expressions.MapConcat
Function: map_exclude_keys
map_exclude_keys(Map map, array filteringKeys) – Returns the filtered entries of a map not having specified keys
SELECT map_exclude_keys(map(1,'one',2,'two',3,'three'),array(2,3));
{1:"one"}
Class: hivemall.tools.map.MapExcludeKeysUDF
Function: map_filter_keys
map_filter_keys(map, key_array) – Returns the filtered entries of a map corresponding to a given set of keys
Class: brickhouse.udf.collect.MapFilterKeysUDF
Function: map_from_arrays
map_from_arrays(keys, values) – Creates a map with a pair of the given key/value arrays. All elements in keys should not be null
Class: org.apache.spark.sql.catalyst.expressions.MapFromArrays
Function: map_from_entries
map_from_entries(arrayOfEntries) – Returns a map created from the given array of entries.
Class: org.apache.spark.sql.catalyst.expressions.MapFromEntries
Function: map_get_sum
map_get_sum(map src, array keys) – Returns sum of values that are retrieved by keys
Class: hivemall.tools.map.MapGetSumUDF
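As an illustration of the semantics (a Python sketch; the treatment of missing keys as contributing 0 is an assumption, not confirmed from the Hivemall source):

```python
def map_get_sum(src, keys):
    # Look up each key in the map and sum the values found;
    # keys absent from the map contribute 0 (assumed behavior).
    return sum(src.get(k, 0) for k in keys)

print(map_get_sum({"a": 1.0, "b": 2.5, "c": 4.0}, ["a", "c", "x"]))  # 5.0
```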
Function: map_include_keys
map_include_keys(Map map, array filteringKeys) – Returns the filtered entries of a map having specified keys
SELECT map_include_keys(map(1,'one',2,'two',3,'three'),array(2,3));
{2:"two",3:"three"}
Class: hivemall.tools.map.MapIncludeKeysUDF
Function: map_index
Class: brickhouse.udf.collect.MapIndexUDF
Function: map_key_values
map_key_values(map) – Returns an array of key-value pairs contained in a Map
Class: brickhouse.udf.collect.MapKeyValuesUDF
Function: map_keys
map_keys(map) – Returns an unordered array containing the keys of the map.
Class: org.apache.spark.sql.catalyst.expressions.MapKeys
Function: map_tail_n
map_tail_n(map SRC, int N) – Returns the last N elements from a sorted array of SRC
Class: hivemall.tools.map.MapTailNUDF
Function: map_url
map_url(double lat, double lon, int zoom [, const string option]) – Returns a URL string
OpenStreetMap: http://tile.openstreetmap.org/${zoom}/${xtile}/${ytile}.png
Google Maps: https://www.google.com/maps/@${lat},${lon},${zoom}z
Class: hivemall.geospatial.MapURLUDF
Function: map_values
map_values(map) – Returns an unordered array containing the values of the map.
Class: org.apache.spark.sql.catalyst.expressions.MapValues
Function: max
max(expr) – Returns the maximum value of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Max
Function: max_label
max_label(double value, string label) – Returns the label that has the maximum value
Class: hivemall.ensemble.MaxValueLabelUDAF
Function: maxrow
maxrow(ANY compare, …) – Returns the row that has the maximum value in the 1st argument
Class: hivemall.ensemble.MaxRowUDAF
Function: md5
md5(expr) – Returns an MD5 128-bit checksum as a hex string of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Md5
Function: mean
mean(expr) – Returns the mean calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Average
Function: mhash
mhash(string word) – Returns a MurmurHash3 INT value starting from 1
Class: hivemall.ftvec.hashing.MurmurHash3UDF
Function: min
min(expr) – Returns the minimum value of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Min
Function: minhash
minhash(ANY item, array features [, constant string options]) – Returns n different k-depth signatures (i.e., clusterid) for each item
Class: hivemall.knn.lsh.MinHashUDTF
Function: minhashes
minhashes(array<> features [, int numHashes, int keyGroup [, boolean noWeight]]) – Returns minhash values
Class: hivemall.knn.lsh.MinHashesUDF
Function: minkowski_distance
minkowski_distance(list x, list y, double p) – Returns sum(|x – y|^p)^(1/p)
WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
minkowski_distance(l.features, r.features, 1) as distance1, -- p=1 (manhattan_distance)
minkowski_distance(l.features, r.features, 2) as distance2, -- p=2 (euclid_distance)
minkowski_distance(l.features, r.features, 3) as distance3, -- p=3
manhattan_distance(l.features, r.features) as manhattan_distance,
euclid_distance(l.features, r.features) as euclid_distance
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance1 asc;
doc1 doc2 distance1 distance2 distance3 manhattan_distance euclid_distance
1 2 4.0 2.4494898 2.1544347 4.0 2.4494898
1 3 5.0 2.6457512 2.2239802 5.0 2.6457512
2 3 1.0 1.0 1.0 1.0 1.0
2 1 4.0 2.4494898 2.1544347 4.0 2.4494898
3 2 1.0 1.0 1.0 1.0 1.0
3 1 5.0 2.6457512 2.2239802 5.0 2.6457512
Class: hivemall.knn.distance.MinkowskiDistanceUDF
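The distances in the table above can be reproduced in plain Python. This is an illustrative sketch, not the UDF itself; it parses the `name:weight` feature strings and assumes a missing key counts as 0:

```python
def parse_features(features):
    """Parse 'name:weight' strings (as in the example above) into a dict."""
    out = {}
    for f in features:
        name, _, weight = f.partition(":")
        out[name] = float(weight)
    return out

def minkowski_distance(x, y, p):
    """sum(|x - y|^p)^(1/p) over the union of feature names."""
    keys = set(x) | set(y)
    return sum(abs(x.get(k, 0.0) - y.get(k, 0.0)) ** p for k in keys) ** (1.0 / p)

doc1 = parse_features(["apple:1.0", "orange:2.0", "banana:1.0", "kuwi:0"])
doc2 = parse_features(["apple:1.0", "orange:0", "banana:2.0", "kuwi:1.0"])

print(minkowski_distance(doc1, doc2, 1))  # 4.0 (Manhattan)
print(minkowski_distance(doc1, doc2, 2))  # sqrt(6), the Euclidean distance
```

With p=1 and p=2 this matches the `manhattan_distance` and `euclid_distance` columns of the result table.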
Function: minute
minute(timestamp) – Returns the minute component of the string/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.Minute
Function: mod
expr1 mod expr2 – Returns the remainder after `expr1`/`expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Remainder
Function: monotonically_increasing_id
monotonically_increasing_id() – Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.
Class: org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID
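The bit layout described above (partition ID in the upper 31 bits, record number in the lower 33 bits) can be sketched in Python; `monotonic_id` is an illustrative name, not a WhereOS function:

```python
def monotonic_id(partition_id, record_number):
    """Pack partition ID into the upper 31 bits and the per-partition
    record number into the lower 33 bits of a 64-bit integer."""
    assert partition_id < (1 << 31) and record_number < (1 << 33)
    return (partition_id << 33) | record_number

# IDs increase within a partition and never collide across partitions,
# but they are not consecutive across partitions.
print(monotonic_id(0, 0))  # 0
print(monotonic_id(2, 5))  # (2 << 33) + 5
```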
Function: month
month(date) – Returns the month component of the date/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.Month
Function: months_between
months_between(timestamp1, timestamp2[, roundOff]) – If `timestamp1` is later than `timestamp2`, then the result is positive. If `timestamp1` and `timestamp2` are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.
Class: org.apache.spark.sql.catalyst.expressions.MonthsBetween
Function: moving_avg
Returns the moving average of a time series for a given time window.
Class: brickhouse.udf.timeseries.MovingAvgUDF
Function: mrr
mrr(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) – Returns MRR
Class: hivemall.evaluation.MRRUDAF
Function: mse
mse(double predicted, double actual) – Returns the mean squared error
Class: hivemall.evaluation.MeanSquaredErrorUDAF
Function: multiday_count
multiday_count(x) – Returns a count of events over several different periods.
Class: brickhouse.udf.sketch.MultiDaySketcherUDAF
Function: named_struct
named_struct(name1, val1, name2, val2, …) – Creates a struct with the given field names and values.
Class: org.apache.spark.sql.catalyst.expressions.CreateNamedStruct
Function: nan
nan() – Returns the constant representing not-a-number.
SELECT nan(), is_nan(nan());
NaN true
Class: hivemall.tools.math.NanUDF
Function: nanvl
nanvl(expr1, expr2) – Returns `expr1` if it's not NaN, or `expr2` otherwise.
Class: org.apache.spark.sql.catalyst.expressions.NaNvl
Function: ndcg
ndcg(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) – Returns nDCG
Class: hivemall.evaluation.NDCGUDAF
Function: negative
negative(expr) – Returns the negated value of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.UnaryMinus
Function: next_day
next_day(start_date, day_of_week) – Returns the first date which is later than `start_date` and named as indicated.
Class: org.apache.spark.sql.catalyst.expressions.NextDay
Function: normalize_unicode
normalize_unicode(string str [, string form]) – Transforms `str` with the specified normalization form. The `form` takes one of NFC (default), NFD, NFKC, or NFKD
SELECT normalize_unicode('ハンカクカナ','NFKC');
ハンカクカナ
SELECT normalize_unicode('㈱㌧㌦Ⅲ','NFKC');
(株)トンドルIII
Class: hivemall.tools.text.NormalizeUnicodeUDF
Function: not
not expr – Logical not.
Class: org.apache.spark.sql.catalyst.expressions.Not
Function: now
now() – Returns the current timestamp at the start of query evaluation.
Class: org.apache.spark.sql.catalyst.expressions.CurrentTimestamp
Function: ntile
ntile(n) – Divides the rows for each window partition into `n` buckets ranging from 1 to at most `n`.
Class: org.apache.spark.sql.catalyst.expressions.NTile
Function: nullif
nullif(expr1, expr2) – Returns null if `expr1` equals to `expr2`, or `expr1` otherwise.
Class: org.apache.spark.sql.catalyst.expressions.NullIf
Function: numeric_range
numeric_range(a,b,c) – Generates a range of integers from a to b, incremented by c, as multiple rows
Class: brickhouse.udf.collect.NumericRange
Function: nvl
nvl(expr1, expr2) – Returns `expr2` if `expr1` is null, or `expr1` otherwise.
Class: org.apache.spark.sql.catalyst.expressions.Nvl
Function: nvl2
nvl2(expr1, expr2, expr3) – Returns `expr2` if `expr1` is not null, or `expr3` otherwise.
Class: org.apache.spark.sql.catalyst.expressions.Nvl2
Function: octet_length
octet_length(expr) – Returns the byte length of string data or number of bytes of binary data.
Class: org.apache.spark.sql.catalyst.expressions.OctetLength
Function: onehot_encoding
onehot_encoding(PRIMITIVE feature, …) – Computes a one-hot encoded label for each feature
WITH mapping as (
select
m.f1, m.f2
from (
select onehot_encoding(species, category) m
from test
) tmp
)
select
array(m.f1[t.species],m.f2[t.category],feature('count',count)) as sparse_features
from
test t
CROSS JOIN mapping m;
["2","8","count:9"]
["5","8","count:10"]
["1","6","count:101"]
Class: hivemall.ftvec.trans.OnehotEncodingUDAF
Function: or
expr1 or expr2 – Logical OR.
Class: org.apache.spark.sql.catalyst.expressions.Or
Function: parse_url
parse_url(url, partToExtract[, key]) – Extracts a part from a URL.
Class: org.apache.spark.sql.catalyst.expressions.ParseUrl
Function: percent_rank
percent_rank() – Computes the percentage ranking of a value in a group of values.
Class: org.apache.spark.sql.catalyst.expressions.PercentRank
Function: percentile
percentile(col, percentage [, frequency]) – Returns the exact percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0, and frequency must be a positive integral value.
percentile(col, array(percentage1 [, percentage2]…) [, frequency]) – Returns the exact percentile value array of numeric column `col` at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0, and frequency must be a positive integral value.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Percentile
Function: percentile_approx
percentile_approx(col, percentage [, accuracy]) – Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Function: permutations
Class: com.whereos.udf.PermutationUDTF
Function: pi
pi() – Returns pi.
Class: org.apache.spark.sql.catalyst.expressions.Pi
Function: plsa_predict
plsa_predict(string word, float value, int label, float prob[, const string options]) – Returns a list which consists of (int label, float prob) pairs
Class: hivemall.topicmodel.PLSAPredictUDAF
Function: pmod
pmod(expr1, expr2) – Returns the positive value of `expr1` mod `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Pmod
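`pmod` differs from `%` for negative dividends. A plain-Python sketch, where `math.fmod` mirrors the truncated (sign-of-dividend) remainder that SQL's `%` produces:

```python
import math

def pmod(a, n):
    """Positive modulus: shift a negative truncated remainder up by n."""
    r = math.fmod(a, n)   # truncated remainder, keeps the sign of a
    return r + n if r < 0 else r

print(math.fmod(-7, 3))  # -1.0, what `%` returns
print(pmod(-7, 3))       # 2.0, what pmod returns
```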
Function: polynomial_features
polynomial_features(feature_vector in array) – Returns a feature vector having a polynomial feature space
Class: hivemall.ftvec.pairing.PolynomialFeaturesUDF
Function: popcnt
popcnt(a [, b]) – Returns a popcount value
select
popcnt(3),
popcnt("3"), -- 3=0b11
popcnt(array(1,3));
2 2 3
Class: hivemall.knn.distance.PopcountUDF
Function: populate_not_in
populate_not_in(list items, const int max_item_id [, const string options]) – Returns a relation consisting of the items that do not exist in the given items
Class: hivemall.ftvec.ranking.PopulateNotInUDTF
Function: posexplode
posexplode(expr) – Separates the elements of array `expr` into multiple rows with positions, or the elements of map `expr` into multiple rows and columns with positions.
Class: org.apache.spark.sql.catalyst.expressions.PosExplode
Function: posexplode_outer
posexplode_outer(expr) – Separates the elements of array `expr` into multiple rows with positions, or the elements of map `expr` into multiple rows and columns with positions.
Class: org.apache.spark.sql.catalyst.expressions.PosExplode
Function: posexplodepairs
Class: com.whereos.udf.PosExplodePairsUDTF
Function: position
position(substr, str[, pos]) – Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based.
Class: org.apache.spark.sql.catalyst.expressions.StringLocate
Function: positive
positive(expr) – Returns the value of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.UnaryPositive
Function: pow
pow(expr1, expr2) – Raises `expr1` to the power of `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Pow
Function: power
power(expr1, expr2) – Raises `expr1` to the power of `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.Pow
Function: powered_features
powered_features(feature_vector in array, int degree [, boolean truncate]) – Returns a feature vector having a powered feature space
Class: hivemall.ftvec.pairing.PoweredFeaturesUDF
Function: precision_at
precision_at(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) – Returns Precision
Class: hivemall.evaluation.PrecisionUDAF
Function: prefixed_hash_values
prefixed_hash_values(array values, string prefix [, boolean useIndexAsPrefix]) – Returns an array in which each element has the specified prefix
Class: hivemall.ftvec.hashing.ArrayPrefixedHashValuesUDF
Function: printf
printf(strfmt, obj, …) – Returns a formatted string from printf-style format strings.
Class: org.apache.spark.sql.catalyst.expressions.FormatString
Function: quantified_features
quantified_features(boolean output, col1, col2, …) – Returns identified features in a dense array
Class: hivemall.ftvec.trans.QuantifiedFeaturesUDTF
Function: quantify
quantify(boolean output, col1, col2, …) – Returns identified features
Class: hivemall.ftvec.conv.QuantifyColumnsUDTF
Function: quantitative_features
quantitative_features(array featureNames, feature1, feature2, .. [, const string options]) – Returns a feature vector array
Class: hivemall.ftvec.trans.QuantitativeFeaturesUDF
Function: quarter
quarter(date) – Returns the quarter of the year for date, in the range 1 to 4.
Class: org.apache.spark.sql.catalyst.expressions.Quarter
Function: r2
r2(double predicted, double actual) – Returns R squared (the coefficient of determination)
Class: hivemall.evaluation.R2UDAF
Function: radians
radians(expr) – Converts degrees to radians.
Class: org.apache.spark.sql.catalyst.expressions.ToRadians
Function: raise_error
raise_error() or raise_error(string msg) – Throws an error
SELECT product_id, price, raise_error('Found an invalid record') FROM xxx WHERE price < 0.0
Class: hivemall.tools.sanity.RaiseErrorUDF
Function: rand
rand([seed]) – Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Class: org.apache.spark.sql.catalyst.expressions.Rand
Function: rand_amplify
rand_amplify(const int xtimes [, const string options], *) – amplify the input records x-times in map-side
Class: hivemall.ftvec.amplify.RandomAmplifierUDTF
Function: randn
randn([seed]) – Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.
Class: org.apache.spark.sql.catalyst.expressions.Randn
Function: rank
rank() – Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
Class: org.apache.spark.sql.catalyst.expressions.Rank
Function: readjsongeometry
Class: com.whereos.udf.ReadJSONGeometryUDF
Function: recall_at
recall_at(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) – Returns Recall
Class: hivemall.evaluation.RecallUDAF
Function: reflect
reflect(class, method[, arg1[, arg2 ..]]) – Calls a method with reflection.
Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Function: regexp_extract
regexp_extract(str, regexp[, idx]) – Extracts a group that matches `regexp`.
Class: org.apache.spark.sql.catalyst.expressions.RegExpExtract
Function: regexp_replace
regexp_replace(str, regexp, rep) – Replaces all substrings of `str` that match `regexp` with `rep`.
Class: org.apache.spark.sql.catalyst.expressions.RegExpReplace
Function: rendergeometries
Class: com.whereos.udf.CollectAndRenderGeometryUDF
Function: renderheatmap
Class: com.whereos.udf.HeatmapRenderUDF
Function: rendertile
Class: com.whereos.udf.TileRenderUDF
Function: repeat
repeat(str, n) – Returns the string which repeats the given string value n times.
Class: org.apache.spark.sql.catalyst.expressions.StringRepeat
Function: replace
replace(str, search[, replace]) – Replaces all occurrences of `search` with `replace`.
Class: org.apache.spark.sql.catalyst.expressions.StringReplace
Function: rescale
rescale(value, min, max) – Returns rescaled value by min-max normalization
Class: hivemall.ftvec.scaling.RescaleUDF
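Min-max normalization is simple arithmetic; a plain-Python sketch (the behavior when min equals max is an assumption here, not taken from the UDF):

```python
def rescale(value, min_value, max_value):
    """Min-max normalization: maps the range [min, max] onto [0, 1]."""
    if min_value == max_value:
        return 0.5  # assumed midpoint for a degenerate range
    return (value - min_value) / (max_value - min_value)

print(rescale(5.0, 0.0, 10.0))  # 0.5
print(rescale(0.0, 0.0, 10.0))  # 0.0
```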
Function: reverse
reverse(array) – Returns a reversed string or an array with reverse order of elements.
Class: org.apache.spark.sql.catalyst.expressions.Reverse
Function: rf_ensemble
rf_ensemble(int yhat [, array proba [, double model_weight=1.0]]) – Returns ensembled prediction results in (int label, double probability, array probabilities)
Class: hivemall.smile.tools.RandomForestEnsembleUDAF
Function: right
right(str, len) – Returns the rightmost `len` (`len` can be string type) characters from the string `str`. If `len` is less than or equal to 0, the result is an empty string.
Class: org.apache.spark.sql.catalyst.expressions.Right
Function: rint
rint(expr) – Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
Class: org.apache.spark.sql.catalyst.expressions.Rint
Function: rlike
str rlike regexp – Returns true if `str` matches `regexp`, or false otherwise.
Class: org.apache.spark.sql.catalyst.expressions.RLike
Function: rmse
rmse(double predicted, double actual) – Returns the root mean squared error
Class: hivemall.evaluation.RootMeanSquaredErrorUDAF
Function: rollup
rollup([col1[, col2 ..]]) – Creates a multi-dimensional rollup using the specified columns so that we can run aggregation on them.
Class: org.apache.spark.sql.catalyst.expressions.Rollup
Function: round
round(expr, d) – Returns `expr` rounded to `d` decimal places using HALF_UP rounding mode.
Class: org.apache.spark.sql.catalyst.expressions.Round
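HALF_UP rounding differs from the banker's rounding used by Python's built-in `round`; a Decimal-based sketch of the semantics (`sql_round` is an illustrative name):

```python
from decimal import Decimal, ROUND_HALF_UP

def sql_round(expr, d):
    """Round to d decimal places, ties going away from zero (HALF_UP)."""
    quantum = Decimal(1).scaleb(-d)  # e.g. d=2 -> Decimal('0.01')
    return float(Decimal(str(expr)).quantize(quantum, rounding=ROUND_HALF_UP))

print(sql_round(2.5, 0))  # 3.0 (HALF_UP)
print(round(2.5))         # 2 (Python's banker's rounding, for contrast)
```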
Function: row_number
row_number() – Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
Class: org.apache.spark.sql.catalyst.expressions.RowNumber
Function: rowid
rowid() – Returns a generated row id of the form {TASK_ID}-{SEQUENCE_NUMBER}
Class: hivemall.tools.mapred.RowIdUDF
Function: rownum
rownum() – Returns a generated row number `sprintf("%d%04d", sequence, taskId)` as a long
SELECT rownum() as rownum, xxx from …
Class: hivemall.tools.mapred.RowNumberUDF
Function: rpad
rpad(str, len, pad) – Returns `str`, right-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters.
Class: org.apache.spark.sql.catalyst.expressions.StringRPad
Function: rtrim
rtrim(str) – Removes the trailing space characters from `str`. rtrim(trimStr, str) – Removes the trailing string which contains the characters from the trim string from the `str`
Class: org.apache.spark.sql.catalyst.expressions.StringTrimRight
Function: salted_bigint
Class: brickhouse.hbase.SaltedBigIntUDF
Function: salted_bigint_key
Class: brickhouse.hbase.SaltedBigIntUDF
Function: schema_of_json
schema_of_json(json[, options]) – Returns schema in the DDL format of JSON string.
Class: org.apache.spark.sql.catalyst.expressions.SchemaOfJson
Function: second
second(timestamp) – Returns the second component of the string/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.Second
Function: select_k_best
select_k_best(array array, const array importance, const int k) – Returns selected top-k elements as array
Class: hivemall.tools.array.SelectKBestUDF
Function: sentences
sentences(str[, lang, country]) – Splits `str` into an array of array of words.
Class: org.apache.spark.sql.catalyst.expressions.Sentences
Function: sequence
sequence(start, stop, step) – Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions. Supported types are: byte, short, integer, long, date, timestamp. The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the ‘date’ or ‘timestamp’ type then the step expression must resolve to the ‘interval’ type, otherwise to the same type as the start and stop expressions.
Class: org.apache.spark.sql.catalyst.expressions.Sequence
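The inclusive-stop semantics differ from Python's `range`; a sketch of the integer case only (date/timestamp sequences with interval steps are omitted, and a non-zero step is assumed):

```python
def sequence(start, stop, step):
    """Inclusive version of range(): both start and stop can appear in the output."""
    out, x = [], start
    while (step > 0 and x <= stop) or (step < 0 and x >= stop):
        out.append(x)
        x += step
    return out

print(sequence(1, 5, 2))   # [1, 3, 5]
print(sequence(5, 1, -2))  # [5, 3, 1]
```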
Function: sessionize
sessionize(long timeInSec, long thresholdInSec [, String subject]) – Returns a UUID string of a session.
SELECT
sessionize(time, 3600, ip_addr) as session_id,
time, ip_addr
FROM (
SELECT time, ip_addr
FROM weblog
DISTRIBUTE BY ip_addr, time SORT BY ip_addr, time DESC
) t1
Class: hivemall.tools.datetime.SessionizeUDF
Function: set_difference
set_difference(a,b) – Returns a list of those items in a, but not in b
Class: brickhouse.udf.collect.SetDifferenceUDF
Function: set_similarity
set_similarity(a,b) – Compute the Jaccard set similarity of two sketch sets.
Class: brickhouse.udf.sketch.SetSimilarityUDF
Function: sha
sha(expr) – Returns a sha1 hash value as a hex string of the `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Sha1
Function: sha1
sha1(expr) – Returns a sha1 hash value as a hex string of the `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Sha1
Function: sha2
sha2(expr, bitLength) – Returns a checksum of SHA-2 family as a hex string of `expr`. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
Class: org.apache.spark.sql.catalyst.expressions.Sha2
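The bit-length dispatch can be sketched with `hashlib`, mapping 0 to SHA-256 as described above:

```python
import hashlib

def sha2(data, bit_length):
    """SHA-2 family checksum of a string as a hex string; bit length 0 means 256."""
    algos = {0: hashlib.sha256, 224: hashlib.sha224, 256: hashlib.sha256,
             384: hashlib.sha384, 512: hashlib.sha512}
    return algos[bit_length](data.encode("utf-8")).hexdigest()

print(sha2("abc", 256))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```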
Function: shiftleft
shiftleft(base, expr) – Bitwise left shift.
Class: org.apache.spark.sql.catalyst.expressions.ShiftLeft
Function: shiftright
shiftright(base, expr) – Bitwise (signed) right shift.
Class: org.apache.spark.sql.catalyst.expressions.ShiftRight
Function: shiftrightunsigned
shiftrightunsigned(base, expr) – Bitwise unsigned right shift.
Class: org.apache.spark.sql.catalyst.expressions.ShiftRightUnsigned
Function: shuffle
shuffle(array) – Returns a random permutation of the given array.
Class: org.apache.spark.sql.catalyst.expressions.Shuffle
Function: sigmoid
sigmoid(x) – Returns 1.0 / (1.0 + exp(-x))
WITH input as (
SELECT 3.0 as x
UNION ALL
SELECT -3.0 as x
)
select
1.0 / (1.0 + exp(-x)),
sigmoid(x)
from
input;
0.04742587317756678 0.04742587357759476
0.9525741268224334 0.9525741338729858
Class: hivemall.tools.math.SigmoidGenericUDF
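The two result columns above differ slightly because the UDF evaluates in lower precision; the exact double-precision values in the left column can be reproduced in plain Python:

```python
import math

def sigmoid(x):
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(3.0))   # 0.9525741268224334
print(sigmoid(-3.0))  # 0.04742587317756678
```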
Function: sign
sign(expr) – Returns -1.0, 0.0 or 1.0 as `expr` is negative, 0 or positive.
Class: org.apache.spark.sql.catalyst.expressions.Signum
Function: signum
signum(expr) – Returns -1.0, 0.0 or 1.0 as `expr` is negative, 0 or positive.
Class: org.apache.spark.sql.catalyst.expressions.Signum
Function: simple_r
Class: com.whereos.udf.RenjinUDF
Function: sin
sin(expr) – Returns the sine of `expr`, as if computed by `java.lang.Math.sin`.
Class: org.apache.spark.sql.catalyst.expressions.Sin
Function: singularize
singularize(string word) – Returns singular form of a given English word
SELECT singularize(lower("Apples"));
"apple"
Class: hivemall.tools.text.SingularizeUDF
Function: sinh
sinh(expr) – Returns hyperbolic sine of `expr`, as if computed by `java.lang.Math.sinh`.
Class: org.apache.spark.sql.catalyst.expressions.Sinh
Function: size
size(expr) – Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
Class: org.apache.spark.sql.catalyst.expressions.Size
Function: sketch_hashes
sketch_hashes(x) – Returns the MD5 hashes associated with a KMV sketch set of strings
Class: brickhouse.udf.sketch.SketchHashesUDF
Function: sketch_set
sketch_set(x) – Constructs a sketch set to estimate reach for large values
Class: brickhouse.udf.sketch.SketchSetUDAF
Function: skewness
skewness(expr) – Returns the skewness value calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Skewness
Function: slice
slice(x, start, length) – Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.
Class: org.apache.spark.sql.catalyst.expressions.Slice
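The 1-based index and negative-start semantics can be sketched in Python (`sql_slice` is an illustrative name):

```python
def sql_slice(x, start, length):
    """Subset x from 1-based index `start`; a negative start counts from the end."""
    if start == 0:
        raise ValueError("SQL array indices start at 1")
    i = start - 1 if start > 0 else len(x) + start
    return x[i:i + length]

print(sql_slice([1, 2, 3, 4, 5], 2, 2))   # [2, 3]
print(sql_slice([1, 2, 3, 4, 5], -2, 2))  # [4, 5]
```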
Function: smallint
smallint(expr) – Casts the value `expr` to the target data type `smallint`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: snr
snr(array features, array one-hot class label) – Returns Signal Noise Ratio for each feature as array
Class: hivemall.ftvec.selection.SignalNoiseRatioUDAF
Function: sort_and_uniq_array
sort_and_uniq_array(array) – Takes array and returns a sorted array with duplicate elements eliminated
SELECT sort_and_uniq_array(array(3,1,1,-2,10));
[-2,1,3,10]
Class: hivemall.tools.array.SortAndUniqArrayUDF
Function: sort_array
sort_array(array[, ascendingOrder]) – Sorts the input array in ascending or descending order according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Class: org.apache.spark.sql.catalyst.expressions.SortArray
Function: sort_by_feature
sort_by_feature(map in map) – Returns a sorted map
Class: hivemall.ftvec.SortByFeatureUDF
Function: soundex
soundex(str) – Returns Soundex code of the string.
Class: org.apache.spark.sql.catalyst.expressions.SoundEx
Function: space
space(n) – Returns a string consisting of `n` spaces.
Class: org.apache.spark.sql.catalyst.expressions.StringSpace
Function: spark_partition_id
spark_partition_id() – Returns the current partition id.
Class: org.apache.spark.sql.catalyst.expressions.SparkPartitionID
Function: split
split(str, regex) – Splits `str` around occurrences that match `regex`.
Class: org.apache.spark.sql.catalyst.expressions.StringSplit
Function: split_words
split_words(string query [, string regex]) – Returns an array containing split strings
Class: hivemall.tools.text.SplitWordsUDF
Function: splitlinestring
Class: com.whereos.udf.LineSplitterUDTF
Function: sqrt
sqrt(expr) – Returns the square root of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.Sqrt
Function: sst
sst(double|array x [, const string options]) – Returns change-point scores and decisions using Singular Spectrum Transformation (SST). It will return a tuple
Class: hivemall.anomaly.SingularSpectrumTransformUDF
Function: stack
stack(n, expr1, …, exprk) – Separates `expr1`, …, `exprk` into `n` rows.
Class: org.apache.spark.sql.catalyst.expressions.Stack
Function: std
std(expr) – Returns the sample standard deviation calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.StddevSamp
Function: stddev
stddev(expr) – Returns the sample standard deviation calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.StddevSamp
Function: stddev_pop
stddev_pop(expr) – Returns the population standard deviation calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.StddevPop
Function: stddev_samp
stddev_samp(expr) – Returns the sample standard deviation calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.StddevSamp
Function: str_to_map
str_to_map(text[, pairDelim[, keyValueDelim]]) – Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for `pairDelim` and ':' for `keyValueDelim`.
Class: org.apache.spark.sql.catalyst.expressions.StringToMap
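A plain-Python sketch of the splitting behavior; the handling of duplicate keys and of pairs without a key/value delimiter is an assumption here, not taken from the implementation:

```python
def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Split text into a key/value map using the two delimiters."""
    out = {}
    for pair in text.split(pair_delim):
        k, sep, v = pair.partition(kv_delim)
        out[k] = v if sep else None  # assumed: no delimiter -> null value
    return out

print(str_to_map("a:1,b:2"))  # {'a': '1', 'b': '2'}
```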
Function: string
string(expr) – Casts the value `expr` to the target data type `string`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: struct
struct(col1, col2, col3, …) – Creates a struct with the given field values.
Class: org.apache.spark.sql.catalyst.expressions.NamedStruct
Function: subarray
subarray(array values, int offset [, int length]) – Slices the given array by the given offset and length parameters.
SELECT
array_slice(array(1,2,3,4,5,6),2,4),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
0, -- offset
2 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
3 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
10 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3, -- offset
2 -- length
);
[3,4]
["zero","one"]
["six","seven","eight"]
["six","seven","eight","nine","ten"]
["six","seven","eight","nine","ten"]
["eight","nine","ten"]
["eight","nine"]
Class: hivemall.tools.array.ArraySliceUDF
Function: subarray_endwith
subarray_endwith(array original, int|text key) – Returns an array that ends with the specified key
SELECT subarray_endwith(array(1,2,3,4), 3);
[1,2,3]
Class: hivemall.tools.array.SubarrayEndWithUDF
Function: subarray_startwith
subarray_startwith(array original, int|text key) – Returns an array that starts with the specified key
SELECT subarray_startwith(array(1,2,3,4), 2);
[2,3,4]
Class: hivemall.tools.array.SubarrayStartWithUDF
Function: substr
substr(str, pos[, len]) – Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.
Class: org.apache.spark.sql.catalyst.expressions.Substring
Function: substring
substring(str, pos[, len]) – Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.
Class: org.apache.spark.sql.catalyst.expressions.Substring
Function: substring_index
substring_index(str, delim, count) – Returns the substring from `str` before `count` occurrences of the delimiter `delim`. If `count` is positive, everything to the left of the final delimiter (counting from the left) is returned. If `count` is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for `delim`.
Class: org.apache.spark.sql.catalyst.expressions.SubstringIndex
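The positive/negative `count` semantics can be sketched in Python:

```python
def substring_index(s, delim, count):
    """Everything before `count` occurrences of delim: counted from the left
    when count is positive, from the right when negative; case-sensitive."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""

print(substring_index("www.apache.org", ".", 2))   # www.apache
print(substring_index("www.apache.org", ".", -2))  # apache.org
```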
Function: sum
sum(expr) – Returns the sum calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Sum
Function: sum_array
Class: brickhouse.udf.timeseries.SumArrayUDF
Function: tan
tan(expr) – Returns the tangent of `expr`, as if computed by `java.lang.Math.tan`.
Class: org.apache.spark.sql.catalyst.expressions.Tan
Function: tanh
tanh(expr) – Returns the hyperbolic tangent of `expr`, as if computed by `java.lang.Math.tanh`.
Class: org.apache.spark.sql.catalyst.expressions.Tanh
Function: taskid
taskid() – Returns the value of mapred.task.partition
Class: hivemall.tools.mapred.TaskIdUDF
Function: tf
tf(string text) – Returns the term frequency of each word in the given text as a map of (string, float)
Class: hivemall.ftvec.text.TermFrequencyUDAF
Function: throw_error
Class: brickhouse.udf.sanity.ThrowErrorUDF
Function: tile
tile(double lat, double lon, int zoom)::bigint – Returns a tile number in the range [0, 2^(2n)) where n is the zoom level: tile(lat,lon,zoom) = xtile(lon,zoom) + ytile(lat,zoom) * 2^zoom
See https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames for details.
Class: hivemall.geospatial.TileUDF
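The formula above can be sketched in Python using the OpenStreetMap slippy-map convention for `xtile` and `ytile` (valid latitudes are assumed to lie within the Web Mercator range, roughly ±85.05):

```python
import math

def xtile(lon, zoom):
    """Tile column for a longitude at the given zoom level."""
    return int((lon + 180.0) / 360.0 * (1 << zoom))

def ytile(lat, zoom):
    """Tile row for a latitude at the given zoom level (Web Mercator)."""
    lat_rad = math.radians(lat)
    return int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * (1 << zoom))

def tile(lat, lon, zoom):
    """tile(lat,lon,zoom) = xtile(lon,zoom) + ytile(lat,zoom) * 2^zoom."""
    return xtile(lon, zoom) + ytile(lat, zoom) * (1 << zoom)

print(tile(0.0, 0.0, 1))  # 3
```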
Function: tilex2lon
tilex2lon(int x, int zoom)::double – Returns longitude of the given tile x and zoom level
Class: hivemall.geospatial.TileX2LonUDF
Function: tiley2lat
tiley2lat(int y, int zoom)::double – Returns latitude of the given tile y and zoom level
Class: hivemall.geospatial.TileY2LatUDF
Function: timestamp
timestamp(expr) – Casts the value `expr` to the target data type `timestamp`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: tinyint
tinyint(expr) – Casts the value `expr` to the target data type `tinyint`.
Class: org.apache.spark.sql.catalyst.expressions.Cast
Function: to_bits
to_bits(int[] indexes) – Returns a bitset representation of the given indexes as long[]
SELECT to_bits(array(1,2,3,128));
[14,-9223372036854775808]
Class: hivemall.tools.bits.ToBitsUDF
Function: to_camel_case
to_camel_case(a) – Converts a string containing underscores to CamelCase
Class: brickhouse.udf.json.ConvertToCamelCaseUDF
Function: to_date
to_date(date_str[, fmt]) – Parses the `date_str` expression with the `fmt` expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the `fmt` is omitted.
Class: org.apache.spark.sql.catalyst.expressions.ParseToDate
Function: to_dense
to_dense(array feature_vector, int dimensions) – Returns a dense feature in array
Class: hivemall.ftvec.conv.ToDenseFeaturesUDF
Function: to_dense_features
to_dense_features(array feature_vector, int dimensions) – Returns a dense feature in array
Class: hivemall.ftvec.conv.ToDenseFeaturesUDF
Function: to_json
to_json(expr[, options]) – Returns a JSON string with a given struct value
Class: org.apache.spark.sql.catalyst.expressions.StructsToJson
Function: to_map
to_map(key, value) – Converts two aggregated columns into a key-value map
WITH input as (
select 'aaa' as key, 111 as value
UNION all
select 'bbb' as key, 222 as value
)
select to_map(key, value)
from input;
> {"bbb":222,"aaa":111}
Class: hivemall.tools.map.UDAFToMap
Function: to_ordered_list
to_ordered_list(PRIMITIVE value [, PRIMITIVE key, const string options]) – Returns a list of values sorted by the value itself or by a specific key
WITH t as (
SELECT 5 as key, 'apple' as value
UNION ALL
SELECT 3 as key, 'banana' as value
UNION ALL
SELECT 4 as key, 'candy' as value
UNION ALL
SELECT 2 as key, 'donut' as value
UNION ALL
SELECT 3 as key, 'egg' as value
)
SELECT -- expected output
to_ordered_list(value, key, '-reverse'), -- [apple, candy, (banana, egg | egg, banana), donut] (reverse order)
to_ordered_list(value, key, '-k 2'), -- [apple, candy] (top-k)
to_ordered_list(value, key, '-k 100'), -- [apple, candy, (banana, egg | egg, banana), donut]
to_ordered_list(value, key, '-k 2 -reverse'), -- [donut, (banana | egg)] (reverse top-k = tail-k)
to_ordered_list(value, key), -- [donut, (banana, egg | egg, banana), candy, apple] (natural order)
to_ordered_list(value, key, '-k -2'), -- [donut, (banana | egg)] (tail-k)
to_ordered_list(value, key, '-k -100'), -- [donut, (banana, egg | egg, banana), candy, apple]
to_ordered_list(value, key, '-k -2 -reverse'), -- [apple, candy] (reverse tail-k = top-k)
to_ordered_list(value, '-k 2'), -- [egg, donut] (alphabetically)
to_ordered_list(key, '-k -2 -reverse'), -- [5, 4] (top-2 keys)
to_ordered_list(key), -- [2, 3, 3, 4, 5] (naturally ordered keys)
to_ordered_list(value, key, '-k 2 -kv_map'), -- {4:"candy",5:"apple"}
to_ordered_list(value, key, '-k 2 -vk_map') -- {"candy":4,"apple":5}
FROM
t
Class: hivemall.tools.list.UDAFToOrderedList
Function: to_ordered_map
to_ordered_map(key, value [, const int k|const boolean reverseOrder=false]) – Converts two aggregated columns into an ordered key-value map
with t as (
select 10 as key, 'apple' as value
union all
select 3 as key, 'banana' as value
union all
select 4 as key, 'candy' as value
)
select
to_ordered_map(key, value, true), -- {10:"apple",4:"candy",3:"banana"} (reverse)
to_ordered_map(key, value, 1), -- {10:"apple"} (top-1)
to_ordered_map(key, value, 2), -- {10:"apple",4:"candy"} (top-2)
to_ordered_map(key, value, 3), -- {10:"apple",4:"candy",3:"banana"} (top-3)
to_ordered_map(key, value, 100), -- {10:"apple",4:"candy",3:"banana"} (top-100)
to_ordered_map(key, value), -- {3:"banana",4:"candy",10:"apple"} (natural)
to_ordered_map(key, value, -1), -- {3:"banana"} (tail-1)
to_ordered_map(key, value, -2), -- {3:"banana",4:"candy"} (tail-2)
to_ordered_map(key, value, -3), -- {3:"banana",4:"candy",10:"apple"} (tail-3)
to_ordered_map(key, value, -100) -- {3:"banana",4:"candy",10:"apple"} (tail-100)
from t
Class: hivemall.tools.map.UDAFToOrderedMap
Function: to_sparse
to_sparse(array feature_vector) – Returns a sparse feature vector as an array
Class: hivemall.ftvec.conv.ToSparseFeaturesUDF
Function: to_sparse_features
to_sparse_features(array feature_vector) – Returns a sparse feature vector as an array
Class: hivemall.ftvec.conv.ToSparseFeaturesUDF
Function: to_string_array
to_string_array(array) – Returns an array of strings
select to_string_array(array(1.0,2.0,3.0));
["1.0","2.0","3.0"]
Class: hivemall.tools.array.ToStringArrayUDF
Function: to_timestamp
to_timestamp(timestamp_str[, fmt]) – Parses the `timestamp_str` expression with the `fmt` expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the `fmt` is omitted.
Class: org.apache.spark.sql.catalyst.expressions.ParseToTimestamp
Function: to_unix_timestamp
to_unix_timestamp(timeExp[, format]) – Returns the UNIX timestamp of the given time.
Class: org.apache.spark.sql.catalyst.expressions.ToUnixTimestamp
Function: to_utc_timestamp
to_utc_timestamp(timestamp, timezone) – Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
Class: org.apache.spark.sql.catalyst.expressions.ToUTCTimestamp
Function: tokenize
tokenize(string englishText [, boolean toLowerCase]) – Returns tokenized words in array
Class: hivemall.tools.text.TokenizeUDF
Function: train_adadelta_regr
train_adadelta_regr(array features, float target [, constant string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Class: hivemall.regression.AdaDeltaUDTF
Function: train_adagrad_rda
train_adagrad_rda(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by Adagrad+RDA regularization binary classifier
Class: hivemall.classifier.AdaGradRDAUDTF
Function: train_adagrad_regr
train_adagrad_regr(array features, float target [, constant string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Class: hivemall.regression.AdaGradUDTF
Function: train_arow
train_arow(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight, float covar>
Build a prediction model by Adaptive Regularization of Weight Vectors (AROW) binary classifier
Class: hivemall.classifier.AROWClassifierUDTF
Function: train_arow_regr
train_arow_regr(array features, float target [, constant string options]) – A standard AROW (Adaptive Regularization of Weight Vectors) regressor that uses `y - w^Tx` for the loss function.
SELECT
feature,
argmin_kld(weight, covar) as weight
FROM (
SELECT
train_arow_regr(features,label) as (feature,weight,covar)
FROM
training_data
) t
GROUP BY feature
Class: hivemall.regression.AROWRegressionUDTF
Function: train_arowe2_regr
train_arowe2_regr(array features, float target [, constant string options]) – A refined version of the AROW (Adaptive Regularization of Weight Vectors) regressor that uses the adaptive epsilon-insensitive hinge loss `|w^t - y| - epsilon * stddev` for the loss function
SELECT
feature,
argmin_kld(weight, covar) as weight
FROM (
SELECT
train_arowe2_regr(features,label) as (feature,weight,covar)
FROM
training_data
) t
GROUP BY feature
Class: hivemall.regression.AROWRegressionUDTF$AROWe2
Function: train_arowe_regr
train_arowe_regr(array features, float target [, constant string options]) – A refined version of the AROW (Adaptive Regularization of Weight Vectors) regressor that uses the epsilon-insensitive hinge loss `|w^t - y| - epsilon` for the loss function
SELECT
feature,
argmin_kld(weight, covar) as weight
FROM (
SELECT
train_arowe_regr(features,label) as (feature,weight,covar)
FROM
training_data
) t
GROUP BY feature
Class: hivemall.regression.AROWRegressionUDTF$AROWe
Function: train_arowh
train_arowh(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight, float covar>
Build a prediction model by AROW binary classifier using hinge loss
Class: hivemall.classifier.AROWClassifierUDTF$AROWh
Function: train_classifier
train_classifier(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by a generic classifier
Class: hivemall.classifier.GeneralClassifierUDTF
Function: train_cw
train_cw(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight, float covar>
Build a prediction model by Confidence-Weighted (CW) binary classifier
Class: hivemall.classifier.ConfidenceWeightedUDTF
Function: train_kpa
train_kpa(array features, int label [, const string options]) – Returns a relation
Class: hivemall.classifier.KernelExpansionPassiveAggressiveUDTF
Function: train_lda
train_lda(array words[, const string options]) – Returns a relation consisting of <int topic, string word, float score>
Class: hivemall.topicmodel.LDAUDTF
Function: train_logistic_regr
train_logistic_regr(array features, float target [, constant string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Class: hivemall.regression.LogressUDTF
Function: train_logregr
train_logregr(array features, float target [, constant string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Class: hivemall.regression.LogressUDTF
Function: train_multiclass_arow
train_multiclass_arow(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>
Build a prediction model by Adaptive Regularization of Weight Vectors (AROW) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassAROWClassifierUDTF
Function: train_multiclass_arowh
train_multiclass_arowh(list features, int|string label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>
Build a prediction model by Adaptive Regularization of Weight Vectors (AROW) multiclass classifier using hinge loss
Class: hivemall.classifier.multiclass.MulticlassAROWClassifierUDTF$AROWh
Function: train_multiclass_cw
train_multiclass_cw(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>
Build a prediction model by Confidence-Weighted (CW) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassConfidenceWeightedUDTF
Function: train_multiclass_pa
train_multiclass_pa(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>
Build a prediction model by Passive-Aggressive (PA) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF
Function: train_multiclass_pa1
train_multiclass_pa1(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>
Build a prediction model by Passive-Aggressive 1 (PA-1) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF$PA1
Function: train_multiclass_pa2
train_multiclass_pa2(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>
Build a prediction model by Passive-Aggressive 2 (PA-2) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF$PA2
Function: train_multiclass_perceptron
train_multiclass_perceptron(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>
Build a prediction model by Perceptron multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassPerceptronUDTF
Function: train_multiclass_scw
train_multiclass_scw(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>
Build a prediction model by Soft Confidence-Weighted (SCW-1) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF$SCW1
Function: train_multiclass_scw2
train_multiclass_scw2(list features, {int|string} label [, const string options]) – Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>
Build a prediction model by Soft Confidence-Weighted 2 (SCW-2) multiclass classifier
Class: hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF$SCW2
Function: train_pa
train_pa(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by Passive-Aggressive (PA) binary classifier
Class: hivemall.classifier.PassiveAggressiveUDTF
Function: train_pa1
train_pa1(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by Passive-Aggressive 1 (PA-1) binary classifier
Class: hivemall.classifier.PassiveAggressiveUDTF$PA1
Function: train_pa1_regr
train_pa1_regr(array features, float target [, constant string options]) – PA-1 regressor that returns a relation consisting of `(int|bigint|string) feature, float weight`.
SELECT
feature,
avg(weight) as weight
FROM
(SELECT
train_pa1_regr(features,label) as (feature,weight)
FROM
training_data
) t
GROUP BY feature
Class: hivemall.regression.PassiveAggressiveRegressionUDTF
Function: train_pa1a_regr
train_pa1a_regr(array features, float target [, constant string options]) – Returns a relation consisting of `(int|bigint|string) feature, float weight`.
Class: hivemall.regression.PassiveAggressiveRegressionUDTF$PA1a
Function: train_pa2
train_pa2(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by Passive-Aggressive 2 (PA-2) binary classifier
Class: hivemall.classifier.PassiveAggressiveUDTF$PA2
Function: train_pa2_regr
train_pa2_regr(array features, float target [, constant string options]) – Returns a relation consisting of `(int|bigint|string) feature, float weight`.
Class: hivemall.regression.PassiveAggressiveRegressionUDTF$PA2
Function: train_pa2a_regr
train_pa2a_regr(array features, float target [, constant string options]) – Returns a relation consisting of `(int|bigint|string) feature, float weight`.
Class: hivemall.regression.PassiveAggressiveRegressionUDTF$PA2a
Function: train_perceptron
train_perceptron(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by Perceptron binary classifier
Class: hivemall.classifier.PerceptronUDTF
Function: train_plsa
train_plsa(array words[, const string options]) – Returns a relation consisting of <int topic, string word, float score>
Class: hivemall.topicmodel.PLSAUDTF
Function: train_randomforest_classifier
train_randomforest_classifier(array features, int label [, const string options, const array classWeights]) – Returns a relation consisting of the per-tree models together with `var_importance`, `int oob_errors`, and `int oob_tests`
Class: hivemall.smile.classification.RandomForestClassifierUDTF
Function: train_randomforest_regr
train_randomforest_regr(array features, double target [, string options]) – Returns a relation consisting of the per-tree models together with `var_importance`, `double oob_errors`, and `int oob_tests`
Class: hivemall.smile.regression.RandomForestRegressionUDTF
Function: train_randomforest_regressor
train_randomforest_regressor(array features, double target [, string options]) – Returns a relation consisting of the per-tree models together with `var_importance`, `double oob_errors`, and `int oob_tests`
Class: hivemall.smile.regression.RandomForestRegressionUDTF
Function: train_regressor
train_regressor(list features, double label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight>
Build a prediction model by a generic regressor
Class: hivemall.regression.GeneralRegressorUDTF
Function: train_scw
train_scw(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight, float covar>
Build a prediction model by Soft Confidence-Weighted (SCW-1) binary classifier
Class: hivemall.classifier.SoftConfideceWeightedUDTF$SCW1
Function: train_scw2
train_scw2(list features, int label [, const string options]) – Returns a relation consisting of <{int|bigint|string} feature, float weight, float covar>
Build a prediction model by Soft Confidence-Weighted 2 (SCW-2) binary classifier
Class: hivemall.classifier.SoftConfideceWeightedUDTF$SCW2
Function: train_slim
train_slim( int i, map r_i, map> topKRatesOfI, int j, map r_j [, constant string options]) – Returns the row index, column index, and non-zero weight values of the prediction model
Class: hivemall.recommend.SlimUDTF
Function: transform
transform(expr, func) – Transforms elements in an array using the function.
Class: org.apache.spark.sql.catalyst.expressions.ArrayTransform
Function: translate
translate(input, from, to) – Translates the `input` string by replacing the characters present in the `from` string with the corresponding characters in the `to` string.
Class: org.apache.spark.sql.catalyst.expressions.StringTranslate
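The same character-for-character mapping exists in Python as `str.translate`, which is handy for cross-checking expected results (a sketch, not WhereOS code):

```python
# Mimic SQL translate('AaBbCc', 'abc', '123'): 'a'->'1', 'b'->'2', 'c'->'3'
table = str.maketrans("abc", "123")
print("AaBbCc".translate(table))  # A1B2C3
```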
Function: transpose_and_dot
transpose_and_dot(array X, array Y) – Returns dot(X.T, Y) as array<array<double>>, shape = (X.#cols, Y.#cols)
WITH input as (
select array(1.0, 2.0, 3.0, 4.0) as x, array(1, 2) as y
UNION ALL
select array(2.0, 3.0, 4.0, 5.0) as x, array(1, 2) as y
)
select
transpose_and_dot(x, y) as xy,
transpose_and_dot(y, x) as yx
from
input;
[["3.0","6.0"],["5.0","10.0"],["7.0","14.0"],["9.0","18.0"]] [["3.0","5.0","7.0","9.0"],["6.0","10.0","14.0","18.0"]]
Class: hivemall.tools.matrix.TransposeAndDotUDAF
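The result above can be reproduced with a plain Python sketch of dot(X.T, Y), useful for sanity-checking the (X.#cols, Y.#cols) shape:

```python
def transpose_and_dot(X, Y):
    # result[i][j] = sum over rows r of X[r][i] * Y[r][j]
    # shape = (number of columns of X, number of columns of Y)
    return [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(len(Y[0]))]
            for i in range(len(X[0]))]

X = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
Y = [[1, 2], [1, 2]]
print(transpose_and_dot(X, Y))  # [[3.0, 6.0], [5.0, 10.0], [7.0, 14.0], [9.0, 18.0]]
```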
Function: tree_export
tree_export(string model, const string options, optional array featureNames=null, optional array classNames=null) – Exports a decision tree model as JavaScript/dot
Class: hivemall.smile.tools.TreeExportUDF
Function: tree_predict
tree_predict(string modelId, string model, array features [, const string options | const boolean classification=false]) – Returns a prediction result of a random forest: <int value, array<double> posteriori> for classification and <double value> for regression
Class: hivemall.smile.tools.TreePredictUDF
Function: tree_predict_v1
tree_predict_v1(string modelId, int modelType, string script, array features [, const boolean classification]) – Returns a prediction result of a random forest
Class: hivemall.smile.tools.TreePredictUDFv1
Function: trim
trim(str) – Removes the leading and trailing space characters from `str`.
trim(BOTH trimStr FROM str) – Removes the leading and trailing `trimStr` characters from `str`.
trim(LEADING trimStr FROM str) – Removes the leading `trimStr` characters from `str`.
trim(TRAILING trimStr FROM str) – Removes the trailing `trimStr` characters from `str`.
Class: org.apache.spark.sql.catalyst.expressions.StringTrim
Function: trunc
trunc(date, fmt) – Returns `date` with the time portion of the day truncated to the unit specified by the format model `fmt`. `fmt` should be one of ["year", "yyyy", "yy", "mon", "month", "mm"]
Class: org.apache.spark.sql.catalyst.expressions.TruncDate
Function: truncate_array
Class: brickhouse.udf.collect.TruncateArrayUDF
Function: try_cast
try_cast(ANY src, const string typeName) – Explicitly cast a value as a type. Returns null if cast fails.
SELECT try_cast(array(1.0,2.0,3.0), 'array')
SELECT try_cast(map('A',10,'B',20,'C',30), 'map')
Class: hivemall.tools.TryCastUDF
Function: ucase
ucase(str) – Returns `str` with all characters changed to uppercase.
Class: org.apache.spark.sql.catalyst.expressions.Upper
Function: udfarrayconcat
udfarrayconcat(values) – Concatenates the array arguments
Class: com.whereos.udf.UDFArrayConcat
Function: unbase64
unbase64(str) – Converts the argument from a base 64 string `str` to a binary.
Class: org.apache.spark.sql.catalyst.expressions.UnBase64
Function: unbase91
unbase91(string) – Converts a BASE91 string to a binary
SELECT inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
aaaaaaaaaaaaaaaabbbbccc
Class: hivemall.tools.text.Unbase91UDF
Function: unbits
unbits(long[] bitset) – Returns a long array of the given bitset representation
SELECT unbits(to_bits(array(1,4,2,3)));
[1,2,3,4]
Class: hivemall.tools.bits.UnBitsUDF
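A Python sketch of the to_bits/unbits round trip, assuming the bitset is packed into 64-bit words reinterpreted as Java-style signed longs (the word layout here is an illustrative assumption, not the Hivemall source):

```python
def to_bits(indexes):
    # Set each index as a bit in an array of 64-bit words
    words = [0] * (max(indexes) // 64 + 1)
    for i in indexes:
        words[i // 64] |= 1 << (i % 64)
    # Reinterpret each word as a signed 64-bit (Java long) value
    return [w - (1 << 64) if w >= (1 << 63) else w for w in words]

def unbits(words):
    # Recover the sorted bit positions from the packed words
    out = []
    for w_idx, w in enumerate(words):
        u = w & ((1 << 64) - 1)  # back to unsigned
        for b in range(64):
            if (u >> b) & 1:
                out.append(w_idx * 64 + b)
    return out

print(unbits(to_bits([1, 4, 2, 3])))  # [1, 2, 3, 4]
```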
Function: unhex
unhex(expr) – Converts hexadecimal `expr` to binary.
Class: org.apache.spark.sql.catalyst.expressions.Unhex
Function: union_hyperloglog
union_hyperloglog(x) – Merges multiple hyperloglogs together.
Class: brickhouse.udf.hll.UnionHyperLogLogUDAF
Function: union_map
union_map(x) – Returns a map which contains the union of an aggregation of maps
Class: brickhouse.udf.collect.UnionUDAF
Function: union_max
union_max(x, n) – Returns a map of the union of maps with max N elements in the aggregation group
Class: brickhouse.udf.collect.UnionMaxUDAF
Function: union_sketch
union_sketch(x) – Constructs a sketch set to estimate reach for large values by collecting multiple sketches
Class: brickhouse.udf.sketch.UnionSketchSetUDAF
Function: union_vector_sum
union_vector_sum(x) – Aggregate adding vectors together
Class: brickhouse.udf.timeseries.VectorUnionSumUDAF
Function: unix_timestamp
unix_timestamp([timeExp[, format]]) – Returns the UNIX timestamp of current or specified time.
Class: org.apache.spark.sql.catalyst.expressions.UnixTimestamp
Function: upper
upper(str) – Returns `str` with all characters changed to uppercase.
Class: org.apache.spark.sql.catalyst.expressions.Upper
Function: uuid
uuid() – Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.
Class: org.apache.spark.sql.catalyst.expressions.Uuid
Function: var_pop
var_pop(expr) – Returns the population variance calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.VariancePop
Function: var_samp
var_samp(expr) – Returns the sample variance calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.VarianceSamp
Function: variance
variance(expr) – Returns the sample variance calculated from values of a group.
Class: org.apache.spark.sql.catalyst.expressions.aggregate.VarianceSamp
Function: vector_add
Class: brickhouse.udf.timeseries.VectorAddUDF
Function: vector_cross_product
Multiplies a vector by another vector
Class: brickhouse.udf.timeseries.VectorCrossProductUDF
Function: vector_dot
vector_dot(array x, array y) – Performs vector dot product.
SELECT vector_dot(array(1.0,2.0,3.0),array(2.0,3.0,4.0));
20
SELECT vector_dot(array(1.0,2.0,3.0),2);
[2.0,4.0,6.0]
Class: hivemall.tools.vector.VectorDotUDF
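Both behaviors shown above (vector-by-vector dot product, and element-wise scaling when the second argument is a scalar) can be sketched in Python:

```python
def vector_dot(x, y):
    # Scalar second argument: element-wise multiplication
    if isinstance(y, (int, float)):
        return [xi * y for xi in x]
    # Vector second argument: classic dot product
    return sum(xi * yi for xi, yi in zip(x, y))

print(vector_dot([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 20.0
print(vector_dot([1.0, 2.0, 3.0], 2))                # [2.0, 4.0, 6.0]
```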
Function: vector_dot_product
Returns the dot product of two vectors
Class: brickhouse.udf.timeseries.VectorDotProductUDF
Function: vector_magnitude
Class: brickhouse.udf.timeseries.VectorMagnitudeUDF
Function: vector_scalar_mult
Multiplies a vector by a scalar
Class: brickhouse.udf.timeseries.VectorMultUDF
Function: vectorize_features
vectorize_features(array featureNames, feature1, feature2, .. [, const string options]) – Returns a feature vector array
Class: hivemall.ftvec.trans.VectorizeFeaturesUDF
Function: voted_avg
voted_avg(double value) – Returns an averaged value by bagging for classification
Class: hivemall.ensemble.bagging.VotedAvgUDAF
Function: weekday
weekday(date) – Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, …, 6 = Sunday).
Class: org.apache.spark.sql.catalyst.expressions.WeekDay
Function: weekofyear
weekofyear(date) – Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.
Class: org.apache.spark.sql.catalyst.expressions.WeekOfYear
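The rule described (weeks start on Monday; week 1 is the first week with more than 3 days in the new year) is ISO-8601 week numbering, so Python's `date.isocalendar` can cross-check results:

```python
from datetime import date

# ISO week number: 2008-02-20 falls in week 8 of 2008
print(date(2008, 2, 20).isocalendar()[1])  # 8
```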
Function: weight_voted_avg
weight_voted_avg(expr) – Returns an averaged value by considering sum of positive/negative weights
Class: hivemall.ensemble.bagging.WeightVotedAvgUDAF
Function: when
CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END – When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.
Class: org.apache.spark.sql.catalyst.expressions.CaseWhen
Function: window
Class: org.apache.spark.sql.catalyst.expressions.TimeWindow
Function: word_ngrams
word_ngrams(array words, int minSize, int maxSize) – Returns a list of n-grams for the given words, where `minSize <= n <= maxSize`
SELECT word_ngrams(tokenize('Machine learning is fun!', true), 1, 2);
["machine","machine learning","learning","learning is","is","is fun","fun"]
Class: hivemall.tools.text.WordNgramsUDF
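The output order above (each start position emits its n-grams from smallest to largest before moving on) can be sketched in Python:

```python
def word_ngrams(words, min_size, max_size):
    # For each start position, emit every n-gram with min_size <= n <= max_size
    out = []
    for i in range(len(words)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(words):
                out.append(" ".join(words[i:i + n]))
    return out

tokens = ["machine", "learning", "is", "fun"]
print(word_ngrams(tokens, 1, 2))
# ['machine', 'machine learning', 'learning', 'learning is', 'is', 'is fun', 'fun']
```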
Function: write_to_graphite
Writes a metric or collection of metrics to Graphite.
write_to_graphite(String hostname, int port, Map nameToValue, Long timestampInSeconds)
write_to_graphite(String hostname, int port, Map nameToValue)
write_to_graphite(String hostname, int port, String metricName, Double metricValue, Long timestampInSeconds)
write_to_graphite(String hostname, int port, String metricName, Double metricValue)
Class: brickhouse.udf.sanity.WriteToGraphiteUDF
Function: write_to_tsdb
This function writes metrics to the TSDB (metric names should look like proc.loadavg.1min or http.hits, while the tags string is a space-separated collection of tags). On failure returns 'WRITE_FAILED', otherwise 'WRITE_OK'.
write_to_tsdb(String hostname, int port, Map nameToValue, String tags, Long timestampInSeconds)
write_to_tsdb(String hostname, int port, Map nameToValue, String tags)
write_to_tsdb(String hostname, int port, Map nameToValue)
write_to_tsdb(String hostname, int port, String metricName, Double metricValue, String tags, Long timestampInSeconds)
write_to_tsdb(String hostname, int port, String metricName, Double metricValue, String tags)
write_to_tsdb(String hostname, int port, String metricName, Double metricValue)
Class: brickhouse.udf.sanity.WriteToTSDBUDF
Function: x_rank
x_rank(KEY) – Generates a pseudo sequence number starting from 1 for each key
Class: hivemall.tools.RankSequenceUDF
Function: xpath
xpath(xml, xpath) – Returns a string array of values within the nodes of xml that match the XPath expression.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathList
Function: xpath_boolean
xpath_boolean(xml, xpath) – Returns true if the XPath expression evaluates to true, or if a matching node is found.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathBoolean
Function: xpath_double
xpath_double(xml, xpath) – Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
Function: xpath_float
xpath_float(xml, xpath) – Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathFloat
Function: xpath_int
xpath_int(xml, xpath) – Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathInt
Function: xpath_long
xpath_long(xml, xpath) – Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathLong
Function: xpath_number
xpath_number(xml, xpath) – Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
Function: xpath_short
xpath_short(xml, xpath) – Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathShort
Function: xpath_string
xpath_string(xml, xpath) – Returns the text contents of the first xml node that matches the XPath expression.
Class: org.apache.spark.sql.catalyst.expressions.xml.XPathString
Function: year
year(date) – Returns the year component of the date/timestamp.
Class: org.apache.spark.sql.catalyst.expressions.Year
Function: zip_with
zip_with(left, right, func) – Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.
Class: org.apache.spark.sql.catalyst.expressions.ZipWith
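The null-padding behavior of zip_with has a direct Python analogue in `itertools.zip_longest`, shown here as a sketch:

```python
from itertools import zip_longest

def zip_with(left, right, func):
    # The shorter array is padded with None before func is applied
    return [func(a, b) for a, b in zip_longest(left, right)]

print(zip_with([1, 2, 3], [10, 20], lambda a, b: (a, b)))  # [(1, 10), (2, 20), (3, None)]
```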
Function: zscore
zscore(value, mean, stddev) – Returns a standard score (zscore)
Class: hivemall.ftvec.scaling.ZScoreUDF
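The standard-score formula is simply (value - mean) / stddev; a one-line Python sketch:

```python
def zscore(value, mean, stddev):
    # Number of standard deviations that value lies from the mean
    return (value - mean) / stddev

print(zscore(70.0, 60.0, 5.0))  # 2.0
```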
Function: |
expr1 | expr2 – Returns the result of bitwise OR of `expr1` and `expr2`.
Class: org.apache.spark.sql.catalyst.expressions.BitwiseOr
Function: ~
~ expr – Returns the result of bitwise NOT of `expr`.
Class: org.apache.spark.sql.catalyst.expressions.BitwiseNot