onehot_encoding(PRIMITIVE feature, ...) – Computes a one-hot encoded label for each feature
WITH mapping as (
  select
    m.f1, m.f2
  from (
    select onehot_encoding(species, category) m
    from test
  ) tmp
)
select
  array(m.f1[t.species], m.f2[t.category], feature('count', count)) as sparse_features
from
  test t
  CROSS JOIN mapping m;
["2","8","count:9"]
["5","8","count:10"]
["1","6","count:101"]
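To make the shape of the result easier to follow, here is a minimal Python sketch that emulates the idea outside of Hive. It is not the Hivemall implementation. The sample output above suggests that each feature column gets its own value-to-index map (f1, f2) and that indices continue from one column to the next; the sketch assumes that, and additionally assumes 1-based indices assigned in sorted order. The helper names `onehot_mapping` and `feature` and the sample rows are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def onehot_mapping(rows: Sequence[Tuple], n_features: int) -> List[Dict[object, int]]:
    """Build one value -> index map per feature column.

    Assumption: indices start at 1 and keep counting across columns,
    so no two (column, value) pairs share an index.
    """
    # Collect the distinct non-null values seen in each column.
    distinct = [set() for _ in range(n_features)]
    for row in rows:
        for i in range(n_features):
            if row[i] is not None:
                distinct[i].add(row[i])

    mappings: List[Dict[object, int]] = []
    next_index = 1
    for values in distinct:
        mapping: Dict[object, int] = {}
        # Assumption: sorted order, chosen only to make the sketch deterministic.
        for value in sorted(values):
            mapping[value] = next_index
            next_index += 1
        mappings.append(mapping)
    return mappings


def feature(name: str, value: object) -> str:
    """Format a quantitative feature as 'name:value', as in the query above."""
    return f"{name}:{value}"


# Hypothetical rows of (species, category, count).
rows = [
    ("cat", "mammal", 9),
    ("dog", "mammal", 10),
    ("wasp", "insect", 101),
]

# Step 1: aggregate over the whole table to get the mappings (the WITH clause).
f1, f2 = onehot_mapping([(species, category) for species, category, _ in rows], 2)

# Step 2: look up each row's values in the mappings (the CROSS JOIN step).
sparse_features = [
    [str(f1[species]), str(f2[category]), feature("count", count)]
    for species, category, count in rows
]
```

The two steps mirror the query: the aggregate runs once over the table to build the mappings, and every row is then joined against that single mapping row to look up its indices.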
Platforms: WhereOS, Spark, Hive
Class: hivemall.ftvec.trans.OnehotEncodingUDAF
More functions can be added to WhereOS via Python or R bindings, or as Java and Scala UDF (user-defined function), UDAF (user-defined aggregation function), and UDTF (user-defined table-generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.