
minkowski_distance


minkowski_distance(list x, list y, double p) - Returns sum(|x - y|^p)^(1/p)
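Written out in full, the formula above can be read as follows. This is a sketch of the standard Minkowski distance, assuming x_i and y_i denote the values of feature i in the two lists (the index i is introduced here for illustration and does not appear in the original signature):

```latex
d_p(x, y) = \left( \sum_{i} \lvert x_i - y_i \rvert^{p} \right)^{1/p}

% Special cases:
%   p = 1:  d_1(x, y) = \sum_{i} \lvert x_i - y_i \rvert                     (Manhattan distance)
%   p = 2:  d_2(x, y) = \sqrt{ \sum_{i} \left( x_i - y_i \right)^{2} }       (Euclidean distance)
```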

WITH docs as (
  select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
  union all
  select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
  union all
  select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
  l.docid as doc1,
  r.docid as doc2,
  minkowski_distance(l.features, r.features, 1) as distance1, -- p=1 (manhattan_distance)
  minkowski_distance(l.features, r.features, 2) as distance2, -- p=2 (euclid_distance)
  minkowski_distance(l.features, r.features, 3) as distance3, -- p=3
  manhattan_distance(l.features, r.features) as manhattan_distance,
  euclid_distance(l.features, r.features) as euclid_distance
from
  docs l
  CROSS JOIN docs r
where
  l.docid != r.docid
order by
  doc1 asc,
  distance1 asc;

doc1  doc2  distance1  distance2  distance3  manhattan_distance  euclid_distance
1     2     4.0        2.4494898  2.1544347  4.0                 2.4494898
1     3     5.0        2.6457512  2.2239802  5.0                 2.6457512
2     3     1.0        1.0        1.0        1.0                 1.0
2     1     4.0        2.4494898  2.1544347  4.0                 2.4494898
3     2     1.0        1.0        1.0        1.0                 1.0
3     1     5.0        2.6457512  2.2239802  5.0                 2.6457512
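The distances in the table can be checked by hand with a few lines of Python. The sketch below is not the Hivemall implementation. It assumes that each feature is a "name:value" string and that a feature missing from one list counts as 0. The helper name parse_features is hypothetical.

```python
def parse_features(features):
    """Turn ['apple:1.0', ...] into {'apple': 1.0, ...}."""
    parsed = {}
    for item in features:
        # Split on the last ':' so the part after it is the numeric value.
        name, _, value = item.rpartition(":")
        parsed[name] = float(value)
    return parsed


def minkowski_distance(x, y, p):
    """Return (sum_i |x_i - y_i|^p)^(1/p) over the union of feature names."""
    fx, fy = parse_features(x), parse_features(y)
    # Assumption: a feature absent from one side is treated as 0.0.
    total = sum(
        abs(fx.get(name, 0.0) - fy.get(name, 0.0)) ** p
        for name in fx.keys() | fy.keys()
    )
    return total ** (1.0 / p)


# Same three documents as in the query above.
docs = {
    1: ["apple:1.0", "orange:2.0", "banana:1.0", "kuwi:0"],
    2: ["apple:1.0", "orange:0", "banana:2.0", "kuwi:1.0"],
    3: ["apple:2.0", "orange:0", "banana:2.0", "kuwi:1.0"],
}

for a, b in [(1, 2), (1, 3), (2, 3)]:
    distances = [minkowski_distance(docs[a], docs[b], p) for p in (1, 2, 3)]
    print(a, b, distances)
```

For documents 1 and 2 the per-feature differences are 0, 2, 1 and 1, so p=1 gives 0 + 2 + 1 + 1 = 4, p=2 gives the square root of 6, and p=3 gives the cube root of 10. These agree with the first row of the table to the precision shown there.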

Platforms: WhereOS, Spark, Hive
Class: hivemall.knn.distance.MinkowskiDistanceUDF

More functions can be added to WhereOS via Python or R bindings, or as Java and Scala extensions: UDFs (user-defined functions), UDAFs (user-defined aggregation functions) and UDTFs (user-defined table-generating functions). Custom libraries can be added via the Settings page or installed from the WhereOS Store.
