hive常用需求写法

会一直更新 值得收藏

1. hive中划分年龄段函数用法

第一种 case when

case
                    when age >= 0 and age <= 10 then '0-10'
                    when age > 10 and age <= 20 then '10-20'
                    when age > 20 and age <= 30 then '20-30'
                    when age > 30 and age <= 40 then '30-40'
                    when age > 40 and age <= 50 then '40-50'
                    when age > 50 and age <= 60 then '50-60'
                    when age > 60 and age <= 70 then '60-70'
                    end age_phase

第二种 数学函数拼接

select concat(floor(20 / 10) * 10, '-', (floor(20 / 10) + 1) * 10) as x;

2. 求连续登录的活跃用户

方法一:窗口函数按照用户id分组 日期减去出现的次数 相同日期则为连续登录 最后 hiving 筛选 >= 为连续登录两天的用户

select user_id,
                   ds,
                   max(age) as age,
                   count(2) as cnt
            from (select *,
                         date_sub(t2.dt, rn) as ds
                  from (select t1.dt,
                               t1.user_id,
                               t1.age,
                               row_number() over ( partition by t1.user_id order by t1.dt asc ) as rn
                        from (select distinct dt, user_id, age
                              from test5) t1) t2) t3
            group by ds, user_id
            having cnt >= 2