flume-user mailing list archives

From "Pritchard, Charles X. -ND" <Charles.X.Pritchard....@disney.com>
Subject Re: Converting array<tinyint> in Flume avro default output to string in Hive
Date Thu, 14 Nov 2013 18:44:37 GMT
It’s a common case, since the Event schema listed here is the generic avro_event object used
when serializing to HDFS.

We had someone simply change Event from body[byte[]] to body[String] when serializing, which
has the unfortunate side effect of altering the data if it’s not UTF-8.
It did, however, solve the Hive issue quickly.
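
To illustrate that side effect, here is a minimal sketch in plain Java (no Flume or Hive classes). The class and method names are hypothetical, chosen only for this example. It checks whether a body survives being decoded as a UTF-8 String and encoded back to bytes; Java's decoder substitutes a replacement character for malformed input, so arbitrary binary data does not come back unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Flume: shows why body[String] can alter data.
final class Utf8RoundTrip {

    // Returns true when decoding as UTF-8 and re-encoding reproduces the input bytes.
    static boolean survivesRoundTrip(byte[] body) {
        String asText = new String(body, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(body, back);
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        // 0xC3 starts a two-byte sequence, but 0x28 is not a valid continuation byte.
        byte[] binary = {(byte) 0xC3, (byte) 0x28};

        System.out.println(survivesRoundTrip(text));   // prints true
        System.out.println(survivesRoundTrip(binary)); // prints false
    }
}
```

So the String shortcut is safe only when every event body is known to be valid UTF-8.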


On Nov 14, 2013, at 10:22 AM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

Concat support is there, but only for string datatypes, not for tinyints. I'm not sure it's
such a common use case.
If you want to build it, you can contribute it back to Hive.

On Thu, Nov 14, 2013 at 11:48 PM, Deepak Subhramanian <deepak.subhramanian@gmail.com> wrote:
Thanks Nitin. A UDF is a good solution. I was wondering if there was built-in support in Hive,
since it is the default Flume format for the Flume Avro sink.

Thanks, Deepak

On Wed, Nov 13, 2013 at 1:15 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
Sorry, hit send too soon.

Correction: rather than just changing your table definition.

On Wed, Nov 13, 2013 at 6:45 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
I'm not really sure there is a direct way to concat anything other than strings in Hive, short
of typecasting them to string.

So you may want to keep the datatype of the array elements as string and try that. Otherwise,
you may want to build your own UDF to do it, which looks like a more elegant way than just
typecasting it.
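
As a rough sketch of what the core of such a UDF might do, the plain Java below turns a list of bytes (how array<tinyint> values would typically reach Java code) into a String. The class and method names are hypothetical, and the Hive wiring is deliberately left out: a real UDF would wrap this logic in Hive's own UDF class and be registered in the session before use. The sketch assumes the body is UTF-8 text and that the list contains no null elements.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Hive class: the conversion a custom UDF could wrap.
final class TinyintArrayToString {

    // Copies the boxed bytes into a byte[] and decodes them as UTF-8 (an assumption
    // about the data; bytes that are not valid UTF-8 are replaced during decoding).
    static String decode(List<Byte> body) {
        if (body == null) {
            return null; // propagate NULL, as a SQL function normally would
        }
        byte[] raw = new byte[body.size()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = body.get(i); // assumes no null elements in the array
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Byte> body = Arrays.asList((byte) 'h', (byte) 'i');
        System.out.println(decode(body)); // prints hi
    }
}
```

Keeping the conversion in a small static method like this makes it easy to test outside Hive before packaging it as a UDF.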

On Wed, Nov 13, 2013 at 5:18 PM, Deepak Subhramanian <deepak.subhramanian@gmail.com> wrote:


Has anyone tried reading the default Avro output from Flume in Hive?

I am using Flume to generate events in the default Flume Avro output format. Bytes in the Avro
schema are stored as array<tinyint> in Hive when I use the AvroSerDe. How do I
convert array<tinyint> to string to read the Flume body data? I am using hive version

CREATE EXTERNAL TABLE flume_avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testlogs/2013/11/08/17'
TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}');

describe flume_avro_test;
headers map<string,string> from deserializer
body array<tinyint> from deserializer

