flume-user mailing list archives

From Rickard Cardell <rickard.card...@klarna.com>
Subject Re: Append existing Avro file - HDFS Sink
Date Fri, 12 Oct 2018 07:20:28 GMT
On Fri, 20 Apr 2018 at 20:49, Nitin Kumar <nitin.kumar2512@gmail.com> wrote:

> Hi All,
> I am using Flume v1.8, in which the Flume agent comprises a Kafka Channel and
> an HDFS Sink.
> I am able to write data as Avro files on HDFS into an external Hive table,
> but the problem is that whenever Flume gets restarted it closes the current
> file and opens a new one, which leaves me with many small files. (Data is
> partitioned by date.)
> Can't Flume append to an existing file to avoid creating a new one?
No, not with the HDFS sink, at least.
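That said, the HDFS sink's roll settings can reduce how many files get created while the agent is running (restarts will still close files). A minimal sketch, assuming an agent named `a1` with a sink named `k1`:

```properties
# Roll to a new file only when it reaches ~128 MB,
# instead of on a timer or an event count.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
```

Setting `rollInterval` and `rollCount` to 0 disables time- and count-based rolling, leaving only the size threshold.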

> Also, how can I solve this problem, which leads to the creation of too many
> small files?

We also used the HDFS sink, but because of the high maintenance we switched to
the HBase sink instead, which also gave us deduplication. The major drawback is
that it requires an extra step: an HBase-to-HDFS export job.

Your many-small-files problem might be solved with an extra step, e.g. an Oozie
job that merges the smaller files into larger ones.
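Since the data is already in an external Hive table, one common compaction pattern is to rewrite a partition onto itself, letting Hive's merge settings produce fewer, larger files. A hedged sketch only; the table name `events`, its columns, and the partition value are hypothetical:

```sql
-- Encourage Hive to merge small output files into larger ones.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;

-- Rewrite one day's partition in place (hypothetical schema).
INSERT OVERWRITE TABLE events PARTITION (dt='2018-04-20')
SELECT col1, col2, col3
FROM events
WHERE dt='2018-04-20';
```

For plain Avro files outside Hive, the `avro-tools concat` command can also merge files that share a schema.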

That would also take care of the leftover temp files that Flume doesn't clean
up in some circumstances.


> Any help would be appreciated.
> --
> *Regards, Nitin Kumar*
