flume-user mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: How to convert *.bz2.tmp to *.bz2 file after restarting the instance
Date Thu, 13 Nov 2014 00:24:56 GMT
Depending on your configuration, each batch is probably written as its own
bzip2 stream, and these streams are effectively concatenated together into a
single file. So Hive should (hopefully) be reading all of them except the
last (partial) batch, which is OK to throw away because Flume will retry it
when it comes back up. If Hive doesn't support that, maybe you should try
writing in a format other than compressed text -- possibly compressed Avro
or compressed SequenceFile (both of these formats support compression
internally and are handled well by most tools).
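
To illustrate the concatenated-stream point, here is a minimal Python sketch. It assumes the file really is a series of back-to-back bzip2 streams with a possibly truncated final one, which I have not verified for every sink configuration. The function name read_complete_streams is made up for this example and is not part of Flume or Hive.

```python
import bz2


def read_complete_streams(data: bytes) -> bytes:
    """Decompress every complete bzip2 stream in `data`.

    A trailing stream that is truncated (for example, a batch that was
    being written when the agent died) is discarded.
    """
    pieces = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        try:
            chunk = decompressor.decompress(data)
        except OSError:
            # Remaining bytes are not a valid bzip2 stream; stop here.
            break
        if not decompressor.eof:
            # End-of-stream marker never seen: partial batch, drop it.
            break
        pieces.append(chunk)
        # Bytes after the end of this stream belong to the next one.
        data = decompressor.unused_data
    return b"".join(pieces)


if __name__ == "__main__":
    batches = [b"event 1\nevent 2\n", b"event 3\n"]
    blob = b"".join(bz2.compress(b) for b in batches)
    # Simulate a crash in the middle of a third batch.
    blob += bz2.compress(b"event 4\n")[:-5]
    print(read_complete_streams(blob).decode())
```

Running something like this against a local copy of the .tmp file would show how much of it is recoverable before you decide what to do with it.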

Regarding the .tmp file, you should manually rename it to a non-tmp name
after a server crash or ungraceful shutdown (or set up a cron job to look
for old ones). Flume doesn't currently try to remember the .tmp files
it previously wrote to and try to rename or continue them.
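
A rough sketch of what such a cron job could do, in Python. The helper names (plan_tmp_renames, hdfs_mv_command), the example paths and the one-hour age threshold are all hypothetical; how you list files and their modification times is left out, and the only external command assumed is the standard "hdfs dfs -mv".

```python
import time

TMP_SUFFIX = ".tmp"  # assumes the sink's in-use suffix is ".tmp"


def plan_tmp_renames(files, min_age_seconds, now=None):
    """Return (src, dst) pairs for .tmp files older than min_age_seconds.

    `files` is an iterable of (path, mtime) pairs, with mtime in seconds
    since the epoch. The age check is there so that a file Flume is still
    writing to is left alone.
    """
    now = time.time() if now is None else now
    plan = []
    for path, mtime in files:
        if path.endswith(TMP_SUFFIX) and now - mtime >= min_age_seconds:
            plan.append((path, path[: -len(TMP_SUFFIX)]))
    return plan


def hdfs_mv_command(src, dst):
    """Build the argument list for renaming one file with `hdfs dfs -mv`."""
    return ["hdfs", "dfs", "-mv", src, dst]


if __name__ == "__main__":
    # Hypothetical listing: one stale file, one still being written.
    now = time.time()
    listing = [
        ("/flume/events/log.1415750000.bz2.tmp", now - 7200),
        ("/flume/events/log.1415757000.bz2.tmp", now - 30),
    ]
    for src, dst in plan_tmp_renames(listing, min_age_seconds=3600, now=now):
        # Print instead of executing; pass the list to subprocess.run
        # once you are happy with the plan.
        print(" ".join(hdfs_mv_command(src, dst)))
```

Pick an age threshold comfortably larger than your roll interval so the job never touches a file that a live agent still has open.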

Mike

On Tue, Nov 11, 2014 at 3:35 PM, Arun Gujjar <arungujjartest@yahoo.com>
wrote:

> Hi,
>
>
> Whenever we restart the Flume agent, it creates a new HDFS file and starts
> writing data into that file. The earlier file is
> left as *.bz2.tmp, and from Hive queries we found that we were not
> able to read the data from this file.
> Here are the two questions I have:
> 1. Could you please suggest how we can convert this bz2.tmp file to a bz2
> file? Today we lose the data that is present in the bz2.tmp file.
> 2. Is there a way to configure Flume to keep writing data into the
> existing bz2.tmp file instead of creating a new file?
>
> Can someone please answer this?
>
> Regards
> Arun
>
>
>
