flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Transfering compressed (gzip) files
Date Mon, 22 Oct 2012 18:39:10 GMT
   Flume is designed to transfer a continuous stream of events into Hadoop.
It appears that in your use case each gzip file is a collection of events
that needs to be moved. The closest way I can see Flume supporting
your use case is through the spooling directory source
... which has not yet been released.
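
As a rough sketch of what that could look like once the spooling directory
source ships, a minimal agent config might resemble the following (agent,
channel, and path names are illustrative, not from this thread; note that
the spooling source reads files as plain-text lines, so gzipped inputs
would still need to be decompressed before landing in the spool directory,
while the HDFS sink can re-compress on write):

```properties
# Hypothetical single-agent layout: spooldir source -> file channel -> HDFS sink
agent.sources = spool
agent.channels = fileCh
agent.sinks = hdfsSink

# Watch a local directory for completed files (names/paths are assumptions)
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/log/incoming
agent.sources.spool.channels = fileCh

# Durable file-backed channel
agent.channels.fileCh.type = file

# Write to HDFS, gzip-compressed on the way out
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileCh
agent.sinks.hdfsSink.hdfs.path = /flume/events
agent.sinks.hdfsSink.hdfs.codeC = gzip
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
```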

On Mon, Oct 22, 2012 at 11:14 AM, Sadananda Hegde <saduhegde@gmail.com> wrote:

> Hi Harish,
> I am still exploring my options and that's part of my question too - which
> source should I be using.
> Currently I have set up my Flume NG configuration to use an exec source
> (exec source, file channel and HDFS sink), but I can change to a
> different source if it handles compressed files.
> Thanks,
> Sadu
> On Mon, Oct 22, 2012 at 10:27 AM, Harish Mandala <mvharish14988@gmail.com> wrote:
>> Hi,
>> Which of the Flume sources are you trying to use?
>> Regards,
>> Harish
>> On Mon, Oct 22, 2012 at 11:18 AM, Sadananda Hegde <saduhegde@gmail.com> wrote:
>>> My application servers produce data files that are in compressed format
>>> (gzip). I am planning to use Flume NG (1.2.0) to collect those files and
>>> transfer them to a Hadoop cluster (write to HDFS). Is it possible to read and
>>> transfer them without uncompressing first? My sink would be HDFS, and there
>>> are options to compress before writing to HDFS. That would work fine if my
>>> source were an uncompressed text file and I needed to store the HDFS file in
>>> compressed format. But in my case, the source itself is compressed. What would
>>> be the best options to handle such cases?
>>> Thanks for your help.
>>> Sadu
