flume-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Flume for multi KB or MB docs?
Date Tue, 16 Oct 2012 03:14:52 GMT
Hi Mike,

Thanks for the info!  Our docs, however, are not quite 100MB - more like 5MB max and most
of the time under 10KB.  Would you still say Flume is not the right tool for the job?  If
so, what is the main concern?  Is it about the number of documents Flume will keep in memory
at any one time and thus require a potentially large heap and still risk OOMing?  Or is the
main concern that writing such "large" documents to disk will be slow?
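To put rough numbers on the heap question, here is a back-of-envelope sketch. It assumes an in-memory channel that can buffer up to a fixed number of events; the capacity of 10,000 is a made-up figure for illustration, not a Flume default or a recommendation. The estimate counts payload bytes only and ignores headers and JVM object overhead.

```python
# Back-of-envelope estimate only. The capacity below is a hypothetical
# setting chosen for illustration, not a measured or recommended value.

KB = 1024
MB = 1024 * KB


def worst_case_channel_bytes(capacity_events, event_bytes):
    """Upper bound on payload bytes held if the channel fills with events of this size."""
    return capacity_events * event_bytes


capacity = 10_000  # hypothetical channel capacity, in events

typical = worst_case_channel_bytes(capacity, 10 * KB)  # every doc is 10KB
largest = worst_case_channel_bytes(capacity, 5 * MB)   # every doc is 5MB

print(f"all 10KB docs: {typical / MB:.0f} MB")
print(f"all 5MB docs:  {largest / MB:.0f} MB")
```

Under these assumptions the typical case stays around 100MB, while a channel full of 5MB documents would need tens of GB. So the answer likely depends on how many of the large documents can queue up at once.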

My documents need to end up in Solr or ElasticSearch and maybe also in HDFS, so I was hoping
I could get ES and HDFS sinks from Flume for free.

Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 

> From: Mike Percy <mpercy@apache.org>
>To: user@flume.apache.org; Otis Gospodnetic <otis_gospodnetic@yahoo.com> 
>Sent: Monday, October 15, 2012 6:15 PM
>Subject: Re: Flume for multi KB or MB docs?
>Hi Otis,
>Flume was designed as a streaming event transport system, not as a general purpose file
transfer system. The two have quite different characteristics, so while binary files could
be transported by Flume, if you tried to transport a 100MB PDF as a single event you may have
issues around memory allocation, GC, transfer speed, etc., since we hold at least one event
at a time in memory. However if you want to transfer a large log file and each line is an
event then it's a perfect use case because you care about the individual events more than
the file itself.
>For transferring very large binary files that are not events or records, you may want
to look for something that is good at being a single-hop system with resume capability, like
rsync, to transfer the files. Then I suppose you could use the hadoop fs shell and a small
script to store the data onto HDFS. You probably wouldn't need all the fancy tagging, routing,
and serialization features that Flume has.
>Hope this helps.
>On Sun, Oct 14, 2012 at 5:49 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>>We're considering using Flume for transport of potentially large "documents" (think
documents that can be as small as tweets or as large as PDF files).
>>I'm wondering if Flume is suitable for transporting potentially large documents (in
the most reliable mode, too) or if there is something inherent in Flume that makes it a poor
choice for this use case?
>>Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 
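For the very large binary file case in the quoted reply, the rsync-then-hadoop-fs approach could look roughly like the sketch below. The host name, paths, and function names are hypothetical. It assumes rsync and the hadoop client are both installed on the machine running the script, and that the target files do not already exist in HDFS.

```python
import subprocess
from pathlib import Path


def rsync_command(remote_src, staging_dir):
    # -a preserves file attributes; --partial keeps partially transferred
    # files so that a rerun does not have to start them from scratch.
    return ["rsync", "-a", "--partial", remote_src, str(staging_dir)]


def hdfs_put_command(local_file, hdfs_dir):
    # "hadoop fs -put <localsrc> <dst>" copies a local file into HDFS.
    return ["hadoop", "fs", "-put", str(local_file), hdfs_dir]


def transfer(remote_src, staging_dir, hdfs_dir, run=subprocess.check_call):
    """Pull files into a local staging directory, then copy each one to HDFS."""
    staging = Path(staging_dir)
    run(rsync_command(remote_src, staging))
    for path in sorted(p for p in staging.iterdir() if p.is_file()):
        run(hdfs_put_command(path, hdfs_dir))
```

A call might look like transfer("dochost:/export/docs/", "/var/staging/docs", "/data/docs"), where all three values are placeholders. The run parameter exists only so the commands can be inspected or logged without executing them.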