flume-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: Flume with Kafka , Architecture.
Date Tue, 17 Feb 2015 20:29:52 GMT
I like the first option (Kafka + Flume cluster to HDFS cluster).

Flume doesn't actually benefit much from being local to HDFS, and as you
noticed - it may take resources from Spark and Impala.

Flume can live on the same nodes as Kafka, especially if you are using it with
the Kafka channel - Kafka can be a bit sensitive to serious memory or disk

Hope this helps.
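
For illustration, here is a minimal sketch of what an agent for the first option could look like when it uses the Kafka channel and an HDFS sink that writes to the remote cluster. Everything specific in it is an assumption rather than something from this thread: the agent name `a1`, the broker and NameNode host names, the topic `events`, the consumer group, the HDFS path, and the roll and compression settings. The property names follow the Flume 1.7+ user guide; older releases use different Kafka channel keys (for example `brokerList` and `zookeeperConnect`), so check the guide for your version.

```properties
# Hypothetical agent "a1"; hosts, topic, group, and paths are placeholders.
a1.channels = kc
a1.sinks = hdfsSink

# Kafka channel: the sink reads events straight from a Kafka topic,
# so this agent needs no source.
a1.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.kc.kafka.bootstrap.servers = kafka1:9092,kafka2:9092
a1.channels.kc.kafka.topic = events
a1.channels.kc.kafka.consumer.group.id = flume-hdfs
# Assumes the topic is written by ordinary Kafka producers,
# not by a Flume source, so messages are not wrapped as Flume events.
a1.channels.kc.parseAsFlumeEvent = false

# HDFS sink writing over the network to the HDFS cluster (Cluster2).
a1.sinks.hdfsSink.type = hdfs
a1.sinks.hdfsSink.channel = kc
a1.sinks.hdfsSink.hdfs.path = hdfs://namenode.cluster2:8020/flume/events/%Y-%m-%d
# Needed for the %Y-%m-%d escapes because plain Kafka messages
# carry no timestamp header.
a1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
# Compress output in the sink, as raised in the original question.
a1.sinks.hdfsSink.hdfs.fileType = CompressedStream
a1.sinks.hdfsSink.hdfs.codeC = snappy
# Roll files every 5 minutes or at 128 MiB, never by event count.
a1.sinks.hdfsSink.hdfs.rollInterval = 300
a1.sinks.hdfsSink.hdfs.rollSize = 134217728
a1.sinks.hdfsSink.hdfs.rollCount = 0
```

Running one such agent per Kafka node with the same consumer group should let the agents share the topic's partitions between them, which is one way to get the multiple readers asked about in the original message.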


On Tue, Feb 17, 2015 at 2:13 AM, Guillermo Ortiz <konstt2000@gmail.com> wrote:

> Hi,
> I have Kafka and the DataNodes on different machines. I
> want to use Flume to get the data from Kafka and store it in HDFS. What's
> the best architecture? I assume that all the machines have access to
> each other.
> Cluster1 (Kafka + Flume) ---> Cluster2 (HDFS)
> There is an agent on each machine where Kafka is installed, and the
> sink writes to HDFS directly. Compression options, etc. could be
> configured in the sink.
> Cluster1 (Kafka + Flume + Avro) --> Cluster2 (Flume + Avro + HDFS)
> There is an agent on each machine where Kafka is installed. Flume
> sends data to another Flume agent through Avro, and the Flume agent
> installed on the DataNode writes the data to HDFS.
> Cluster1 (Kafka) --> Cluster2 (Flume + HDFS)
> Flume is installed only on the DataNodes.
> I don't like installing Flume on the DataNodes because these machines
> run processes such as Spark, Hive, Impala, and MapReduce, which consume
> a lot of resources. On the other hand, that is where the data has
> to be sent.
> I could configure more than one source to get data from Kafka and
> more than one Flume agent to have more than one VM.
> Could someone comment on the advantages and disadvantages they see in
> each scenario?
