flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Flume with Kafka, Architecture
Date Tue, 17 Feb 2015 10:13:35 GMT
Hi,

I have Kafka and DataNodes running on different machines. I want to
use Flume to get the data from Kafka and store it in HDFS. What's the
best architecture? I assume that all the machines have access to each
other.

Cluster1 (Kafka + Flume) ---> Cluster2 (HDFS)
There is an agent on each machine where Kafka is installed, and the
sink writes to HDFS directly; compression options, etc. could be
configured in the sink (a sketch follows).
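For concreteness, a minimal sketch of what that per-broker agent could
look like, assuming a Flume release that ships the built-in Kafka
source (1.6+ or a vendor backport); the agent name tier1, the topic,
and the ZooKeeper/NameNode addresses are placeholders:

tier1.sources  = kafka-src
tier1.channels = mem-ch
tier1.sinks    = hdfs-sink

# pull events from the Kafka topic
tier1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.kafka-src.zookeeperConnect = zkhost:2181
tier1.sources.kafka-src.topic = mytopic
tier1.sources.kafka-src.channels = mem-ch

# in-memory buffer between source and sink
tier1.channels.mem-ch.type = memory
tier1.channels.mem-ch.capacity = 10000

# write compressed files straight into HDFS
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = mem-ch
tier1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
tier1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
tier1.sinks.hdfs-sink.hdfs.fileType = CompressedStream
tier1.sinks.hdfs-sink.hdfs.codeC = snappy
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300
tier1.sinks.hdfs-sink.hdfs.rollSize = 134217728
tier1.sinks.hdfs-sink.hdfs.rollCount = 0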

Cluster1 (Kafka + Flume + Avro) --> Cluster2 (Flume + Avro + HDFS)
There is an agent on each machine where Kafka is installed. Flume
sends the data to another Flume agent through Avro, and the Flume
agent installed on the DataNodes writes the data to HDFS (sketch below).
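A sketch of this two-tier variant, again with placeholder names
(collector1, port 4141): the first tier keeps the same Kafka source
and channel as above but swaps the HDFS sink for an Avro sink, and the
second tier runs on the HDFS side:

# first tier: Avro sink instead of the HDFS sink
tier1.sinks.avro-sink.type = avro
tier1.sinks.avro-sink.channel = mem-ch
tier1.sinks.avro-sink.hostname = collector1
tier1.sinks.avro-sink.port = 4141

# second tier: Avro source -> memory channel -> HDFS sink
tier2.sources  = avro-src
tier2.channels = mem-ch
tier2.sinks    = hdfs-sink

tier2.sources.avro-src.type = avro
tier2.sources.avro-src.bind = 0.0.0.0
tier2.sources.avro-src.port = 4141
tier2.sources.avro-src.channels = mem-ch

tier2.channels.mem-ch.type = memory

tier2.sinks.hdfs-sink.type = hdfs
tier2.sinks.hdfs-sink.channel = mem-ch
tier2.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
tier2.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true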

Cluster1 (Kafka) --> Cluster2 (Flume + HDFS)
Flume is installed only on the DataNodes; the agent configuration
would be essentially the same as in the first scenario, just running
on the DataNode hosts.

I don't like installing Flume on the DataNodes because these machines
run processes such as Spark, Hive, Impala and MapReduce, and they
spend a lot of resources on their tasks. On the other hand, that is
where the data has to be sent.
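If Flume does end up on the DataNodes, its footprint can at least be
bounded through the standard flume-env.sh mechanism; a sketch (the
heap size is an arbitrary example, not a recommendation):

# conf/flume-env.sh: cap the agent's JVM heap so it doesn't compete
# with Spark/Impala/MapReduce for memory
export JAVA_OPTS="-Xmx512m -Xms512m"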
I could configure more than one source to get data from Kafka, and
run more than one Flume agent to have more than one VM (a sketch of
the multi-source case follows).
Could someone comment on the advantages and disadvantages they find in
each scenario?
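On the multi-source idea, a sketch of one agent with two Kafka sources
feeding the same channel; giving both sources the same groupId (a
placeholder here) makes Kafka balance the topic's partitions across
them, and running several such agents gives several JVMs:

tier1.sources = kafka-src-1 kafka-src-2

tier1.sources.kafka-src-1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.kafka-src-1.zookeeperConnect = zkhost:2181
tier1.sources.kafka-src-1.topic = mytopic
tier1.sources.kafka-src-1.groupId = flume-hdfs
tier1.sources.kafka-src-1.channels = mem-ch

tier1.sources.kafka-src-2.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.kafka-src-2.zookeeperConnect = zkhost:2181
tier1.sources.kafka-src-2.topic = mytopic
tier1.sources.kafka-src-2.groupId = flume-hdfs
tier1.sources.kafka-src-2.channels = mem-ch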
