flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Flume perf measurements
Date Fri, 10 Apr 2015 18:44:56 GMT
This info will be on the wiki soon, but I thought I would also send it to the users list
right away, since there seem to be some threads on performance here.



Sample Flume v1.4 Measurements for reference:

Here are some sample measurements taken with a single agent and 500 byte events.

Cluster Config: 20-node Hadoop cluster (1 name node and 19 data nodes).

Machine Config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.


1.     File channel with HDFS Sink (Sequence File):

Source: 4 x Exec Source, 100k batchSize

HDFS Sink Batch size: 500,000

Channel: File

Number of data dirs: 8






Events/Sec

Sink Count   1 data dir   2 data dirs   4 data dirs   6 data dirs   8 data dirs   10 data dirs
1            14.3 k       -             -             -             -             -
2            21.9 k       -             -             -             -             -
4            -            35.8 k        -             -             -             -
8            24.8 k       43.8 k        72.5 k        77 k          78.6 k        76.6 k
10           -            -             58 k          -             -             -
12           -            -             49.3 k        49 k          -             -

I was looking for the sweet spot in perf, so I did not take measurements for all data points on the grid,
only for the ones that made sense. For example: when perf dropped after adding more sinks,
I did not take more measurements for those rows.


2.     HDFS Sink:

Channel: Memory



Events/Sec

# of HDFS Sinks   Snappy, BatchSz: 1.2 mill   Snappy, BatchSz: 1.4 mill   Sequence File, BatchSz: 1.2 mill
1                 34.3 k                      33 k                        33 k
2                 71 k                        75 k                        69 k
4                 141 k                       145 k                       141 k
8                 271 k                       273 k                       251 k
12                382 k                       380 k                       370 k
16                478 k                       538 k                       486 k



Some simple observations:

  *   Increasing the number of dataDirs helps file channel (FC) perf, even on single-disk systems
  *   Increasing the number of sinks helps
  *   Max throughput observed was about 538k events/sec for the HDFS sink, which is approx 240MB/s
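
For anyone who wants to redo the arithmetic behind these observations, here is a rough Python sketch. It assumes every event carries exactly 500 bytes of payload and ignores headers and file-format overhead, which is my simplification and not something measured here. The function names are made up for illustration. The rates are the "Snappy, BatchSz 1.4 mill" column from the memory channel table.

```python
# Assumption: each event is exactly 500 bytes of payload; headers and
# sequence-file/compression overhead are ignored.
EVENT_SIZE_BYTES = 500

# Events/sec from the memory channel runs (Snappy, batch size 1.4 million),
# keyed by number of HDFS sinks.
snappy_1_4m = {1: 33_000, 2: 75_000, 4: 145_000, 8: 273_000, 12: 380_000, 16: 538_000}


def payload_mb_per_sec(events_per_sec, event_size=EVENT_SIZE_BYTES):
    """Payload throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return events_per_sec * event_size / 1_000_000


def per_sink_rate(events_per_sec, sinks):
    """Average events/sec handled by each sink."""
    return events_per_sec / sinks


for sinks, rate in snappy_1_4m.items():
    print(f"{sinks:>2} sinks: {rate:>7} events/s, "
          f"{per_sink_rate(rate, sinks):>6.0f} events/s per sink, "
          f"{payload_mb_per_sec(rate):6.1f} MB/s")
```

Under that assumption the 16-sink peak comes out to about 269 MB/s of payload, the same ballpark as the approximate figure quoted above; the difference may be down to rounding or actual event sizes, but that is a guess. The per-sink rate stays at roughly 32k to 38k events/sec across the column, which is what near-linear scaling with sink count looks like.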
