flume-user mailing list archives

From Michal Taborsky <michal.tabor...@nrholding.com>
Subject Re: Logs used as Flume Sources and real-time analytics
Date Thu, 02 Feb 2012 14:54:42 GMT
Hello Alain,

we are using Flume for probably the same purposes. We write JSON-encoded
event data to a flat file on every application server. Since each
application server writes only maybe tens of events per second, the
performance hit of writing to disk is negligible (and the events are
written to disk only after the content is generated and sent to the user,
so there is no added latency for the end user). This file is tailed by
Flume and delivered through collectors to HDFS. The collectors also fork
the events to RabbitMQ. We have a Node.js application that picks up these
events and does some real-time analytics on them. The delay between event
origination and analytics is below 10 seconds, usually 1-3 seconds in total.
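
To illustrate the first step (one JSON-encoded event per line, appended to a
flat file that a tailing agent can follow), here is a minimal sketch. It is
not our production code: Python is used only for illustration, and the
function names, the field names (ts, type, data) and the file path are all
hypothetical.

```python
import json
import time


def append_event(path, event_type, payload):
    """Append one JSON-encoded event to a flat file as a single line.

    The record layout (ts, type, data) is an assumption made for this sketch.
    """
    record = {"ts": time.time(), "type": event_type, "data": payload}
    # json.dumps escapes newlines inside strings, so each event stays on one line.
    line = json.dumps(record, separators=(",", ":")) + "\n"
    # Append mode: existing events are never rewritten, new ones go at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
    return line


def read_events(path):
    """Parse the file back, one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In a web application, something like append_event("/var/log/app/events.log",
"page_view", {...}) would be called after the response has been sent, so the
disk write does not add to what the user waits for.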

Hope this helps.

Michal Táborský
*chief systems architect*
Netretail Holding, BV
nrholding.com <http://www.nrholding.com>

2012/2/2 Alain RODRIGUEZ <arodrime@gmail.com>

> Hi,
> I'm new to Flume and I'd like to use it to get a stable flow of data to
> my database (to be able to handle rush hours by delaying the writes to the
> database, without introducing any timeout or latency for the user).
> My questions are:
> What is the best way to create the log file that will be used as a source
> for Flume?
> Our production environment is running apache servers and php scripts.
> I can't just use the access log because some information is stored in the
> session, so I need to build a custom source.
> Another point is that writing a file seems primitive and not really
> efficient, since it writes to disk instead of to memory for every
> event I store (many events every second).
> How can I use this system (as Facebook does with Scribe) to perform
> real-time analytics?
> I'm open to hearing about HDFS, HBase or whatever could help me reach my
> goals, which are a stable flow to the database and near real-time analytics
> (seconds to minutes).
> Thanks for your help.
> Alain
