flume-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Logs used as Flume Sources and real-time analytics
Date Thu, 02 Feb 2012 14:01:36 GMT

I'm new to Flume and I'd like to use it to get a stable flow of data into
my database (to handle rush hours by deferring the database writes, without
introducing any timeout or latency for the user).

My questions are:

What is the best way to create the log file that will be used as a source
for Flume?

Our production environment runs Apache servers and PHP scripts.
I can't just use the access log because some information is stored in the
session, so I need to build a custom source.
Another point is that writing to a file seems primitive and not really
efficient, since every event I store (many events per second) is written
to disk instead of to memory.

How can this system be used (as Facebook does with Scribe) to perform
real-time analytics?

I'm open to hearing about HDFS, HBase or whatever could help me reach my
goals, which are a stable flow to the database and near real-time analytics
(seconds to minutes).

Thanks for your help.

