I'm researching Flume as a solution for web analytics.
I've read some material about it, and my idea is to use Flume to collect the logs and put them in a Cassandra database. But first I have some doubts that I want to share.
Is it a good approach to process the logs "on the fly" and insert the processed results into the database?
Or is it better to collect the logs, store them somewhere (e.g. HDFS), run scheduled jobs with Pig, and later insert the results into a database like HBase or Cassandra?
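For context on the second option, a minimal sketch of the kind of Flume agent configuration I have in mind for the collection step (the paths, host names, and agent name here are just hypothetical placeholders, not from any real setup):

```
# Hypothetical Flume agent: tail a web access log and land raw events in HDFS
agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfsSink

# Source: follow the web server's access log
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

# Channel: in-memory buffer between source and sink
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Sink: write raw logs to date-partitioned HDFS directories,
# so scheduled Pig jobs can process each day's partition later
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/logs/web/%Y-%m-%d
agent1.sinks.hdfsSink.channel = mem
```

The idea would be that Pig reads from those dated directories on a schedule and loads the aggregated results into HBase or Cassandra.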
I also found an interesting solution by Gemini (now Cloudian) called logprocessing. Has anyone used it?