flume-user mailing list archives

From Alaa Ali <contact.a...@gmail.com>
Subject Extract data using regex into HBase
Date Sun, 26 Oct 2014 18:50:09 GMT
Hello! I want to receive syslog messages, parse each one with regex into
fields (for example username, source IP, destination IP), and store those
fields in HBase, one column per field. I know how to set up the syslog
source, but how do I go about the extraction and the storing?

My thoughts:

1. Can I use a Regex Extractor Interceptor with my own serializer
implementation to extract the data into multiple headers in the event, and
then use the AsyncHBase sink's serializer to simply store the header values
into columns?

2. Or should I pass the data to the AsyncHBase sink unaltered and implement
everything in the sink's serializer?

It is worth noting that the input arrives in several different formats, so
my extraction isn't one simple regex and will probably contain a lot of
ifs. For example, the username won't always be in the same place in the log
line. Which approach is best? Is there another approach, or am I getting it
wrong?
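
To make the multi-format problem concrete, here is a rough sketch of the
extraction logic I have in mind, written in plain Java with no Flume classes.
The two patterns, the field names (username, srcIp, dstIp) and the sample
lines are all made up for illustration; my real formats are messier. I assume
a method like this could be called from either an interceptor (approach 1) or
the sink's serializer (approach 2), but I have not tried that.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogFieldExtractor {

    // Hypothetical log formats; each uses the same named groups so the
    // caller does not care which format matched.
    private static final List<Pattern> FORMATS = Arrays.asList(
        Pattern.compile("user=(?<username>\\S+) src=(?<srcIp>\\S+) dst=(?<dstIp>\\S+)"),
        Pattern.compile("from (?<srcIp>\\S+) to (?<dstIp>\\S+) by (?<username>\\S+)")
    );

    // Hypothetical field names; these would become header keys or column names.
    private static final List<String> FIELDS =
        Arrays.asList("username", "srcIp", "dstIp");

    // Tries each format in order and returns the fields of the first match,
    // or an empty map if no format matches the line.
    public static Map<String, String> extract(String line) {
        for (Pattern format : FORMATS) {
            Matcher m = format.matcher(line);
            if (m.find()) {
                Map<String, String> fields = new LinkedHashMap<>();
                for (String name : FIELDS) {
                    fields.put(name, m.group(name));
                }
                return fields;
            }
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        System.out.println(extract("sshd: user=alice src=10.0.0.1 dst=10.0.0.2"));
        System.out.println(extract("fw: from 10.0.0.3 to 10.0.0.4 by bob"));
    }
}
```

Keeping one pattern per format in a list replaces most of the ifs: adding a
new format means adding a pattern, as long as it uses the same group names.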

-
Alaa Ali
