flume-user mailing list archives

From Wolfgang Hoschek <whosc...@cloudera.com>
Subject Re: Query regarding readMultiLine in Morphlines config
Date Thu, 17 Jul 2014 06:06:02 GMT
A morphline receives one Flume event at a time. What and how much is contained in the Flume event
is up to you, but Flume isn’t really designed to send large events such as whole files or
parts of files; it’s designed to send small discrete events, such as one log line per event,
or similar.

There is no existing command that does what you want. Consider writing a custom morphline
command that reads your event and spits out whatever you want, per http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#Implementing-your-own-Custom-Command
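As a rough illustration of the grouping such a custom command might do, here is a plain-Java sketch. It assumes the N lines arrive together (for example, several lines in one event body) and does not use the Kite Command API at all; wiring it into a real command is covered by the guide linked above. The class name LineBatcher, the method groupLines, and the field names line_1 .. line_N are hypothetical, chosen to match the "20 fields, one per line" idea in the question below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, NOT part of Kite SDK: groups consecutive log lines
 * into batches of at most batchSize, producing one field map per batch
 * with keys line_1 .. line_N.
 */
public class LineBatcher {

    public static List<Map<String, String>> groupLines(List<String> lines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<String, String>> batches = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String line : lines) {
            // Field names are 1-based within each batch: line_1, line_2, ...
            current.put("line_" + (current.size() + 1), line);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        // Keep a final, partially filled batch when the line count
        // is not a multiple of batchSize.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("rec1", "rec2", "rec3", "rec4", "rec5");
        // Prints {line_1=rec1, line_2=rec2}, then {line_1=rec3, line_2=rec4}, then {line_1=rec5}
        for (Map<String, String> batch : groupLines(lines, 2)) {
            System.out.println(batch);
        }
    }
}
```

Note that because a morphline sees one event at a time, batching across separate events would require the command to keep state between calls, which this sketch deliberately leaves out.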

Having said that, the bottleneck is typically Lucene inside the Solr server, and Flume overhead
is insignificant in comparison.


On Jul 16, 2014, at 2:36 AM, Sanjay Ramanathan <sanjay.ramanathan@lucidworks.com> wrote:

> Hi,
> I have a log file with multiple records (1 line = 1 record).
> I want to send N lines (say 20) at a time to morphlines, and then send them to Solr as a single Solr document.
> (This is an experiment to see whether performance is better than the regular way of using readLine and parsing each log line as a Solr document.)
> The number of documents is going to be in the billions.
> I had a look at the readMultiLine documentation here: http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/readMultiLine
> I would like to know how to use readMultiLine effectively (if it is possible), to tell readMultiLine to pick up 20 lines/records in one go and create 20 fields with the text of each line (using a counter within the regex, or something similar).
> Kindly let me know if you have worked on something similar, or point me to some informative pages on a similar problem.
> Sincerely,
> Sanjay Ramanathan
