flume-user mailing list archives

From Christopher Shannon <cshannon...@gmail.com>
Subject Re: Event breaking in flume
Date Mon, 30 Dec 2013 16:48:42 GMT
For the company I'm with, we looked at using the spooling directory source
for multi-line and binary content, and we decided it was best to create our
own source that handled our data streams. Our data consisted of 100 MB tar
files generated by TeaLeaf and deposited in a directory every 10 seconds.
Creating a custom source will probably save you a lot of grief, because
that will give you the most control (and hence most certainty). It did for
us.


On Mon, Dec 30, 2013 at 8:17 AM, Brock Noland <brock@cloudera.com> wrote:

> Yes, it is possible to handle multi-line events, and handling stack traces
> is very commonplace.
>
> However, using the exec source is going to be limiting. The "correct"
> solution is:
>
> 1) Use spooling directory source
> 2) Write a little deserializer to handle your format.
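As a rough illustration of step 2, here is the grouping logic such a deserializer needs. In Flume itself the deserializer would be a Java class implementing `org.apache.flume.serialization.EventDeserializer`, selected through the spooling directory source's `deserializer` property. This Python sketch does not use that API. `group_events`, its `starts_event` predicate and the `max_lines` cap are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator, List


def group_events(lines: Iterable[str],
                 starts_event: Callable[[str], bool],
                 max_lines: int = 1000) -> Iterator[str]:
    """Group physical lines into logical (multi-line) events.

    A line for which starts_event(line) is true opens a new event; every
    other line is appended to the event being built. max_lines is a
    hypothetical safety cap that flushes an event which grows too large.
    Blank lines are dropped.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank separator lines
        if buffer and (starts_event(line) or len(buffer) >= max_lines):
            yield "\n".join(buffer)  # previous event is complete
            buffer = []
        buffer.append(line)
    if buffer:
        # End of input: the last event has no successor to trigger a flush.
        yield "\n".join(buffer)
```

A line-based reader cannot know an event is complete until it reads the first line of the next one. At end of file the buffer is simply flushed. This is one reason a spooling directory, whose files must be complete and unchanging once dropped in, suits multi-line events better than tailing a live file.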
>
> Another solution is:
>
> 1) Replace newlines with something like __NL__ using a Perl script in your
> exec source command.
> 2) Use morphlines to replace __NL__ with \n
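Here is a minimal sketch of this round trip. It assumes Java stack traces whose continuation lines are either indented or start with `Caused by:`. Python stands in for the Perl script, and `restore` plays the role of the morphline replacement step. The function names and the continuation rule are illustrative assumptions. The marker must never occur in the raw log.

```python
import re
from typing import Iterable, Iterator

MARKER = "__NL__"  # placeholder from the thread; assumed absent from the raw log

# Assumed continuation rule for Java stack traces: indented lines
# ("at ...", "... 26 more") and "Caused by:" lines belong to the line above.
CONTINUATION = re.compile(r"^(\s+|Caused by:)")


def flatten(lines: Iterable[str]) -> Iterator[str]:
    """Collapse each multi-line record into a single line joined by MARKER."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # drop blank lines
        if current is not None and CONTINUATION.match(line):
            current += MARKER + line  # glue continuation onto the record
        else:
            if current is not None:
                yield current  # a new record starts, so emit the old one
            current = line
    if current is not None:
        yield current


def restore(event_body: str) -> str:
    """Undo flatten(): turn the marker back into real newlines."""
    return event_body.replace(MARKER, "\n")
```

Because the exec source consumes the command's output as a stream, a script like this can only emit a record after it has seen the first line of the next one. The last stack trace is therefore held back until more output arrives or the stream ends.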
>
> A third and less desirable solution would be:
>
> 1) Use the morphlines interceptor to merge multiple events into a single
> event. This will not work well for a variety of reasons, the most common
> being that the exec source could hit its "batch" size in the middle of
> a stack trace, in which case the stack trace will be split across two batches.
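The batch problem can be shown with a toy example. The sketch assumes the merge step sees only one batch at a time, since an interceptor processes the events of whatever batch the source hands it. A batch size of 2 is used purely to force a split. The function names are hypothetical.

```python
from typing import List


def merge_within_batch(batch: List[str]) -> List[str]:
    """Merge indented continuation lines into the preceding line,
    looking only at the lines of this one batch."""
    events: List[str] = []
    for line in batch:
        if events and line.startswith((" ", "\t")):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


def split_into_batches(lines: List[str], batch_size: int) -> List[List[str]]:
    """Cut the stream of lines into consecutive fixed-size batches."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


lines = ["ERROR - boom", "   at a.B.c(B.java)", "   at a.B.d(B.java)", "INFO - ok"]

# Whole stream merged at once: 2 events (the error with both frames, then INFO).
whole = merge_within_batch(lines)

# Merged batch by batch: the second frame lands in batch 2 with nothing
# before it, so it becomes an orphan event of its own (3 events in total).
result = [event for batch in split_into_batches(lines, 2)
          for event in merge_within_batch(batch)]
```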
>
> Brock
>
>
>
> On Mon, Dec 30, 2013 at 5:05 AM, Joao Salcedo <joao.salcedo@gmail.com> wrote:
>
>> It looks like it is possible, based on regular-expression pattern matching:
>>
>>
>> http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/readMultiLine
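To make the readMultiLine idea concrete, here is a simplified Python emulation of its `what : previous` mode. It uses the `regex` and `negate` options as I understand them from the linked reference guide. It is a sketch of the semantics under that assumption, not Kite's implementation. The stack-trace pattern is an example written for this log.

```python
import re
from typing import Iterable, List


def read_multi_line(lines: Iterable[str], regex: str, negate: bool = False) -> List[str]:
    """Emulate readMultiLine with what=previous (simplified sketch).

    A line that matches `regex` (or does NOT match it when negate=True)
    is appended to the previous record; any other line starts a new record.
    """
    pattern = re.compile(regex)
    records: List[str] = []
    for line in lines:
        attach = bool(pattern.search(line)) != negate  # XOR with negate
        if attach and records:
            records[-1] += "\n" + line
        else:
            records.append(line)
    return records


# Example pattern for Java stack traces: indented "at" frames, "... N more"
# and "Caused by:" lines are joined to the record above them.
STACK_TRACE = r"(^\s+at .+)|(^\s+\.\.\. \d+ more)|(^\s*Caused by:.+)"
```

With `negate=True` the logic flips. A pattern describing the *start* of a record then causes every non-matching line to be attached to the record above, which is the usual way to split on a leading date.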
>>
>>
>> On Mon, Dec 30, 2013 at 9:56 PM, Chhaya Vishwakarma <Chhaya.Vishwakarma@lntinfotech.com> wrote:
>>
>>> So is it not possible to handle multiline events in flume?
>>>
>>>
>>>
>>> *From:* Joao Salcedo [mailto:joao.salcedo@gmail.com]
>>> *Sent:* Monday, December 30, 2013 4:22 PM
>>>
>>> *To:* user@flume.apache.org
>>> *Subject:* Re: Event breaking in flume
>>>
>>>
>>>
>>> Maybe you can set up some morphlines and do some ETL on your events.
>>>
>>>
>>>
>>> I hope this helps.
>>>
>>>
>>>
>>>
>>> http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
>>>
>>>
>>>
>>> Cheers
>>>
>>>
>>>
>>> On Mon, Dec 30, 2013 at 9:34 PM, Ashish <paliwalashish@gmail.com> wrote:
>>>
>>> I am not aware of any options out of the box. Maybe someone else can
>>> help.
>>>
>>> An alternative is to write a custom source.
>>>
>>>
>>>
>>> On Mon, Dec 30, 2013 at 3:56 PM, Chhaya Vishwakarma <Chhaya.Vishwakarma@lntinfotech.com> wrote:
>>>
>>> Hi
>>>
>>> I am using the exec source with the tail command.
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish [mailto:paliwalashish@gmail.com]
>>> *Sent:* Monday, December 30, 2013 3:48 PM
>>> *To:* user@flume.apache.org
>>> *Subject:* Re: Event breaking in flume
>>>
>>>
>>>
>>> What is the Source you are using?
>>>
>>>
>>>
>>> On Mon, Dec 30, 2013 at 3:23 PM, Chhaya Vishwakarma <Chhaya.Vishwakarma@lntinfotech.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> By default, Flume treats each line as one event, but I want to break
>>> events on some other criterion. How can this be achieved in Flume? Is it
>>> possible?
>>>
>>>
>>>
>>> 10 Sep 2013 19:43:33,561 [WebContainer : 9] ERROR - An Error has occured for com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction
>>>
>>> 10 Sep 2013 19:43:33,561 [WebContainer : 9] ERROR - handleException():com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction
>>>      at com.marsh.csa.serviceagreement.ServiceAgreementImpl.updateAgreement(ServiceAgreementImpl.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreementmgmt.CSAManagerImpl.updateCSA(CSAManagerImpl.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreementmgmt.ejb.EJSRemoteStatelessServiceagreementManager_3dcfd156.updateCSA(Unknown Source)
>>>      at com.marsh.csa.serviceagreementmgmt.ejb._ServiceagreementManagerRemote_Stub.updateCSA(_ServiceagreementManagerRemote_Stub.java(Compiled Code))
>>>      at com.marsh.csa.proxy.CSAProxy.updateCSA(CSAProxy.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreement.SaveCSAAction.performAction(SaveCSAAction.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreement.CSAAbstractStrutsAction.execute(CSAAbstractStrutsAction.java(Compiled Code))
>>>      at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java(Inlined Compiled Code))
>>>      at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java(Compiled Code))
>>> Caused by: com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction
>>>      at com.marsh.csa.serviceagreement.ServiceAgreementDAO.updateServiceAgreement(ServiceAgreementDAO.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreement.ServiceAgreementDAO.update(ServiceAgreementDAO.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreement.SAUpdateImpl.updateServiceAgreement(SAUpdateImpl.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreement.SAUpdateImpl.update(SAUpdateImpl.java(Compiled Code))
>>>      ... 26 more
>>> Caused by: com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction
>>>      at com.marsh.csa.serviceagreement.SaveCSAAction.performAction(SaveCSAAction.java(Compiled Code))
>>>      at com.marsh.csa.serviceagreement.CSAAbstractStrutsAction.execute(CSAAbstractStrutsAction.java(Compiled Code))
>>>      at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java(Inlined Compiled Code))
>>>      at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java(Compiled Code))
>>>      at org.apache.struts.action.ActionServlet.process(ActionServlet.java(Inlined Compiled Code))
>>>      at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java(Compiled Code))
>>>      at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))
>>>      at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))
>>>      at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java(Compiled Code))
>>>
>>>
>>>
>>> This is a log file that I am writing to HBase. I want the part highlighted
>>> in yellow as one event and the part in gray as another event.
>>>
>>> Basically, I want to break events at each date. Is that possible?
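For this log, the boundary test can be a strict check that a line starts with a parseable timestamp such as `10 Sep 2013 19:43:33,561`. The sketch assumes that layout and English month abbreviations (a C/English locale). `leading_timestamp` and `starts_event` are hypothetical helpers that could serve as the event-start predicate of a multi-line reader. Parsing the timestamp, rather than just matching a leading digit, avoids treating a continuation line that happens to begin with a number as a new event.

```python
from datetime import datetime
from typing import Optional

# Assumed timestamp layout of the sample log, e.g. "10 Sep 2013 19:43:33,561".
TIMESTAMP_FORMAT = "%d %b %Y %H:%M:%S,%f"


def leading_timestamp(line: str) -> Optional[datetime]:
    """Return the timestamp that opens `line`, or None if there is none."""
    parts = line.split(" ", 4)  # day, month, year, time, rest of line
    if len(parts) < 4:
        return None
    try:
        return datetime.strptime(" ".join(parts[:4]), TIMESTAMP_FORMAT)
    except ValueError:
        return None  # first four tokens are not a timestamp


def starts_event(line: str) -> bool:
    """Event boundary for this log: a line that begins with a date."""
    return leading_timestamp(line) is not None
```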
>>>
>>> Regards,
>>>
>>> Chhaya Vishwakarma
>>>
>>>
>>> --
>>> thanks
>>> ashish
>>>
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>>
>>>
>>
>>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>
