From: Cameron Gandevia <cgandevia@gmail.com>
Date: Wed, 19 Oct 2011 12:16:58 -0700
Subject: Re: flume dying on InterruptException (nanos)
To: flume-user@incubator.apache.org

We were seeing the same issue when our HDFS instance was overloaded and
taking over a second to respond. I assume that if the backend goes down,
the collector will die and need to be restarted once the backend becomes
available again? That doesn't seem very reliable.

On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers wrote:

> We saw this problem when it was taking more than 1 second to get a
> response from writing to Cassandra (our back end). A single long
> response will kill the collector. We had to revert to the version of
> Flume that uses synchronization instead of read/write locking to get
> around this.
>
> Ralph
>
> On Oct 18, 2011, at 1:55 PM, AD wrote:
>
> > Hello,
> >
> > My collector keeps dying with the following error. Is this a known
> > issue? Any idea how to prevent it or find out what is causing it?
> > Is format("%{nanos}") an issue?
> >
> > 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
> > java.lang.InterruptedException
> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
> >         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> >         at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> >         at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> >         at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> >
> > source: collectorSource("35853")
> > sink: regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte") format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:") split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) { attr2hbase("apache_logs","f1","","hbase_") }

--
Thanks

Cameron Gandevia
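[Editor's note] The stack trace points at a timed, interruptible lock acquire: ReentrantReadWriteLock.WriteLock.tryLock(timeout, unit) parks in AbstractQueuedSynchronizer.tryAcquireNanos and throws InterruptedException the moment the waiting thread is interrupted. The sketch below (class and method names are mine, not Flume's) reproduces that failure mode: a held read lock stands in for a slow backend write, and interrupting the thread blocked in close()'s write-lock acquire kills the wait, just as in the trace above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TryLockInterruptDemo {

    /**
     * Simulates the reported failure: a timed write-lock acquire (as in
     * RollSink.close()) while an append holds the read lock, interrupted
     * mid-wait. Returns true if the wait died with InterruptedException.
     */
    static boolean closeDiesWhenInterrupted() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock(); // a slow backend write pins the read lock

        AtomicBoolean died = new AtomicBoolean(false);
        Thread closer = new Thread(() -> {
            try {
                // The same interruptible call the stack trace shows:
                // blocks in AbstractQueuedSynchronizer.tryAcquireNanos.
                lock.writeLock().tryLock(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                died.set(true); // propagated, this is what kills the node
            }
        });
        closer.start();
        Thread.sleep(200);  // let the closer park in tryAcquireNanos
        closer.interrupt(); // roll/shutdown signal arrives mid-wait
        closer.join();
        lock.readLock().unlock();
        return died.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close interrupted: " + closeDiesWhenInterrupted());
    }
}
```

This is why a single backend response longer than the roll interval is enough: the driver is interrupted while still waiting for the write lock, and the exception propagates out of close().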
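[Editor's note] For completeness, one defensive pattern for this class of bug — a hypothetical sketch, not the actual Flume fix, with invented class and method names — is to catch the InterruptedException from the timed acquire, restore the thread's interrupt status, and finish close() best-effort instead of letting the exception tear down the logical node:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ResilientClose {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Best-effort close: if the timed write-lock acquire is interrupted,
     * restore the interrupt flag and fall through rather than propagating,
     * so one slow backend response cannot kill the node. Returns whether
     * the lock was actually obtained before cleanup ran.
     */
    public boolean close(long timeout, TimeUnit unit) {
        boolean locked = false;
        try {
            locked = lock.writeLock().tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        try {
            // flush buffers / release resources here, with or without
            // the lock (degraded, but the process stays up)
            return locked;
        } finally {
            if (locked) lock.writeLock().unlock();
        }
    }
}
```

With no contention, `new ResilientClose().close(1, TimeUnit.SECONDS)` acquires immediately and returns true; under contention plus interruption it returns false but does not throw.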