From: Ed Judge <ejudgie@gmail.com>
To: user@flume.apache.org
Cc: shengyi.pan <shengyi.pan@gmail.com>
Date: Wed, 1 Oct 2014 09:04:14 -0400
Subject: Re: HDFS sink to a remote HDFS node

Looks like they are up.  I see the following on one of the nodes but both look generally the same (1 live datanode).

[hadoop@localhost bin]$ hdfs dfsadmin -report
14/10/01 12:51:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 40797364224 (38.00 GB)
Present Capacity: 37030862848 (34.49 GB)
DFS Remaining: 37030830080 (34.49 GB)
DFS Used: 32768 (32 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 40797364224 (38.00 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 3766501376 (3.51 GB)
DFS Remaining: 37030830080 (34.49 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Wed Oct 01 12:51:57 UTC 2014


I don't know how to demonstrate that they are accessible except to telnet into each of them.  Right now that test shows that both nodes accept the connection to port 50010.
Is there some other test I can perform?
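One check that goes beyond the plain TCP connect, as a sketch only (it assumes the hdfs CLI is installed on the Flume host and that the remote namenode is the hdfs://10.0.0.16:9000 instance from the sink config quoted below), is a direct write from the Flume host against the remote filesystem:

$ hdfs dfs -fs hdfs://10.0.0.16:9000 -mkdir -p /tmp/flume-probe          # namenode RPC only
$ hdfs dfs -fs hdfs://10.0.0.16:9000 -put /etc/hosts /tmp/flume-probe/   # allocates a block on a datanode, like the sink does
$ hdfs dfs -fs hdfs://10.0.0.16:9000 -cat /tmp/flume-probe/hosts         # reads the block back

If the -put step fails with the same "could only be replicated to 0 nodes" error quoted below, the problem sits between the client and the datanode rather than in Flume itself.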

Thanks,
-Ed

On Oct 1, 2014, at 12:31 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

Looks like one data node is inaccessible or down - so the HDFS client has blacklisted it and the writes are failing as blocks are allocated to that one.

Thanks,
Hari


On Tue, Sep 30, 2014 at 7:33 PM, Ed Judge <ejudgie@gmail.com> wrote:

I've pulled over all of the Hadoop jar files for my flume instance to use.  I am seeing some slightly different errors now.  Basically I have 2 identically configured hadoop instances on the same subnet.  Running flume on those same instances and pointing flume at the local hadoop/hdfs instance works fine and the files get written.  However, when I point it to the adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and the files never get written.  Here is my HDFS sink configuration on 10.0.0.14:

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
a1.sinks.k1.hdfs.filePrefix = twitter
a1.sinks.k1.hdfs.fileSuffix = .ds
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
#a1.sinks.k1.serializer = TEXT
a1.sinks.k1.channel = c1

Any idea why this is not working?

Thanks.

01 Oct 2014 01:59:45,098 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
01 Oct 2014 01:59:45,385 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 100 docs
01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 200 docs
01 Oct 2014 01:59:49,379 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)  - Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
01 Oct 2014 01:59:49,390 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)  - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
01 Oct 2014 01:59:49,398 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)  - Excluding datanode 127.0.0.1:50010
01 Oct 2014 01:59:49,431 WARN  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while syncing
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
01 Oct 2014 01:59:49,439 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
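One detail that may matter here: in the dfsadmin report further up, the single live datanode on the remote instance registered itself as 127.0.0.1:50010 with hostname "localhost". If that is the address the namenode hands back to a remote client, the Flume host ends up opening the block-transfer connection to its own loopback, which would explain the premature EOF, the "Excluding datanode 127.0.0.1:50010" message, and the replicated-to-0-nodes failure. A sketch of checks, assuming a plain Apache Hadoop 2.x setup (the hostname in the comments is hypothetical):

$ hdfs dfsadmin -report | grep -E '^(Name|Hostname):'   # run on 10.0.0.16; currently shows 127.0.0.1:50010 / localhost
$ hostname; getent hosts $(hostname)                    # does that box's hostname resolve to 127.0.0.1?
# If it does, one possible fix is mapping the hostname to the routable address in /etc/hosts
# (for example "10.0.0.16  hdfs-node2") and restarting HDFS so the datanode re-registers;
# dfs.datanode.use.datanode.hostname / dfs.client.use.datanode.hostname in hdfs-site.xml are related knobs.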

On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

You'd need to add the jars that hadoop itself depends on. Flume pulls it in if Hadoop is installed on that machine, else you'd need to manually download it and install it. If you are using Hadoop 2.x, install the RPM provided by Bigtop.
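For what it's worth, one way to do that with a tarball install is to point Flume at the Hadoop jar directories in conf/flume-env.sh. This is only a sketch; the HADOOP_HOME path below is an assumption, and a packaged (Bigtop/RPM) install lays things out differently:

# conf/flume-env.sh (sketch; adjust HADOOP_HOME to the actual install location)
export HADOOP_HOME=/opt/hadoop-2.5.0
FLUME_CLASSPATH="$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*"

The flume-ng launcher appends FLUME_CLASSPATH to the agent classpath, which pulls in hadoop-common, hadoop-hdfs and their third-party dependencies.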

On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ejudgie@gmail.com> wrote:
I added commons-configuration and there is now another missing dependency.  What do you mean by "all of Hadoop's dependencies"?


On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

You actually need to add all of Hadoop's dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
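If it is not obvious where that class lives, one quick way to locate the jar (a sketch, assuming HADOOP_HOME points at a Hadoop 2.x tarball install):

$ find $HADOOP_HOME -name 'commons-configuration*.jar'   # typically under share/hadoop/common/lib/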

Thanks,
Hari


On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ejudgie@gmail.com> wrote:

Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.

I am getting the following error even after adding those 2 jar files to my flume-ng classpath:

30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
        at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
        at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
        at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
        at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 17 more
30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
[vagrant@localhost 6]$

Is there another jar file I need?

Thanks.

On Sep 29, 2014, at 9:04 PM, shengyi.pan <shengyi.pan@gmail.com> wrote:

You need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the dependent hadoop jar version must match your hadoop system.

If you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the protobuf interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 the flume-ng agent will fail to start.
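A quick way to see which of these jars an agent will actually load, as a sketch ($FLUME_HOME here is just shorthand for wherever Flume is unpacked):

$ ls $FLUME_HOME/lib | grep -iE 'protobuf|hadoop'
# per the note above, if the target cluster is hadoop-2.0.x, swap protobuf-java-2.5.0.jar in
# $FLUME_HOME/lib for the protobuf-java-2.4.1.jar that ships with that Hadoop install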
 
 
 
 
2014-09-30

shengyi.pan

From: Ed Judge <ejudgie@gmail.com>
Sent: 2014-09-29 22:38
Subject: HDFS sink to a remote HDFS node
To: "user@flume.apache.org" <user@flume.apache.org>
Cc:
I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/


Thanks,
Ed






