flume-user mailing list archives

From Victor Sanchez <vhdsh.interf...@gmail.com>
Subject Problems with flume + Hbase sink
Date Mon, 17 Jun 2013 14:02:36 GMT
Hi,

I'm having problems with org.apache.flume.sink.hbase.HBaseSink. I also
tried org.apache.flume.sink.hbase.AsyncHBaseSink, with no success.

I'm running on:

Flume NG 1.3.0-cdh4.3.0 CDH4
Hadoop 2.0.0-cdh4.3.0 CDH4
HBase 0.94.6-cdh4.3.0 CDH4
Zookeeper 3.4.5-cdh4.3.0 CDH4
Cloudera Manager Management Daemons 4.5.0 Not applicable

1. I ran into https://issues.cloudera.org/browse/DISTRO-438, so I applied
the workaround "Remove or rename zoo.cfg file from /etc/zookeeper/conf."

2. HBase seems to be properly configured. I can manually put a row into the
table, but nothing gets written when I use the sink:

hbase(main):004:0> put 'test_mEEsures','test_row1','M:cM1','test_value1'
0 row(s) in 0.0770 seconds

hbase(main):007:0> scan 'test_mEEsures'
ROW                             COLUMN+CELL
 test_row1                      column=M:cM1, timestamp=1370965687758, value=test_value1

hbase(main):003:0> describe 'test_mEEsures'
DESCRIPTION                                                        ENABLED
 {NAME => 'test_mEEsures', FAMILIES => [{NAME => 'M',              true
 DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.1020 seconds

3. I also have working examples of Flume writing to HDFS, so I know that
the source and channel are properly configured.


4. The problem seems to be in the HBase sink. When I send a test message
with nc, I can see in the Flume log that "something is being created", but
there is no record of it in HBase. I checked both the Flume and the HBase
logs, but I can't see what I'm missing.

Any tip will be more than welcome!


Here is the sink part of the Flume config I'm using:

mEEsuresAgent.sinks.SinkToHBase.channel       = MemoryChannel
mEEsuresAgent.sinks.SinkToHBase.type          = org.apache.flume.sink.hbase.HBaseSink
mEEsuresAgent.sinks.SinkToHBase.table         = test_mEEsures
mEEsuresAgent.sinks.SinkToHBase.columnFamily  = M
mEEsuresAgent.sinks.SinkToHBase.column        = cM1
mEEsuresAgent.sinks.SinkToHBase.serializer    = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
mEEsuresAgent.sinks.SinkToHBase.batchSize     = 1
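According to the Flume User Guide, the HBase sink reads `table`, `columnFamily`, `batchSize` and `serializer` itself and passes anything under `serializer.*` to the serializer. The small Java helper below is hypothetical and not part of Flume. It splits the sink's properties the same way, which makes it easy to see which settings the serializer would actually receive from the config above. It only parses the text and does not reproduce Flume's own validation. It is written to compile on Java 6, which matches the JDK in the log.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Flume): groups an agent's sink
 * properties by prefix, e.g. the sink's own keys vs. "serializer." keys.
 */
public class SinkConfigInspector {

    /** Returns every key starting with prefix, prefix stripped, sorted; values trimmed. */
    static Map<String, String> subProperties(Properties props, String prefix) {
        Map<String, String> result = new TreeMap<String, String>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The sink section from the message above, one property per line.
        String config =
            "mEEsuresAgent.sinks.SinkToHBase.channel       = MemoryChannel\n"
          + "mEEsuresAgent.sinks.SinkToHBase.type          = org.apache.flume.sink.hbase.HBaseSink\n"
          + "mEEsuresAgent.sinks.SinkToHBase.table         = test_mEEsures\n"
          + "mEEsuresAgent.sinks.SinkToHBase.columnFamily  = M\n"
          + "mEEsuresAgent.sinks.SinkToHBase.column        = cM1\n"
          + "mEEsuresAgent.sinks.SinkToHBase.serializer    = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer\n"
          + "mEEsuresAgent.sinks.SinkToHBase.batchSize     = 1\n";

        Properties props = new Properties();
        props.load(new StringReader(config));

        String sinkPrefix = "mEEsuresAgent.sinks.SinkToHBase.";
        Map<String, String> sink = subProperties(props, sinkPrefix);
        Map<String, String> serializer = subProperties(props, sinkPrefix + "serializer.");

        System.out.println("sink keys: " + sink.keySet());
        System.out.println("serializer.* keys: " + serializer.keySet());
    }
}
```

For the config above this prints `sink keys: [batchSize, channel, column, columnFamily, serializer, table, type]` and `serializer.* keys: []`. In other words, nothing in this config is passed to the serializer under `serializer.*`.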

Here is part of the Flume log (please check the last line):
6:41:15.533 PM INFO org.apache.flume.node.AbstractConfigurationProvider
Created channel MemoryChannel
6:41:15.534 PM INFO org.apache.flume.source.DefaultSourceFactory
Creating instance of source mEEsuresSRC1, type syslogudp
6:41:15.570 PM INFO org.apache.flume.sink.DefaultSinkFactory
Creating instance of sink: SinkToHBase, type: org.apache.flume.sink.hbase.HBaseSink
6:41:15.972 PM INFO org.apache.flume.sink.hbase.HBaseSink
The write to WAL option is set to: true
6:41:15.974 PM INFO org.apache.flume.node.AbstractConfigurationProvider
Channel MemoryChannel connected to [mEEsuresSRC1, SinkToHBase]
6:41:15.981 PM INFO org.apache.flume.node.Application
Starting new configuration:{ sourceRunners:{mEEsuresSRC1=EventDrivenSourceRunner: { source:org.apache.flume.source.SyslogUDPSource{name:mEEsuresSRC1,state:IDLE} }} sinkRunners:{SinkToHBase=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@2af081 counterGroup:{ name:null counters:{} } }} channels:{MemoryChannel=org.apache.flume.channel.MemoryChannel{name: MemoryChannel}} }
6:41:15.985 PM INFO org.apache.flume.node.Application
Starting Channel MemoryChannel
6:41:15.986 PM INFO org.apache.flume.node.Application
Waiting for channel: MemoryChannel to start. Sleeping for 500 ms
6:41:16.031 PM INFO org.apache.flume.instrumentation.MonitoredCounterGroup
Monitoried counter group for type: CHANNEL, name: MemoryChannel, registered successfully.
6:41:16.031 PM INFO org.apache.flume.instrumentation.MonitoredCounterGroup
Component type: CHANNEL, name: MemoryChannel started
6:41:16.486 PM INFO org.apache.flume.node.Application
Starting Sink SinkToHBase
6:41:16.486 PM INFO org.apache.flume.node.Application
Starting Source mEEsuresSRC1
6:41:16.577 PM INFO org.mortbay.log
Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
6:41:16.856 PM INFO org.mortbay.log
jetty-6.1.26
6:41:16.914 PM INFO org.mortbay.log
Started SocketConnector@0.0.0.0:41414
6:41:18.581 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:zookeeper.version=3.4.5-cdh4.3.0--1, built on 05/20/2013 20:55 GMT
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:host.name=myhadoop.cluster
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.version=1.6.0_31
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.vendor=Sun Microsystems Inc.
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.home=/usr/java/jdk1.6.0_31/jre
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.class.path=/var/run/cloudera-scm-agent/ ... (lots of stuff)
6:41:18.583 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.library.path=:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase/bin/../lib/native/Linux-amd64-64
6:41:18.583 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.io.tmpdir=/tmp
6:41:18.583 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.compiler=<NA>
6:41:18.584 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:os.name=Linux
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:os.arch=amd64
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:os.version=2.6.32-279.14.1.el6.x86_64
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:user.name=flume
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:user.home=/var/lib/flume-ng
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:user.dir=/var/run/cloudera-scm-agent/process/3234-flume-AGENT
6:41:18.600 PM INFO org.apache.zookeeper.ZooKeeper
Initiating client connection, connectString=myhadoop.cluster:2181 sessionTimeout=60000 watcher=hconnection
6:41:18.791 PM INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
The identifier of this process is 6845@myhadoop.cluster
6:41:18.836 PM INFO org.apache.zookeeper.ClientCnxn
Opening socket connection to server myhadoop.cluster/11.52.6.180:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
6:41:18.856 PM INFO org.apache.zookeeper.ClientCnxn
Socket connection established to myhadoop.cluster/11.52.6.180:2181, initiating session
6:41:18.881 PM INFO org.apache.zookeeper.ClientCnxn
Session establishment complete on server myhadoop.cluster/11.52.6.180:2181, sessionid = 0x13f0a901280001a, negotiated timeout = 60000
6:41:19.575 PM WARN org.apache.hadoop.conf.Configuration
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
6:42:17.288 PM WARN org.apache.flume.source.SyslogUtils
Event created from Invalid Syslog data.
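As far as I can tell, the last log line is what Flume's syslog source reports when a datagram does not begin with a syslog `<PRI>` header. That is the case when plain text is sent with nc. Flume still creates an event in that case, so the warning may well be unrelated to the missing HBase rows. Still, sending a well-formed message removes one variable. Below is a minimal Java sketch, written to compile on Java 6. The default host `localhost` and port `5140` are hypothetical placeholders, because the source's port is not part of the config shown above. The sketch sends only a `<PRI>` header followed by the message text, not a full RFC 3164 header with timestamp and hostname.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.Charset;

/**
 * Sketch: send one test datagram that starts with a syslog <PRI> header,
 * as an alternative to piping raw text through nc.
 * The default host/port in main() are hypothetical placeholders; use the
 * host and port the syslogudp source actually binds to.
 */
public class SyslogTestSender {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Builds "<PRI>message", where PRI = facility * 8 + severity. */
    static String format(int facility, int severity, String message) {
        if (facility < 0 || facility > 23 || severity < 0 || severity > 7) {
            throw new IllegalArgumentException("facility must be 0-23, severity 0-7");
        }
        return "<" + (facility * 8 + severity) + ">" + message;
    }

    /** Sends a single UDP datagram containing the UTF-8 bytes of line. */
    static void send(String host, int port, String line) throws Exception {
        byte[] data = line.getBytes(UTF8);
        DatagramSocket socket = new DatagramSocket();
        try {
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(host), port));
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";         // hypothetical default
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5140; // hypothetical default
        // facility 1 (user-level), severity 6 (informational) -> "<14>"
        String line = format(1, 6, "flume hbase sink test");
        send(host, port, line);
        System.out.println("sent: " + line);
    }
}
```

Usage, with the port replaced by the source's real one: `java SyslogTestSender localhost 5140`. Because PRI = facility × 8 + severity, facility 1 with severity 6 produces the header `<14>`.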
