> You mentioned the collector needs to have direct connection to all region
> servers and master..
> Could you help me how I can do that.

If you're deploying the nodes on EC2, you don't really need to worry
about it. The EC2 default security group should allow full connectivity
between all the nodes you bring up. What I was talking about is avoiding
a firewall/VLAN blocking ports between your flume collector and the
hbase region servers (which happens in our data center for cluster
isolation). Again, I don't think you need to worry about it right now.

> I already have hbase and flume master as well as flume collector all are
> different machine I need to link all these together but don't know how...

Here is a simple step-by-step example.

1. flume collector: make sure it can access hbase.

2. Create an hbase table. Here are some shell scripts:

$ cat > /tmp/test/list << EOF
> list
> exit
> EOF
$ hbase shell /tmp/test/list
TABLE
0 row(s) in 0.5820 seconds

$ cat > /tmp/test/table << EOF
> create 't1', 'c1', 'c2'
> exit
> EOF
$ hbase shell /tmp/test/table
0 row(s) in 1.7270 seconds

$ hbase shell /tmp/test/list
TABLE
t1
1 row(s) in 0.4770 seconds

3. Copy hbase-site.xml to /usr/lib/flume/conf/ (assuming you're using
CDH flume).

4. Copy hbase.jar to /usr/lib/flume/lib/.

5. Try having the flume collector write to hbase:

$ /usr/lib/flume/bin/flume node_nowatch -1 -s -n n1 \
  -c 'n1: tail("/tmp/test/list") | hbase("t1", "%s", "c1", "", "%S", "c2", "", "%{body}");'

6. Scan hbase:

$ cat > /tmp/test/scan << EOF
> scan 't1'
> exit
> EOF
$ bin/hbase shell /tmp/test/scan
ROW                COLUMN+CELL
 1317934401        column=c1:, timestamp=1317934402446, value=21
 1317934401        column=c2:, timestamp=1317934402446, value=list
 1317934402        column=c1:, timestamp=1317934402451, value=22
 1317934402        column=c2:, timestamp=1317934402451, value=exit
2 row(s) in 0.5540 seconds

After running this, you can make sure your collector can talk to hbase.
Then you can use your master to configure the collector. Please follow
Otis's link for detailed info.
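Once the local node_nowatch test works, the same source/sink pair can be
pushed from the flume master instead. A minimal sketch using the Flume
0.9.x shell's `exec config` command; the master hostname `flume-master`
and logical node name `n1` are assumptions, so substitute your own:

```shell
# Connect the flume shell to the master and submit the same
# tail -> hbase configuration that was tested locally.
# "flume-master" is a placeholder hostname; "n1" must match the
# logical node name your collector registered with the master.
/usr/lib/flume/bin/flume shell -c flume-master \
  -e "exec config n1 'tail(\"/tmp/test/list\")' 'hbase(\"t1\", \"%s\", \"c1\", \"\", \"%S\", \"c2\", \"\", \"%{body}\")'"
```

You can verify the node picked up the config on the master's web page,
or by running `getconfigs` from the same flume shell.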
On 10/06/2011 06:34 AM, Saritha Ravi wrote:
> Hi Mingjie,
> Good Morning...
> Thanks for your response.
> I would like use default hbase sink I downloaded from
> https://github.com/cloudera/flume/tree/master/plugins/ (placed the
> hbase-sink.jar in flume-home/lib and updated the flume-site.xml)
> You mentioned the collector needs to have direct connection to all region
> servers and master..
> Could you help me how I can do that.
>
> I already have hbase and flume master as well as flume collector all are
> different machine I need to link all these together but don't know how...
>
> Thanks,
> Saritha.
>
> On 10/6/11 2:37 AM, "Mingjie Lai" wrote:
>
>>> Does flume collector and and hbase master should be in the same
>>> cluster.
>>
>> In your case, the flume collector will be writing data to hbase as a
>> regular hbase client. So it needs to access hbase thru either, 1) hbase
>> java api, or 2) hbase rest/thrift gateway. If you want to use the
>> default hbase sink (which uses java api), the collector need to have
>> direct connection to all region servers and master.
>>
>> On the other hand, you can also build your own new hbase REST/thrift
>> sink. And in this case, the collector only needs to talk to the REST
>> gateway.
>>
>>> Can anyone suggest me
>>> the basic steps how I can configure these two in ec2 cloud.
>>
>> I don't quite understand your question. Sounds like you've already had
>> hbase, then you can just have some extra machines for flume nodes,
>> master, etc.
>>
>> -mingjie
>>
>> On 10/05/2011 07:48 PM, Saritha Ravi wrote:
>>> Hi All,
>>>
>>> I need to configure Flume with hbase in cloud. Could anyone help me with
>>> this. Is there a better documentation.
>>> Does flume collector and and hbase master should be in the same cluster.
>>> I was able to configure hbase(master, Zookeeper, region server using
>>> WHIRR) and flume(Master, collector) from CDH3. Can anyone suggest me
>>> the basic steps how I can configure these two in ec2 cloud.
>>>
>>> Thanks,
>>> Saritha.