flume-user mailing list archives

From Mingjie Lai <mjla...@gmail.com>
Subject Re: Flume-HBase
Date Thu, 06 Oct 2011 21:03:10 GMT
 > You mentioned the collector needs to have direct connection to all region
 > servers and master..
 > Could you help me how I can do that.

If you're deploying the nodes on EC2, you don't really need to worry
about it. The default EC2 security group should allow full connectivity
between all the nodes you bring up.

The thing I was talking about is to avoid firewall/vlan blocking ports 
between your flume collector and hbase region servers (which occurs in 
our data center for cluster isolation). Again, I don't think you need to 
worry about it right now.
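
If you ever do need to check for blocked ports, a rough sketch like the
one below can be run from the collector. It is only an illustration, not
part of flume or hbase: the function names are made up, and the ports in
the usage comment (2181 for zookeeper, 60000 for the hbase master, 60020
for region servers) are the stock defaults of that era as far as I know,
so check your hbase-site.xml. The hostnames are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port
# opens within 3 seconds (uses bash's /dev/tcp pseudo-device).
can_connect() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: prints one OK/BLOCKED line per port for a host.
check_host() {
  local host=$1; shift
  local port
  for port in "$@"; do
    if can_connect "$host" "$port"; then
      echo "OK      ${host}:${port}"
    else
      echo "BLOCKED ${host}:${port}"
    fi
  done
}

# Usage (placeholder hostnames, assumed default ports):
#   check_host zk1.example.com 2181
#   check_host hbase-master.example.com 60000
#   check_host regionserver1.example.com 60020
```

A BLOCKED line only says the TCP connection did not open in time; it
cannot tell a firewall rule apart from a service that is simply down.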

 > I already have hbase and flume master as well as flume collector all are
 > different machine I need to link all these together but don't know how...

Here is a simple step-by-step example.

1. flume collector: make sure it can access hbase

2. create an hbase table; here are some shell scripts:
$ cat > /tmp/test/list <<EOF
 > list
 > exit
 > EOF
$ hbase shell /tmp/test/list
TABLE 

0 row(s) in 0.5820 seconds

$ cat > /tmp/test/table <<EOF
 > create 't1', 'c1','c2'
 > exit
 > EOF

$ hbase shell /tmp/test/table
0 row(s) in 1.7270 seconds

$ hbase shell /tmp/test/list
TABLE 

t1 

1 row(s) in 0.4770 seconds

3. copy hbase-site.xml to /usr/lib/flume/conf/ (assume you're using cdh 
flume)

4. copy hbase.jar to /usr/lib/flume/lib/

5. try flume collector to write to hbase
$ /usr/lib/flume/bin/flume node_nowatch -1 -s -n n1 \
-c 'n1: tail("/tmp/test/list") | hbase ("t1", "%s", "c1", "", "%S", "c2", "", "%{body}");'

6. scan hbase
$ cat > /tmp/test/scan <<EOF
 > scan 't1'
 > exit
 > EOF
$ bin/hbase shell /tmp/test/scan
ROW                           COLUMN+CELL
  1317934401                   column=c1:, timestamp=1317934402446, value=21
  1317934401                   column=c2:, timestamp=1317934402446, value=list
  1317934402                   column=c1:, timestamp=1317934402451, value=22
  1317934402                   column=c2:, timestamp=1317934402451, value=exit
2 row(s) in 0.5540 seconds

Once this runs, you know your collector can talk to hbase, and you can
then use your flume master to configure the collector. Please follow
Otis's link for detailed info.
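
Steps 3 and 4 above could be scripted roughly like this. It is a sketch
under assumptions: install_hbase_client is a made-up function name, and
the source paths in the usage comment are guesses for a CDH-style layout,
so adjust them to wherever your hbase-site.xml and hbase jar actually
live.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies the hbase client config and jar into a
# flume install so the hbase sink can find them (steps 3 and 4).
install_hbase_client() {
  local hbase_site=$1 hbase_jar=$2 flume_home=$3
  # Fail early with a message if either source file is missing.
  [ -f "$hbase_site" ] || { echo "missing: $hbase_site" >&2; return 1; }
  [ -f "$hbase_jar" ]  || { echo "missing: $hbase_jar" >&2; return 1; }
  cp "$hbase_site" "$flume_home/conf/" &&
  cp "$hbase_jar"  "$flume_home/lib/"
}

# Usage (source paths are assumptions, the target is the cdh flume home):
#   install_hbase_client /etc/hbase/conf/hbase-site.xml \
#       /usr/lib/hbase/hbase.jar /usr/lib/flume
```

Restart the flume node afterwards so it picks up the new jar and config.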


On 10/06/2011 06:34 AM, Saritha Ravi wrote:
> Hi Mingjie,
> Good Morning...
> Thanks  for your response.
> I would like use default hbase sink I downloaded from
> https://github.com/cloudera/flume/tree/master/plugins/ (placed the
> hbase-sink.jar in flume-home/lib and updated the flume-site.xml)
> You mentioned the collector needs to have direct connection to all region
> servers and master..
> Could you help me how I can do that.
>
> I already have hbase and flume master as well as flume collector all are
> different machine I need to link all these together but don't know how...
>
> Thanks,
> Saritha.
>
> On 10/6/11 2:37 AM, "Mingjie Lai"<mjlai09@gmail.com>  wrote:
>
>>> Does flume collector and and hbase master should be in the same
>> cluster.
>>
>> In your case, the flume collector will be writing data to hbase as a
>> regular hbase client. So it needs to access hbase thru either, 1) hbase
>> java api, or 2) hbase rest/thrift gateway. If you want to use the
>> default hbase sink (which uses java api), the collector need to have
>> direct connection to all region servers and master.
>>
>> On the other hand, you can also build your own new hbase REST/thrift
>> sink. And in this case, the collector only needs to talk to the REST
>> gateway.
>>
>>> Can anyone suggest me
>>> the basic steps how I can configure these two in ec2 cloud.
>>
>> I don't quite understand your question. Sounds like you've already had
>> hbase, then you can just have some extra machines for flume nodes,
>> master, etc.
>>
>> -mingjie
>>
>> On 10/05/2011 07:48 PM, Saritha Ravi wrote:
>>> Hi All,
>>>
>>> I need to configure Flume with hbase in cloud. Could anyone help me with
>>> this. Is there a better documentation.
>>> Does flume collector and and hbase master should be in the same cluster.
>>> I was able to configure hbase(master ,Zookeeper,region server using
>>> WHIRR) and flume(Master , collector) from CDH3. Can anyone suggest me
>>> the basic steps how I can configure these two in ec2 cloud.
>>>
>>> Thanks,
>>> Saritha.
>>>
>>
>
