phoenix-user mailing list archives

From "Azarov, Vadim" <Vadim.Aza...@teoco.com>
Subject RE: CsvBulkLoadTool error with Phoenix 4.0
Date Tue, 12 Aug 2014 05:18:47 GMT
Hi Gabriel,
Thanks for the thorough reply! :)

Re. the permissions - I actually tried the -Dfs.permissions... parameter, but it did nothing.
After some digging, it seems it is ignored during the MapReduce job (I found some references
to it not working on a cluster).
What did eventually work was indeed setting that umask parameter in the HDFS configuration.
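
For reference, in case anyone else hits this: what I changed cluster-wide was that same umask
property, roughly like the snippet below. I'm going from memory here, so double-check whether
your distribution wants it in hdfs-site.xml or core-site.xml:

  <property>
    <name>fs.permissions.umask-mode</name>
    <!-- 0000 masks nothing, so new directories come out 777 and files 666 -->
    <value>0000</value>
  </property>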

Do you think this will be fixed sometime soon?

Re. running with Cloudera - it only worked after recompiling the sources (only the bulk-load-related
ones).
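
In case it helps anyone else, the rebuild was roughly along these lines (from memory, so treat
this as a sketch rather than exact steps):

  # point the Hadoop/HBase version properties in the Phoenix pom at the CDH 5.1.0
  # artifacts (2.3.0-cdh5.1.0 / 0.98.1-cdh5.1.0), make sure the Cloudera maven repository
  # (https://repository.cloudera.com/artifactory/cloudera-repos/) is available, then:
  mvn clean package -DskipTests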

Is it planned to release a "ready to use" Phoenix version for the latest Cloudera/Hadoop?


Thank you,
Vadim

-----Original Message-----
From: Gabriel Reid [mailto:gabriel.reid@gmail.com] 
Sent: Monday, August 11, 2014 6:55 PM
To: user@phoenix.apache.org
Subject: Re: CsvBulkLoadTool error with Phoenix 4.0

Hi Vadim,

It looks like this is due to a file permissions issue. The bulk import tool creates HFiles
in a temporary directory, and these files are then moved into HBase. It's during that move
that things are going wrong here.

The easiest (and worst) way of getting around this is simply disabling permissions on HDFS altogether.
It's the quickest fix, but it has an obvious major drawback.

I think that a couple of other options are:
* you could run the import job as the hbase user, i.e. sudo -u hbase hadoop jar ....
* set the dfs.umask to a more permissive umask when running the import job. It might be possible
  to just do this when starting up the job itself, with
  "hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=0000 --table ..."
  (see the full example below)
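
For example, combining the two (completely untested, and the table name and input path are
just placeholders):

  sudo -u hbase hadoop jar phoenix-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      -Dfs.permissions.umask-mode=0000 \
      --table EXAMPLE_TABLE \
      --input /data/example.csv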

I actually haven't tried any of these yet (I'm currently running on a cluster without permissions
enabled), but the general idea is to make sure that the files and directories being created
by the bulk import tool can be read and written by the hbase user.
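
A quick way to check is something along the lines of the following, where the path is whatever
temporary directory shows up in the error message:

  # see who owns the job output and what its permissions are
  hadoop fs -ls /tmp/<bulk-load-output-dir>
  # blunt, but enough for a one-off test: open the whole thing up
  hadoop fs -chmod -R 777 /tmp/<bulk-load-output-dir>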

BTW, could you let me know what you needed to do in the end to get the bulk import to run
on CDH 5.1?

- Gabriel

On Mon, Aug 11, 2014 at 12:04 AM, Azarov, Vadim <Vadim.Azarov@teoco.com> wrote:
> Hi Gabriel,
> After a lot of playing around with the classpath, and after rebuilding the Phoenix source,
> the CSV bulk loading finally began, but during the stage where it should copy the temporary
> HFiles into HBase, it fails with the error attached below.
>
> Running it from a CDH 5.1 VM, with the cloudera user.
> Tried several suggestions from various forums - no effect.
>
> Is there something that should be configured before running the job?
>
> Thank you!
> Vadim
>
> Sun Aug 10 14:54:45 PDT 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@5c994959, java.io.IOException: java.io.IOException: Exception in rename
>         at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.rename(HRegionFileSystem.java:952)
>         at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitStoreFile(HRegionFileSystem.java:352)
>         at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.bulkLoadStoreFile(HRegionFileSystem.java:426)
>         at org.apache.hadoop.hbase.regionserver.HStore.bulkLoadHFile(HStore.java:666)
>         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3621)
>         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3527)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFile(HRegionServer.java:3262)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29499)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/tmp/df84b3c1-1b02-446c-bee7-e2776bdd9e8c/M":cloudera:supergroup:drwxr-xr-x
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:182)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5584)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:3272)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3242)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3210)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:682)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:523)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>
>         at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1636)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:532)
>         at org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
>         at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.rename(HRegionFileSystem.java:944)
>         ... 13 more
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hbase, access=WRITE, inode="/tmp/df84b3c1-1b02-446c-bee7-e2776bdd9e8c/M":cloudera:supergroup:drwxr-xr-x
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:182)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5584)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:3272)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3242)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3210)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:682)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:523)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy16.rename(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor126.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy16.rename(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:431)
>         at sun.reflect.GeneratedMethodAccessor125.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
>         at com.sun.proxy.$Proxy17.rename(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1634)
>         ...
>
> -----Original Message-----
> From: Gabriel Reid [mailto:gabriel.reid@gmail.com]
> Sent: Thursday, August 07, 2014 10:11 PM
> To: user@phoenix.apache.org
> Subject: Re: CsvBulkLoadTool error with Phoenix 4.0
>
> Hi Vadim,
>
> Sorry for the long delay on this.
>
> Just to be sure, can you confirm that you're using the hadoop-2 build of Phoenix 4.0 on
> the client when starting up the CsvBulkLoadTool?
>
> Even if you are, this may actually require a rebuild of Phoenix using CDH 5.1.0 dependencies.
>
> Could you post the full stack trace that you're getting?
>
> - Gabriel
>
> On Mon, Aug 4, 2014 at 11:08 AM, Azarov, Vadim <Vadim.Azarov@teoco.com> wrote:
>> Hi,
>>
>> I'm getting this error when trying to use the sample bulk loading 
>> with MapReduce via CsvBulkLoadTool -
>>
>> java.lang.NoSuchMethodError: org.apache.hadoop.net.NetUtils.getInputStream
>>
>>
>>
>> I'm using Phoenix 4.0, HBase 0.98.1, Hadoop 2.3.0 and Cloudera CDH 5.1.0
>>
>>
>>
>> I saw that others encountered this problem with older and possibly mismatching versions –
>>
>> http://stackoverflow.com/questions/15363490/cascading-hbase-tap
>>
>>
>>
>> but thought that the latest ones should work ok.
>>
>>
>>
>> Could you suggest what seems to be the problem?
>>
>>
>>
>> Thank you,
>>
>> Vadim Azarov
>>
>>
>>