phoenix-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Help With CSVBulkLoadTool
Date Fri, 23 Oct 2015 14:37:30 GMT
Could you check how many regions the index table has? And is the table
enabled? Based on the error message you're getting, it looks like the
table doesn't have any regions, although I don't know if/how that's
possible.
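For context, the "No regions passed" message appears to come from HBase's HFileOutputFormat2, which builds one reduce partition per region start key and bails out when that list is empty. A loose sketch of that check (not the actual HBase source):

```python
def configure_partitions(region_start_keys):
    """Loose sketch of HFileOutputFormat2's partitioning step: one reduce
    partition per region. An empty region list is what produces the
    'No regions passed' IllegalArgumentException in the real code."""
    if not region_start_keys:
        raise ValueError("No regions passed")
    # A table's first region has an empty start key, so it is dropped;
    # the remaining keys become the reducer partition boundaries.
    return sorted(region_start_keys)[1:]
```

So a table (or index) that reports zero regions to the client would fail exactly this way before any reducers run.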

- Gabriel

On Fri, Oct 23, 2015 at 3:49 PM, Riesland, Zack
<Zack.Riesland@sensus.com> wrote:
> Thanks Gabriel,
>
>
> Below is a redacted stack trace snippet.
>
> I think you're on to something: The "error" line references a "table" but actually mentions
an index:
>
> 15/10/23 06:00:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=<Index name> failed due to exception:java.lang.IllegalArgumentException: No regions passed
>
> The index/table in question DOES appear to exist.
>
> Any thoughts?
>
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Opening socket connection to server <server
name>.<domain>/x.y.z.65:2181. Will not attempt to authenticate using SASL (unknown
error)
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Socket connection established to <server
name>.<domain>/x.y.z.65:2181, initiating session
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Session establishment complete on server
<server name>.<domain>/x.y.z.65:2181, sessionid = 0x350672b547f9b44, negotiated
timeout = 40000
> 15/10/23 06:00:56 INFO client.ConnectionManager$HConnectionImplementation: Closing master
protocol: MasterService
> 15/10/23 06:00:56 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper
sessionid=0x350672b547f9b44
> 15/10/23 06:00:56 INFO zookeeper.ZooKeeper: Session: 0x350672b547f9b44 closed
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: EventThread shut down
> 15/10/23 06:00:56 INFO query.ConnectionQueryServicesImpl: Found quorum: <server name>.<domain>:2181,<SERVER
NAME>.<domain>:2181,<SERVER>-001.<domain>:2181
> 15/10/23 06:00:56 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=<server
name>.<domain>:2181,<SERVER NAME>.<domain>:2181,<SERVER>-001.<domain>:2181
sessionTimeout=120000 watcher=hconnection-0x40d2382e, quorum=<server name>.<domain>:2181,<SERVER
NAME>.<domain>:2181,<SERVER>-001.<domain>:2181, baseZNode=/hbase
> 15/10/23 06:00:56 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x40d2382e
connecting to ZooKeeper ensemble=<server name>.<domain>:2181,<SERVER NAME>.<domain>:2181,<SERVER>-001.<domain>:2181
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Opening socket connection to server <SERVER
NAME>.<domain>/x.y.z.64:2181. Will not attempt to authenticate using SASL (unknown
error)
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Socket connection established to <SERVER
NAME>.<domain>/x.y.z.64:2181, initiating session
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Session establishment complete on server
<SERVER NAME>.<domain>/x.y.z.64:2181, sessionid = 0x250672b527a97bc, negotiated
timeout = 40000
> 15/10/23 06:00:56 INFO client.ConnectionManager$HConnectionImplementation: Closing master
protocol: MasterService
> 15/10/23 06:00:56 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper
sessionid=0x250672b527a97bc
> 15/10/23 06:00:56 INFO zookeeper.ZooKeeper: Session: 0x250672b527a97bc closed
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: EventThread shut down
> 15/10/23 06:00:56 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
> 15/10/23 06:00:56 INFO mapreduce.CsvBulkLoadTool: Configuring ZK quorum to <SERVER
NAME>.<domain>,<server name>.<domain>,<SERVER>-001.<domain>
> 15/10/23 06:00:56 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
> 15/10/23 06:00:56 INFO query.ConnectionQueryServicesImpl: Table <Table Name> has
been added into latestMetaData
> 15/10/23 06:00:56 INFO mapreduce.CsvBulkLoadTool: Configuring HFile output path to /tmp/caa620d5-f3fe-4523-a04e-51cf163f690d/<Table
Name>
> 15/10/23 06:00:56 INFO mapreduce.CsvBulkLoadTool: Configuring HFile output path to /tmp/caa620d5-f3fe-4523-a04e-51cf163f690d/<Index
name>
> 15/10/23 06:00:56 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=<server
name>.<domain>:2181,<SERVER NAME>.<domain>:2181,<SERVER>-001.<domain>:2181
sessionTimeout=120000 watcher=hconnection-0x33f3af12, quorum=<server name>.<domain>:2181,<SERVER
NAME>.<domain>:2181,<SERVER>-001.<domain>:2181, baseZNode=/hbase
> 15/10/23 06:00:56 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x33f3af12
connecting to ZooKeeper ensemble=<server name>.<domain>:2181,<SERVER NAME>.<domain>:2181,<SERVER>-001.<domain>:2181
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Opening socket connection to server <server
name>.<domain>/x.y.z.65:2181. Will not attempt to authenticate using SASL (unknown
error)
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Socket connection established to <server
name>.<domain>/x.y.z.65:2181, initiating session
> 15/10/23 06:00:56 INFO zookeeper.ClientCnxn: Session establishment complete on server
<server name>.<domain>/x.y.z.65:2181, sessionid = 0x350672b547f9b45, negotiated
timeout = 40000
> 15/10/23 06:00:56 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table
<Table Name>
> 15/10/23 06:00:56 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table
<Index name>
> 15/10/23 06:00:56 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
> 15/10/23 06:00:56 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
> 15/10/23 06:00:56 INFO mapreduce.HFileOutputFormat2: Configuring 0 reduce partitions to match current region count
> 15/10/23 06:00:56 INFO mapreduce.HFileOutputFormat2: Writing partition information to
/tmp/partitions_2d12d0c0-17a0-46b6-9f0c-147f6d817ecb
> 15/10/23 06:00:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=<Index name> failed due to exception:java.lang.IllegalArgumentException: No regions passed
> 15/10/23 06:00:56 INFO mapreduce.HFileOutputFormat2: Configuring 513 reduce partitions
to match current region count
> 15/10/23 06:00:56 INFO mapreduce.HFileOutputFormat2: Writing partition information to
/tmp/partitions_754b7c35-ab5d-40b2-bb21-dc324689f4ff
> 15/10/23 06:00:56 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib
library
> 15/10/23 06:00:56 INFO compress.CodecPool: Got brand-new compressor [.deflate]
> 15/10/23 06:00:57 INFO mapreduce.HFileOutputFormat2: Incremental table <Table Name>
output configured.
> 15/10/23 06:00:57 INFO mapreduce.CsvBulkLoadTool: Running MapReduce import job from /user/hdfs/er_v12_staging_1420070400-1445644799
to /tmp/caa620d5-f3fe-4523-a04e-51cf163f690d/<Table Name>
> 15/10/23 06:00:57 INFO impl.TimelineClientImpl: Timeline service address: http://<server
name>.<domain>:8188/ws/v1/timeline/
> 15/10/23 06:00:57 INFO client.RMProxy: Connecting to ResourceManager at <server name>.<domain>/x.y.z.65:8050
> 15/10/23 06:00:58 INFO input.FileInputFormat: Total input paths to process : 97
> 15/10/23 06:00:58 INFO mapreduce.JobSubmitter: number of splits:97
> 15/10/23 06:00:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445519089673_0106
> 15/10/23 06:00:58 INFO impl.YarnClientImpl: Submitted application application_1445519089673_0106
> 15/10/23 06:00:58 INFO mapreduce.Job: The url to track the job: http://<server name>.<domain>:8088/proxy/application_1445519089673_0106/
> 15/10/23 06:00:58 INFO mapreduce.Job: Running job: job_1445519089673_0106
> 15/10/23 06:01:06 INFO mapreduce.Job: Job job_1445519089673_0106 running in uber mode
: false
> 15/10/23 06:01:06 INFO mapreduce.Job:  map 0% reduce 0%
> 15/10/23 06:01:25 INFO mapreduce.Job:  map 1% reduce 0%
> 15/10/23 06:01:26 INFO mapreduce.Job:  map 3% reduce 0%
> 15/10/23 06:01:27 INFO mapreduce.Job:  map 5% reduce 0%
> 15/10/23 06:01:28 INFO mapreduce.Job:  map 8% reduce 0%
> 15/10/23 06:01:29 INFO mapreduce.Job:  map 11% reduce 0%
> 15/10/23 06:01:31 INFO mapreduce.Job:  map 12% reduce 0%
> 15/10/23 06:01:33 INFO mapreduce.Job:  map 13% reduce 0%
> 15/10/23 06:01:34 INFO mapreduce.Job:  map 14% reduce 0%
> 15/10/23 06:01:35 INFO mapreduce.Job:  map 16% reduce 0%
> 15/10/23 06:01:36 INFO mapreduce.Job:  map 19% reduce 0%
> 15/10/23 06:01:37 INFO mapreduce.Job:  map 22% reduce 0%
> 15/10/23 06:01:38 INFO mapreduce.Job:  map 30% reduce 0%
> 15/10/23 06:01:39 INFO mapreduce.Job:  map 36% reduce 0%
> 15/10/23 06:01:40 INFO mapreduce.Job:  map 45% reduce 0%
> 15/10/23 06:01:41 INFO mapreduce.Job:  map 52% reduce 0%
> 15/10/23 06:01:42 INFO mapreduce.Job:  map 61% reduce 0%
> 15/10/23 06:01:43 INFO mapreduce.Job:  map 72% reduce 1%
> 15/10/23 06:01:44 INFO mapreduce.Job:  map 80% reduce 1%
> 15/10/23 06:01:45 INFO mapreduce.Job:  map 86% reduce 1%
> 15/10/23 06:01:46 INFO mapreduce.Job:  map 92% reduce 2%
> 15/10/23 06:01:48 INFO mapreduce.Job:  map 96% reduce 2%
> 15/10/23 06:01:49 INFO mapreduce.Job:  map 99% reduce 3%
> 15/10/23 06:01:50 INFO mapreduce.Job:  map 99% reduce 4%
> 15/10/23 06:01:51 INFO mapreduce.Job:  map 100% reduce 5%
> 15/10/23 06:01:52 INFO mapreduce.Job:  map 100% reduce 6%
> 15/10/23 06:01:53 INFO mapreduce.Job:  map 100% reduce 9%
> 15/10/23 06:01:54 INFO mapreduce.Job:  map 100% reduce 12%
> 15/10/23 06:01:55 INFO mapreduce.Job:  map 100% reduce 15%
> 15/10/23 06:01:56 INFO mapreduce.Job:  map 100% reduce 19%
> 15/10/23 06:01:57 INFO mapreduce.Job:  map 100% reduce 24%
> 15/10/23 06:01:58 INFO mapreduce.Job:  map 100% reduce 30%
> 15/10/23 06:01:59 INFO mapreduce.Job:  map 100% reduce 34%
> 15/10/23 06:02:00 INFO mapreduce.Job:  map 100% reduce 37%
> 15/10/23 06:02:01 INFO mapreduce.Job:  map 100% reduce 39%
> 15/10/23 06:02:02 INFO mapreduce.Job:  map 100% reduce 40%
> 15/10/23 06:02:03 INFO mapreduce.Job:  map 100% reduce 42%
> 15/10/23 06:02:04 INFO mapreduce.Job:  map 100% reduce 45%
> 15/10/23 06:02:05 INFO mapreduce.Job:  map 100% reduce 47%
> 15/10/23 06:02:06 INFO mapreduce.Job:  map 100% reduce 50%
> 15/10/23 06:02:07 INFO mapreduce.Job:  map 100% reduce 53%
> 15/10/23 06:02:08 INFO mapreduce.Job:  map 100% reduce 56%
> 15/10/23 06:02:09 INFO mapreduce.Job:  map 100% reduce 62%
> 15/10/23 06:02:10 INFO mapreduce.Job:  map 100% reduce 70%
> 15/10/23 06:02:11 INFO mapreduce.Job:  map 100% reduce 76%
> 15/10/23 06:02:12 INFO mapreduce.Job:  map 100% reduce 80%
> 15/10/23 06:02:13 INFO mapreduce.Job:  map 100% reduce 85%
> 15/10/23 06:02:14 INFO mapreduce.Job:  map 100% reduce 91%
> 15/10/23 06:02:15 INFO mapreduce.Job:  map 100% reduce 97%
> 15/10/23 06:02:16 INFO mapreduce.Job:  map 100% reduce 100%
> 15/10/23 06:02:19 INFO mapreduce.Job: Job job_1445519089673_0106 completed successfully
> 15/10/23 06:02:19 INFO mapreduce.Job: Counters: 51
>         File System Counters
>                 FILE: Number of bytes read=6255686056
>                 FILE: Number of bytes written=12606758779
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=614122608
>                 HDFS: Number of bytes written=78045672
>                 HDFS: Number of read operations=2886
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=1416
>         Job Counters
>                 Launched map tasks=97
>                 Launched reduce tasks=513
>                 Data-local map tasks=84
>                 Rack-local map tasks=13
>                 Total time spent by all maps in occupied slots (ms)=3350846
>                 Total time spent by all reduces in occupied slots (ms)=6020424
>                 Total time spent by all map tasks (ms)=3350846
>                 Total time spent by all reduce tasks (ms)=6020424
>                 Total vcore-seconds taken by all map tasks=3350846
>                 Total vcore-seconds taken by all reduce tasks=6020424
>                 Total megabyte-seconds taken by all map tasks=24018864128
>                 Total megabyte-seconds taken by all reduce tasks=43154399232
>         Map-Reduce Framework
>                 Map input records=4957008
>                 Map output records=54527088
>                 Map output bytes=6146628802
>                 Map output materialized bytes=6255981544
>                 Input split bytes=12901
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=4957008
>                 Reduce shuffle bytes=6255981544
>                 Reduce input records=54527088
>                 Reduce output records=54527088
>                 Spilled Records=109054176
>                 Shuffled Maps =49761
>                 Failed Shuffles=0
>                 Merged Map outputs=49761
>                 GC time elapsed (ms)=432301
>                 CPU time spent (ms)=6820780
>                 Physical memory (bytes) snapshot=420588826624
>                 Virtual memory (bytes) snapshot=4303403663360
>                 Total committed heap usage (bytes)=1420226461696
>         Phoenix MapReduce Import
>                 Upserts Done=4957008
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=612651700
>         File Output Format Counters
>                 Bytes Written=78045672
> 15/10/23 06:02:19 INFO mapreduce.CsvBulkLoadTool: Loading HFiles from /tmp/caa620d5-f3fe-4523-a04e-51cf163f690d/<Table
Name>
> 15/10/23 06:02:19 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
> 15/10/23 06:02:19 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://surus/tmp/caa620d5-f3fe-4523-a04e-51cf163f690d/<Table
Name>/_SUCCESS
> 15/10/23 06:02:19 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
> 15/10/23 06:02:19 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> 15/10/23 06:02:19 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> 15/10/23 06:02:19 INFO compress.CodecPool: Got brand-new decompressor [.gz]
> 15/10/23 06:02:19 INFO compress.CodecPool: Got brand-new decompressor [.gz]
> 15/10/23 06:02:19 INFO compress.CodecPool: Got brand-new decompressor [.gz]
> 15/10/23 06:02:19 INFO compress.CodecPool: Got brand-new decompressor [.gz]
> 15/10/23 06:02:19 INFO compress.CodecPool: Got brand-new decompressor [.gz]
> 15/10/23 06:02:19 INFO compress.CodecPool: Got brand-new decompressor [.gz]
> 15/10/23 06:02:19 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://surus/tmp/caa620d5-f3fe-4523-a04e-51cf163f690d/<Table
Name>/0/0078777f27b24fbda6bb1db054e5f915 first=268-CSTX_B77653169_80221136\x00\xD5[z  last=268-CSTX_B78594909_82187464\x00\xD6')\xE0
>
>
>
> -----Original Message-----
> From: Gabriel Reid [mailto:gabriel.reid@gmail.com]
> Sent: Friday, October 23, 2015 9:39 AM
> To: user@phoenix.apache.org
> Subject: Re: Help With CSVBulkLoadTool
>
> Do you have a stack trace from the log output from when you got this error?
>
> And could you tell me whether the table name being complained about there is an index table name?
>
> Tracing through the code, it looks like you could get this exception if an index table doesn't exist (or somehow isn't available), which would explain how the data is still getting into your main table even though the job reports an error.
>
> - Gabriel
>
>
> On Fri, Oct 23, 2015 at 11:18 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:
>> Thanks Gabriel,
>>
>> From what I can see in the logs, it happens consistently on most if not all tables
that we import to.
>>
>> However, it does not appear to actually prevent the data from getting to the table.
>>
>> When I first raised this question, I noticed missing data and saw the error, but
I have since found another error (bad data of the wrong type) that I think was the root cause
of my job actually failing.
>>
>> I certainly never noticed this before, though.
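Type mismatches like the one described above can be caught before the MapReduce job is even launched with a quick client-side pass over the CSV. This is a generic sketch, not part of Phoenix; the column-to-parser mapping is hypothetical and would need to match your table schema:

```python
import csv

def find_bad_cells(path, parsers):
    """Return (line_no, col, value) triples for cells that fail to parse.
    `parsers` maps a column index to a callable such as int or float."""
    bad = []
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            for col, parse in parsers.items():
                try:
                    parse(row[col])
                except (ValueError, IndexError):
                    bad.append((line_no, col,
                                row[col] if col < len(row) else None))
    return bad
```

Running this over a staging file with, say, `{1: int}` flags rows whose second column isn't an integer, which is cheaper than discovering the problem partway through a bulk-load job.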
>>
>> The main things that we have changed since these scripts worked cleanly were upgrading
our stack and adding new region servers.
>>
>> Does that help at all?
>>
>> -----Original Message-----
>> From: Gabriel Reid [mailto:gabriel.reid@gmail.com]
>> Sent: Friday, October 23, 2015 1:19 AM
>> To: user@phoenix.apache.org
>> Subject: Re: Help With CSVBulkLoadTool
>>
>> Hi Zack,
>>
>> I can't give you any information about compatibility of a given Phoenix version with
a given version of HDP (because I don't know).
>>
>> However, could you give a bit more info on what you're seeing? Are all import jobs
failing with this error for a given set of tables? Or is this a random failure that can happen
on any table?
>>
>> This error looks to me like it would be some kind of configuration issue with your
cluster(s), but if that's the case then I would expect that you'd be getting the same error
every time.
>>
>> - Gabriel
>>
>> On Wed, Oct 21, 2015 at 2:42 PM, Riesland, Zack <Zack.Riesland@sensus.com>
wrote:
>>> Hello,
>>>
>>>
>>>
>>> We recently upgraded our Hadoop stack from HDP 2.2.0 to 2.2.8
>>>
>>>
>>>
>>> The phoenix version (phoenix-4.2.0.2.2.8.0) and HBase version
>>> (0.98.4.2.2.8.0) did not change (from what I can tell).
>>>
>>>
>>>
>>> However, some of our CSVBulkLoadTool jobs have started to fail.
>>>
>>>
>>>
>>> I’m not sure whether this is related to the upgrade or not, but the
>>> timing seems suspicious.
>>>
>>>
>>>
>>> The particular error I’m seeing is like this:
>>>
>>>
>>>
>>> ERROR mapreduce.CsvBulkLoadTool: Import job on table=<table name>
>>> failed due to exception:java.lang.IllegalArgumentException: No
>>> regions passed
>>>
>>>
>>>
>>> The Phoenix table in question has 7 regions and millions of rows in
>>> it already.
>>>
>>>
>>>
>>> The syntax I’m using is
>>>
>>>
>>>
>>> HADOOP_CLASSPATH=/<classpath stuff> hadoop jar <path to
>>> phoenix-client.jar> org.apache.phoenix.mapreduce.CsvBulkLoadTool
>>> -Dfs.permissions.umask-mode=000 --z <zookeeper quorum> --table <my
>>> table> --input <my hdfs file>
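[Editorially, the same invocation shape with long-form option names spelled out. Every value below is a placeholder rather than a value from this thread, and the option names assume the CsvBulkLoadTool options in Phoenix 4.x (`--zookeeper`, `--table`, `--input`):]

```shell
# All values are placeholders -- substitute your own jar path, quorum,
# table name, and input path. This script only prints the command; run
# the printed line on a cluster node to actually launch the job.
PHOENIX_CLIENT_JAR=/usr/hdp/current/phoenix-client/phoenix-client.jar
CMD="hadoop jar ${PHOENIX_CLIENT_JAR} org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  --zookeeper zk1,zk2,zk3 \
  --table MY_TABLE \
  --input /user/hdfs/my_input.csv"
echo "${CMD}"
```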
>>>
>>>
>>>
>>> Can anyone help me understand the solution here?
>>>
>>>
>>>
>>> Also, does anyone know the most recent version of Phoenix that is
>>> compatible with HDP 2.2.8 / HBase 0.98.4.2.2 ?
>>>
>>>
>>>
>>> Thanks!
