phoenix-user mailing list archives

From Anil Gupta <anilgupt...@gmail.com>
Subject Re: Exception from RowCounter
Date Mon, 27 Jul 2015 00:24:26 GMT
If you can run the RowCounter MapReduce job and Phoenix still reports 0 rows, that sounds like a Phoenix bug to me.

Sent from my iPhone

> On Jul 26, 2015, at 10:22 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:
> 
> Thanks Anil,
> 
> Any idea why the 'select count(*)' query would return '0' when I can query other ways and get data?
> 
> From: Anil Gupta [anilgupta84@gmail.com]
> Sent: Saturday, July 25, 2015 1:36 PM
> To: user@phoenix.apache.org
> Subject: Re: Exception from RowCounter
> 
> Hey Zack, 
> 
> As per the RowCounter job error, it seems your job does not have the hbase-site.xml file on its classpath, and because of that it's not connecting to the correct cluster. Check which ZooKeeper cluster the job is connecting to.
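[Editor's note] A quick way to act on this suggestion is to inspect the hbase-site.xml the job would actually pick up and confirm its ZooKeeper quorum. The path and hostnames below are hypothetical; a sample file stands in for the real config, which on many distributions lives at /etc/hbase/conf/hbase-site.xml.

```shell
# Sketch only: create a stand-in hbase-site.xml (hostnames are made up).
# On a real cluster, point the grep at the file on the job's classpath.
cat > /tmp/hbase-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
EOF
# Pull out the quorum the client would connect to:
grep -A1 'hbase.zookeeper.quorum' /tmp/hbase-site-sample.xml | grep '<value>'
```

If the quorum printed here is not the cluster that holds the table, the job is talking to the wrong place.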
> 
> Sent from my iPhone
> 
> On Jul 25, 2015, at 1:23 PM, James Taylor <jamestaylor@apache.org> wrote:
> 
>> Hi Zack,
>> If you find an issue, the best way for us is to come up with the smallest set of data that demonstrates the issue and a small script that consistently reproduces it. It's just not feasible for us to work from a billion-row data set.
>> 
>> RowCounter is not a Phoenix API, so you'll need to go to the HBase mailing lists for help on that.
>> 
>> From a schema design perspective, what you're doing seems a bit unwieldy. Why not just let Phoenix salt the data table instead of manually salting it and managing that yourself?
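[Editor's note] For reference, the salted-table approach suggested here would look roughly like the sketch below. The column list is abbreviated, the primary key is a guess at the natural key (endpoint_id + sample_point, since hashed_key was derived from endpoint_id), and the bucket count is purely illustrative.

```sql
-- Sketch only: Phoenix prepends a salt byte to the row key itself,
-- so the manual hashed_key column and the 512 explicit split points
-- are no longer needed.
CREATE TABLE fma.er_keyed_gz_meterkey_immutable (
    meter_key varchar,
    endpoint_id integer not null,
    sample_point integer not null,
    CONSTRAINT pk PRIMARY KEY (endpoint_id, sample_point)
)
COMPRESSION='GZ', SALT_BUCKETS = 64;  -- bucket count is an example, not a recommendation
```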
>> 
>> Thanks,
>> James
>> 
>>> On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:
>>> I decided to start from scratch with my table schema in an attempt to get a better distribution across my regions/region servers.
>>> 
>>>  
>>> 
>>> So, I created a table like this:
>>> 
>>>  
>>> 
>>> CREATE TABLE fma.er_keyed_gz_hashed_indexed_meterkey_immutable (
>>> 
>>>     hashed_key varchar not null,
>>> 
>>>     meter_key varchar ,
>>> 
>>>     …
>>> 
>>>     endpoint_id integer,
>>> 
>>>     sample_point  integer not null,
>>> 
>>>    …
>>> 
>>>     CONSTRAINT pk_fma_er_keyed_gz_hashed_indexed_meterkey_immutable PRIMARY KEY (hashed_key, sample_point)
>>> 
>>>     )
>>> 
>>>     COMPRESSION='GZ'
>>> 
>>>     SPLIT ON ('0-', '1-', '2-', '3-', '4-', '5-', '6-', '7-', '8-', '9-', '10-',
'11-', '12-', '13-', '14-', '15-', '16-', '17-', '18-', '19-', '20-', '21-', '22-', '23-',
'24-', '25-', '26-', '27-', '28-', '29-', '30-', '31-', '32-', '33-', '34-', '35-', '36-',
'37-', '38-', '39-', '40-', '41-', '42-', '43-', '44-', '45-', '46-', '47-', '48-', '49-',
'50-', '51-', '52-', '53-', '54-', '55-', '56-', '57-', '58-', '59-', '60-', '61-', '62-',
'63-', '64-', '65-', '66-', '67-', '68-', '69-', '70-', '71-', '72-', '73-', '74-', '75-',
'76-', '77-', '78-', '79-', '80-', '81-', '82-', '83-', '84-', '85-', '86-', '87-', '88-',
'89-', '90-', '91-', '92-', '93-', '94-', '95-', '96-', '97-', '98-', '99-', '100-', '101-',
'102-', '103-', '104-', '105-', '106-', '107-', '108-', '109-', '110-', '111-', '112-', '113-',
'114-', '115-', '116-', '117-', '118-', '119-', '120-', '121-', '122-', '123-', '124-', '125-',
'126-', '127-', '128-', '129-', '130-', '131-', '132-', '133-', '134-', '135-', '136-', '137-',
'138-', '139-', '140-', '141-', '142-', '143-', '144-', '145-', '146-', '147-', '148-', '149-',
'150-', '151-', '152-', '153-', '154-', '155-', '156-', '157-', '158-', '159-', '160-', '161-',
'162-', '163-', '164-', '165-', '166-', '167-', '168-', '169-', '170-', '171-', '172-', '173-',
'174-', '175-', '176-', '177-', '178-', '179-', '180-', '181-', '182-', '183-', '184-', '185-',
'186-', '187-', '188-', '189-', '190-', '191-', '192-', '193-', '194-', '195-', '196-', '197-',
'198-', '199-', '200-', '201-', '202-', '203-', '204-', '205-', '206-', '207-', '208-', '209-',
'210-', '211-', '212-', '213-', '214-', '215-', '216-', '217-', '218-', '219-', '220-', '221-',
'222-', '223-', '224-', '225-', '226-', '227-', '228-', '229-', '230-', '231-', '232-', '233-',
'234-', '235-', '236-', '237-', '238-', '239-', '240-', '241-', '242-', '243-', '244-', '245-',
'246-', '247-', '248-', '249-', '250-', '251-', '252-', '253-', '254-', '255-', '256-', '257-',
'258-', '259-', '260-', '261-', '262-', '263-', '264-', '265-', '266-', '267-', '268-', '269-',
'270-', '271-', '272-', '273-', '274-', '275-', '276-', '277-', '278-', '279-', '280-', '281-',
'282-', '283-', '284-', '285-', '286-', '287-', '288-', '289-', '290-', '291-', '292-', '293-',
'294-', '295-', '296-', '297-', '298-', '299-', '300-', '301-', '302-', '303-', '304-', '305-',
'306-', '307-', '308-', '309-', '310-', '311-', '312-', '313-', '314-', '315-', '316-', '317-',
'318-', '319-', '320-', '321-', '322-', '323-', '324-', '325-', '326-', '327-', '328-', '329-',
'330-', '331-', '332-', '333-', '334-', '335-', '336-', '337-', '338-', '339-', '340-', '341-',
'342-', '343-', '344-', '345-', '346-', '347-', '348-', '349-', '350-', '351-', '352-', '353-',
'354-', '355-', '356-', '357-', '358-', '359-', '360-', '361-', '362-', '363-', '364-', '365-',
'366-', '367-', '368-', '369-', '370-', '371-', '372-', '373-', '374-', '375-', '376-', '377-',
'378-', '379-', '380-', '381-', '382-', '383-', '384-', '385-', '386-', '387-', '388-', '389-',
'390-', '391-', '392-', '393-', '394-', '395-', '396-', '397-', '398-', '399-', '400-', '401-',
'402-', '403-', '404-', '405-', '406-', '407-', '408-', '409-', '410-', '411-', '412-', '413-',
'414-', '415-', '416-', '417-', '418-', '419-', '420-', '421-', '422-', '423-', '424-', '425-',
'426-', '427-', '428-', '429-', '430-', '431-', '432-', '433-', '434-', '435-', '436-', '437-',
'438-', '439-', '440-', '441-', '442-', '443-', '444-', '445-', '446-', '447-', '448-', '449-',
'450-', '451-', '452-', '453-', '454-', '455-', '456-', '457-', '458-', '459-', '460-', '461-',
'462-', '463-', '464-', '465-', '466-', '467-', '468-', '469-', '470-', '471-', '472-', '473-',
'474-', '475-', '476-', '477-', '478-', '479-', '480-', '481-', '482-', '483-', '484-', '485-',
'486-', '487-', '488-', '489-', '490-', '491-', '492-', '493-', '494-', '495-', '496-', '497-',
'498-', '499-', '500-', '501-', '502-', '503-', '504-', '505-', '506-', '507-', '508-', '509-',
'510-', '511-')
>>> 
>>>  
>>> 
>>> The hashed key is the endpoint_id % 511, which is why the splitting is done this way.
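[Editor's note] One thing worth noting about varchar split points (an observation, not a confirmed diagnosis of the problem in this thread): HBase orders them as byte strings, not numbers, so a prefix like '100-' does not land where numeric intuition suggests. A quick illustration:

```shell
# Split points on a varchar key sort lexicographically, not numerically:
# '100-' sorts between '10-' and '11-', not after '99-'.
printf '%s\n' '2-' '10-' '100-' '11-' | LC_ALL=C sort
# → 10-
#   100-
#   11-
#   2-
```

With the split list above given in numeric order, the actual key ranges each region covers may be quite different from what was intended, which can skew distribution.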
>>> 
>>>  
>>> 
>>> I also added 2 secondary indexes, because I need to be able to query based on meter_key or endpoint_id + sample_point:
>>> 
>>>  
>>> 
>>> CREATE INDEX fma_er_keyed_gz_hashed_indexed_endpoint_include_sample_point on fma.er_keyed_gz_hashed_indexed_meterkey_immutable (endpoint_id) include (sample_point) SALT_BUCKETS = 256; --256 is max
>>> 
>>>  
>>> 
>>> CREATE INDEX fma_er_keyed_gz_hashed_indexed_meterkey on fma.er_keyed_gz_hashed_indexed_meterkey_immutable (meter_key) SALT_BUCKETS = 256; --256 is max
>>> 
>>>  
>>> 
>>> This all seemed to work well.
>>> 
>>>  
>>> 
>>> Then I used the CSV bulk import tool and imported about 1 billion rows.
>>> 
>>>  
>>> 
>>> This also seemed to work.
>>> 
>>>  
>>> 
>>> I can query by hashed_key and get immediate results.
>>> 
>>>  
>>> 
>>> If I query by endpoint_id or meter_key, it is MUCH slower (clue 1 that there's a problem), but it gives me results eventually.
>>> 
>>>  
>>> 
>>> However, when I try to get a count, it returns 0:
>>> 
>>>  
>>> 
>>>  
>>> 
>>> > select count(*) from fma.er_keyed_gz_hashed_indexed_meterkey_immutable;
>>> 
>>> +------------+
>>> 
>>> |  COUNT(1)  |
>>> 
>>> +------------+
>>> 
>>> | 0          |
>>> 
>>> +------------+
>>> 
>>> 1 row selected (0.075 seconds)
>>> 
>>>  
>>> 
>>> And when I try to do a RowCounter job, it fails completely (see below).
>>> 
>>>  
>>> 
>>> Can anyone help me diagnose what is going on here?
>>> 
>>>  
>>> 
>>> This is a cluster with 6 large region servers, each with about 600 regions (from this and other tables).
>>> 
>>>  
>>> 
>>> This table has about 12 columns and 1 billion rows currently.
>>> 
>>>  
>>> 
>>> Thanks!
>>> 
>>>  
>>> 
>>> #  hbase org.apache.hadoop.hbase.mapreduce.RowCounter fma.er_keyed_gz_hashed_indexed_meterkey_immutable;
>>> 
>>>  
>>> 
>>>  
>>> 
>>> 2015-07-25 06:53:36,425 DEBUG [main] util.RegionSizeCalculator: Region sizes calculated
>>> 
>>> 2015-07-25 06:53:36,427 WARN  [main] hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period" is deprecated. Instead, use "hbase.client.scanner.timeout.period"
>>> 
>>> 2015-07-25 06:53:36,468 WARN  [main] client.ConnectionManager$HConnectionImplementation: Encountered problems when prefetch hbase:meta table:
>>> 
>>> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in hbase:meta for table: fma.er_keyed_gz_hashed_indexed_meterkey_immutable, row=fma.er_keyed_gz_hashed_indexed_meterkey_immutable,,99999999999999
>>> 
>>>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:164)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.prefetchRegionCache(ConnectionManager.java:1222)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1286)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1135)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1118)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1075)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:909)
>>> 
>>>         at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:528)
>>> 
>>>         at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:165)
>>> 
>>>         at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
>>> 
>>>         at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
>>> 
>>>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>>> 
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>> 
>>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>> 
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
>>> 
>>>         at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)
>>> 
>>> 2015-07-25 06:53:36,471 INFO  [main] mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1437395072897_1775
>>> 
>>> Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: fma.er_keyed_gz_hashed_indexed_meterkey_immutable
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1319)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1135)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1118)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1075)
>>> 
>>>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:909)
>>> 
>>>         at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:528)
>>> 
>>>         at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:165)
>>> 
>>>         at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
>>> 
>>>         at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
>>> 
>>>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>>> 
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>> 
>>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>> 
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>>> 
>>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
>>> 
>>>         at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)
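[Editor's note] One possible explanation for the TableNotFoundException above (an educated guess, not confirmed from these logs): Phoenix folds unquoted identifiers to upper case when it creates the backing HBase table, so the HBase-level name is likely FMA.ER_KEYED_GZ_HASHED_INDEXED_METERKEY_IMMUTABLE, and RowCounter, being a plain HBase tool, needs that exact name:

```shell
# Phoenix upper-cases unquoted identifiers, so the HBase table name
# probably differs from the lower-case name used in sqlline. Derive it:
PHOENIX_NAME='fma.er_keyed_gz_hashed_indexed_meterkey_immutable'
HBASE_NAME=$(printf '%s' "$PHOENIX_NAME" | tr '[:lower:]' '[:upper:]')
echo "$HBASE_NAME"
# → FMA.ER_KEYED_GZ_HASHED_INDEXED_METERKEY_IMMUTABLE
# Then, on the cluster (shown for illustration only):
#   hbase org.apache.hadoop.hbase.mapreduce.RowCounter "$HBASE_NAME"
```

Listing tables in the hbase shell would confirm the exact name before rerunning the job.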
