phoenix-user mailing list archives

From "Riesland, Zack" <Zack.Riesl...@sensus.com>
Subject RE: Exception from RowCounter
Date Sun, 26 Jul 2015 14:22:58 GMT
Thanks Anil,

Any idea why the 'select count(*)' query returns '0' when I can query in other ways and
get data?

________________________________
From: Anil Gupta [anilgupta84@gmail.com]
Sent: Saturday, July 25, 2015 1:36 PM
To: user@phoenix.apache.org
Subject: Re: Exception from RowCounter

Hey Zack,

As per the RowCounter job error, it seems like your RowCounter job does not have the
hbase-site.xml file on its classpath, and because of that it's not connecting to the correct
cluster. Check which ZooKeeper cluster the job is connecting to.

Sent from my iPhone

On Jul 25, 2015, at 1:23 PM, James Taylor <jamestaylor@apache.org> wrote:

Hi Zack,
If you find an issue, the best way to help us is to come up with the smallest set of data
that demonstrates the issue and a small script that consistently reproduces it. It's just not
feasible for us to work from a billion-row data set.

RowCounter is not a Phoenix API, so you'll need to go to the HBase mailing lists for help on
that.

From a schema design perspective, what you're doing seems a bit unwieldy. Why not just
salt the data table instead of manually salting it and managing that yourself?
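
For example, something along these lines (a sketch only; the primary key shown is
illustrative, since the right key depends on your dominant access path):

CREATE TABLE fma.er_keyed_gz_hashed_indexed_meterkey_immutable (
    meter_key varchar,
    endpoint_id integer not null,
    sample_point integer not null,
    …
    CONSTRAINT pk PRIMARY KEY (endpoint_id, sample_point)
)
COMPRESSION='GZ', SALT_BUCKETS = 256

Phoenix then prefixes every row key with a salt byte and pre-splits the table into one
region per bucket, so no SPLIT ON list is needed.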

Thanks,
James

On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:
I decided to start from scratch with my table schema in an attempt to get a better distribution
across my regions/region servers.

So, I created a table like this:

CREATE TABLE fma.er_keyed_gz_hashed_indexed_meterkey_immutable (
    hashed_key varchar not null,
    meter_key varchar,
    …
    endpoint_id integer,
    sample_point integer not null,
    …
    CONSTRAINT pk_fma_er_keyed_gz_hashed_indexed_meterkey_immutable
        PRIMARY KEY (hashed_key, sample_point)
    )
    COMPRESSION='GZ'
    SPLIT ON ('0-', '1-', '2-', '3-', '4-', '5-', '6-', '7-', '8-', '9-', '10-', '11-', '12-',
'13-', '14-', '15-', '16-', '17-', '18-', '19-', '20-', '21-', '22-', '23-', '24-', '25-',
'26-', '27-', '28-', '29-', '30-', '31-', '32-', '33-', '34-', '35-', '36-', '37-', '38-',
'39-', '40-', '41-', '42-', '43-', '44-', '45-', '46-', '47-', '48-', '49-', '50-', '51-',
'52-', '53-', '54-', '55-', '56-', '57-', '58-', '59-', '60-', '61-', '62-', '63-', '64-',
'65-', '66-', '67-', '68-', '69-', '70-', '71-', '72-', '73-', '74-', '75-', '76-', '77-',
'78-', '79-', '80-', '81-', '82-', '83-', '84-', '85-', '86-', '87-', '88-', '89-', '90-',
'91-', '92-', '93-', '94-', '95-', '96-', '97-', '98-', '99-', '100-', '101-', '102-', '103-',
'104-', '105-', '106-', '107-', '108-', '109-', '110-', '111-', '112-', '113-', '114-', '115-',
'116-', '117-', '118-', '119-', '120-', '121-', '122-', '123-', '124-', '125-', '126-', '127-',
'128-', '129-', '130-', '131-', '132-', '133-', '134-', '135-', '136-', '137-', '138-', '139-',
'140-', '141-', '142-', '143-', '144-', '145-', '146-', '147-', '148-', '149-', '150-', '151-',
'152-', '153-', '154-', '155-', '156-', '157-', '158-', '159-', '160-', '161-', '162-', '163-',
'164-', '165-', '166-', '167-', '168-', '169-', '170-', '171-', '172-', '173-', '174-', '175-',
'176-', '177-', '178-', '179-', '180-', '181-', '182-', '183-', '184-', '185-', '186-', '187-',
'188-', '189-', '190-', '191-', '192-', '193-', '194-', '195-', '196-', '197-', '198-', '199-',
'200-', '201-', '202-', '203-', '204-', '205-', '206-', '207-', '208-', '209-', '210-', '211-',
'212-', '213-', '214-', '215-', '216-', '217-', '218-', '219-', '220-', '221-', '222-', '223-',
'224-', '225-', '226-', '227-', '228-', '229-', '230-', '231-', '232-', '233-', '234-', '235-',
'236-', '237-', '238-', '239-', '240-', '241-', '242-', '243-', '244-', '245-', '246-', '247-',
'248-', '249-', '250-', '251-', '252-', '253-', '254-', '255-', '256-', '257-', '258-', '259-',
'260-', '261-', '262-', '263-', '264-', '265-', '266-', '267-', '268-', '269-', '270-', '271-',
'272-', '273-', '274-', '275-', '276-', '277-', '278-', '279-', '280-', '281-', '282-', '283-',
'284-', '285-', '286-', '287-', '288-', '289-', '290-', '291-', '292-', '293-', '294-', '295-',
'296-', '297-', '298-', '299-', '300-', '301-', '302-', '303-', '304-', '305-', '306-', '307-',
'308-', '309-', '310-', '311-', '312-', '313-', '314-', '315-', '316-', '317-', '318-', '319-',
'320-', '321-', '322-', '323-', '324-', '325-', '326-', '327-', '328-', '329-', '330-', '331-',
'332-', '333-', '334-', '335-', '336-', '337-', '338-', '339-', '340-', '341-', '342-', '343-',
'344-', '345-', '346-', '347-', '348-', '349-', '350-', '351-', '352-', '353-', '354-', '355-',
'356-', '357-', '358-', '359-', '360-', '361-', '362-', '363-', '364-', '365-', '366-', '367-',
'368-', '369-', '370-', '371-', '372-', '373-', '374-', '375-', '376-', '377-', '378-', '379-',
'380-', '381-', '382-', '383-', '384-', '385-', '386-', '387-', '388-', '389-', '390-', '391-',
'392-', '393-', '394-', '395-', '396-', '397-', '398-', '399-', '400-', '401-', '402-', '403-',
'404-', '405-', '406-', '407-', '408-', '409-', '410-', '411-', '412-', '413-', '414-', '415-',
'416-', '417-', '418-', '419-', '420-', '421-', '422-', '423-', '424-', '425-', '426-', '427-',
'428-', '429-', '430-', '431-', '432-', '433-', '434-', '435-', '436-', '437-', '438-', '439-',
'440-', '441-', '442-', '443-', '444-', '445-', '446-', '447-', '448-', '449-', '450-', '451-',
'452-', '453-', '454-', '455-', '456-', '457-', '458-', '459-', '460-', '461-', '462-', '463-',
'464-', '465-', '466-', '467-', '468-', '469-', '470-', '471-', '472-', '473-', '474-', '475-',
'476-', '477-', '478-', '479-', '480-', '481-', '482-', '483-', '484-', '485-', '486-', '487-',
'488-', '489-', '490-', '491-', '492-', '493-', '494-', '495-', '496-', '497-', '498-', '499-',
'500-', '501-', '502-', '503-', '504-', '505-', '506-', '507-', '508-', '509-', '510-', '511-')

The hashed key is prefixed with endpoint_id % 511 and a '-', which is why the split points are defined this way.
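
Concretely, a key built that way would come from an expression roughly like the following
(staging table name hypothetical; assumes Phoenix's % modulus operator and TO_CHAR on
numbers):

-- hypothetical sketch: derive the manual-salt key from the natural key
SELECT TO_CHAR(endpoint_id % 511) || '-' || meter_key AS hashed_key
FROM fma.staging_readings;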

I also added two secondary indexes, because I need to be able to query by meter_key or by
endpoint_id + sample_point:

CREATE INDEX fma_er_keyed_gz_hashed_indexed_endpoint_include_sample_point
    ON fma.er_keyed_gz_hashed_indexed_meterkey_immutable (endpoint_id)
    INCLUDE (sample_point) SALT_BUCKETS = 256; -- 256 is the max

CREATE INDEX fma_er_keyed_gz_hashed_indexed_meterkey
    ON fma.er_keyed_gz_hashed_indexed_meterkey_immutable (meter_key)
    SALT_BUCKETS = 256; -- 256 is the max

This all seemed to work well.

Then I used the CSV bulk import tool and imported about 1 billion rows.

This also seemed to work.

I can query by hashed_key and get immediate results.

If I query by endpoint_id or meter_key, it is MUCH slower (clue 1 that there’s a problem),
but it gives me results eventually.
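
One quick way to see whether those queries are going through the index tables at all is
EXPLAIN (the endpoint value here is hypothetical):

EXPLAIN SELECT sample_point
FROM fma.er_keyed_gz_hashed_indexed_meterkey_immutable
WHERE endpoint_id = 12345;

A plan that reports a FULL SCAN over the data table, rather than a RANGE SCAN over the
index table, means the index isn't being used.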

However, when I try to get a count, it returns 0:


> select count(*) from fma.er_keyed_gz_hashed_indexed_meterkey_immutable;
+------------+
|  COUNT(1)  |
+------------+
| 0          |
+------------+
1 row selected (0.075 seconds)
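
If that count is being satisfied from one of the index tables, and the bulk load left those
tables empty, a zero could come back even though the data table has rows. One way to test
that theory is to force the plan onto the data table with Phoenix's NO_INDEX hint:

SELECT /*+ NO_INDEX */ COUNT(*)
FROM fma.er_keyed_gz_hashed_indexed_meterkey_immutable;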

And when I try to do a RowCounter job, it fails completely (see below).

Can anyone help me diagnose what is going on here?

This is a cluster with six large region servers, each of which hosts about 600 regions (from this
and other tables).

This table has about 12 columns and 1 billion rows currently.

Thanks!

#  hbase org.apache.hadoop.hbase.mapreduce.RowCounter fma.er_keyed_gz_hashed_indexed_meterkey_immutable;


2015-07-25 06:53:36,425 DEBUG [main] util.RegionSizeCalculator: Region sizes calculated
2015-07-25 06:53:36,427 WARN  [main] hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period"
is deprecated. Instead, use "hbase.client.scanner.timeout.period"
2015-07-25 06:53:36,468 WARN  [main] client.ConnectionManager$HConnectionImplementation: Encountered
problems when prefetch hbase:meta table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in hbase:meta for table: fma.er_keyed_gz_hashed_indexed_meterkey_immutable,
row=fma.er_keyed_gz_hashed_indexed_meterkey_immutable,,99999999999999
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:164)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.prefetchRegionCache(ConnectionManager.java:1222)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1286)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1135)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1118)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1075)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:909)
        at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:528)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:165)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)
2015-07-25 06:53:36,471 INFO  [main] mapreduce.JobSubmitter: Cleaning up the staging area
/user/root/.staging/job_1437395072897_1775
Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: fma.er_keyed_gz_hashed_indexed_meterkey_immutable
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1319)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1135)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1118)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1075)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:909)
        at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:528)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:165)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)

