phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: Copy table between hbase clusters
Date Mon, 14 Dec 2015 13:38:29 GMT
Great!

I realized that my confusion was that I forgot about the salt byte at
the first position in the key.

Here's the working code if someone needs to do the same thing.

Thanks again James.

    HBaseClient client = new HBaseClient("zookeeper");
    Scanner scanner = client.newScanner("T1");
    ArrayList<ArrayList<KeyValue>> rows;
    while (null != (rows = scanner.nextRows(100).join())) {
      for (ArrayList<KeyValue> list : rows) {
        for (KeyValue kv : list) {
          byte[] key = kv.key();
          // skip salt byte
          int offset = 1;
          // flip integer sign bit
          int c1 =  (Bytes.getInt(key, offset) ^ (1 << 31));
          offset += 4;
          int c2 = (Bytes.getInt(key, offset) ^ (1 << 31));
          offset += 4;
          // flip long sign bit
          long c3 = (Bytes.getInt(key, offset) ^ (1 << 31));  // 16
          offset += 4;
          long c4 = (Bytes.getLong(key, offset) ^ (1L << 63));  // 32
          offset += 8;
          System.out.println("c1=" + c1 + ",c2=" + c2 + ",c3=" + c3 +
",c4=" + c4 +  " " + new String(key, offset, 2));
        }
      }
    }

On Sat, Dec 12, 2015 at 7:02 AM, James Taylor <jamestaylor@apache.org> wrote:
> My mistake - I thought you were going to do an in-place upgrade. No, no post
> processing will be required if the data is UPSERT'ed through the standard
> Phoenix APIs.
>
> I don't know how/why your row keys aren't the correct number of bytes. Maybe
> your row key is different than what you think?
>
> Thanks,
> James
>
> On Fri, Dec 11, 2015 at 1:36 PM, Kristoffer Sjögren <stoffe@gmail.com>
> wrote:
>>
>> My concern regarding the row key is that they are 23 bytes in reality
>> instead of 26 bytes? Hmm. Guess I could write a value manually using
>> SQL and inspect it afterwards using asynchbase to find where those 3
>> missing bytes went.
>>
>> Not sure I understand the post procedure. I'm not trying to do an
>> in-place upgrade. The table will be created from scratch in Phoenix
>> 4.4.0 / HBase 1.1.2. The data will be read from the old installation
>> using asynchbase and manually UPSERT'ed from that raw data using SQL.
>>
>> The reason I do it this way is to avoid such manual post procedures.
>> So just to be clear, will this approach still require those post
>> procedures?
>>
>>
>> Thanks James.
>>
>> On Fri, Dec 11, 2015 at 5:28 PM, James Taylor <jamestaylor@apache.org>
>> wrote:
>> > Your analysis of the row key structure is correct. Those are all fixed
>> > types
>> > (4 + 4 + 8 +8 + 2 = 26 bytes for the key).
>> >
>> > If you're going from 0.94 to 0.98, there's stuff you need to do to get
>> > your
>> > data into the new format. Best to ask about this on the HBase user list
>> > or
>> > look it up in the reference docs.
>> >
>> > Once you get your data moved over, I'd recommend reissuing your DDL,
>> > specifying a CURRENT_SCN at connection time of a timestamp in millis
>> > prior
>> > to any of your cell timestamps (so that Phoenix doesn't set empty key
>> > value
>> > markers for every row). If your using any date/time types then your DDL
>> > should specify them as their UNSIGNED equivalent. If you have DESC row
>> > key
>> > declarations for variable length types or BINARY, there's some post
>> > processing steps you'll need to do, but otherwise you should be good to
>> > go
>> > (FWIW, we had an internal team successfully go through this about 6mo
>> > back).
>> >
>> > Thanks,
>> > James
>> >
>> > On Friday, December 11, 2015, Kristoffer Sjögren <stoffe@gmail.com>
>> > wrote:
>> >>
>> >> My plan is to try use asynchbase to read the raw data and then upsert
>> >> it using Phoenix SQL.
>> >>
>> >> However, when I read the old table the data types for the row key
>> >> doesn't add up.
>> >>
>> >> CREATE TABLE T1 (C1 INTEGER NOT NULL, C2 INTEGER NOT NULL, C3 BIGINT
>> >> NOT NULL, C4 BIGINT NOT NULL, C5 CHAR(2) NOT NULL, V BIGINT CONSTRAINT
>> >> PK PRIMARY KEY ( C1, C2, C3, C4, C5 ))
>> >>
>> >> That's 4 + 4 + 8 +8 + 2 = 26 bytes for the key. But the actual key
>> >> that I read from HBase is only 23 bytes.
>> >>
>> >> [0, -128, 0, 0, 0, -44, 4, 123, -32, -128, 0, 0, 10, -128, 0, 0, 0, 0,
>> >> 0, 0, 0, 32, 32]
>> >>
>> >> Maybe the data type definitions as described on the phoenix site have
>> >> changed since version 2.2.3? Or some data type may be variable in
>> >> size?
>> >>
>> >>
>> >> On Thu, Dec 10, 2015 at 4:49 PM, Kristoffer Sjögren <stoffe@gmail.com>
>> >> wrote:
>> >> > Hi
>> >> >
>> >> > We're in the process of upgrading from Phoenix 2.2.3 / HBase 0.96 to
>> >> > Phoneix 4.4.0 / HBase 1.1.2 and wanted to know the simplest/easiest
>> >> > way to copy data from old-to-new table.
>> >> >
>> >> > The tables contain only a few hundred million rows so it's OK to
>> >> > export locally and then upsert.
>> >> >
>> >> > Cheers,
>> >> > -Kristoffer
>
>

Mime
View raw message