phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: row qualifier issue
Date Tue, 16 Sep 2014 17:23:29 GMT
Good question on first versus last column. Believe it or not, the
order of column qualifiers has a bigger impact than you might think.
Anoop added a very nice optimization in our prior release for this and
Lars has done work at the HBase level to improve things as well. The
case where it matters most is when you're filtering on a column qualifier.
HBase will be fastest if you're filtering on the first column
qualifier in a column family (because it doesn't have to do any
additional processing after it's positioned at the row). Since we
never explicitly filter on the empty key value (it's used more as a
way of ensuring that a row is not filtered out for queries with
clauses like IS NULL), I think it's best to have it be the last one.
However, there may be edge cases where it's better at the beginning. It
would be interesting to run a full perf test with it at the beginning
to see what the impact would be.
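
On why _0 comes up last in a shell scan: HBase orders column qualifiers
within a family by their raw bytes, and the underscore (0x5F) sorts after
the ASCII digits and uppercase letters that unquoted Phoenix column names
are normalized to. Below is a minimal standalone Java sketch of that
ordering. It is not Phoenix or HBase code: the class QualifierOrder, its
helper methods, and the sample column names are hypothetical, and the
comparator is a hand-written unsigned lexicographic byte comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Phoenix or HBase.
class QualifierOrder {

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return a.length - b.length;
    }

    // Returns the given qualifier names sorted by their UTF-8 bytes.
    static List<String> sortQualifiers(String... names) {
        List<String> sorted = new ArrayList<>(Arrays.asList(names));
        sorted.sort((x, y) -> compareUnsigned(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Sample qualifiers are assumptions chosen for illustration.
        System.out.println(sortQualifiers("_0", "NAME", "AGE", "name"));
        // prints [AGE, NAME, _0, name]
    }
}
```

Note that a qualifier starting with a lowercase letter (for example a
quoted, case-sensitive column name) would sort after _0, so _0 is only
last when every other qualifier in the family sorts before the underscore.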

Thanks,
James

On Tue, Sep 16, 2014 at 9:55 AM, Abe Weinograd <abe@flonet.com> wrote:
> Great.  Thanks for the feedback James.  Is it weird that the column is the
> last one and not the first one?  We are just using 1 column family across
> the board anyways.
>
> Additionally, is there an easy way to add this value to an existing table?
> I can just write a quick app to scan and put the value in myself.
>
> On Tue, Sep 16, 2014 at 11:35 AM, Abe Weinograd <abe@flonet.com> wrote:
>>
>> Hello,
>>
>> I am trying to figure out an issue which I think is based on how we are
>> loading data into our tables.
>>
>> We are loading data via a Map Reduce process into HFiles.  We haven't been
>> adding anything for the qualifier column _0 and our performance has
>> suffered.  Thanks to a quick look from a colleague, we noticed that it
>> wasn't there. I am trying to figure out the best way to do this, but what
>> should we actually put in the value for the qualifier?  Looking at the
>> Phoenix code base it looks like the value should be Bytes.toBytes(_0).  If
>> this changes will we have issues going forward?  Does it matter what the
>> value is?
>>
>> Also, would we get better performance having the row qualifier in a
>> different column family than all of our other columns for performance
>> reasons?
>>
>> I would think that the _0 would be first in the column list, but when I do
>> a scan in the HBase shell it comes up last.  Is that right?  Our columns
>> don't start with anything that would make this appear this way.
>>
>> Thanks in advance for your help.  Is there somewhere I can read up on
>> anything I may be missing here?
>>
>> Thanks,
>> Abe
>
>
