phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Riesland, Zack" <Zack.Riesl...@sensus.com>
Subject RE: Understanding keys
Date Thu, 23 Jul 2015 18:59:49 GMT
Thanks a lot for that explanation, Gabriel.

From: Gabriel Reid [mailto:gabriel.reid@gmail.com]
Sent: Thursday, July 23, 2015 10:39 AM
To: user@phoenix.apache.org
Subject: Re: Understanding keys

Filtering a query on the leading columns of the primary key (i.e. [A], [A,B], or [A,B,C])
will give optimal performance. This is because the records are in sorted order based on the
combination of [A,B,C], so filtering on a leading subset of the primary key is basically the
same as filtering on the full primary key. The set of records that pass the query filter are
all within a single defined range of records (not considering salt buckets and multi-tenant
tables).

Filtering on [B] or [B,C] may offer relatively decent performance, depending on the cardinality
and data type of primary key column A. This is due to the way that Phoenix makes use of skip
scans [1].

The farther you get away from using the leading column(s) in a filter, the more rows that
Phoenix will need to check in order to return a result, which equals slower query performance.

The difference between having a primary key of (A,B,C) compared to a primary key of (A, B)
and a secondary index on C is that the secondary index on C will allow much faster lookups
if you're searching for a row with a given value for C. I expect that the performance of a
query where you specify a filter on A, B, and C in this case will also be slower than if you
have a primary key on (A, B, C), as again Phoenix will have to scan over more rows than if
the primary key (A, B, C) were defined.

- Gabriel

1. http://phoenix.apache.org/skip_scan.html


On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack <Zack.Riesland@sensus.com<mailto:Zack.Riesland@sensus.com>>
wrote:
This is probably a silly question… please humor me: I’m a Java/JS developer learning about
databases as I go.

Suppose I have a table with columns A-Z, and declare the primary key to be (A, B, C).

I understand that that forces each row to have a unique A, B, C combination.

But what does it mean for querying? Can I very quickly query again column A and/or B and/or
C, or just the combination of the 3?

What is the difference between primary key (A, B, C) and primary key (A, B) with a secondary
index on C?

Thanks!
Mime
View raw message