phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Understanding keys
Date Thu, 23 Jul 2015 14:39:03 GMT
Filtering a query on the leading columns of the primary key (i.e. [A],
[A,B], or [A,B,C]) will give optimal performance. This is because the
records are in sorted order based on the combination of [A,B,C], so
filtering on a leading subset of the primary key is basically the same as
filtering on the full primary key. The set of records that pass the query
filter are all within a single defined range of records (not considering
salt buckets and multi-tenant tables).

Filtering on [B] or [B,C] may offer relatively decent performance,
depending on the cardinality and data type of primary key column A. This is
due to the way that Phoenix makes use of skip scans [1].

The farther you get away from using the leading column(s) in a filter, the
more rows that Phoenix will need to check in order to return a result,
which equals slower query performance.

The difference between having a primary key of (A,B,C) compared to a
primary key of (A, B) and a secondary index on C is that the secondary
index on C will allow much faster lookups if you're searching for a row
with a given value for C. I expect that the performance of a query where
you specify a filter on A, B, and C in this case will also be slower than
if you have a primary key on (A, B, C), as again Phoenix will have to scan
over more rows than if the primary key (A, B, C) were defined.

- Gabriel

1. http://phoenix.apache.org/skip_scan.html


On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack <Zack.Riesland@sensus.com>
wrote:

>  This is probably a silly question… please humor me: I’m a Java/JS
> developer learning about databases as I go.
>
>
>
> Suppose I have a table with columns A-Z, and declare the primary key to be
> (A, B, C).
>
>
>
> I understand that that forces each row to have a unique A, B, C
> combination.
>
>
>
> But what does it mean for querying? Can I very quickly query again column
> A and/or B and/or C, or just the combination of the 3?
>
>
>
> What is the difference between primary key (A, B, C) and primary key (A,
> B) with a secondary index on C?
>
>
>
> Thanks!
>

Mime
View raw message