phoenix-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Secondary index row explosion due to N key combos to handle ad-hoc queries?
Date Thu, 27 Mar 2014 22:02:08 GMT
Hi,

I wanted to extract the following in a separate thread:

> I was going to ask about partitioning as a way to handle (querying against)
> large volumes of data.  This is related to my Q above about date-based
> partitioning.  But I'm wondering if one can go further.  Partitioning by
> date, partitioning by tenant, but then also partitioning by some other
> columns, which would be different for each type of data being inserted.
> e.g. for sales data maybe the partitions would be date, tenantID, but then
> also customerCountry, customerGender, etc.  For performance metrics data
> maybe it would be date, tenantID, but then also environment (prod vs. dev),
> or applicationType (e.g. my HBase cluster performance metrics vs. my Tomcat
> performance metrics), and so on.
>

> Essentially, a secondary index is declaring a partitioning. The indexed
> columns make up the row key which in HBase determines the partitioning.

Aha!  Hmmm.  But, as far as I know, how one constructs the key is.... the
key.  That is, doesn't one typically construct the key based on access
patterns?

How would that work in the scenario I described in my other email -
unknown number of columns and ad-hoc SQL queries?

How do you handle the above without having to create all possible
combinations of columns (to anticipate any sort of query) and having to
insert N rows in the index table for each 1 row in the primary table?
Don't you have to do that in order to handle any ad-hoc query one may
choose to run?
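
To put a rough number on that worry, here is a small Python sketch (not
Phoenix code) that enumerates the column combinations one would have to
index to anticipate every query. The column names are just the ones from
the sales example above, and the assumption that each index costs one
extra index row per primary row is mine; treat the figures as an
illustration of the growth rate, not as a statement about how Phoenix
must be used.

```python
from itertools import combinations, permutations


def index_candidates(columns, ordered=False):
    """Enumerate every non-empty column combination an index could cover.

    With ordered=True, column order matters (as it does in a row key),
    so each ordering of a subset counts as a separate candidate.
    """
    pick = permutations if ordered else combinations
    return [combo
            for r in range(1, len(columns) + 1)
            for combo in pick(columns, r)]


# Hypothetical column set, borrowed from the sales example above.
columns = ["date", "tenantID", "customerCountry", "customerGender"]

unordered = index_candidates(columns)
ordered = index_candidates(columns, ordered=True)

# Assumption: one index per candidate, one index row per primary row,
# so these counts are also the "N rows per 1 primary row" in question.
print(len(unordered))  # 2**4 - 1 = 15
print(len(ordered))    # 4 + 12 + 24 + 24 = 64
```

With only four columns that is already 15 indexes if order is ignored and
64 if key order matters, and the count grows exponentially (or worse) as
columns are added.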

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
