phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: understanding execution semantics...
Date Wed, 09 Dec 2015 06:29:20 GMT
On Tue, Dec 8, 2015 at 9:33 PM, ashish tapdiya <ashishtapdiya@gmail.com>
wrote:

> Hello James,
>
> Thanks for the response. I will look into adding relevant TPC-H indexes.
>
> We are doing a performance study comparing Impala and Phoenix using TPC-H
> queries. For scale factor 1, I am not able to get numbers exhibiting a trend
> similar to the performance graph on the Phoenix website comparing Impala and
> Phoenix.
>
> I have a few questions regarding the performance graph comparing Phoenix and
> Impala for the aggregation query (select count(1)):
>
> Q1: Was the Impala data stored in parquet/text format?
>

No, the data is stored in HBase for both Impala and Phoenix, as stated by
the title "Phoenix vs Impala (running over HBase)".


> Q2: Which performance parameters (Hadoop, HBase) were tuned for Phoenix,
> and how?
>

No tuning, but I'd recommend doing a major compaction prior to running the
queries.


> Q3: Can I get the same population script so that I can report numbers from
> my local cluster?
>

You can use our bin/performance.py script to generate the data.

>
> Thanks,
> Ashish
>
> On Thu, Dec 3, 2015 at 5:35 PM, James Taylor <jamestaylor@apache.org>
> wrote:
>
>> Hi Ashish,
>> Please make sure to add a secondary index on l_shipdate + l_discount + l_quantity
>> for that query. There's a kind of well-known, canonical description of the
>> optimal indexes to add for the TPC benchmarks that I can't seem to find
>> (but maybe someone else can point you to it). You should include the
>> time to call resultset.next() as that call will block until the result is
>> calculated. Also, subsequent runs of the query will always be faster due to
>> various caching that happens (OS, HBase block cache, etc.) as well as the
>> JIT compiler kicking in.
>>
>> Thanks,
>> James
>>
>> On Thu, Dec 3, 2015 at 12:34 PM, ashish tapdiya <ashishtapdiya@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am profiling TPC-H queries using Phoenix. For query no. 6 and db size
>>> 1GB (lineitem table size is around 760 MB),
>>>
>>> Query 6 : select sum(l_extendedprice * l_discount) as revenue from
>>> lineitem_sf1 where l_shipdate >= TO_DATE('1993-01-01') and l_shipdate <
>>> TO_DATE('1994-01-01') and l_discount between 0.06 and 0.08 and l_quantity <
>>> 24
>>>
>>> the execution time was recorded using the following code:
>>>
>>> long startTime = System.currentTimeMillis();
>>> rset = stmt.executeQuery();
>>> long stopTime = System.currentTimeMillis();
>>> long elapsedTime = stopTime - startTime;
>>>
>>> first run - 240 ms
>>> second run onwards - 80 ms
>>>
>>> However, when I iterate over the result set (a single row), the query
>>> response time, including result set iteration, shoots up to 19 seconds.
>>>
>>> Does the query get executed with stmt.executeQuery(), or does Phoenix
>>> not execute the query until the first resultset.next() is invoked?
>>>
>>> The cluster includes 4 slave nodes. The Phoenix version is 4.3.0.
>>>
>>> Thanks,
>>> ~Ashish
>>>
>>
>>
>
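
The timing advice in the thread (include resultset.next() in the measurement, and expect later runs to be faster) can be sketched roughly as follows. This is only an illustration and not code from the original messages: the class name QueryTiming, the Timing holder, the three-run loop, and passing the JDBC URL as a command-line argument are all assumptions, and running it against a cluster requires the Phoenix client jar on the classpath. The SQL is query 6 as quoted above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: times executeQuery() alone and executeQuery() plus iteration.
class QueryTiming {

    /** Elapsed times for one run, in nanoseconds. */
    static final class Timing {
        final long executeNanos; // time for executeQuery() to return
        final long totalNanos;   // time including iteration over the result set
        final int rows;          // number of rows read

        Timing(long executeNanos, long totalNanos, int rows) {
            this.executeNanos = executeNanos;
            this.totalNanos = totalNanos;
            this.rows = rows;
        }
    }

    /** Runs the statement once and records both measurements. */
    static Timing time(PreparedStatement stmt) throws SQLException {
        long start = System.nanoTime();
        try (ResultSet rs = stmt.executeQuery()) {
            long afterExecute = System.nanoTime();
            int rows = 0;
            // Per the thread, next() blocks until the result is calculated,
            // so it has to be inside the timed region.
            while (rs.next()) {
                rs.getObject(1); // read the value so the row is actually fetched
                rows++;
            }
            long afterIteration = System.nanoTime();
            return new Timing(afterExecute - start, afterIteration - start, rows);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            // The URL is an assumption; Phoenix URLs have the form jdbc:phoenix:<zookeeper quorum>.
            System.err.println("usage: java QueryTiming <jdbc-url>");
            return;
        }
        String sql = "select sum(l_extendedprice * l_discount) as revenue "
                + "from lineitem_sf1 "
                + "where l_shipdate >= TO_DATE('1993-01-01') "
                + "and l_shipdate < TO_DATE('1994-01-01') "
                + "and l_discount between 0.06 and 0.08 "
                + "and l_quantity < 24";
        try (Connection conn = DriverManager.getConnection(args[0]);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            // Several runs, because the first one is typically slower (caches, JIT).
            for (int run = 1; run <= 3; run++) {
                Timing t = time(stmt);
                System.out.printf("run %d: executeQuery %d ms, with iteration %d ms, rows %d%n",
                        run, t.executeNanos / 1_000_000, t.totalNanos / 1_000_000, t.rows);
            }
        }
    }
}
```

Reporting the two numbers side by side makes the gap described in the original question visible: the first figure covers only the executeQuery() call, while the second also covers the blocking next() call.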
