phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrianos Dadis <add...@gmail.com>
Subject Re: Slow
Date Thu, 07 Apr 2016 15:30:40 GMT
Please ignore the current thread as I forgot to set email subject :(

Please use the new one with the correct subject: "Slow performance on
PRIMARY KEY Queries compared to HBase API"

Adrianos Dadis.

On Thu, Apr 7, 2016 at 6:20 PM, Adrianos Dadis <adddev@gmail.com> wrote:

> Hello all,
>
> as part of a larger project, we are evaluating Phoenix performance
> compared to pure HBase API.
>
> Demo cluster have 6 slave nodes (plus required nodes for additional helper
> nodes) and all nodes are real machines (not VMs). Cluster is running
> Cloudera 5.4.5 distribution (Hadoop, HBase, etc.).
> Phoenix distribution (4.5.2 with some changes by CDH) is provided by
> Cloudera too, but it has some minor alterations in order to be compatible
> with CDH 5.4.5.
>
> Our code use phoenix version required by CDH.
> <dependency>
>       <groupId>org.apache.phoenix</groupId>
>       <artifactId>phoenix-core</artifactId>
>       <version>4.5.2-cdh5.4.5</version>
> </dependency>
>
> For Phoenix and Cloudera details please check this:
> https://blog.cloudera.com/blog/2015/11/new-apache-phoenix-4-5-2-package-from-cloudera-labs/
>
>
> When running our test, we saw a considerably slower performance on the
> select query on row key.
>
> Our test scenario is the following:
> 1) WRITE to HBase 4.000.000 rows using Phoenix
> 2) When WRITE is completed, then FETCH random (existed) ROWKEYs from HBase
> (using Phoenix). Code that run FETCH is running on the same nodes with
> HBase region servers (using Apache Storm streaming engine).
>
>
> The CREATE statement used on my example and the select query code are
> below:
> CREATE TABLE IF NOT EXISTS TA (ID varchar not null primary key, TA1
> varchar, TA2 varchar, TA3 varchar, TA4 varchar, ta5 varchar) SPLIT ON
> ('666833', '1333666', '2000500','2667333', '3334166');
>
> The only code related to Phoenix is the following:
>
> public void findById(String id) {
>   Connection conn = DriverManager.getConnection(databaseURL);
>   PreparedStatement ps = null;
>   ResultSet rs = null;
>   try {
>     ps = conn.prepareStatement("select TA1,TA2,TA4 from TA where ID = ?");
>     ps.setString(1, id);
>     rs = ps.executeQuery();
>     while (rs.next()) {
>       LOG.debug("Fetched row OK");
>     }
>   } catch (SQLException ex) {
>     LOG.error("Could not fetch data for Id {}", Id, ex);
>   } finally {
>     rs.close();
>     ps.close();
>     conn.close();
>   }
> }
>
> The explain query (explain select TA1,TA2,TA4 from TA where ID = '10')
> returns the following plan:
> | CLIENT 1-CHUNK PARALLEL 1-WAY POINT LOOKUP ON 1 KEY OVER TA |
>
> We have used metrics to get the mean execution time of each part of
> previous code:
> - ps.executeQuery(): 0.3ms
> - rs.next(): 1.2ms
>
> The query returns only one row and it seems that the slow part is
> rs.next().
> The total execution of the same query with HBase API is about 0.3ms ,
> making phoenix 5 times slower.
>
> Just to say that the upsert query of a row (not shown in code) takes about
> the same time on both implementations (Phoenix and HBase API).
>
> Is it normal or there is something that I should improve?
>
> Thanks,
> Adrianos Dadis.
>

Mime
View raw message