phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrianos Dadis <add...@gmail.com>
Subject Slow
Date Thu, 07 Apr 2016 15:20:43 GMT
Hello all,

as part of a larger project, we are evaluating Phoenix performance compared
to pure HBase API.

Demo cluster have 6 slave nodes (plus required nodes for additional helper
nodes) and all nodes are real machines (not VMs). Cluster is running
Cloudera 5.4.5 distribution (Hadoop, HBase, etc.).
Phoenix distribution (4.5.2 with some changes by CDH) is provided by
Cloudera too, but it has some minor alterations in order to be compatible
with CDH 5.4.5.

Our code use phoenix version required by CDH.
<dependency>
      <groupId>org.apache.phoenix</groupId>
      <artifactId>phoenix-core</artifactId>
      <version>4.5.2-cdh5.4.5</version>
</dependency>

For Phoenix and Cloudera details please check this:
https://blog.cloudera.com/blog/2015/11/new-apache-phoenix-4-5-2-package-from-cloudera-labs/


When running our test, we saw a considerably slower performance on the
select query on row key.

Our test scenario is the following:
1) WRITE to HBase 4.000.000 rows using Phoenix
2) When WRITE is completed, then FETCH random (existed) ROWKEYs from HBase
(using Phoenix). Code that run FETCH is running on the same nodes with
HBase region servers (using Apache Storm streaming engine).


The CREATE statement used on my example and the select query code are below:
CREATE TABLE IF NOT EXISTS TA (ID varchar not null primary key, TA1
varchar, TA2 varchar, TA3 varchar, TA4 varchar, ta5 varchar) SPLIT ON
('666833', '1333666', '2000500','2667333', '3334166');

The only code related to Phoenix is the following:

public void findById(String id) {
  Connection conn = DriverManager.getConnection(databaseURL);
  PreparedStatement ps = null;
  ResultSet rs = null;
  try {
    ps = conn.prepareStatement("select TA1,TA2,TA4 from TA where ID = ?");
    ps.setString(1, id);
    rs = ps.executeQuery();
    while (rs.next()) {
      LOG.debug("Fetched row OK");
    }
  } catch (SQLException ex) {
    LOG.error("Could not fetch data for Id {}", Id, ex);
  } finally {
    rs.close();
    ps.close();
    conn.close();
  }
}

The explain query (explain select TA1,TA2,TA4 from TA where ID = '10')
returns the following plan:
| CLIENT 1-CHUNK PARALLEL 1-WAY POINT LOOKUP ON 1 KEY OVER TA |

We have used metrics to get the mean execution time of each part of
previous code:
- ps.executeQuery(): 0.3ms
- rs.next(): 1.2ms

The query returns only one row and it seems that the slow part is
rs.next().
The total execution of the same query with HBase API is about 0.3ms ,
making phoenix 5 times slower.

Just to say that the upsert query of a row (not shown in code) takes about
the same time on both implementations (Phoenix and HBase API).

Is it normal or there is something that I should improve?

Thanks,
Adrianos Dadis.

Mime
View raw message