phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: JDBC result iteration is slow
Date Mon, 10 Mar 2014 21:19:05 GMT
Hi Abe,
I don't think this is a Phoenix issue, as your query doesn't even have a
WHERE clause. In HBase, this boils down to a regular Scan with a PageFilter
(as your explain plan shows). We parallelize it, but that just amounts to
running N scans, each over a discrete range of your row key.
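To make the "N scans over discrete ranges" idea concrete, here is a minimal sketch of splitting a key space into contiguous chunks, one per scan. This is an illustration only, not Phoenix's actual chunking code; the numeric key range and the `splitRange` helper are assumptions for the example, with 48 chunks mirroring the "CLIENT PARALLEL 48-WAY" explain plan.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelScanSketch {
    // Split [start, end) into n contiguous sub-ranges, a rough analogue of
    // issuing one HBase scan per chunk of the row-key space.
    static List<long[]> splitRange(long start, long end, int n) {
        List<long[]> ranges = new ArrayList<>();
        long span = end - start;
        for (int i = 0; i < n; i++) {
            long lo = start + span * i / n;
            long hi = start + span * (i + 1) / n;
            ranges.add(new long[]{lo, hi});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 48 chunks over a hypothetical 3.7M-row key space.
        List<long[]> ranges = splitRange(0, 3_700_000L, 48);
        System.out.println(ranges.size() + " scans, first range ends at "
                + ranges.get(0)[1]);
    }
}
```

Each sub-range becomes an independent scan, so one slow region server drags down only the chunks that land on it, which is why per-scan timings in the logs help isolate the problem.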

I suspect there's a problem with one of your region servers. Check your
server logs for exceptions. You can also check the Phoenix logs to see how
long each parallel scan is taking to isolate if any regions are slower than
others.
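Since the thread also discusses passing `hbase.client.scanner.caching` through the JDBC connection, here is a minimal sketch of how that property could be supplied. The ZooKeeper host in the commented-out URL is a placeholder, and the `connectionProps` helper is just for illustration; only the property name and the value 10000 come from this thread.

```java
import java.util.Properties;

public class ScannerCachingSketch {
    // Build connection properties carrying the scanner-caching hint
    // discussed in this thread.
    static Properties connectionProps() {
        Properties props = new Properties();
        // Ask each scanner RPC to return up to 10,000 rows at a time.
        props.setProperty("hbase.client.scanner.caching", "10000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = connectionProps();
        // Against a live cluster this would be passed to the driver, e.g.:
        // Connection conn = DriverManager.getConnection(
        //         "jdbc:phoenix:zk-host:2181", props);
        System.out.println("caching = "
                + props.getProperty("hbase.client.scanner.caching"));
    }
}
```

A low caching value means one round trip per handful of rows, which can show up as exactly the kind of slow client-side iteration described below.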

Thanks,
James


On Mon, Mar 10, 2014 at 9:49 AM, Abe Weinograd <abe@flonet.com> wrote:

> Hi James,
>
> Thanks.  Here is the info you requested.  Additionally, I assumed it was a
> client side thing because a COUNT(1) on the whole table is < 2sec after
> rows are in the cache.  The first time running COUNT(1) is usually a bit
> longer.  My table has about 3.7M rows in it.  A SELECT * (listing most
> columns in the table) is what takes longer with the CPU spiking on the
> client while the result set is being iterated over.
>
> Thanks for your help.
> Abe
>
> - HBase version: 0.94.15 (CDH 4.6)
> - Phoenix version: 2.2.3 (using tarball from
> - size of cluster: 1 Master 4 RS (each 15GB of RAM, 4 Cores)
> - setting for JVM max heap size: 4GiB
> - create table statement: attached
> - query: attached
> - explain plan:
> CLIENT PARALLEL 48-WAY FULL SCAN OVER MY_TABLE
>     SERVER FILTER BY PageFilter 100000
> CLIENT 100000 ROW LIMIT
>
> - number of rows in table: 3.7 Million (just testing with this.  this will
> be much larger over time)
>
>
> On Mon, Mar 10, 2014 at 11:44 AM, James Taylor <jamestaylor@apache.org> wrote:
>
>> Hi Abe,
>> There's likely something wrong with your installation, as this is not
>> expected behavior. Please let us know the following:
>> - HBase version
>> - Phoenix version
>> - size of cluster
>> - setting for JVM max heap size
>> - create table statement
>> - query
>> - explain plan
>> - number of rows in table
>> Thanks,
>> James
>>
>>
>> On Monday, March 10, 2014, Abe Weinograd <abe@flonet.com> wrote:
>>
>>> I spent a little more time with this and am still unable to tune the
>>> client properly.  I am testing using sqlline, Squirrel and just using the
>>> JDBC driver in code.  I tried setting the hbase scanner caching in the JDBC
>>> connection, in addition to putting it in the hbase-site.xml in the same dir
>>> as the jar for sqlline.  I think my client is bottlenecked, partly because
>>> the CPU spikes and it takes ~30 seconds to retrieve 1,000 rows.
>>>
>>> I expect to retrieve a lot more than this in our use cases.  Is this a
>>> tuning issue on my end, or is this expected behavior?
>>>
>>> Thanks,
>>> Abe
>>>
>>>
>>> On Fri, Mar 7, 2014 at 10:19 AM, Abe Weinograd <abe@flonet.com> wrote:
>>>
>>>> Trying to pull around 100k rows through the JDBC driver.  I
>>>> set hbase.client.scanner.caching to 10000 in the JDBC connection options.
>>>>  Additionally, it's very slow with even 1,000 rows (about 30 seconds to
>>>> iterate over it).
>>>>
>>>> I assume this is a client side issue, but not sure what else I can
>>>> tweak.
>>>>
>>>> Thanks,
>>>> Abe
>>>>
>>>
>>>
>
