phoenix-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Getting too many open files during table scan
Date Tue, 20 Jun 2017 18:54:20 GMT
I think this is more an issue with your 78 salt buckets than with the width 
of your table. Each chunk, running in parallel, is spilling incremental 
counts to disk.

I'd check the ulimit settings on the node you run this query from and try 
increasing the number of open files allowed before going into this one in 
more depth :)
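
For reference, the bucket count is fixed when the table is created, and a 
full table scan is split into parallel chunks (PARALLEL 78-WAY in the plan 
below); per the stack trace, each chunk's SpoolingResultIterator spills to 
its own client-side temp file once the in-memory spool threshold is 
exceeded, and all of those temp files count against the client process's 
open-file limit. A minimal, hypothetical DDL sketch (the schema, table, and 
column names are illustrative only, not from this thread):

  -- Hypothetical example: SALT_BUCKETS is fixed at CREATE TABLE time and
  -- sets the minimum fan-out of a full scan; each parallel chunk may open
  -- its own spool file on the client once results no longer fit in memory.
  CREATE TABLE EXAMPLE.WIDE_FACT (
      ORG_ID       VARCHAR NOT NULL,
      EVENT_DATE   DATE NOT NULL,
      METRIC_COUNT BIGINT
      CONSTRAINT PK PRIMARY KEY (ORG_ID, EVENT_DATE)
  ) SALT_BUCKETS=78;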

On 6/16/17 2:31 PM, Michael Young wrote:
> 
> We are running a 13-node HBase cluster.  One table uses 78 SALT BUCKETS, 
> which seems to work reasonably well for both reads and writes.  This table 
> has 130 columns, with a PK composed of 30 columns (a fairly wide table).
> 
> However, after adding several new tables we are seeing errors about too 
> many open files when running a full table scan.
> 
> 
> Caused by: org.apache.phoenix.exception.PhoenixIOException: Too many open files
>         at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
>         at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:152)
>         at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:84)
>         at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:63)
>         at org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:79)
>         at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:112)
>         at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:103)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Too many open files
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createTempFile(File.java:2024)
>         at org.apache.phoenix.shaded.org.apache.commons.io.output.DeferredFileOutputStream.thresholdReached(DeferredFileOutputStream.java:176)
>         at org.apache.phoenix.iterate.SpoolingResultIterator$1.thresholdReached(SpoolingResultIterator.java:116)
>         at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.checkThreshold(ThresholdingOutputStream.java:224)
>         at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.write(ThresholdingOutputStream.java:92)
>         at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>         at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
>         at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
>         at org.apache.phoenix.util.TupleUtil.write(TupleUtil.java:149)
>         at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:127)
>         ... 10 more
> 
> 
> When running an explain plan:
> explain select count(1) from MYBIGTABLE
> 
> +------------------------------------------------------------------------------------------------------------------+
> |                                                        PLAN                                                       |
> +------------------------------------------------------------------------------------------------------------------+
> | CLIENT 8728-CHUNK 674830174 ROWS 2721056772632 BYTES PARALLEL 78-WAY FULL SCAN OVER ATT.PRE_ENG_CONVERSION_OLAP   |
> |     ROW TIMESTAMP FILTER [0, 9223372036854775807)                                                                 |
> |     SERVER FILTER BY FIRST KEY ONLY                                                                               |
> |     SERVER AGGREGATE INTO SINGLE ROW                                                                              |
> +------------------------------------------------------------------------------------------------------------------+
> 
> It has a lot of chunks.  Normally this query would return at least some 
> result after running for a few minutes.  With appropriate filters in the 
> WHERE clause, the queries run fine (see the sketch after this message).
> 
> Any suggestions on how to avoid this error and get better performance 
> from the table scans?  We realize we don't need to run full table scans 
> regularly; we're just trying to better understand best practices for 
> Phoenix/HBase.
> 
> Thank you,
> Michael
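
On the note above that queries with appropriate WHERE filters run fine: a 
filter on the leading primary-key columns (or on the row timestamp) should 
let Phoenix prune the scan to a small key range within each salt bucket 
rather than walking all 8728 chunks, so far fewer spool files get opened. 
A rough sketch, reusing the hypothetical columns from the DDL earlier (the 
real key columns aren't shown in this thread):

  -- Hypothetical filtered aggregate; table and column names are
  -- illustrative only. Constraining the leading PK columns keeps the scan
  -- to a narrow range per salt bucket instead of a full table scan.
  SELECT COUNT(1)
  FROM EXAMPLE.WIDE_FACT
  WHERE ORG_ID = 'org-123'
    AND EVENT_DATE >= TO_DATE('2017-06-01', 'yyyy-MM-dd')
    AND EVENT_DATE <  TO_DATE('2017-06-16', 'yyyy-MM-dd');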
