phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Best strategy for UPSERT SELECT in large table
Date Fri, 16 Jun 2017 15:27:54 GMT
Hi Pedro,
Before 4.10, an UPSERT SELECT like this runs from a single client (though it
is multi-threaded). With 4.10 and above, the statement runs distributed across
your cluster, so performance should improve. Note that if the source table is
taking writes while the UPSERT SELECT is running, the statement will miss
those writes.
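
As an illustration of the kind of statement being discussed, here is a minimal
Java/JDBC sketch. It is not taken from the thread: the table names, column
names, and timeout value are hypothetical placeholders. Whether the
phoenix.query.timeoutMs setting is picked up when passed as a connection
property, and whether it is enough to avoid the client timeouts, are
assumptions to verify on your own cluster. The JDBC part only runs if a
Phoenix JDBC URL is passed as the first argument and the Phoenix client jar is
on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class UpsertSelectSketch {

    // Builds a 1:1 projection statement of the form
    // "UPSERT INTO target (c1, c2) SELECT c1, c2 FROM source".
    public static String buildUpsertSelect(String target, String source, List<String> columns) {
        String cols = String.join(", ", columns);
        return "UPSERT INTO " + target + " (" + cols + ") SELECT " + cols + " FROM " + source;
    }

    // Runs the statement over JDBC. Requires a Phoenix JDBC driver on the classpath.
    static void run(String jdbcUrl, String sql) throws SQLException {
        Properties props = new Properties();
        // Assumption: a raised client-side query timeout (one hour here) helps
        // with long-running statements; the value is illustrative only.
        props.setProperty("phoenix.query.timeoutMs", "3600000");
        try (Connection conn = DriverManager.getConnection(jdbcUrl, props)) {
            // With auto-commit on, rows are committed as the statement runs
            // instead of being held on the client until an explicit commit.
            conn.setAutoCommit(true);
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(sql);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical table and column names.
        String sql = buildUpsertSelect(
                "TARGET_TABLE", "SOURCE_TABLE", Arrays.asList("ID", "COL_A", "COL_B"));
        System.out.println(sql);
        if (args.length > 0) {
            run(args[0], sql); // e.g. a URL of the form jdbc:phoenix:<zookeeper quorum>
        }
    }
}
```

Run without arguments, the program only prints the generated statement, which
makes it easy to check the SQL before pointing it at a cluster.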

Another alternative would be to write your own MapReduce (MR) job to do the
population.

Thanks,
James

On Fri, Jun 16, 2017 at 7:51 AM Pedro Boado <pedro.boado@gmail.com> wrote:

> Hi guys,
>
> We are trying to populate a Phoenix table based on a 1:1 projection of
> another table with around 15,000,000,000 records via an UPSERT SELECT in
> the Phoenix client. We've noticed very poor performance (I suspect the
> client is using a single-threaded approach) and lots of issues with client
> timeouts.
>
> Is there a better way of approaching this problem?
>
> Cheers!
> Pedro
>
