Before Phoenix 4.10, an UPSERT SELECT runs from a single client (though it is multi-threaded). With 4.10 and above, the statement runs distributed across your cluster, so performance should improve. Note that if the source table is taking writes while the UPSERT SELECT is running, the statement will miss those writes.
Another option would be to write your own MapReduce (MR) job to populate the table.
We are trying to populate a Phoenix table based on a 1:1 projection of another table with around 15,000,000,000 (15 billion) records via an UPSERT SELECT in the Phoenix client. We've noticed very poor performance (I suspect the client is using a single-threaded approach) and lots of issues with client timeouts.
Is there a better way of approaching this problem?
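For reference, the kind of statement being discussed can be sketched as follows. This is only an illustration: the table and column names (`TARGET_TABLE`, `SOURCE_TABLE`, `ID`, `COL_A`) are hypothetical, and the helper functions are not part of Phoenix. The optional primary-key range is an assumption that was not suggested in the thread; it shows one way a large copy might be split into smaller statements if a single statement keeps hitting client timeouts, and it assumes a numeric key. The caveat about concurrent writes to the source table would still apply to each statement.

```python
def build_upsert_select(target, source, columns, pk=None, lower=None, upper=None):
    """Build an UPSERT SELECT string that copies `columns` 1:1 from source to target.

    If `pk` is given, the copy is restricted to lower <= pk < upper
    (hypothetical numeric primary key; an assumption for illustration).
    """
    col_list = ", ".join(columns)
    sql = f"UPSERT INTO {target} ({col_list}) SELECT {col_list} FROM {source}"
    if pk is not None:
        sql += f" WHERE {pk} >= {lower} AND {pk} < {upper}"
    return sql


def key_ranges(start, stop, step):
    """Yield half-open (lo, hi) ranges that together cover [start, stop)."""
    lo = start
    while lo < stop:
        hi = min(lo + step, stop)
        yield lo, hi
        lo = hi


if __name__ == "__main__":
    # Print one statement per key range; each could be run separately.
    for lo, hi in key_ranges(0, 30, 10):
        print(build_upsert_select("TARGET_TABLE", "SOURCE_TABLE",
                                  ["ID", "COL_A"], "ID", lo, hi))
```

Each generated string would be executed through whatever client is already in use. Whether splitting the work actually helps depends on the key distribution and the Phoenix version, so treat it as something to try rather than a recommendation.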