phoenix-user mailing list archives

From Jonathan Leech <jonat...@gmail.com>
Subject Re: Best strategy for UPSERT SELECT in large table
Date Mon, 19 Jun 2017 20:01:08 GMT
I think you could add additional pk columns, but not change or remove existing ones.
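
A rough illustration of why an existing PK column's type can't just be redeclared: HBase stores the row key as raw bytes, and the bytes already written for a VARCHAR value are not the bytes a FLOAT occupies. This sketch uses plain big-endian IEEE 754 as a stand-in for Phoenix's FLOAT serialization. Phoenix's real encoding also adjusts bits so values sort correctly, so treat it as an approximation. The function names are hypothetical.

```python
import struct

def varchar_key_bytes(value: str) -> bytes:
    # Phoenix stores VARCHAR values as UTF-8 bytes in the row key.
    return value.encode("utf-8")

def float_key_bytes_approx(value: float) -> bytes:
    # Approximation only: 4-byte big-endian IEEE 754 single precision.
    # Phoenix's actual FLOAT encoding also manipulates bits for sort order.
    return struct.pack(">f", value)

old_key = varchar_key_bytes("1.5")
new_key = float_key_bytes_approx(1.5)

# A snapshot/clone keeps the old bytes as-is. Declaring the column as FLOAT
# would not rewrite them, so every existing row key would be misread.
print(old_key, len(old_key))
print(new_key, len(new_key))
```

Adding a trailing nullable PK column is different: the existing key bytes stay valid and the new column simply reads as null for old rows.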

> On Jun 19, 2017, at 11:58 AM, Michael Young <yomaiquin@gmail.com> wrote:
> 
> Regarding your idea to use the snapshot/restore method (with a new name): is it possible
> to add a PK column with that approach? For example, if I wanted to change a PK column type
> from VARCHAR to FLOAT, is this possible?
> 
> 
> 
>> On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech <jonathaz@gmail.com> wrote:
>> Also, if you're updating that many values and not doing it in bulk / MapReduce /
>> straight to HFiles, you'll want to give the region servers as much heap as possible, set
>> the store file and blocking store file limits astronomically high, and make the table's
>> memstore size (how much it holds before HBase flushes to disk) as large as possible. This
>> keeps compactions from slowing you down and causing timeouts. You can also break the
>> upsert selects into smaller chunks and manually compact in between to mitigate this. The
>> same strategy applies to other large updates in the regular HBase write path, such as
>> building or rebuilding indexes.
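
A minimal sketch of the chunking idea, assuming the leading PK column can be split into string ranges. The table names, PK column, columns and split points are hypothetical, and the naive quoting assumes keys contain no single quotes. `major_compact` is an HBase shell command, so it runs from the shell rather than through Phoenix. The limits mentioned above likely correspond to hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles (store file counts) and hbase.hregion.memstore.flush.size or the per-table MEMSTORE_FLUSHSIZE attribute (memstore size); check the names against your HBase version.

```python
from typing import List, Sequence, Tuple

def chunked_upsert_plan(
    source: str,
    target: str,
    pk: str,
    columns: Sequence[str],
    split_points: Sequence[str],
) -> List[Tuple[str, str]]:
    """Return (tool, command) steps: one UPSERT SELECT per PK range,
    with a major compaction of the target between consecutive chunks."""
    # Half-open ranges [lo, hi); None means unbounded on that side.
    bounds = [None, *split_points, None]
    ranges = list(zip(bounds, bounds[1:]))
    cols = ", ".join(columns)
    steps = []
    for i, (lo, hi) in enumerate(ranges):
        conditions = []
        if lo is not None:
            conditions.append(f"{pk} >= '{lo}'")
        if hi is not None:
            conditions.append(f"{pk} < '{hi}'")
        where = " WHERE " + " AND ".join(conditions) if conditions else ""
        steps.append(("phoenix",
                      f"UPSERT INTO {target} ({cols}) SELECT {cols} FROM {source}{where}"))
        # Compact in between chunks, not after the last one.
        if i < len(ranges) - 1:
            steps.append(("hbase shell", f"major_compact '{target}'"))
    return steps

if __name__ == "__main__":
    plan = chunked_upsert_plan("OLD_T", "NEW_T", "ID", ["ID", "V"], ["m"])
    for tool, cmd in plan:
        print(f"[{tool}] {cmd}")
```

Smaller chunks mean each UPSERT SELECT finishes well inside client timeouts, and the compaction between them keeps the store file count from reaching the blocking limit.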
>> 
>> > On Jun 18, 2017, at 11:41 AM, Jonathan Leech <jonathaz@gmail.com> wrote:
>> >
>> > Another thing to consider, but only if your 1:1 mapping keeps the primary keys the
>> > same, is to snapshot the table and restore it under the new name, with a schema that is
>> > the union of the old and new schemas. I would put the new columns in a new column family.
>> > Then use upsert select, MapReduce, or Spark to transform the data, and drop the old
>> > schema's columns. This strategy could cut the amount of work in half and avoids sending
>> > data over the network.
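
A sketch of the steps above as an ordered command list, under stated assumptions: in HBase, restoring a snapshot under a new name is done with `clone_snapshot`; the table names, snapshot name, column family `N` and the `TO_NUMBER` transform are hypothetical; and it assumes Phoenix metadata can be created over the cloned cells, which depends on your Phoenix version (column encoding in 4.10+ in particular), so try it on a small table first.

```python
from typing import Dict, List, Tuple

def snapshot_transform_plan(
    old_table: str,
    new_table: str,
    pk_columns: Dict[str, str],               # PK name -> Phoenix type
    old_columns: Dict[str, str],              # non-PK name -> Phoenix type
    new_columns: Dict[str, Tuple[str, str]],  # name -> (type, SQL expression)
    new_family: str = "N",                    # hypothetical new column family
) -> List[Tuple[str, str]]:
    snap = f"{old_table}_SNAP"
    pks = ", ".join(pk_columns)
    # Union of the old and new schemas; new columns go in their own family.
    defs = [f"{n} {t} NOT NULL" for n, t in pk_columns.items()]
    defs += [f"{n} {t}" for n, t in old_columns.items()]
    defs += [f"{new_family}.{n} {t}" for n, (t, _) in new_columns.items()]
    create = (f"CREATE TABLE {new_table} (" + ", ".join(defs)
              + f", CONSTRAINT pk PRIMARY KEY ({pks}))")
    targets = ", ".join([pks] + [f"{new_family}.{n}" for n in new_columns])
    exprs = ", ".join([pks] + [expr for _, expr in new_columns.values()])
    return [
        ("hbase shell", f"snapshot '{old_table}', '{snap}'"),
        ("hbase shell", f"clone_snapshot '{snap}', '{new_table}'"),
        ("phoenix", create),
        # Transform in place: rows never leave the cloned table.
        ("phoenix", f"UPSERT INTO {new_table} ({targets}) SELECT {exprs} FROM {new_table}"),
        ("phoenix", f"ALTER TABLE {new_table} DROP COLUMN " + ", ".join(old_columns)),
    ]

if __name__ == "__main__":
    for tool, cmd in snapshot_transform_plan(
        "OLD_T", "NEW_T", {"ID": "VARCHAR"}, {"PRICE": "VARCHAR"},
        {"PRICE_D": ("DECIMAL", "TO_NUMBER(PRICE)")},
    ):
        print(f"[{tool}] {cmd}")
```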
>> >
>> >> On Jun 17, 2017, at 5:06 PM, Randy Hu <ruweih@gmail.com> wrote:
>> >>
>> >> If I count the number of trailing zeros correctly, that's 15 billion records, so
>> >> any solution based on HBase PUT interaction (UPSERT SELECT) would probably
>> >> take much longer than you expect. It would be better to use the
>> >> MapReduce-based bulk importer provided by Phoenix:
>> >>
>> >> https://phoenix.apache.org/bulk_dataload.html
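
Some back-of-the-envelope arithmetic behind that estimate: "15.000.000.000" uses dots as thousands separators, i.e. 15 billion rows. The write rates below are hypothetical inputs, not measurements from this thread; substitute whatever rate you actually observe.

```python
def hours_needed(rows: int, rows_per_second: float) -> float:
    """Wall-clock hours to write `rows` at a sustained `rows_per_second`."""
    return rows / rows_per_second / 3600

ROWS = 15_000_000_000  # "15.000.000.000" in the original post

# Hypothetical sustained throughputs, for illustration only.
for rate in (10_000, 100_000, 1_000_000):
    print(f"{rate:>9,} rows/s -> {hours_needed(ROWS, rate):8.1f} h")
```

Even at a sustained million rows per second the job runs for hours; at rates typical of a single client it runs for days, which is the point of moving to bulk loading.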
>> >>
>> >> The importer uses HBase bulk loading to convert all the data into HBase
>> >> storage files (HFiles) and then hands them over to HBase in the final stage,
>> >> thus avoiding the network and random disk access costs of going through the
>> >> HBase region servers.
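
The page linked above launches the MapReduce loader, `org.apache.phoenix.mapreduce.CsvBulkLoadTool`, with `hadoop jar`. This sketch only assembles that command line. The client jar path, table name, input path and ZooKeeper quorum are hypothetical. For a table-to-table projection it also assumes the source rows were first exported to CSV on HDFS, a step not covered here.

```python
import shlex
from typing import List, Optional

def csv_bulk_load_argv(client_jar: str, table: str, input_path: str,
                       zookeeper: Optional[str] = None) -> List[str]:
    # Mirrors the documented form:
    #   hadoop jar <phoenix client jar> org.apache.phoenix.mapreduce.CsvBulkLoadTool
    #       --table <TABLE> --input <hdfs path>
    argv = ["hadoop", "jar", client_jar,
            "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
            "--table", table, "--input", input_path]
    if zookeeper:
        # Optional: ZooKeeper quorum of the target cluster.
        argv += ["--zookeeper", zookeeper]
    return argv

if __name__ == "__main__":
    argv = csv_bulk_load_argv("phoenix-client.jar", "NEW_T",
                              "/data/new_t.csv", "zk1,zk2,zk3:2181")
    # Print a shell-safe command; argv could also be passed to subprocess.run.
    print(shlex.join(argv))
```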
>> >>
>> >> Randy
>> >>
>> >> On Fri, Jun 16, 2017 at 9:51 AM, Pedro Boado [via Apache Phoenix User List]
>> >> <ml+s1124778n3675h74@n5.nabble.com> wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> We are trying to populate a Phoenix table based on a 1:1 projection of
>> >>> another table with around 15.000.000.000 records via an UPSERT SELECT in
>> >>> the Phoenix client. We've noticed very poor performance (I suspect the
>> >>> client is using a single-threaded approach) and lots of issues with client
>> >>> timeouts.
>> >>>
>> >>> Is there a better way of approaching this problem?
>> >>>
>> >>> Cheers!
>> >>> Pedro
>> >>>
>> >>
>> >> --
>> >> View this message in context: http://apache-phoenix-user-list.1124778.n5.nabble.com/Best-strategy-for-UPSERT-SELECT-in-large-table-tp3675p3683.html
>> >> Sent from the Apache Phoenix User List mailing list archive at Nabble.com.
> 
