phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Marcel <cmar...@salesforce.com>
Subject Re: How to add lots of dummy data to a table efficiently
Date Wed, 09 Sep 2015 15:06:09 GMT
Have you looked at the pherf module in Phoenix? You can predefined
rules and have it generate data based on that.

Sent from my iPhone

> On Sep 9, 2015, at 3:55 AM, James Heather <james.heather@mendeley.com> wrote:
>
> I've been trying to create a table and then bung lots of random data into it, but without
generating the data client-side and then upserting it. I'd rather find a way to have it generated
on the server.
>
> My table has three columns, all BIGINT, and the first is the primary key.
>
> I'd ideally like to write something like
>
>    UPSERT into linktab (id, first_val, second_val) SELECT next value for linktab_next_id,
random_func, random_func ...
>
> and complete the query so that the SELECT generates its values without needing to read
from a table. But there doesn't seem to be a way of writing a SELECT that doesn't need a table
parameter.
>
> So I've ended up doing this. First I upsert a single row into the table; then, I double
the number of rows in it by a self-referential UPSERT SELECT:
>
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 8 rows affected (0.193 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 16 rows affected (0.202 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 32 rows affected (0.205 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 64 rows affected (0.235 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 128 rows affected (0.28 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 256 rows affected (0.378 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 512 rows affected (0.53 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 1,024 rows affected (0.848 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 2,048 rows affected (1.513 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 4,096 rows affected (2.817 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 8,192 rows affected (5.314 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 16,384 rows affected (10.807 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 32,768 rows affected (21.306 seconds)
> 0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select
next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from
linktab;
> 65,536 rows affected (45.713 seconds)
> 0: jdbc:phoenix:172.31.31.143> explain upsert into linktab (id, first_val, second_val)
select next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000)
from linktab;
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | UPSERT SELECT                            |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER LINKTAB |
> |     SERVER FILTER BY FIRST KEY ONLY      |
> | CLIENT RESERVE VALUES FROM 1 SEQUENCE    |
> +------------------------------------------+
> 4 rows selected (0.192 seconds)
> 0: jdbc:phoenix:172.31.31.143>
>
>
> This works, but as you can see from the timings, it's getting quite slow, and this seems
to be because it has to do a full scan of the table to retrieve the existing rows in order
to construct the dummy data to put in. This seems a bit of a shame when it doesn't actually
need the data!
>
> Is there a way round this?
>
> Could we have a SELECT ... LIMIT n command that doesn't require a table to be specified
as long as none of the columns comes from a table?
>
> James

Mime
View raw message