phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Heather <james.heat...@mendeley.com>
Subject How to add lots of dummy data to a table efficiently
Date Wed, 09 Sep 2015 10:55:17 GMT
I've been trying to create a table and then bung lots of random data 
into it, but without generating the data client-side and then upserting 
it. I'd rather find a way to have it generated on the server.

My table has three columns, all BIGINT, and the first is the primary key.

I'd ideally like to write something like

     UPSERT into linktab (id, first_val, second_val) SELECT next value 
for linktab_next_id, random_func, random_func ...

and complete the query so that the SELECT generates its values without 
needing to read from a table. But there doesn't seem to be a way of 
writing a SELECT that doesn't need a table parameter.

So I've ended up doing this. First I upsert a single row into the table; 
then, I double the number of rows in it by a self-referential UPSERT SELECT:

0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
8 rows affected (0.193 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
16 rows affected (0.202 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
32 rows affected (0.205 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
64 rows affected (0.235 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
128 rows affected (0.28 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
256 rows affected (0.378 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
512 rows affected (0.53 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
1,024 rows affected (0.848 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
2,048 rows affected (1.513 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
4,096 rows affected (2.817 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
8,192 rows affected (5.314 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
16,384 rows affected (10.807 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
32,768 rows affected (21.306 seconds)
0: jdbc:phoenix:172.31.31.143> upsert into linktab (id, first_val, second_val) select next
value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000) from linktab;
65,536 rows affected (45.713 seconds)
0: jdbc:phoenix:172.31.31.143> explain upsert into linktab (id, first_val, second_val)
select next value for linktab_next_id, round(rand()*1000000000000), round(rand()*1000000000000)
from linktab;
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| UPSERT SELECT                            |
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER LINKTAB |
|     SERVER FILTER BY FIRST KEY ONLY      |
| CLIENT RESERVE VALUES FROM 1 SEQUENCE    |
+------------------------------------------+
4 rows selected (0.192 seconds)
0: jdbc:phoenix:172.31.31.143>


This works, but as you can see from the timings, it's getting quite 
slow, and this seems to be because it has to do a full scan of the table 
to retrieve the existing rows in order to construct the dummy data to 
put in. This seems a bit of a shame when it doesn't actually need the data!

Is there a way round this?

Could we have a SELECT ... LIMIT n command that doesn't require a table 
to be specified as long as none of the columns comes from a table?

James

Mime
View raw message