phoenix-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject Pig vs Bulk Load record count
Date Mon, 02 Feb 2015 18:46:16 GMT
Hi, I’ve run into a peculiar discrepancy between loading data with Pig and with the CsvBulkLoadTool.
I have 42M CSV records to load and I am comparing the performance of the two.

In both cases the MR jobs are successful, and there are no errors.
In both cases the MR job counters report 42M Map input records and 42M Map output records.

However, when I run a select count on the table after the jobs complete, something is terribly off.
After the bulk load, select count shows all 42M records in Phoenix, as expected.
After the Pig load there are only 3M records in Phoenix, which is not even close.

I have no errors to send. I have run the same test multiple times and gotten the same results.
The Pig script is not doing any transformations. It is a simple LOAD and STORE.
I get the same result using client jars from 4.2.2 and 4.2.3-SNAPSHOT. 4.2.3-SNAPSHOT is
running on the region servers.

Thanks,
Ralph

