Thanks for the quick response. Here is what I have:

========================================
Pig script:
-------------------------------
register $phoenix_jar;

Z = load '$data' USING PigStorage(',') as (
  file_name,
  rec_num,
  epoch_time,
  timet,
  site,
  proto,
  saddr,
  daddr,
  sport,
  dport,
  mf,
  cf,
  dur,
  sdata,
  ddata,
  sbyte,
  dbyte,
  spkt,
  dpkt,
  siopt,
  diopt,
  stopt,
  dtopt,
  sflags,
  dflags,
  flags,
  sfseq,
  dfseq,
  slseq,
  dlseq,
  category);

STORE Z into 'hbase://$table_name/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY' using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 5000');

=========================

I cannot find the upsert statement you are referring to in either the MR logs or the Pig output, but I do have the summary below. Pig thinks it output the correct number of records:

Input(s):
Successfully read 42871627 records (1479463169 bytes) from: "/data/incoming/201501124931/SAMPLE"

Output(s):
Successfully stored 42871627 records in: "hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY"


Count command:
select count(1) from TEST;
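
A possible cross-check, sketched under assumptions: a Phoenix UPSERT overwrites any existing row with the same primary key, so the number of distinct key values Pig sees in the input is an upper bound on the row count after the load. The key columns used below (file_name, rec_num) are only a guess for illustration, since the primary key of TEST is not shown in this message. The aliases K, D, G and C are hypothetical names.

```pig
-- Z is the relation loaded by the script above.
-- ASSUMPTION: (file_name, rec_num) stands in for the table's real primary key columns.
K = FOREACH Z GENERATE file_name, rec_num;   -- keep only the assumed key columns
D = DISTINCT K;                              -- collapse duplicate key tuples
G = GROUP D ALL;                             -- gather everything into a single group
C = FOREACH G GENERATE COUNT_STAR(D);        -- count the distinct key tuples
DUMP C;
```

If this number were close to 3M rather than 42M, it would suggest rows are overwriting each other on the key as Pig maps it. If it is 42M, the key values themselves are unique and the cause lies elsewhere.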

__________________________________________________
Ralph Perko 
Pacific Northwest National Laboratory
(509) 375-2272
ralph.perko@pnnl.gov

From: Ravi Kiran <maghamravikiran@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Monday, February 2, 2015 at 11:01 AM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Pig vs Bulk Load record count

Hi Ralph,

   That's definitely a cause for worry. Can you please share the UPSERT query being built by Phoenix? You should see it in the logs in an entry starting with "Phoenix Generic Upsert Statement: ..".
Also, what do the MapReduce counters say for the job? If possible, can you share the Pig script? The order of columns in the STORE command sometimes matters.

Regards
Ravi


On Mon, Feb 2, 2015 at 10:46 AM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
Hi, I've run into a peculiar issue when loading data using Pig vs. the CsvBulkLoadTool. I have 42M CSV records to load and I am comparing the performance.

In both cases the MR jobs are successful, and there are no errors.
In both cases the MR job counters state there are 42M Map input and output records.

However, when I run a count on the table after the jobs complete, something is terribly off.
After the bulk load, select count shows all 42M recs in Phoenix, as expected.
After the Pig load there are only 3M recs in Phoenix, not even close.

I have no errors to send. I have run the same test multiple times and gotten the same results. The Pig script is not doing any transformations. It is a simple LOAD and STORE.
I get the same result using client jars from 4.2.2 and 4.2.3-SNAPSHOT.  4.2.3-SNAPSHOT is running on the region servers.

Thanks,
Ralph