phoenix-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Question about IndexTool
Date Tue, 15 Sep 2015 18:46:21 GMT
The upsert statements in the MR jobs are only used to convert the data into
the appropriate encoding for writing to an HFile -- the data doesn't actually
get pushed to Phoenix from within the MR job. Instead, the KeyValues created
by the upsert statement are extracted from its "output" (the uncommitted
state on the connection), and the statement is then rolled back within the
MR job. The extracted KeyValues are what get written to the HFile.
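
As a rough illustration of that encode / extract / roll back pattern, here is
a toy model in plain Java. It is not Phoenix code: ToyConnection, Cell and
encodeAndRollback are hypothetical names invented for this sketch, and the
"encoding" is just UTF-8 bytes standing in for the real KeyValue encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UpsertRollbackSketch {

    /** Hypothetical stand-in for an encoded KeyValue: row key bytes plus value bytes. */
    public static final class Cell {
        public final byte[] row;
        public final byte[] value;

        Cell(byte[] row, byte[] value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a connection with auto-commit off: upserts are only buffered. */
    public static final class ToyConnection {
        private final List<Cell> pending = new ArrayList<>();
        private final List<Cell> committed = new ArrayList<>();

        /** "Executes" an upsert: encodes the values and buffers the result, nothing is stored yet. */
        public void upsert(String rowKey, String value) {
            pending.add(new Cell(rowKey.getBytes(StandardCharsets.UTF_8),
                                 value.getBytes(StandardCharsets.UTF_8)));
        }

        /** Returns a copy of the buffered, not yet committed cells. */
        public List<Cell> uncommitted() {
            return new ArrayList<>(pending);
        }

        /** Discards the buffered cells without storing them. */
        public void rollback() {
            pending.clear();
        }

        /** Stores the buffered cells (never called by the bulk-load path below). */
        public void commit() {
            committed.addAll(pending);
            pending.clear();
        }

        public int pendingCount() {
            return pending.size();
        }

        public int committedCount() {
            return committed.size();
        }
    }

    /** Uses the upsert purely as an encoder: run it, grab the cells, roll back. */
    public static List<Cell> encodeAndRollback(ToyConnection conn, String rowKey, String value) {
        conn.upsert(rowKey, value);             // encode only, nothing is stored
        List<Cell> cells = conn.uncommitted();  // extract the encoded cells
        conn.rollback();                        // make sure they are never committed
        return cells;
    }

    public static void main(String[] args) {
        ToyConnection conn = new ToyConnection();
        List<Cell> hfileBuffer = new ArrayList<>(); // stands in for the HFile writer
        hfileBuffer.addAll(encodeAndRollback(conn, "idx-a", "1"));
        hfileBuffer.addAll(encodeAndRollback(conn, "idx-b", "2"));
        System.out.println("cells collected for the HFile: " + hfileBuffer.size());
        System.out.println("cells committed through the connection: " + conn.committedCount());
    }
}
```

The point of the sketch is that the connection never commits anything: the
upsert serves only as the encoder, and the collected cells are the sole
output of the job.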

- Gabriel

On Tue, Sep 15, 2015 at 2:12 PM Yiannis Gkoufas <johngouf85@gmail.com>
wrote:

> Hi there,
>
> I was going through the code related to index creation via MapReduce job
> (IndexTool) and I have some questions.
> If I am not mistaken, for a global secondary index Phoenix creates a new
> HBase table which has the appropriate key (the column value of the original
> table you want to index) and loads the column values you have in your
> INCLUDE statement.
> In the PhoenixIndexImportMapper I can see that an Upsert statement is
> executed, but also HFiles are written.
> My question is the following: why is the Upsert statement needed if the
> table containing the secondary index will be populated from the HFiles
> written?
>
> Thanks a lot
>
