phoenix-user mailing list archives

From Simon Wang <simon.w...@airbnb.com>
Subject Re: Bulk loading and index
Date Mon, 04 Jul 2016 03:00:03 GMT
Thanks, James. 

I created a JIRA at PHOENIX-3032 <https://issues.apache.org/jira/browse/PHOENIX-3032>.
I am currently looking into the code to see if I can make this change. How would you suggest
the logic should work? Having spent a few hours reading the code, I am considering a workflow
like this:

1. Record the timestamp when `ALTER INDEX .. REBUILD ASYNC` is issued.
2. In `PhoenixIndexImportMapper.map`, process iff the record’s timestamp is newer than the
recorded timestamp.

Could you share some thoughts on whether this is the correct approach? If so, what other classes
should I look into (to add the `REBUILD ASYNC` token to the parser, etc.)?
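
To make the proposed check concrete, here is a minimal, standalone sketch of the timestamp comparison in step 2. It is not Phoenix code: the class `RebuildTimestampFilter` and its methods are hypothetical names invented for illustration, and it assumes "newer than" means strictly greater than the timestamp recorded in step 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these names exist in Phoenix.
public class RebuildTimestampFilter {
    // Timestamp recorded when `ALTER INDEX .. REBUILD ASYNC` was issued (step 1).
    private final long rebuildRequestedAt;

    public RebuildTimestampFilter(long rebuildRequestedAt) {
        this.rebuildRequestedAt = rebuildRequestedAt;
    }

    // Step 2: process a record iff its timestamp is strictly newer
    // than the recorded timestamp (strictness is an assumption).
    public boolean shouldProcess(long recordTimestamp) {
        return recordTimestamp > rebuildRequestedAt;
    }

    // Stand-in for the mapper loop: keeps only timestamps that pass the check.
    public List<Long> filter(List<Long> recordTimestamps) {
        List<Long> kept = new ArrayList<>();
        for (long ts : recordTimestamps) {
            if (shouldProcess(ts)) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        RebuildTimestampFilter f = new RebuildTimestampFilter(1000L);
        // Records at 999 and 1000 are skipped; only 1001 is newer.
        System.out.println(f.filter(Arrays.asList(999L, 1000L, 1001L))); // prints [1001]
    }
}
```

In a real change the comparison would presumably live inside `PhoenixIndexImportMapper.map`, with the recorded timestamp passed to the job through its configuration; how that is wired up is left open here.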

Best,
Tongzhou


> On Jun 27, 2016, at 12:51 PM, James Taylor <jamestaylor@apache.org> wrote:
> 
> Tongzhou,
> Please file a JIRA for supporting ALTER INDEX .... REBUILD ASYNC. This would be a good
> addition and not very difficult to implement. Contributions are, of course, always welcome.
> Regards,
> James
> 
> On Sun, Jun 26, 2016 at 2:45 AM, Ankit Singhal <ankitsinghal59@gmail.com> wrote:
> Hi Tongzhou,
> 
> Maybe you can try dropping the current index and, after your upload is completed,
> creating an async index. Then you can use IndexTool to rebuild your index from scratch.
> 
> Source: https://phoenix.apache.org/secondary_indexing.html
> 
> CREATE INDEX async_index ON my_schema.my_table (v) ASYNC
> 
> But if you are only using CsvBulkLoadTool for the bulk load, it will automatically prepare
> and bulk load the index data as well, so separate index maintenance would not be required.
> 
> Regards,
> Ankit Singhal
> 
> On Sat, Jun 25, 2016 at 4:13 PM, Tongzhou Wang (Simon) <tongzhou.wang.1994@gmail.com> wrote:
> Hi Josh,
> 
> First, thanks for the response.
> 
> As far as I can tell, a disabled index cannot be directly changed to USABLE. It must
> be rebuilt first. I am aware that I can do ALTER INDEX .... REBUILD. But, if I understand
> correctly, this is single-threaded and slow. I'm wondering if I can use the IndexTool
> MapReduce job in this case.
> 
> About TTL, I did some experiments. It turns out that Phoenix does not automatically remove
> an index entry when the table entry expires under the TTL setting. However, it is possible
> to set the same TTL on the index table so that the index stays in sync.
> 
> Best,
> Tongzhou
> 
> > On Jun 25, 2016, at 15:31, Josh Elser <josh.elser@gmail.com> wrote:
> >
> > Hi Tongzhou,
> >
> > Maybe you can try `ALTER INDEX index ON table DISABLE`. And then the same command
> > with USABLE after you update the index. Are you attempting to do this incrementally? Like,
> > a bulk load of data then a bulk load of index data, repeat?
> >
> > Regarding the TTL, I assume so, but I'm not certain.
> >
> > Tongzhou Wang wrote:
> >> Hi all,
> >>
> >> I am writing to ask if there is a way to disable an index, then update
> >> it through the MapReduce job (IndexTool). I want to bulk load a huge
> >> amount of data, but index maintaining makes it very slow. It would be
> >> great if I can disable an index, load data, then use a MapReduce job to
> >> update it to usable state.
> >>
> >> Also, does Phoenix's secondary index maintaining take TTL into account?
> >>
> >> Thanks,
> >> Tongzhou
> 
> 