lucenenet-dev mailing list archives

From "Jon Palmer" <>
Subject RE: Storing primary key / Change lucene's document ID
Date Mon, 30 Oct 2006 14:22:50 GMT


Can you give a few more details of how you are searching Lucene? Maybe
some pseudocode of the method that is fast and the one that is slow. I
think you are suggesting that there is a very large performance hit for
doing this:


DocID = Hits.Doc(i).Get("ID")


rather than:


DocID = Hits.ID(i)





P.S. Your numbers suggest that your problem is mostly linear. It looks
like your method has some setup cost and then processes approximately
300 IDs a second:

18260 IDs - 72.2 s - avg 253/s
3000 IDs - 10.02 s - avg 299/s
830 IDs - 2.25 s - avg 369/s
352 IDs - 1.08 s - avg 326/s
350 IDs - 0.98 s - avg 357/s
278 IDs - 0.48 s - avg 579/s
96 IDs - 1.05 s - avg 91/s
29 IDs - 0.66 s - avg 44/s
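Each average is simply the number of IDs divided by the elapsed seconds. A small Python sketch that recomputes the rates from Marc's measurements, rounding to the nearest whole ID per second (the names `MEASUREMENTS` and `throughput` are hypothetical):

```python
# Marc's measurements from this thread: (number of IDs, seconds).
# Each time includes writing the IDs to a file and the SQL bulk insert.
MEASUREMENTS = [
    (18260, 72.2),
    (3000, 10.02),
    (830, 2.25),
    (352, 1.08),
    (350, 0.98),
    (278, 0.48),
    (96, 1.05),
    (29, 0.66),
]


def throughput(measurements):
    """Return (ids, seconds, ids_per_second) rows, with the rate rounded to a whole number."""
    return [(n, t, round(n / t)) for n, t in measurements]


if __name__ == "__main__":
    for n, t, rate in throughput(MEASUREMENTS):
        print(f"{n:>6} IDs  {t:>6.2f} s  avg {rate}/s")
```

With these numbers the two smallest batches (29 and 96 IDs) have by far the lowest rates, which is what a fixed setup cost per query would produce; the 278-ID batch, at about 579/s, is an outlier.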


Given this roughly linear behavior, are you sure that the bottleneck is
not writing back to the file or to SQL?
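One way to answer that question is to time each phase on its own rather than the whole run. A minimal Python sketch, assuming the work can be split into three callables; `search`, `write_file` and `bulk_insert` are hypothetical names standing in for the Lucene hit loop, the file write and the SQL bulk insert, and the demo uses SQLite in place of the real database:

```python
import time
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[..., T], *args) -> Tuple[T, float]:
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def profile_pipeline(
    search: Callable[[], List[str]],
    write_file: Callable[[List[str]], None],
    bulk_insert: Callable[[List[str]], None],
) -> Dict[str, float]:
    """Run search -> file write -> SQL insert once, timing each phase separately."""
    ids, t_search = timed(search)
    _, t_write = timed(write_file, ids)
    _, t_insert = timed(bulk_insert, ids)
    return {"search": t_search, "write_file": t_write, "bulk_insert": t_insert}


if __name__ == "__main__":
    import os
    import sqlite3
    import tempfile

    # Hypothetical stand-ins for the real phases.
    def fake_search() -> List[str]:
        return [str(i) for i in range(20000)]

    def write_ids(ids: List[str]) -> None:
        with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
            f.write("\n".join(ids))
        os.remove(f.name)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (id TEXT)")

    def insert_ids(ids: List[str]) -> None:
        conn.executemany("INSERT INTO hits (id) VALUES (?)", ((i,) for i in ids))
        conn.commit()

    for phase, secs in profile_pipeline(fake_search, write_ids, insert_ids).items():
        print(f"{phase:<12} {secs:.4f} s")
```

If the `search` share dominates, the slowdown is in reading the hits; if not, the file and SQL steps are the place to look.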




-----Original Message-----
From: Kaufmann M. [] 
Sent: Monday, October 30, 2006 5:11 AM
Subject: Re: Storing primary key / Change lucene's document ID


Hello George,

The problem is the speed. Some samples:


All counts include writing the IDs to a file and the BULK INSERT into SQL:

18260 IDs - 72.2 s
352 IDs - 1.08 s
96 IDs - 1.05 s
29 IDs - 0.66 s
3000 IDs - 10.02 s
350 IDs - 0.98 s
278 IDs - 0.48 s
830 IDs - 2.25 s


As you can see, the time it takes for more than 500 records is absolutely

If I write back the internal ID instead, it's a LOT faster...


I'm not using Lucene's sorting because it also slowed down returning the
results a lot. And I'd like to count the results in different ways (which
I was not able to do in Lucene), so I have to pass all the IDs back to SQL...
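The approach described here, passing the hit IDs to the database and counting there, can be sketched in Python with the standard `sqlite3` module. This is a stand-in under stated assumptions: SQLite's `executemany` replaces SQL Server's BULK INSERT, and the `items` table with its `category` and `region` columns, like the function `count_hits_by`, is hypothetical:

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical table that the stored "id" field points into.
SCHEMA = """
CREATE TABLE items (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    region   TEXT NOT NULL
);
"""

# Column names cannot be bound as SQL parameters, so only whitelisted ones are allowed.
GROUPABLE_COLUMNS = {"category", "region"}


def count_hits_by(conn: sqlite3.Connection, hit_ids: Iterable[int], column: str) -> List[Tuple[str, int]]:
    """Load the hit IDs into a temp table, then count them grouped by `column`."""
    if column not in GROUPABLE_COLUMNS:
        raise ValueError(f"cannot group by {column!r}")
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS hits (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM hits")
    # INSERT OR IGNORE skips duplicate IDs instead of failing on the primary key.
    conn.executemany("INSERT OR IGNORE INTO hits (id) VALUES (?)", ((i,) for i in hit_ids))
    query = (
        f"SELECT it.{column}, COUNT(*) FROM hits h "
        f"JOIN items it ON it.id = h.id "
        f"GROUP BY it.{column} ORDER BY it.{column}"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO items (id, category, region) VALUES (?, ?, ?)",
        [(1, "books", "eu"), (2, "books", "us"), (3, "music", "eu"), (4, "music", "eu")],
    )
    hits = [1, 2, 3]  # pretend these IDs came back from the Lucene search
    print(count_hits_by(conn, hits, "category"))  # [('books', 2), ('music', 1)]
    print(count_hits_by(conn, hits, "region"))    # [('eu', 2), ('us', 1)]
```

Loading the IDs once into a temporary table lets any number of different GROUP BY counts reuse the same hit set.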


Thanks for helpin'!



On 10/30/06, George Aroush <> wrote:


> Hi Marc,
>
> You can't depend on Lucene's internal ID; it will change every time you
> update the index, and this is something you can't control. The way you
> are currently doing it, by storing an ID in a field named "id", is the
> way to do it. Don't worry about slowing down Lucene if you call the API
> to get the value of your "id" field. Lucene is super fast.
>
> Regards,
>
> -- George Aroush


> -----Original Message-----
> From: Kaufmann M. []
> Sent: Friday, October 27, 2006 4:20 PM
> To:
> Subject: Storing primary key / Change lucene's document ID
>
> Hello everybody,
>
> I've got a little question concerning the unique ID stored in the
> index (hits.ID(i)). Is it possible to change this ID, or set it on doc.add?
>
> Currently I'm running a test project which stores an external primary
> key in a field named 'id', but if I call it from the search engine I have
> to use the get method, which slows it down. If I could use this primary
> key as the Lucene ID, the whole engine would be a lot faster, because I
> just need the IDs returned...
>
> Does anybody know if this is possible?
>
> Thanks!
>
> Best Regards, Marc
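George's point that the internal ID cannot serve as a primary key follows from how Lucene numbers documents: the number is essentially a position in the index, deleted documents are only marked, and they are squeezed out when segments are merged or the index is optimized, which shifts the numbers of the documents after them. The toy Python model below (`ToyIndex` and its methods are hypothetical and are not Lucene's API) reproduces just that behavior and shows the stored `id` value surviving the renumbering:

```python
from typing import Dict, List, Optional, Set


class ToyIndex:
    """Toy model of document numbering (NOT Lucene's API).

    The internal number is the document's position; delete() only marks a
    document, and optimize() compacts the list, renumbering later documents.
    """

    def __init__(self) -> None:
        self._docs: List[Dict[str, str]] = []
        self._deleted: Set[int] = set()

    def add(self, fields: Dict[str, str]) -> int:
        self._docs.append(dict(fields))
        return len(self._docs) - 1  # internal number = current position

    def delete(self, doc_num: int) -> None:
        self._deleted.add(doc_num)  # marked only; numbers do not move yet

    def optimize(self) -> None:
        # Squeeze out deleted documents; everything after a gap shifts down.
        self._docs = [d for n, d in enumerate(self._docs) if n not in self._deleted]
        self._deleted.clear()

    def find(self, key: str) -> Optional[int]:
        """Current internal number of the live document whose stored 'id' equals key."""
        for n, d in enumerate(self._docs):
            if n not in self._deleted and d.get("id") == key:
                return n
        return None

    def stored(self, doc_num: int, field: str) -> str:
        return self._docs[doc_num][field]


if __name__ == "__main__":
    idx = ToyIndex()
    for key in ["A-100", "A-200", "A-300"]:
        idx.add({"id": key})
    print("before:", idx.find("A-300"))  # 2
    idx.delete(idx.find("A-100"))
    idx.optimize()
    after = idx.find("A-300")
    print("after:", after, idx.stored(after, "id"))  # 1 A-300
```

The internal number of "A-300" moved from 2 to 1, while the stored key did not change, which is why the external key belongs in its own stored field.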



