lucenenet-dev mailing list archives

From "Jon Palmer" <jpal...@contactnetworks.com>
Subject RE: Storing primary key / Change lucene's document ID
Date Mon, 30 Oct 2006 14:22:50 GMT
Marc,

 

Can you give a few more details of how you are searching Lucene? Maybe
some pseudocode of the method that is fast and the one that is slow. I
think you are suggesting that there is a very large performance hit for
doing this:

 

DocID = Hits.Doc(i).Get("ID")

 

rather than:

 

DocID = Hits.ID(i)

 

 

JP

 

P.S. Your numbers suggest that your problem is mostly linear. It looks
like your method has some setup cost and then processes approximately
300 IDs a second:

 

18260 IDs - 72.2 s - avg 253/s
3000 IDs - 10.02 s - avg 299/s
830 IDs - 2.25 s - avg 368/s
352 IDs - 1.08 s - avg 325/s
350 IDs - 0.98 s - avg 357/s
278 IDs - 0.48 s - avg 579/s
96 IDs - 1.05 s - avg 91/s
29 IDs - 0.66 s - avg 43/s
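For anyone who wants to recheck the averages, here is a minimal sketch that recomputes throughput from the counts and timings Marc posted. The variable names are made up for illustration; only the (count, seconds) pairs come from the thread.

```python
# (number of IDs, elapsed seconds) as posted in the thread.
runs = [
    (18260, 72.2),
    (3000, 10.02),
    (830, 2.25),
    (352, 1.08),
    (350, 0.98),
    (278, 0.48),
    (96, 1.05),
    (29, 0.66),
]

# Throughput per run: IDs processed per second.
rates = {count: count / seconds for count, seconds in runs}

for count, seconds in runs:
    print(f"{count:>6} IDs  {seconds:>6.2f} s  {rates[count]:>7.1f} IDs/s")
```

If the cost were purely per-ID, the rate would be roughly constant across runs; a fixed setup cost shows up as a lower rate on the smallest runs.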

 

Given this linear-ish behavior, are you sure that the bottleneck is not
writing back to file or to SQL?
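One way to answer that question is to time each stage separately instead of the whole export. The sketch below is only an illustration of that idea: fetch_ids, write_ids_to_file and bulk_insert are hypothetical stand-ins (not Lucene or SQL calls) that would need to be replaced with the real search loop, file writer and BULK INSERT.

```python
import os
import tempfile
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named phase.
phase_seconds = defaultdict(float)

@contextmanager
def timed(phase):
    """Add the time spent inside the with-block to phase_seconds[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[phase] += time.perf_counter() - start

# Hypothetical stand-ins for the three stages described in the thread.
def fetch_ids(n):
    # Placeholder for the loop that reads the "id" field from each hit.
    return [str(i) for i in range(n)]

def write_ids_to_file(ids, path):
    with open(path, "w") as f:
        for doc_id in ids:
            f.write(doc_id + "\n")

def bulk_insert(path):
    # Placeholder for the SQL bulk load; here it only counts the rows.
    with open(path) as f:
        return sum(1 for _ in f)

def export(n):
    with timed("fetch ids"):
        ids = fetch_ids(n)
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with timed("write file"):
            write_ids_to_file(ids, path)
        with timed("bulk insert"):
            loaded = bulk_insert(path)
    finally:
        os.remove(path)
    return loaded

if __name__ == "__main__":
    export(18260)
    for phase, seconds in phase_seconds.items():
        print(f"{phase:<12} {seconds:.4f} s")
```

With the real stages plugged in, the per-phase totals would show directly whether the time goes into reading the stored field or into the file and SQL steps.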

 

 

 

-----Original Message-----
From: Kaufmann M. [mailto:kaufmannma@gmail.com] 
Sent: Monday, October 30, 2006 5:11 AM
To: lucene-net-dev@incubator.apache.org
Subject: Re: Storing primary key / Change lucene's document ID

 

Hello George,

The problem is the speed. Some samples:

All timings include writing the IDs to a file and the BULK INSERT into SQL:

18260 IDs - 72.2 s
352 IDs - 1.08 s
96 IDs - 1.05 s
29 IDs - 0.66 s
3000 IDs - 10.02 s
350 IDs - 0.98 s
278 IDs - 0.48 s
830 IDs - 2.25 s

As you can see, the time it takes for more than 500 records is really
slow...

If I write back the internal ID, it's a LOT faster...

I'm not using Lucene's ordering because this also slowed down the
returning process a lot.

And I'd like to count the results in different ways (which I was not
able to do in Lucene), so I have to give all the IDs back to SQL...

 

Thanks for helpin'!

 

 

On 10/30/06, George Aroush <george@aroush.net> wrote:
>
> Hi Marc,
>
> You can't depend on Lucene's internal ID; it will change every time you
> update the index -- this is something you can't control.  The way you are
> currently doing it, by storing an ID in a field named "id", is the right
> way to do it.  Don't worry about slowing down Lucene if you call the API
> to get the ID of your field "id".  Lucene is super fast.
>
> Regards,
>
> -- George Aroush
>
> -----Original Message-----
> From: Kaufmann M. [mailto:kaufmannma@gmail.com]
> Sent: Friday, October 27, 2006 4:20 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: Storing primary key / Change lucene's document ID
>
> Hello everybody,
> I've got a little question concerning the unique ID stored in the Lucene
> index (hits.ID(i)).
> Is it possible to change this ID, or set it on doc.add?
>
> Currently I'm running a test project which stores an external primary key
> in a field named 'id', but if I call it from the search engine I have to
> use the get method, which slows it down.
> If I could use this primary key as the Lucene ID, the whole engine would
> be a lot faster because I just need the IDs returned...
>
> Does anybody know if this is possible?
>
> Thanks!
> Best Regards, Marc

> 

> 
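George's point about internal IDs can be illustrated with a toy model. This is not Lucene code: ToyIndex and its methods are invented for this sketch, and it only assumes that an update is a delete followed by an add, and that merging drops deleted slots and renumbers what remains.

```python
class ToyIndex:
    """Toy stand-in for an index; the list position plays the internal ID."""

    def __init__(self):
        self._docs = []  # None marks a deleted slot

    def add(self, fields):
        self._docs.append(dict(fields))

    def delete_by_key(self, key):
        for pos, doc in enumerate(self._docs):
            if doc is not None and doc["id"] == key:
                self._docs[pos] = None

    def update(self, key, fields):
        # Assumption: an update is a delete followed by an add.
        self.delete_by_key(key)
        self.add(fields)

    def merge(self):
        # Assumption: merging drops deleted slots and renumbers the rest.
        self._docs = [doc for doc in self._docs if doc is not None]

    def internal_id(self, key):
        for pos, doc in enumerate(self._docs):
            if doc is not None and doc["id"] == key:
                return pos
        raise KeyError(key)


index = ToyIndex()
for key in ("A", "B", "C"):
    index.add({"id": key})

before = {key: index.internal_id(key) for key in ("A", "B", "C")}

index.update("A", {"id": "A"})
index.merge()

after = {key: index.internal_id(key) for key in ("A", "B", "C")}

print("before:", before)
print("after: ", after)
```

In this model every document ends up at a different position after one update and merge, while the stored "id" field still identifies each document, which is why the external key belongs in a field of its own.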

 

