lucenenet-dev mailing list archives

From "DIGY" <digyd...@gmail.com>
Subject RE: Question(problem?) about PriorityQueue
Date Tue, 25 Sep 2007 20:47:19 GMT
Hi,
I am not sure that I understand exactly what you are trying to do.

Is it possible, for example, to precalculate the "distance" at index
time (this may require two passes over the document) and store it in a field
that can later be used when searching?
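Precalculating only works if the center point is already known when documents are indexed; the comparator later in this thread receives the center at query time. Under that assumption, the per-document value to store could be computed once with a great-circle formula. The thread never shows how calculateDistance works, so the haversine formula, the kilometre unit and the names GeoDistance and Haversine below are hypothetical.

```csharp
using System;

// Hypothetical helper: haversine great-circle distance, assumed to be in km.
// The thread does not say which formula calculateDistance actually uses.
public static class GeoDistance
{
    // Mean Earth radius in kilometres.
    private const double EarthRadiusKm = 6371.0;

    // Distance between two (latitude, longitude) points given in degrees.
    public static double Haversine(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);
        double sinLat = Math.Sin(dLat / 2);
        double sinLon = Math.Sin(dLon / 2);
        double a = sinLat * sinLat
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2)) * sinLon * sinLon;
        // Clamp to 1 so rounding near antipodal points cannot push Asin out of range.
        return 2 * EarthRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    private static double ToRadians(double degrees) => degrees * Math.PI / 180.0;
}
```

With a fixed center, each document's value would be computed once while indexing and then sorted as an ordinary numeric field.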

In addition, changing the Lucene code will tie you to one version: with
every new version you will have to update your changes accordingly.

DIGY.



-----Original Message-----
From: Hongwei Shen [mailto:Hongwei.Shen@emedia.com] 
Sent: Tuesday, September 25, 2007 4:51 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Question(problem?) about PriorityQueue

Thank you for checking. I missed it: the code has no problem. The least
relevant element is at the top of the heap, and when the document collector
collects the results, the order is reversed.
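A small sketch of that behaviour, using .NET 6+'s System.Collections.Generic.PriorityQueue instead of Lucene.Net's own class: the queue keeps the least relevant hit on top so it can be evicted cheaply, which means draining it yields the worst hit first, so the result array is filled from the back. The class name ReverseDrainSketch and the use of bare float scores are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch, not Lucene.Net code. Assumes .NET 6+ for PriorityQueue.
public static class ReverseDrainSketch
{
    // Returns the n highest scores, best first.
    public static float[] TopN(IEnumerable<float> scores, int n)
    {
        // PriorityQueue<TElement, TPriority> is a min-heap: Peek() returns the
        // lowest priority, i.e. the least relevant of the scores kept so far.
        var heap = new PriorityQueue<float, float>();
        foreach (float s in scores)
        {
            if (heap.Count < n)
            {
                heap.Enqueue(s, s);
            }
            else if (n > 0 && s > heap.Peek())
            {
                heap.Dequeue();      // evict the current least relevant hit
                heap.Enqueue(s, s);
            }
        }

        // Dequeue yields ascending order (worst first), so fill from the end.
        var result = new float[heap.Count];
        for (int i = result.Length - 1; i >= 0; i--)
            result[i] = heap.Dequeue();
        return result;
    }
}
```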

We are using a custom sort on a string field, but the comparison is based on
parsing the string and performing a calculation. I found that the search,
which uses FieldSortedHitQueue, sorts the results by string rather than by
the calculation. As a result, only the top 100 results in string order are
passed from the search to the TopFieldDocCollector, which uses
FieldDocSortedHitQueue, the class I modified to sort by the calculation.
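The effect described above can be reproduced with a toy model (not Lucene's code): if the step that truncates to the top n uses one key and only the survivors are re-sorted by another, the final sort cannot bring back documents that were already cut off. Here the "calculation" is assumed to be a plain numeric parse, standing in for the real one.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class MismatchedKeySketch
{
    // Hypothetical stand-in for "parsing the string with some calculation".
    private static double Calculated(string term) =>
        double.Parse(term, CultureInfo.InvariantCulture);

    // What the thread describes: keep the top n by raw string order,
    // then re-sort only those survivors by the calculated value.
    public static List<string> TruncateByStringThenSort(IEnumerable<string> terms, int n) =>
        terms.OrderBy(t => t, StringComparer.Ordinal)
             .Take(n)
             .OrderBy(t => Calculated(t))
             .ToList();

    // What is wanted: the same key drives both truncation and ordering.
    public static List<string> TruncateByCalculation(IEnumerable<string> terms, int n) =>
        terms.OrderBy(t => Calculated(t))
             .Take(n)
             .ToList();
}
```

For the terms "9", "10", "100", "2" and n = 2, string order keeps "10" and "100", so the numerically smallest values "2" and "9" never reach the final sort.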

I have to find a way to do it.

Below is my code to sort by distance, which works fine. What we want to
achieve is to sort by a combination of distance and score. This means that
we cannot calculate the distance in GetComparable(); instead, we have to
retain all the information and wait until the score is available.

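    using System;
    using System.Globalization;
    using Lucene.Net.Search;

    [Serializable]
    public class DistanceSortComparator : SortComparator
    {
        private float longitude;
        private float latitude;

        /// <summary>
        /// Constructor that takes the geographic location of the center
        /// </summary>
        /// <param name="latitude">
        /// The latitude of the geo-center
        /// </param>
        /// <param name="longitude">
        /// The longitude of the geo-center
        /// </param>
        public DistanceSortComparator(float latitude, float longitude)
        {
            this.longitude = longitude;
            this.latitude = latitude;
        }

        #region Overridden methods

        /// <summary>
        /// Returns the distance from the geo-center to the location stored
        /// in the field as "latitude,longitude"
        /// </summary>
        /// <param name="termtext">The indexed term, e.g. "47.6,-122.3"</param>
        /// <returns>The distance as an IComparable (a boxed double)</returns>
        public override IComparable GetComparable(string termtext)
        {
            string[] loc = termtext.Split(',');
            // Parse with the invariant culture so the decimal point is read
            // the same way regardless of the server's regional settings.
            double lat = double.Parse(loc[0], CultureInfo.InvariantCulture);
            double lon = double.Parse(loc[1], CultureInfo.InvariantCulture);
            // calculateDistance(...) belongs to this class but is not shown here.
            return calculateDistance(this.latitude, this.longitude, lat, lon);
        }

        public override string ToString()
        {
            return "Distance from (" + latitude + "," + longitude + ")";
        }

        #endregion
    }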
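One way to picture the combined ordering described above, once both values are available, is a comparer over hits that carry the score and the already computed distance. Everything here is hypothetical: the GeoHit record, the halfDistanceKm parameter and the blend score / (1 + distance / halfDistanceKm) are assumptions for illustration, and the thread leaves open how such a comparer would be wired into Lucene.Net's hit queues.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hit record: score and distance are both retained so the sort
// key can be computed once the score is known.
public readonly struct GeoHit
{
    public GeoHit(int doc, float score, double distanceKm)
    {
        Doc = doc;
        Score = score;
        DistanceKm = distanceKm;
    }

    public int Doc { get; }
    public float Score { get; }
    public double DistanceKm { get; }
}

public sealed class ScoreDistanceComparer : IComparer<GeoHit>
{
    // Hypothetical tuning parameter: the distance at which the score is halved.
    private readonly double halfDistanceKm;

    public ScoreDistanceComparer(double halfDistanceKm)
    {
        if (halfDistanceKm <= 0)
            throw new ArgumentOutOfRangeException(nameof(halfDistanceKm));
        this.halfDistanceKm = halfDistanceKm;
    }

    // Assumed blend: relevance decays smoothly with distance.
    public double Combined(GeoHit hit) =>
        hit.Score / (1.0 + hit.DistanceKm / halfDistanceKm);

    // Higher combined value first; ties broken by doc id for a deterministic order.
    public int Compare(GeoHit a, GeoHit b)
    {
        int byCombined = Combined(b).CompareTo(Combined(a));
        return byCombined != 0 ? byCombined : a.Doc.CompareTo(b.Doc);
    }
}
```

Any blend that is monotone in both inputs would fit the same shape; the point is that the key depends on the score, so it cannot be produced by GetComparable() alone.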


-----Original Message-----
From: DIGY [mailto:digydigy@gmail.com]
Sent: Monday, September 24, 2007 3:38 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Question(problem?) about PriorityQueue

Hi,
I checked the Java and .NET code and they look the same. It seems to be a
coding preference not to use index 0.

                protected internal void  Initialize(int maxSize)
                {
                        size = 0;
                        int heapSize = maxSize + 1;  // <-- one extra slot: index 0 is never used
                        heap = new System.Object[heapSize];
                        this.maxSize = maxSize;
                }
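One common reason for leaving slot 0 empty (an interpretation, not something either code base states) is that with the root at index 1 the parent and child positions are plain shifts, while a zero-based layout needs an extra +1/-1. The class and method names below are illustrative.

```csharp
// Index arithmetic for a heap stored in an array whose root is at index 1.
public static class OneBasedHeapIndex
{
    public static int Parent(int i) => i >> 1;            // i / 2
    public static int LeftChild(int i) => i << 1;         // 2 * i
    public static int RightChild(int i) => (i << 1) + 1;  // 2 * i + 1

    // Zero-based equivalents, for comparison.
    public static int ParentZeroBased(int i) => (i - 1) >> 1;     // (i - 1) / 2
    public static int LeftChildZeroBased(int i) => (i << 1) + 1;  // 2 * i + 1
}
```

With maxSize = 100 the array has 101 slots, and the heap occupies indices 1 through 100.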


Can you send some sample code that shows "some top results are missing"?

DIGY

-----Original Message-----
From: Hongwei Shen [mailto:Hongwei.Shen@emedia.com]
Sent: Monday, September 24, 2007 9:06 PM
To: lucene-net-dev@incubator.apache.org
Subject: Question(problem?) about PriorityQueue

Hello there,

The problem we have is that some top results are missing. My debugging led
me to the following piece of code in the PriorityQueue.cs file (line 69). I
can hardly believe this could be wrong, so I'd like somebody to verify it.

                public virtual bool Insert(System.Object element)
                {
                        if (size < maxSize)
                        {
                                Put(element);
                                return true;
                        }
                        else if (size > 0 && !LessThan(element, Top()))
                        {
                                heap[1] = element;
                                AdjustTop();
                                return true;
                        }
                        else
                                return false;
                }

Let's assume that maxSize is 100. When size is greater than or equal to 100,
the element is compared with the top element, heap[1]. If it is not less
than the top, the top is replaced by the element instead of being bumped
down. This does not seem like the right logic to me.

If maxSize is 100, the actual heap size is 101, and the document collector
collects the top docs starting from index 1, so index 0 is never used. I
suspect that the original design of the queue was to insert the new element
at index 0 and then sift it down.
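To see why overwriting heap[1] is safe, here is a simplified, integer-only sketch of a bounded 1-based min-heap with the same Insert shape as the code quoted above. It is not the Lucene.Net source: LessThan is replaced by <, and Put, AdjustTop and Pop follow the standard sift-up/sift-down algorithm. Because heap[1] always holds the smallest value kept, a newcomer that beats it should replace exactly that value; AdjustTop then sifts the newcomer down to restore heap order, so nothing needs to go through index 0.

```csharp
using System;

// Simplified sketch of a bounded min-heap that keeps the maxSize largest values.
public sealed class BoundedMinHeap
{
    private readonly int[] heap;  // slot 0 is never used
    private readonly int maxSize;
    private int size;

    public BoundedMinHeap(int maxSize)
    {
        this.maxSize = maxSize;
        heap = new int[maxSize + 1];
    }

    public int Size => size;

    // Smallest value currently kept, i.e. the first one to be evicted.
    public int Top() => size > 0 ? heap[1] : throw new InvalidOperationException("heap is empty");

    public bool Insert(int element)
    {
        if (size < maxSize) { Put(element); return true; }
        if (size > 0 && element >= heap[1])  // mirrors !LessThan(element, Top())
        {
            heap[1] = element;  // drop the smallest survivor...
            AdjustTop();        // ...and sift the newcomer down into place
            return true;
        }
        return false;
    }

    // Removes and returns the smallest value; repeated calls give ascending order.
    public int Pop()
    {
        if (size == 0) throw new InvalidOperationException("heap is empty");
        int result = heap[1];
        heap[1] = heap[size];
        size--;
        AdjustTop();
        return result;
    }

    // Appends at the end and sifts up toward the root at index 1.
    private void Put(int element)
    {
        heap[++size] = element;
        int i = size;
        while (i > 1 && heap[i] < heap[i >> 1])
        {
            (heap[i], heap[i >> 1]) = (heap[i >> 1], heap[i]);
            i >>= 1;
        }
    }

    // Sifts the root down until neither child is smaller.
    private void AdjustTop()
    {
        int i = 1;
        while (true)
        {
            int smallest = i, left = 2 * i, right = left + 1;
            if (left <= size && heap[left] < heap[smallest]) smallest = left;
            if (right <= size && heap[right] < heap[smallest]) smallest = right;
            if (smallest == i) break;
            (heap[i], heap[smallest]) = (heap[smallest], heap[i]);
            i = smallest;
        }
    }
}
```

Inserting 5, 1, 9, 3, 10, 7, 8 with maxSize = 3 keeps 8, 9 and 10, and popping returns them in ascending order, which is why the collector reverses them.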

Please let me know what you think.

Hongwei

