lucenenet-dev mailing list archives

From Hongwei Shen <Hongwei.S...@emedia.com>
Subject RE: Question(problem?) about PriorityQueue
Date Tue, 25 Sep 2007 13:51:27 GMT
Thank you for checking. I missed that. The code is not the problem: the least relevant element
sits at the top of the queue, and the order is reversed when the doc collector collects the results.
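
A minimal sketch of that behaviour, assuming plain integer scores in place of documents. BoundedMinHeap and DrainDescending are hypothetical names for illustration, not Lucene.Net APIs. It shows that overwriting the root (the least relevant element) and sifting it down keeps the top N, and that popping into an array from back to front yields the most relevant first:

```csharp
using System;

// Hypothetical illustration, not the Lucene.Net PriorityQueue source.
// A bounded min-heap stored from index 1, so heap[0] is never used.
public class BoundedMinHeap
{
    private readonly int[] heap;
    private readonly int maxSize;
    private int size;

    public BoundedMinHeap(int maxSize)
    {
        this.maxSize = maxSize;
        heap = new int[maxSize + 1]; // slot 0 stays empty
    }

    // Returns true if the element was kept in the queue.
    public bool Insert(int element)
    {
        if (size < maxSize)
        {
            heap[++size] = element;
            SiftUp(size);
            return true;
        }
        if (size > 0 && element >= heap[1])
        {
            heap[1] = element; // discard the current smallest
            SiftDown(1);       // restore heap order from the root
            return true;
        }
        return false;
    }

    // Removes and returns the smallest remaining element.
    // The caller must not call this on an empty heap.
    public int Pop()
    {
        int top = heap[1];
        heap[1] = heap[size--];
        SiftDown(1);
        return top;
    }

    // Pops everything and fills the result back to front,
    // so the largest (most relevant) element comes first.
    public int[] DrainDescending()
    {
        int[] result = new int[size];
        for (int i = result.Length - 1; i >= 0; i--)
        {
            result[i] = Pop();
        }
        return result;
    }

    private void SiftUp(int i)
    {
        while (i > 1 && heap[i] < heap[i / 2])
        {
            Swap(i, i / 2);
            i /= 2;
        }
    }

    private void SiftDown(int i)
    {
        while (2 * i <= size)
        {
            int child = 2 * i;
            if (child < size && heap[child + 1] < heap[child]) child++;
            if (heap[i] <= heap[child]) break;
            Swap(i, child);
            i = child;
        }
    }

    private void Swap(int a, int b)
    {
        int t = heap[a];
        heap[a] = heap[b];
        heap[b] = t;
    }
}
```

With a capacity of 3, inserting 5, 1, 9, 3, 7, 2, 8 and then draining leaves the three largest values, largest first.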

We are using a custom sort on a string field, but the comparison is based on parsing the
string and running a calculation on it. I figured out that the search, which uses
FieldSortedHitQueue, sorts the results by string order instead of by the calculation. As a
result, only the top 100 results in string order are returned from the search to the
TopFieldDocCollector, which uses FieldDocSortedHitQueue (which I modified to sort by the calculation).

I still have to find a way to make the search itself sort by the calculation.

Below is my code to sort by distance, which works fine. What we want to achieve is to sort
by a combination of distance and score. This means that we cannot calculate the distance
in GetComparable(); instead, we have to retain all the information and wait until the score is
available.

    [Serializable]
    public class DistanceSortComparator : SortComparator
    {
        private float longitude;
        private float latitude;

        /// <summary>
        /// Constructor that takes the geographic location of the center
        /// </summary>
        /// <param name="latitude">
        /// The latitude of the geo-center
        /// </param>
        /// <param name="longitude">
        /// The longitude of the geo-center
        /// </param>
        public DistanceSortComparator(float latitude, float longitude)
        {
            this.longitude = longitude;
            this.latitude = latitude;
        }

        #region Overridden methods

        /// <summary>
        /// Parses a "latitude,longitude" term and returns its distance from the center
        /// </summary>
        /// <param name="termtext">The indexed term, formatted as "latitude,longitude"</param>
        /// <returns>The distance, used as the sort key</returns>
        public override IComparable GetComparable(string termtext)
        {
            string[] loc = termtext.Split(',');
            double lat = double.Parse(loc[0]);
            double lon = double.Parse(loc[1]);
            // calculateDistance is not included in this message
            return calculateDistance(this.latitude, this.longitude, lat, lon);
        }

        public override string ToString()
        {
            return "Distance from (" + longitude + "," + latitude + ")";
        }

        #endregion
    }
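
The message does not show calculateDistance. As a rough sketch only, one possible version uses the haversine great-circle formula. The GeoDistance class, the Haversine method name, the kilometre unit and the Earth radius constant are all assumptions for illustration, not the author's code:

```csharp
using System;

// Hypothetical stand-in for the calculateDistance helper, which the
// message does not include. Uses the haversine great-circle formula.
public static class GeoDistance
{
    // Mean Earth radius in kilometres (assumed unit for this sketch).
    private const double EarthRadiusKm = 6371.0;

    // Arguments are in decimal degrees; the result is in kilometres.
    public static double Haversine(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

        // Clamp to 1 so rounding error cannot push Asin out of its domain.
        return 2 * EarthRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

Because double implements IComparable, a value computed this way could be returned directly from GetComparable, as the code above does with its own helper.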


-----Original Message-----
From: DIGY [mailto:digydigy@gmail.com]
Sent: Monday, September 24, 2007 3:38 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Question(problem?) about PriorityQueue

Hi,
I checked the Java and .NET code and they look the same. Not using index 0
seems to be a coding preference.

                protected internal void  Initialize(int maxSize)
                {
                        size = 0;
                        int heapSize = maxSize + 1; // <<<<<<<<<
                        heap = new System.Object[heapSize];
                        this.maxSize = maxSize;
                }


Can you send sample code where "some top results are missing"?

DIGY

-----Original Message-----
From: Hongwei Shen [mailto:Hongwei.Shen@emedia.com]
Sent: Monday, September 24, 2007 9:06 PM
To: lucene-net-dev@incubator.apache.org
Subject: Question(problem?) about PriorityQueue

Hello there,

The problem we have is that some top results are missing. My debugging led
me to the following piece of code in the PriorityQueue.cs file (line 69). I
simply cannot believe this might be wrong, so I'd like somebody to verify
it.

                public virtual bool Insert(System.Object element)
                {
                        if (size < maxSize)
                        {
                                Put(element);
                                return true;
                        }
                        else if (size > 0 && !LessThan(element, Top()))
                        {
                                heap[1] = element;
                                AdjustTop();
                                return true;
                        }
                        else
                                return false;
                }

Let's assume that maxSize is 100. When size is greater than or equal to 100, the
element is compared with the top element, which is heap[1]. If it is not less
than the top, the top is replaced by the element instead of being
bumped down. It seems to me that this is not the right logic here.

If maxSize is 100, the actual heap size is 101 and the document collector
collects top docs starting from index 1, so index 0 is never used. I
suspect that the original design of the queue was to insert the new element
at index 0 and then sort it down.

Please let me know what you think.

Hongwei

