lucenenet-dev mailing list archives

From "Roethinger, Alexander" <aroethin...@affili.net>
Subject RE: Possible Issue with DoubleField
Date Thu, 17 Aug 2017 11:54:00 GMT
Hi Shad,

thank you very much for taking the time to look into this and fixing the
FieldType and GetStringValue issues!
I will apply your suggestions to my code.

Do you have any idea when these changes will be pushed to the next beta
release (4.8.0-beta00005) on GitHub?

Cheers
Alexander


-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com]
Sent: Tuesday, August 15, 2017 01:42
To: Roethinger, Alexander <aroethinger@affili.net>
Cc: dev@lucenenet.apache.org
Subject: RE: Possible Issue with DoubleField

Alexander,

I have done a bit of refactoring to fix the issues:

a) Overloads were added for GetStringValue() that allow passing a format
string and an IFormatProvider, so you can convert a numeric Field to a string
using any culture and numeric format string. Note that these parameter values
are simply ignored if the field type is not numeric.
b) Expecting FieldType to reflect the indexing-time settings after a search
is not valid. FieldType is a configuration type that instructs how to store a
field, but it is not retrieved from the index during a search. I have
confirmed that this behavior is identical in Lucene 4.8.0 using the tests I
added here
(https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests/Document/TestField.cs#L626-L734).
See the FieldType documentation for usage:
https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/document/FieldType.html.
Note the use of the phrase "should be" rather than "is" on the corresponding
methods.
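Point a) leaves the choice of format to the caller. The following minimal, BCL-only sketch shows what a round-trip format gives you for the double.MaxValue case raised in this thread. RoundTripFormattingSketch and its methods are hypothetical names, and no Lucene.Net types are involved. The sketch formats the value with the "R" specifier and the invariant culture, then checks that parsing the string returns the identical double.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Lucene.Net: shows the effect of formatting
// a double with the round-trip specifier "R" and an explicit IFormatProvider.
public static class RoundTripFormattingSketch
{
    // "R" emits enough digits for double.Parse to recover the exact value;
    // InvariantCulture keeps '.' as the decimal separator regardless of the
    // current thread culture.
    public static string Format(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }

    // True when parsing the formatted string yields the original value.
    public static bool RoundTrips(double value)
    {
        string text = Format(value);
        double parsed = double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
        return parsed.Equals(value);
    }
}
```

With the new overloads, the equivalent call on a field would presumably pass "R" and CultureInfo.InvariantCulture to GetStringValue(). The exact overload signature is an assumption here and should be checked against the released API.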

For the FieldType casting issue, the FieldType property of IIndexableField
was renamed to IndexableFieldType, and a new property named FieldType was
added to Field. This new property returns the correct FieldType type, so there
is no longer a need to cast when you have a Field instance.
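The shape of that change can be sketched with a few stand-in types. All names below are hypothetical and are not the Lucene.Net declarations. The interface exposes the general, interface-typed property under a distinct name, while the concrete class adds a property typed as the concrete class, so callers holding the class need no cast.

```csharp
using System;

// Hypothetical stand-ins for IIndexableFieldType / FieldType /
// IIndexableField / Field; not the Lucene.Net types.
public interface IFieldTypeInfoSketch
{
    bool IsStored { get; }
}

public enum NumericKindSketch { NONE, INT32, INT64, SINGLE, DOUBLE }

public sealed class FieldTypeSketch : IFieldTypeInfoSketch
{
    public bool IsStored { get; set; }

    // Only the concrete type carries NumericType, which is why an
    // interface-typed property forced a cast.
    public NumericKindSketch NumericType { get; set; }
}

public interface IIndexableFieldSketch
{
    // Renamed, interface-typed property for code that only has the interface.
    IFieldTypeInfoSketch IndexableFieldType { get; }
}

public class FieldSketch : IIndexableFieldSketch
{
    private readonly FieldTypeSketch fieldType;

    public FieldSketch(FieldTypeSketch fieldType)
    {
        this.fieldType = fieldType ?? throw new ArgumentNullException(nameof(fieldType));
    }

    // Concrete-typed property, so field.FieldType.NumericType compiles.
    public FieldTypeSketch FieldType => fieldType;

    // Same object, exposed through the interface type.
    public IFieldTypeInfoSketch IndexableFieldType => fieldType;
}
```

Code that holds only the interface still gets IndexableFieldType. Code that holds a Field gets the concrete type without a cast.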

Also, to prevent boxing/unboxing of numeric field types, I have duplicated
simplified versions of the Number types from Java. These are only used
internally inside of Field (and are protected so they can be used by
subclasses), since it isn't very intuitive to have a Number (reference) type
system in .NET. Instead, I have duplicated the conversion methods of Number,
and added them to Field. So, for example, there is a GetInt32Value() method
that returns a nullable int and a GetInt32ValueOrDefault() extension method
that returns int.
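Here is a minimal sketch of that getter pattern. It assumes the field keeps its numeric value in typed storage rather than in an object. The class and its members are hypothetical stand-ins modeled on the description above, not Lucene.Net's Field. Narrowing a 64-bit value to int follows Java's Number.intValue() semantics, so the high bits are discarded.

```csharp
using System;

// Hypothetical stand-in, not Lucene.Net's Field: holds an int, a long, or
// nothing, without boxing the value into an object.
public sealed class NumericFieldValueSketch
{
    private enum Kind { None, Int32, Int64 }

    private readonly Kind kind;
    private readonly long value; // widened storage for both integral kinds

    public NumericFieldValueSketch() { kind = Kind.None; }
    public NumericFieldValueSketch(int value) { kind = Kind.Int32; this.value = value; }
    public NumericFieldValueSketch(long value) { kind = Kind.Int64; this.value = value; }

    // Null when the field is not numeric; otherwise the value narrowed to int
    // (unchecked, so a long keeps only its low 32 bits, like Java's intValue()).
    public int? GetInt32Value()
    {
        if (kind == Kind.None) return null;
        return unchecked((int)value);
    }

    // Null when the field is not numeric; otherwise the value as a long.
    public long? GetInt64Value()
    {
        if (kind == Kind.None) return null;
        return value;
    }
}

public static class NumericFieldValueSketchExtensions
{
    // Non-nullable convenience wrappers: 0 when the field is not numeric.
    public static int GetInt32ValueOrDefault(this NumericFieldValueSketch field)
        => field.GetInt32Value().GetValueOrDefault();

    public static long GetInt64ValueOrDefault(this NumericFieldValueSketch field)
        => field.GetInt64Value().GetValueOrDefault();
}
```

Returning int? lets a caller distinguish "not numeric" from a stored zero. The OrDefault extensions give up that distinction for convenience.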

I also added a property IIndexableField.NumericType that returns a new
enumeration type NumericFieldType (that includes BYTE and INT16, which are
not on NumericType). This value can be used to determine what type of number
is in the field (whether before writing or after reading):

	if (field.NumericType == NumericFieldType.INT32)
		return field.GetInt32ValueOrDefault();
	if (field.NumericType == NumericFieldType.INT64)
		return field.GetInt64ValueOrDefault();

This is the way you are supposed to determine the type of the field, because
the FieldType information is not retrieved from the index.

Note that in Java this was done differently: Lucene uses numericValue(), the
analogue of our GetNumericValue() method, to check the type of the value.

Number number = field.numericValue();
if (number != null)
{
  if (number instanceof Byte || number instanceof Short || number instanceof Integer) {
    bits = NUMERIC_INT;
  } else if (number instanceof Long) {
    bits = NUMERIC_LONG;
  } else if (number instanceof Float) {
    bits = NUMERIC_FLOAT;
  } else if (number instanceof Double) {
    bits = NUMERIC_DOUBLE;
  } else {
    throw new IllegalArgumentException("cannot store numeric type " + number.getClass());
  }
}

This is the way it can be done in Lucene.Net 4.8.0-beta00004, by
substituting the value types byte, short, int, long, float, and double and
using "is" instead of "instanceof".
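Below is a sketch of that translation. It assumes `number` is the object returned by GetNumericValue(); here it is just a method parameter, so the block runs without Lucene.Net. NumericBitsSketch is a hypothetical stand-in for the NUMERIC_* constants in the Java snippet.

```csharp
using System;

// Hypothetical stand-in for the Java NUMERIC_* constants.
public enum NumericBitsSketch { NUMERIC_INT, NUMERIC_LONG, NUMERIC_FLOAT, NUMERIC_DOUBLE }

public static class NumericTypeCheckSketch
{
    // C# version of the Java instanceof chain: byte, short and int all map to
    // the int encoding; any other type that is not long/float/double is rejected.
    // The Java code only runs this check when number != null, so null is
    // treated as a caller error here.
    public static NumericBitsSketch Classify(object number)
    {
        if (number == null) throw new ArgumentNullException(nameof(number));

        if (number is byte || number is short || number is int)
            return NumericBitsSketch.NUMERIC_INT;
        if (number is long)
            return NumericBitsSketch.NUMERIC_LONG;
        if (number is float)
            return NumericBitsSketch.NUMERIC_FLOAT;
        if (number is double)
            return NumericBitsSketch.NUMERIC_DOUBLE;

        throw new ArgumentException("cannot store numeric type " + number.GetType(), nameof(number));
    }
}
```

The sketch uses byte for Java's Byte, as suggested above. Java's byte is signed, though, so the closest .NET match would be sbyte.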

However, I have deprecated the GetNumericValue() method because the
boxing/unboxing overhead of its usage makes it less than ideal. It isn't
going anywhere because it is used in tests, but its use is discouraged going
forward. Instead, you should use the Field.NumericType property to determine
what the type is, and call the appropriate GetXXXValue() method to retrieve
the numeric value cast to the correct value type.
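There is also a related .NET pitfall with retrieving values through object, besides the allocation overhead. A direct unbox must name the exact value type that was boxed, so unboxing an object that holds an int as a long throws InvalidCastException. The helpers below are a hypothetical, BCL-only illustration of this pitfall, not part of Lucene.Net.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers illustrating unboxing rules; not part of Lucene.Net.
public static class UnboxingSketch
{
    // A direct unbox must name the exact boxed type: this throws
    // InvalidCastException when 'boxed' holds an int rather than a long.
    public static long UnboxDirectly(object boxed)
    {
        return (long)boxed;
    }

    // Type-test first to avoid the exception; false for any other type.
    public static bool TryUnboxInt64(object boxed, out long value)
    {
        if (boxed is long l)
        {
            value = l;
            return true;
        }
        value = 0;
        return false;
    }

    // Convert.ToInt64 accepts any IConvertible numeric, but the value still
    // has to travel through object first.
    public static long ConvertInt64(object boxed)
    {
        return Convert.ToInt64(boxed, CultureInfo.InvariantCulture);
    }
}
```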

The CI server is currently running on these changes, but in a couple of
hours you can try the new API on version 4.8.0-ci0000001115 on the CI MyGet
feed: https://www.myget.org/gallery/lucene-net-ci

I would appreciate any feedback you may have on the new API.

Thanks,
Shad Storhaug (NightOwl888)



-----Original Message-----
From: Roethinger, Alexander [mailto:aroethinger@affili.net]
Sent: Friday, August 11, 2017 5:40 PM
To: Shad Storhaug
Cc: dev@lucenenet.apache.org
Subject: RE: Possible Issue with DoubleField

Hi Shad,

thanks for getting back to me.

Regarding the casting:

Using your extension solves the issue of having to cast IIndexableField
to Field.
Nevertheless, I still need to cast FieldType in order to access the
.NumericType value.
This is probably for the same reason: the FieldType property is of type
IIndexableFieldType.

Here's an example of what I mean

This works:
var field = doc.GetField<Field>(fieldName);
// need to do this cast in order to access the .NumericType property
var fieldType = (FieldType)field.FieldType;
Console.WriteLine("Numeric: {0}", fieldType.NumericType);


This fails:
var field = doc.GetField<Field>(fieldName);
// trying to access the Field.FieldType.NumericType property fails
Console.WriteLine("Numeric: {0}", field.FieldType.NumericType);
// -> "IIndexableFieldType does not contain a definition for NumericType"

It's not a major issue and easy to get around, but it does feel strange from
a usage perspective.
One could probably solve this by using a similar extension method for
retrieving the FieldType (which does the casting).


Now regarding the Double/Single issue:
Sorry for being unclear. I had added the unit test after upgrading.
But in the meantime I reverted the sample code back to v4.8.0.770 and it
shows the same behavior:
a) The StringValue does not properly represent the NumericValue.
b) FieldType properties on searching are not the same as during indexing.

The latter point applies to any type of field.
Am I missing something here? Shouldn't the FieldType properties represent
the values as they were set during indexing?
The sample code illustrates both issues.

Kind regards
Alexander

-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com]
Sent: Friday, August 11, 2017 06:58
To: Roethinger, Alexander <aroethinger@affili.net>
Cc: dev@lucenenet.apache.org
Subject: RE: Possible Issue with DoubleField

Hi Alexander,

Thanks for the report.

To answer your second question, Lucene's design has changed to accommodate a
"LazyField" (in the Lucene.Net.Misc package) that is read-only. The return
type of Document.GetField(string) is IIndexableField, not Field, and
therefore it typically requires a cast in Java before it can be used. To make
this easier to work with in .NET, I have added a generic extension method
overload so the cast is more obvious.

var field = doc.GetField<Field>(fieldName);

I am trying to avoid using something like public Field GetFieldAsField(),
which would result in an exception in the rare case the underlying
IIndexableField type could not be cast to a Field type, and it would not be
so obvious that you have to call GetField(string) and cast to an alternative
type.

You can try this approach out by copying the extension method into your
project:
https://github.com/apache/lucenenet/blob/master/src/Lucene.Net/Support/Document/DocumentExtensions.cs
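The idea behind a generic overload can be sketched with stand-in types. DocSketch, IIndexableItemSketch and ItemSketch are hypothetical; they are not the Lucene.Net classes, and this is not the code in DocumentExtensions.cs. The generic extension simply performs the cast on the caller's behalf, so the requested type appears at the call site.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IIndexableField.
public interface IIndexableItemSketch
{
    string Name { get; }
}

// Plays the role of Field: one concrete implementation among possibly several
// (the email mentions a read-only LazyField as another).
public class ItemSketch : IIndexableItemSketch
{
    public ItemSketch(string name) { Name = name; }
    public string Name { get; }
}

// Hypothetical stand-in for Document.
public class DocSketch
{
    private readonly List<IIndexableItemSketch> fields = new List<IIndexableItemSketch>();

    public void Add(IIndexableItemSketch field) => fields.Add(field);

    // Like Document.GetField(string): returns the interface type, or null.
    public IIndexableItemSketch GetField(string name)
    {
        foreach (var field in fields)
            if (field.Name == name) return field;
        return null;
    }
}

public static class DocSketchExtensions
{
    // Generic overload: the target type is visible at the call site via <T>.
    // A field of a different concrete type throws InvalidCastException;
    // a missing field returns null.
    public static T GetField<T>(this DocSketch document, string name)
        where T : class, IIndexableItemSketch
    {
        return (T)document.GetField(name);
    }
}
```

Because the instance GetField(string) is not generic, a call such as doc.GetField<ItemSketch>("title") resolves to the extension method.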


For the Single/Double field issue, thanks for putting together the code
sample. It is a bit unclear from your question though - was this something
that worked in 4.8.0.770-beta that quit working in 4.8.0-beta00004?


Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Roethinger, Alexander [mailto:aroethinger@affili.net] 
Sent: Friday, August 11, 2017 5:32 AM
To: dev@lucenenet.apache.org
Subject: Possible Issue with DoubleField

Dear Devs,

I just updated my search application from v4.8.0.770-beta to
v4.8.0-beta00004.
It required some reasonable code adjustments but otherwise worked just fine,
with all of my unit tests now passing.

I did notice something, though, with DoubleField (and possibly with
SingleField as well; Int32Field and Int64Field both work fine) after
extending my unit tests for some edge cases using the MaxValue of those
types.
Maybe I'm just missing something, but I thought I would raise it to the
community and would appreciate your thoughts:

a) When storing a DoubleField with a value of double.MaxValue, the string
representation of the field value is incorrect. Could it be that the
Round-Trip Format Specifier "R" is missing?
b) When retrieving the same field, the FieldType properties of the retrieved
field are not the same as when the field was stored.

This results in two challenges:
1) I can't use the Document.Get() method to retrieve the precise value.
Instead I have to use GetNumericValue().
2) When examining the values of FieldType for the retrieved field, the
properties do not match those of the stored field, i.e. NumericType is set
to NONE even though it should be DOUBLE, or the value of IsTokenized is
changed.
I'm not sure if this is expected behavior or not. I would have assumed that
FieldType retrieves the values according to the way the field was originally
created.
The problem with these two points is that I can't easily deduce how to
properly retrieve the value based on the value of NumericType just from
reading the field.

Another point that confuses me:
Why do I need to explicitly cast FieldType to access the NumericType
property instead of just accessing the FieldType property of the Field?
(see line 34: var fieldType = (FieldType)field.FieldType;)

The sample ConsoleApplication code below illustrates the behavior.

Any feedback is welcome!
And thanks for all the great work you have been doing!

Kind regards
Alexander

CODE:

using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
using System;

namespace LuceneTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Directory dir = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
            IndexWriterConfig iwc = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);

            double value = double.MaxValue;
            string fieldName = "DoubleField";

            FieldType type = new FieldType();
            type.IsIndexed = true;
            type.IsStored = true;
            type.IsTokenized = false;
            type.NumericType = NumericType.DOUBLE;


            using (IndexWriter writer = new IndexWriter(dir, iwc))
            {
                Document doc = new Document();
                var field = new DoubleField(fieldName, value, type);
                var fieldType = (FieldType)field.FieldType;

                // Print the field as configured at indexing time
                Console.WriteLine("DoubleField values for indexed value");
                Console.WriteLine("StringValue: {0}", field.GetStringValue());
                Console.WriteLine("NumericValue: {0:R}", field.GetNumericValue());
                Console.WriteLine("IsIndexed: {0}", fieldType.IsIndexed);
                Console.WriteLine("IsStored: {0}", fieldType.IsStored);
                Console.WriteLine("IsTokenized: {0}", fieldType.IsTokenized);
                Console.WriteLine("Numeric: {0}", fieldType.NumericType);

                doc.Add(field);
                writer.AddDocument(doc);
                writer.Commit();
            }

            Console.WriteLine();

            // Read the document back and print the same properties
            using (IndexReader reader = DirectoryReader.Open(dir))
            {
                IndexSearcher searcher = new IndexSearcher(reader);
                var hits = searcher.Search(new MatchAllDocsQuery(), 10).ScoreDocs;

                Document doc = searcher.Doc(hits[0].Doc);
                var field = doc.GetField(fieldName);
                var fieldType = (FieldType)field.FieldType;

                Console.WriteLine("DoubleField values for searched value");
                Console.WriteLine("StringValue: {0}", field.GetStringValue());
                Console.WriteLine("NumericValue: {0:R}", field.GetNumericValue());
                Console.WriteLine("IsIndexed: {0}", fieldType.IsIndexed);
                Console.WriteLine("IsStored: {0}", fieldType.IsStored);
                Console.WriteLine("IsTokenized: {0}", fieldType.IsTokenized);
                Console.WriteLine("Numeric: {0}", fieldType.NumericType);
            }

            Console.ReadKey();

            dir.Dispose();
        }
    }
}