lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (LUCENENET-607) InvalidCastException PendingTerm cannot be cast to PendingBlock
Date Tue, 13 Aug 2019 05:25:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENENET-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shad Storhaug resolved LUCENENET-607.
-------------------------------------
    Resolution: Fixed

Thanks for the PR.

{quote}There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is used to
randomize the seed, but DateTime.Now.Millisecond could return 0, and this value is treated
as "uninitialized", so the second GOOD_FAST_HASH_SEED call will return another value.{quote}

This was due to a second bug that was introduced during translation of the code from Java. [{{System.currentTimeMillis()}}|https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#currentTimeMillis--]
returns the number of milliseconds since January 1, 1970 UTC, not the millisecond component
(0 to 999) of the current time. I have replaced {{DateTime.Now.Millisecond}} with {{Time.CurrentTimeMilliseconds()}},
which relies on {{System.Diagnostics.Stopwatch.GetTimestamp()}} to generate the value, making it a number
much higher than 999 that rarely repeats.
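
To illustrate the difference described above, here is a minimal Java sketch (not the Lucene.NET code). The class {{SeedSourceDemo}} and the helper {{millisecondOfSecond}} are hypothetical names; the helper imitates what {{DateTime.Now.Millisecond}} returns in .NET, assuming a non-negative timestamp.

```java
class SeedSourceDemo {
    // Hypothetical helper: the millisecond component of a timestamp,
    // analogous to .NET's DateTime.Now.Millisecond. The result is always 0..999.
    static int millisecondOfSecond(long epochMillis) {
        return (int) Math.floorMod(epochMillis, 1000L);
    }

    public static void main(String[] args) {
        // Milliseconds since January 1, 1970 UTC: a large value that keeps growing.
        long epochMillis = System.currentTimeMillis();

        // Only 1000 possible values, and 0 is one of them, so a seed taken from
        // this component collides with an "uninitialized" sentinel of 0.
        int component = millisecondOfSecond(epochMillis);

        System.out.println("epoch millis:        " + epochMillis);
        System.out.println("millisecond of sec.: " + component);
    }
}
```

With only 1000 possible values, a seed drawn from the millisecond component hits 0 about once in every thousand runs, which is why a lazy "is it still 0?" check can fire a second time.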


> InvalidCastException PendingTerm cannot be cast to PendingBlock
> ---------------------------------------------------------------
>
>                 Key: LUCENENET-607
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-607
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Khindikaynen Aleksey
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Here is the exception call stack:
> {code:java}
> at Lucene.Net.Codecs.BlockTreeTermsWriter.TermsWriter.Finish(Int64 sumTotalTermFreq, Int64 sumDocFreq, Int32 docCount, TermsHashPerField termsHashPerField)
> at Lucene.Net.Index.FreqProxTermsWriterPerField.Flush(String fieldName, FieldsConsumer consumer, SegmentWriteState state)
> at Lucene.Net.Index.FreqProxTermsWriter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.TermsHash.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.DocInverter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.DocFieldProcessor.Flush(SegmentWriteState state)
> at Lucene.Net.Index.DocumentsWriterPerThread.Flush()
> at Lucene.Net.Index.DocumentsWriter.DoFlush(DocumentsWriterPerThread flushingDWPT)
> at Lucene.Net.Index.DocumentsWriter.FlushAllThreads(IndexWriter indexWriter)
> at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes)
> at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit commit)
> at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher referenceToRefresh)
> at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh()
> at Lucene.Net.Search.ReferenceManager`1.MaybeRefreshBlocking()
> at Lucene.Net.Search.ControlledRealTimeReopenThread`1.Run()
> {code}
> The issue is quite hard to reproduce and appears only when adding documents with the same
> terms concurrently. I have not managed to write a clear test that reproduces the issue.
> I did some research and found that the cause of the issue is duplicate terms
> in the BytesRefHash structure. BytesRefHash uses the Murmurhash3_x86_32 hashing algorithm with
> a random seed (see the StringHelper.GOOD_FAST_HASH_SEED property). The StringHelper.GOOD_FAST_HASH_SEED
> property is not thread-safe and could return different values if called from several threads
> at the same moment, so it could result in duplicate values in BytesRefHash (the same values return
> different hashes because the hashes were calculated with different seeds).
> There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is used to
> randomize the seed, but DateTime.Now.Millisecond could return 0, and this value is treated
> as "uninitialized", so the second GOOD_FAST_HASH_SEED call will return another value.
> The issue could be easily fixed by moving the GOOD_FAST_HASH_SEED initialization to
> the static ctor of StringHelper. That will make it thread-safe and will fix the 0-value issue.
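
The fix proposed in the description can be sketched in Java as follows. This is an illustration under stated assumptions, not the actual Lucene.NET or Lucene source: {{SeededHashDemo}} is a hypothetical class, and its {{hash}} method is a toy FNV-1a-style function standing in for Murmurhash3_x86_32. The seed is assigned once in a static initializer, which the JVM runs exactly once per class before any thread can read the field (a .NET static constructor gives a comparable guarantee).

```java
import java.nio.charset.StandardCharsets;

class SeededHashDemo {
    // Assigned exactly once during class initialization, so every thread
    // observes the same value and no "is it still 0?" check is needed.
    static final int SEED;

    static {
        SEED = (int) System.currentTimeMillis();
    }

    // Toy seeded hash (FNV-1a-style mixing); a stand-in for MurmurHash3.
    static int hash(byte[] data, int seed) {
        int h = seed ^ 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);

        // Same term, same seed: the hash is stable, so the term is found again.
        System.out.println(hash(term, SEED) == hash(term, SEED)); // prints "true"

        // Same term, different seeds: the hashes differ, so a hash table
        // would store the term twice, as described in the report.
        System.out.println(hash(term, 1) == hash(term, 2)); // prints "false"
    }
}
```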



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
