lucenenet-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [lucenenet] bongohrtech commented on pull request #313: Fix/random seed simple
Date Fri, 24 Jul 2020 04:42:10 GMT

bongohrtech commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-663343798


   > It's a bit strange, but although this seems to have fixed the `TestRandomStrings` tests
for `TestICUFoldingFilter` and `TestThaiAnalyzer`, the `TestThaiAnalyzer::TestRandomHugeStrings()`
test still fails. Digging into it, both tests terminate in the same place; the only difference
is that the `maxWordLength` parameter is increased.
   > 
   > I suspect we have a difference in behavior somewhere in `TestUtil.RandomAnalysisString(Random,
int, bool)` that may be causing some rare weirdness. Sadly, `TestUtil` has no tests to verify
that it behaves as it should.
   > 
   > `TestUtil.RandomAnalysisString()` is also called by `Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range()`
and `Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range()`, both of which are also
failing randomly. Perhaps one of the paths that `TestUtil.RandomSubString()` goes down
is broken, which would explain the randomness. I suggest dividing and conquering: keep excluding
the random paths until you find the one whose exclusion makes the failure stop happening. That would
probably be a bit quicker than reviewing every one of those methods and comparing them against
the Java implementation.
   
   I am not seeing the TestRandomHugeStrings failures. Can you rerun and send the
failing seed? I've run this several times and haven't hit the issue. However, that may
be because I'm using the FindFirstFailingSeed attribute, which sets NUnit.Framework.Internal.Randomizer.InitialSeed
and NUnit.Framework.Internal.TestExecutionContext.CurrentContext.CurrentTest.Seed to the same
seed; under normal conditions those two values are not the same.
   
   RandomAnalysisString is producing the same result each time. I verified this by writing
the hashed outputs to a file along with the iteration number and comparing them on subsequent
runs: all identical. I can include this code in the pull request if you would like to review it.
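
   For readers following along, here is a minimal sketch of that kind of determinism check. It is not the code from the pull request. It is written in Java for brevity, and `randomString` is a hypothetical stand-in for `TestUtil.RandomAnalysisString`. The class and method names are assumptions made for illustration. Each iteration's output is hashed and recorded next to its iteration number, and two runs with the same seed are compared line by line.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class DeterminismCheck {

    // Hypothetical stand-in for the generator under test (NOT the real TestUtil method).
    static String randomString(Random random, int maxLength) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Hex-encoded SHA-256 of the string's UTF-8 bytes.
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // One "iteration<TAB>hash" line per generated string, all drawn from a single seeded Random.
    static List<String> hashesForSeed(long seed, int iterations) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            lines.add(i + "\t" + sha256Hex(randomString(random, 50)));
        }
        return lines;
    }

    // Index of the first line that differs between two runs, or -1 if the runs are identical.
    static int firstDifference(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) {
        List<String> firstRun = hashesForSeed(42L, 1000);
        List<String> secondRun = hashesForSeed(42L, 1000);
        System.out.println("first difference: " + firstDifference(firstRun, secondRun));
    }
}
```

   In practice the lines from each run would be written to a file so that separate test runs can be compared. Reporting the first differing iteration, rather than only a pass or fail, narrows down where two runs start to diverge.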
   
   I will take a look at those other tests now (Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range() and Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range()).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


