fixing flaky test: DocWordSplitCountTest #241
Open
What is the purpose of this PR
This PR fixes flakiness in the DocWordSplitCountTest test case, whose outcome varied between runs because the result rows came back in a nondeterministic order. Adding an ORDER BY clause to the SQL query used in the test makes the output order deterministic and eliminates the flakiness.
Why the test fails
The test fails intermittently because the order of rows returned by the SQL query varies between test runs. This variability in row order leads to assertion failures, as the test expects a specific order of rows that was not guaranteed in the original implementation.
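As a minimal sketch of why row order matters here, the snippet below compares two lists holding the same rows in different positions. The class name RowOrderDemo and the helper sameOrder are hypothetical and are not part of the Alink test; the row values are the ones quoted in this PR, written as plain strings for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class RowOrderDemo {
    // Hypothetical helper: true only if both lists hold the same rows
    // in the same positions (List.equals is order-sensitive).
    public static boolean sameOrder(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("a,2", "b,2", "c,2", "d,1");
        // Same rows, different order, as seen in some NonDex runs.
        List<String> shuffled = Arrays.asList("c,2", "a,2", "d,1", "b,2");

        System.out.println(sameOrder(expected, expected)); // prints "true"
        System.out.println(sameOrder(expected, shuffled)); // prints "false"
    }
}
```

An order-sensitive comparison like this passes or fails depending only on the order in which the rows arrive, even though the word counts themselves are correct in both cases.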
How to reproduce the test failure
The failure can be reproduced by running DocWordSplitCountTest multiple times with the NonDex tool. In some runs, the order of the output rows changes from [a,2, b,2, c,2, d,1] to [c,2, a,2, d,1, b,2], which causes the assertion to fail.
To reproduce the failure, run the test with the NonDex plugin:
mvn -pl core edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=com.alibaba.alink.operator.common.nlp.DocWordSplitCountTest#test
Expected results
The expected result is the same order of output rows in every test run, specifically [a, 2L], [b, 2L], [c, 2L], [d, 1L], which is the correct count of each word in the input string.
Actual results
The actual results vary between test runs because the rows are returned in a nondeterministic order, e.g., [a,2, b,2, c,2, d,1] in some runs and [c,2, a,2, d,1, b,2] in others, since the query has no explicit ordering.
Here is the NonDex output of a pass and subsequent failure.
Description of fix
The fix adds an ORDER BY clause to the SQL query in the test, sorting the results by the word (w) column. The output rows are then returned in a consistent, deterministic order regardless of the execution path or the internal behavior of the SQL processing environment, so the actual results match the expected, ordered results on every run.
I verified the fix by setting the NonDex run count with the following flag:
-DnondexRuns=100
All 100 runs passed after this change.
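The effect of ordering by the word column can be sketched in plain Java. This is an analogy to the ORDER BY clause, not the actual change in this PR: the WordCount type, the SortedRowsDemo class, and the sortedByWord helper are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedRowsDemo {
    // Hypothetical row type: a word and how often it occurs.
    public static final class WordCount {
        public final String word;
        public final long count;

        public WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + "," + count;
        }
    }

    // Sorts a copy of the rows by word (the analogue of ORDER BY w)
    // and renders each row as "word,count".
    public static List<String> sortedByWord(List<WordCount> rows) {
        List<WordCount> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing((WordCount r) -> r.word));
        List<String> out = new ArrayList<>();
        for (WordCount r : copy) {
            out.add(r.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows in the shuffled order reported in this PR.
        List<WordCount> shuffled = Arrays.asList(
                new WordCount("c", 2L), new WordCount("a", 2L),
                new WordCount("d", 1L), new WordCount("b", 2L));
        System.out.println(sortedByWord(shuffled)); // prints "[a,2, b,2, c,2, d,1]"
    }
}
```

Whatever order the rows arrive in, sorting on the word yields the same sequence, which is why an order-sensitive assertion becomes stable once the query itself imposes the ordering.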