fixing flaky test: DocWordSplitCountTest #241

jeromesteve202 · 2024-03-17T22:54:24Z

What is the purpose of this PR

This PR fixes a flaky test, DocWordSplitCountTest, whose outcome varied between runs because the order of the result rows was nondeterministic. Adding an ORDER BY clause to the SQL query used in the test makes the output order deterministic and eliminates the flakiness.

Why the test fails

The test fails intermittently because the order of the rows returned by the SQL query varies between runs. The test compares the rows against a fixed expected order, but the original query did not guarantee any order, so some runs fail the assertion.

How to reproduce the test failure

The failure can be reproduced by running DocWordSplitCountTest repeatedly with the NonDex tool. In some runs, the output order changes from [a,2, b,2, c,2, d,1] to [c,2, a,2, d,1, b,2], causing the assertion to fail.

To reproduce the failure, run the test with the NonDex Maven plugin:

mvn -pl core edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=com.alibaba.alink.operator.common.nlp.DocWordSplitCountTest#test

Expected results

The test expects the same output order in every run, specifically [a, 2L], [b, 2L], [c, 2L], [d, 1L], i.e., the count of each word in the input string.

Actual results

The rows are returned in a nondeterministic order, e.g., [a,2, b,2, c,2, d,1] in some runs and [c,2, a,2, d,1, b,2] in others, because the query specifies no ordering.
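To make the failure mode concrete, here is a small self-contained Java sketch. It is not the actual test code: it simplifies each Flink row to a "word,count" string and uses the two orderings reported above. The rows are identical as a collection; only a positional comparison, which is effectively what an order-sensitive assertion performs, tells them apart.

```java
import java.util.Arrays;

public class RowOrderDemo {
    // Rows from the two runs reported above, simplified to "word,count" strings
    // (the real test works with Flink rows, not strings).
    static final String[] RUN_A = {"a,2", "b,2", "c,2", "d,1"};
    static final String[] RUN_B = {"c,2", "a,2", "d,1", "b,2"};

    // Positional comparison: element i must equal element i.
    static boolean samePositions(String[] x, String[] y) {
        return Arrays.equals(x, y);
    }

    // Order-insensitive comparison: sort copies of both arrays, then compare.
    static boolean sameRows(String[] x, String[] y) {
        String[] xs = x.clone();
        String[] ys = y.clone();
        Arrays.sort(xs);
        Arrays.sort(ys);
        return Arrays.equals(xs, ys);
    }

    public static void main(String[] args) {
        System.out.println("positional match: " + samePositions(RUN_A, RUN_B)); // false
        System.out.println("same rows:        " + sameRows(RUN_A, RUN_B));      // true
    }
}
```

Both runs produce correct counts, so the flakiness lies entirely in how the output is ordered, not in what is computed.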

Below is the NonDex output of a passing run followed by a failing run.

-------------------------------------------------------
Running com.alibaba.alink.operator.common.nlp.DocWordSplitCountTest
log4j:WARN No appenders could be found for logger (org.apache.flink.api.java.typeutils.TypeExtractor).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.878 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

INFO: Adding excluded groups to newly created one
INFO: Creating new argline for Surefire
CONFIG: nondexFilter=.*
nondexMode=FULL
nondexSeed=933178
nondexStart=0
nondexEnd=9223372036854775807
nondexPrintstack=false
nondexDir=/home/ubuntu/Alink/core/.nondex
nondexJarDir=/home/ubuntu/Alink/core/.nondex
nondexExecid=GZw8JcVMV5RqPmPNMJyzKZExxLQTsyXyIAW1xVgdgQ=
nondexLogging=CONFIG
test=
[INFO] Surefire report directory: /home/ubuntu/Alink/core/.nondex/GZw8JcVMV5RqPmPNMJyzKZExxLQTsyXyIAW1xVgdgQ=

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Concurrency config is parallel='none', perCoreThreadCount=true, threadCount=2, useUnlimitedThreads=false
Running com.alibaba.alink.operator.common.nlp.DocWordSplitCountTest
log4j:WARN No appenders could be found for logger (org.apache.flink.api.java.typeutils.TypeExtractor).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.611 sec <<< FAILURE!
test(com.alibaba.alink.operator.common.nlp.DocWordSplitCountTest)  Time elapsed: 0.004 sec  <<< FAILURE!
arrays first differed at element [0]; expected:<c,2> but was:<a,2>

Description of fix

The fix adds an ORDER BY clause to the SQL query in the test that sorts the results by the word column (w). The output rows are then returned in a consistent order regardless of the execution path or the internal behavior of the SQL engine, so the actual results always line up with the expected, ordered results and the test passes reliably across runs.
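The effect of ORDER BY w can be illustrated outside Flink with a minimal Java sketch. It assumes a hypothetical input document "a b c d a b c" (the real test's input is not shown here) and uses a HashMap to stand in for an aggregation whose output order is unspecified; the class and method names are illustrative only. Copying the counts into a TreeMap, which iterates in ascending key order, plays the role of the ORDER BY clause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedWordCount {
    // Count whitespace-separated words. HashMap iteration order is unspecified,
    // which is the kind of nondeterminism NonDex exposes.
    static Map<String, Long> countWords(String doc) {
        Map<String, Long> counts = new HashMap<>();
        for (String w : doc.trim().split("\\s+")) {
            counts.merge(w, 1L, Long::sum);
        }
        return counts;
    }

    // Emit "word,count" rows sorted by word, analogous to ORDER BY w.
    static List<String> orderedRows(Map<String, Long> counts) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(counts).entrySet()) {
            rows.add(e.getKey() + "," + e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical input, chosen to yield the counts listed under "Expected results".
        List<String> rows = orderedRows(countWords("a b c d a b c"));
        System.out.println(rows); // [a,2, b,2, c,2, d,1]
    }
}
```

Because the sort key is the word itself and each word appears in exactly one row, the sorted order is fully determined by the data, whatever order the aggregation happens to emit.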

I verified the fix by running NonDex with -DnondexRuns=100; all 100 runs passed after this change.

CLAassistant · 2024-03-17T22:54:30Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Steve Sahayadarlin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

fixing flaky test: DocWordSplitCountTest

d2a5d60
