rptest: be more permissive with errors in stress fibers test #18463

Merged: 1 commit, May 14, 2024

Contributor

@andrwng andrwng commented May 14, 2024

The test could previously fail after enabling stress fibers because the admin endpoint could become unresponsive under heavy stress. This commit attempts to fix this in a couple of ways:

  • ignoring errors when enabling stress fibers: the test waits for a specific log line to confirm that stress is enabled, so the response from the HTTP endpoint doesn't matter for the correctness of the test
  • retrying on failure when trying to stop stress fibers
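
For illustration, the two changes can be sketched roughly as below. This is a minimal sketch and not the code in this commit: `enable_stress_best_effort`, `stop_stress_with_retries`, and the `start_stress` / `stop_stress` callables are hypothetical stand-ins for whatever the test uses to call the admin endpoint, and the attempt count and backoff are assumed values.

```python
import time


def enable_stress_best_effort(start_stress):
    """Call the (hypothetical) enable callback and ignore any failure.

    The test is assumed to wait for a log line afterwards, so the HTTP
    response is not needed for correctness.
    """
    try:
        start_stress()
    except Exception:
        # Assumption: an error here only means the endpoint was too busy
        # to answer; the later log-line check decides whether stress is on.
        pass


def stop_stress_with_retries(stop_stress, attempts=5, backoff_sec=1.0,
                             sleep=time.sleep):
    """Retry the (hypothetical) stop callback until it succeeds.

    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            stop_stress()
            return
        except Exception as e:
            last_error = e
            sleep(backoff_sec)
    raise last_error
```

Passing `sleep` as a parameter is only a convenience so the retry loop can be exercised without real delays.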

Fixes #13701

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@WillemKauf
Contributor

WillemKauf commented May 14, 2024

Would this approach also fix a similar issue in #16703?

@andrwng
Contributor Author

andrwng commented May 14, 2024

> Would this approach also fix a similar issue in #16703?

I don't think so. #16703 appears to be a crash in Redpanda:

rptest.services.utils.NodeCrash: <NodeCrash ip-172-31-47-25: ERROR 2024-02-23 00:25:28,208 [shard 1:main] assert - Assert failure: (/var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-0d9b050be0642d8a4-1/redpanda/redpanda/src/v/cloud_storage/remote_segment.cc:1588) 'false' "4696f899/kafka/reader-stress/95_24/7008-7169-2671777-1-v1.log.1.index" is already in progress

@andrwng
Contributor Author

andrwng commented May 14, 2024

/ci-repeat 10
debug
skip-units
dt-repeat=10
tests/rptest/tests/cpu_stress_injection_test.py::CpuStressInjectionTest.test_stress_fibers_ms

@piyushredpanda piyushredpanda merged commit 36e3d74 into redpanda-data:dev May 14, 2024
17 checks passed
@vbotbuildovich
Collaborator

/backport v24.1.x

@vbotbuildovich
Collaborator

/backport v23.3.x

4 participants