CI Failure (key symptom) in RedpandaUpgradeTest.test_workloads_through_releases #18452

Open
vbotbuildovich opened this issue May 13, 2024 · 5 comments
Assignees
Labels
auto-triaged used to know which issues have been opened from a CI job ci-failure ci-rca/infra CI Root Cause Analysis - Infrastructure Issue

Comments

Collaborator

vbotbuildovich commented May 13, 2024

https://buildkite.com/redpanda/vtools/builds/13676

Module: rptest.tests.workload_upgrade_runner_test
Class: RedpandaUpgradeTest
Method: test_workloads_through_releases
Arguments: {
    "cloud_storage_type": 1
}
test_id:    RedpandaUpgradeTest.test_workloads_through_releases
status:     FAIL
run time:   537.213 seconds

RemoteCommandError({'ssh_config': {'host': 'ip-172-31-11-46', 'hostname': '172.31.11.46', 'user': 'root', 'port': 22, 'password': None, 'identityfile': '/home/ubuntu/.ssh/id_rsa'}, 'hostname': 'ip-172-31-11-46', 'ssh_hostname': '172.31.11.46', 'user': 'root', 'externally_routable_ip': '35.162.166.49', '_logger': <Logger rptest.tests.workload_upgrade_runner_test.RedpandaUpgradeTest.test_workloads_through_releases.cloud_storage_type=CloudStorageType.S3-820 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0x7fd8a5a22710>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0x7fd8a5a4f100>, '_custom_ssh_exception_checks': None}, 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz', 35, b'')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/workload_upgrade_runner_test.py", line 278, in test_workloads_through_releases
    for current_version in self.upgrade_through_versions(
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_test.py", line 247, in upgrade_through_versions
    current_version = install_next()
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_test.py", line 174, in install_next
    self.redpanda._installer.install(self.redpanda.nodes, v)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 609, in install
    self._install_unlocked(nodes, install_target)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 658, in _install_unlocked
    raise e
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 638, in _install_unlocked
    self.wait_for_async_ssh(self._redpanda.logger,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda_installer.py", line 165, in wait_for_async_ssh
    for l in ssh_out_per_node[node]:
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 687, in next
    return next(self.iter_obj)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 354, in output_generator
    raise RemoteCommandError(self, cmd, exit_status, stderr.read())
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-11-46: Command 'curl -fsSL https://vectorized-public.s3.us-west-2.amazonaws.com/releases/redpanda/23.3.15/redpanda-23.3.15-amd64.tar.gz --retry 3 --retry-connrefused --retry-delay 2 --create-dir -o /opt/redpanda_installs/v23.3.15/redpanda.tar.gz && gunzip -c /opt/redpanda_installs/v23.3.15/redpanda.tar.gz | tar -xf - -C /opt/redpanda_installs/v23.3.15 && rm /opt/redpanda_installs/v23.3.15/redpanda.tar.gz' returned non-zero exit status 35.
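For triage, it can help to decode the exit status in the trace above. Because the install command is chained with `&&`, a status of 35 most likely comes from the `curl` step, and curl documents 35 as an SSL connect error (the TLS handshake failed). The sketch below is a hypothetical helper, not part of rptest or ducktape; the function names and the chosen set of codes are assumptions for illustration. Note also that `--retry` only retries errors curl treats as transient, so a handshake failure may not be retried unless something like `--retry-all-errors` (curl 7.71.0 and later) is used.

```python
# Hypothetical triage helper; not part of rptest or ducktape.
# The meanings below follow the EXIT CODES section of the curl man page.
CURL_EXIT_MEANINGS = {
    6: "could not resolve host",
    7: "failed to connect to host",
    28: "operation timed out",
    35: "SSL connect error (TLS handshake failed)",
    56: "failure receiving network data",
}


def describe_curl_exit(status: int) -> str:
    """Return a short description for a curl exit status."""
    return CURL_EXIT_MEANINGS.get(
        status, f"unrecognized curl exit status {status}")


def looks_like_network_failure(status: int) -> bool:
    """True when the status is one of the network-level codes listed above."""
    return status in CURL_EXIT_MEANINGS


if __name__ == "__main__":
    # Status taken from the RemoteCommandError in this report.
    print(describe_curl_exit(35))
```

This assumes the status was produced by `curl` and not by `tar` or `rm` later in the chain.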

JIRA Link: CORE-2940

@vbotbuildovich vbotbuildovich added auto-triaged used to know which issues have been opened from a CI job ci-failure labels May 13, 2024
@piyushredpanda piyushredpanda added the ci-rca/infra CI Root Cause Analysis - Infrastructure Issue label May 14, 2024
@rpdevmp rpdevmp self-assigned this May 14, 2024
Contributor

rpdevmp commented May 14, 2024

This one doesn't look like an infra issue. Originally that is what I thought, since it shows an SSH issue and a timeout of 20 seconds.

But I looked at the test code, and the logic has not changed for almost two years.

It is related to the upgrade: we wait for the service to come up.

The error message is:
ducktape.errors.TimeoutError: Redpanda service ip-172-31-6-14 failed to start within 20 sec

We could increase the timeout and see what happens, but since this code is not new, it looks like a possible degradation in the product. I will assign it to myself and investigate more. Let's see if we can reproduce it.
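
To make the "increase the timeout and see what happens" experiment concrete, here is a minimal sketch of a start-up wait with a configurable timeout. It is a standalone stand-in and not the rptest or ducktape code; `wait_for_start`, `is_up`, and the default backoff are hypothetical names and values chosen for illustration. It returns the elapsed time, so a run with a larger timeout shows how long the service actually took to come up.

```python
import time
from typing import Callable


def wait_for_start(is_up: Callable[[], bool],
                   timeout_sec: float,
                   backoff_sec: float = 0.5) -> float:
    """Poll is_up() until it returns True; return the elapsed seconds.

    Raises TimeoutError if the service is still down at the deadline.
    Hypothetical stand-in, not the helper used by the test suite.
    """
    start = time.monotonic()
    deadline = start + timeout_sec
    while True:
        if is_up():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"service failed to start within {timeout_sec} sec")
        time.sleep(backoff_sec)
```

Logging the returned duration across runs would show whether start-up time has drifted close to the 20 second limit, which is one way to check the degradation theory.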

Contributor

rpdevmp commented May 16, 2024

Duplicate of #13306

@rpdevmp rpdevmp marked this as a duplicate of #13306 May 16, 2024
@rpdevmp rpdevmp closed this as completed May 16, 2024