Bug 1646986 - Tempest test test_pod_vm_ping is stuck
Summary: Tempest test test_pod_vm_ping is stuck
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-kuryr-tests-tempest
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Yossi Boaron
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-06 13:14 UTC by Itzik Brown
Modified: 2019-11-04 12:22 UTC
CC List: 6 users

Fixed In Version: python-kuryr-tests-tempest-0.4.1-0.20190401185124.0d51e99.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:19:23 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 618635 | 0 | None | master: MERGED | kuryr-tempest-plugin: Add timeout parameter to 'connect_get_namespaced_pod_exec' (Iea480269f7623e687fee41fb859537c026227... | 2019-03-07 20:18:07 UTC
Red Hat Product Errata | RHEA-2019:2811 | 0 | None | None | None | 2019-09-21 11:19:45 UTC

Description Itzik Brown 2018-11-06 13:14:18 UTC
Description of problem:
The following test hangs when run on OSP14 with OCP 3.11:
kuryr_tempest_plugin.tests.scenario.test_cross_ping.TestCrossPingScenario.test_pod_vm_ping

It appears to be stuck in exec_command_in_pod.

Version-Release number of selected component (if applicable):
OSP14
openshift v3.11.39
kubernetes v1.11.0+d4cacc0

Comment 1 Itzik Brown 2018-11-07 14:17:49 UTC
There is a related upstream issue: https://github.com/kubernetes/kubernetes/issues/67457
One of its comments mentions a workaround: add _request_timeout=10 to the kwargs dict in exec_command_in_pod:

kwargs = dict(command=command, stdin=False, stdout=True, tty=False,
              stderr=stderr, _request_timeout=10)

With this workaround, the test passes only ~30% of the time.
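A minimal sketch of how the workaround could be exposed as a parameter of the helper rather than a hard-coded value. The signature of `exec_command_in_pod` below is hypothetical and does not mirror the plugin's actual helper; `exec_fn` is injected so the example runs without a cluster. Against a real cluster it would presumably be `partial(stream, api.connect_get_namespaced_pod_exec)` from the kubernetes Python client. The pod name, namespace, and IP address in the usage lines are illustrative only.

```python
def exec_command_in_pod(exec_fn, pod_name, namespace, command,
                        stderr=False, request_timeout=10):
    """Run `command` in a pod through `exec_fn`, bounding the call.

    Hypothetical helper: `exec_fn` stands in for
    partial(kubernetes.stream.stream, api.connect_get_namespaced_pod_exec).
    `request_timeout` is forwarded as `_request_timeout`, the workaround
    from the upstream issue, so a stream that never closes is cut off
    instead of blocking forever.
    """
    kwargs = dict(command=command, stdin=False, stdout=True, tty=False,
                  stderr=stderr, _request_timeout=request_timeout)
    return exec_fn(pod_name, namespace, **kwargs)


# Wiring against a real cluster (shown for reference, not executed here):
#   from functools import partial
#   from kubernetes import client, config
#   from kubernetes.stream import stream
#   config.load_kube_config()
#   api = client.CoreV1Api()
#   exec_fn = partial(stream, api.connect_get_namespaced_pod_exec)

# Offline usage: a stand-in that records the arguments it receives.
calls = []


def fake_exec_fn(name, namespace, **kwargs):
    calls.append((name, namespace, kwargs))
    return "PING ok"


output = exec_command_in_pod(fake_exec_fn, "demo-pod", "default",
                             ["ping", "-c", "1", "10.0.0.5"])
print(output, calls[0][2]["_request_timeout"])  # prints: PING ok 10
```

Keeping the timeout as a parameter lets individual tests choose a longer bound for slow commands while still guaranteeing that the call returns.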

Comment 2 Yossi Boaron 2018-11-07 20:54:27 UTC
In Kuryr upstream we use Kubernetes v1.9.1 (OpenShift v3.9.0) with the same Kubernetes Python client version (8.0.0), and everything works fine.
With OCP 3.11 (Kubernetes 1.11.0), however, we are hitting this issue.
With the interactive shell approach [1], things seem to work OK.
As Itzik mentioned, other people have also reported this issue [2].

Our next steps:
1. Rewrite the relevant tempest test to use the interactive approach, and recheck.
2. Ask for assistance/more info in the kubernetes-client Slack channel.

[1] https://github.com/kubernetes-client/python/blob/3459c173cddc9252f7eb803da9e86aaae08ee653/examples/exec.py#L56
[2] https://github.com/kubernetes/kubernetes/issues/67457
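A sketch of the interactive approach from the kubernetes-client exec example [1]: open the exec stream with `_preload_content=False` and poll it with bounded `update()` calls instead of waiting for a single blocking call to return. The helper `poll_exec_stream`, the timeout values, and `FakeExecStream` are illustrative assumptions. The method names used on `resp` (`is_open`, `update`, `peek_stdout`, `read_stdout`, `peek_stderr`, `read_stderr`, `close`) are the ones the client's `WSClient` exposes; the fake only mimics them so the loop can run without a cluster.

```python
import time


def poll_exec_stream(resp, overall_timeout=30.0, poll_interval=1.0):
    """Collect stdout/stderr from an exec stream without blocking forever.

    `resp` is expected to behave like the WSClient returned by
    kubernetes.stream.stream(..., _preload_content=False). Each update()
    waits at most `poll_interval` seconds and the loop gives up after
    `overall_timeout`, so a stream that stays open after the command has
    finished cannot hang the caller.
    """
    deadline = time.monotonic() + overall_timeout
    out, err = [], []
    try:
        while resp.is_open() and time.monotonic() < deadline:
            resp.update(timeout=poll_interval)
            if resp.peek_stdout():
                out.append(resp.read_stdout())
            if resp.peek_stderr():
                err.append(resp.read_stderr())
    finally:
        resp.close()
    return "".join(out), "".join(err)


# Real wiring (requires a cluster; not executed here):
#   resp = stream(api.connect_get_namespaced_pod_exec, pod_name, namespace,
#                 command=command, stdin=False, stdout=True, stderr=True,
#                 tty=False, _preload_content=False)
#   stdout, stderr = poll_exec_stream(resp)


class FakeExecStream:
    """Stand-in that delivers queued stdout chunks, then stays open and silent."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._pending = None
        self.closed = False

    def is_open(self):
        return not self.closed

    def update(self, timeout=0):
        if self._chunks:
            self._pending = self._chunks.pop(0)
        else:
            time.sleep(timeout)  # nothing arrives, like the hung stream

    def peek_stdout(self):
        return self._pending is not None

    def read_stdout(self):
        data, self._pending = self._pending, None
        return data

    def peek_stderr(self):
        return False

    def read_stderr(self):
        return ""

    def close(self):
        self.closed = True


fake = FakeExecStream(["PING 10.0.0.5\n", "1 packets received\n"])
start = time.monotonic()
stdout, stderr = poll_exec_stream(fake, overall_timeout=0.3, poll_interval=0.05)
elapsed = time.monotonic() - start
```

Even though the fake never closes, the call returns after roughly `overall_timeout` seconds with the output gathered so far.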

Comment 3 Yossi Boaron 2018-11-18 07:58:56 UTC
Sometimes the 'connect_get_namespaced_pod_exec' call hangs for some reason (in an OS select call) even though the command has completed.
Setting the '_request_timeout' parameter seems to solve the problem for the pod2pod test.
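A stdlib-only illustration of why a timeout helps, assuming (as the comment above suggests) that the hang is a select() wait on a connection that stays open after the command finishes. It does not use the Kubernetes client or its websocket code; `wait_for_output` is a hypothetical helper. A socket whose peer stays open but silent makes an unbounded select() block indefinitely, while a finite timeout turns the hang into an ordinary "no data" result.

```python
import select
import socket
import time


def wait_for_output(sock, timeout):
    """Wait until `sock` has data to read, for at most `timeout` seconds.

    With timeout=None, select.select() blocks until the socket becomes
    readable; if the peer stays open but never sends anything more, that
    wait never ends. A finite timeout returns False instead.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


# A connected pair of sockets: `server_sock` stays open but silent, like a
# stream whose command has finished without the connection being closed.
client_sock, server_sock = socket.socketpair()

start = time.monotonic()
silent_result = wait_for_output(client_sock, timeout=0.5)
silent_elapsed = time.monotonic() - start

# Once the peer actually sends data, the same call returns right away.
server_sock.sendall(b"exit code 0\n")
ready_result = wait_for_output(client_sock, timeout=0.5)

client_sock.close()
server_sock.close()
print(silent_result, ready_result)  # prints: False True
```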

Comment 4 Yossi Boaron 2018-11-19 06:50:57 UTC
Tested with the patch that adds support for request_timeout [1]: ran the pod2pod test 50 times and all runs were fine.

[1] https://review.openstack.org/#/c/618635/1

Comment 5 Jon Schlueter 2019-01-02 12:22:14 UTC
The above-mentioned patch is not in an OSP build. RDO Rocky Trunk [1] also has it pinned to a commit before this patch.

[1] https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml#L7025

Comment 8 GenadiC 2019-06-05 07:59:05 UTC
The fix will be in OSP15 only.

Comment 16 errata-xmlrpc 2019-09-21 11:19:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

