Bug 1646986

Summary: Tempest test test_pod_vm_ping is stuck
Product: Red Hat OpenStack
Reporter: Itzik Brown <itbrown>
Component: python-kuryr-tests-tempest
Assignee: Yossi Boaron <yboaron>
Status: CLOSED ERRATA
QA Contact: Itzik Brown <itbrown>
Severity: high
Priority: high
Version: 14.0 (Rocky)
CC: asegurap, gcheresh, jschluet, ltomasbo, tsedovic, yboaron
Target Milestone: beta
Keywords: Triaged
Target Release: 15.0 (Stein)
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: python-kuryr-tests-tempest-0.4.1-0.20190401185124.0d51e99.el8ost
Last Closed: 2019-09-21 11:19:23 UTC
Type: Bug

Description Itzik Brown 2018-11-06 13:14:18 UTC
Description of problem:
The following test hangs when run on OSP14 with OCP 3.11:
kuryr_tempest_plugin.tests.scenario.test_cross_ping.TestCrossPingScenario.test_pod_vm_ping

It appears to be stuck in exec_command_in_pod.

Version-Release number of selected component (if applicable):
OSP14
openshift v3.11.39
kubernetes v1.11.0+d4cacc0


Comment 1 Itzik Brown 2018-11-07 14:17:49 UTC
There is an existing bug for this at https://github.com/kubernetes/kubernetes/issues/67457
One of the comments there mentions a workaround:
add _request_timeout=10 to the kwargs dict in exec_command_in_pod:

kwargs = dict(command=command, stdin=False, stdout=True, tty=False,
              stderr=stderr, _request_timeout=10)

With this workaround the test works ~30% of the time.
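
For reference, a minimal sketch of how the workaround could be wired up. The helper names (build_exec_kwargs, exec_with_timeout) and the stream_fn/exec_api parameters are hypothetical and are not the plugin's actual code; the real exec_command_in_pod may look different. The stream function is passed in so the helper can be exercised without a cluster.

```python
def build_exec_kwargs(command, stderr=True, request_timeout=10):
    """Build the exec kwargs from the comment above, including the timeout."""
    return dict(command=command, stdin=False, stdout=True, tty=False,
                stderr=stderr, _request_timeout=request_timeout)


def exec_with_timeout(stream_fn, exec_api, pod_name, namespace, command,
                      stderr=True, request_timeout=10):
    """Run `command` in a pod through the given stream function.

    Assumed real-world usage (not verified against the plugin):
        from kubernetes import client
        from kubernetes.stream import stream
        api = client.CoreV1Api()
        exec_with_timeout(stream, api.connect_get_namespaced_pod_exec,
                          "mypod", "default", ["ping", "-c", "1", "10.0.0.5"])
    """
    kwargs = build_exec_kwargs(command, stderr=stderr,
                               request_timeout=request_timeout)
    # stream(func, *args, **kwargs) forwards everything to the exec call.
    return stream_fn(exec_api, pod_name, namespace, **kwargs)
```

Making the timeout a parameter keeps the value of 10 seconds from the comment as a default while letting slower environments raise it.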

Comment 2 Yossi Boaron 2018-11-07 20:54:27 UTC
Upstream Kuryr uses Kubernetes v1.9.1 (OpenShift v3.9.0) with the same Kubernetes Python client version (8.0.0), and everything works fine.
With OCP 3.11 (Kubernetes 1.11.0), however, we hit this issue.
With the interactive shell approach [1], things seem to work OK.
As Itzik mentioned, other people have also reported this issue [2].

Our next steps:
1. Rewrite the relevant tempest test to use the interactive approach, then recheck.
2. Ask for assistance or more information in the kubernetes-client Slack channel.

[1] https://github.com/kubernetes-client/python/blob/3459c173cddc9252f7eb803da9e86aaae08ee653/examples/exec.py#L56
[2] https://github.com/kubernetes/kubernetes/issues/67457
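
A rough sketch of the interactive approach from [1], assuming `resp` is the object returned by calling stream(...) with stdin=True and _preload_content=False. The function name, the max_wait deadline, and the poll_timeout default are hypothetical additions for illustration; only the resp methods (is_open, update, peek_stdout, read_stdout, peek_stderr, read_stderr, write_stdin, close) come from the client example.

```python
import time


def run_commands_interactively(resp, commands, poll_timeout=1, max_wait=30):
    """Feed commands to an open exec session and collect its output.

    Polls with a bounded timeout instead of blocking on a single call,
    and gives up after max_wait seconds (an assumed safety net).
    """
    pending = list(commands)
    chunks = []
    deadline = time.monotonic() + max_wait
    try:
        while resp.is_open() and time.monotonic() < deadline:
            resp.update(timeout=poll_timeout)  # wait at most poll_timeout
            if resp.peek_stdout():
                chunks.append(resp.read_stdout())
            if resp.peek_stderr():
                chunks.append(resp.read_stderr())
            if not pending:
                break  # one extra poll was done after the last command
            resp.write_stdin(pending.pop(0) + "\n")
    finally:
        resp.close()
    return "".join(chunks)
```

This polls only once after the last command, so slow commands may need a longer poll_timeout or an explicit wait for expected output.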

Comment 3 Yossi Boaron 2018-11-18 07:58:56 UTC
Sometimes the 'connect_get_namespaced_pod_exec' call hangs for some reason (blocked in the OS select call) even though the command has completed.
Setting the '_request_timeout' parameter seems to solve the problem for the pod2pod test.
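
As a generic illustration of why a timeout helps here (this is plain standard-library code, not the Kubernetes client's implementation): select() with no timeout blocks until data arrives, while a bounded timeout lets the caller regain control. The function names below are made up for the example.

```python
import select
import socket


def wait_readable(sock, timeout=None):
    """Return True if sock becomes readable, False if the timeout expires.

    With timeout=None this blocks until data arrives, which is how a
    caller can hang indefinitely when the peer never sends anything.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


def demo():
    a, b = socket.socketpair()
    try:
        idle = wait_readable(a, timeout=0.2)   # nothing sent yet
        b.sendall(b"done")
        ready = wait_readable(a, timeout=0.2)  # data is now pending
        return idle, ready
    finally:
        a.close()
        b.close()
```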

Comment 4 Yossi Boaron 2018-11-19 06:50:57 UTC
Tested with the patch that adds support for request_timeout [1]: ran the pod2pod test 50 times and everything was fine.

[1] https://review.openstack.org/#/c/618635/1

Comment 5 Jon Schlueter 2019-01-02 12:22:14 UTC
The above-mentioned patch is not in any OSP build. RDO Rocky Trunk [1] is also pinned to a commit from before this patch.

[1] https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml#L7025

Comment 8 GenadiC 2019-06-05 07:59:05 UTC
The fix will be in OSP15 only

Comment 16 errata-xmlrpc 2019-09-21 11:19:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811