Bug 1646986

Summary:	Tempest test test_pod_vm_ping is stuck
Product:	Red Hat OpenStack	Reporter:	Itzik Brown <itbrown>
Component:	python-kuryr-tests-tempest	Assignee:	Yossi Boaron <yboaron>
Status:	CLOSED ERRATA	QA Contact:	Itzik Brown <itbrown>
Severity:	high	Docs Contact:
Priority:	high
Version:	14.0 (Rocky)	CC:	asegurap, gcheresh, jschluet, ltomasbo, tsedovic, yboaron
Target Milestone:	beta	Keywords:	Triaged
Target Release:	15.0 (Stein)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	python-kuryr-tests-tempest-0.4.1-0.20190401185124.0d51e99.el8ost	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-09-21 11:19:23 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Itzik Brown 2018-11-06 13:14:18 UTC

Description of problem:
The following test hangs when run on OSP14 with OCP 3.11:
kuryr_tempest_plugin.tests.scenario.test_cross_ping.TestCrossPingScenario.test_pod_vm_ping

It appears to hang in exec_command_in_pod.

Version-Release number of selected component (if applicable):
OSP14
openshift v3.11.39
kubernetes v1.11.0+d4cacc0



Comment 1 Itzik Brown 2018-11-07 14:17:49 UTC

There is a related upstream bug at https://github.com/kubernetes/kubernetes/issues/67457
One of the comments there mentions a workaround: add _request_timeout=10 to the kwargs dict in exec_command_in_pod:

kwargs = dict(command=command, stdin=False, stdout=True, tty=False,
              stderr=stderr, _request_timeout=10)

With this workaround, it works ~30% of the time.
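
A minimal sketch of how the workaround above could be wired in, assuming the kubernetes Python client's stream helper (kubernetes.stream.stream) and CoreV1Api.connect_get_namespaced_pod_exec. The function name is taken from this report, but the signature, the injected stream_func/exec_method parameters and the build_exec_kwargs helper are hypothetical, chosen so the sketch runs without a cluster. It is not the actual kuryr-tempest-plugin code.

```python
def build_exec_kwargs(command, stderr=False, request_timeout=10):
    """Build the keyword arguments for a non-interactive pod exec call.

    _request_timeout bounds how long the client waits on the exec stream,
    so a stalled call returns control instead of blocking forever.
    """
    return dict(command=command, stdin=False, stdout=True, tty=False,
                stderr=stderr, _request_timeout=request_timeout)


def exec_command_in_pod(stream_func, exec_method, pod_name, namespace,
                        command, stderr=False, request_timeout=10):
    """Run `command` in a pod and return whatever the stream call returns.

    Assumption: in real use stream_func would be kubernetes.stream.stream
    and exec_method would be
    kubernetes.client.CoreV1Api().connect_get_namespaced_pod_exec.
    Both are passed in here (hypothetical design) to keep the sketch
    free of any cluster or library dependency.
    """
    kwargs = build_exec_kwargs(command, stderr, request_timeout)
    return stream_func(exec_method, pod_name, namespace, **kwargs)
```

Passing the two callables in makes it possible to exercise the timeout plumbing with a stub before pointing it at a real cluster.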

Comment 2 Yossi Boaron 2018-11-07 20:54:27 UTC

Kuryr upstream uses kubernetes v1.9.1 (openshift v3.9.0) with the same K8S python client version (8.0.0), and everything works fine there.
With OCP 3.11 (K8S 1.11.0), however, we are hitting this issue.
With the interactive shell approach [1], things seem to work OK.
As Itzik mentioned, other people have also reported this issue [2].

Our next steps:
1. Rewrite the relevant tempest test to use the interactive approach, then recheck.
2. Ask for assistance/more info in the kubernetes-client Slack channel.

[1] https://github.com/kubernetes-client/python/blob/3459c173cddc9252f7eb803da9e86aaae08ee653/examples/exec.py#L56
[2] https://github.com/kubernetes/kubernetes/issues/67457
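
A rough sketch of the interactive approach referenced in [1], written as a loop over an already-open exec stream. It assumes `resp` behaves like the WSClient object that kubernetes.stream.stream returns when called with stdin=True and _preload_content=False (methods is_open, update, peek_stdout, read_stdout, peek_stderr, read_stderr, write_stdin, close). The function name and the overall_timeout parameter are hypothetical and are not part of the tempest plugin.

```python
import time


def run_commands_interactively(resp, commands, overall_timeout=30.0):
    """Feed shell commands to an open exec stream and collect the output.

    Each loop iteration polls the stream for at most one second, drains
    any pending stdout/stderr, then writes the next command. The loop ends
    once the output of the last command has been polled, when the stream
    closes, or when the overall deadline (hypothetical safeguard) expires.
    """
    pending = list(commands)
    stdout_chunks, stderr_chunks = [], []
    deadline = time.monotonic() + overall_timeout
    try:
        while resp.is_open() and time.monotonic() < deadline:
            resp.update(timeout=1)  # bounded wait, never blocks indefinitely
            if resp.peek_stdout():
                stdout_chunks.append(resp.read_stdout())
            if resp.peek_stderr():
                stderr_chunks.append(resp.read_stderr())
            if pending:
                resp.write_stdin(pending.pop(0) + "\n")
            else:
                break
    finally:
        resp.close()
    return "".join(stdout_chunks), "".join(stderr_chunks)
```

Because every wait in the loop is bounded, a stream that stops delivering data costs at most overall_timeout seconds rather than stalling the test run.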

Comment 3 Yossi Boaron 2018-11-18 07:58:56 UTC

Sometimes the 'connect_get_namespaced_pod_exec' call hangs for some reason (in the OS select call) even though the command has completed.
It seems that setting the '_request_timeout' parameter solves the problem for the pod2pod test.
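
To illustrate the general mechanism only (this is not the client's actual code): a select call with no timeout blocks until data arrives, so if the peer has nothing more to send the caller waits forever, whereas a bounded timeout returns control. The sketch below uses a local socket pair from the Python standard library; the helper names are hypothetical.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Return True if `sock` becomes readable within `timeout` seconds.

    With timeout=None, select.select blocks until data arrives, which is
    the kind of wait that can hang when no more data is ever sent.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


def demo():
    """Show a bounded wait on an idle socket and on one with pending data."""
    a, b = socket.socketpair()
    try:
        idle = wait_readable(a, 0.2)   # nothing sent yet: gives up after 0.2 s
        b.sendall(b"done")
        ready = wait_readable(a, 0.2)  # data pending: returns right away
        return idle, ready
    finally:
        a.close()
        b.close()
```

Calling demo() shows the idle case giving up after the timeout instead of blocking, which is the behaviour the '_request_timeout' parameter is relied on to provide.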

Comment 4 Yossi Boaron 2018-11-19 06:50:57 UTC

Tested with the patch that adds support for request_timeout [1]: ran the pod2pod test 50 times and all runs were fine.

[1] https://review.openstack.org/#/c/618635/1

Comment 5 Jon Schlueter 2019-01-02 12:22:14 UTC

The above-mentioned patch is not in any build for OSP. RDO Rocky Trunk [1] also has the package pinned to a commit that predates this patch.

[1] https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml#L7025

Comment 8 GenadiC 2019-06-05 07:59:05 UTC

The fix will be in OSP15 only.

Comment 16 errata-xmlrpc 2019-09-21 11:19:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811