Bug 1387536

Summary: [ocp-on-osp] Retries setting in an intermittent network
Product: OpenShift Container Platform
Reporter: Gan Huang <ghuang>
Component: Installer
Assignee: Jan Provaznik <jprovazn>
Status: CLOSED CURRENTRELEASE
QA Contact: Gan Huang <ghuang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: aos-bugs, jokerman, jprovazn, mmccomas
Target Milestone: ---
Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-03-20 08:38:22 UTC
Type: Bug

Description Gan Huang 2016-10-21 08:26:24 UTC
Description of problem:
It should be possible to configure retries for the stack operations that update or install RPM packages from a remote server, or subscribe the system to one, over an intermittent or unstable network.

Version-Release number of selected component (if applicable):
openshift-on-openstack-0.9.4-1.el7.centos.noarch

How reproducible:
50% in my test env

Steps to Reproduce:
1. Create a stack (3 masters + 1 node) using an RHN account
  rhn_username: "xxx"
  rhn_password: "xxxxxxxxxxxx"

Actual results:
1) Stack failed at:

Oct 21 03:24:11 localhost cloud-init: No Presto metadata available for rhel-7-server-rpms
Oct 21 03:25:41 localhost cloud-init: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/Packages/libuuid-2.23.2-26.el7_2.3.x86_64.rpm: [Errno 14] curl#35 - "Encountered end of file"
Oct 21 03:25:41 localhost cloud-init: Trying other mirror.
Oct 21 03:25:57 localhost cloud-init: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/Packages/libteam-1.17-7.el7_2.x86_64.rpm: [Errno 12] Timeout on https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/Packages/libteam-1.17-7.el7_2.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
Oct 21 03:25:57 localhost cloud-init: Trying other mirror.
Oct 21 03:27:43 localhost cloud-init: Error downloading packages:
Oct 21 03:27:43 localhost cloud-init: libuuid-2.23.2-26.el7_2.3.x86_64: [Errno 256] No more mirrors to try.
Oct 21 03:27:43 localhost cloud-init: + notify_failure 'could not update RPMs'

Installing the packages manually on the host succeeded.

2) Stack failed at:

Oct 21 00:34:00 localhost cloud-init: Registering to: subscription.rhn.redhat.com:443/subscription
Oct 21 00:34:00 localhost cloud-init: The system has been registered with ID: 33ec5146-f2e8-4f8d-af16-db184ba2c2fb
Oct 21 00:34:00 localhost cloud-init: + '[' -n '' ']'
Oct 21 00:34:00 localhost cloud-init: + subscription-manager attach --auto
Oct 21 00:35:22 localhost cloud-init: Installed Product Current Status:
Oct 21 00:35:22 localhost cloud-init: Product Name: Red Hat Enterprise Linux Server
Oct 21 00:35:22 localhost cloud-init: Status:       Not Subscribed
Oct 21 00:35:22 localhost cloud-init: Unable to find available subscriptions for all your installed products.

In fact, the server had been subscribed successfully.

Expected results:
Retries should be configurable for tasks that may fail because of the network (download, subscribe, etc.).

Additional info:

Comment 1 Jan Provaznik 2016-10-24 07:15:56 UTC
From the logs, it seems that the failures occurred during the "yum update" and "subscription-manager" operations. The PR below adds retries to all yum operations and to all subscription-manager operations.

https://github.com/redhat-openstack/openshift-on-openstack/pull/286
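As a rough illustration of the retry approach described in this comment (this is not the code from the PR), a wrapper along the following lines could be placed around the yum and subscription-manager calls. The function name retry and the variables RETRY_ATTEMPTS and RETRY_DELAY are hypothetical names chosen for this sketch; notify_failure is the helper that appears in the cloud-init log above.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# RETRY_ATTEMPTS and RETRY_DELAY are illustrative names, not project settings.
retry() {
    local attempts="${RETRY_ATTEMPTS:-5}"   # maximum number of tries
    local delay="${RETRY_DELAY:-10}"        # seconds to wait between tries
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the wrapped command succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        # Only sleep when another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Possible usage, with the commands seen in the logs:
#   retry yum -y update || notify_failure 'could not update RPMs'
#   retry subscription-manager attach --auto
```

Reading the attempt count and delay from variables keeps them configurable per environment, which is what the report asks for.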

Comment 2 Jan Provaznik 2016-10-28 17:55:43 UTC
Fixed in 0.9.5

Comment 3 Gan Huang 2016-11-02 06:46:37 UTC
Verified with v0.9.5

The stack can be created successfully despite random networking issues.

The cloud-init logs show obvious network issues, but they no longer break stack creation thanks to the retry policy.

# cat /var/log/cloud-init.log
<--snip-->
Nov  1 23:09:04 localhost cloud-init: Downloading packages:
Nov  1 23:09:04 localhost cloud-init: No Presto metadata available for rhel-7-server-rpms
Nov  1 23:10:35 localhost cloud-init: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/Packages/mariadb-libs-5.5.50-1.el7_2.x86_64.rpm: [Errno 14] curl#35 - "Encountered end of file"
Nov  1 23:10:35 localhost cloud-init: Trying other mirror.
Nov  1 23:10:51 localhost cloud-init: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/Packages/logrotate-3.8.6-7.el7_2.x86_64.rpm: [Errno 12] Timeout on https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/Packages/logrotate-3.8.6-7.el7_2.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
Nov  1 23:10:51 localhost cloud-init: Trying other mirror.
Nov  1 23:12:26 localhost cloud-init: Error downloading packages:
Nov  1 23:12:26 localhost cloud-init: 1:mariadb-libs-5.5.50-1.el7_2.x86_64: [Errno 256] No more mirrors to try.
Nov  1 23:12:28 localhost cloud-init: Loaded plugins: product-id, search-disabled-repos, subscription-manager
Nov  1 23:12:32 localhost cloud-init: Resolving Dependencies
<--snip-->
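
To check how often the retries were actually needed, one option is to count the download failures recorded in the log. The sketch below assumes the "No more mirrors to try" message shown above marks one failed yum run; the function name count_download_failures is made up for this example.

```shell
# Hypothetical helper: count failed yum download runs in a cloud-init log.
# Assumes each failed run logs one "No more mirrors to try" line, as in the excerpt above.
count_download_failures() {
    # -c prints the number of matching lines; -F treats the pattern as a fixed string.
    grep -cF 'No more mirrors to try' "$1"
}

# Possible usage:
#   count_download_failures /var/log/cloud-init.log
```

A non-zero count on a stack that was created successfully suggests that a retry recovered from the failure.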