Bug 1812407
Summary: | Execution of ansible playbook for cluster expansion returns RC 2 and stderr is: [WARNING]: The value 1 (type int) in a string field was converted to u'1' (type string). | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Petr Balogh <pbalogh> |
Component: | Installer | Assignee: | Russell Teague <rteague> |
Installer sub component: | openshift-ansible | QA Contact: | Johnny Liu <jialiu> |
Status: | CLOSED NOTABUG | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | vavuthu |
Version: | 4.3.0 | Keywords: | Automation |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-03-11 12:40:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Petr Balogh
2020-03-11 09:35:33 UTC
The playbook did not complete successfully for all hosts. Two of the nodes failed because of an issue pulling the release image. An exit code of '2' is expected when there are failed hosts.

TASK [openshift_node : Pull release image] *************************************

fatal: [rhel1-1.nberry-dc7-rhcos.qe.rh-ocs.com]: FAILED! => {"changed": true, "cmd": ["podman", "pull", "--tls-verify=False", "--authfile", "/tmp/ansible.GCwW1y/pull-secret.json", "quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3"], "delta": "0:00:07.230095", "end": "2020-03-11 12:33:58.304901", "msg": "non-zero return code", "rc": 125, "start": "2020-03-11 12:33:51.074806", "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3...Getting image source signatures\nCopying blob sha256:964d57e311cc40e34746a347692678b801ad54144e367eef117e66d43d749c60\nCopying blob sha256:92d368863ef4e46361306339fae7473b2a569eedcda94f99f472a6545c70ef82\nCopying blob sha256:104cc276e39f95e92c9e08dd76b6c8db89b6f4093f4dd7d3975f3df1e683f78a\nCopying blob sha256:6d1814359c742980a065213f5968a5e80121728e122a7568e8d708087c42dcbc\nCopying blob sha256:875cc07082e20f772339a6f18de201d3640ebaf0021afba6c4abe5e21ed39379\nCopying blob sha256:e88ff14059f505f5faddad1237a07f4378fe0dcc0346fff7fd20576fe289638a\ntime=\"2020-03-11T12:33:58+05:30\" level=error msg=\"Error pulling image ref //quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: Error writing blob: error storing blob to file \\\"/var/tmp/storage372764296/1\\\": read tcp 10.46.27.217:40438->99.84.222.79:443: read: connection reset by peer\"\nFailed\nError: error pulling image \"quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3\": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: unable to pull image: Error writing blob: error storing blob to file \"/var/tmp/storage372764296/1\": read tcp 10.46.27.217:40438->99.84.222.79:443: read: connection reset by peer", "stderr_lines": ["Trying to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3...Getting image source signatures", "Copying blob sha256:964d57e311cc40e34746a347692678b801ad54144e367eef117e66d43d749c60", "Copying blob sha256:92d368863ef4e46361306339fae7473b2a569eedcda94f99f472a6545c70ef82", "Copying blob sha256:104cc276e39f95e92c9e08dd76b6c8db89b6f4093f4dd7d3975f3df1e683f78a", "Copying blob sha256:6d1814359c742980a065213f5968a5e80121728e122a7568e8d708087c42dcbc", "Copying blob sha256:875cc07082e20f772339a6f18de201d3640ebaf0021afba6c4abe5e21ed39379", "Copying blob sha256:e88ff14059f505f5faddad1237a07f4378fe0dcc0346fff7fd20576fe289638a", "time=\"2020-03-11T12:33:58+05:30\" level=error msg=\"Error pulling image ref //quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: Error writing blob: error storing blob to file \\\"/var/tmp/storage372764296/1\\\": read tcp 10.46.27.217:40438->99.84.222.79:443: read: connection reset by peer\"", "Failed", "Error: error pulling image \"quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3\": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: unable to pull image: Error writing blob: error storing blob to file \"/var/tmp/storage372764296/1\": read tcp 10.46.27.217:40438->99.84.222.79:443: read: connection reset by peer"], "stdout": "", "stdout_lines": []}

fatal: [rhel1-2.nberry-dc7-rhcos.qe.rh-ocs.com]: FAILED! => {"changed": true, "cmd": ["podman", "pull", "--tls-verify=False", "--authfile", "/tmp/ansible.JGeE6n/pull-secret.json", "quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3"], "delta": "0:00:09.737676", "end": "2020-03-11 12:34:00.838081", "msg": "non-zero return code", "rc": 125, "start": "2020-03-11 12:33:51.100405", "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3...Getting image source signatures\nCopying blob sha256:104cc276e39f95e92c9e08dd76b6c8db89b6f4093f4dd7d3975f3df1e683f78a\nCopying blob sha256:6d1814359c742980a065213f5968a5e80121728e122a7568e8d708087c42dcbc\nCopying blob sha256:92d368863ef4e46361306339fae7473b2a569eedcda94f99f472a6545c70ef82\nCopying blob sha256:964d57e311cc40e34746a347692678b801ad54144e367eef117e66d43d749c60\nCopying blob sha256:e88ff14059f505f5faddad1237a07f4378fe0dcc0346fff7fd20576fe289638a\nCopying blob sha256:875cc07082e20f772339a6f18de201d3640ebaf0021afba6c4abe5e21ed39379\ntime=\"2020-03-11T12:34:00+05:30\" level=error msg=\"Error pulling image ref //quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: Error writing blob: error storing blob to file \\\"/var/tmp/storage101571689/2\\\": read tcp 10.46.27.219:48276->99.84.222.83:443: read: connection reset by peer\"\nFailed\nError: error pulling image \"quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3\": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: unable to pull image: Error writing blob: error storing blob to file \"/var/tmp/storage101571689/2\": read tcp 10.46.27.219:48276->99.84.222.83:443: read: connection reset by peer", "stderr_lines": ["Trying to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3...Getting image source signatures", "Copying blob sha256:104cc276e39f95e92c9e08dd76b6c8db89b6f4093f4dd7d3975f3df1e683f78a", "Copying blob sha256:6d1814359c742980a065213f5968a5e80121728e122a7568e8d708087c42dcbc", "Copying blob sha256:92d368863ef4e46361306339fae7473b2a569eedcda94f99f472a6545c70ef82", "Copying blob sha256:964d57e311cc40e34746a347692678b801ad54144e367eef117e66d43d749c60", "Copying blob sha256:e88ff14059f505f5faddad1237a07f4378fe0dcc0346fff7fd20576fe289638a", "Copying blob sha256:875cc07082e20f772339a6f18de201d3640ebaf0021afba6c4abe5e21ed39379", "time=\"2020-03-11T12:34:00+05:30\" level=error msg=\"Error pulling image ref //quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: Error writing blob: error storing blob to file \\\"/var/tmp/storage101571689/2\\\": read tcp 10.46.27.219:48276->99.84.222.83:443: read: connection reset by peer\"", "Failed", "Error: error pulling image \"quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3\": unable to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3: unable to pull image: Error writing blob: error storing blob to file \"/var/tmp/storage101571689/2\": read tcp 10.46.27.219:48276->99.84.222.83:443: read: connection reset by peer"], "stdout": "", "stdout_lines": []}

changed: [rhel1-0.nberry-dc7-rhcos.qe.rh-ocs.com] => {"changed": true, "cmd": ["podman", "pull", "--tls-verify=False", "--authfile", "/tmp/ansible.XeiKYS/pull-secret.json", "quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3"], "delta": "0:00:14.166416", "end": "2020-03-11 12:34:05.233500", "rc": 0, "start": "2020-03-11 12:33:51.067084", "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3...Getting image source signatures\nCopying blob sha256:6d1814359c742980a065213f5968a5e80121728e122a7568e8d708087c42dcbc\nCopying blob sha256:104cc276e39f95e92c9e08dd76b6c8db89b6f4093f4dd7d3975f3df1e683f78a\nCopying blob sha256:964d57e311cc40e34746a347692678b801ad54144e367eef117e66d43d749c60\nCopying blob sha256:92d368863ef4e46361306339fae7473b2a569eedcda94f99f472a6545c70ef82\nCopying blob sha256:875cc07082e20f772339a6f18de201d3640ebaf0021afba6c4abe5e21ed39379\nCopying blob sha256:e88ff14059f505f5faddad1237a07f4378fe0dcc0346fff7fd20576fe289638a\nCopying config sha256:d06ffb37f9543ca2c57592912bbeef82ee225b3e3c7b8470cde4d5873584e571\nWriting manifest to image destination\nStoring signatures", "stderr_lines": ["Trying to pull quay.io/openshift-release-dev/ocp-release@sha256:ea7ac3ad42169b39fce07e5e53403a028644810bee9a212e7456074894df40f3...Getting image source signatures", "Copying blob sha256:6d1814359c742980a065213f5968a5e80121728e122a7568e8d708087c42dcbc", "Copying blob sha256:104cc276e39f95e92c9e08dd76b6c8db89b6f4093f4dd7d3975f3df1e683f78a", "Copying blob sha256:964d57e311cc40e34746a347692678b801ad54144e367eef117e66d43d749c60", "Copying blob sha256:92d368863ef4e46361306339fae7473b2a569eedcda94f99f472a6545c70ef82", "Copying blob sha256:875cc07082e20f772339a6f18de201d3640ebaf0021afba6c4abe5e21ed39379", "Copying blob sha256:e88ff14059f505f5faddad1237a07f4378fe0dcc0346fff7fd20576fe289638a", "Copying config sha256:d06ffb37f9543ca2c57592912bbeef82ee225b3e3c7b8470cde4d5873584e571", "Writing manifest to image destination", "Storing signatures"], "stdout": "d06ffb37f9543ca2c57592912bbeef82ee225b3e3c7b8470cde4d5873584e571", "stdout_lines": ["d06ffb37f9543ca2c57592912bbeef82ee225b3e3c7b8470cde4d5873584e571"]}

Oh, I saw failed=0, so I thought there were no failures there. Do we know why it could happen? Or is it possible to set some global retry for such failures to retry those tasks? Thanks.

Vijay, the only thing I can think of is the same issue we had before with conflicting IP addresses in the network. Did you talk to the PNT guys about this? Thanks.

The PLAY RECAP shows the status of tasks for all the hosts for that playbook run.
PLAY RECAP *********************************************************************
localhost                              : ok=1  changed=1  unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
rhel1-0.nberry-dc7-rhcos.qe.rh-ocs.com : ok=37 changed=29 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
rhel1-1.nberry-dc7-rhcos.qe.rh-ocs.com : ok=28 changed=20 unreachable=0 failed=1 skipped=7 rescued=0 ignored=0
rhel1-2.nberry-dc7-rhcos.qe.rh-ocs.com : ok=28 changed=20 unreachable=0 failed=1 skipped=7 rescued=0 ignored=0

You can see above that both rhel1-1 and rhel1-2 reported failed=1.

As for retrying, I recently added retries to `podman pull` tasks in an attempt to be more resilient to transient network issues. Example: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/config.yml#L129-L132 These changes will be in 4.5 and there is no plan to backport them to earlier releases. I was seeing occasional failures in CI for these tasks, but it has not been a significant issue since 4.1 GA.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
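For readers following along, the retry pattern referred to above looks roughly like the task below. This is a minimal sketch, not the exact task from the linked config.yml: the image variable name, authfile path, and retry/delay values are illustrative assumptions.

```yaml
# Hedged sketch of an Ansible task that retries a flaky release-image pull.
# `l_release_image` and the pull-secret path are placeholders, not the real
# openshift-ansible variable names.
- name: Pull release image
  command: >
    podman pull --tls-verify=False
    --authfile /tmp/pull-secret.json
    {{ l_release_image }}
  register: pull_result
  until: pull_result.rc == 0
  retries: 3
  delay: 5
```

With `until`/`retries`, a transient "connection reset by peer" during the pull is retried before the host is marked failed, so a short network blip is less likely to surface as failed=1 in the PLAY RECAP and an exit code of 2 from the playbook run.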