1563929 – Upgrade from 3.7 to 3.9 fails at the Task [Upgrade all storage]

Summary: Upgrade from 3.7 to 3.9 fails at the Task [Upgrade all storage]

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Sub Component:
Version:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.9.z
Assignee:	Maciej Szulik
QA Contact:	Wang Haoran
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-05 05:16 UTC by Sudarshan Chaudhari
Modified:	2021-06-10 15:40 UTC
CC List:	10 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-03 14:44:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3400381	0	None	None	None	2018-04-12 02:06:27 UTC

Description Sudarshan Chaudhari 2018-04-05 05:16:16 UTC

Description of problem:

Version-Release number of the following components:
:>rpm -qa | grep atomic
atomic-openshift-clients-3.9.14-1.git.0.4efa2ca.el7.x86_64
atomic-openshift-master-3.9.14-1.git.0.4efa2ca.el7.x86_64
atomic-openshift-utils-3.9.14-1.git.3.c62bc34.el7.noarch
atomic-openshift-3.9.14-1.git.0.4efa2ca.el7.x86_64
atomic-registries-1.20.1-9.git436cf5d.el7.x86_64
atomic-openshift-excluder-3.9.14-1.git.0.4efa2ca.el7.noarch
atomic-openshift-sdn-ovs-3.9.14-1.git.0.4efa2ca.el7.x86_64
atomic-openshift-docker-excluder-3.9.14-1.git.0.4efa2ca.el7.noarch
atomic-openshift-node-3.9.14-1.git.0.4efa2ca.el7.x86_64

Run the Automated In-Place Upgrade Playbook.
# ansible-playbook -i </path/to/inventory/file> /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml -vvv

The playbook fails with the following error:

TASK [Upgrade all storage] ****************************************************************************************************************************************************************************
fatal: [mas-3-01.example.com]: FAILED! => {"changed": true, "cmd": ["oc", "adm", "--config=/etc/origin/master/admin.kubeconfig", "migrate", "storage", "--include=*", "--confirm"], "delta": "0:07:06.574833", "end": "2018-04-03 11:22:07.834827", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2018-04-03 11:15:01.259994", "stderr": "", "stderr_lines": [], "stdout": "E0403 11:22:04.100394 error:     -n oneid-rtp1 services/consul-ingress: Service \"consul-ingress\" is invalid: spec.ports[5]: Duplicate value: api.ServicePort{Name:\"\", Protocol:\"TCP\", Port:8500, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}\nsummary: total=37768 errors=1 ignored=0 unchanged=37767 migrated=0\ninfo: to rerun only failing resources, add --include=services\nerror: 1 resources failed to migrate", "stdout_lines": ["E0403 11:22:04.100394 error:     -n oneid-rtp1 services/consul-ingress: Service \"consul-ingress\" is invalid: spec.ports[5]: Duplicate value: api.ServicePort{Name:\"\", Protocol:\"TCP\", Port:8500, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}", "summary: total=37768 errors=1 ignored=0 unchanged=37767 migrated=0", "info: to rerun only failing resources, add --include=services", "error: 1 resources failed to migrate"]}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade_control_plane.retry


Running the command manually gives the same error:
:>oc adm migrate storage --include=* --confirm


E0403 12:07:10.228771 error:     -n oneid-rtp1 services/consul-ingress: Service "consul-ingress" is invalid: spec.ports[5]: Duplicate value: api.ServicePort{Name:"", Protocol:"TCP", Port:8500, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:""}, NodePort:0}
summary: total=37769 errors=1 ignored=0 unchanged=37768 migrated=0
info: to rerun only failing resources, add --include=services
error: 1 resources failed to migrate
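
With tens of thousands of resources in the summary, it can help to pull the failing objects out of the captured output programmatically. The following is a minimal sketch, not part of any OpenShift tooling: the function name `parse_migrate_output` and both regular expressions are illustrative assumptions, inferred only from the output lines quoted in this report. Other releases may format these lines differently.

```python
import re

# Patterns inferred from the output quoted in this report (assumption);
# they are not a documented, stable format of `oc adm migrate storage`.
ERROR_RE = re.compile(
    r"error:\s+-n (?P<namespace>\S+) (?P<kind>[^/\s]+)/(?P<name>[^:\s]+): (?P<reason>.*)"
)
SUMMARY_RE = re.compile(r"^summary: (?P<fields>.*)$")


def parse_migrate_output(text):
    """Return (failures, summary) parsed from captured migrate output.

    failures is a list of (namespace, kind, name, reason) tuples;
    summary maps counter names such as "errors" to integers.
    """
    failures = []
    summary = {}
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            failures.append(
                (match["namespace"], match["kind"], match["name"], match["reason"])
            )
            continue
        match = SUMMARY_RE.match(line)
        if match:
            # Each field looks like "total=37769".
            for pair in match["fields"].split():
                key, _, value = pair.partition("=")
                summary[key] = int(value)
    return failures, summary
```

Feeding it the output above would yield a single failure for `services/consul-ingress` in the `oneid-rtp1` namespace, which is the object that needs manual attention.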


Expected results:
The upgrade should succeed.

Comment 1 Scott Dodson 2018-04-05 12:51:22 UTC

Since the 3.7 to 3.9 upgrade is multipart (3.7 to 3.8 to 3.9), can you check which version of OpenShift is running when this happens and/or provide the complete log, which would show where in the upgrade this fails?

Assigning to master team to evaluate the nature of the failure.

Comment 13 Maciej Szulik 2018-04-17 08:08:22 UTC

I've created https://github.com/openshift/openshift-docs/pull/8767 which adds this issue to known issues. The fix is to manually edit the failed services, removing the duplicate name+port pairs, and then re-run the storage migration.

Once the PR merges to docs, I'm going to close this issue as a won't fix.
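
For anyone scripting the manual cleanup described above, here is a rough sketch of finding the duplicate entries in a Service that has been exported to JSON and loaded as a Python dict. It is not the official procedure. The function name `dedupe_service_ports` is hypothetical, and it assumes that two entries are duplicates when they share the same (protocol, port) pair, which is what the key printed in the error message suggests.

```python
def dedupe_service_ports(service):
    """Split a Service's ports into entries to keep and entries to drop.

    Assumption: a duplicate is a later entry with the same (protocol, port)
    pair as an earlier one; the first occurrence is kept. Returns
    (kept, dropped), where dropped is a list of (index, entry) tuples
    using the original spec.ports indices.
    """
    seen = set()
    kept, dropped = [], []
    ports = service.get("spec", {}).get("ports", [])
    for index, entry in enumerate(ports):
        # Kubernetes defaults the protocol to TCP when it is omitted.
        key = (entry.get("protocol", "TCP"), entry["port"])
        if key in seen:
            dropped.append((index, entry))
        else:
            seen.add(key)
            kept.append(entry)
    return kept, dropped


# Made-up example data, NOT the real consul-ingress definition.
example = {
    "spec": {
        "ports": [
            {"name": "http", "protocol": "TCP", "port": 8500},
            {"name": "http-again", "protocol": "TCP", "port": 8500},
        ]
    }
}
kept, dropped = dedupe_service_ports(example)
print([index for index, _ in dropped])
```

The dropped entries should be reviewed before anything is written back, since two entries with the same port may differ in name or targetPort and the first one is not necessarily the one to keep.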

Comment 14 openshift-github-bot 2018-05-01 13:39:35 UTC

Commits pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/be4ad40e866c425d1b36067017943cc219240578
Bug 1563929 - add a 3.9 upgrade known issues section

3.9 introduces tighter validation for Service objects. During storage
upgrade this might require administrators to invoke manual updates to the
erroneous objects.

https://github.com/openshift/openshift-docs/commit/1e5ceb5bb14a9ef36ec86c795546bfcdb225c0f5
Merge pull request #8781 from soltysh/bug1563929

Bug 1563929 - add a 3.9 upgrade known issues section
