Bug 1885558 - ceph-osd services do not come up after ‘upgrade run’ step as id_to_device function in ceph-osd-run.sh uses docker command
Summary: ceph-osd services do not come up after ‘upgrade run’ step as id_to_device function in ceph-osd-run.sh uses docker command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z6
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Duplicates: 1885528 1891405
Depends On:
Blocks: 1578730
 
Reported: 2020-10-06 12:09 UTC by Jaison Raju
Modified: 2020-11-23 16:44 UTC
CC: 15 users

Fixed In Version: RHEL: ceph-ansible-3.2.52-1.el7cp Ubuntu: ceph-ansible_3.2.52-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-05 19:41:53 UTC
Embargoed:




Links:
  Github ceph/ceph-ansible pull 5918 (closed): ceph-osd: add missing container_binary (last updated 2021-01-30 12:18:37 UTC)
  Github ceph/ceph-ansible pull 5923 (closed): osd: add missing param to the container cli calls (last updated 2021-01-30 12:18:37 UTC)
  Red Hat Product Errata RHBA-2020:4963 (2020-11-05 19:42:01 UTC)

Description Jaison Raju 2020-10-06 12:09:05 UTC
Description of problem:
ceph-osd services do not come up after ‘upgrade run’ step as id_to_device function in ceph-osd-run.sh uses docker command

This is because, although ceph-osd-run.sh uses podman commands to start the containers, it still uses the docker command in the id_to_device function. For that function to work, I had to install podman-docker via UpgradeInitCommand.
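
For reference, a minimal sketch of the pattern the eventual fix follows (pull 5918 above adds the missing container_binary to the template). This is not the actual ceph-osd-run.sh.j2 content; the variable names and the lookup body are illustrative only:

  #!/usr/bin/env bash
  # Illustrative sketch only -- not the real ceph-osd-run.sh. Resolve the
  # container CLI once so that id_to_device uses the same binary that is
  # used to start the OSD containers.
  if command -v podman >/dev/null 2>&1; then
      container_binary=podman
  else
      container_binary=docker
  fi

  id_to_device() {
      local osd_id=$1
      # Hypothetical lookup; the real function's body differs. The point
      # is that it calls "${container_binary}" instead of a hardcoded
      # "docker", which only works where the podman-docker shim exists.
      "${container_binary}" run --rm --network host --privileged=true \
          --entrypoint=ceph-disk "${CEPH_IMAGE:?set to the ceph image}" \
          list | awk -v id="osd.${osd_id}" '$0 ~ id {print $1; exit}'
  }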


There is another issue at this point: the docker command used in ceph-osd-run.sh does not pass ‘--network host’, which leads to another error:
"Error: error configuring network namespace for container xxx Missing CNI default network".
After ‘upgrade run’, adding ‘--network host’ to the docker command in the script immediately brought the OSD services up.
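
The upstream change (pull 5923 above) adds this missing parameter to the container CLI calls in the template. A rough before/after sketch, reusing the hypothetical ${container_binary} and ${CEPH_IMAGE} from the sketch above; the probe command itself is illustrative:

  # Before: with no network flag, rootful podman fails with
  # "Error: error configuring network namespace ... Missing CNI default
  # network" when no default CNI network is configured:
  #"${container_binary}" run --rm --entrypoint=ceph "${CEPH_IMAGE}" --version

  # After: join the host network namespace, matching how the OSD
  # container itself is started:
  "${container_binary}" run --rm --network host --entrypoint=ceph "${CEPH_IMAGE}" --version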


Version-Release number of selected component (if applicable):
RHOS13z12
RHCEPH-3.3-RHEL-7-20200819.ci.0 / 3-46


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
ceph-osd services do not come up automatically after ‘upgrade run’

Expected results:
ceph-osd services come up automatically after ‘upgrade run’

Additional info:
http://perf1.perf.lab.eng.bos.redhat.com/pub/jaison/upgrades-osp13-16.1.1/osp16/backup/stack/sosreports/after-upgrade/ceph/sosreport-ceph-0-2020-09-26-uusfrep/var/log/messages
The ceph-0 Leapp and upgrade commands were executed on Sept 25 between 16:28 and 17:13.

The last occurrence of the second error is shown below. After this event, the docker command in the ceph-osd-run.sh script was modified to use '--network host'.

Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110618]: Error: error configuring network namespace for container 5657b3f3c6ec70d5f5703051c4c966aef63d50214318ecca93092723a4d60573: Missing CNI default network
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110618]: No data partition found for OSD
Sep 25 17:43:23 ceph-0 systemd[1]: ceph-osd: Main process exited, code=exited, status=1/FAILURE
Sep 25 17:43:23 ceph-0 systemd[1]: ceph-osd: Failed with result 'exit-code'.
Sep 25 17:43:23 ceph-0 podman[110626]: 2020-09-25 17:43:23.525956146 +0000 UTC m=+0.188836317 container remove 7ade979e7291abf514166311196ac4e46b2d4c8c90dd3f7f4c1ab5be2c10eba8 (image=f05-h21-000-1029p.ctlplane.localdomain:8787/rh-osbs/rhceph:3-46, name=quirky_archimedes)
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110624]: Error: error configuring network namespace for container 7ade979e7291abf514166311196ac4e46b2d4c8c90dd3f7f4c1ab5be2c10eba8: Missing CNI default network
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110624]: No data partition found for OSD
Sep 25 17:43:23 ceph-0 systemd[1]: ceph-osd: Main process exited, code=exited, status=1/FAILURE
Sep 25 17:43:23 ceph-0 systemd[1]: ceph-osd: Failed with result 'exit-code'.
Sep 25 17:43:23 ceph-0 podman[110679]: 2020-09-25 17:43:23.56119784 +0000 UTC m=+0.177467682 container remove 28e5a3871457e4a3159344bc8c291b61e898491491f949edf206f4f2fe2b80ed (image=f05-h21-000-1029p.ctlplane.localdomain:8787/rh-osbs/rhceph:3-46, name=naughty_davinci)
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110677]: Error: error configuring network namespace for container 28e5a3871457e4a3159344bc8c291b61e898491491f949edf206f4f2fe2b80ed: Missing CNI default network
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110677]: No data partition found for OSD
Sep 25 17:43:23 ceph-0 systemd[1]: ceph-osd: Main process exited, code=exited, status=1/FAILURE
Sep 25 17:43:23 ceph-0 systemd[1]: ceph-osd: Failed with result 'exit-code'.
Sep 25 17:43:23 ceph-0 podman[110745]: 2020-09-25 17:43:23.587414431 +0000 UTC m=+0.143135539 container create d7a7ff2179c005d1a012a7e44fda1475f03da5dacaa9122d35c9bc9de526042b (image=f05-h21-000-1029p.ctlplane.localdomain:8787/rh-osbs/rhceph:3-46, name=affectionate_mendeleev)
Sep 25 17:43:23 ceph-0 podman[110745]: 2020-09-25 17:43:23.641641975 +0000 UTC m=+0.197362954 container remove d7a7ff2179c005d1a012a7e44fda1475f03da5dacaa9122d35c9bc9de526042b (image=f05-h21-000-1029p.ctlplane.localdomain:8787/rh-osbs/rhceph:3-46, name=affectionate_mendeleev)
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110743]: Error: error configuring network namespace for container d7a7ff2179c005d1a012a7e44fda1475f03da5dacaa9122d35c9bc9de526042b: Missing CNI default network
Sep 25 17:43:23 ceph-0 ceph-osd-run.sh[110743]: No data partition found for OSD

Comment 1 RHEL Program Management 2020-10-06 13:26:50 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 8 Yogev Rabl 2020-11-02 14:22:54 UTC
verified

Comment 9 Francesco Pantano 2020-11-03 13:51:07 UTC
*** Bug 1891405 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2020-11-05 19:41:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 3.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4963

Comment 12 Francesco Pantano 2020-11-23 16:44:40 UTC
*** Bug 1885528 has been marked as a duplicate of this bug. ***

