Bug 1472409 - ceph: not all OSDs are up when ceph node is rebooted during major upgrade.
Status: ASSIGNED
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Disk
Version: 1.3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.5
Assigned To: Loic Dachary
QA Contact: ceph-qe-bugs
Keywords: ZStream
Depends On:
Blocks: 1335596 1356451
Reported: 2017-07-18 12:19 EDT by Alexander Chuzhoy
Modified: 2017-08-10 10:24 EDT
CC: 24 users

Type: Bug

Attachments: None
Description Alexander Chuzhoy 2017-07-18 12:19:49 EDT
ceph: not all OSDs are up when ceph node is rebooted during major upgrade.

Environment:
python-cephfs-10.2.7-28.el7cp.x86_64
ceph-osd-10.2.7-28.el7cp.x86_64
ceph-common-10.2.7-28.el7cp.x86_64
ceph-selinux-10.2.7-28.el7cp.x86_64
puppet-ceph-2.3.0-5.el7ost.noarch
ceph-mon-10.2.7-28.el7cp.x86_64
libcephfs1-10.2.7-28.el7cp.x86_64
ceph-base-10.2.7-28.el7cp.x86_64
ceph-radosgw-10.2.7-28.el7cp.x86_64

openstack-tripleo-heat-templates-compat-2.0.0-41.el7ost.noarch
openstack-tripleo-heat-templates-5.2.0-21.el7ost.noarch
instack-undercloud-5.3.0-1.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch

Steps to reproduce:

1. Follow the procedure to upgrade OSP 9 to OSP 10 and reach the following stage:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Ceph

2. Reboot a Ceph node, then after the reboot log in to it and check the Ceph status (the noout/norebalance flag handling around the reboot is sketched below).
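
For context (an editorial sketch, not part of the original report): the noout and norebalance flags visible in the status output below are kept set during this upgrade stage so that a node reboot does not trigger data movement. Setting, checking, and later clearing them uses the standard ceph CLI, roughly:

     sudo ceph osd set noout
     sudo ceph osd set norebalance
     # ... reboot the Ceph storage node and wait for its OSDs to rejoin ...
     sudo ceph -s
     sudo ceph osd unset noout
     sudo ceph osd unset norebalance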

Result:
[root@overcloud-cephstorage-1 ~]# ceph -s
    cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
     health HEALTH_WARN
            823 pgs degraded
            823 pgs stuck degraded
            823 pgs stuck unclean
            823 pgs stuck undersized
            823 pgs undersized
            recovery 6/57 objects degraded (10.526%)
            3/24 in osds are down
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
            election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e227: 24 osds: 21 up, 24 in; 823 remapped pgs
            flags noout,norebalance,require_jewel_osds
      pgmap v20481: 2240 pgs, 6 pools, 45659 kB data, 19 objects
            1341 MB used, 22331 GB / 22333 GB avail
            6/57 objects degraded (10.526%)
                1417 active+clean
                 823 active+undersized+degraded


[root@overcloud-cephstorage-1 ~]# systemctl|grep -i fail
● ceph-disk@dev-sdb2.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdb2
● ceph-disk@dev-sdb3.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdb3
● ceph-disk@dev-sdb4.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdb4
● ceph-disk@dev-sdc2.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdc2
● ceph-disk@dev-sdc4.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdc4
● ceph-disk@dev-sdd1.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdd1
● ceph-disk@dev-sde1.service                                                                         loaded failed failed    Ceph disk activation: /dev/sde1
● ceph-disk@dev-sdf1.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdf1
● ceph-disk@dev-sdh1.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdh1
● ceph-disk@dev-sdj1.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdj1
● ceph-disk@dev-sdk1.service                                                                         loaded failed failed    Ceph disk activation: /dev/sdk1
● ceph-osd@14.service                                                                                loaded failed failed    Ceph object storage daemon
● ceph-osd@17.service                                                                                loaded failed failed    Ceph object storage daemon
● ceph-osd@22.service                                                                                loaded failed failed    Ceph object storage daemon

[root@overcloud-cephstorage-1 ~]# journalctl -u ceph-disk@dev-sdb2.service
-- Logs begin at Mon 2017-07-17 17:08:21 UTC, end at Tue 2017-07-18 16:16:54 UTC. --
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org systemd[1]: Starting Ceph disk activation: /dev/sdb2...
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: main_trigger: main_trigger: Namespace(cluster='ceph', dev='/dev/sdb2', dmcrypt=None, dmcrypt_key_dir='/etc/ceph/dmcrypt-keys', fu
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: command: Running command: /usr/sbin/init --version
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: command_check_call: Running command: /usr/bin/chown ceph:ceph /dev/sdb2
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: command: Running command: /usr/sbin/blkid -o udev -p /dev/sdb2
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: command: Running command: /usr/sbin/blkid -o udev -p /dev/sdb2
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: main_trigger: trigger /dev/sdb2 parttype 45b0969e-9b03-4f30-b4c6-b4b80ceff106 uuid 461c3e2f-ccf0-43c8-9e2e-9d218ab2f66c
Jul 18 15:46:53 overcloud-cephstorage-1.fv1dci.org sh[1511]: command: Running command: /usr/sbin/ceph-disk --verbose activate-journal /dev/sdb2
Jul 18 15:48:53 overcloud-cephstorage-1.fv1dci.org systemd[1]: ceph-disk@dev-sdb2.service: main process exited, code=exited, status=124/n/a
Jul 18 15:48:53 overcloud-cephstorage-1.fv1dci.org systemd[1]: Failed to start Ceph disk activation: /dev/sdb2.
Jul 18 15:48:53 overcloud-cephstorage-1.fv1dci.org systemd[1]: Unit ceph-disk@dev-sdb2.service entered failed state.
Jul 18 15:48:53 overcloud-cephstorage-1.fv1dci.org systemd[1]: ceph-disk@dev-sdb2.service failed.
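
An editorial note on the failure above (not part of the original report): exit status 124 is what timeout(1) returns when it kills the command it wraps, and the two-minute gap between "Starting" and the failure suggests the activation was killed by the 120-second timeout wrapper in this build's ceph-disk@.service unit (comment 9 points at the upstream change addressing this code path). A quick way to confirm on an affected node, using the device from the log above:

     # Print the unit file; the ExecStart line shows the timeout wrapper.
     systemctl cat ceph-disk@dev-sdb2.service
     # Show the recorded result and exit status of the last run (124 = killed by timeout).
     systemctl show ceph-disk@dev-sdb2.service -p Result -p ExecMainStatus
     # Re-run the activation by hand, as in the workaround below.
     systemctl start ceph-disk@dev-sdb2.service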




Workaround:
Run the following to start the failed ceph-disk activation units by hand (a loop form is sketched after this list):
     systemctl start ceph-disk@dev-sdb2.service
     systemctl start ceph-disk@dev-sdb3.service
     systemctl start ceph-disk@dev-sdb4.service
     systemctl start ceph-disk@dev-sdc2.service
     systemctl start ceph-disk@dev-sdc4.service
     systemctl start ceph-disk@dev-sdd1.service
     systemctl start ceph-disk@dev-sde1.service
     systemctl start ceph-disk@dev-sdf1.service
     systemctl start ceph-disk@dev-sdj1.service
     systemctl start ceph-disk@dev-sdk1.service
     systemctl start ceph-disk@dev-sdh1.service
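
The same thing in loop form (a sketch along the lines of the command in comment 5; it assumes the default RHEL 7 systemctl --failed output, where the unit name is the second column after the failure marker):

     # Start every ceph-disk activation unit currently in the failed state.
     for unit in $(systemctl --failed | awk '/ceph-disk@/ {print $2}'); do
         systemctl start "$unit"
     done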

This resolved the situation:
[root@overcloud-cephstorage-1 ~]# ceph status
    cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
     health HEALTH_WARN
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
            election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e236: 24 osds: 24 up, 24 in
            flags noout,norebalance,require_jewel_osds
      pgmap v20518: 2240 pgs, 6 pools, 45659 kB data, 19 objects
            1353 MB used, 22331 GB / 22333 GB avail
                2240 active+clean
Comment 1 Alexander Chuzhoy 2017-07-18 12:39:15 EDT
The issue reproduced on all 3 ceph nodes. 

Exactly 3 OSDs were down after rebooting each node:

3/24 in osds are down


[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -s
    cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
     health HEALTH_WARN
            808 pgs degraded
            808 pgs stuck degraded
            808 pgs stuck unclean
            808 pgs stuck undersized
            808 pgs undersized
            recovery 11/57 objects degraded (19.298%)
            3/24 in osds are down
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
            election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e201: 24 osds: 21 up, 24 in; 808 remapped pgs
            flags noout,norebalance,require_jewel_osds
      pgmap v20349: 2240 pgs, 6 pools, 45659 kB data, 19 objects
            1273 MB used, 22331 GB / 22333 GB avail
            11/57 objects degraded (19.298%)
                1432 active+clean
                 808 active+undersized+degraded


[root@overcloud-cephstorage-1 ~]# ceph -s
    cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
     health HEALTH_WARN
            823 pgs degraded
            823 pgs stuck degraded
            823 pgs stuck unclean
            823 pgs stuck undersized
            823 pgs undersized
            recovery 6/57 objects degraded (10.526%)
            3/24 in osds are down
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
            election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e227: 24 osds: 21 up, 24 in; 823 remapped pgs
            flags noout,norebalance,require_jewel_osds
      pgmap v20481: 2240 pgs, 6 pools, 45659 kB data, 19 objects
            1341 MB used, 22331 GB / 22333 GB avail
            6/57 objects degraded (10.526%)
                1417 active+clean
                 823 active+undersized+degraded




[heat-admin@overcloud-cephstorage-2 ~]$ sudo ceph status
    cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
     health HEALTH_WARN
            844 pgs degraded
            844 pgs stuck degraded
            844 pgs stuck unclean
            844 pgs stuck undersized
            844 pgs undersized
            recovery 10/57 objects degraded (17.544%)
            3/24 in osds are down
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
            election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e253: 24 osds: 21 up, 24 in; 844 remapped pgs
            flags noout,norebalance,require_jewel_osds
      pgmap v20615: 2240 pgs, 6 pools, 45659 kB data, 19 objects
            1361 MB used, 22331 GB / 22333 GB avail
            10/57 objects degraded (17.544%)
                1396 active+clean
                 844 active+undersized+degraded
Comment 3 Alexander Chuzhoy 2017-07-18 14:17:18 EDT
It could be that the OSDs come up after a while.
Comment 4 arkady kanevsky 2017-07-19 09:44:13 EDT
Sasha,
Are you proposing that we wait, then check the status, then run systemctl start ceph-disk on the disks that are not up yet, then check the results of those, and then complete the upgrade?
Comment 5 Alexander Chuzhoy 2017-07-20 18:20:02 EDT
Hi Arkady,
I was hoping that the OSDs would come up if we waited longer (something I thought I saw on one machine), but when I tried to prove that, I verified that they don't (I waited for more than an hour):

[root@overcloud-cephstorage-0 ~]# uptime
 22:13:49 up  1:06,  1 user,  load average: 0.03, 0.03, 0.05


[root@overcloud-cephstorage-0 ~]# ceph -s
    cluster 9d071b3c-6d0d-11e7-91c2-525400141c5e
     health HEALTH_WARN
            612 pgs degraded
            612 pgs stuck degraded
            612 pgs stuck unclean
            612 pgs stuck undersized
            612 pgs undersized
            recovery 6/57 objects degraded (10.526%)
            2/24 in osds are down
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.123:6789/0,overcloud-controller-2=192.168.170.126:6789/0}
            election epoch 34, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e208: 24 osds: 22 up, 24 in; 612 remapped pgs
            flags noout,norebalance,require_jewel_osds
      pgmap v13321: 2368 pgs, 6 pools, 45659 kB data, 19 objects
            1313 MB used, 22331 GB / 22333 GB avail
            6/57 objects degraded (10.526%)
                1756 active+clean
                 612 active+undersized+degraded


So then I ran:
[root@overcloud-cephstorage-0 ~]# for i in `systemctl|awk '/ceph-disk/ {print $2}'`; do echo $i; systemctl start $i; done
ceph-disk@dev-sdb1.service
ceph-disk@dev-sdb2.service
ceph-disk@dev-sdb3.service
ceph-disk@dev-sdc1.service
ceph-disk@dev-sdc4.service
ceph-disk@dev-sdd1.service
ceph-disk@dev-sdf1.service
ceph-disk@dev-sdg1.service
ceph-disk@dev-sdh1.service
ceph-disk@dev-sdj1.service
ceph-disk@dev-sdk1.service



Checking the status again, all OSDs are up:
[root@overcloud-cephstorage-0 ~]# ceph -s
    cluster 9d071b3c-6d0d-11e7-91c2-525400141c5e
     health HEALTH_WARN
            65 pgs peering
            65 pgs stuck unclean
            noout,norebalance flag(s) set
     monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.123:6789/0,overcloud-controller-2=192.168.170.126:6789/0}
            election epoch 34, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e214: 24 osds: 24 up, 24 in
            flags noout,norebalance,require_jewel_osds
      pgmap v13335: 2368 pgs, 6 pools, 45659 kB data, 19 objects
            1320 MB used, 22331 GB / 22333 GB avail
                2303 active+clean
                  65 peering


So comment #3 can be disregarded.
Comment 6 seb 2017-08-02 10:56:30 EDT
Duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=1457231
Not a puppet-ceph bug.

Unfortunately, as Alfredo mentioned, this is a well-known issue.

It is being taken care of in ceph-disk, so I suspect we can close this here and leave the fix to Ceph itself.
Comment 9 Brett Niver 2017-08-09 11:36:14 EDT
Ian may have already tracked this down, but it appears to have been fixed in (not before) 2.3 per https://github.com/ceph/ceph/pull/12147/files.  @Loic, is there any plan to backport this into 1.3.X?
Comment 10 Loic Dachary 2017-08-10 10:24:51 EDT
I don't know that there are plans to do that.
