Description of problem:
During an upgrade from OSP 9 to OSP 10, after the controller upgrade, running:

    . stackrc
    upgrade-non-controller.sh --upgrade overcloud-cephstorage-0

fails.

Version-Release number of selected component (if applicable):

How reproducible:

Actual results:
The OSD cannot restart:

    19:01:56 2016-09-07 17:01:55.040934 7f09f4389700 0 -- :/4004684784 >> 172.16.1.9:6789/0 pipe(0x7f09e40088e0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f09e400d040).fault
    19:01:59 2016-09-07 17:01:57.993955 7f09f7401700 0 monclient(hunting): authenticate timed out after 300
    19:01:59 2016-09-07 17:01:57.994108 7f09f7401700 0 librados: client.admin authentication error (110) Connection timed out

On checking, all the mons are stopped on the 3 controller nodes.

Expected results:

Additional info:
Packages on the controller:

    ceph-base.x86_64      1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-mon
    ceph-common.x86_64    1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-mon
    ceph-mon.x86_64       1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-mon
    ceph-osd.x86_64       1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-osd
    ceph-selinux.x86_64   1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-mon
    libcephfs1.x86_64     1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-mon
    puppet-ceph.noarch    2.0.0-0.20160823145734.4e36628.1.el7ost  @rhelosp-10.0-brew
    python-cephfs.x86_64  1:10.2.2-38.el7cp                        @rhelosp-10.0-ceph-2.0-mon

The same packages are installed on the cephstorage node.
Hi,

the problem occurs during the controller-and-block-storage upgrade, that is, when using this template:

    environments/major-upgrade-pacemaker.yaml

The Ceph mons are not properly upgraded from ~0.9 to 2.0. This is the state of the mons:

    systemctl list-units --all *ceph*
    UNIT             LOAD    ACTIVE  SUB     DESCRIPTION
    ceph-mon.target  loaded  active  active  ceph target allowing to start/stop all ceph-mon@.service instances at once
    ceph-osd.target  loaded  active  active  ceph target allowing to start/stop all ceph-osd@.service instances at once
    ceph.target      loaded  active  active  ceph target allowing to start/stop all ceph*@.service instances at once

    LOAD   = Reflects whether the unit definition was properly loaded.
    ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
    SUB    = The low-level unit activation state, values depend on unit type.

    3 loaded units listed.
    To show all installed unit files use 'systemctl list-unit-files'.

Only the targets are loaded; no ceph-mon service instance is started. This is a big upgrade and a lot has changed, so the upgrade process has to take that into account.
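The check above (targets present, but no ceph-mon service instance running) could be scripted along these lines. This is only a rough sketch: the function name count_running_mons is made up for illustration, and it assumes the mon instances show up as ceph-mon@<id>.service units, as the target descriptions in the listing suggest. It reads `systemctl list-units` output on stdin and counts units in the active/running state.

```shell
#!/bin/sh
# Hypothetical helper: count running ceph-mon@<id>.service instances.
# Expects `systemctl list-units --plain --no-legend` output on stdin,
# i.e. columns: UNIT LOAD ACTIVE SUB DESCRIPTION.
count_running_mons() {
    awk '$1 ~ /^ceph-mon@.+\.service$/ && $3 == "active" && $4 == "running" { n++ }
         END { print n + 0 }'
}

# Possible usage on a controller (assumes a systemd host):
#   systemctl list-units --all --plain --no-legend --no-pager 'ceph-mon@*' | count_running_mons
```

A result of 0 on a controller after the upgrade step would match the state reported here; the ceph-mon.target line alone is deliberately not counted, since an active target does not mean a mon daemon is running.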
Changed to DFG:DF-Lifecycle to help with verification.
Verified with openstack-tripleo-heat-templates-5.1.0-3.el7ost.noarch

On controller:
----------------
[root@controller-0 ~]# rpm -qa | grep ceph
python-cephfs-10.2.2-41.el7cp.x86_64
ceph-osd-10.2.2-41.el7cp.x86_64
puppet-ceph-2.2.1-3.el7ost.noarch
ceph-selinux-10.2.2-41.el7cp.x86_64
ceph-common-10.2.2-41.el7cp.x86_64
ceph-mon-10.2.2-41.el7cp.x86_64
libcephfs1-10.2.2-41.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html