Description of problem:

Manila share creation fails with "Failed to schedule create_share: No valid host was found ..." This happens on all OSP versions when using ceph-nfs-ganesha as the backend with TLS everywhere.

Version-Release number of selected component (if applicable):

puppet-manila-15.4.1-0.20191203205319.30b9a07.el8ost.noarch
python3-manilaclient-1.29.0-0.20190923115452.1b2cafb.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP16 with ceph-nfs-ganesha as the backend with TLS everywhere.
2. Try to create a Manila share (a CLI sketch follows below).

Actual results:
Manila share creation fails with "Failed to schedule create_share: No valid host was found ..."

Expected results:
The Manila share is created successfully.

Additional info:

From manila-scheduler.log
=========================

ERROR manila.scheduler.manager [req-22ce47af-5685-4c7d-b9f4-b3b5372a6140 1e7f72b7c8b94e2685c57d99b89a49ba 8020d378fc074343a864035e87867d42 - - -] Failed to schedule create_share: No valid host was found. Failed to find a weighted host, the last executed filter was AvailabilityZoneFilter.: manila.exception.NoValidHost: No valid host was found. Failed to find a weighted host, the last executed filter was AvailabilityZoneFilter.
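For reference, a minimal CLI reproduction of step 2 would look something like this (a sketch; the share type name "default" and the 1 GiB size are assumptions, not taken from this report):

  $ manila create nfs 1 --name test-share --share-type default
  $ manila show test-share
  # the share ends up in status "error"; the scheduler failure above
  # appears in manila-scheduler.log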
Created attachment 1645891 [details] Manila_logs
Liron, the manila-share service has exited. It depends on the ceph-nfs service. Is that running? What does 'pcs status' show?

Also, your report says "This happens on all OSP versions when using ceph-nfs-ganesha as the backend with TLS everywhere." If this also happens on releases other than OSP16 (candidate), please raise separate BZs with logs for those deployments.
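For anyone triaging a similar failure: the manila-share state can also be confirmed from the API side (a minimal sketch, assuming admin credentials are sourced):

  $ manila service-list
  # the row with Binary "manila-share" shows State "down" when the
  # service has exited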
(In reply to Tom Barron from comment #2)
> Liron, the manila-share service has exited. It depends on the ceph-nfs
> service. Is that running? What does 'pcs status' show?

The ceph-nfs service is down.

[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Tue Dec 17 19:35:43 2019
Last change: Tue Dec 17 13:43:31 2019 by root via cibadmin on controller-0

15 nodes configured
51 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-1 galera-bundle-1@controller-2 galera-bundle-2@controller-0 ovn-dbs-bundle-0@controller-1 ovn-dbs-bundle-1@controller-2 ovn-dbs-bundle-2@controller-0 rabbitmq-bundle-0@controller-1 rabbitmq-bundle-1@controller-2 rabbitmq-bundle-2@controller-0 redis-bundle-0@controller-1 redis-bundle-1@controller-2 redis-bundle-2@controller-0 ]

Full list of resources:

 ip-172.17.5.126 (ocf::heartbeat:IPaddr2): Stopped
 Container bundle set: galera-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master controller-1
   galera-bundle-1 (ocf::heartbeat:galera): Master controller-2
   galera-bundle-2 (ocf::heartbeat:galera): Master controller-0
 Container bundle set: rabbitmq-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
 Container bundle set: redis-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Master controller-1
   redis-bundle-1 (ocf::heartbeat:redis): Slave controller-2
   redis-bundle-2 (ocf::heartbeat:redis): Slave controller-0
 ip-192.168.24.101 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-2
 ip-172.17.1.102 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-172.17.1.101 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.3.101 (ocf::heartbeat:IPaddr2): Started controller-2
 ip-172.17.4.101 (ocf::heartbeat:IPaddr2): Started controller-0
 Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-haproxy:pcmklatest]
   haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-1
   haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-2
   haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-0
 Container bundle set: ovn-dbs-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-1
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-2
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-0
 ip-172.17.1.134 (ocf::heartbeat:IPaddr2): Started controller-1
 ceph-nfs (systemd:ceph-nfs@pacemaker): Stopped
 Container bundle: openstack-cinder-backup [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:pcmklatest]
   openstack-cinder-backup-podman-0 (ocf::heartbeat:podman): Started controller-0
 Container bundle: openstack-cinder-volume [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-2
 Container bundle: openstack-manila-share [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-manila-share:pcmklatest]
   openstack-manila-share-podman-0 (ocf::heartbeat:podman): Stopped

Failed Resource Actions:
* ceph-nfs_start_0 on controller-1 'unknown error' (1): call=144, status=complete, exitreason='', last-rc-change='Tue Dec 17 13:39:59 2019', queued=0ms, exec=2328ms
* ceph-nfs_start_0 on controller-2 'unknown error' (1): call=142, status=complete, exitreason='', last-rc-change='Tue Dec 17 13:40:04 2019', queued=0ms, exec=2351ms
* ceph-nfs_start_0 on controller-0 'unknown error' (1): call=142, status=complete, exitreason='', last-rc-change='Tue Dec 17 13:39:45 2019', queued=0ms, exec=2350ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

> Also, your report says "This happens on all OSP versions when using
> ceph-nfs-ganesha as the backend with TLS everywhere." If this also happens
> on releases other than OSP16 (candidate), please raise separate BZs with
> logs for those deployments.

Will do.
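For anyone hitting the same symptom: the failed ceph-nfs resource can be inspected on the controllers with something like the following (a sketch; the systemd unit name comes from the "systemd:ceph-nfs@pacemaker" resource agent shown above, and the timestamp from the failed actions):

  [root@controller-0 ~]# systemctl status ceph-nfs@pacemaker
  [root@controller-0 ~]# journalctl -u ceph-nfs@pacemaker --since "2019-12-17 13:39"
  # once the underlying failure is fixed, clear the failed actions so
  # pacemaker retries the start:
  [root@controller-0 ~]# pcs resource cleanup ceph-nfs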
Thanks, Liron. This is likely the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1784562, which is due to a recent change in the ceph-container image.
*** This bug has been marked as a duplicate of bug 1784562 ***