Description of problem:
The ceph-nfs pacemaker service fails to start after deployment. The error after the TripleO deployment is:

# pcs status
 ceph-nfs (systemd:ceph-nfs@pacemaker): Started controller-1

Failed Actions:
* ceph-nfs_monitor_60000 on controller-1 'not running' (7): call=359, status=complete, exitreason='',
    last-rc-change='Thu Apr 11 02:59:54 2019', queued=0ms, exec=0ms

# pcs resource show ceph-nfs
 Resource: ceph-nfs (class=systemd type=ceph-nfs@pacemaker)
  Operations: monitor interval=60 timeout=100 (ceph-nfs-monitor-interval-60)
              start interval=0s timeout=200s (ceph-nfs-start-interval-0s)
              stop interval=0s timeout=200s (ceph-nfs-stop-interval-0s)

# systemctl status ceph-nfs@pacemaker
  Process: 672006 ExecStart=/usr/bin/docker run --rm --net=host -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/lib/nfs/ganesha:/var/lib/nfs/ganesha:z -v /etc/ganesha:/etc/ganesha:z -v /var/run/ceph:/var/run/ceph:z --privileged -v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket -v /etc/localtime:/etc/localtime:ro -e CLUSTER=ceph -e CEPH_DAEMON=NFS --name=ceph-nfs-pacemaker 172.16.0.1:8787/rhceph/rhceph-3-rhel7:3-23 (code=exited, status=255)
Apr 11 03:02:10 controller-1.redhat.local docker[672710]: Error response from daemon: No such container: ceph-nfs-pacemaker

If I run the docker command shown by systemctl status manually, I get the following error:

2019-04-11 03:05:36 /entrypoint.sh: static: does not generate config
2019-04-11 03:05:37 /entrypoint.sh: SUCCESS
exec: PID 149: spawning /usr/bin/ganesha.nfsd -F -L STDOUT
exec: Waiting 149 to quit
11/04/2019 03:05:37 : epoch 5caeaf01 : controller-1.redhat.local : ganesha.nfsd-149[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.7.1
11/04/2019 03:05:37 : epoch 5caeaf01 : controller-1.redhat.local : ganesha.nfsd-149[main] nfs_set_param_from_conf :NFS STARTUP :CRIT :Error while parsing core configuration
11/04/2019 03:05:37 : epoch 5caeaf01 : controller-1.redhat.local : ganesha.nfsd-149[main] main :NFS STARTUP :CRIT :Error setting parameters from configuration file.
11/04/2019 03:05:37 : epoch 5caeaf01 : controller-1.redhat.local : ganesha.nfsd-149[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:6): Expected an IP address, got a option name or number
11/04/2019 03:05:37 : epoch 5caeaf01 : controller-1.redhat.local : ganesha.nfsd-149[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:39): 1 (invalid param value) errors found block NFS_Core_Param
11/04/2019 03:05:37 : epoch 5caeaf01 : controller-1.redhat.local : ganesha.nfsd-149[main] main :NFS STARTUP :FATAL :Fatal errors. Server exiting...
teardown: managing teardown after SIGCHLD
teardown: Waiting PID 149 to terminate
teardown: Process 149 is terminated
teardown: Bye Bye, container will die with return code -1
teardown: if you don't want me to die and have access to a shell to debug this situation, next time run me with '-e DEBUG=stayalive'

From /etc/ganesha/ganesha.conf:

NFS_Core_Param {
    Bind_Addr=overcloud.storagenfs.localdomain;
}

# grep storagenfs /etc/hosts
172.16.202.101 overcloud.storagenfs.localdomain

Should this be an IP? pcs status shows:

 ip-172.16.202.101 (ocf::heartbeat:IPaddr2): Started controller-1

Version-Release number of selected component (if applicable):

After I changed Bind_Addr to the IP address, pacemaker can start the resource with no errors. So the FQDN is being written as Bind_Addr in ganesha.conf instead of the IP. This needs to be fixed to use the IP, even when SSL everywhere is used.

Templates at https://gitlab.cee.redhat.com/sputhenp/openstack/tree/master/basic/templates
Deploy command at: https://gitlab.cee.redhat.com/sputhenp/openstack/blob/master/basic/templates/overcloud-deploy-tls.sh

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
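For illustration, the manual workaround above (replacing the FQDN in Bind_Addr with the address it maps to in /etc/hosts) can be sketched in Python. This is a minimal sketch under stated assumptions: parse_hosts and pin_bind_addr are hypothetical helpers, not part of TripleO, ceph-ansible, or nfs-ganesha, and the regex only handles the simple "Bind_Addr=value;" form shown in this report.

```python
import re


def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-style text to its IP (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


# Assumption: Bind_Addr appears as "Bind_Addr=<value>;" with optional spaces around "=".
BIND_RE = re.compile(r"(Bind_Addr\s*=\s*)([^;\s]+)(\s*;)")


def pin_bind_addr(conf_text, hosts):
    """Replace a hostname in Bind_Addr with its IP from `hosts`; unknown values are left as-is."""
    def repl(m):
        value = m.group(2)
        return m.group(1) + hosts.get(value, value) + m.group(3)
    return BIND_RE.sub(repl, conf_text)


if __name__ == "__main__":
    hosts = parse_hosts("172.16.202.101 overcloud.storagenfs.localdomain\n")
    conf = "NFS_Core_Param {\n    Bind_Addr=overcloud.storagenfs.localdomain;\n}\n"
    print(pin_bind_addr(conf, hosts))
```

Leaving unresolvable values untouched keeps the sketch from silently writing a wrong address; ganesha.nfsd will still reject them at startup, as in the log above.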
Thanks. Confirming this is a bug: Bind_Addr must be a valid IPv4 or IPv6 address [1], not a hostname/FQDN as it is currently configured.

[1] https://github.com/nfs-ganesha/nfs-ganesha/blob/af26bf4/src/config_samples/config.txt#L43
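A pre-flight check enforcing this rule before ganesha.nfsd is started could look like the following minimal Python sketch. check_bind_addr is a hypothetical helper, and Python's standard ipaddress module is an illustrative choice; this is not how nfs-ganesha itself parses the value.

```python
import ipaddress


def check_bind_addr(value):
    """Return the normalized address if `value` is a literal IPv4/IPv6 address.

    Raises ValueError for anything else (e.g. a hostname or FQDN), mirroring the
    rule that Bind_Addr must be an IP address.
    """
    try:
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        raise ValueError(
            f"Bind_Addr must be an IPv4 or IPv6 address, got {value!r}"
        ) from None


if __name__ == "__main__":
    print(check_bind_addr("172.16.202.101"))
    try:
        check_bind_addr("overcloud.storagenfs.localdomain")
    except ValueError as err:
        print(err)
```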
After deploying the OSP 13z7 build, pcs status shows the ceph-nfs resource started:

 ceph-nfs (systemd:ceph-nfs@pacemaker): Started controller-0

Full output of the command (2019-06-20.1):

[heat-admin@controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Sun Jun 23 13:04:03 2019
Last change: Fri Jun 21 03:38:12 2019 by root via cibadmin on controller-0

12 nodes configured
40 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-1 galera-bundle-1@controller-2 galera-bundle-2@controller-0 rabbitmq-bundle-0@controller-1 rabbitmq-bundle-1@controller-2 rabbitmq-bundle-2@controller-0 redis-bundle-0@controller-1 redis-bundle-1@controller-2 redis-bundle-2@controller-0 ]

Full list of resources:

 ip-172.17.5.13 (ocf::heartbeat:IPaddr2): Started controller-0
 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
 Docker container set: galera-bundle [192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master controller-1
   galera-bundle-1 (ocf::heartbeat:galera): Master controller-2
   galera-bundle-2 (ocf::heartbeat:galera): Master controller-0
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Master controller-1
   redis-bundle-1 (ocf::heartbeat:redis): Slave controller-2
   redis-bundle-2 (ocf::heartbeat:redis): Slave controller-0
 ip-192.168.24.101 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-2
 ip-172.17.1.102 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-172.17.1.101 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.3.101 (ocf::heartbeat:IPaddr2): Started controller-2
 ip-172.17.4.101 (ocf::heartbeat:IPaddr2): Started controller-0
 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started controller-1
   haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started controller-2
   haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started controller-0
 ceph-nfs (systemd:ceph-nfs@pacemaker): Started controller-0
 Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started controller-1
 Docker container: openstack-manila-share [192.168.24.1:8787/rhosp13/openstack-manila-share:pcmklatest]
   openstack-manila-share-docker-0 (ocf::heartbeat:docker): Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1738