Description of problem:
OSP13z2 overcloud deploy fails with manila and cephfs-nfs with:

2018-08-22 17:56:08,820 p=2292 u=mistral | fatal: [192.168.24.6]: FAILED! => {"changed": true, "cmd": "echo | rados -p manila_data --cluster ceph put ganesha-export-index -", "delta": "0:00:00.007298", "end": "2018-08-22 21:56:12.146865", "failed": true, "msg": "non-zero return code", "rc": 127, "start": "2018-08-22 21:56:12.139567", "stderr": "/bin/sh: rados: command not found", "stderr_lines": ["/bin/sh: rados: command not found"], "stdout": "", "stdout_lines": []}

Version-Release number of selected component (if applicable):
OSP13z2 puddle 2018-08-22.2, ceph-ansible 3.1.0-0.1.rc10.el7cp

How reproducible:
Deploy overcloud with cephfs-nfs back end for manila.

Steps to Reproduce:
1. Deploy overcloud with cephfs-nfs back end for manila. Instructions for doing this with Infrared are here [1].

Actual results:
Overcloud deploy fails with a timeout:

2018-08-31 20:31:30Z [overcloud]: CREATE_FAILED Timed out

Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps:
  resource_type: OS::TripleO::PostDeploySteps
  physical_resource_id: 098548aa-d78b-4fd8-bbe3-4a308aa17737
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted (Task create from TemplateResource "AllNodesDeploySteps" Stack "overcloud" [fdc5863b-3be6-4237-892f-9263e4c2245b] Timed out)

/var/log/mistral/ceph-install-workflow.log shows the ansible failure:

2018-08-28 11:39:25,903 p=892 u=mistral | Install Ceph NFS : In Progress (0:00:30)

along with the failure cited above, where the rados command cannot be found.

Expected results:
Overcloud deployment will succeed.

Additional info:
Until recently this issue was masked: OSP installed the ceph-common package on the overcloud controller hosts, so the rados command was found even though it was run on the host rather than inside a container, even in containerized deployments.
[1] https://docs.google.com/document/d/1XnF9C_3ux-tcLOu8F__Wmu7wphI1yoTBcoKGmskITHA/edit?usp=sharing
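The "rc": 127 in the failing task is the shell's standard exit status for a command that cannot be found. The minimal POSIX shell sketch below reproduces that status and checks whether a rados binary is visible on the host PATH. The function names and the command name rados-not-installed are hypothetical, chosen so the demonstration does not depend on what is installed locally.

```shell
#!/bin/sh
# Sketch: reproduce the "rc": 127 failure mode of the failing task.
# POSIX shells exit with status 127 when a command is not found, which is
# what /bin/sh returned for "rados" on hosts without ceph-common.
# rados-not-installed is a hypothetical command name, used so the
# demonstration behaves the same regardless of local packages.
missing_command_status() {
  sh -c "echo | $1 -p manila_data --cluster ceph put ganesha-export-index -" >/dev/null 2>&1
  echo "$?"
}

# Report whether the host shell itself can resolve a rados binary.
# On a containerized controller without ceph-common this prints "no".
host_has_rados() {
  if command -v rados >/dev/null 2>&1; then
    echo "yes"
  else
    echo "no"
  fi
}

echo "status for a missing command: $(missing_command_status rados-not-installed)"
echo "rados on host PATH: $(host_has_rados)"
```

A "no" from host_has_rados on a controller means any task that calls rados without going through a container will fail exactly as shown above.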
Ran overcloud deploy with puddle 2018-08-22.2 and ceph-ansible-3.1.2-1.el7cp.

1/ The rados commands that failed before now succeed, because they now run inside the ceph-mon-controller-0 container via docker exec (the docker_exec_cmd_nfs fact):

2018-09-05 08:27:52,552 p=1877 u=mistral | TASK [ceph-nfs : set_fact docker_exec_cmd_nfs] *********************************
2018-09-05 08:27:52,552 p=1877 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml:2
2018-09-05 08:27:52,553 p=1877 u=mistral | Wednesday 05 September 2018 08:27:52 -0400 (0:00:00.054) 0:06:22.112 ***
2018-09-05 08:27:52,652 p=1877 u=mistral | ok: [192.168.24.11] => {"ansible_facts": {"docker_exec_cmd_nfs": "docker exec ceph-mon-controller-0"}, "changed": false, "failed": false}
2018-09-05 08:27:52,687 p=1877 u=mistral | TASK [ceph-nfs : check if rados index object exists] ***************************
2018-09-05 08:27:52,687 p=1877 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml:8
2018-09-05 08:27:52,688 p=1877 u=mistral | Wednesday 05 September 2018 08:27:52 -0400 (0:00:00.134) 0:06:22.247 ***
2018-09-05 08:27:53,312 p=1877 u=mistral | ok: [192.168.24.11 -> 192.168.24.11] => {"changed": false, "cmd": "docker exec ceph-mon-controller-0 rados -p manila_data --cluster ceph ls|grep ganesha-export-index", "delta": "0:00:00.278897", "end": "2018-09-05 12:27:53.284990", "failed": false, "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2018-09-05 12:27:53.006093", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2018-09-05 08:27:53,349 p=1877 u=mistral | TASK [ceph-nfs : create an empty rados index object] ***************************
2018-09-05 08:27:53,349 p=1877 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml:19
2018-09-05 08:27:53,349 p=1877 u=mistral | Wednesday 05 September 2018 08:27:53 -0400 (0:00:00.661) 0:06:22.908 ***
2018-09-05 08:27:53,916 p=1877 u=mistral | changed: [192.168.24.11 -> 192.168.24.11] => {"changed": true, "cmd": "docker exec ceph-mon-controller-0 rados -p manila_data --cluster ceph put ganesha-export-index /dev/null", "delta": "0:00:00.201583", "end": "2018-09-05 12:27:53.884237", "failed": false, "rc": 0, "start": "2018-09-05 12:27:53.682654", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

The "rc": 1 from the check task is expected: grep found no existing ganesha-export-index object, the task does not treat that as a failure ("failed_when_result": false), and the next task creates the object.

2/ ceph-install-workflow.log shows normal ceph-ansible completion:

2018-09-05 08:28:59,877 p=1877 u=mistral | INSTALLER STATUS ***************************************************************
2018-09-05 08:28:59,882 p=1877 u=mistral | Install Ceph Monitor : Complete (0:01:26)
2018-09-05 08:28:59,882 p=1877 u=mistral | Install Ceph Manager : Complete (0:00:33)
2018-09-05 08:28:59,883 p=1877 u=mistral | Install Ceph OSD : Complete (0:02:39)
2018-09-05 08:28:59,883 p=1877 u=mistral | Install Ceph MDS : Complete (0:00:56)
2018-09-05 08:28:59,884 p=1877 u=mistral | Install Ceph NFS : Complete (0:00:37)
2018-09-05 08:28:59,884 p=1877 u=mistral | Install Ceph Client : Complete (0:00:59)

3/ overcloud deploy completed successfully.

4/ ganesha started up correctly:

[root@controller-0 ~]# docker exec -it ceph-nfs-pacemaker cat /var/log/ganesha/ganesha.log
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+eip
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] rados_kv_init :CLIENT ID :EVENT :Rados kv store init done
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0)
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nsm_connect :NLM :CRIT :failed to connect to statd
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nsm_unmonitor_all :NLM :CRIT :Can not unmonitor all clnt_create returned NULL
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
05/09/2018 12:57:03 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
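The two ceph-nfs tasks verified in item 1 (check whether the ganesha-export-index object exists, then create it empty if missing) can be sketched as a POSIX shell function. This is a hypothetical illustration, not ceph-ansible's code: EXEC_PREFIX stands in for the docker_exec_cmd_nfs fact, and POOL, CLUSTER and ensure_export_index are assumed names. With an empty EXEC_PREFIX, rados runs on the host, which is effectively what the original failing task did.

```shell
#!/bin/sh
# Hypothetical sketch of the "check if rados index object exists" and
# "create an empty rados index object" steps; not ceph-ansible code.
# EXEC_PREFIX is an assumed stand-in for the docker_exec_cmd_nfs fact.
# Set it to "" to run rados directly on the host, which only works if the
# host has a rados binary (for example from ceph-common).
EXEC_PREFIX="${EXEC_PREFIX-docker exec ceph-mon-controller-0}"
POOL="${POOL:-manila_data}"
CLUSTER="${CLUSTER:-ceph}"
INDEX_OBJ="ganesha-export-index"

ensure_export_index() {
  # EXEC_PREFIX is deliberately unquoted so "docker exec <container>"
  # splits into separate words and an empty prefix disappears entirely.
  # grep -qx succeeds only if the object name appears as a whole line.
  if $EXEC_PREFIX rados -p "$POOL" --cluster "$CLUSTER" ls | grep -qx "$INDEX_OBJ"; then
    echo "exists"
    return 0
  fi
  # rados opens /dev/null itself, so inside a container this reads the
  # container's /dev/null and stores an empty object.
  if $EXEC_PREFIX rados -p "$POOL" --cluster "$CLUSTER" put "$INDEX_OBJ" /dev/null; then
    echo "created"
  else
    echo "failed to create $INDEX_OBJ" >&2
    return 1
  fi
}
```

Running ensure_export_index twice prints "created" and then "exists", so the step is idempotent. The whole-line grep -qx match is a stricter choice than the plain grep in the logged task.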
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2819