Bug 1625191 - ceph-nfs runs rados commands on bare metal in containerized deployment
Summary: ceph-nfs runs rados commands on bare metal in containerized deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.1
Assignee: Sébastien Han
QA Contact: Tom Barron
URL:
Whiteboard:
Depends On:
Blocks: 1578730
Reported: 2018-09-04 11:01 UTC by Tom Barron
Modified: 2018-09-26 18:24 UTC
CC List: 10 users

Fixed In Version: RHEL: ceph-ansible-3.1.2-1.el7cp Ubuntu: ceph-ansible_3.1.2-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-26 18:24:01 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph/ceph-ansible pull 3088 | 0 | None | closed | ceph-nfs: run rados cmds in container | 2020-06-05 17:38:23 UTC
Red Hat Product Errata | RHBA-2018:2819 | 0 | None | None | None | 2018-09-26 18:24:50 UTC

Description Tom Barron 2018-09-04 11:01:49 UTC
Description of problem: OSP13z2 overcloud deploy fails when manila is configured with the cephfs-nfs back end, with the following error:

2018-08-22 17:56:08,820 p=2292 u=mistral |  fatal: [192.168.24.6]: FAILED! => {"changed": true, "cmd": "echo | rados -p manila_data --cluster ceph put ganesha-export-index -", "delta": "0:00:00.007298", "end": "2018-08-22 21:56:12.146865", "failed": true, "msg": "non-zero return code", "rc": 127, "start": "2018-08-22 21:56:12.139567", "stderr": "/bin/sh: rados: command not found", "stderr_lines": ["/bin/sh: rados: command not found"], "stdout": "", "stdout_lines": []}
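Exit status 127 is what a POSIX shell returns when it cannot find the command, which matches the `stderr` above. As a minimal triage sketch, the hypothetical helper below (`diagnose_task_result` is an illustrative name, not part of Ansible or ceph-ansible) reads the JSON object that follows `FAILED! =>` in a log line like this one, assumed to be already extracted, and flags the "command not found" case:

```python
import json

# POSIX shells exit with status 127 when a command cannot be found.
COMMAND_NOT_FOUND = 127


def diagnose_task_result(result_json: str) -> str:
    """Summarize an Ansible shell-task result (hypothetical triage helper)."""
    result = json.loads(result_json)
    rc = result.get("rc")
    if rc == 0:
        return "ok"
    stderr = result.get("stderr", "")
    if rc == COMMAND_NOT_FOUND and "command not found" in stderr:
        # stderr looks like "/bin/sh: rados: command not found"
        # (some shells add "line 1:"); take the field right before the message.
        head = stderr.split(": command not found", 1)[0]
        missing = head.rsplit(":", 1)[-1].strip()
        return f"'{missing}' not found on the host running the task (rc 127)"
    return f"failed with rc {rc}"
```

Applied to the result above, this reports that `rados` is missing on the host where the task ran, which points at where the command executed rather than at Ceph itself.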


Version-Release number of selected component (if applicable): OSP13z2 puddle 2018-08-22.2, ceph-ansible 3.1.0-0.1.rc10.el7cp


How reproducible: Deploy overcloud with cephfs-nfs back end for manila.


Steps to Reproduce:
1.  Deploy overcloud with cephfs-nfs back end for manila.  Instructions for doing this with Infrared are here [1].

Actual results:

Overcloud deploy fails with a timeout:

2018-08-31 20:31:30Z [overcloud]: CREATE_FAILED  Timed out

 Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps:
  resource_type: OS::TripleO::PostDeploySteps
  physical_resource_id: 098548aa-d78b-4fd8-bbe3-4a308aa17737
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted (Task create from TemplateResource "AllNodesDeploySteps" Stack "overcloud" [fdc5863b-3be6-4237-892f-9263e4c2245b] Timed out)

/var/log/mistral/ceph-install-workflow.log shows an Ansible failure with:

2018-08-28 11:39:25,903 p=892 u=mistral |  Install Ceph NFS            : In Progress (0:00:30)

It also shows the failure cited above, in which the rados command cannot be found.


Expected results:

Overcloud deployment will succeed.

Additional info:

Until recently this issue was masked: OSP installed the ceph-common package on the overcloud controller hosts, so the rados command was found on the host even though, in a containerized deployment, it should have been run inside a container.
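The fix tracked here (ceph-ansible pull 3088, "ceph-nfs: run rados cmds in container") runs the rados calls inside the monitor container; Comment 9 shows the resulting `docker_exec_cmd_nfs` fact, `docker exec ceph-mon-controller-0`, prefixed to each rados command. A minimal Python sketch of that prefixing idea follows. The function name `rados_command` and its `mon_container` parameter are hypothetical stand-ins for the role's variables; this is not the ceph-ansible implementation.

```python
from __future__ import annotations

import shlex
from typing import Optional


def rados_command(pool: str, args: list[str], cluster: str = "ceph",
                  mon_container: Optional[str] = None) -> str:
    """Build a rados command line for a pool (hypothetical helper).

    With mon_container set, the command is wrapped in `docker exec <container>`
    so the rados binary inside the ceph-mon container is used. With
    mon_container=None the command runs on the host, which only works when the
    host itself provides rados (for example via the ceph-common package).
    """
    cmd = ["rados", "-p", pool, "--cluster", cluster, *args]
    prefix = ["docker", "exec", mon_container] if mon_container else []
    # Quote each word so unusual pool or object names cannot break the shell line.
    return " ".join(shlex.quote(word) for word in prefix + cmd)


if __name__ == "__main__":
    # Containerized case: mirrors the command seen in the verification log.
    print(rados_command("manila_data", ["put", "ganesha-export-index", "/dev/null"],
                        mon_container="ceph-mon-controller-0"))
```

The host-side variant (`mon_container=None`) is exactly the case that kept working only while ceph-common happened to be installed on the controllers.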

[1] https://docs.google.com/document/d/1XnF9C_3ux-tcLOu8F__Wmu7wphI1yoTBcoKGmskITHA/edit?usp=sharing

Comment 9 Tom Barron 2018-09-05 17:07:27 UTC
Ran overcloud deploy with puddle 2018-08-22.2 and ceph-ansible-3.1.2-1.el7cp.

1/ The rados commands that failed before now succeed:

2018-09-05 08:27:52,552 p=1877 u=mistral |  TASK [ceph-nfs : set_fact docker_exec_cmd_nfs] *********************************
2018-09-05 08:27:52,552 p=1877 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml:2
2018-09-05 08:27:52,553 p=1877 u=mistral |  Wednesday 05 September 2018  08:27:52 -0400 (0:00:00.054)       0:06:22.112 ***
2018-09-05 08:27:52,652 p=1877 u=mistral |  ok: [192.168.24.11] => {"ansible_facts": {"docker_exec_cmd_nfs": "docker exec ceph-mon-controller-0"}, "changed": false, "failed": false}
2018-09-05 08:27:52,687 p=1877 u=mistral |  TASK [ceph-nfs : check if rados index object exists] ***************************
2018-09-05 08:27:52,687 p=1877 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml:8
2018-09-05 08:27:52,688 p=1877 u=mistral |  Wednesday 05 September 2018  08:27:52 -0400 (0:00:00.134)       0:06:22.247 ***
2018-09-05 08:27:53,312 p=1877 u=mistral |  ok: [192.168.24.11 -> 192.168.24.11] => {"changed": false, "cmd": "docker exec ceph-mon-controller-0 rados -p manila_data --cluster ceph ls|grep ganesha-export-index", "delta": "0:00:00.278897", "end": "2018-09-05 12:27:53.284990", "failed": false, "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2018-09-05 12:27:53.006093", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2018-09-05 08:27:53,349 p=1877 u=mistral |  TASK [ceph-nfs : create an empty rados index object] ***************************
2018-09-05 08:27:53,349 p=1877 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml:19
2018-09-05 08:27:53,349 p=1877 u=mistral |  Wednesday 05 September 2018  08:27:53 -0400 (0:00:00.661)       0:06:22.908 ***
2018-09-05 08:27:53,916 p=1877 u=mistral |  changed: [192.168.24.11 -> 192.168.24.11] => {"changed": true, "cmd": "docker exec ceph-mon-controller-0 rados -p manila_data --cluster ceph put ganesha-export-index /dev/null", "delta": "0:00:00.201583", "end": "2018-09-05 12:27:53.884237", "failed": false, "rc": 0, "start": "2018-09-05 12:27:53.682654", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
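The two task results above make index creation idempotent: `rados ... ls|grep ganesha-export-index` returns rc 1 when grep finds no match (tolerated, per `failed_when_result: false`), and only then is an empty object written by uploading `/dev/null`. The Python sketch below mirrors that control flow with a hypothetical injected `run` callable, so it can be exercised without a cluster; the names `ensure_index_object` and `Runner` are illustrative and this is not the ceph-ansible task code.

```python
from typing import Callable

# A runner executes a shell command line and returns its exit status.
Runner = Callable[[str], int]

EXEC_PREFIX = "docker exec ceph-mon-controller-0"  # docker_exec_cmd_nfs in the log above
POOL = "manila_data"
INDEX_OBJECT = "ganesha-export-index"


def ensure_index_object(run: Runner, exec_prefix: str = EXEC_PREFIX) -> bool:
    """Create the empty ganesha index object if it is missing (sketch).

    Returns True when the object was created ("changed") and False when it
    already existed ("ok").
    """
    base = f"{exec_prefix} rados -p {POOL} --cluster ceph"
    # The pipeline's exit status is grep's: 0 = match found, 1 = no match.
    rc = run(f"{base} ls|grep {INDEX_OBJECT}")
    if rc == 0:
        return False
    if rc != 1:
        raise RuntimeError(f"existence check failed with rc {rc}")
    # Write an empty object by uploading /dev/null.
    rc = run(f"{base} put {INDEX_OBJECT} /dev/null")
    if rc != 0:
        raise RuntimeError(f"creating {INDEX_OBJECT} failed with rc {rc}")
    return True
```

Against a real host, `run` could be `lambda cmd: subprocess.run(cmd, shell=True).returncode`. Because the check's status is grep's, a failing `rados ls` also yields rc 1 here; the following `put` would then surface the error.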


2/ ceph-install-workflow.log shows normal ceph-ansible completion:

2018-09-05 08:28:59,877 p=1877 u=mistral |  INSTALLER STATUS ***************************************************************
2018-09-05 08:28:59,882 p=1877 u=mistral |  Install Ceph Monitor        : Complete (0:01:26)
2018-09-05 08:28:59,882 p=1877 u=mistral |  Install Ceph Manager        : Complete (0:00:33)
2018-09-05 08:28:59,883 p=1877 u=mistral |  Install Ceph OSD            : Complete (0:02:39)
2018-09-05 08:28:59,883 p=1877 u=mistral |  Install Ceph MDS            : Complete (0:00:56)
2018-09-05 08:28:59,884 p=1877 u=mistral |  Install Ceph NFS            : Complete (0:00:37)
2018-09-05 08:28:59,884 p=1877 u=mistral |  Install Ceph Client         : Complete (0:00:59)
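When comparing runs like the failed one (NFS stuck "In Progress") and this one, the INSTALLER STATUS block can be checked mechanically. A hypothetical Python helper, not part of ceph-ansible or TripleO, that extracts each play's status and elapsed time from ceph-install-workflow.log lines in the format shown above:

```python
import re
from typing import Dict, Iterable, Tuple

# e.g. "... u=mistral |  Install Ceph NFS            : Complete (0:00:37)"
STATUS_RE = re.compile(
    r"(Install Ceph [\w ]+?)\s*:\s*(.+?)\s*\((\d+):(\d{2}):(\d{2})\)\s*$")


def parse_installer_status(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each 'Install Ceph ...' play to (status, elapsed seconds)."""
    result: Dict[str, Tuple[str, int]] = {}
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            name, status, hours, minutes, seconds = m.groups()
            result[name] = (status, int(hours) * 3600 + int(minutes) * 60 + int(seconds))
    return result


def all_complete(status: Dict[str, Tuple[str, int]]) -> bool:
    """True only if at least one play was found and every play reports Complete."""
    return bool(status) and all(state == "Complete" for state, _ in status.values())
```

For the six lines above, every play reports Complete and the elapsed times add up to 430 seconds (7 min 10 s); the earlier failing line parses as `("In Progress", 30)` for Install Ceph NFS.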

3/ overcloud deploy completed successfully

4/ ganesha started up correctly:

[root@controller-0 ~]# docker exec -it ceph-nfs-pacemaker cat /var/log/ganesha/ganesha.log
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+eip
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] rados_kv_init :CLIENT ID :EVENT :Rados kv store init done
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0)
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nsm_connect :NLM :CRIT :failed to connect to statd
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nsm_unmonitor_all :NLM :CRIT :Can not unmonitor all clnt_create returned NULL
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
05/09/2018 12:55:33 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
05/09/2018 12:57:03 : epoch 5b8fd245 : controller-0 : ganesha.nfsd-137[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
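The log is consistent with a normal start: ganesha enters its NFSv4 grace period at 12:55:33 with `duration 90` and leaves it at 12:57:03, 90 seconds later. A hypothetical Python sketch that performs this check on ganesha.log lines in the format above (timestamp `DD/MM/YYYY HH:MM:SS` before the first ` : `); `grace_seconds` is an illustrative name only:

```python
from datetime import datetime
from typing import Iterable, Optional

# ganesha.log timestamps in this report look like "05/09/2018 12:55:33" (day/month/year).
TS_FORMAT = "%d/%m/%Y %H:%M:%S"


def _timestamp(line: str) -> datetime:
    """Parse the leading timestamp field of a ganesha.log line."""
    return datetime.strptime(line.split(" : ", 1)[0], TS_FORMAT)


def grace_seconds(lines: Iterable[str]) -> Optional[float]:
    """Seconds from the first 'Now IN GRACE' event to the next 'Now NOT IN GRACE' event."""
    start: Optional[datetime] = None
    for line in lines:
        if "NFS Server Now NOT IN GRACE" in line:
            if start is not None:
                return (_timestamp(line) - start).total_seconds()
        elif "NFS Server Now IN GRACE" in line and start is None:
            start = _timestamp(line)
    return None  # grace period never started or never ended in these lines
```

Comparing the result with the `duration` value logged by `nfs4_start_grace` is a quick sanity check that the server left grace on schedule.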

Comment 12 errata-xmlrpc 2018-09-26 18:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

