Description of problem:
gluster-blockd will not start automatically after the OCP node hosting CNS pods is rebooted.

Version-Release number of selected component (if applicable):

OCP packages:
tuned-profiles-atomic-2.8.0-5.el7.noarch
atomic-openshift-pod-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-node-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-utils-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
atomic-openshift-docker-excluder-3.7.0-0.127.0.git.0.459b70b.el7.noarch
atomic-openshift-tests-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-dockerregistry-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-clients-redistributable-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-registries-1.19.1-3.gitb39a783.el7.x86_64
atomic-openshift-clients-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
tuned-profiles-atomic-openshift-node-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-sdn-ovs-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-excluder-3.7.0-0.127.0.git.0.459b70b.el7.noarch

CNS / heketi packages:
cns-deploy-5.0.0-50.el7rhgs.x86_64
python-heketi-5.0.0-15.el7rhgs.x86_64
heketi-client-5.0.0-15.el7rhgs.x86_64
heketi-5.0.0-15.el7rhgs.x86_64

Gluster packages:
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
gluster-block-0.2.1-14.el7rhgs.x86_64
glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
targetcli-2.1.fb46-1.el7.noarch
tcmu-runner-1.2.0-15.el7rhgs.x86_64

Images:
deploy-heketi-template.yaml: image: rhgs3/rhgs-volmanager-rhel7:3.3.0-21
glusterblock-provisioner.yaml: image: rhgs3/rhgs-gluster-block-prov-docker:3.3.0-15
glusterfs-template.yaml: image: rhgs3/rhgs-server-rhel7:3.3.0-27
gluster-s3-template.yaml: image: rhgs3/rhgs-s3-server-rhel7:3.3.0-11
heketi-template.yaml: image: rhgs3/rhgs-volmanager-rhel7:3.3.0-21

How reproducible:
Always

Steps to Reproduce:
1. Configure CNS on top of OCP.
2. Reboot the OCP node hosting a CNS pod.
3. Check whether the gluster-blockd, tcmu-runner, and gluster-block-target services started (they currently have to be started manually), e.g. with the commands shown below.

Actual results:
The gluster-block-target service does not start after the reboot.

Expected results:
The gluster-block-target service should start automatically on node reboot.

Additional info:
The following modules are loaded on the node: dm_snapshot, dm_mirror, dm_thin_pool, dm_multipath, target_core_user.
The rpcbind.service unit is enabled to start on boot.
When gluster-block-target is not started inside the CNS pod, the CNS block device feature is not functional.
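For reference, a minimal way to check the state of the three services on the node after reboot (these are the standard unit names shipped by the gluster-block and tcmu-runner packages):

# systemctl status gluster-blockd tcmu-runner gluster-block-target
# systemctl is-enabled gluster-blockd tcmu-runner gluster-block-target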
Can you verify that the kernel modules are loaded after reboot, e.g. are they listed in a file under /etc/modules-load.d/?
yep, they were loaded. I had:

$ cat /etc/modules-load.d/cns.conf
dm_snapshot
dm_mirror
dm_thin_pool
dm_multipath
target_core_user

and systemd-modules-load enabled and started via systemctl. The modules loaded on the node after reboot were as below:

# lsmod |grep target
target_core_user       23936  0
target_core_mod       367918  1 target_core_user
crc_t10dif             12714  1 target_core_mod
uio                    19259  1 target_core_user

root@172: ~ # lsmod |grep dm
dm_multipath           27427  0
dm_thin_pool           65968  4
dm_persistent_data     74708  1 dm_thin_pool
dm_bio_prison          18209  1 dm_thin_pool
dm_snapshot            39100  0
dm_bufio               27972  2 dm_persistent_data,dm_snapshot
libcrc32c              12644  5 xfs,dm_persistent_data,openvswitch,nf_nat,nf_conntrack
dm_mirror              22124  0
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_mod                123303 33 dm_multipath,dm_log,dm_persistent_data,dm_mirror,dm_bufio,d
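For completeness, the module-loading unit itself can also be confirmed to have run on the current boot (a minimal check, not part of the original report):

# systemctl status systemd-modules-load.service
# journalctl -b -u systemd-modules-load.service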
Elvir, I think the rpcbind service isn't started on the node. Please check with systemctl status. I realised very recently that systemctl enable rpcbind.service is not sufficient to make rpcbind start on node reboot. We have documented the right steps in https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#idm140088372947408
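A quick way to check rpcbind on the node (commands only; actual output will depend on the node state):

# systemctl status rpcbind.service
# systemctl is-enabled rpcbind.service
# systemctl is-active rpcbind.service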
It turned out that rpcbind was not starting properly. Thanks, Talur, for pointing out systemctl add-wants multi-user.target rpcbind.service, which makes it start on every boot. This can be closed, if all is happy with that!
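For anyone hitting the same issue, the applied workaround in command form (a sketch: add-wants creates a Wants= symlink under /etc/systemd/system/multi-user.target.wants/, so rpcbind is pulled in on every boot; run the last command after the node comes back up):

# systemctl add-wants multi-user.target rpcbind.service
# ls /etc/systemd/system/multi-user.target.wants/ | grep rpcbind
# reboot
# systemctl is-active rpcbind.service gluster-blockd gluster-block-target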
Thanks for the quick confirmation, Elvir! I will close this bug as notabug.