Bug 1495466

Summary: gluster-blockd inside cns pod not started after node reboot
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Elvir Kuric <ekuric>
Component: CNS-deployment
Assignee: Michael Adam <madam>
Status: CLOSED NOTABUG
QA Contact: Prasanth <pprakash>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: akhakhar, annair, aos-bugs, aos-storage-staff, ekuric, eparis, hchiramm, jarrpa, madam, mliyazud, mzywusko, pprakash, rhs-bugs, rreddy, rtalur, tparsons
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: aos-scalability-37
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-26 15:34:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Elvir Kuric 2017-09-26 06:42:35 UTC
Description of problem:

gluster-blockd will not start automatically after the ocp node hosting the cns pods is rebooted 


Version-Release number of selected component (if applicable):

ocp packages : 

tuned-profiles-atomic-2.8.0-5.el7.noarch
atomic-openshift-pod-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-node-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-utils-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
atomic-openshift-docker-excluder-3.7.0-0.127.0.git.0.459b70b.el7.noarch
atomic-openshift-tests-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-dockerregistry-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-clients-redistributable-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-registries-1.19.1-3.gitb39a783.el7.x86_64
atomic-openshift-clients-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
tuned-profiles-atomic-openshift-node-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-sdn-ovs-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-excluder-3.7.0-0.127.0.git.0.459b70b.el7.noarch


cns / heketi packages: 

cns-deploy-5.0.0-50.el7rhgs.x86_64
python-heketi-5.0.0-15.el7rhgs.x86_64
heketi-client-5.0.0-15.el7rhgs.x86_64
heketi-5.0.0-15.el7rhgs.x86_64

gluster packages: 

glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
gluster-block-0.2.1-14.el7rhgs.x86_64
glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
targetcli-2.1.fb46-1.el7.noarch
tcmu-runner-1.2.0-15.el7rhgs.x86_64

Images: 
deploy-heketi-template.yaml:          image: rhgs3/rhgs-volmanager-rhel7:3.3.0-21
glusterblock-provisioner.yaml:          image: rhgs3/rhgs-gluster-block-prov-docker:3.3.0-15
glusterfs-template.yaml:        - image: rhgs3/rhgs-server-rhel7:3.3.0-27
gluster-s3-template.yaml:          image: rhgs3/rhgs-s3-server-rhel7:3.3.0-11
heketi-template.yaml:          image: rhgs3/rhgs-volmanager-rhel7:3.3.0-21


How reproducible:

always 


Steps to Reproduce:
1. configure cns on top of ocp
2. reboot the ocp node hosting the cns pod
3. check whether the gluster-blockd, tcmu-runner and gluster-block-target services started automatically (see the example commands below)
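
For example, checked from a master like this (project and pod names are illustrative placeholders, not taken from the original report):

# oc get pods -n <glusterfs-project> -o wide          # find the glusterfs pod running on the rebooted node
# oc rsh -n <glusterfs-project> <glusterfs-pod-name>
sh-4.2# systemctl status gluster-blockd tcmu-runner gluster-block-target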

Actual results:

service gluster-block-target is not started after the reboot 

Expected results:
service gluster-block-target is expected to start automatically on node reboot 

Additional info:

The following kernel modules are loaded on the node:
dm_snapshot 
dm_mirror 
dm_thin_pool 
dm_multipath 
target_core_user

Service rpcbind.service is enabled to start on boot

Because the gluster-block-target service inside the cns pod is not started, the block device feature of the cns cluster is not functional (see the verification commands below).
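
As a quick sanity check on the node, standard commands like these confirm the prerequisites above (nothing here is specific to cns):

# lsmod | egrep 'dm_snapshot|dm_mirror|dm_thin_pool|dm_multipath|target_core_user'
# cat /etc/modules-load.d/*.conf           # modules listed here are loaded at every boot by systemd-modules-load
# systemctl is-enabled rpcbind.service     # reports "enabled" here, but see comment 4 -- enabled alone was not sufficient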

Comment 2 Jose A. Rivera 2017-09-26 13:47:28 UTC
Can you verify that the kernel modules are loaded after reboot, e.g. that they're specified in a file under /etc/modprobe.d/?

Comment 3 Elvir Kuric 2017-09-26 14:13:32 UTC
yep, they were loaded; I had 

$ cat /etc/modules-load.d/cns.conf 

dm_snapshot 
dm_mirror 
dm_thin_pool 
dm_multipath 
target_core_user

and 

systemd-modules-load enabled and started (systemctl enable / systemctl start)

the modules loaded on the node after reboot were as below: 

#  lsmod  |grep target 
target_core_user       23936  0 
target_core_mod       367918  1 target_core_user
crc_t10dif             12714  1 target_core_mod
uio                    19259  1 target_core_user



root@172: ~ # lsmod  |grep dm
dm_multipath           27427  0 
dm_thin_pool           65968  4 
dm_persistent_data     74708  1 dm_thin_pool
dm_bio_prison          18209  1 dm_thin_pool
dm_snapshot            39100  0 
dm_bufio               27972  2 dm_persistent_data,dm_snapshot
libcrc32c              12644  5 xfs,dm_persistent_data,openvswitch,nf_nat,nf_conntrack
dm_mirror              22124  0 
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_mod                123303  33 dm_multipath,dm_log,dm_persistent_data,dm_mirror,dm_bufio,d

Comment 4 Raghavendra Talur 2017-09-26 14:56:16 UTC
Elvir,

I think the rpcbind service isn't started on the node.
Please check with systemctl status.

I realised only very recently that 
systemctl enable rpcbind.service

is not sufficient to make rpcbind start on node reboot.

We have documented the right steps in 

https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#idm140088372947408
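
For context, my understanding of the systemd side (an assumption, not verified against the rpcbind unit shipped here): systemctl enable only creates the symlinks named in the unit's [Install] section, so if that section does not wire the service into multi-user.target, the unit can show as enabled yet never get pulled in at boot. That can be inspected with:

# systemctl cat rpcbind.service                                  # check the [Install] section
# systemctl list-dependencies multi-user.target | grep rpcbind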

Comment 5 Elvir Kuric 2017-09-26 15:22:15 UTC
It turned out that rpcbind was not starting properly. Thanks, Talur, for pointing me to 

systemctl add-wants multi-user rpcbind.service

which made it start on every boot. 
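
For reference, add-wants creates the Wants symlink by hand (under /etc/systemd/system/multi-user.target.wants/ in the standard systemd layout), so the effect can be verified with:

# ls -l /etc/systemd/system/multi-user.target.wants/ | grep rpcbind
# systemctl is-active rpcbind.service      # after a reboot this should now report "active"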

This can be closed if everyone is happy with that!

Comment 6 Raghavendra Talur 2017-09-26 15:34:18 UTC
Thanks for the quick confirmation, Elvir!

I will close this bug as notabug.