Description of problem:
gluster-blockd will not start automatically after the OCP node hosting CNS pods is rebooted.

Version-Release number of selected component (if applicable):

OCP packages:
tuned-profiles-atomic-2.8.0-5.el7.noarch
atomic-openshift-pod-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-node-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-utils-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
atomic-openshift-docker-excluder-3.7.0-0.127.0.git.0.459b70b.el7.noarch
atomic-openshift-tests-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-dockerregistry-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-clients-redistributable-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-registries-1.19.1-3.gitb39a783.el7.x86_64
atomic-openshift-clients-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
tuned-profiles-atomic-openshift-node-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-sdn-ovs-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
atomic-openshift-excluder-3.7.0-0.127.0.git.0.459b70b.el7.noarch

CNS / heketi packages:
cns-deploy-5.0.0-50.el7rhgs.x86_64
python-heketi-5.0.0-15.el7rhgs.x86_64
heketi-client-5.0.0-15.el7rhgs.x86_64
heketi-5.0.0-15.el7rhgs.x86_64

Gluster packages:
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
gluster-block-0.2.1-14.el7rhgs.x86_64
glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
targetcli-2.1.fb46-1.el7.noarch
tcmu-runner-1.2.0-15.el7rhgs.x86_64

Images:
deploy-heketi-template.yaml: image: rhgs3/rhgs-volmanager-rhel7:3.3.0-21
glusterblock-provisioner.yaml: image: rhgs3/rhgs-gluster-block-prov-docker:3.3.0-15
glusterfs-template.yaml: image: rhgs3/rhgs-server-rhel7:3.3.0-27
gluster-s3-template.yaml: image: rhgs3/rhgs-s3-server-rhel7:3.3.0-11
heketi-template.yaml: image: rhgs3/rhgs-volmanager-rhel7:3.3.0-21

How reproducible:
Always

Steps to Reproduce:
1. Configure CNS on top of OCP.
2. Reboot the OCP node hosting a CNS pod.
3. Check whether the gluster-blockd, tcmu-runner, and gluster-block-target services started (they currently have to be started manually), e.g. with the commands shown below.

Actual results:
The gluster-block-target service does not start after the reboot.

Expected results:
The gluster-block-target service should start automatically on node reboot.

Additional info:
The following modules are loaded on the node: dm_snapshot, dm_mirror, dm_thin_pool, dm_multipath, target_core_user.
The rpcbind.service unit is enabled to start on boot.
When gluster-block-target is not started inside the CNS pod, the CNS block device feature is not functional.
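For reference, a minimal way to check the state of the three services on the node after reboot (these are the standard unit names shipped by the gluster-block and tcmu-runner packages):

# systemctl status gluster-blockd tcmu-runner gluster-block-target
# systemctl is-enabled gluster-blockd tcmu-runner gluster-block-target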
Can you verify that the kernel modules are loaded after reboot, e.g. are they listed in a file under /etc/modules-load.d/?
yep, they were loaded. I had:

$ cat /etc/modules-load.d/cns.conf
dm_snapshot
dm_mirror
dm_thin_pool
dm_multipath
target_core_user

and systemd-modules-load enabled and started via systemctl. The modules loaded on the node after reboot were as below:

# lsmod |grep target
target_core_user       23936  0
target_core_mod       367918  1 target_core_user
crc_t10dif             12714  1 target_core_mod
uio                    19259  1 target_core_user

root@172: ~ # lsmod |grep dm
dm_multipath           27427  0
dm_thin_pool           65968  4
dm_persistent_data     74708  1 dm_thin_pool
dm_bio_prison          18209  1 dm_thin_pool
dm_snapshot            39100  0
dm_bufio               27972  2 dm_persistent_data,dm_snapshot
libcrc32c              12644  5 xfs,dm_persistent_data,openvswitch,nf_nat,nf_conntrack
dm_mirror              22124  0
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_mod                123303 33 dm_multipath,dm_log,dm_persistent_data,dm_mirror,dm_bufio,d
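For completeness, the module-loading unit itself can also be confirmed to have run on the current boot (a minimal check, not part of the original report):

# systemctl status systemd-modules-load.service
# journalctl -b -u systemd-modules-load.service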
Elvir, I think the rpcbind service isn't started on the node. Please check with systemctl status. I realised very recently that systemctl enable rpcbind.service is not sufficient to make rpcbind start on node reboot. We have documented the right steps in https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#idm140088372947408
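A quick way to check rpcbind on the node (commands only; actual output will depend on the node state):

# systemctl status rpcbind.service
# systemctl is-enabled rpcbind.service
# systemctl is-active rpcbind.service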
It turned out that rpcbind was not starting properly. Thanks, Talur, for pointing out systemctl add-wants multi-user.target rpcbind.service, which makes it start on every boot. This can be closed, if all is happy with that!
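For anyone hitting the same issue, the applied workaround in command form (a sketch: add-wants creates a Wants= symlink under /etc/systemd/system/multi-user.target.wants/, so rpcbind is pulled in on every boot; run the last command after the node comes back up):

# systemctl add-wants multi-user.target rpcbind.service
# ls /etc/systemd/system/multi-user.target.wants/ | grep rpcbind
# reboot
# systemctl is-active rpcbind.service gluster-blockd gluster-block-target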
Thanks for the quick confirmation, Elvir! I will close this bug as notabug.