Bug 2179286 - [OCS] [Gluster] gluster pod not able to start tcmu-runner and gluster-blockd
Summary: [OCS] [Gluster] gluster pod not able to start tcmu-runner and gluster-blockd
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-block
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: John Mulligan
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-03-17 08:44 UTC by tochan
Modified: 2023-07-06 18:43 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-06 18:43:03 UTC
Embargoed:



Description tochan 2023-03-17 08:44:15 UTC
Description of problem:
OCS 3.11.9.
Three gluster pods are at 3.11.9 and only one pod is at 3.11.4, matching the customer environment. We aligned it to 3.11.9 by simply replacing the image version in the DaemonSet, switching the update strategy to OnDelete, and deleting the pod so it restarts and pulls the new image (3.11.9); a command-level sketch follows.
The gluster pod is able to start glusterd, but tcmu-runner hangs while activating and gluster-blockd fails to start.
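
For reference, a minimal sketch of the update procedure described above. The DaemonSet name, namespace, container name, and image reference are placeholders/assumptions inferred from the pod names in this report, not confirmed details:

# Switch the DaemonSet to the OnDelete update strategy so pods are only
# recreated when their old pod is deleted manually.
oc patch ds glusterfs-storage -n <storage-namespace> \
  -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'

# Replace the image version in the DaemonSet (container name and image
# reference are illustrative).
oc set image ds/glusterfs-storage \
  glusterfs=<registry>/rhgs3/rhgs-server-rhel7:<3.11.9-tag> -n <storage-namespace>

# Delete the old pod so the DaemonSet recreates it with the new image.
oc delete pod <gluster-pod-name> -n <storage-namespace>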


Version-Release number of selected component (if applicable):
OCS 3.11.4 -> OCS 3.11.9
Pods already on OCS 3.11.9:
glusterfs-libs-6.0-63.el7rhgs.x86_64
glusterfs-6.0-63.el7rhgs.x86_64
glusterfs-client-xlators-6.0-63.el7rhgs.x86_64
glusterfs-fuse-6.0-63.el7rhgs.x86_64
glusterfs-geo-replication-6.0-63.el7rhgs.x86_64
glusterfs-api-6.0-63.el7rhgs.x86_64
glusterfs-cli-6.0-63.el7rhgs.x86_64
glusterfs-server-6.0-63.el7rhgs.x86_64
gluster-block-0.2.1-41.el7rhgs.x86_64

Pod with the issue, updated from OCS 3.11.4 to 3.11.9:
glusterfs-api-6.0-30.1.el7rhgs.x86_64
glusterfs-fuse-6.0-30.1.el7rhgs.x86_64
glusterfs-server-6.0-30.1.el7rhgs.x86_64
glusterfs-libs-6.0-30.1.el7rhgs.x86_64
glusterfs-6.0-30.1.el7rhgs.x86_64
glusterfs-client-xlators-6.0-30.1.el7rhgs.x86_64
glusterfs-cli-6.0-30.1.el7rhgs.x86_64
glusterfs-geo-replication-6.0-30.1.el7rhgs.x86_64
gluster-block-0.2.1-36.el7rhgs.x86_64


How reproducible:
Pod status (oc get pods -o wide), with the failing pod marked:
NAME                                          READY     STATUS    RESTARTS   AGE       IP              NODE                                           NOMINATED NODE
glusterblock-storage-provisioner-dc-2-kp69k   1/1       Running   0          1d        10.128.0.110    master-1.agabriel311.lab.psi.pnq2.redhat.com   <none>
glusterfs-storage-4lw9q                       1/1       Running   0          1d        10.74.214.241   infra-0.agabriel311.lab.psi.pnq2.redhat.com    <none>
glusterfs-storage-67lfr                       1/1       Running   0          1d        10.74.212.9     infra-1.agabriel311.lab.psi.pnq2.redhat.com    <none>
glusterfs-storage-68bpq                       0/1       Running   54         19h       10.74.213.90    node-0.infra4.lab.psi.pnq2.redhat.com          <none> <-----
glusterfs-storage-tdftc                       1/1       Running   0          1d        10.74.214.108   infra-2.agabriel311.lab.psi.pnq2.redhat.com    <none>
heketi-storage-2-gh66p                        1/1       Running   0          1d        10.129.0.78     master-2.agabriel311.lab.psi.pnq2.redhat.com   <none>



Steps to Reproduce:
1. Set up a cluster with OCS 3.11.9: three gluster pods at 3.11.9 and one gluster pod at 3.11.4.
2. Scale down heketi.
3. oc delete pod on the 3.11.4 gluster pod (see the command sketch below the list).
4. A new gluster pod is created at 3.11.9; the issue appears and the pod remains at 0/1 Ready/Running.
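
A hedged sketch of these steps as commands, using the pod and DeploymentConfig names from the listing above (assumptions taken from that output; adjust to the environment):

# Scale down heketi (it runs as a DeploymentConfig in this setup).
oc scale dc heketi-storage --replicas=0

# Delete the 3.11.4 gluster pod; the DaemonSet recreates it at 3.11.9.
oc delete pod glusterfs-storage-68bpq

# Watch the replacement pod: it starts but never reaches 1/1 Ready.
oc get pods -w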

Actual results:
See the pod listing above: the recreated pod stays at 0/1 Ready/Running with repeated restarts.

Expected results:
The gluster pod should start tcmu-runner and gluster-blockd and reach 1/1 Ready/Running.

Additional info:


sh-4.2# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor preset: disabled)
   Active: activating (start) since Wed 2023-03-15 09:41:26 UTC; 5min ago
  Process: 178 ExecStopPost=/usr/bin/bash -c /usr/bin/echo 1 > ${NETLINK_BLOCK};                                 /usr/bin/echo 1 > ${NETLINK_RESET};                                 /usr/bin/echo 0 > ${NETLINK_BLOCK}; (code=exited, status=0/SUCCESS)
  Process: 209 ExecStartPre=/usr/libexec/gluster-block/upgrade_activities.sh (code=exited, status=0/SUCCESS)
 Main PID: 238 (tcmu-runner)
   CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod81ebca45_c312_11ed_a23f_fa163e4943d3.slice/docker-b71753f32d66a615ca969d2fbdc8dd303e75bb789560761228bf04a0c8814d75.scope/system.slice/tcmu-runner.service
           └─238 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block


sh-4.2# systemctl status gluster-blockd.service
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2023-03-15 09:47:56 UTC; 24s ago
  Process: 196 ExecStart=/usr/sbin/gluster-blockd --glfs-lru-count $GB_GLFS_LRU_COUNT --log-level $GB_LOG_LEVEL $GB_EXTRA_ARGS (code=exited, status=19)
 Main PID: 196 (code=exited, status=19)
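
The unit statuses above were collected inside the failing pod (via oc rsh). A sketch of the natural follow-up checks from the same shell, using only standard systemd tooling and the log path visible in the tcmu-runner command line above:

# Recent journal entries for the stuck and failed units:
journalctl -u tcmu-runner --no-pager | tail -n 50
journalctl -u gluster-blockd --no-pager | tail -n 50

# tcmu-runner writes its own logs under the directory passed
# via --tcmu-log-dir:
ls -l /var/log/glusterfs/gluster-block/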


Case 03460127.
This is being done on Red Hat's Quicklab cluster to replicate the update process before performing it in the customer environment.

