Bug 2179286
| Summary: | [OCS] [Gluster] gluster pod not able to start tcmu-runner and gluster-blockd | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | tochan |
| Component: | gluster-block | Assignee: | John Mulligan <jmulligan> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | ocs-3.11 | CC: | abhishku, agabriel, jmulligan, prasanna.kalever, skrenger, xiubli |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-06 18:43:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description of problem:

OCS 3.11.9 cluster: three gluster pods are at 3.11.9 and only one pod is at 3.11.4, mirroring the customer environment. To align that last pod to 3.11.9, we simply replaced the image version in the DaemonSet, moved to the OnDelete update policy, and deleted the pod so it would restart and pull the new image (3.11.9); see the command sketch after the expected results below. The recreated gluster pod is able to start glusterd, but tcmu-runner does not finish starting and gluster-blockd fails to start.

Version-Release number of selected component (if applicable):

OCS 3.11.4 -> OCS 3.11.9

Pods already at OCS 3.11.9:

```
glusterfs-libs-6.0-63.el7rhgs.x86_64
glusterfs-6.0-63.el7rhgs.x86_64
glusterfs-client-xlators-6.0-63.el7rhgs.x86_64
glusterfs-fuse-6.0-63.el7rhgs.x86_64
glusterfs-geo-replication-6.0-63.el7rhgs.x86_64
glusterfs-api-6.0-63.el7rhgs.x86_64
glusterfs-cli-6.0-63.el7rhgs.x86_64
glusterfs-server-6.0-63.el7rhgs.x86_64
gluster-block-0.2.1-41.el7rhgs.x86_64
```

Pod with the issue, updated from OCS 3.11.4 to 3.11.9:

```
glusterfs-api-6.0-30.1.el7rhgs.x86_64
glusterfs-fuse-6.0-30.1.el7rhgs.x86_64
glusterfs-server-6.0-30.1.el7rhgs.x86_64
glusterfs-libs-6.0-30.1.el7rhgs.x86_64
glusterfs-6.0-30.1.el7rhgs.x86_64
glusterfs-client-xlators-6.0-30.1.el7rhgs.x86_64
glusterfs-cli-6.0-30.1.el7rhgs.x86_64
glusterfs-geo-replication-6.0-30.1.el7rhgs.x86_64
gluster-block-0.2.1-36.el7rhgs.x86_64
```

How reproducible:

```
NAME                                          READY   STATUS    RESTARTS   AGE   IP              NODE                                           NOMINATED NODE
glusterblock-storage-provisioner-dc-2-kp69k   1/1     Running   0          1d    10.128.0.110    master-1.agabriel311.lab.psi.pnq2.redhat.com   <none>
glusterfs-storage-4lw9q                       1/1     Running   0          1d    10.74.214.241   infra-0.agabriel311.lab.psi.pnq2.redhat.com    <none>
glusterfs-storage-67lfr                       1/1     Running   0          1d    10.74.212.9     infra-1.agabriel311.lab.psi.pnq2.redhat.com    <none>
glusterfs-storage-68bpq                       0/1     Running   54         19h   10.74.213.90    node-0.infra4.lab.psi.pnq2.redhat.com          <none>   <-----
glusterfs-storage-tdftc                       1/1     Running   0          1d    10.74.214.108   infra-2.agabriel311.lab.psi.pnq2.redhat.com    <none>
heketi-storage-2-gh66p                        1/1     Running   0          1d    10.129.0.78     master-2.agabriel311.lab.psi.pnq2.redhat.com   <none>
```

Steps to Reproduce:
1. Start from a cluster on OCS 3.11.9 with three gluster pods at 3.11.9 and one gluster pod at 3.11.4.
2. Scale down heketi.
3. `oc delete pod` on the 3.11.4 gluster pod.
4. A new gluster pod is created at 3.11.9; the issue appears and the pod remains at READY 0/1.

Actual results:
See above.

Expected results:
The gluster pod should start tcmu-runner and gluster-blockd and reach READY 1/1.
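A minimal sketch of the update procedure described above, assuming the object names visible in the pod listing (DaemonSet `glusterfs-storage`, DeploymentConfig `heketi-storage`), a container named `glusterfs`, and a `glusterfs` namespace; the exact 3.11.9 image tag is left as a placeholder:

```
# Switch the DaemonSet to the OnDelete update strategy so pods are only
# replaced when deleted manually:
oc patch ds/glusterfs-storage -n glusterfs \
  -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'

# Replace the image version in the DaemonSet (container name and image tag
# are assumptions, not taken from the report):
oc set image ds/glusterfs-storage -n glusterfs \
  glusterfs=registry.redhat.io/rhgs3/rhgs-server-rhel7:<3.11.9-tag>

# Scale down heketi, then delete the 3.11.4 pod so it is recreated with the
# new image:
oc scale dc/heketi-storage -n glusterfs --replicas=0
oc delete pod/glusterfs-storage-68bpq -n glusterfs
```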
Additional info:

```
sh-4.2# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor preset: disabled)
   Active: activating (start) since Wed 2023-03-15 09:41:26 UTC; 5min ago
  Process: 178 ExecStopPost=/usr/bin/bash -c /usr/bin/echo 1 > ${NETLINK_BLOCK}; /usr/bin/echo 1 > ${NETLINK_RESET}; /usr/bin/echo 0 > ${NETLINK_BLOCK}; (code=exited, status=0/SUCCESS)
  Process: 209 ExecStartPre=/usr/libexec/gluster-block/upgrade_activities.sh (code=exited, status=0/SUCCESS)
 Main PID: 238 (tcmu-runner)
   CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod81ebca45_c312_11ed_a23f_fa163e4943d3.slice/docker-b71753f32d66a615ca969d2fbdc8dd303e75bb789560761228bf04a0c8814d75.scope/system.slice/tcmu-runner.service
           └─238 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block

sh-4.2# systemctl status gluster-blockd.service
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2023-03-15 09:47:56 UTC; 24s ago
  Process: 196 ExecStart=/usr/sbin/gluster-blockd --glfs-lru-count $GB_GLFS_LRU_COUNT --log-level $GB_LOG_LEVEL $GB_EXTRA_ARGS (code=exited, status=19)
 Main PID: 196 (code=exited, status=19)
```

Case 03460127. This is being done on Red Hat's Quicklab cluster before replicating the update process in the customer environment.
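A minimal follow-up diagnostic sketch, assuming the failing pod name from the listing above, a `glusterfs` namespace, and that tcmu-runner writes a `tcmu-runner.log` file under the `--tcmu-log-dir` shown in the unit's command line:

```
# Check both units from outside the pod:
oc exec -n glusterfs glusterfs-storage-68bpq -- \
  systemctl is-active tcmu-runner gluster-blockd

# Pull the recent journal entries for both units:
oc exec -n glusterfs glusterfs-storage-68bpq -- \
  journalctl -u tcmu-runner -u gluster-blockd --no-pager -n 100

# Inspect the tcmu-runner log under its --tcmu-log-dir (file name assumed):
oc exec -n glusterfs glusterfs-storage-68bpq -- \
  tail -n 50 /var/log/glusterfs/gluster-block/tcmu-runner.log
```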