Bug 131002
Summary: | second and third mount attempts on recovered node hangs | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | gfs | Assignee: | David Teigland <teigland> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | ccaulfie, djuran |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-01-20 20:41:45 UTC | Type: | --- |
Description
Corey Marthaler
2004-08-26 15:54:35 UTC
morph-01:

    [root@morph-01 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [3 4 5 6 2 1]
    DLM Lock Space: "clvmd" 2 3 run - [3 2 4 5 6 1]
    DLM Lock Space: "corey0" 3 4 run - [3 4 5 6 2 1]
    DLM Lock Space: "corey1" 5 6 run - [3 4 5 6 2 1]
    DLM Lock Space: "corey2" 7 8 run - [3 4 5 6 2]
    DLM Lock Space: "corey3" 9 10 run - [3 4 5 6 2]
    DLM Lock Space: "corey4" 11 12 run - [3 4 5 6 2]
    GFS Mount Group: "corey0" 4 5 run - [3 4 5 6 2 1]
    GFS Mount Group: "corey1" 6 7 update U-4,1,1 [3 4 5 6 2 1]
    GFS Mount Group: "corey2" 8 9 run - [3 4 5 6 2]
    GFS Mount Group: "corey3" 10 11 run - [3 4 5 6 2]
    GFS Mount Group: "corey4" 12 13 run - [3 4 5 6 2]

morph-02:

    [root@morph-02 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [3 4 5 6 2 1]
    DLM Lock Space: "clvmd" 2 3 run - [3 2 4 5 6 1]
    DLM Lock Space: "corey0" 3 4 run - [3 4 5 6 2 1]
    DLM Lock Space: "corey1" 5 6 run - [3 4 5 6 2 1]
    DLM Lock Space: "corey2" 7 8 run - [3 4 5 6 2]
    DLM Lock Space: "corey3" 9 10 run - [3 4 5 6 2]
    DLM Lock Space: "corey4" 11 12 run - [3 4 5 6 2]
    GFS Mount Group: "corey0" 4 5 run - [3 4 5 6 2 1]
    GFS Mount Group: "corey1" 6 7 update U-4,1,1 [3 4 5 6 2 1]
    GFS Mount Group: "corey2" 8 9 run - [3 4 5 6 2]
    GFS Mount Group: "corey3" 10 11 run - [3 4 5 6 2]
    GFS Mount Group: "corey4" 12 13 run - [3 4 5 6 2]

morph-03:

    [root@morph-03 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [4 3 5 6 2 1]
    DLM Lock Space: "clvmd" 2 3 run - [4 2 3 5 6 1]
    DLM Lock Space: "corey0" 3 4 run - [4 3 5 6 2 1]
    DLM Lock Space: "corey1" 5 6 run - [4 3 5 6 2 1]
    DLM Lock Space: "corey2" 7 8 run - [4 3 5 6 2]
    DLM Lock Space: "corey3" 9 10 run - [4 3 5 6 2]
    DLM Lock Space: "corey4" 11 12 run - [4 3 5 6 2]
    GFS Mount Group: "corey0" 4 5 run - [4 3 5 6 2 1]
    GFS Mount Group: "corey1" 6 7 update U-4,1,1 [4 3 5 6 2 1]
    GFS Mount Group: "corey2" 8 9 run - [4 3 5 6 2]
    GFS Mount Group: "corey3" 10 11 run - [4 3 5 6 2]
    GFS Mount Group: "corey4" 12 13 run - [4 3 5 6 2]

morph-04:

    [root@morph-04 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [5 3 4 6 2 1]
    DLM Lock Space: "clvmd" 2 3 run - [5 3 2 4 6 1]
    DLM Lock Space: "corey0" 3 4 run - [5 3 4 6 2 1]
    DLM Lock Space: "corey1" 5 6 run - [5 3 4 6 2 1]
    DLM Lock Space: "corey2" 7 8 run - [5 3 4 6 2]
    DLM Lock Space: "corey3" 9 10 run - [5 3 4 6 2]
    DLM Lock Space: "corey4" 11 12 run - [5 3 4 6 2]
    GFS Mount Group: "corey0" 4 5 run - [5 3 4 6 2 1]
    GFS Mount Group: "corey1" 6 7 update U-4,1,1 [5 3 4 6 2 1]
    GFS Mount Group: "corey2" 8 9 run - [5 3 4 6 2]
    GFS Mount Group: "corey3" 10 11 run - [5 3 4 6 2]
    GFS Mount Group: "corey4" 12 13 run - [5 3 4 6 2]

morph-05:

    [root@morph-05 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [2 3 4 5 6 1]
    DLM Lock Space: "clvmd" 2 3 run - [2 3 4 5 6 1]
    DLM Lock Space: "corey0" 3 4 run - [2 3 4 5 6 1]
    DLM Lock Space: "corey1" 5 6 run - [2 3 4 5 6 1]
    DLM Lock Space: "corey2" 7 8 run - [2 3 4 5 6]
    DLM Lock Space: "corey3" 9 10 run - [2 3 4 5 6]
    DLM Lock Space: "corey4" 11 12 run - [2 3 4 5 6]
    GFS Mount Group: "corey0" 4 5 run - [2 3 4 5 6 1]
    GFS Mount Group: "corey1" 6 7 update U-4,1,1 [2 3 4 5 6 1]
    GFS Mount Group: "corey2" 8 9 run - [2 3 4 5 6]
    GFS Mount Group: "corey3" 10 11 run - [2 3 4 5 6]
    GFS Mount Group: "corey4" 12 13 run - [2 3 4 5 6]

morph-06:

    [root@morph-06 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [2 3 4 5 6 1]
    DLM Lock Space: "clvmd" 2 3 run - [2 3 4 5 6 1]
    DLM Lock Space: "corey0" 3 4 run - [2 3 4 5 6 1]
    DLM Lock Space: "corey1" 5 6 run - [2 3 4 5 6 1]
    GFS Mount Group: "corey0" 4 5 run - [2 3 4 5 6 1]
    GFS Mount Group: "corey1" 6 7 join S-6,20,6 [2 3 4 5 6 1]

I was able to reproduce this mount hang using revolver and by just shooting one node:

    [root@morph-06 root]# cat /proc/cluster/services
    Service Name GID LID State Code
    Fence Domain: "default" 1 2 run - [5 4 3 2 6 1]
    DLM Lock Space: "clvmd" 2 3 run - [5 4 3 2 6 1]
    DLM Lock Space: "corey0" 3 4 run - [5 4 3 2 6 1]
    DLM Lock Space: "corey1" 5 6 run - [5 4 3 2 6 1]
    DLM Lock Space: "corey2" 7 8 run - [5 4 3 2 6 1]
    GFS Mount Group: "corey0" 4 5 run - [5 4 3 2 6 1]
    GFS Mount Group: "corey1" 6 7 run - [5 4 3 2 6 1]
    GFS Mount Group: "corey2" 8 9 join S-6,20,6 [5 4 3 2 6 1]

I recently fixed a dlm bug that could cause any gfs mount to hang. It could be the culprit here.

Unable to reproduce. Marking fixed.

Updating version to the right level in the defects. Sorry for the storm.

I think I have been able to reproduce this. I ran the script below on each node of my three-node cluster. After about 5 hours, two of the nodes were still able to mount and unmount, but one was not: it hung while mounting. Approximately one hour later, a second node hung while unmounting. Admittedly, this is a bit brutish, but I think it may expose the same problem. I have no access to revolver. Kernel 2.6.9-55.0.2.EL and related packages.

    #!/bin/bash
    # $i is never incremented, so this loop repeats until it is
    # interrupted or a mount/umount call stops returning.
    i="0"
    while [ $i -lt 1 ]
    do
        echo "Mounting ... "
        mount -t gfs /dev/hdb1 /mnt/test
        echo "Unmounting ..."
        umount /mnt/test
    done
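In the listings earlier in this report, the hung mounts show up as GFS mount groups whose State is `update` or `join` instead of `run`. As a rough diagnostic aid (not part of the original report), the sketch below filters a services listing down to those entries. The function name `stuck_services` is hypothetical, and the parsing assumes each service starts its own line with the quoted name followed by the GID, LID, State and Code fields, as in the listings above.

```bash
#!/bin/bash
# Hypothetical helper: read a /proc/cluster/services listing on stdin and
# print the name, state and code of every service whose state is not "run".
# Assumed field layout: <service type>: "<name>" GID LID State Code ...
stuck_services() {
    awk '
        /(Fence Domain|DLM Lock Space|GFS Mount Group):/ {
            # Locate the quoted service name; State is 3 fields after it.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^".*"$/) {
                    state = $(i + 3)
                    if (state != "run")
                        print $i, state, $(i + 4)
                    break
                }
            }
        }
    '
}

# Example use on a cluster node (left commented out):
#   stuck_services < /proc/cluster/services
```

Run against the first morph-06 listing, this would report only the `"corey1"` mount group in the `join` state, which makes it easier to compare several nodes at a glance.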