Description of problem:
While doing GFS recovery testing with lock_gulm, a mount hung after shooting only one node. The lock_gulmd_LT threads on all nodes are very busy. I did a packet capture of port 41040 on the master to hopefully shed some light on what is going on.

Version-Release number of selected component (if applicable):
gulm-1.0.10-0
kernel-hugemem-2.6.9-67.EL
GFS-6.1.15-1
GFS-kernel-hugemem-2.6.9-75.9

How reproducible:
Unknown

Actual results:
Senario iteration 1.1 started at Tue Nov 13 16:09:51 CST 2007
Sleeping 5 minute(s) to let the I/O get its lock count up...

Gulm Status
===========
morph-02: Client
morph-04: Client
morph-05: Master
morph-03: Slave
morph-01: Slave

Senario: GULM kill Master
Those picked to face the revolver... morph-05
... checking Gulm recovery...
Verifying that clvmd was started properly on the dueler(s)
mounting /dev/mapper/morph--cluster-morph--cluster0 on /mnt/morph-cluster0 on morph-05
mounting /dev/mapper/morph--cluster-morph--cluster1 on /mnt/morph-cluster1 on morph-05 (hung)

Expected results:
The mount should not hang.

Additional info:
The recovery was done with a load on each node.
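For reference, a capture like the attached one could be taken on the master with a plain tcpdump invocation along these lines (the interface name and output filename here are assumptions, not taken from the original report):

  # capture full packets on TCP port 41040 (lock_gulmd_LT) into a pcap file
  tcpdump -i eth0 -s 0 -w gulm-41040.pcap port 41040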
Created attachment 258231 [details] gzipped tcpdump of port 41040 from morph-01
I think I've hit this twice now. The most recent time I thought it was a mount hung after losing quorum, but after re-fencing the nodes that had been fenced, the mount still did not continue.
*** This bug has been marked as a duplicate of 252209 ***