Bug 1061580
Summary: mount hangs when all the bricks in a replica group go down while writes are in progress

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | glusterfs |
| Version | 2.1 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Reporter | SATHEESARAN <sasundar> |
| Assignee | Pranith Kumar K <pkarampu> |
| QA Contact | SATHEESARAN <sasundar> |
| CC | asrivast, grajaiya, kparthas, nlevinki, nsathyan, pkarampu, rcyriac, sdharane, smanjara, spandura, ssaha, ssamanta, vagarwal, vbellur |
| Target Release | RHGS 3.0.0 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | glusterfs-3.6.0.4-1 |
| Doc Type | Bug Fix |
| Doc Text | When all the bricks in a replica group go down while writes are in progress on that replica group, the mount could sometimes hang because of stale structures that were not removed from a list. Removal of these stale structures from the list has now been added to fix the issue. |
| Clones | 1062256, 1063190 (view as bug list) |
| Bug Blocks | 1035040, 1062256, 1063190 |
| Type | Bug |
| Last Closed | 2014-09-22 19:32:24 UTC |
Description
SATHEESARAN
2014-02-05 06:45:42 UTC
Consequence on App VM
=====================
1. VM is not functional, couldn't log in to VM using ssh
2. Storage domain was DOWN with the error, "Failed to Reconstruct Master domain for Data Center virt-dc."
3. No new disk images can be created as the domain is not available

Console logs as captured from RHSS Node1 - 10.70.37.86
=======================================================

[Tue Feb 4 20:58:28 UTC 2014 root.37.86:~ ] # gluster v i

Volume Name: dr-store
Type: Distributed-Replicate
Volume ID: c654edf4-3336-47aa-af57-4d096d17da14
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.86:/rhs/brick1/drs1
Brick2: 10.70.37.187:/rhs/brick1/dsr1
Brick3: 10.70.37.86:/rhs/brick2/dsr2
Brick4: 10.70.37.187:/rhs/brick2/dsr2
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

Volume Name: repvol
Type: Replicate
Volume ID: 419bdc4c-2802-4eb4-96d5-b71c1da2fc2c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.86:/rhs/brick4/repdir1
Brick2: 10.70.37.187:/rhs/brick4/repdir1
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Tue Feb 4 20:58:33 UTC 2014 root.37.86:~ ] # gluster volume add-brick dr-store 10.70.37.86:/rhs/brick3/dsr3 10.70.37.187:/rhs/brick3/dsr3
volume add-brick: success

<<<<<< Forced power-off the second RHSS Node here >>>>>>>>

[Tue Feb 4 20:59:40 UTC 2014 root.37.86:~ ] # gluster volume status
Status of volume: dr-store
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/drs1 N/A N N/A
Brick 10.70.37.86:/rhs/brick2/dsr2 N/A N N/A
Brick 10.70.37.86:/rhs/brick3/dsr3 N/A N N/A
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638

Task Status of Volume dr-store
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: repvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick4/repdir1 N/A N N/A
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638

Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks

[Tue Feb 4 21:36:20 UTC 2014 root.37.86:~ ] # gluster v i dr-store

Volume Name: dr-store
Type: Distributed-Replicate
Volume ID: c654edf4-3336-47aa-af57-4d096d17da14
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.86:/rhs/brick1/drs1
Brick2: 10.70.37.187:/rhs/brick1/dsr1
Brick3: 10.70.37.86:/rhs/brick2/dsr2
Brick4: 10.70.37.187:/rhs/brick2/dsr2
Brick5: 10.70.37.86:/rhs/brick3/dsr3
Brick6: 10.70.37.187:/rhs/brick3/dsr3
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Tue Feb 4 21:37:02 UTC 2014 root.37.86:~ ] # gluster volume status
Status of volume: dr-store
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/drs1 N/A N N/A
Brick 10.70.37.86:/rhs/brick2/dsr2 N/A N N/A
Brick 10.70.37.86:/rhs/brick3/dsr3 N/A N N/A
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638

Task Status of Volume dr-store
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: repvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick4/repdir1 N/A N N/A
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638

Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks

[Tue Feb 4 21:48:03 UTC 2014 root.37.86:~ ] # gluster volume status
Status of volume: dr-store
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/drs1 N/A N N/A
Brick 10.70.37.86:/rhs/brick2/dsr2 N/A N N/A
Brick 10.70.37.86:/rhs/brick3/dsr3 N/A N N/A
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638

Task Status of Volume dr-store
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: repvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick4/repdir1 N/A N N/A
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638

Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks

[Tue Feb 4 21:48:09 UTC 2014 root.37.86:~ ] # gluster pe s
Number of Peers: 1

Hostname: 10.70.37.187
Uuid: 28eef69b-10d6-4b9d-9103-844eb46b357c
State: Peer in Cluster (Disconnected)

[Tue Feb 4 21:48:18 UTC 2014 root.37.86:~ ] # gluster peer probe 10.70.37.198
peer probe: failed: Cluster quorum is not met. Changing peers is not allowed in this state

<<<< Powered on the second RHSS Node here>>>>

[Tue Feb 4 21:52:25 UTC 2014 root.37.86:~ ] # gluster volume status
Status of volume: dr-store
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/drs1 49152 Y 21810
Brick 10.70.37.187:/rhs/brick1/dsr1 49152 Y 2214
Brick 10.70.37.86:/rhs/brick2/dsr2 49153 Y 21819
Brick 10.70.37.187:/rhs/brick2/dsr2 49153 Y 2215
Brick 10.70.37.86:/rhs/brick3/dsr3 49155 Y 21824
Brick 10.70.37.187:/rhs/brick3/dsr3 49155 Y 2225
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638
NFS Server on 10.70.37.187 2049 Y 2232
Self-heal Daemon on 10.70.37.187 N/A Y 2238

Task Status of Volume dr-store
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: repvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick4/repdir1 49154 Y 21831
Brick 10.70.37.187:/rhs/brick4/repdir1 49154 Y 2210
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638
NFS Server on 10.70.37.187 2049 Y 2232
Self-heal Daemon on 10.70.37.187 N/A Y 2238

Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks

[Tue Feb 4 22:15:37 UTC 2014 root.37.86:~ ] # gluster volume status
Status of volume: dr-store
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/drs1 49152 Y 21810
Brick 10.70.37.187:/rhs/brick1/dsr1 49152 Y 2214
Brick 10.70.37.86:/rhs/brick2/dsr2 49153 Y 21819
Brick 10.70.37.187:/rhs/brick2/dsr2 49153 Y 2215
Brick 10.70.37.86:/rhs/brick3/dsr3 49155 Y 21824
Brick 10.70.37.187:/rhs/brick3/dsr3 49155 Y 2225
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638
NFS Server on 10.70.37.187 2049 Y 2232
Self-heal Daemon on 10.70.37.187 N/A Y 2238

Task Status of Volume dr-store
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: repvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick4/repdir1 49154 Y 21831
Brick 10.70.37.187:/rhs/brick4/repdir1 49154 Y 2210
NFS Server on localhost 2049 Y 31631
Self-heal Daemon on localhost N/A Y 31638
NFS Server on 10.70.37.187 2049 Y 2232
Self-heal Daemon on 10.70.37.187 N/A Y 2238

Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks

Error messages from fuse mount log:

<snip1>
[2014-02-04 21:56:39.522835] W [afr-inode-write.c:510:afr_writev] 1-dr-store-replicate-0: failing writev due to lack of quorum
[2014-02-04 21:56:39.522869] W [fuse-bridge.c:2696:fuse_writev_cbk] 0-glusterfs-fuse: 17070203: WRITE => -1 (Read-only file system)
[2014-02-04 21:56:39.536886] W [afr-inode-write.c:510:afr_writev] 1-dr-store-replicate-1: failing writev due to lack of quorum
[2014-02-04 21:56:39.536913] W [fuse-bridge.c:2696:fuse_writev_cbk] 0-glusterfs-fuse: 17070207: WRITE => -1 (Read-only file system)
[2014-02-04 21:56:39.726681] W [afr-inode-write.c:510:afr_writev] 1-dr-store-replicate-0: failing writev due to lack of quorum
[2014-02-04 21:56:39.726711] W [fuse-bridge.c:2696:fuse_writev_cbk] 0-glusterfs-fuse: 17070250: WRITE => -1 (Read-only file system)
[2014-02-04 21:56:39.838559] W [afr-inode-write.c:510:afr_writev] 1-dr-store-replicate-0: failing writev due to lack of quorum
[2014-02-04 21:56:39.838589] W [fuse-bridge.c:2696:fuse_writev_cbk] 0-glusterfs-fuse: 17070252: WRITE => -1 (Read-only file system)
[2014-02-04 21:56:46.903560] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 1-dr-store-client-2: changing port to 49153 (from 0)
[2014-02-04 21:56:46.908532] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 1-dr-store-client-4: changing port to 49155 (from 0)
</snip1>

<snip2>
[2014-02-04 22:17:24.095650] C [client-handshake.c:127:rpc_client_ping_timer_expired] 1-dr-store-client-1: server 10.70.37.187:49152 has not responded in the last 42 seconds, disconnecting.
[2014-02-04 22:17:47.099842] C [client-handshake.c:127:rpc_client_ping_timer_expired] 1-dr-store-client-2: server 10.70.37.86:49153 has not responded in the last 42 seconds, disconnecting.
[2014-02-04 22:17:48.099986] C [client-handshake.c:127:rpc_client_ping_timer_expired] 1-dr-store-client-0: server 10.70.37.86:49152 has not responded in the last 42 seconds, disconnecting.
[2014-02-04 22:46:42.227994] E [rpc-clnt.c:208:call_bail] 1-dr-store-client-1: bailing out frame type(GlusterFS 3.3) op(FSYNC(16)) xid = 0x234866x sent = 2014-02-04 22:16:38.497154. timeout = 1800 for 10.70.37.187:49152
[2014-02-04 22:46:42.228049] W [client-rpc-fops.c:984:client3_3_fsync_cbk] 1-dr-store-client-1: remote operation failed: Transport endpoint is not connected
[2014-02-04 22:46:42.228069] W [afr-transaction.c:1553:afr_changelog_fsync_cbk] 1-dr-store-replicate-0: fsync(2287b21f-dc5e-47f5-8730-679ee9b44273) failed on subvolume dr-store-client-1. Transaction was WRITE
[2014-02-04 22:46:48.228531] E [rpc-clnt.c:208:call_bail] 1-dr-store-client-0: bailing out frame type(GlusterFS 3.3) op(FSYNC(16)) xid = 0x235871x sent = 2014-02-04 22:16:38.497142. timeout = 1800 for 10.70.37.86:49152
[2014-02-04 22:46:48.228592] W [client-rpc-fops.c:984:client3_3_fsync_cbk] 1-dr-store-client-0: remote operation failed: Transport endpoint is not connected
[2014-02-04 22:46:48.228612] W [afr-transaction.c:1553:afr_changelog_fsync_cbk] 1-dr-store-replicate-0: fsync(2287b21f-dc5e-47f5-8730-679ee9b44273) failed on subvolume dr-store-client-0. Transaction was WRITE
</snip2>

Created attachment 859496 [details]
screenshot of appvm1
This screenshot shows the state of appvm1.
Created attachment 859498 [details]
screenshot of appvm2
This screenshot shows the state of appvm2.
Created attachment 859499 [details]
sosreport from RHSS Node1
Attached the sosreport from RHSS Node1 - 10.70.37.86
Created attachment 859501 [details]
sosreport from RHSS Node2
Attached the sosreport from RHSS Node2 - 10.70.37.187
The sosreport on the hypervisor hung, just like the "mount" and "df" commands, so I am attaching the fuse mount logs separately. Hope that is what is most required from the hypervisor.

Created attachment 859502 [details]
Fuse mount log files from Hypervisor
Attaching the fuse mount log obtained from the hypervisor
It is expected that an RHSS node can go down in a cluster. If the cluster has a limited number of nodes, say 2 or 4, there is a high possibility that server quorum is not met. In those cases, App VMs go to a paused state and are unable to resume their normal operations once quorum is met again. No new VMs can be created with their disk images on this volume either. App VMs should not be affected before and after quorum is lost and restored. Considering all this, moving the Severity to 'Urgent'.

All the operations on the mount point are hung. Suspecting the problem is with the brick processes (as Pranith observed initially). Restarting the volume could be a workaround, but I haven't done this, as Pranith wanted to retain the setup.

I repeated the same test with the same version of gluster and could still hit this problem. The only thing I see now is that the App VMs come up automatically after quorum is met, and they ran the paused IO again for a minute or so. After a few minutes (probably 1 minute or more), I see the storage domain go down again. I see the same error messages in the App VMs, and the "df" command hung. It is consistently reproducible.

At Pranith's request, I tried the same test on glusterfs-3.4.0.56rhs-1 and this issue was not seen. As of now, something looks wrong with the FSYNC fop, and the issue was not seen in glusterfs-3.4.0.56rhs-1, as shown below:

<<<< Filtering for FSYNC related errors with glusterfs-3.4.0.58rhs-1.el6rhs, from the fuse mount log >>>>
[root@rhs-client15 glusterfs]# grep -i fsync rhev-data-center-mnt-10.70.37.86:dr-store.log
[2014-02-04 21:36:08.678370] E [rpc-clnt.c:369:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7fa34cfca524] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fa34cfca063] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fa34cfc9f7e]))) 1-dr-store-client-3: forced unwinding frame type(GlusterFS 3.3) op(FSYNC(16)) called at 2014-02-04 21:35:00.542502 (xid=0x118726x)
[2014-02-04 21:36:08.678385] W [client-rpc-fops.c:984:client3_3_fsync_cbk] 1-dr-store-client-3: remote operation failed: Transport endpoint is not connected
[2014-02-04 21:36:08.678409] W [afr-transaction.c:1553:afr_changelog_fsync_cbk] 1-dr-store-replicate-1: fsync(b31d8277-fae7-48c7-a377-1b8fc817a6eb) failed on subvolume dr-store-client-3. Transaction was WRITE
[2014-02-04 22:46:42.227994] E [rpc-clnt.c:208:call_bail] 1-dr-store-client-1: bailing out frame type(GlusterFS 3.3) op(FSYNC(16)) xid = 0x234866x sent = 2014-02-04 22:16:38.497154. timeout = 1800 for 10.70.37.187:49152
[2014-02-04 22:46:42.228049] W [client-rpc-fops.c:984:client3_3_fsync_cbk] 1-dr-store-client-1: remote operation failed: Transport endpoint is not connected
[2014-02-04 22:46:42.228069] W [afr-transaction.c:1553:afr_changelog_fsync_cbk] 1-dr-store-replicate-0: fsync(2287b21f-dc5e-47f5-8730-679ee9b44273) failed on subvolume dr-store-client-1. Transaction was WRITE
[2014-02-04 22:46:48.228531] E [rpc-clnt.c:208:call_bail] 1-dr-store-client-0: bailing out frame type(GlusterFS 3.3) op(FSYNC(16)) xid = 0x235871x sent = 2014-02-04 22:16:38.497142. timeout = 1800 for 10.70.37.86:49152
[2014-02-04 22:46:48.228592] W [client-rpc-fops.c:984:client3_3_fsync_cbk] 1-dr-store-client-0: remote operation failed: Transport endpoint is not connected
[2014-02-04 22:46:48.228612] W [afr-transaction.c:1553:afr_changelog_fsync_cbk] 1-dr-store-replicate-0: fsync(2287b21f-dc5e-47f5-8730-679ee9b44273) failed on subvolume dr-store-client-0. Transaction was WRITE

<<<< Filtering for FSYNC related errors with glusterfs-3.4.0.56rhs-1.el6rhs, from the fuse mount log >>>>
[root@rhs-client15 glusterfs]# grep -i fsync rhev-data-center-mnt-10.70.40.116:drvol.log
[root@rhs-client15 glusterfs]#

As far as I can see, the only workaround is:
1. Identify the mount process, i.e. ps aux | grep gluster
2. Kill the appropriate mount process, i.e. kill -9 <pid>
3. Lazily unmount the fuse mount, i.e. umount -l <fuse-mount-by-rhev>. Existing UP App VMs (though the App VMs are in a hung state, they are shown as UP in the RHEVM UI) will go to a paused state.
4. Activate the storage domain from the RHEVM UI. Here the fuse mount happens again.
5. Force shutdown all existing paused App VMs.
6. Restart the existing App VMs. You can also create new App VMs with disk images on this storage domain.

I performed the above steps and resumed the storage domain operations. Captured the logs from the hypervisor as below:

Console logs on the hypervisor
==============================
[root@rhs-client10 ~]# ps aux | grep gluster
root 11252 183 0.3 310444 50088 ? R<sl Feb05 2141:50 /usr/sbin/glusterfs --volfile-server=10.70.37.46 --volfile-id=test-drvol /rhev/data-center/mnt/10.70.37.46:test-drvol
root 23481 0.0 0.0 103256 856 pts/6 S+ 05:47 0:00 grep gluster
[root@rhs-client10 ~]# kill -9 11252

<<<< df command was not hanging now >>

[root@rhs-client10 ~]# df -Th
df: `/rhev/data-center/mnt/10.70.37.46:test-drvol': Transport endpoint is not connected
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 ext4 99G 2.4G 92G 3% /
tmpfs tmpfs 7.8G 8.0K 7.8G 1% /dev/shm
/dev/sda1 ext4 1008M 89M 868M 10% /boot
[root@rhs-client10 ~]# umount /rhev/data-center/mnt/10.70.37.46:test-drvol
umount: /rhev/data-center/mnt/10.70.37.46:test-drvol: device is busy.
(In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[root@rhs-client10 ~]# ps aux | grep gluster
root 24461 0.0 0.0 103252 852 pts/6 S+ 05:48 0:00 grep gluster
[root@rhs-client10 ~]# umount -l /rhev/data-center/mnt/10.70.37.46:test-drvol
[root@rhs-client10 ~]#

<<<< after activation of storage domain from RHEVM UI>>>>>

[root@rhs-client10 ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 ext4 99G 2.4G 92G 3% /
tmpfs tmpfs 7.8G 8.0K 7.8G 1% /dev/shm
/dev/sda1 ext4 1008M 89M 868M 10% /boot
10.70.37.46:test-drvol fuse.glusterfs 200G 8.0G 192G 4% /rhev/data-center/mnt/10.70.37.46:test-drvol

Continued the test with RHEL 6.5 and found the same issue again.

In one of my attempts I restored quorum quickly (within 5 minutes of quorum being lost) by bringing up the RHSS node, and all VM instances were restored normally. Probably no timer expired? (not sure). But when there is a long enough interval (say 30 minutes or more) before restoring quorum, I see this error consistently. In all the test cases I ran, the App VMs return to normal immediately after quorum is restored, and after some time I see the errors. In all the test scenarios, I consistently see this message in the logs:

<snip>
[2014-02-06 19:37:22.151422] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-cvol-client-1: server 10.70.37.187:49156 has not responded in the last 42 seconds, disconnecting.
[2014-02-06 19:37:27.155575] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-cvol-client-0: server 10.70.37.86:49156 has not responded in the last 42 seconds, disconnecting.
</snip>

Why is the disconnect happening after quorum is restored?
I have captured the specific part of the fuse mount log, from the point server-side quorum is restored. Hope this will be helpful in understanding the operations.

Bringing up the node now
=========================
[2014-02-06 18:23:42.776185] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207291: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 18:32:22.296254] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207333: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 18:41:04.518361] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207375: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 18:50:00.035737] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207417: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 18:58:37.462267] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207459: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 19:07:19.194087] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207501: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 19:15:50.710956] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207543: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 19:24:32.568873] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 207585: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-02-06 19:30:17.539664] E [client-handshake.c:1778:client_query_portmap_cbk] 0-cvol-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2014-02-06 19:30:17.539745] I [client.c:2103:client_rpc_notify] 0-cvol-client-1: disconnected from 10.70.37.187:24007. Client process will keep trying to connect to glusterd until brick's port is available.
[2014-02-06 19:30:17.540290] E [client-handshake.c:1778:client_query_portmap_cbk] 0-cvol-client-3: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2014-02-06 19:30:17.540341] I [client.c:2103:client_rpc_notify] 0-cvol-client-3: disconnected from 10.70.37.187:24007. Client process will keep trying to connect to glusterd until brick's port is available.
[2014-02-06 19:30:18.091145] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-cvol-client-3: changing port to 49157 (from 0)
[2014-02-06 19:30:18.094941] I [client-handshake.c:1695:select_server_supported_programs] 0-cvol-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-06 19:30:18.097395] I [client-handshake.c:1466:client_setvolume_cbk] 0-cvol-client-3: 'server-pkg-version' key not found, handshaked with older client
[2014-02-06 19:30:18.097417] I [client-handshake.c:1480:client_setvolume_cbk] 0-cvol-client-3: Connected to 10.70.37.187:49157, attached to remote volume '/rhs/brick2/dir2'.
[2014-02-06 19:30:18.097426] I [client-handshake.c:1492:client_setvolume_cbk] 0-cvol-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-06 19:30:18.097434] I [client-handshake.c:1314:client_post_handshake] 0-cvol-client-3: 2 fds open - Delaying child_up until they are re-opened
[2014-02-06 19:30:18.101298] I [client-handshake.c:936:client_child_up_reopen_done] 0-cvol-client-3: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-02-06 19:30:18.101337] I [afr-common.c:3988:afr_notify] 0-cvol-replicate-1: Subvolume 'cvol-client-3' came back up; going online.
[2014-02-06 19:30:18.101591] I [client-handshake.c:450:client_set_lk_version_cbk] 0-cvol-client-3: Server lk version = 1
[2014-02-06 19:30:19.095146] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-cvol-client-1: changing port to 49156 (from 0)
[2014-02-06 19:30:19.098860] I [client-handshake.c:1695:select_server_supported_programs] 0-cvol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-06 19:30:19.101509] I [client-handshake.c:1466:client_setvolume_cbk] 0-cvol-client-1: 'server-pkg-version' key not found, handshaked with older client
[2014-02-06 19:30:19.101535] I [client-handshake.c:1480:client_setvolume_cbk] 0-cvol-client-1: Connected to 10.70.37.187:49156, attached to remote volume '/rhs/brick1/dir1'.
[2014-02-06 19:30:19.101544] I [client-handshake.c:1492:client_setvolume_cbk] 0-cvol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-06 19:30:19.101552] I [client-handshake.c:1314:client_post_handshake] 0-cvol-client-1: 1 fds open - Delaying child_up until they are re-opened
[2014-02-06 19:30:19.102325] I [client-handshake.c:936:client_child_up_reopen_done] 0-cvol-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-02-06 19:30:19.102363] I [afr-common.c:3988:afr_notify] 0-cvol-replicate-0: Subvolume 'cvol-client-1' came back up; going online.
[2014-02-06 19:30:19.102554] I [client-handshake.c:450:client_set_lk_version_cbk] 0-cvol-client-1: Server lk version = 1
[2014-02-06 19:30:20.101868] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-cvol-client-2: changing port to 49157 (from 0)
[2014-02-06 19:30:20.107270] I [client-handshake.c:1695:select_server_supported_programs] 0-cvol-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-06 19:30:20.107339] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-cvol-client-0: changing port to 49156 (from 0)
[2014-02-06 19:30:20.110192] I [client-handshake.c:1466:client_setvolume_cbk] 0-cvol-client-2: 'server-pkg-version' key not found, handshaked with older client
[2014-02-06 19:30:20.110215] I [client-handshake.c:1480:client_setvolume_cbk] 0-cvol-client-2: Connected to 10.70.37.86:49157, attached to remote volume '/rhs/brick2/dir2'.
[2014-02-06 19:30:20.110225] I [client-handshake.c:1492:client_setvolume_cbk] 0-cvol-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-06 19:30:20.110233] I [client-handshake.c:1314:client_post_handshake] 0-cvol-client-2: 2 fds open - Delaying child_up until they are re-opened
[2014-02-06 19:30:20.110657] I [client-handshake.c:1695:select_server_supported_programs] 0-cvol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-06 19:30:20.110881] I [client-handshake.c:936:client_child_up_reopen_done] 0-cvol-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-02-06 19:30:20.111031] I [client-handshake.c:1466:client_setvolume_cbk] 0-cvol-client-0: 'server-pkg-version' key not found, handshaked with older client
[2014-02-06 19:30:20.111071] I [client-handshake.c:1480:client_setvolume_cbk] 0-cvol-client-0: Connected to 10.70.37.86:49156, attached to remote volume '/rhs/brick1/dir1'.
[2014-02-06 19:30:20.111083] I [client-handshake.c:1492:client_setvolume_cbk] 0-cvol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-06 19:30:20.111110] I [client-handshake.c:1314:client_post_handshake] 0-cvol-client-0: 1 fds open - Delaying child_up until they are re-opened
[2014-02-06 19:30:20.111194] I [client-handshake.c:450:client_set_lk_version_cbk] 0-cvol-client-2: Server lk version = 1
[2014-02-06 19:30:20.111463] I [client-handshake.c:936:client_child_up_reopen_done] 0-cvol-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-02-06 19:30:20.111656] I [client-handshake.c:450:client_set_lk_version_cbk] 0-cvol-client-0: Server lk version = 1
[2014-02-06 19:37:22.151422] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-cvol-client-1: server 10.70.37.187:49156 has not responded in the last 42 seconds, disconnecting.
[2014-02-06 19:37:27.155575] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-cvol-client-0: server 10.70.37.86:49156 has not responded in the last 42 seconds, disconnecting.

Created attachment 860207 [details]
tcpdump from hypervisor side
Took the tcpdump from the hypervisor side; analysing this dump with a network packet analyser such as Wireshark could help in identifying which transaction actually failed.
When the problem was recreated, we observed that the event_dispatch_epoll() thread is blocked on a lock, which is held by the FUSE thread, which in turn is looping infinitely over the list of eager-locked transactions on an fd opened on behalf of one of the VMs on the volume. This appears to be an issue with eager-lock management in the afr xlator. Further analysis is required to identify what might have caused this to happen. We also know that this issue is not observed with the 3.4.0.56rhs rpms. We are looking into the patches that went in after 56 to identify what might have caused this.

I don't think any patch between 3.4.0.56rhs and 3.4.0.58rhs could have caused this.

Satheesaran,
If I remember correctly we brought up the VMs quickly (definitely within 5 to 10 minutes) on 3.4.0.56rhs the last time we tried. Did you get a chance to re-create the bug again and wait for at least 30 minutes even with 3.4.0.56rhs?

Krishnan,
Did you get a chance to observe whether write-behind is pumping writes to the afr xlator when you saw this issue?

Pranith.

(In reply to Pranith Kumar K from comment #22)
> I don't think any patch between 3.4.0.56rhs and 3.4.0.58rhs could have caused this.
>
> Satheesaran,
> If I remember correctly we brought up the VMs quickly (definitely within 5 to 10 minutes) on 3.4.0.56rhs the last time we tried. Did you get a chance to re-create the bug again and wait for at least 30 minutes even with 3.4.0.56rhs?

Pranith,
I retried the same test with glusterfs-3.4.0.56rhs and gave it a sufficient time interval. I am not seeing any issues with the 56rhs build. I have kept the 56rhs setup for you to investigate.

I was able to figure out why there could be an infinite loop in the afr eager-lock functionality. When both bricks go down while an 'finodelk' for the eager lock is in progress, and that finodelk fails, it leaves a stale stub in the list of stubs which share the eager lock. This stale stub's memory is re-used by new frames which get the same pool of memory, so list traversal involving the stale stub leads to undefined behaviour, which can include infinite loops. I attached the process to gdb to confirm that the case mentioned above leads to stale stubs. I sent the fix for this bug upstream at http://review.gluster.org/6944. We confirmed that logs such as "[2014-02-04 21:36:12.702263] I [afr-transaction.c:1119:afr_post_blocking_inodelk_cbk] 1-dr-store-replicate-0: Blocking inodelks failed" are present in the setup where the issue is seen and not present in the setup (the one with 3.4.0.56rhs) where the issue is not seen. This is not a regression; this bug must have been there from bigbend. Thanks vijay, kp, satheesaran for helping me find the root cause. I am not sure if there is more than one issue in this bug, for example the reboot we saw on one of the attempts to re-create the issue. But for the infinite loop we saw, we have the fix.

Easier steps to re-create the bug:
1) Create a replicate volume and disable all performance xlators.
2) Open a bash fd on one of the files on the mount using bash fd redirection: "exec fd-num>file-name".
3) Start performing writes on the fd in a while loop in one window.
4) Attach the mount process to gdb and put a breakpoint just before afr sends finodelks.
5) Once the breakpoint is hit, stop the volume. This hits the error condition which leads to the infinite loop.
6) Restart the volume, wait for some time, and df will hang.

I re-created this issue in front of satheesaran, and with the fix this issue is not seen any more.
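To make the root cause above more concrete, here is a minimal, self-contained C sketch of the idea behind the fix. It is not the actual AFR code from http://review.gluster.org/6944; the names (stub, fd_ctx, acquire_eager_lock, start_write_txn) are hypothetical. It models a transaction stub queued on a per-fd eager-lock list: if the lock call fails (for example because all bricks are down) and the stub is not unlinked, a later traversal walks stale, possibly recycled memory and can loop forever; unlinking the stub on the failure path as well avoids that.

```c
/* Illustrative sketch only -- not the real glusterfs/afr code. */
#include <stdio.h>
#include <stdbool.h>

struct stub {                 /* hypothetical stand-in for an AFR transaction stub */
    int id;
    struct stub *prev, *next;
};

struct fd_ctx {               /* hypothetical per-fd context holding the eager-lock queue */
    struct stub head;         /* sentinel node of a circular doubly linked list */
};

static void fd_ctx_init(struct fd_ctx *ctx)
{
    ctx->head.prev = ctx->head.next = &ctx->head;
}

static void stub_enqueue(struct fd_ctx *ctx, struct stub *s)
{
    /* append at the tail, before the sentinel */
    s->prev = ctx->head.prev;
    s->next = &ctx->head;
    ctx->head.prev->next = s;
    ctx->head.prev = s;
}

static void stub_unlink(struct stub *s)
{
    s->prev->next = s->next;
    s->next->prev = s->prev;
    s->prev = s->next = s;    /* leave the stub pointing at itself */
}

/* Pretend lock call: fails when every brick is unreachable. */
static bool acquire_eager_lock(bool all_bricks_down)
{
    return !all_bricks_down;
}

static void start_write_txn(struct fd_ctx *ctx, struct stub *s, bool all_bricks_down)
{
    stub_enqueue(ctx, s);                      /* queued before the lock request goes out */

    if (!acquire_eager_lock(all_bricks_down)) {
        /*
         * Failure path.  The bug: returning here WITHOUT unlinking leaves a
         * stale entry whose memory may later be recycled by a new frame, so
         * the next walk of the list sees garbage and can spin forever.
         * The fix: unlink before failing the write back to the application.
         */
        stub_unlink(s);
        printf("txn %d: lock failed, stub removed from eager-lock list\n", s->id);
        return;
    }

    printf("txn %d: lock acquired, write proceeds\n", s->id);
    stub_unlink(s);                            /* normal completion also dequeues */
}

int main(void)
{
    struct fd_ctx ctx;
    struct stub a = { .id = 1 }, b = { .id = 2 };

    fd_ctx_init(&ctx);
    start_write_txn(&ctx, &a, true);   /* simulate "all bricks down" during the lock */
    start_write_txn(&ctx, &b, false);  /* list is still sane, so this one is unaffected */
    return 0;
}
```

Compiling and running the sketch (cc sketch.c && ./a.out) just prints the two outcomes; the point is that both the success path and the failure path dequeue the stub before returning, so no stale entry is ever left on the shared list.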
Forgot to add one more point about the bug: for this issue to be hit, both bricks need to go down just before taking the finodelk. Server quorum was just one way of simulating that. We can hit this bug even without any quorum, simply by stopping the volume just before the finodelk.

As per comment 25, this bug is not relevant to server quorum only; modifying the summary accordingly.

Adding the 3.0 flag and removing 2.1.z.

Setting flags required to add BZs to RHS 3.0 Errata.

*** Bug 1062256 has been marked as a duplicate of this bug. ***

Tested with glusterfs-3.6.0.24-1.el6rhs using the steps in comment 0, and the mount point does not hang once client-side quorum is restored.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html