Bug 1201633
Summary: | [epoll+Snapshot] : Snapd crashed while trying to list snaps under .snaps folder | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | senaik
Component: | snapshot | Assignee: | Poornima G <pgurusid>
Status: | CLOSED ERRATA | QA Contact: | senaik
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | rhgs-3.0 | CC: | annair, pgurusid, rcyriac, rhs-bugs, rjoseph, storage-qa-internal, vagarwal
Target Milestone: | --- | Keywords: | ZStream
Target Release: | RHGS 3.0.4 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | USS | |
Fixed In Version: | glusterfs-3.6.0.53-1 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1202290 (view as bug list) | Environment: |
Last Closed: | 2015-03-26 06:37:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1182947, 1202290 | |
Description
senaik
2015-03-13 07:18:30 UTC
Some more details to add to the problem description: two snapshot activate operations had failed on machine rhs-arch-srv2.lab.eng.blr.redhat.com. All the other snapshots were activated successfully.

[2015-03-13 01:07:02.423336] : snapshot activate S254 : FAILED : Commit failed on 10.70.34.50. Please check log file for details.
[2015-03-13 01:07:06.692653] : snapshot activate S255 : FAILED : Commit failed on 10.70.34.50. Please check log file for details.

Following is the backtrace from the crash:

#0 __pthread_mutex_lock (mutex=0x320) at pthread_mutex_lock.c:50
#1 0x00000033e4425060 in gf_log_set_log_buf_size (buf_size=0) at logging.c:256
#2 0x00000033e44251ff in gf_log_disable_suppression_before_exit (ctx=0x22b3010) at logging.c:427
#3 0x00000033e443bac5 in gf_print_trace (signum=11, ctx=0x22b3010) at common-utils.c:493
#4 <signal handler called>
#5 0x00000033e444f731 in __gf_free (free_ptr=0x7f911ef33c50) at mem-pool.c:231
#6 0x00000033e443da02 in gf_timer_proc (ctx=0x7f911ef35630) at timer.c:207
#7 0x0000003f236079d1 in start_thread (arg=0x7f8eb197b700) at pthread_create.c:301
#8 0x0000003f232e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

In the test case, multiple snapshots were created and then activated, and after activation the snapshots were accessed using USS. The crash is seen while accessing these snapshots.

Code-wise, the crash happens during timer-thread destruction. The timer thread is destroyed as part of glfs_fini. Normally glfs_fini is called when snapshots are deactivated or deleted, but in this case no snapshots were deleted or deactivated; here glfs_fini is called because glfs_init failed. For some reason the snapshot brick is not in the started state, which leads to the failure in glfs_init. We could not determine the exact cause of this because the brick and snapshot logs were missing from the sos-report. In any case, when glfs_init fails we call glfs_fini to clean up the allocated resources. In the timer thread the current THIS is overwritten and never restored, leaving THIS with a wrong value and causing the segmentation fault in __gf_free. We will send a patch to address this problem.
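To make the failure mode above easier to follow, here is a minimal, self-contained C sketch of the save-and-restore discipline the analysis points at. This is not the actual GlusterFS patch: ctx_t, current_ctx, callback_that_switches_ctx, and timer_thread_step are hypothetical stand-ins for the real xlator/THIS machinery, and the example only illustrates why a thread-local context pointer must be restored after a callback that changes it.

```c
/* Illustrative sketch only (assumed names, not GlusterFS code): a thread-local
 * "current context" pointer, analogous to THIS, is switched by a callback.
 * If the caller does not restore it, later code in the same thread (such as a
 * free routine) operates on whatever context the callback left behind. */
#include <stdio.h>

typedef struct ctx { const char *name; } ctx_t;

/* Hypothetical thread-local playing the role of THIS. */
static __thread ctx_t *current_ctx;

static void callback_that_switches_ctx(ctx_t *other)
{
    /* A callback may legitimately switch the context for its own work. */
    current_ctx = other;
    printf("callback ran under context: %s\n", current_ctx->name);
}

static void timer_thread_step(ctx_t *other)
{
    /* The caller saves the previous context and restores it afterwards,
     * so code running after the callback still sees the right one. */
    ctx_t *saved = current_ctx;
    callback_that_switches_ctx(other);
    current_ctx = saved;   /* omitting this restore reproduces the bug pattern */
}

int main(void)
{
    ctx_t timer_ctx = { "timer" };
    ctx_t cbk_ctx   = { "callback" };

    current_ctx = &timer_ctx;
    timer_thread_step(&cbk_ctx);

    /* With the restore in place this prints "timer"; without it, code here
     * would act on the callback's context, mirroring the crash in __gf_free
     * described in this report. */
    printf("context after callback: %s\n", current_ctx->name);
    return 0;
}
```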
Version : glusterfs 3.6.0.53
========

Repeated the steps as mentioned in the Description, did not face any crash. Marking the bug as 'Verified'.

[root@inception ~]# gluster v i

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: ef518dd8-2416-4347-bcf7-ba042128e89c
Status: Started
Snap Volume: no
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: inception.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: rhs-arch-srv3.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick4: rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick5: inception.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick6: rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick7: rhs-arch-srv3.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick8: rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick9: inception.lab.eng.blr.redhat.com:/rhs/brick3/b3
Brick10: rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick3/b3
Brick11: rhs-arch-srv3.lab.eng.blr.redhat.com:/rhs/brick3/b3
Brick12: rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick3/b3
Options Reconfigured:
features.uss: enable
features.barrier: disable
client.event-threads: 4
server.event-threads: 5
server.allow-insecure: on
features.quota: on
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

[root@inception ~]# gluster v status

Status of volume: vol0
Gluster process  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick inception.lab.eng.blr.redhat.com:/rhs/brick1/b1  49152  0  Y  22780
Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick1/b1  49152  0  Y  25239
Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/rhs/brick1/b1  49152  0  Y  19975
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick1/b1  49152  0  Y  18271
Brick inception.lab.eng.blr.redhat.com:/rhs/brick2/b2  49153  0  Y  22793
Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick2/b2  49153  0  Y  25252
Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/rhs/brick2/b2  49153  0  Y  19988
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick2/b2  49153  0  Y  18284
Brick inception.lab.eng.blr.redhat.com:/rhs/brick3/b3  49154  0  Y  22806
Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick3/b3  49154  0  Y  25265
Brick rhs-arch-srv3.lab.eng.blr.redhat.com:/rhs/brick3/b3  49154  0  Y  20001
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick3/b3  49154  0  Y  18297
Snapshot Daemon on localhost  49923  0  Y  6328
NFS Server on localhost  2049  0  Y  6336
Self-heal Daemon on localhost  N/A  N/A  Y  22827
Quota Daemon on localhost  N/A  N/A  Y  22869
Snapshot Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com  49923  0  Y  20196
NFS Server on rhs-arch-srv3.lab.eng.blr.redhat.com  2049  0  Y  20205
Self-heal Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com  N/A  N/A  Y  20022
Quota Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com  N/A  N/A  Y  20043
Snapshot Daemon on rhs-arch-srv2.lab.eng.blr.redhat.com  49923  0  Y  10590
NFS Server on rhs-arch-srv2.lab.eng.blr.redhat.com  2049  0  Y  10598
Self-heal Daemon on rhs-arch-srv2.lab.eng.blr.redhat.com  N/A  N/A  Y  25286
Quota Daemon on rhs-arch-srv2.lab.eng.blr.redhat.com  N/A  N/A  Y  25318
Snapshot Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com  49923  0  Y  11216
NFS Server on rhs-arch-srv4.lab.eng.blr.redhat.com  2049  0  Y  11225
Self-heal Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com  N/A  N/A  Y  18318
Quota Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com  N/A  N/A  Y  18338

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html