Bug 950024
| Summary: | replace-brick immediately saturates IO on source brick causing the entire volume to be unavailable, then dies |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | core |
| Version: | 3.3.1 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED DEFERRED |
| Severity: | urgent |
| Priority: | high |
| Reporter: | hans |
| Assignee: | bugs <bugs> |
| CC: | bugs, gluster-bugs, nsathyan, yinyin2010 |
| Type: | Bug |
| Doc Type: | Bug Fix |
| Last Closed: | 2014-12-14 19:40:30 UTC |
| Attachments: | attachment 733660: -etc-glusterfs-glusterd.vol.log |
Description
hans
2013-04-09 13:28:19 UTC
Can you please provide the logs for the cluster? Also, please provide more information on the client workload that hung. Using ext4 as a backend did cause clients to hang. Are you running a kernel version without the ext4 64-bit offset kernel patches?

Created attachment 733660 [details]
The -etc-glusterfs-glusterd.vol.log

Log file that goes with these commands:
Apr 8 11:11:37 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b start
Apr 8 11:13:39 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:14:05 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:15:59 stor3-idc1-lga bash[22161]: [hans->root] gluster volume status
Apr 8 11:17:25 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:18:52 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:19:58 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:20:48 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:21:07 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b pause
Apr 8 11:21:14 stor1-idc1-lga bash[9475]: [hans->root] restart glusterd
Apr 8 11:21:19 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:23:24 stor1-idc1-lga bash[9475]: [hans->root] gluster volume status
Apr 8 11:23:52 stor1-idc1-lga bash[9475]: [hans->root] gluster volume status
Apr 8 11:27:18 stor1-idc1-lga bash[9475]: [hans->root] # kill all glusterfs and glusterfsd
Apr 8 11:27:35 stor1-idc1-lga bash[9475]: [hans->root] stop glusterd
Apr 8 11:27:56 stor1-idc1-lga bash[9475]: [hans->root] start glusterd
Apr 8 11:28:43 stor1-idc1-lga bash[9475]: [hans->root] gluster volume status
Apr 8 11:29:11 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b start
Apr 8 11:32:30 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:32:36 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:33:58 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:35:02 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:38:48 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:39:33 stor1-idc1-lga bash[9475]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:40:07 stor3-idc1-lga bash[25373]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:47:49 stor3-idc1-lga bash[25373]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 11:51:37 stor3-idc1-lga bash[25373]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 12:11:44 stor3-idc1-lga bash[25373]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Apr 8 14:03:30 stor3-idc1-lga bash[25373]: [hans->root] gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
The client workload is our regular production load. If you need more detail (IOPS etc.), please specify the exact command-line tool and parameters whose output you want to see.
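For reference, a minimal sketch of how such numbers could be captured on the source brick node while the replace-brick runs; it assumes the sysstat package is installed and is not tied to any specific tool the developers may ask for:

```sh
# Per-device IO statistics at 5-second intervals, on the source brick node
# (sysstat package assumed; run while replace-brick is active):
iostat -xm 5

# Optional per-process breakdown, to attribute the IO to glusterfsd:
pidstat -d 5
```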
About ext4: we're not suffering from the 64-bit issue (all nodes run kernel 3.0.0-17):
> Additional info:
> ext4 being used on all bricks, none are suffering from the 64bit ext4 issue.
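A quick sketch for verifying this across nodes; the host list is taken from the log above and ssh access to each node is assumed:

```sh
# Confirm every node runs a kernel without the ext4 64-bit readdir-offset
# change (hosts taken from the log above; extend the list as needed):
for h in stor1-idc1-lga stor3-idc1-lga; do
    echo -n "$h: "; ssh "$h" uname -r
done
```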
Thanks for looking into the issue!
New insights: I started the replace-brick a day after an upgrade from 3.2.5 to 3.3.1. 3.3.1 has the bricks/.glusterfs/ directory trees, which 3.2.5 does not. Could this be the cause of the single-brick IO saturation? (And if so, how does one check that this tree is fully updated, so that a next replace-brick won't DOS the entire gluster volume on all nodes? One possible check is sketched at the end of this report.)

After 7 days I stopped the destination glusterfs. The source node now says:

gluster volume replace-brick vol01 stor1:/gluster/c stor3-idc1-lga:/gluster/b status
Number of files migrated = 28560        Migration complete

The 'number of files migrated' should be some factor 100 higher.

The version this bug was reported against no longer gets updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will be closed automatically.
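For the question above about checking whether the bricks/.glusterfs/ tree is fully populated, here is a minimal sketch of one possible check. It is not an official procedure; it assumes the 3.3.x brick layout, in which every regular data file on a brick has a second hardlink under .glusterfs/xx/yy/<gfid>, and the brick path is a placeholder:

```sh
#!/bin/sh
# Sketch: compare the count of regular data files on a brick with the
# count of GFID hardlinks under .glusterfs. Strongly diverging counts
# suggest the .glusterfs tree is not fully populated yet.
BRICK=/gluster/c   # placeholder: path of the brick to inspect

# Regular files outside .glusterfs (the actual data files):
data=$(find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print | wc -l)

# GFID entries inside .glusterfs with a second hardlink (-links +1
# filters out unrelated single-link housekeeping files):
gfid=$(find "$BRICK/.glusterfs" -type f -links +1 | wc -l)

echo "data files: $data   gfid hardlinks: $gfid"
```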