Bug 1100211
Summary: | [SNAPSHOT] : A brick volume-id is changed after reboot of the brick's node | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura
Component: | snapshot | Assignee: | rjoseph
Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | rhgs-3.0 | CC: | nsathyan, rhs-bugs, spandura, ssamanta, storage-qa-internal
Target Milestone: | --- | |
Target Release: | RHGS 3.0.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | SNAPSHOT | |
Fixed In Version: | glusterfs-3.6.0.17-1.el6rhs | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1105484 (view as bug list) | Environment: |
Last Closed: | 2014-09-22 19:39:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1105484 | |
Description
spandura
2014-05-22 09:33:44 UTC
NOTE: It is not only the brick's volume-id that is changed; the contents of the brick are also lost.

The logs do not have entries from when the problem occurred; they only contain the most recent messages, where the symptom is already present. Is it possible for you to reproduce it again? It is not getting reproduced in our local setup. From the mount logs it appears that the snapshot brick is mounted at the main volume's brick mount point. Did you perform a snapshot restore operation, mount the volume brick explicitly, or do anything else that could interfere with the brick mount point?

I didn't perform anything other than what is mentioned in the steps to recreate the issue. I will try to recreate it.

I am able to recreate this issue on build "glusterfs 3.6.0.12 built on Jun 3 2014 11:03:28". This time the volume-id of the brick on the rebooted node changed, and in addition the brick that had always been online got killed. Because of this, I/O on the mount failed with an Input/output error.
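For reference, GlusterFS records a brick's volume-id in the trusted.glusterfs.volume-id extended attribute on the brick root, so a changed volume-id like the one reported above can be confirmed by reading that xattr and comparing it with the Volume ID shown by `gluster volume info`. The snippet below is only a minimal sketch of that check; the brick path is taken from the logs in this report, while the expected UUID is a placeholder to be replaced with the real value. The brick log excerpt follows after the sketch.

```python
import os
import uuid

def brick_volume_id(brick_path):
    """Read the 16-byte trusted.glusterfs.volume-id xattr from a brick root.

    Reading trusted.* xattrs normally requires root privileges.
    """
    raw = os.getxattr(brick_path, "trusted.glusterfs.volume-id")
    return uuid.UUID(bytes=raw)

# Placeholder values for illustration: substitute the actual brick path and
# the Volume ID printed by `gluster volume info <volname>`.
BRICK = "/rhs/bricks/b2"
EXPECTED = uuid.UUID("00000000-0000-0000-0000-000000000000")

actual = brick_volume_id(BRICK)
if actual != EXPECTED:
    print(f"volume-id mismatch on {BRICK}: {actual} (expected {EXPECTED})")
else:
    print(f"{BRICK} carries the expected volume-id {actual}")
```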
Log messages from the brick that was always online and then got shut down:
=================================================================
[2014-06-04 08:17:10.700408] E [posix.c:4274:_posix_handle_xattr_keyvalue_pair] 0-vol_rep-posix: fgetxattr failed on fd=18 while doing xattrop: Key:trusted.afr.vol_rep-client-1 (Input/output error)
[2014-06-04 08:17:10.700450] I [server-rpc-fops.c:1867:server_fxattrop_cbk] 0-vol_rep-server: 65458: FXATTROP 0 (37267384-bf53-4d3a-8114-581d64090819) ==> (Success)
[2014-06-04 08:17:24.543339] W [posix-helpers.c:1409:posix_health_check_thread_proc] 0-vol_rep-posix: stat() on /rhs/bricks/b2 returned: Input/output error
[2014-06-04 08:17:24.543399] M [posix-helpers.c:1429:posix_health_check_thread_proc] 0-vol_rep-posix: health-check failed, going down
[2014-06-04 08:17:34.555909] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1716-2014/06/04-08:17:33:650199-vol_rep-client-1-0-0
[2014-06-04 08:17:34.556005] I [server-handshake.c:578:server_setvolume] 0-vol_rep-server: accepted client from fan.lab.eng.blr.redhat.com-1716-2014/06/04-08:17:33:650199-vol_rep-client-1-0-0 (version: 3.6.0.12)
[2014-06-04 08:17:34.556416] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1716-2014/06/04-08:17:33:650199-vol_rep-client-1-0-0
[2014-06-04 08:17:34.558531] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.685769] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1691-2014/06/04-08:17:32:646939-vol_rep-client-1-0-0
[2014-06-04 08:17:34.685854] I [server-handshake.c:578:server_setvolume] 0-vol_rep-server: accepted client from fan.lab.eng.blr.redhat.com-1691-2014/06/04-08:17:32:646939-vol_rep-client-1-0-0 (version: 3.6.0.12)
[2014-06-04 08:17:34.686235] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1691-2014/06/04-08:17:32:646939-vol_rep-client-1-0-0
[2014-06-04 08:17:34.686433] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.686799] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.686826] E [posix.c:148:posix_lookup] 0-vol_rep-posix: lstat on /rhs/bricks/b2/ failed: Input/output error
[2014-06-04 08:17:34.686941] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.686972] E [posix.c:148:posix_lookup] 0-vol_rep-posix: lstat on /rhs/bricks/b2/ failed: Input/output error
[2014-06-04 08:17:34.687012] E [server-rpc-fops.c:190:server_lookup_cbk] 0-vol_rep-server: 8: LOOKUP / (00000000-0000-0000-0000-000000000001) ==> (Input/output error)
[2014-06-04 08:17:54.543680] M [posix-helpers.c:1434:posix_health_check_thread_proc] 0-vol_rep-posix: still alive! -> SIGTERM
[2014-06-04 08:17:54.544080] W [glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), shutting down

This issue is seen because the file-system UUID of the origin volume and of the snapshot volume are the same: when an LVM snapshot is taken, the file-system UUID is replicated along with the data. There are file-system-specific tools available to fix this, but AFAIK no file-system-agnostic solution is available as of now (see the sketch at the end of this report). A patch will be sent soon, after some more investigation.

Review posted in downstream: https://code.engineering.redhat.com/gerrit/#/c/26739/

Verified the fix on build "glusterfs 3.6.0.17 built on Jun 13 2014 11:01:21" using the steps mentioned in the bug description. The bug is fixed. Moving the bug to the Verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html
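As the root-cause comment above explains, an LVM snapshot copies the origin volume's file-system UUID, so the origin brick and the snapshot brick end up with identical UUIDs and the wrong device can be mounted at the brick path. The sketch below is a minimal, hypothetical duplicate-UUID check, not part of the fix delivered in the errata; it assumes the `blkid` utility is installed, parses its default output, and typically needs to run as root to see every device.

```python
import re
import subprocess
from collections import defaultdict

def filesystem_uuids():
    """Map each filesystem UUID to the block devices carrying it, as reported by blkid."""
    out = subprocess.run(["blkid"], capture_output=True, text=True, check=True).stdout
    by_uuid = defaultdict(list)
    for line in out.splitlines():
        device, _, attrs = line.partition(":")
        match = re.search(r'\bUUID="([^"]+)"', attrs)
        if match:
            by_uuid[match.group(1)].append(device)
    return by_uuid

# A UUID shared by more than one device is the situation an LVM snapshot of a
# brick creates; it can lead to the snapshot being mounted at the origin
# brick's mount point after a reboot.
for fs_uuid, devices in filesystem_uuids().items():
    if len(devices) > 1:
        print(f"duplicate filesystem UUID {fs_uuid}: {', '.join(devices)}")
```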