Bug 865674
Summary: | [RHEV-RHS] VM migration fails after self-heal | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Anush Shetty <ashetty>
Component: | glusterfs | Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED NOTABUG | QA Contact: | Anush Shetty <ashetty>
Severity: | unspecified | Docs Contact: |
Priority: | high | |
Version: | 2.0 | CC: | grajaiya, pkarampu, rhs-bugs, shaines, spandura, vbellur
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2012-11-29 13:39:41 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Anush Shetty
2012-10-12 06:01:01 UTC
Anush, do you have the sos-reports of the bricks and the client logs?
Pranith

Created attachment 627297
brick log
Created attachment 627298
Brick 4
Created attachment 627299
Brick 3
Created attachment 627300
Brick 2
Created attachment 627301
hypervisor log
This seems like a legitimate split-brain:
There is a delay of about 2 seconds between the disconnects of client-0 and client-1, so some writes will happen only on client-1.
There is a delay of about 2 seconds between the connects of client-0 and client-1, so some writes will happen only on client-0.
That will lead to a legitimate split-brain. Why the disconnects happened on both subvolumes is something we need to investigate. I will move the bug to the NEEDINFO state to figure that out.

[2012-10-11 15:09:20.722501] I [client.c:2090:client_rpc_notify] 0-dist-replica-client-0: disconnected
[2012-10-11 15:09:22.854799] I [client.c:2090:client_rpc_notify] 0-dist-replica-client-1: disconnected
[2012-10-11 15:09:31.409719] I [client-handshake.c:1614:select_server_supported_programs] 0-dist-replica-client-0: Using Program GlusterFS 3.3.0rhsvirt1, Num (1298437), Version (330)
[2012-10-11 15:09:33.413786] I [client-handshake.c:1614:select_server_supported_programs] 0-dist-replica-client-1: Using Program GlusterFS 3.3.0rhsvirt1, Num (1298437), Version (330)
[2012-10-11 15:09:36.024280] I [afr-self-heal-common.c:1189:afr_sh_missing_entry_call_impunge_recreate] 0-dist-replica-replicate-1: no missing files - /7746e77b-7475-4fb8-ab7f-fd85773c5762/images/5e36cc52-acf4-4db8-83b0-cba9d68867af/f913caef-ca44-416b-832c-25a233cb1b3a. proceeding to metadata check
[2012-10-11 15:09:37.418613] E [afr-self-heal-data.c:1047:afr_sh_data_fxattrop_cbk] 0-dist-replica-replicate-1: /7746e77b-7475-4fb8-ab7f-fd85773c5762/images/5e36cc52-acf4-4db8-83b0-cba9d68867af/f913caef-ca44-416b-832c-25a233cb1b3a, inspecting change log succeeded on < 2 children
[2012-10-11 15:09:37.419020] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-dist-replica-replicate-1: background data missing-entry gfid self-heal failed on /7746e77b-7475-4fb8-ab7f-fd85773c5762/images/5e36cc52-acf4-4db8-83b0-cba9d68867af/f913caef-ca44-416b-832c-25a233cb1b3a
[2012-10-11 15:09:39.049557] E [afr-self-heal-data.c:763:afr_sh_data_fxattrop_fstat_done] 0-dist-replica-replicate-1: Unable to self-heal contents of '/7746e77b-7475-4fb8-ab7f-fd85773c5762/images/5e36cc52-acf4-4db8-83b0-cba9d68867af/f913caef-ca44-416b-832c-25a233cb1b3a' (possible split-brain). Please delete the file from all but the preferred subvolume.

Closing this bug as it is a legitimate split-brain caused by a flaky network.
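
For anyone re-checking the timing argument, the skew between the two client translators can be computed directly from the quoted client log. The following is only a minimal sketch and not part of GlusterFS: the helper names (`event_times`, `skew_seconds`) and the regular expression are hypothetical, and it assumes that the "Using Program" handshake message marks the reconnect of a client, which is how the analysis in this bug reads the log.

```python
import re
from datetime import datetime

# The four connection events, copied from the client log quoted in this bug.
LOG_LINES = [
    "[2012-10-11 15:09:20.722501] I [client.c:2090:client_rpc_notify] 0-dist-replica-client-0: disconnected",
    "[2012-10-11 15:09:22.854799] I [client.c:2090:client_rpc_notify] 0-dist-replica-client-1: disconnected",
    "[2012-10-11 15:09:31.409719] I [client-handshake.c:1614:select_server_supported_programs] 0-dist-replica-client-0: Using Program GlusterFS 3.3.0rhsvirt1, Num (1298437), Version (330)",
    "[2012-10-11 15:09:33.413786] I [client-handshake.c:1614:select_server_supported_programs] 0-dist-replica-client-1: Using Program GlusterFS 3.3.0rhsvirt1, Num (1298437), Version (330)",
]

# Hypothetical pattern for the log layout seen above:
# [timestamp] LEVEL [file:line:function] N-<volume>-client-<index>: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \w \[[^\]]+\] \d+-(?P<xl>[\w-]+-client-\d+): (?P<msg>.*)$"
)


def classify(msg):
    """Map a log message to 'disconnect', 'connect', or None."""
    if msg == "disconnected":
        return "disconnect"
    # Assumption: the handshake message marks the moment of reconnection.
    if msg.startswith("Using Program"):
        return "connect"
    return None


def event_times(lines):
    """Return {(kind, client_name): datetime} for connection events."""
    events = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        kind = classify(m["msg"])
        if kind is None:
            continue
        events[(kind, m["xl"])] = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return events


def skew_seconds(events, kind, first, second):
    """Seconds between the same kind of event on two clients."""
    return (events[(kind, second)] - events[(kind, first)]).total_seconds()


if __name__ == "__main__":
    ev = event_times(LOG_LINES)
    for kind in ("disconnect", "connect"):
        skew = skew_seconds(ev, kind, "dist-replica-client-0", "dist-replica-client-1")
        print(f"{kind} skew between client-0 and client-1: {skew:.3f} s")
```

Run against the lines above, this gives roughly 2.13 s for the disconnects and 2.00 s for the reconnects, which matches the two windows described in the analysis: one in which writes could reach only client-1, and one in which they could reach only client-0.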