Bug 1278880
| Field | Value |
|---|---|
| Summary | Cannot find Master Domain |
| Product | [oVirt] vdsm |
| Component | Gluster |
| Version | 4.17.10 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Nathan Hill <Sustugriel> |
| Assignee | Ala Hino <ahino> |
| QA Contact | SATHEESARAN <sasundar> |
| CC | ahino, amureini, bugs, nsoffer, s.danzi, Sustugriel, tnisan, ylavi |
| Target Milestone | ovirt-3.6.1 |
| Target Release | 4.17.11 |
| Flags | rule-engine: ovirt-3.6.z+, testing_ack+; ylavi: planning_ack+; amureini: devel_ack+ |
| Doc Type | Bug Fix |
| Type | Bug |
| oVirt Team | Storage |
| Last Closed | 2016-03-10 12:48:27 UTC |
Description
Nathan Hill
2015-11-06 15:57:16 UTC

---

Ala, can you take a look too please?

---

Hi Nathan,

Could you please attach full vdsm log?

Thank you,
Ala

---

Created attachment 1091260 [details]
VDSM Log 11/6; 12:01p roll up

VDSM Log

---

This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset. Please set the correct milestone or add the z-stream flag.

---

This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset. Please set the correct milestone or drop the z-stream flag.

---

Hi Nathan,

It seems that there are two issues:

1) 2x4 volume: In 3.6 we enforce gluster volumes to be replica 3. In the vdsm log you can see:

    Thread-13::ERROR::2015-11-06 10:23:49,785::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
        conObj.connect()
      File "/usr/share/vdsm/storage/storageServer.py", line 219, in connect
        self.validate()
      File "/usr/share/vdsm/storage/storageServer.py", line 336, in validate
        raise se.UnsupportedGlusterVolumeReplicaCountError(replicaCount)
    UnsupportedGlusterVolumeReplicaCountError: Gluster volume replica count is not supported: 'replica count = 2'

However, if you still wish to use non-replica-3 volumes, you can add the following configuration to /etc/vdsm/vdsm.conf:

    [gluster]
    allowed_replica_counts = 1,2,3

2) 1x3 volume: The issue is that we expect the server used to connect to gluster to be part of the volume info, but this is not always the case. We are working to fix this issue. Meanwhile, as a workaround, you could provide a gluster server that is part of the replica 3 deployment.
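To make the first issue and its workaround concrete, here is a minimal sketch of the replica-count check as described above. It is not the actual vdsm storageServer.py code; the class, function, and default values are simplified stand-ins.

```python
# Simplified sketch of the replica-count validation described above.
# Not the real vdsm code: the names and defaults are stand-ins.

class UnsupportedGlusterVolumeReplicaCountError(Exception):
    pass

# oVirt 3.6 default behaviour: only replica 3 volumes are accepted.
DEFAULT_ALLOWED_REPLICA_COUNTS = ("3",)

def validate_replica_count(volume_info, allowed=DEFAULT_ALLOWED_REPLICA_COUNTS):
    """Raise if the gluster volume's replica count is not allowed."""
    replica_count = volume_info["replicaCount"]
    if replica_count not in allowed:
        # This is what surfaces in the vdsm log as
        # "Gluster volume replica count is not supported: 'replica count = 2'"
        raise UnsupportedGlusterVolumeReplicaCountError(
            "replica count = %s" % replica_count)

# A 2x4 (replica 2) volume fails with the default setting ...
try:
    validate_replica_count({"replicaCount": "2"})
except UnsupportedGlusterVolumeReplicaCountError as e:
    print("rejected:", e)

# ... but passes once allowed_replica_counts = 1,2,3 is set in
# /etc/vdsm/vdsm.conf (modeled here by widening the allowed tuple).
validate_replica_count({"replicaCount": "2"}, allowed=("1", "2", "3"))
print("accepted with allowed_replica_counts = 1,2,3")
```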

---

Okay, I added the line to /etc/vdsm/vdsm.conf and the error code 480 disappeared.

The problem with trying to connect another host with the replica 3 volume is that they are in production with live VMs. I can arrange it, but it will take time.

Also, a few questions:

- What was the motivation for raising the requirement to replica 3? Was it due to split-brain and VMs pausing on loss of quorum?
- If I bring two more servers online in my cluster, how would I modify the existing volume from 2x4 to 3x6?

Let me know if you need more info.

---

Negative on another host, which hosts bricks for both volumes. I received the same errors: general exception and cannot find master storage domain.

The workaround:

1. Maintenance
2. yum history undo <vdsm install only>
3. Activate

Simply downgrading the VDSM install (and leaving all other updated packages installed) worked. I am attaching the log.

---

Created attachment 1091797 [details]
Separate host, includes both bricks

Upgrade started at about 10:05a. Failed to attach at 10:07a.

---

The issue on the other host looks like the second issue mentioned in comment 6:

    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
        conObj.connect()
      File "/usr/share/vdsm/storage/storageServer.py", line 224, in connect
        self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
      File "/usr/share/vdsm/storage/storageServer.py", line 323, in options
        backup_servers_option = self._get_backup_servers_option()
      File "/usr/share/vdsm/storage/storageServer.py", line 340, in _get_backup_servers_option
        servers.remove(self._volfileserver)
    ValueError: list.remove(x): x not in list

In this case, the gluster path provided in oVirt contains a server address that is different from the one used when creating the gluster volumes: the gluster server domain name might be used when creating the storage domain (in oVirt) while the server IP address might be used when creating the gluster volumes (or vice versa). We have an issue when the gluster server used to create the storage domain differs from the one used to create the gluster volumes.

Could you please check gluster volume info and use the gluster server as it appears in the volume info? We are working on a fix for this issue.
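The traceback above comes down to plain Python behaviour: list.remove() raises ValueError when the item is absent. Below is a minimal, self-contained illustration using the Replication volume's bricks; the function is a simplified stand-in, not the actual vdsm implementation.

```python
# Minimal reproduction of the failure shown in the traceback above.
# The function name mirrors the vdsm one only for readability; this is
# a simplified stand-in, not the actual implementation.

def get_backup_servers_option(volfileserver, bricks):
    """Build a backup-volfile-servers value from the volume's other bricks."""
    servers = [brick.split(":", 1)[0] for brick in bricks]
    # Assumes the server used for the mount is always one of the bricks;
    # when it is not (e.g. a DNS alias), list.remove() raises ValueError.
    servers.remove(volfileserver)
    return "backup-volfile-servers=" + ":".join(servers)

bricks = [
    "repl.styx.local:/virtual/Replication",
    "host.styx.local:/virtual/Replication",
    "rhev.styx.local:/virtual/Replication",
]

# Works: the storage domain path uses one of the bricks.
print(get_backup_servers_option("rhev.styx.local", bricks))

# Fails: the storage domain path uses gluster.styx.local, which is not a
# brick, reproducing "ValueError: list.remove(x): x not in list".
try:
    print(get_backup_servers_option("gluster.styx.local", bricks))
except ValueError as e:
    print("ValueError:", e)
```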

---

You mean the name that I used initially to create the volume? In this case it would be gluster.styx.local. That happens to just be a host record that points to all of the bricks; in case one goes down it will attempt to mount on another brick. There is no server named "gluster.styx.local", it simply has all of the IPs associated with each of the bricks.

Essentially, I cannot proceed with that workaround.

---

Can you run gluster volume info on the volume you are using and send the output? And the path you are using when creating the gluster storage domain, i.e. gluster-server:/volume.

---

Gluster volume info:

    Volume Name: DistRepl
    Type: Distributed-Replicate
    Volume ID: 8accad3e-4394-4d44-925f-a1d5cc6d711d
    Status: Started
    Number of Bricks: 2 x 2 = 4
    Transport-type: tcp
    Bricks:
    Brick1: rhev.styx.local:/virtual/DistRepl
    Brick2: repl.styx.local:/virtual/DistRepl
    Brick3: host.styx.local:/virtual/DistRepl
    Brick4: fs1.styx.local:/media/DataPartRaid5/virtual/DistRepl
    Options Reconfigured:
    cluster.server-quorum-type: server
    performance.quick-read: off
    performance.read-ahead: off
    performance.io-cache: off
    performance.stat-prefetch: off
    cluster.eager-lock: enable
    network.remote-dio: enable
    cluster.quorum-type: fixed
    storage.owner-uid: 36
    storage.owner-gid: 36
    cluster.quorum-count: 1

    Volume Name: Replication
    Type: Replicate
    Volume ID: 3d1a377f-f184-4150-926b-9a0594cfed75
    Status: Started
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: repl.styx.local:/virtual/Replication
    Brick2: host.styx.local:/virtual/Replication
    Brick3: rhev.styx.local:/virtual/Replication
    Options Reconfigured:
    storage.owner-gid: 36
    storage.owner-uid: 36
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    network.remote-dio: enable
    cluster.eager-lock: enable
    performance.stat-prefetch: off
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off

The storage domain mount points are:

    gluster.styx.local:DistRepl
    gluster.styx.local:Replication

---

What is gluster.styx.local (vs repl.styx.local, host.styx.local and rhev.styx.local)? Can you use repl.styx.local, host.styx.local or rhev.styx.local as the mount point?

The reason for the failure is that gluster.styx.local doesn't appear as a brick in the volume info (which is fine in itself, but we currently have a bug in that case).

BTW, now that we enforce replica 3 (by default), if one gluster server goes down, the system will still be mounted to the other two servers.

(I saw your reply regarding gluster.styx.local.)

---

Gluster.styx.local is just a host record that contains all IP addresses of all of the bricks (rhev, repl, host, fs1).

The idea is to not have a single point of failure with mount points. If my mount point was rhev.styx.local:DistRepl and that server is destroyed, I can't easily recreate my storage domain. Instead, I simply change the IP that gluster.styx.local points to.

This has been in production for over a year with this setup. It's for H/A in case of mount point failure.

---

(In reply to Nathan Hill from comment #17)
> Gluster.styx.local is just a host record that contains all IP addresses of
> all of the bricks (rhev, repl, host, fs1).
> ...

This is not needed now. In 3.6, when mounting a gluster volume, we get the volume info from gluster, find the other bricks, and use the backup-volfile-servers mount option with the other bricks.

So assuming that you have a replica 3 volume with these bricks:

    brick1.local:/volname
    brick2.local:/volname
    brick3.local:/volname

and you configured your ovirt-engine with one of these bricks:

    brick2.local:/volname

then when connecting to the gluster volume, we use this mount command:

    mount -t glusterfs -o backup-volfile-servers=brick1:brick3 brick2.local:/volname /mountpoint

The gluster mount helper is supposed to fail over to another brick if brick2 fails.

The code assumed that you are using one of the bricks in ovirt-engine.

A workaround for the current issue is to specify the backup-volfile-servers option in the gluster domain mount options. If this option is specified, we do not set the backup servers, and your mount should work.

Even when using backup-volfile-servers, we depend on the server address configured in the gluster storage domain for getting info from gluster. If this server is not available, you can put the storage domain to maintenance and edit the storage domain to replace the address with another server.
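Putting the two behaviours just described side by side, here is a rough sketch of the decision logic: compute backup-volfile-servers from the other bricks, unless the admin already supplied the option in the storage domain's mount options. The function and parameter names are illustrative assumptions, not vdsm's actual ones.

```python
# Sketch of the mount-option handling described above; illustrative
# names only, not the actual vdsm implementation.

def gluster_mount_options(user_options, mount_server, bricks):
    """Return the -o value used when mounting the gluster volume."""
    if "backup-volfile-servers" in user_options:
        # The admin already supplied the option via the storage domain's
        # "Additional mount options"; do not compute backup servers.
        return user_options
    servers = [brick.split(":", 1)[0] for brick in bricks]
    # Use every brick other than the mount server as a backup server,
    # tolerating a mount server (e.g. a DNS alias) that is not a brick.
    backup = [s for s in servers if s != mount_server]
    option = "backup-volfile-servers=" + ":".join(backup)
    return option if not user_options else user_options + "," + option

bricks = ["brick1.local:/volname", "brick2.local:/volname", "brick3.local:/volname"]

# Automatic computation from the brick list.
print(gluster_mount_options("", "brick2.local", bricks))
# -> backup-volfile-servers=brick1.local:brick3.local

# Workaround: a user-supplied value is passed through untouched.
print(gluster_mount_options("backup-volfile-servers=brick1.local:brick3.local",
                            "gluster.styx.local", bricks))
```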

---

> A workaround for the current issue is to specify the
> backup-volfile-servers option in the gluster domain mount options. If
> this option is specified, we do not set the backup servers, and your
> mount should work.

I understand. How exactly do you specify backup-volfile-servers in your storage domain manually?

> Even when using backup-volfile-servers, we depend on the server address
> configured in the gluster storage domain for getting info from gluster.
> If this server is not available, you can put the storage domain to
> maintenance and edit the storage domain to replace the address with
> another server.

You cannot put storage domains into maintenance without turning off every machine associated with the storage domain, whether it was on the brick host that crashed or not. This is counterproductive to high availability; to fix a storage problem we have to turn off all processing.

Whether or not this method vs the new method is needed isn't really relevant here. The point is that it is functioning in VDSM 4.16, and 4.17 is supposed to be a direct upgrade path. Users having differently named GlusterFS mount points does not sound like an uncommon event, nor does having dedicated gluster servers not associated with an oVirt host at all.

---

(In reply to Nathan Hill from comment #19)
> I understand. How exactly do you specify backup-volfile-servers in your
> storage domain manually?

1. Put the domain to maintenance
2. Edit it, click "Custom connection parameters"
3. Add the options you like in "Additional mount options"

> You cannot put storage domains into maintenance without turning off every
> machine associated with the storage domain, whether it was on the brick
> host that crashed or not. This is counterproductive to high availability.

I agree. Having to shut down the data center is a huge turn off :-)

But can you explain how having an alias to one of the bricks solves the issue when one brick is down? You have several hosts, each mounting the dead brick. Changing the brick IP in the DNS server will not fix your dead mount points.

Because this is a 3.5 setup, you do not use backup-volfile-servers, so there will be no failover in the gluster mount helper. So the only way to fix your data center is to deactivate this storage domain, and then you can change the storage domain configuration.

> The point is that it is functioning in VDSM 4.16, and 4.17 is supposed to
> be a direct upgrade path.

Sure, this is why we are quickly fixing it.
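As an illustrative example (not a value given anywhere in this thread): for the Replication volume above, mounted via gluster.styx.local, the text entered under "Additional mount options" in step 3 would look something like `backup-volfile-servers=repl.styx.local:host.styx.local`, i.e. a colon-separated list of bricks other than the configured mount server.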

---

> 1. Put the domain to maintenance
> 2. Edit it, click "Custom connection parameters"
> 3. Add the options you like in "Additional mount options"

Neat. Thanks for the data. I think as a go forward I'm going to bring two more bricks online and create a new volume that will be 3.6 'compliant', i.e. it will be replica 3 and point to a brick host as a mount point.

> But can you explain how having an alias to one of the bricks solves the
> issue when one brick is down?
>
> You have several hosts, each mounting the dead brick. Changing the brick
> IP in the DNS server will not fix your dead mount points.

I have a feeling we might be discussing this from two different technical standpoints. I was motivated to set it up this way by recommendations in Red Hat Storage Server for power failures causing the main mount point to be offline.

I can't comment on how this works programmatically, but from my experience, when the main mount point goes offline, none of the hosts can mount the Gluster storage domain, which I think is the point you're making. But to elaborate on how this might work in an equipment failure situation (an extremely remedial data flow, promise not to laugh!):

- Setup: rhev.styx.local and gluster.styx.local are the same machine, set with the same IP in DNS. rhev.styx.local and repl.styx.local are replicas of each other.
- rhev.styx.local goes permanently offline.
- Other hosts attempt to mount gluster.styx.local but cannot, as the host is offline; they go to non-operational status.
- The administrator intervenes and changes the host record of gluster.styx.local to point to a brick that was part of the replicate group of rhev.styx.local, in this case repl.styx.local.
- The administrator activates the hosts that went non-operational.
- The hosts change to status up.

The above is my experience in 3.4 and 3.5: if you change the IP of the mount point to a live brick that hosts the volume, the hosts will mount and be able to access the volume, as long as quorum is satisfied and as long as the newly specified gluster.styx.local was part of the replica group of the previous one, without setting anything to maintenance. That last part is what's crucial. I have personally seen this work in action (hosts going from non-operational to up).

But if this is no longer necessary, I will redesign my storage domains to the 3.6 spec. Make sense?

---

A follow-up: I reconfigured my gluster storage to be replica 3, distributed across 6 nodes. I then reconfigured my storage domain to point to a separate dedicated server outside the cluster. Updated VDSM to 4.17. Worked great out of the box with this setup.

I've run into some performance "Heartbeat exceeded" issues and general performance tuning not related to this bug. Overall, slick and enjoying the new features. Good job all around.

---

Please set target release or I can't move the bug to ON_QA automatically. Bug tickets that are moved to testing must have a target release set, to make sure the tester knows what to test. Please set the correct target release before moving to ON_QA.

---

Tested with RHEV 3.6.3.3 and RHGS 3.1.2 RC. I used replica 3 for my testing and no issues were seen.