Red Hat Bugzilla – Bug 1290160
Add-Disk operation failed to complete
Last modified: 2016-03-10 10:28:56 EST
Description of problem:
I have a working RHEL/KVM setup managed by RHEV, with a working glusterfs mount in the RHEV cluster and working volumes. When I try to create a new VM, it won't let me add new disks.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Attempted to add a disk
Please include the engine and VDSM logs.
Created attachment 1104327
vdsm log from gprfs030
Created attachment 1104328
vdsm log from gprfs031
Created attachment 1104329
vdsm log from gprfs029
This testing is done on a RHEV cluster with 3 hosts: gprfs029, gprfs030, and gprfs031. The 3 hosts also serve a 3-way replicated gluster file system. The RHEV cluster is using gluster volume gprfs029:gl_01 for storage.
I was able to successfully create 3 disks in the storage pool and run 1 VM. When I try to add more disks now, it won't work. But the gluster volume itself is fine, because I can still run the VM that I created earlier.
Created attachment 1104332
engine.log for the RHEV cluster
Writing the volume's metadata seems to fail for some reason:
abcf08c7-19bc-4687-94a3-500c96e3ee67::ERROR::2015-12-10 07:53:34,391::volume::518::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 509, in create
    volType, diskType, desc, LEGAL_VOL)
  File "/usr/share/vdsm/storage/volume.py", line 889, in newMetadata
  File "/usr/share/vdsm/storage/fileVolume.py", line 333, in createMetadata
  File "/usr/share/vdsm/storage/fileVolume.py", line 326, in __putMetadata
IOError: [Errno 22] Invalid argument
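For what it's worth, EINVAL from a plain write() like this is a classic symptom of O_DIRECT I/O with an unaligned buffer or length, which some gluster FUSE mounts reject. Below is a minimal sketch of the aligned-write pattern, assuming direct I/O is the trigger; the path, the 512-byte padding, and the helper name are illustrative, not VDSM's actual code.

import mmap
import os

def write_direct(path, payload, align=512):
    # Anonymous mmap returns a page-aligned buffer; padding the payload
    # to a multiple of 'align' keeps the length aligned as well, so the
    # O_DIRECT write satisfies the kernel's alignment rules.
    buf = mmap.mmap(-1, align)
    buf.write(payload.ljust(align, b"\0"))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()

# Writing an unpadded, unaligned bytes object through the same O_DIRECT
# fd would typically fail with [Errno 22] Invalid argument, matching the
# traceback above.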
Can't imagine why, though - need to dig a bit further.
I am sorry, but my cluster went into a really bad state when I took the storage offline and tried to bring it back online. I had to remove the storage and recreate the datacenter.
I will try to reproduce the error and report back to the BZ.
Not sure if this is related, but I cannot add the storage domain to the cluster. I have re-created the gluster volume and set permissions:
Volume Name: gl_01
Volume ID: 857ed73d-d69c-42f8-81f7-35cfdb2e77bc
Number of Bricks: 1 x 3 = 3
But I get a "Failed to add storage domain" error. I have verified that the volume can be mounted manually on the same hosts.
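For reference, RHEV expects the gluster mount to be owned by vdsm:kvm (uid/gid 36), which is usually arranged via the storage.owner-uid and storage.owner-gid volume options. A quick sanity check one could run on the host; the mount path is taken from the logs above:

import os

MOUNT = "/rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01"  # path from the logs

st = os.stat(MOUNT)
print("uid=%d gid=%d mode=%o" % (st.st_uid, st.st_gid, st.st_mode & 0o777))
# uid 36 is vdsm and gid 36 is kvm on RHEV hosts; a mismatch here is a
# common reason for "Failed to add storage domain".
assert st.st_uid == 36 and st.st_gid == 36, "mount is not owned by vdsm:kvm"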
I am re-creating the whole setup. I am getting VM import errors from one of the hosts in the engine.log, although this is a new setup. I am guessing there is some stuff left over on the host from the previous setup.
From the logs provided, I can see that there is a connectivity issue with the gluster storage.
Right after the first metadata-writing error, there is an attempt to disconnect from the mount, which failed:
MountError: (32, ';umount: /rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01: mountpoint not found\n')
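The exit status 32 with "mountpoint not found" suggests the mount had already disappeared (or never completed) by the time the disconnect ran. A hypothetical defensive variant, not VDSM's actual code, would check the mount point first:

import os
import subprocess

def safe_umount(path):
    # If the mount already vanished, report it instead of letting umount
    # fail with exit status 32 ("mountpoint not found").
    if not os.path.ismount(path):
        print("not mounted, nothing to do: %s" % path)
        return
    # Same flags as in the VDSM log line below: force, lazy unmount.
    subprocess.check_call(["umount", "-f", "-l", path])

safe_umount("/rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01")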
Sanjay, is this issue still happening?
There are logs under /var/log/glusterfs that may help in understanding this.
Can you please provide the logs from around 2015-12-09 14:18:34 on host gprfs029?
Thread-214905::DEBUG::2015-12-09 14:18:34,195::mount::229::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/umount -f -l /rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01
In any case, it seems more like a gluster problem than an issue in VDSM.
I have not seen the problem with the new config that I created, but I have a theory: this happens if a RHEV storage pool is created on a gluster volume and other files are then created on that volume outside the RHEV storage pool. This can easily happen, since gluster is a shared file system. I think this confuses RHEV because the available space on the gluster volume changes. I have not had the chance to test the theory.
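The theory is straightforward to test: compare the free space reported for the mount before and after dropping a file onto the volume from outside RHEV. A sketch, assuming statvfs-based accounting (the mount path comes from the logs; the file-creation step is left as a comment):

import os

MOUNT = "/rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01"

def free_bytes(path):
    # statvfs is the usual way file-based storage code measures free
    # space on a mount; f_bavail counts blocks available to non-root.
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

before = free_bytes(MOUNT)
# ... create a large file on the gluster volume from outside RHEV here ...
after = free_bytes(MOUNT)
print("free before=%d after=%d delta=%d" % (before, after, before - after))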
Hi, I have failed to reproduce this on my setup.
All disk operations were OK, both before and after adding a file to the gluster volume from outside RHEV.
I think we should close this BZ unless we have a clear scenario to reproduce.
I am OK with closing this for now, since my environment no longer reproduces the problem after the rebuild. If I run into the problem again, I will re-open this BZ or file a new ticket.