Bug 1290160 - Add-Disk operation failed to complete
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.2
Version: 3.6.0
Assigned To: Fred Rolland
QA Contact: Aharon Canan
Depends On:
Reported: 2015-12-09 14:10 EST by Sanjay Rao
Modified: 2016-03-10 10:28 EST
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-01-11 09:43:44 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
vdsm log from gprfs030 (821.75 KB, application/x-gzip)
2015-12-10 07:59 EST, Sanjay Rao
vdsm log from gprfs031 (486.60 KB, application/x-gzip)
2015-12-10 08:00 EST, Sanjay Rao
vdsm log from gprfs029 (1.03 MB, application/x-gzip)
2015-12-10 08:01 EST, Sanjay Rao
engine.log for the RHEV cluster (6.05 MB, text/plain)
2015-12-10 08:06 EST, Sanjay Rao

Description Sanjay Rao 2015-12-09 14:10:44 EST
Description of problem:
I have a working RHEL / KVM environment managed by RHEV, with a working glusterfs mount and working volumes in the RHEV cluster. When I try to create a new VM, it won't let me add new disks.

Version-Release number of selected component (if applicable):
RHEV 3.6

How reproducible:
Easily reproducible

Steps to Reproduce:
1. Attempt to add a disk

Actual results:

Expected results:

Additional info:
Comment 1 Allon Mureinik 2015-12-10 05:00:02 EST
Please include the engine and VDSM logs.
Comment 2 Sanjay Rao 2015-12-10 07:59 EST
Created attachment 1104327 [details]
vdsm log from gprfs030
Comment 3 Sanjay Rao 2015-12-10 08:00 EST
Created attachment 1104328 [details]
vdsm log from gprfs031
Comment 4 Sanjay Rao 2015-12-10 08:01 EST
Created attachment 1104329 [details]
vdsm log from gprfs029
Comment 5 Sanjay Rao 2015-12-10 08:05:38 EST
This testing was done on a RHEV cluster with 3 hosts: gprfs029, gprfs030, and gprfs031. The 3 hosts also serve a 3-way replicated gluster file system. The RHEV cluster is using the gluster volume gprfs029:gl_01 for storage.

I was able to successfully create 3 disks in the storage pool and run 1 VM. When I try to add more disks now, it fails. The gluster volume itself is fine, though, because I can still run the VM that I created earlier.
Comment 6 Sanjay Rao 2015-12-10 08:06 EST
Created attachment 1104332 [details]
engine.log for the RHEV cluster
Comment 7 Allon Mureinik 2015-12-10 08:46:25 EST
Writing the volume's metadata seems to fail for some reason:

abcf08c7-19bc-4687-94a3-500c96e3ee67::ERROR::2015-12-10 07:53:34,391::volume::518::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 509, in create
    volType, diskType, desc, LEGAL_VOL)
  File "/usr/share/vdsm/storage/volume.py", line 889, in newMetadata
    cls.createMetadata(metaId, meta)
  File "/usr/share/vdsm/storage/fileVolume.py", line 333, in createMetadata
    cls.__putMetadata(metaId, meta)
  File "/usr/share/vdsm/storage/fileVolume.py", line 326, in __putMetadata
IOError: [Errno 22] Invalid argument

Can't imagine why, though - need to dig a bit further.
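For reference, one generic way a plain file write can return Errno 22 (EINVAL) is an O_DIRECT write that violates the filesystem's alignment constraints; Vdsm performs some of its file-domain metadata I/O with direct I/O, and network filesystems can be stricter about O_DIRECT than local ones. This is only a hypothesis for this BZ, not a confirmed cause. A minimal local sketch of the misaligned-write failure mode (temporary path, local filesystem):

```python
import errno
import os
import tempfile

# Hypothetical reproduction of EINVAL from an unaligned O_DIRECT write.
# O_DIRECT requires the buffer, offset, and length to be aligned to the
# filesystem's logical block size; an unaligned write typically fails
# with errno 22, and some filesystems reject O_DIRECT at open() time.
result = None
path = tempfile.mktemp()
fd = None
try:
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
    os.write(fd, b"x" * 100)  # 100 bytes is not block-aligned
    result = "write succeeded (this filesystem tolerates unaligned O_DIRECT)"
except OSError as e:
    result = "errno %d (%s)" % (e.errno, errno.errorcode.get(e.errno, "?"))
finally:
    if fd is not None:
        os.close(fd)
    if os.path.exists(path):
        os.unlink(path)
print(result)
```

Whether this matches the failure in comment 7 would need the gluster mount's behavior at the time, which is why the client logs matter.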
Comment 9 Sanjay Rao 2015-12-11 06:26:59 EST
I am sorry my cluster went into a really bad state when I took the storage offline and tried to bring it back online. I had to remove the storage and recreate the datacenter.

I will try to reproduce the error and report to the BZ.
Comment 10 Sanjay Rao 2015-12-11 06:59:48 EST
Not sure if this is related, but I cannot add the storage domain to the cluster. I have re-created the gluster volume and set permissions:

Volume Name: gl_01
Type: Replicate
Volume ID: 857ed73d-d69c-42f8-81f7-35cfdb2e77bc
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Brick1: gprfs029-10ge:/brick/b01/g
Brick2: gprfs030-10ge:/brick/b01/g
Brick3: gprfs031-10ge:/brick/b01/g
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on

But I get a "Failed to add storage domain" error. I have verified that the volume can be mounted manually on the same hosts.
Comment 11 Sanjay Rao 2015-12-11 07:40:44 EST
I am re-creating the whole setup. I am getting VM import errors from one of the hosts in the engine.log, although this is a new setup. I am guessing there is some state left over on the host from the previous setup.
Comment 16 Fred Rolland 2015-12-21 10:45:00 EST

From the logs provided, I can see that there is a connectivity issue with the gluster storage.
Right after the first metadata writing error, there is an attempt to disconnect from the mount, which failed:

MountError: (32, ';umount: /rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01: mountpoint not found\n')

Sanjay, is this issue still happening?

There are logs under /var/log/glusterfs that may help with understanding the issue.
Can you please provide the logs from around 2015-12-09 14:18:34 on host gprfs029?

Thread-214905::DEBUG::2015-12-09 14:18:34,195::mount::229::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/umount -f -l /rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01

In any case, it seems more like a gluster problem than an issue in Vdsm.
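To illustrate the kind of log extraction requested above, here is a sketch against a hypothetical sample log; the real client logs live under /var/log/glusterfs on each host, one log per mount point, and the log lines below are invented for illustration only:

```shell
# Stand-in for a gluster client log under /var/log/glusterfs (sample data).
log=$(mktemp)
cat > "$log" <<'EOF'
[2015-12-09 14:17:59.120] I [afr-common.c] 0-gl_01-replicate-0: volume up
[2015-12-09 14:18:34.195] E [client.c] 0-gl_01-client-0: disconnected from brick
[2015-12-09 14:19:02.040] I [client.c] 0-gl_01-client-0: reconnected to brick
EOF

# Pull every line from the minutes surrounding the failure at 14:18:34.
matches=$(grep -E '^\[2015-12-09 14:1[789]' "$log")
echo "$matches"
rm -f "$log"
```

The same grep pattern, run on the real mount log on gprfs029, would narrow the window around the umount failure.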


Comment 18 Sanjay Rao 2016-01-04 14:20:25 EST
I have not seen the problem with the new config that I created, but I have a theory. I think this happens if a RHEV storage pool is created on a gluster volume and any other files are then created on that volume outside the RHEV storage pool. This can easily happen, as gluster is a shared file system. I think this confuses RHEV because the available space on the gluster volume changes. I have not had the chance to test the theory.
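The free-space observation underlying this theory can be sketched with statvfs, which is effectively what any consumer of the mount sees: files written to the volume outside the storage domain reduce the reported available space just the same as files inside it. The path below is a local stand-in, not the gluster mount:

```python
import os

# Free/total space as reported by the filesystem; on a shared gluster
# volume, writers outside the RHEV storage domain move f_bavail too.
st = os.statvfs("/tmp")
free_bytes = st.f_bavail * st.f_frsize
total_bytes = st.f_blocks * st.f_frsize
print(free_bytes, total_bytes)
```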
Comment 19 Fred Rolland 2016-01-11 07:01:50 EST
Hi, I failed to reproduce this on my setup.

All disk operations were OK, before and after adding a file in the gluster volume from outside RHEV.

I think we should close this BZ unless we have a clear scenario to reproduce.
Comment 20 Sanjay Rao 2016-01-11 09:33:38 EST
I am ok with closing this for now because my environment is not reproducing the problem after the rebuild. If I run into the problem again, I will re-open the BZ or file a new ticket.
