Bug 1290160

Summary: Add-Disk operation failed to complete
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.6.0
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Reporter: Sanjay Rao <srao>
Assignee: Fred Rolland <frolland>
QA Contact: Aharon Canan <acanan>
CC: amureini, bazulay, ecohen, gklein, lsurette, srao, tnisan, ycui, yeylon, ylavi
Target Milestone: ovirt-3.6.2
Target Release: 3.6.0
Hardware: Unspecified
OS: Linux
Whiteboard: storage
oVirt Team: Storage
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-01-11 14:43:44 UTC
Attachments:
  vdsm log from gprfs030
  vdsm log from gprfs031
  vdsm log from gprfs029
  engine.log for the RHEV cluster

Description Sanjay Rao 2015-12-09 19:10:44 UTC
Description of problem:
I have a working RHEL/KVM environment managed by RHEV, with a working glusterfs mount and volumes in the RHEV cluster. When I try to create a new VM, it won't let me add new disks.


Version-Release number of selected component (if applicable):
RHEV 3.6
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
qemu-img-rhev-2.3.0-31.el7_2.3.x86_64
vdsm-4.17.10.1-0.el7ev.noarch



How reproducible:
Easily reproducible

Steps to Reproduce:
1. Attempt to add a new disk to a VM in the RHEV cluster.

Actual results:
The Add-Disk operation fails to complete.

Expected results:
The disk is created on the gluster-backed storage domain.

Additional info:

Comment 1 Allon Mureinik 2015-12-10 10:00:02 UTC
Please include the engine and VDSM logs.

Comment 2 Sanjay Rao 2015-12-10 12:59:51 UTC
Created attachment 1104327 [details]
vdsm log from gprfs030

Comment 3 Sanjay Rao 2015-12-10 13:00:40 UTC
Created attachment 1104328 [details]
vdsm log from gprfs031

Comment 4 Sanjay Rao 2015-12-10 13:01:13 UTC
Created attachment 1104329 [details]
vdsm log from gprfs029

Comment 5 Sanjay Rao 2015-12-10 13:05:38 UTC
This testing is done on a RHEV cluster with 3 hosts: gprfs029, gprfs030, and gprfs031. The 3 hosts also serve a 3-way replicated gluster file system, and the RHEV cluster is using the gluster volume gprfs029:gl_01 for storage.

I was able to successfully create 3 disks in the storage pool and run 1 VM. When I try to add more disks now, the operation fails. The gluster volume itself appears fine, because I can still run the VM that I created earlier.

Comment 6 Sanjay Rao 2015-12-10 13:06:28 UTC
Created attachment 1104332 [details]
engine.log for the RHEV cluster

Comment 7 Allon Mureinik 2015-12-10 13:46:25 UTC
Writing the volume's metadata seems to fail for some reason:

abcf08c7-19bc-4687-94a3-500c96e3ee67::ERROR::2015-12-10 07:53:34,391::volume::518::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 509, in create
    volType, diskType, desc, LEGAL_VOL)
  File "/usr/share/vdsm/storage/volume.py", line 889, in newMetadata
    cls.createMetadata(metaId, meta)
  File "/usr/share/vdsm/storage/fileVolume.py", line 333, in createMetadata
    cls.__putMetadata(metaId, meta)
  File "/usr/share/vdsm/storage/fileVolume.py", line 326, in __putMetadata
    f.write("EOF\n")
IOError: [Errno 22] Invalid argument

Can't imagine why, though - need to dig a bit further.
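
A minimal sketch (not vdsm code) that mimics the failing write pattern from fileVolume.py directly against the gluster-backed mount can help separate a storage-layer EINVAL from a vdsm-level bug. The mount path below matches the one in the vdsm logs; the test file name and metadata keys are made up for the check.

# Sketch only: reproduce the metadata-style write (KEY=value lines followed by
# an EOF marker, as seen in the traceback) outside vdsm, on the same mount.
import os

MOUNT = "/rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01"
TEST_FILE = os.path.join(MOUNT, "bz1290160_write_test.meta")  # hypothetical name

def try_metadata_style_write():
    try:
        with open(TEST_FILE, "w") as f:
            f.write("DESCRIPTION=write test\n")
            f.write("LEGALITY=LEGAL\n")
            f.write("EOF\n")  # the traceback fails on this final write
        print("write OK - plain file I/O on the mount works")
    except IOError as e:
        # Errno 22 (EINVAL) here would point at the gluster/FUSE layer
        # rather than at vdsm itself.
        print("write failed: errno=%d (%s)" % (e.errno, e.strerror))
    finally:
        if os.path.exists(TEST_FILE):
            os.unlink(TEST_FILE)

if __name__ == "__main__":
    try_metadata_style_write()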

Comment 9 Sanjay Rao 2015-12-11 11:26:59 UTC
I am sorry, my cluster went into a really bad state when I took the storage offline and tried to bring it back online. I had to remove the storage and recreate the datacenter.

I will try to reproduce the error and report back to the BZ.

Comment 10 Sanjay Rao 2015-12-11 11:59:48 UTC
Not sure if this is related, but I cannot add the storage domain to the cluster. I have re-created the gluster volume and set the permissions:

Volume Name: gl_01
Type: Replicate
Volume ID: 857ed73d-d69c-42f8-81f7-35cfdb2e77bc
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gprfs029-10ge:/brick/b01/g
Brick2: gprfs030-10ge:/brick/b01/g
Brick3: gprfs031-10ge:/brick/b01/g
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on


But I get a "Failed to add storage domain" error. I have verified that the volume can be mounted manually on the same hosts.
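
A rough sketch of those manual checks, assuming the standard vdsm uid/gid of 36 shown in the volume options above; the scratch mount point and test file name are hypothetical.

# Sketch only: mount the gluster volume and confirm that a write works as the
# vdsm user, which is what adding the storage domain ultimately needs.
import os
import subprocess

VOLUME = "gprfs029-10ge:/gl_01"
MNT = "/mnt/gl_01_check"  # hypothetical scratch mount point

def check_volume_writable_as_vdsm():
    if not os.path.isdir(MNT):
        os.makedirs(MNT)
    subprocess.check_call(["mount", "-t", "glusterfs", VOLUME, MNT])
    try:
        st = os.stat(MNT)
        print("mount root owner uid=%d gid=%d (expect 36/36)" % (st.st_uid, st.st_gid))
        test_file = os.path.join(MNT, "domain_write_check")
        # Run the write as the vdsm user, the way the storage domain would.
        rc = subprocess.call(["runuser", "-u", "vdsm", "--", "touch", test_file])
        print("write as vdsm: %s" % ("OK" if rc == 0 else "FAILED (rc=%d)" % rc))
        if os.path.exists(test_file):
            os.unlink(test_file)
    finally:
        subprocess.call(["umount", MNT])

if __name__ == "__main__":
    check_volume_writable_as_vdsm()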

Comment 11 Sanjay Rao 2015-12-11 12:40:44 UTC
I am re-creating the whole setup. I am getting VM import errors from one of the hosts in the engine.log, although this is a new setup. I am guessing there is some stuff left over on the host from the previous setup.

Comment 16 Fred Rolland 2015-12-21 15:45:00 UTC
Hi,

From the logs provided, I can see that there is a connectivity issue with the gluster storage.
Right after the first metadata writing error, there is an attempt to disconnect from the mount, which also failed:

MountError: (32, ';umount: /rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01: mountpoint not found\n')

Sanjay, is this issue still happening?

There are logs under /var/log/glusterfs that may help with understanding the issue.
Can you please provide the logs from around 2015-12-09 14:18:34 on host gprfs029?

Thread-214905::DEBUG::2015-12-09 14:18:34,195::mount::229::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/umount -f -l /rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01

In any case, it seems like a gluster problem more than an issue in vdsm.

Thanks,

Freddy
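
A small helper sketch for pulling the requested lines out of the client logs, assuming the bracketed "[YYYY-MM-DD HH:MM:SS" prefix that glusterfs client logs normally use; widen the window as needed.

# Sketch only: collect glusterfs client log lines around 2015-12-09 14:18
# from /var/log/glusterfs on the affected host.
import glob

WINDOW_PREFIXES = ("[2015-12-09 14:17", "[2015-12-09 14:18", "[2015-12-09 14:19")

def collect_gluster_log_lines(log_dir="/var/log/glusterfs"):
    hits = []
    for path in sorted(glob.glob(log_dir + "/*.log")):
        with open(path) as f:
            for line in f:
                if line.startswith(WINDOW_PREFIXES):
                    hits.append("%s: %s" % (path, line.rstrip()))
    return hits

if __name__ == "__main__":
    for line in collect_gluster_log_lines():
        print(line)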

Comment 18 Sanjay Rao 2016-01-04 19:20:25 UTC
I have not seen the problem with the new config that I created, but I have a theory. I think this happens if a RHEV storage pool is created on the gluster volume and any other files are then created on the gluster volume outside the RHEV storage pool. This can easily happen, as gluster is a shared file system. I think this confuses RHEV because the available space on the gluster volume changes. I have not had the chance to test the theory.
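
As an illustration of the theory, the free space a file-based storage domain reports is just the filesystem free space of its mount, so anything written to the gluster volume from outside RHEV shifts it. The mount path below is the one from the earlier vdsm logs.

# Sketch only: report free space on the gluster-backed mount the same way any
# filesystem-level check would see it.
import os

MOUNT = "/rhev/data-center/mnt/glusterSD/gprfs029-10ge:gl__01"

def report_free_space(path=MOUNT):
    st = os.statvfs(path)
    free_gib = st.f_bavail * st.f_frsize / 2.0 ** 30
    total_gib = st.f_blocks * st.f_frsize / 2.0 ** 30
    print("free: %.1f GiB of %.1f GiB" % (free_gib, total_gib))

if __name__ == "__main__":
    report_free_space()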

Comment 19 Fred Rolland 2016-01-11 12:01:50 UTC
Hi, I have not been able to reproduce this on my setup.

All disk operations were OK, both before and after adding a file to the gluster volume from outside RHEV.

I think we should close this BZ unless we have a clear scenario to reproduce.

Comment 20 Sanjay Rao 2016-01-11 14:33:38 UTC
I am ok with closing this for now because my environment is not reproducing the problem after the rebuild. If I run into the problem again, I will re-open the BZ or file a new ticket.