Bug 871701

Summary: Starting an NFS pool with 4.5T of content fails with an out of memory error.
Product: Red Hat Enterprise Linux 6 Reporter: hongming <honzhang>
Component: libvirt Assignee: Peter Krempa <pkrempa>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.4 CC: acathrow, dallan, dyasny, dyuan, mzhan, rwu, weizhan, zpeng
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-12 23:30:52 UTC Type: Bug
Attachments:
Description                              Flags
libvirt debug log                        none
libvirt-0.10.2-6.el6.d301360ed5e43-log   none

Description hongming 2012-10-31 06:13:07 UTC
Description of problem:
Starting an NFS pool whose export holds 4.5T of content fails with an out of memory error, even though the same export can be mounted manually on the host. Starting an NFS pool with 1.8T of content works fine. (The two NFS directories, /vol/S3/libvirtmanual and /vol/S3/libvirtauto, have the same configuration.)


From libvirtd.log:

2012-10-30 09:50:09.193+0000: 31970: warning : virStorageBackendVolOpenCheckMode:1031 : ignoring socket '/var/lib/libvirt/images/nfs-libvirtmanual/fcoemon.dcbd.1673' 
...... 
2012-10-30 09:50:09.348+0000: 31970: error : virStorageFileGetMetadataFromBuf:716 : out of memory 


Version-Release number of selected component (if applicable):
libvirt-0.10.2-6.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
# virsh pool-list --all
Name                 State      Autostart 
-----------------------------------------
default              active     no             
nfs1                 inactive   no        
nfs2                 active     no       


# virsh pool-dumpxml nfs1
<pool type='netfs'>
  <name>nfs1</name>
  <uuid>87eb3fed-7b18-a118-499b-3d1d6124ac33</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name=' 10.66.90.121 '/>
    <dir path=' /vol/S3/libvirtmanual '/>
    <format type='auto'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/nfs-libvirtmanual</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>


# virsh pool-start nfs1
error: Failed to start pool nfs1
error: out of memory

  
Actual results:
Starting an NFS pool with 4.5T of content fails with an out of memory error.

Expected results:
The pool starts successfully.

Additional info:

Comment 1 hongming 2012-10-31 06:14:38 UTC
Created attachment 635947 [details]
libvirt debug log

Comment 2 hongming 2012-10-31 06:20:27 UTC
The NFS storage has about 9700 volumes.
#  find ./ -type f|wc -l
9701

Comment 3 hongming 2012-10-31 06:34:48 UTC
The active NFS pool, with 15096 volumes and 1.8T of content, works fine.

# find ./ -type f|wc -l
15096

# du -smh
1.8T	.
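
For comparing the two exports in one pass, the `find` and `du` checks above can be approximated with a small Python sketch. The function name `count_files_and_bytes` is hypothetical, and the sketch sums apparent file sizes, whereas `du` reports disk usage, so the totals may differ from the figures quoted here.

```python
import os


def count_files_and_bytes(root):
    """Count regular files under root and sum their apparent sizes in bytes."""
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, and other non-regular files,
            # as `find -type f` does.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            files += 1
            total += os.lstat(path).st_size
    return files, total
```

Running it against each pool's target path gives a (file count, byte total) pair per pool.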

Comment 4 Peter Krempa 2012-10-31 10:57:14 UTC
I improved, upstream, the error reporting of the function that failed:

commit ca043b8c061cee39aada6b6ae6e9ce28f94c02b5
Author: Peter Krempa <pkrempa>
Date:   Wed Oct 31 11:17:41 2012 +0100

    util: Improve error reporting from absolutePathFromBaseFile helper
    
    There are multiple reasons canonicalize_file_name() used in
    absolutePathFromBaseFile helper can fail. This patch enhances error
    reporting from that helper.
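
The idea behind the patch can be illustrated with a short Python sketch. This is not the libvirt code, which is written in C, and `canonicalize_or_report` is a hypothetical name. The point is only that a failed path canonicalization should report the path and the underlying OS error instead of one generic message.

```python
import os
from pathlib import Path


def canonicalize_or_report(path):
    """Return (canonical_path, None) on success or (None, message) on failure.

    Hypothetical helper for illustration only; it is not part of libvirt.
    """
    try:
        # strict=True makes resolve() fail when the path does not exist,
        # similar in spirit to canonicalize_file_name() in glibc.
        return str(Path(path).resolve(strict=True)), None
    except (OSError, RuntimeError) as exc:
        # Keep the real reason (for example ENOENT or EACCES) in the message.
        code = getattr(exc, "errno", None)
        reason = os.strerror(code) if code else str(exc)
        return None, "Can't canonicalize path '%s': %s" % (path, reason)
```

A caller can then log the returned message as it is, so that a missing file is not mistaken for memory exhaustion.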

Comment 5 Peter Krempa 2012-10-31 10:59:33 UTC
I created a scratch build with the patch above to help identify the real cause of the problem: https://brewweb.devel.redhat.com/taskinfo?taskID=5032795

Could you please try to reproduce the problem and report back?

Comment 7 hongming 2012-11-01 02:08:14 UTC
Reproduced with the scratch build as follows:

# rpm -qa|grep libvirt
libvirt-python-0.10.2-6.el6.d301360ed5e43.x86_64
libvirt-client-0.10.2-6.el6.d301360ed5e43.x86_64
libvirt-devel-0.10.2-6.el6.d301360ed5e43.x86_64
libvirt-debuginfo-0.10.2-6.el6.d301360ed5e43.x86_64
libvirt-0.10.2-6.el6.d301360ed5e43.x86_64


# cat nfs.xml
<pool type='netfs'>
  <name>nfs1</name>
  <source>
    <host name='10.66.90.121'/>
    <dir path='/vol/S3/libvirtmanual'/>
    <format type='auto'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/nfs-libvirtmanual</path>
  </target>
</pool>

# virsh pool-define nfs.xml
Pool nfs1 defined from nfs.xml

# virsh pool-list --all
Name                 State      Autostart 
-----------------------------------------
default              active     no                
nfs1                 inactive   no        
     
# virsh pool-start nfs1
error: Failed to start pool nfs1
error: Can't canonicalize path '/mnt/testkf9.qcow2': No such file or directory
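
The new message names a path that does not exist on the host. Assuming (this report does not confirm it) that the path is a backing-file reference stored in one of the pool's qcow2 volumes, a sketch like the following could list volumes whose backing file is missing. The function names are hypothetical. Only the first 20 bytes of the qcow2 header are read, and non-path backing references such as URLs are not handled.

```python
import os
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Big-endian: magic, version, backing_file_offset, backing_file_size
HEADER = struct.Struct(">4sIQI")


def read_backing_file(image_path):
    """Return the backing file name recorded in a qcow2 header, or None."""
    with open(image_path, "rb") as f:
        raw = f.read(HEADER.size)
        if len(raw) < HEADER.size:
            return None
        magic, _version, offset, size = HEADER.unpack(raw)
        if magic != QCOW2_MAGIC or offset == 0 or size == 0:
            return None
        f.seek(offset)
        return f.read(size).decode("utf-8", errors="replace")


def find_dangling_backing_files(pool_dir):
    """List (image, backing) pairs whose backing file does not exist."""
    dangling = []
    for entry in os.scandir(pool_dir):
        if not entry.is_file():
            continue  # skips sockets, directories, and similar entries
        backing = read_backing_file(entry.path)
        if backing is None:
            continue
        # A relative backing path is taken relative to the image's directory;
        # an absolute one is used unchanged by os.path.join().
        target = os.path.join(os.path.dirname(entry.path), backing)
        if not os.path.exists(target):
            dangling.append((entry.path, backing))
    return dangling
```

Calling `find_dangling_backing_files('/var/lib/libvirt/images/nfs-libvirtmanual')` while the export is mounted would return the images to inspect, if the assumption above holds.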

Comment 8 hongming 2012-11-01 02:16:15 UTC
Created attachment 636409 [details]
libvirt-0.10.2-6.el6.d301360ed5e43-log

Comment 9 weizhang 2012-11-09 05:49:45 UTC
I tried with libvirt-0.10.2-7.el6.x86_64:
# virsh pool-dumpxml nfs1
<pool type='netfs'>
  <name>nfs1</name>
  <uuid>87eb3fed-7b18-a118-499b-3d1d6124ac33</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name=' 10.66.90.121 '/>
    <dir path=' /vol/S3/libvirtmanual '/>
    <format type='auto'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/nfs-libvirtmanual</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

# virsh pool-build nfs1
# virsh pool-start nfs1
error: Failed to start pool nfs1
error: out of memory

After I destroyed nfs1, I found that my default pool could no longer be started. It reports the same error, even after restarting libvirtd or rebooting the host. I had already removed the directory /var/lib/libvirt/images/nfs-libvirtmanual.

# virsh pool-start default
error: Failed to start pool default
error: out of memory

The default pool XML is:
<pool type='dir'>
  <name>default</name>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
  </source>
  <target>
    <path>/var/lib/libvirt/images</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>

Comment 10 Peter Krempa 2012-11-12 23:30:52 UTC
The symptoms look very much like those in BZ 874860. It might be worth trying to reproduce this with the build containing the fix for 874860, if the reporter is still able to reproduce the problem once that build is ready.

*** This bug has been marked as a duplicate of bug 874860 ***