Bug 747917 - VDSM is stuck during the startup process
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Federico Simoncelli
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-21 11:44 UTC by Federico Simoncelli
Modified: 2012-08-09 07:59 UTC

Fixed In Version: v4.9.3
Clone Of:
Environment:
Last Closed: 2012-08-09 07:59:44 UTC
oVirt Team: ---
Embargoed:



Description Federico Simoncelli 2011-10-21 11:44:03 UTC
Description of problem:
VDSM gets stuck during the startup process if an NFS storage domain is unreachable.

Version-Release number of selected component (if applicable):
vdsm-4.9.0-0

How reproducible:
100%

Steps to Reproduce:
1. connect vdsm to an NFS storage domain (as SPM)
2. block the connection to the NFS storage
3. wait for vdsm to restart
4. vdsm gets stuck on __cleanStorageRepository

Actual results:
VDSM gets stuck on __cleanStorageRepository.

Expected results:
VDSM shouldn't get stuck on __cleanStorageRepository.

Additional info:
The problem occurs when vdsm descends into the pool directory:

/rhev/data-center/<spUUID>

The links there (e.g. mastersd and the sdUUIDs) point to unreachable files, so os.walk() gets stuck running os.path.isdir() on them.
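
To illustrate why isdir() is the call that blocks, here is a minimal sketch (not the vdsm code) that classifies directory entries with os.lstat() instead. os.path.isdir() goes through stat(), which follows a symlink to its target and therefore waits on the unreachable NFS server; lstat() inspects only the link itself. The function name classify_entries is hypothetical, and the sketch is written for current Python rather than the python2.6 shown in the traces. Note that this only helps for symlinks: an entry that is itself a mountpoint would presumably still have to be skipped by name, without any stat call at all.

```python
import os
import stat


def classify_entries(path):
    """Split the entries of `path` into (dirs, links, others).

    Uses lstat(), so a symlink is reported as a link and its target
    is never resolved.
    """
    dirs, links, others = [], [], []
    for name in sorted(os.listdir(path)):
        mode = os.lstat(os.path.join(path, name)).st_mode
        if stat.S_ISLNK(mode):
            links.append(name)   # e.g. mastersd; target left untouched
        elif stat.S_ISDIR(mode):
            dirs.append(name)
        else:
            others.append(name)
    return dirs, links, others
```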

Thread-11::DEBUG::2011-10-20 17:46:52,868::hsm::239::Storage.HSM::(__cleanStorageRepository) Cleaning leftovers.

# ps -Lf $(cat /var/run/vdsm/vdsmd.pid)
UID        PID  PPID   LWP  C NLWP STIME TTY      STAT   TIME CMD
[...]
vdsm     13888 12805 13988  0   15 17:46 ?        D<l    0:00 /usr/bin/python /usr/share/vdsm//vdsm
[...]
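
In the STAT column above, D means uninterruptible sleep (usually waiting on I/O), which is what a thread blocked in stat() on a dead NFS mount looks like. As a rough sketch, the same check can be scripted by reading the per-thread stat files under /proc on Linux. The helper names parse_task_state and blocked_threads are hypothetical and not part of vdsm.

```python
import os


def parse_task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def blocked_threads(pid):
    """List the thread ids (LWPs) of `pid` that are in state 'D'."""
    task_dir = "/proc/%d/task" % pid
    blocked = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                state = parse_task_state(f.read())
        except OSError:
            continue  # the thread exited while we were looking
        if state == "D":
            blocked.append(int(tid))
    return sorted(blocked)
```

For the process above, blocked_threads(13888) would be expected to include LWP 13988 while the storage is unreachable.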

Thread 4 (Thread 0x7f59517fb700 (LWP 13988)):
#1 <built-in function stat>
#3 file '/usr/lib64/python2.6/genericpath.py', in 'isdir'
#6 file '/usr/lib64/python2.6/os.py', in 'walk'
#8 file '/usr/lib64/python2.6/os.py', in 'walk'
#10 file '/usr/share/vdsm/storage/hsm.py', in '__cleanStorageRepository'
#14 file '/usr/share/vdsm/storage/hsm.py', in 'storageRefresh'
#19 file '/usr/lib64/python2.6/threading.py', in 'run'
#22 file '/usr/lib64/python2.6/threading.py', in '__bootstrap_inner'
#25 file '/usr/lib64/python2.6/threading.py', in '__bootstrap'

Comment 1 Federico Simoncelli 2011-10-21 12:50:08 UTC
commit 80840522ffade1714791cf7ea2b3c1758f181090
Author: Federico Simoncelli <fsimonce>
Date:   Fri Oct 21 10:50:01 2011 +0000

    BZ#747917 Don't get information about mountpoints
    
    The regular os.walk() function tries to identify the files present in
    the given path. Avoiding to descend into the mountpoint is not enough
    to prevent vdsm from getting stuck if a NFS mount is unreachable, we
    should also prevent any other operation, eg: os.path.isdir().
    
    Change-Id: I16a9e54586daa766e420fa8571b19a0b744b602d

http://gerrit.usersys.redhat.com/1050
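
The patch itself is not reproduced in this report, so the following is only a sketch of the idea described in the commit message, not the committed code. It walks a tree like os.walk() but takes a caller-supplied list of paths to avoid (for example the known mountpoints) and skips them by name comparison alone, so no stat-like call is ever issued on them. Remaining entries are checked with lstat(), so symlinks are not followed either. The name walk_skipping and the blacklist parameter are assumptions made for illustration.

```python
import os
import stat


def walk_skipping(top, blacklist=()):
    """Yield (dirpath, dirnames, nondirs) top-down, like os.walk().

    Paths listed in `blacklist` are dropped by name only and are never
    passed to stat(), lstat() or listdir().
    """
    skip = set(os.path.normpath(p) for p in blacklist)
    try:
        names = os.listdir(top)
    except OSError:
        return
    dirs, nondirs = [], []
    for name in sorted(names):
        path = os.path.join(top, name)
        if os.path.normpath(path) in skip:
            continue  # no system call touches this entry
        try:
            mode = os.lstat(path).st_mode
        except OSError:
            continue
        if stat.S_ISDIR(mode):
            dirs.append(name)
        else:
            nondirs.append(name)  # symlinks land here and are not followed
    yield top, dirs, nondirs
    for name in dirs:
        yield from walk_skipping(os.path.join(top, name), skip)
```

Where the list of mountpoints comes from is left to the caller; reading it from /proc/mounts would be one option on Linux.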

Comment 2 Dan Kenigsberg 2012-02-06 21:01:36 UTC
posted upstream http://gerrit.ovirt.org/202

Comment 3 Itamar Heim 2012-08-09 07:59:44 UTC
closing ON_QA bugs as oVirt 3.1 was released:
http://www.ovirt.org/get-ovirt/

