Bug 689223
| Summary: | [RHEL-6] statvfs tries to stat unrelated mountpoints | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Naori <dnaori> |
| Component: | kernel | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED ERRATA | QA Contact: | Boris Ranto <branto> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.1 | CC: | abaron, bazulay, branto, chellwig, danken, dnaori, egerman, eguan, esandeen, hateya, iheim, kzhang, mgoldboi, rwheeler, syeghiay, ykaul |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | kernel-2.6.32-171.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 711987 (view as bug list) | Environment: | |
| Last Closed: | 2011-12-06 12:45:39 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 711987 | | |
Odd, but both domains complain about their operation being stuck
Thread-343::ERROR::2011-03-20 14:14:33,622::sp::84::Storage.StatsThread::(run) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sp.py", line 79, in run
stats, code = self._statsfunc(self._domain)
File "/usr/share/vdsm/storage/sp.py", line 1558, in _repostats
if not domain.selftest():
File "/usr/share/vdsm/storage/nfsSD.py", line 127, in selftest
return fileSD.FileStorageDomain.selftest(self)
File "/usr/share/vdsm/storage/fileSD.py", line 298, in selftest
oop.os.statvfs(self.domaindir)
File "/usr/share/vdsm/storage/processPool.py", line 35, in wrapper
return self.runExternally(func, *args, **kwds)
File "/usr/share/vdsm/storage/processPool.py", line 63, in runExternally
raise Timeout("Operation Stuck")
Timeout: Operation Stuck
Thread-37::ERROR::2011-03-20 14:14:33,963::sp::84::Storage.StatsThread::(run) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sp.py", line 79, in run
stats, code = self._statsfunc(self._domain)
File "/usr/share/vdsm/storage/sp.py", line 1558, in _repostats
if not domain.selftest():
File "/usr/share/vdsm/storage/nfsSD.py", line 127, in selftest
return fileSD.FileStorageDomain.selftest(self)
File "/usr/share/vdsm/storage/fileSD.py", line 298, in selftest
oop.os.statvfs(self.domaindir)
File "/usr/share/vdsm/storage/processPool.py", line 35, in wrapper
return self.runExternally(func, *args, **kwds)
File "/usr/share/vdsm/storage/processPool.py", line 63, in runExternally
raise Timeout("Operation Stuck")
Timeout: Operation Stuck
Apparently, running:
mount orion.qa.lab.tlv.redhat.com:/export/david/data /mnt/1
mount qanashead.qa.lab.tlv.redhat.com:/export/david/iso /mnt/2
iptables -A OUTPUT -d qanashead.qa.lab.tlv.redhat.com -j DROP
strace -tt -T -o out python -c "import os;os.statvfs('/mnt/1')"
- shows that python's statvfs tries to stat unrelated mountpoints:
17:11:06.052633 stat("/mnt/2", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0 <1095.005126>
which ended only after I flushed the iptables rules.
Version:
* python-2.6.6-9.el6.x86_64
Attached full strace.
Created attachment 486823 [details]
Strace
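For comparison, here is a minimal C reproducer sketch that takes Python out of the equation. It is not part of this bug's attachments; it assumes the same /mnt/1 and /mnt/2 NFS mounts and the iptables DROP rule from the commands above. If this program also stalls, the hang is below Python, in the C library's statvfs() path:

```c
/* Hedged sketch of a Python-free reproducer.  Assumes /mnt/1 is the
 * reachable NFS mount and /mnt/2's server has been blocked with
 * iptables, as in the commands above.  The program never touches
 * /mnt/2, so any long delay here points at the statvfs() machinery
 * itself rather than at Python. */
#include <stdio.h>
#include <time.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs buf;
    time_t start = time(NULL);

    int rc = statvfs("/mnt/1", &buf);

    printf("statvfs(\"/mnt/1\") returned %d after %ld seconds\n",
           rc, (long)(time(NULL) - start));
    return rc == 0 ? 0 : 1;
}
```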
(In reply to comment #6)
> - shows that python's statvfs tries to stat unrelated mountpoints:

It doesn't on my machine.

I tried running the reproducer from comment #6, and the output file doesn't contain a stat of /mnt/2. It _does_ stat various other mountpoints.

What directory are you running this from?

Does it still happen if you use the "-S" option to python to suppress running site.py?

(In reply to comment #8)
> What directory are you running this from?
> Does it still happen if you use the "-S" option to python to suppress running site.py?

I'm running it from /root.

strace -tt -T -o out python -S -c "import os;os.statvfs('/mnt/2')"

11:24:23.622689 stat("/mnt/1"

- still happens.

* In order to reproduce the issue you may try to block the first mount and try to get stats on the second mount.

Looking at attachment 486823 [details], I see this line:
17:11:06.046874 stat("/mnt/1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.001057>
which presumably corresponds to the statvfs
Immediately after that line in the trace I see:
17:11:06.048076 open("/proc/mounts", O_RDONLY) = 3 <0.000394>
followed by what looks like a stat of each mountpoint.
Looking at /usr/src/debug/glibc-2.12.2/sysdeps/unix/sysv/linux/internal_statvfs.c (on a Fedora 13 box) I see:
__statvfs_getflags
which has this line:
FILE *mtab = __setmntent ("/proc/mounts", "r");
which appears to then loop through the mountpoints.
This seems to be called by:
INTERNAL_STATVFS (const char *name, struct STATVFS *buf,
struct STATFS *fsbuf, struct stat64 *st)
which presumably is the:
__internal_statvfs (file, buf, &fsbuf,
stat64 (file, &st) == -1 ? NULL : &st);
within:
int
statvfs (const char *file, struct statvfs *buf)
within /usr/src/debug/glibc-2.12.2/sysdeps/unix/sysv/linux/statvfs.c
So it looks like these syscalls are being injected by glibc, within the statvfs entrypoint.
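To make the failure mode concrete, here is a rough approximation of that pattern (my own sketch, not the actual glibc source; the helper name guess_mount_flags is made up). On kernels whose statfs(2) does not report mount flags, the library has to walk /proc/mounts and stat() every mountpoint to find the one backing the requested path, so any hard NFS mount with an unreachable server stalls the loop even though it is unrelated to the path being queried:

```c
/* Approximation of the pre-f_flags fallback; not the real
 * __statvfs_getflags, just an illustration of the pattern. */
#include <mntent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

static unsigned long guess_mount_flags(const char *path)
{
    struct stat target, st;
    struct mntent *mnt;
    unsigned long flags = 0;
    FILE *mtab;

    if (stat(path, &target) != 0)
        return 0;

    mtab = setmntent("/proc/mounts", "r");
    if (mtab == NULL)
        return 0;

    while ((mnt = getmntent(mtab)) != NULL) {
        /* This stat() can block for a very long time on a hard NFS
         * mount whose server is unreachable, even though that mount
         * has nothing to do with 'path'. */
        if (stat(mnt->mnt_dir, &st) != 0)
            continue;

        if (st.st_dev == target.st_dev) {
            /* Found the filesystem backing 'path'; translate its
             * mount options into ST_* flags. */
            if (hasmntopt(mnt, "ro"))
                flags |= ST_RDONLY;
            if (hasmntopt(mnt, "nosuid"))
                flags |= ST_NOSUID;
            break;
        }
    }
    endmntent(mtab);
    return flags;
}
```

This matches the trace above: an open of /proc/mounts followed by a stat of each mountpoint.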
Reassigning to glibc; removing "python" from title.
statvfs must be able to fill in the statvfs.f_flags field, and only kernel 2.6.36+ has the required support for that.

When the bug happens, the status of the host in RHEVM is NON_RESPONSIVE.

commit 365b18189789bfa1acd9939e6312b8a4b4577b28
Author: Christoph Hellwig <hch>
Date: Wed Jul 7 18:53:25 2010 +0200
add f_flags to struct statfs(64)
Add a flags field to help glibc implementing statvfs(3) efficiently.
We copy the flag values from glibc, and add a new ST_VALID flag to
denote that f_flags is implemented.
Signed-off-by: Christoph Hellwig <hch>
Signed-off-by: Al Viro <viro.org.uk>
Since this uses up a bit of f_spare[] it is probably kABI-safe...?
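For illustration, a sketch of how the new field can be consumed from userspace once the kernel fills it in. Assumptions: a glibc whose struct statfs in <sys/vfs.h> already exposes f_flags, the ST_VALID value 0x0020 from the upstream patch, and a made-up helper name get_mount_flags; this is not the actual glibc change:

```c
/* Hedged sketch: consuming f_flags from statfs(2) instead of
 * scanning /proc/mounts.  ST_VALID is the marker bit from the
 * upstream patch saying that f_flags is really filled in; it is
 * stripped before the flags are returned. */
#include <sys/vfs.h>

#ifndef ST_VALID
#define ST_VALID 0x0020   /* value taken from the upstream kernel patch */
#endif

int get_mount_flags(const char *path, unsigned long *out_flags)
{
    struct statfs fsbuf;

    if (statfs(path, &fsbuf) != 0)
        return -1;

    if (fsbuf.f_flags & ST_VALID) {
        /* Kernel with the patch: mount flags come straight from
         * statfs(2), so no unrelated mountpoint is ever touched. */
        *out_flags = (unsigned long)fsbuf.f_flags & ~(unsigned long)ST_VALID;
        return 0;
    }

    /* Older kernel: f_flags is not populated, so the caller would
     * have to fall back to the slow /proc/mounts scan shown earlier. */
    return 1;
}
```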
Andreas, are glibc changes needed as well?

Yes, glibc of course needs changes to make use of these flags. I have no idea if RHEL6 already has these changes, but if not they should be easily backportable as they are very isolated in the statvfs code.

Thanks, just trying to figure out if/when the changes would be available in glibc for RHEL6. Is this something to try to get in before 6.1? It's pretty late, but this isn't very invasive on the kernel side or, according to hch :), on the glibc side... No reason to push for one without the other, though.

Eric, any news wrt this bug? It is quite problematic for us and we need it for 6.2.

Kernel should be do-able; is there buy-in on the glibc side? Has a bug been filed for glibc?

(In reply to comment #20)
> Kernel should be do-able; is there buy-in on the glibc side? Has a bug been filed for glibc?

This bug was moved to kernel from glibc, so I'm guessing there isn't an additional one there. Cloning this one to save time.

Ayal, thanks. We can certainly fix this for 6.2, just need to make sure we can get the glibc side as well.

Patch(es) available on kernel-2.6.32-171.el6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html
Created attachment 486461 [details]
vdsm-log

Description of the problem:
When blocking the connection to one storage server, a storage domain served by another, still reachable server is reported by repoStats as "valid: False" even though it is actually available.

Scenario:
- 2 storage servers:
  - 1st server: master NFS DSD and NFS export domain
  - 2nd server: non-master NFS DSD
- Blocked the connection to the storage server that contains the non-master DSD using iptables.
- repoStats reports the export domain from the accessible server as valid: False:

Thread-712::INFO::2011-03-20 14:09:52,693::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: repoStats, Return response: {'status': {'message': 'OK', 'code': 0}, '9908a9cd-c8d2-4bc8-b4c5-57edd2625c7c': {'delay': '0.00286793708801', 'lastCheck': 1300622983.784523, 'valid': True, 'code': 0}, '74b99101-ae73-4c38-a2bf-4f8cd930e2f7': {'delay': '62.0643398762', 'lastCheck': 1300622985.5161159, 'valid': False, 'code': 200}, '1bb2691e-af34-4a4c-8e21-8fced0fce1ce': {'delay': '62.0639359951', 'lastCheck': 1300622985.7854619, 'valid': False, 'code': 200}}

* Confirmed SD accessibility using dd.

Version-Release number of selected component (if applicable):
- vdsm-4.9-54.el6.x86_64
- lvm2-2.02.83-2.el6.x86_64

Additional info:
Full vdsm log attached.