Bug 654189
| Summary: | Host hung when installing win2003 with a private bridge | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Amos Kong <akong> |
| Component: | kernel | Assignee: | Herbert Xu <herbert.xu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.6 | CC: | ailan, ehabkost, jolsa, jpirko, jyang, nhorman, plambri, tgraf |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-10-22 18:19:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 580948 | | |
| Attachments: | | | |
Description
Amos Kong
2010-11-17 05:04:32 UTC
Created attachment 473979 [details]
Debug vbr0 ref count leak
Can you please try applying this patch on the host and then reproduce the problem? The kernel log should then point us in the direction of the culprit. Thanks!
Created attachment 474236 [details]
dmesg
I tried to release the private bridge while the guest was using it.
The host's dmesg is attached.
Thanks! Unfortunately the log seems incomplete, as I only see a single hold in it. Can you please try grabbing the syslog copy of the kernel messages? If it still appears incomplete, you may need to increase the kernel printk buffer size.
Created attachment 475747 [details]
calltrace of hold/put vbr0
The host hung for about 60 seconds, but did not output
"unregister_netdevice: waiting for vbr0 to become free. Usage count = .."
Created attachment 475748 [details]
hold/put vbr0 caused host fs read-only
When I hold/put vbr0, the host filesystem became read-only.
The host hung for about 60 seconds and did not output "unregister_netdevice: waiting for vbr0 to become free. Usage count = .."
I'll try on other machines.
Created attachment 475757 [details]
reproduced reference-count issue
I re-tested on another machine. When I tried to delete vbr0 while the guest was using it, the kernel output:
"""
Message from syslogd@ at Fri Jan 28 17:26:06 2011 ...
intel-8400-8-2 last message repeated 6 times
unregister_netdevice: waiting for vbr0 to become free. Usage count = 7
...
"""
I tried to kill the guest process with kill -9 $pid, but it became a zombie.
# ps aux|grep qemu
root 3371 2.2 0.0 0 0 pts/1 Zl 17:20 0:10 [qemu-kvm] <defunct>
The detailed dmesg is attached.
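For anyone collecting the same data, the "unregister_netdevice: waiting for ..." messages quoted above can be pulled out of a saved syslog copy with a short script. This is only a minimal sketch: the regular expression is written against the message text shown in this report, and the function name `usage_counts` and the sample lines are made up for illustration.

```python
import re

# Pattern based on the kernel message quoted in this report.
# Assumption: the message text is exactly as shown above.
WAIT_RE = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>-?\d+)"
)


def usage_counts(lines):
    """Return a (device, usage_count) tuple for each matching log line."""
    results = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            results.append((match.group("dev"), int(match.group("count"))))
    return results


# Sample lines copied from the output quoted above.
sample = [
    "intel-8400-8-2 last message repeated 6 times",
    "unregister_netdevice: waiting for vbr0 to become free. Usage count = 7",
]

print(usage_counts(sample))
```

To run it against a real log, the lines of the saved syslog file could be passed in instead of `sample`.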
Thanks a lot for the data! I tried analysing it and there may be multiple sources of leaks. But the most obvious should be cured if we backport this patch:
commit 1e493d1946a0b26b79001c18d7312d536156ff5a
Author: David S. Miller <davem>
Date: Wed Sep 10 17:27:15 2008 -0700

ipv6: On interface down/unregister, purge icmp routes too.

Johannes Berg reported that occaisionally, bringing an interface
down or unregistering it would hang for up to 30 seconds. Using
debugging output he provided it became clear that ICMP6 routes
were the culprit.

The problem is that ICMP6 routes live in their own world totally
separate from normal ipv6 routes. So there are all kinds of special
cases throughout the ipv6 code to handle this.

While we should really try to unify all of this stuff somehow,
for the time being let's fix this by purging the ICMP6 routes
that match the device in question during rt6_ifdown().

Signed-off-by: David S. Miller <davem>

I'll get a patch to you in a few days to test.
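The leak described in the commit message above can be illustrated with a toy model. This is not kernel code and not the actual patch: `Device`, `RouteTables`, and `leftover_refs` are hypothetical names, and the model only assumes that every route holds one reference on its device and that bringing the interface down releases the references of the tables it purges.

```python
class Device:
    """Toy stand-in for a network device with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1


class RouteTables:
    """Two separate route lists, mirroring the split the commit describes."""

    def __init__(self):
        self.normal = []  # ordinary routes
        self.icmp = []    # ICMP routes kept apart from the ordinary ones

    def add(self, table, dev):
        # Each route takes one reference on its device.
        dev.hold()
        table.append(dev)

    def ifdown(self, dev, purge_icmp):
        # Drop every route that points at dev and release its reference.
        tables = [self.normal]
        if purge_icmp:
            tables.append(self.icmp)
        for table in tables:
            kept = []
            for route_dev in table:
                if route_dev is dev:
                    dev.put()
                else:
                    kept.append(route_dev)
            table[:] = kept


def leftover_refs(purge_icmp):
    """Return the references still held on the device after ifdown."""
    dev = Device("vbr0")
    routes = RouteTables()
    routes.add(routes.normal, dev)
    routes.add(routes.icmp, dev)
    routes.ifdown(dev, purge_icmp)
    return dev.refcnt


print(leftover_refs(purge_icmp=False))  # ICMP route still holds a reference
print(leftover_refs(purge_icmp=True))   # all references released
```

In this model a nonzero leftover count corresponds to the situation where unregistering the device has to wait for the usage count to drop.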