Bug 1256042
Summary: | Diskless HTPC freezes after exactly 6 hours | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Göran Uddeborg <goeran> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 21 | CC: | gansalmon, george, itamar, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab, olle |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-09-22 09:18:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Göran Uddeborg
2015-08-23 15:10:30 UTC
Created attachment 1066043
Data saved by a 30-minute run of: tcpdump -i em1 -w "#pluto-tcpdump-$TS" host pluto
Created attachment 1066044
Output from: ssh -n pluto top -b -d 15
Created attachment 1066045
Output from ping from server to HTPC: ping -i 15 -c 120 -D pluto
Created attachment 1066046
Output from ping from HTPC to server: ssh -n pluto ping -i 15 -D mimmi
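The ping runs above use `-D`, which makes iputils `ping` print a Unix timestamp in brackets at the start of each output line. A minimal Python sketch, assuming that output format, for pulling the reply timestamps out of such a log and measuring how long replies kept arriving, for instance to check whether they stop near the six-hour mark. The helper names `reply_times` and `reply_span` are hypothetical, and the sample reply line in the comment is illustrative only.

```python
import re

# iputils ping with -D prefixes every line with "[seconds.microseconds] ".
# A reply line then looks like this (illustrative, not taken from the attachments):
# [1440342000.123456] 64 bytes from pluto (192.0.2.1): icmp_seq=1 ttl=64 time=0.21 ms
REPLY = re.compile(r"^\[(\d+(?:\.\d+)?)\]\s+\d+ bytes from ")


def reply_times(lines):
    """Return the timestamps (seconds since the epoch) of all reply lines."""
    times = []
    for line in lines:
        match = REPLY.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def reply_span(lines):
    """Return (first, last, elapsed seconds) for the replies, or None if there were none."""
    times = reply_times(lines)
    if not times:
        return None
    return times[0], times[-1], times[-1] - times[0]
```

For example, `reply_span(open("ping-to-pluto.log"))` (the file name is made up) returns the first and last reply timestamps and the seconds between them; dividing the last value by 3600 gives hours.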
I tried the last kernel we used from F19, 3.14.27-100.fc19.x86_64. With that, the machine continues to run just fine for longer than 6 hours. Userspace is otherwise unchanged, so right now we are running F21 userspace on an F19 kernel. (I haven't noticed any obvious issues with the combination.) As far as I can tell, whatever is causing this problem was introduced somewhere between 3.14.27-100.fc19 and 4.1.5-100.fc21. Would some kind of bisection be feasible? I imagine it would be a bit complicated, as both different kernel versions and different Fedora packagings with various patch sets would be involved.

I've been working on bisection scripts for the kernel, but unfortunately 3.14 is too old to use them. Can you narrow down which package broke it using Koji builds? We may be able to bisect if you can get a tighter window as well. This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1232528

I hadn't realised Koji builds were kept that long. Since they are, I'll certainly try to narrow it down. Meanwhile I've tried the 3.17.4-301.fc21 kernel. It's the original F21 kernel, and is thus still available from the "fedora" channel. It freezes after 6 hours, so it counts as "BAD" in the bisection. So it breaks somewhere between 3.14.27-100.fc19 and 3.17.4-301.fc21.

To narrow this down further, is it the kernel version number I should use for ordering the releases when bisecting? In calendar time, 3.14.27-100.fc19 was built after 3.17.4-301.fc21. But it's the "3.14.27" and "3.17.4" parts that matter most, right? Since the oldest F21 kernel is bad, I will have to use F20 kernels to get older versions. There isn't really that much difference between the builds for different branches, right? So that testing is still useful?

After trying a number of kernels (with a turnaround of 6 hours for each test), I realized the problem doesn't actually lie in the kernel proper. When I reinstalled 3.14.27-100.fc19, the new installation hangs too.
The difference is, of course, that the new installation's initramfs image was created using the F21 tools, while the original was made on F19. Now I'll read up on how initramfs images are made. Then I'll do a kind of bisection to find which difference between the two images is the important one, and see where this bugzilla really belongs. (Maybe back at my setup.)

It's not a bug; it is by design. The newer dhclient-script sets valid_lft for the interface when it is brought up. Sorry!

In case anyone comes by later: bug 1121258 is about having these parameters documented.

(In reply to Göran Uddeborg from comment #9)
> It's not a bug, it is by design. The newer dhclient-script sets valid_lft
> for the interface when it is brought up. Sorry!

Still seems like a bug to have a system freeze when it renews a DHCP lease.

Maybe I should clarify for George, and any others who may read this report after the fact. Dhclient DOES update the lifetime parameters when it renews the DHCP lease. But I wasn't running dhclient on this box. I had given it a statically assigned IP address in dhcpd.conf. During boot it would set this address, and then never change it. That worked fine because it "owned" the address; it only needed DHCP to find out what the address was at the start. No need to spend cycles on running dhclient. That is, until the new initramfs image set the lifetime parameters of the interface. Then the address stopped working once it expired. I fixed the problem by changing the timeout parameters for this and a few other statically assigned IPs in my dhcpd.conf. (So I'm still not running dhclient on the box.)
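What finally mattered was the lifetime attached to the address. iproute2's `ip addr show` prints it on the line after each `inet` line as `valid_lft` and `preferred_lft`, each either a number of seconds with a `sec` suffix or `forever`, and the kernel removes an IPv4 address once its valid lifetime runs out. A minimal Python sketch, assuming that text layout, that lists the IPv4 addresses carrying a finite lifetime; an address set from a six-hour lease would start near `21600sec`. The function names `ipv4_lifetimes` and `expiring` are hypothetical.

```python
import re

# Matches e.g. "    inet 192.0.2.10/24 brd 192.0.2.255 scope global dynamic eth0"
# ("inet6" lines do not match, because "inet" must be followed by whitespace).
INET = re.compile(r"^\s*inet\s+(\S+)")
# Matches e.g. "       valid_lft 21590sec preferred_lft 21590sec"
#           or "       valid_lft forever preferred_lft forever"
LIFETIME = re.compile(r"^\s*valid_lft\s+(forever|\d+sec)")


def ipv4_lifetimes(text):
    """Map each IPv4 address/prefix to its valid lifetime in seconds.

    None means the address never expires ("forever").
    """
    result = {}
    current = None
    for line in text.splitlines():
        inet = INET.match(line)
        if inet:
            current = inet.group(1)
            continue
        lft = LIFETIME.match(line)
        if lft and current is not None:
            value = lft.group(1)
            # Strip the "sec" suffix from finite lifetimes.
            result[current] = None if value == "forever" else int(value[:-3])
            current = None
    return result


def expiring(text):
    """Return only the IPv4 addresses that carry a finite valid lifetime."""
    return {addr: secs for addr, secs in ipv4_lifetimes(text).items() if secs is not None}
```

On a running box, the text could come from `subprocess.run(["ip", "addr", "show"], capture_output=True, text=True).stdout`. Any address it reports will disappear when the counter reaches zero unless something, such as a running dhclient, renews it.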