Bug 144589
| Field | Value |
|---|---|
| Summary | yum upgrade FC2->FC3 hangs/freezes |
| Product | [Fedora] Fedora |
| Component | yum |
| Version | 2 |
| Hardware | i686 |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | medium |
| Priority | medium |
| Reporter | Trevor Cordes <trevor> |
| Assignee | Jeremy Katz <katzj> |
| CC | athompso, jbj, katzj, mattdm, nobody+pnasrat |
| Doc Type | Bug Fix |
| Last Closed | 2005-05-18 15:07:36 UTC |

Description
Trevor Cordes
2005-01-09 00:08:34 UTC
I should make it clearer that I first successfully upgraded from FC1->FC2 using yum upgrade. It's the FC2->FC3 part that failed. I was not attempting to go straight from FC1->FC3!

You didn't have cron or anacron still running, did you? It could be that your upgrade attempt got hung when something else hit your rpmdb at the same time. It won't be the first time that's happened. You might try doing an anaconda upgrade of this machine and see if it can get fixed up that way.

Yes, cron and anacron would have been running as daemons. Since I hit this problem at 4am-ish, it very well could have been caused by some cron job. Not sure what would manipulate rpm in the stock FC cron stuff, but it's a possibility. Next time I will ensure I stop *cron first. I've found lots of yum FC upgrade instruction hints, and none mentioned your idea. I suppose it would be a good idea to shut down *all* services (except sshd!) before trying such a thing again. As always, I will report back whatever I find (though this one may take time).

I just ran into this problem again on another machine. It hung at 348/606 at the completing update for hdparm. This time I HAD made sure to stop cron and anacron. When it hung, I verified no cron was running with ps. There has got to be something else going on here. It's sitting there hung and I haven't killed it yet. Is there any signal or anything I can do to kick-start it back into finishing? Perhaps to just skip the one it's on? I'm desperate to not have to go onsite to fix this machine.

Also, ps -ef |grep rpm shows that nothing else is running rpm right now. What is yum sitting there waiting for if not an rpm process?

Created attachment 111407
terminal output from yum up to the point where it has hung

More details I thought might help:

    r#/tmp/strace -p 18952
    Process 18952 attached - interrupt to quit
    futex(0xaed6438, FUTEX_WAIT, 1, NULL

Created attachment 111410
listing from lsof -p psid of the hung yum process
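
For anyone else chasing this, the "is anything else touching the rpmdb?" check above (ps plus lsof) could also be done in one pass by walking /proc. This is only a rough sketch, not anything yum or rpm ships: the function name `pids_with_open_files` is made up for illustration, and it assumes the rpm database lives in the usual /var/lib/rpm and that /proc is a Linux procfs.

```python
import os

def pids_with_open_files(target_dir="/var/lib/rpm", proc_root="/proc"):
    """Return {pid: [paths]} for processes holding files under target_dir open.

    Hypothetical helper: it reads the /proc/<pid>/fd symlinks, which is
    roughly the information lsof reports. The default target_dir is an
    assumption about where the rpm database lives.
    """
    # Trailing separator so "/var/lib/rpm" does not match "/var/lib/rpmfoo".
    target = os.path.join(os.path.abspath(target_dir), "")
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process already exited, or we lack permission
        for fd in fds:
            try:
                path = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if path.startswith(target):
                holders.setdefault(int(entry), []).append(path)
    return holders
```

Run as root, calling `pids_with_open_files()` with no arguments would list every process that has a file under /var/lib/rpm open. If the hung yum is the only PID returned, that would at least rule out a second process holding the database.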

I needed to try to get this machine running, so I had to kill the stalled yum. SIGINT had no effect. SIGHUP had no effect. kill -9 finally killed it.

Here's what I will attempt as a fix for this half-updated machine:

    mkdir /tmp/rpm
    cp /var/cache/yum/base/packages/*.rpm /tmp/rpm
    cp /var/cache/yum/updates-released/*.rpm /tmp/rpm
    cd /tmp/rpm
    ls -1 > /tmp/ls
    edit /tmp/ls and remove the packages that the yum debug output said were completely updated (including post-update tasks) -- perhaps this step would be best left out?
    rpm -U --force *

This appears to have worked and hopefully got the packages completed properly. I sure would like to solve the yum hang issue though, since I don't want to hit this bug again when FC4 rolls around...

FYI - I've seen the same problem on a stock FC3 install: doing a "yum -y upgrade" (which pulls down 300+ packages) does the same thing... sometimes!

How many kernels did you have installed at the time? Adding jbj and nasrat to get any ideas they might have.

At that moment I had just the latest FC2 2.6 kernel installed. Well, by the time it froze it had already put in the latest FC3 kernel as well, so if you look at it that way, then the answer is 2.

I perhaps should also mention that while it was hung, and after I had killed it (but before I rebooted), there were some wacky files/mounts in /tmp that were causing errors when I did a df, mount or ls /tmp. I think those are some sort of shared mem files used by yum/rpm? Anyway, there were a couple that were obviously screwed up. I think this happened before also, but I never gave it too much attention before rebooting. I don't have any mounts of my own that run off /tmp, so these must be system/app generated.

Ah, lucky it was still in the terminal scroll-back buffer:

    #df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/hda3             30890364  23443936   5877280  80% /
    /dev/hda1               101086      6900     88967   8% /boot
    none                    128012         0    128012   0% /dev/shm
    df: `/tmp/tmp.FmjnEy8727': No such file or directory
    df: `/tmp/tmp.WVkFrU8730': No such file or directory
    /tmp/tmp.WVkFrU8730     128012         0    128012   0% /dev/shm
    Exit 1

yum doesn't make these mounts - some packages might in %post - nevertheless - that's your problem - rpm was probably hanging during the install from that.

Hmm, possibly. But isn't it strange that the first stall I had (my first post here), it was mtools it hung on, but for the recent stall it was hung on hdparm? Are you sure these /tmp shm mounts aren't related to the futex yum is using that the strace indicated it was stalled on? Perhaps some python libs or rpm libs are creating them?

There's still the odd fact that I've now upgraded 9 *nearly identical* systems (with regard to installed rpms they are identical) and only 2 have stalled. If it was a %post section hosing it then a) it should stall on the same package in the same place, and b) it should stall on every system.

Regardless of what the cause is, what about the possibility of adding an alarm() timer around the calls that can possibly stall? If yum had timed out on the stalled hdparm "update completion" and then continued to the next package, I'd be in much better shape than it just stopping and leaving the system in limbo. Or, instead of alarm(), perhaps catch SIGUSR1 that will kick it onto the next iteration.

Fedora Core 2 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC3 updates or in the FC4 test release, reopen and change the version to match.

This bug still exists in FC3.

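
To make the alarm() suggestion above concrete, here is a minimal sketch of what a timer around each per-package step might look like in Python. None of this is yum's actual code: `StepTimeout`, `run_with_timeout` and `process_all` are hypothetical names, and the "step" callables stand in for whatever yum really calls. It is Unix-only and must run in the main thread. One caveat worth stating loudly: Python only runs signal handlers between bytecode instructions, so if the hang sits inside a C library call that keeps retrying its futex wait (and SIGINT/SIGHUP reportedly had no effect here), a timer like this might never fire. A separate watchdog process may be the more robust variant.

```python
import signal

class StepTimeout(Exception):
    """Raised (hypothetically) when a guarded step exceeds its time budget."""

def _on_alarm(signum, frame):
    # Turn SIGALRM into an exception raised inside the running step.
    raise StepTimeout()

def run_with_timeout(step, seconds):
    """Run step() with a timer; return (True, result) or (False, None) on timeout."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # alarm() with sub-second precision
    try:
        return True, step()
    except StepTimeout:
        return False, None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # cancel any pending timer
        signal.signal(signal.SIGALRM, old_handler)  # restore previous handler

def process_all(steps, seconds):
    """Run (name, callable) pairs in order; return the names that timed out."""
    skipped = []
    for name, step in steps:
        ok, _ = run_with_timeout(step, seconds)
        if not ok:
            skipped.append(name)  # note it and move on to the next package
    return skipped
```

With something like this, a stalled "update completion" would be recorded in the skipped list and the loop would carry on, instead of leaving the system in limbo. Whether it is safe to skip a half-finished package step is a separate question that this sketch does not answer.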
See bug 145021 (duplicate, but perhaps should move discussion there). Anyone watching this bug who has had this issue, please put a note in bug 145021 that you have seen / are still seeing this problem.

Should we resolve this as a duplicate of #145021? (Even though this one is obviously slightly older...?)

Yes, I'd move the discussion to bug 145021. This bug is really just a more pathological / easier-to-reproduce case, but I'm convinced it's the same bug.

*** This bug has been marked as a duplicate of 145021 ***