Bug 822008 - Silent lockup in upgrade of sbcl
Summary: Silent lockup in upgrade of sbcl
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sbcl
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Rex Dieter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 822604 826683
Depends On:
Blocks:
Reported: 2012-05-16 04:58 UTC by Paulo Andrade
Modified: 2012-09-17 16:14 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-10 01:27:41 UTC
Type: Bug
Embargoed:


Description Paulo Andrade 2012-05-16 04:58:37 UTC
Upgrading from f16 to rawhide was stuck for some time, and it was
required to kill the sbcl process to continue.

Running the command manually:

$ /usr/bin/sbcl --core /usr/lib/sbcl/sbcl-dist.core --noinform --sysinit /etc/sbcl.rc --userinit /dev/null --load /usr/lib/sbcl/install-clc.lisp
fatal error encountered in SBCL pid 1947(tid 140737353893696):
can't load .core for different runtime, sorry


Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>


$ rpm -qf  /usr/lib/sbcl/sbcl-dist.core
sbcl-1.0.51-1.fc16.x86_64

$ rpm -q sbcl
sbcl-1.0.51-1.fc16.x86_64
sbcl-1.0.56-1.fc18.x86_64

(The rpm -q queries above were run during the upgrade.)

I believe it may work if installing only sbcl in a single transaction
or using %posttrans for the script.

Comment 1 Rex Dieter 2012-05-17 15:42:58 UTC
*** Bug 822604 has been marked as a duplicate of this bug. ***

Comment 2 Rex Dieter 2012-05-17 15:53:09 UTC
Ugh, affects f17 upgrades too. :(

So 2 things going on here:
1.  %post script should never go interactive; it should fail gracefully instead of waiting forever
2.  make %post script actually work.
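
A minimal sketch of point 1, not the fix that was actually shipped: wrap the scriptlet step so it cannot wait on a terminal and cannot run forever. The helper name run_guarded and the time limits are hypothetical. It assumes coreutils timeout(1) is available in the scriptlet environment, and that the program treats end-of-file on stdin as a reason to stop rather than to keep prompting.

```shell
#!/bin/sh
# run_guarded LIMIT CMD [ARGS...]   (hypothetical helper, for illustration only)
#   - stdin is /dev/null, so an interactive prompt sees EOF instead of a terminal
#   - timeout(1) sends TERM after LIMIT seconds and KILL 5 seconds after that,
#     since a plain kill was reported as not enough for the hung process
#   - always returns 0, so one failing step does not block the transaction
run_guarded() {
    limit=$1
    shift
    status=0
    timeout -k 5 "$limit" "$@" </dev/null || status=$?
    if [ "$status" -ne 0 ]; then
        echo "warning: '$1' failed or timed out (status $status); continuing" >&2
    fi
    return 0
}

# Possible use in a scriptlet (the 60 second limit is an arbitrary assumption):
# run_guarded 60 /usr/bin/sbcl --core /usr/lib/sbcl/sbcl-dist.core --noinform \
#     --sysinit /etc/sbcl.rc --userinit /dev/null --load /usr/lib/sbcl/install-clc.lisp
```

Returning 0 unconditionally trades a possibly stale Lisp setup for an upgrade that can finish; whether that trade is acceptable is a packaging decision.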

Comment 3 Fedora Update System 2012-05-30 17:46:31 UTC
pvs-sbcl-5.0-10.fc17,maxima-5.27.0-3.fc17,sbcl-1.0.57-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/pvs-sbcl-5.0-10.fc17,maxima-5.27.0-3.fc17,sbcl-1.0.57-1.fc17

Comment 4 Hin-Tak Leung 2012-05-30 21:29:59 UTC
Just a me-too. Was going to file a new bug before I saw this. f16 to f17 proper via preupgrade. I noticed there was no disk activity nor CPU activity for a while, so I just killed that script (it was run via rpm)...

from ps -ax in ctrl-alt-f2, it shows:

/usr/bin/sbcl --core /usr/lib/sbcl/sbcl-dist.core --noinform --sysinit /etc/sbcl.rc ...

and that was between
sbcl-1.0.51-1.fc16
sbcl-1.0.56-1.fc17

Comment 5 Nadav Har'El 2012-05-31 07:06:11 UTC
I had exactly the same problem: an upgrade from Fedora 16 to Fedora 17, using preupgrade, hung on the /usr/bin/sbcl command exactly as shown above (as shown by doing "ps" in a different virtual terminal). I waited for 6 hours (!!) and it wouldn't finish. Killing it with just "kill" didn't help, and I had to resort to "kill -9". I don't know what it was doing - it didn't seem to be using CPU or disk.

I think this bug should be quickly solved and its "severity" should be changed to URGENT, definitely not MEDIUM - as it will probably ruin the system of many people upgrading from Fedora 16 to Fedora 17!!! Please push the fix to the main repository (not some testing repository) as soon as possible... Remember that most users, unlike me (and the other people on this thread) will not know how to move to another terminal and find, or kill, the hung process. They will reboot, and find a half-upgraded, and probably completely-broken system which they won't know how to use.

P.S.
This situation brought to my attention two installer problems that I'll open new bugs for:

1. The installer (anaconda) doesn't know how to refresh its window; If you go to another terminal and return to the installer window, you see a blank (gray) screen. You don't even see the "upgrading lisp..." message any more, and no longer have any idea what's going on.

2. Anaconda is making a mistake by letting each package hang and destroy the whole upgrade. Packages should be time-limited, and if a package fails to upgrade, tough luck for it, but at least the rest of the upgrade will succeed. I wish I could say that this is the first time that this has happened - I had a very similar problem a year ago with the (equally unimportant) "nethack" package which hung the whole upgrade.

P.S.#2
Don't blame me for using the "sbcl" package and bringing this situation on myself - I never even heard of this "Steel Bank Common Lisp" before today, and I don't even program in lisp. I'm assuming that some other package I installed depended on it, and this is why it got installed.

Comment 6 Hin-Tak Leung 2012-05-31 09:48:46 UTC
(In reply to comment #5)
...I waited for 6
> hours (!!) and it wouldn't finish...

Sorry to hear that. I only waited < half-an-hour - this was the 2nd error I had from preupgrade, my rpm database was hosed anyway so I was keeping a close watch. (the previous error was kde-filesystem which I already filed).

indeed, that hung script needed kill -9 to kill.

> 1. The installer (anaconda) doesn't know how to refresh its window; If you
> go to another terminal and return to the installer window, you see a blank
> (gray) screen. You don't even see the "upgrading lisp..." message any more,
> and no longer have any idea what's going on.

Yes, I think the developer was relying on updates being regular enough to refresh - maybe pygtk (what is it written in?) doesn't do screen refreshes...

> 2. Anaconda is making a mistake by letting each package hang and destroy the
> whole upgrade. Packages should be time-limited, and if a package fails to
> upgrade, tough luck for it, but at least the rest of the upgrade will
> succeed. I wish I could say that this is the first time that this has
> happened - I had a very similar problem a year ago with the (equally
> unimportant) "nethack" package which hung the whole upgrade.

Yes, I think preupgrade should have essentially the same "--skip-broken" option of yum, and given how painful this is, make that the default.

> Don't blame me for using the "sbcl" package ...

I know where it came from for my case - maxima seems to depend on it, and I did use maxima at one point.

Comment 7 Adam Williamson 2012-05-31 18:44:34 UTC
skip-broken doesn't do anything about issues like this. it only works around dependency issues by leaving packages with dependency problems out of the set of packages to be installed. it does not (and cannot) predict problems in %post scripts.

Comment 8 Matt Hirsch 2012-05-31 19:01:37 UTC
I'll just add a me too to this. I have run into this on 3 different systems, both updating from f15 and f16. For me, this is because sbcl is a dependency of maxima.

One positive point here is that in both cases, if the user reboots the system and resumes the installation, they will be left with a mostly working system.

They will need to clean up the packages missed by the hung installer.

#package-cleanup --cleandupes

Comment 9 Hin-Tak Leung 2012-06-01 03:46:22 UTC
(In reply to comment #8)
> #package-cleanup --cleandupes

This is dangerous if done too early - yum erase is recursive. I had a look and it wanted to erase 1/4 of my packages (the texlive 2011 testing stuff, there is a new 2012): just because my system is not completely up-to-date due to a failed partial upgrade, it does not mean that I want to remove all my "slightly-outdated" packages which still depend on the older duplicates.

Comment 10 Matt Hirsch 2012-06-01 04:03:26 UTC
As I understand it, cleandupes will remove the older duplicate package. To clarify, what you should do is complete the upgrade, and then run package-cleanup --cleandupes. You are right that if you run it on a partially updated system (e.g. with some mix of f17 and f16 packages, where not all f16 packages have been updated), it can cause problems as you describe. You shouldn't boot a partially updated system, since interactions between different versions of packages (with possibly unmet dependencies) are unpredictable. 

I'm not sure exactly what you mean about the texlive 2011/2012 stuff, or the interaction of package-cleanup with enabled or disabled testing repos. If you want to be extra careful, use package-cleanup --dupes to generate a list of duplicate packages, and then cull them by hand. However, in my case that list was 1800 items long...

Comment 11 Hin-Tak Leung 2012-06-01 04:34:14 UTC
(In reply to comment #10)
> I'm not sure exactly what you mean about the texlive 2011/2012 stuff, or the
> interaction of package-cleanup with enabled or disabled testing repos. If
> you want to be extra careful, use package-cleanup --dupes to generate a list
> of duplicate packages, and then cull them by hand. However, in my case that
> list was 1800 items long...

I had a 3rd-party repository for texlive 2011 (which replaces the fedora shipped texlive 2007) for f16. There is a corresponding f17 texlive 2012, but the upgrade was not automatic: one needs to replace the texlive-release rpm first, like how an upgrade through yum is done. package-cleanup --cleandupes wanted to remove nearly the whole of that, and it scared me.

In the end, what I did was to dump --dupes to a file, grep -v fc17 it, do a 
cat |xargs -n 1 rpm -e --nodeps
to remove them one by one. (with some checking between).
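
The manual cleanup described above could be sketched roughly as follows. This is an illustration, not the exact commands that were run: the function names and the DRY_RUN switch are hypothetical, and it assumes the saved listing holds one package name per line with no header lines. By default it only prints the rpm commands so each one can be checked before anything is erased.

```shell
#!/bin/sh
# list_old_dupes FILE: print entries from a saved `package-cleanup --dupes`
# listing that do not carry the fc17 dist tag (blank lines are dropped).
list_old_dupes() {
    grep -v 'fc17' "$1" | grep -v '^[[:space:]]*$'
}

# remove_old_dupes FILE: with DRY_RUN=1 (the default) only print the rpm
# commands; with DRY_RUN=0 erase the packages one at a time.
remove_old_dupes() {
    list_old_dupes "$1" | while read -r pkg; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "rpm -e --nodeps $pkg"
        else
            rpm -e --nodeps "$pkg"
        fi
    done
}

# Possible use (dupes.txt is an assumed file name):
# package-cleanup --dupes > dupes.txt
# remove_old_dupes dupes.txt            # review the printed commands first
# DRY_RUN=0 remove_old_dupes dupes.txt  # then erase for real
```

Because --nodeps skips dependency checks, reviewing the dry-run output first matters: anything still required by a not-yet-upgraded package will be removed regardless.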

Comment 12 Fedora Update System 2012-06-01 17:11:34 UTC
Package pvs-sbcl-5.0-10.fc17, maxima-5.27.0-3.fc17, sbcl-1.0.57-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing pvs-sbcl-5.0-10.fc17 maxima-5.27.0-3.fc17 sbcl-1.0.57-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8728/pvs-sbcl-5.0-10.fc17,maxima-5.27.0-3.fc17,sbcl-1.0.57-1.fc17
then log in and leave karma (feedback).

Comment 13 Matt Hirsch 2012-06-01 17:25:27 UTC
It's great that this is being fixed, but shouldn't this be pushed to the Fedora 15/16 repos? Once the upgrade to 17 has occurred, it's 1) difficult to test, and 2) too late to prevent the bug for other users.

Comment 14 Germano Massullo 2012-06-02 09:51:26 UTC
(In reply to comment #12)
> Package pvs-sbcl-5.0-10.fc17, maxima-5.27.0-3.fc17, sbcl-1.0.57-1.fc17:
> * should fix your issue,
> * was pushed to the Fedora 17 testing repository,
> * should be available at your local mirror within two days.
> Update it with:
> # su -c 'yum update --enablerepo=updates-testing pvs-sbcl-5.0-10.fc17
> maxima-5.27.0-3.fc17 sbcl-1.0.57-1.fc17'
> as soon as you are able to.
> Please go to the following url:
> https://admin.fedoraproject.org/updates/FEDORA-2012-8728/pvs-sbcl-5.0-10.
> fc17,maxima-5.27.0-3.fc17,sbcl-1.0.57-1.fc17
> then log in and leave karma (feedback).
Sorry, but how is it possible to test it with anaconda/preupgrade?
(In reply to comment #13)
> It's great that this is being fixed, but shouldn't this be pushed to the
> Fedora 15/16 repos? Once the upgrade to 17 has occurred, it's 1) difficult
> to test, and 2) too late to prevent the bug for other users.

Indeed

Comment 15 Rex Dieter 2012-06-02 12:27:06 UTC
Hmm... I thought I'd commented, but maybe that was in the dup'd bug, so here we go.

Re: why no < f17 fixes?

Because it's not required.  The only builds that were ever broken here were those done for fedora 17.  In particular, the only case that fails (that I'm aware of) is an attempt to upgrade to sbcl-1.0.56-1.fc17.


Re: how is it possible to test it with anaconda/preupgrade?

Good question.  Any upgrade method that can enable the f17-updates repo can benefit from this fix.  Otherwise, we're left with manual workarounds including:
1. uninstall sbcl prior to upgrade
2. manually kill the hung process during the upgrade (via something like:  killall -KILL sbcl)

Comment 16 Adam Williamson 2012-06-03 22:10:47 UTC
caterpillar: with preupgrade it's slightly tricky, you have to edit /boot/upgrade/ks.cfg after running the first stage of preupgrade, before rebooting, to add an updates-testing repo. (just like you did to test the preupgrade fix, but make the repo http://dl.fedoraproject.org/pub/fedora/linux/updates/testing/17/x86_64/ ). With straight DVD/netinst it's easier, just enable the updates-testing repo during the upgrade.

Comment 17 Germano Massullo 2012-06-04 10:00:30 UTC
*** Bug 826683 has been marked as a duplicate of this bug. ***

Comment 18 Adam Williamson 2012-06-05 19:34:46 UTC
Can people please karma up the fix for this, if they test and conclude that the fixed package helps? It's easy enough to test, just install F16 with sbcl, then do an upgrade to F17, enabling the updates-testing repo at repo selection. It should avoid the hang.

Comment 19 Maurizio Paolini 2012-06-06 20:05:24 UTC
(In reply to comment #18)
> Can people please karma up the fix for this, if they test and conclude that
> the fixed package helps? It's easy enough to test, just install F16 with
> sbcl, then do an upgrade to F17, enabling the updates-testing repo at repo
> selection. It should avoid the hang.

Of course it helps! However this will be too late for many people (like me).
I was upgrading via yum, got the hang in the middle, and had to "killall -9 yum".
In the end I had a system on which "yum-complete-transaction" and
"yum upgrade ..." did not work and had to manually remove all the installed
and still duplicated packages of fc17.  Too bad.

As noted in a comment above, there should be a timeout set in order to avoid infinite
hangs due to badly written pre/post-install scripts.

But this is a problem with yum/anaconda/rpm/...

Comment 20 Hin-Tak Leung 2012-06-06 23:48:26 UTC
(In reply to comment #18)
> Can people please karma up the fix for this, if they test and conclude that
> the fixed package helps? It's easy enough to test, just install F16 with
> sbcl, then do an upgrade to F17, enabling the updates-testing repo at repo
> selection. It should avoid the hang.

I agree with comment 19 - preupgrade should try a lot harder to continue, time out and skip over locked-up pre/post package scripts, and cope with error statuses from scripts so that it comes back with a "mostly" working system.

Comment 21 Nadav Har'El 2012-06-07 12:02:36 UTC
#19 and #20: I created a separate bug 827699 about that Anaconda issue (Anaconda should timeout so that hung scripts don't hang the entire installation), but unfortunately it was closed - as a dup :(

I hope someone reopens it, but I'm not optimistic :(

Like I wrote in that bug, this is *not* the first time that an upgrade has hung for me, so I hope that someone treats this as the serious issue that it is.

Comment 22 Adam Williamson 2012-06-07 17:16:50 UTC
The update still needs karma, to avoid other users being affected by the problem. Thanks.

Comment 23 Fedora Update System 2012-06-10 01:27:41 UTC
pvs-sbcl-5.0-10.fc17, maxima-5.27.0-3.fc17, sbcl-1.0.57-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Kamil Páral 2012-09-17 16:14:27 UTC
Removing CommonBugs, Fedora 17 was released a long time ago.

