Bug 886061 - 'fedora' or 'updates' repository error must be fatal, otherwise the upgraded system might be broken heavily
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: fedup
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Will Woods
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker
Depends On:
Blocks:
Reported: 2012-12-11 12:37 UTC by Kamil Páral
Modified: 2013-01-23 19:06 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-01-23 19:06:43 UTC
Type: Bug
Embargoed:


Attachments
fedup output with OK repositories (25.07 KB, text/plain)
2012-12-11 12:38 UTC, Kamil Páral
fedup debug output with OK repositories (114.72 KB, text/plain)
2012-12-11 12:39 UTC, Kamil Páral
fedup output with broken 'fedora' repository (21.37 KB, text/plain)
2012-12-11 12:39 UTC, Kamil Páral
fedup debug output with broken 'fedora' repository (95.02 KB, text/plain)
2012-12-11 12:39 UTC, Kamil Páral

Description Kamil Páral 2012-12-11 12:37:38 UTC
Description of problem:
fedup currently ignores it when a core Fedora repository ('fedora' or 'updates') is unavailable. That has serious implications: it can result in a half-performed upgrade and a potentially broken system.

I noticed the issue by accident: while performing an upgrade, I saw a message like this:

> No upgrade available for the following repos: fedora

When I ran fedup again, the 'fedora' repository was correctly refreshed and more packages were downloaded. I played with it a bit and figured out that if there is a temporary repository problem (e.g. the Fedora mirror manager returning HTTP 500 or similar for a single repository request), fedup will happily ignore that repository. That might be desirable for third-party repos, but not for core Fedora repos.

If you happen to be the unlucky person for whom one of the repositories fails (especially the 'fedora' repo), fedup will download only some of the packages. When the upgrade process runs, it will upgrade only part of the system. The outcome is a system in an unknown state, which might or might not boot.

I tried to simulate the issue on an F17 minimal system by editing fedora.repo and changing the URL to something non-existent. Instead of 257+71 packages, only 220 were downloaded. The upgraded system still had 12 .fc17 packages, most notably the kernel (i.e. the kernel was not upgraded at all). Also, "yum update" offered me 76 packages to update. Of course, these numbers depend on the current state of the mirrors.


Version-Release number of selected component (if applicable):
fedup-0.7.2-0.git20121206

How reproducible:
always

Steps to Reproduce:
1. edit fedora.repo and make it point to an unavailable location
2. run # fedup-cli --network 18 --debuglog fedup.log --instrepo http://download.fedoraproject.org/pub/fedora/linux/releases/test/18-Beta/Fedora/x86_64/os
3. see "No upgrade available for the following repos: fedora" on stdout
4. see that fewer packages were downloaded compared to a standard run
5. revert the fedora.repo change
6. reboot and upgrade
7. see that some important packages are still .fc17, the system might not boot or malfunction in some other way
8. see that yum update offers you packages to be updated

Expected results:
If the 'fedora' or 'updates' repo is not available, fedup should halt, print a reasonable message, and ask the user to try again in a while.

Comment 1 Kamil Páral 2012-12-11 12:38:46 UTC
Created attachment 661412
fedup output with OK repositories

Comment 2 Kamil Páral 2012-12-11 12:39:07 UTC
Created attachment 661414
fedup debug output with OK repositories

Comment 3 Kamil Páral 2012-12-11 12:39:29 UTC
Created attachment 661415
fedup output with broken 'fedora' repository

Comment 4 Kamil Páral 2012-12-11 12:39:45 UTC
Created attachment 661416
fedup debug output with broken 'fedora' repository

Comment 5 Kamil Páral 2012-12-11 12:44:27 UTC
Nominating as F18 blocker. Criterion:

" For each one of the release-blocking package sets ('minimal', and the package sets for each one of the release-blocking desktops), it must be possible to successfully complete an upgrade from a fully updated installation of the previous stable Fedora release with that package set installed, using all officially recommended upgrade mechanisms. The upgraded system must meet all release criteria "
https://fedoraproject.org/wiki/Fedora_18_Final_Release_Criteria

If a single network/server glitch is sufficient for fedup to omit a large portion of packages that are needed for a successful upgrade, the criterion is then violated in those cases. The state of the upgraded system is non-deterministic, and it can work quite fine, but it can also be utterly broken and fail to boot.

Comment 6 Kamil Páral 2012-12-12 17:56:34 UTC
Will, is there anything I have missed? It seems like a serious bug and we need some developer feedback for the blocker bug decision. Thanks.

Comment 7 Will Woods 2012-12-12 22:25:54 UTC
In normal operation, the "instrepo" is autodetected, and it's an actual repo, with the basic Fedora package set (i.e. the DVD contents) inside it. If the default instrepo is missing, you can't get the kernel/initrd, so you can't run the upgrade.

So this bug will only happen if you have a system where:

  a) you're using a custom --instrepo with no packages in it, and
  b) something happens to your "fedora" repo

which is pretty unlikely to happen outside of the confines of super-special-case testing.

Even if you do trigger this bug - since glibc, bash, coreutils, systemd, grub, kernel, etc. are all in the "fedora" repo, none of those would be touched during the upgrade, so it's pretty likely your system could still boot OK after the misfired upgrade. Did yours? (You said the system "might not boot", but you don't say whether or not yours was still bootable.)

So: yeah, it's kind of a problem if your core repos are missing. But once mirrormanager handles 'repo=fedora-install-$releasever', it shouldn't be possible to get into this situation unless you're overriding --instrepo and setting things up yourself.

Comment 8 Jaroslav Reznik 2012-12-13 09:54:02 UTC
I think Kamil describes that b) option - "fedora" repo is somehow broken (unavailable, incomplete mirror...).

How did preupgrade handle it?

Comment 9 Kamil Páral 2012-12-13 12:30:53 UTC
(In reply to comment #7)
> So this bug will only happen if you have a system where:
> 
>   a) you're using a custom --instrepo with no packages in it, and

There is an OR between these lines, right? Because my instrepo had a package repo inside it, see Steps to Reproduce.

>   b) something happens to your "fedora" repo
> 
> which is pretty unlikely to happen outside of the confines of
> super-special-case testing.

Not that unlikely: that's how I found this bug, it actually happened. Fedora's MirrorManager (the tool behind mirrors.fedoraproject.org) is unavailable pretty often; it's not a one-in-a-million chance that it rejects your request, more like one-in-a-hundred or one-in-a-thousand. I also know this from my experience with automated testing: yum mirror requests and download.fp.o requests fail quite often.


> 
> Even if you do trigger this bug - since glibc, bash, coreutils, systemd,
> grub, kernel, etc. are all in the "fedora" repo, none of those would be
> touched during the upgrade.. so it's pretty likely your system could still
> boot OK after the misfired upgrade.

Can't it happen that systemd is in "updates" repo but dracut is not, so that I end up with fc18 systemd and fc17 dracut, probably not going too well together (artificial example, you might think of better ones)?

> Did yours? (You said the system "might
> not boot", but you don't say whether or not yours was still bootable..)

Mine booted, yes. It was a bit weird, all the console text was bold for some reason, but it booted. It was a minimal install, so there's a lower chance of something going wrong there than on a full desktop system.

But I think the point is in "might go wrong", not in "went wrong". It can depend on the exact state of the repositories at that exact time. I can hardly simulate all the possibilities. Even if I do 10 test runs at different times, it still doesn't prove much.

> So: yeah, it's kind of a problem if your core repos are missing. But once
> mirrormanager handles 'repo=fedora-install-$releasever', it shouldn't be
> possible to get into this situation unless you're overriding --instrepo and
> setting things up yourself.

Do I understand it correctly that once F18 is out, the automatically discovered instrepo will be exactly the same as fedora.repo? Then missing fedora.repo would do no harm.

But we can also talk about missing fedora-updates.repo. Doesn't that break the upgrade path for many packages, if you don't have "updates" downloaded?

My idea is that this could be very easy to fix: If either instrepo or "fedora" or "updates" repo is unavailable, halt the program and announce the problem ("repository XYZ unavailable, please run again in a while"). Is it more difficult than I suppose?
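As a rough illustration of that proposal, a minimal Python sketch could look like this. The repo IDs come from this report; `CRITICAL_REPOS`, `RepoUnavailableError` and `check_repos()` are hypothetical names, not part of fedup or yum.

```python
# Hypothetical set of repos that must load, taken from the proposal above;
# this is not read from any fedup or yum configuration.
CRITICAL_REPOS = frozenset({"instrepo", "fedora", "updates"})


class RepoUnavailableError(Exception):
    """Raised when a repo considered critical could not be loaded."""


def check_repos(failed_repos, critical=CRITICAL_REPOS):
    """Stop on any failed critical repo; return the remaining failures.

    Non-critical failures (e.g. a third-party repo) are returned so the
    caller can still warn about them.
    """
    failed = set(failed_repos)
    fatal = sorted(failed & set(critical))
    if fatal:
        raise RepoUnavailableError(
            "repository %s unavailable, please run again in a while"
            % ", ".join(fatal))
    return sorted(failed - set(critical))
```

For example, `check_repos(["dropbox"])` returns `["dropbox"]` so it can be reported as a warning, while `check_repos(["fedora", "dropbox"])` raises, letting the caller exit before anything is changed on the system.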

Comment 10 Adam Williamson 2012-12-17 18:32:54 UTC
Discussed at 2012-12-17 blocker review meeting: http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-17/f18final-blocker-review-5.2012-12-17-16.40.log.txt . We kicked this around for a while but ultimately couldn't come to a decision.

The worst possible case of this bug is if fedup is able to contact/use the 'updates' repository but not the 'fedora' repository. In this case, a partial - and broken - upgrade is theoretically possible (packages which happen to have updates in f18 get upgraded, others don't).

We need an evaluation of how likely/possible it will be that this case will happen after F18 official release, and whether it will actually lead to broken upgrades (or if, as was posited in the meeting, it may be the case that the upgrade process would simply fail without breaking the system).

Comment 11 Will Woods 2012-12-18 17:18:17 UTC
(In reply to comment #9)

> Do I understand it correctly that once F18 is out, the automatically
> discovered instrepo will be exactly the same as fedora.repo? Then missing
> fedora.repo would do no harm.

Almost - it's the same as the content of the DVD. It's basically using the $MIRROR/18/Fedora/$ARCH/os repo, so all the core packages are there, plus GNOME, KDE, etc.

The auto-discovered instrepo is live now, by the way. See e.g.:
http://mirrors.fedoraproject.org/metalink?repo=fedora-install-18&arch=x86_64

> But we can also talk about missing fedora-updates.repo. Doesn't that break
> the upgrade path for many packages, if you don't have "updates" downloaded?

This would give exactly the same result that upgrading using a DVD used to give - you get the contents of the 'Fedora' repo only.

Given that this used to be our recommended upgrade path, I doubt we'd consider it a bug now?

> My idea is that this could be very easy to fix: If either instrepo or
> "fedora" or "updates" repo is unavailable, halt the program and announce the
> problem ("repository XYZ unavailable, please run again in a while"). Is it
> more difficult than I suppose?

Isn't it always?

1) You have evidence - from long experience - of which repos are "critical" and which aren't. But there's no policy which defines this, so what happens if the repos change names?

2) If this bug still applies in the case where you have instrepo but no updates - i.e. if missing the "updates" repo is considered sufficient to block the upgrade - shouldn't that also apply if e.g. the rpmfusion repos were missing?

3) For some third-party repos (e.g. Dropbox) there isn't an F18 version that matches the F17 repo. So those repos will *definitely* be missing. Which means: if we block for *any* missing repo, systems with e.g. the Dropbox repo will never be able to upgrade, unless there's an override switch.

So the questions are: 
  What repos are "critical"?
  How do we identify them?
  What happens if a "critical" repo is missing? A simple warning, an overrideable error, or a hard error? 

I suggest the following answer, which is what's already implemented:

  There's only one truly *critical* repo, and that's the install repo. 
  The install repo has the base system plus Xorg, GNOME, KDE, etc.
  (It's equivalent to DVD upgrades in previous releases.)
  If it's missing, that's a hard error.

  Any other missing repos will print a warning, and it's up to the user
  to decide whether to re-run because of missing repos or proceed anyway.
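The policy above can be sketched roughly as follows. This is an illustrative Python sketch, not fedup's actual code: `handle_failed_repos()` and `UpgradeAbort` are hypothetical, and the install repo ID `"instrepo"` is an assumption. The warning text reuses the message quoted in the original report; joining several repo names with spaces is also an assumption.

```python
import logging

log = logging.getLogger("upgrade-sketch")


class UpgradeAbort(Exception):
    """The upgrade cannot proceed at all."""


def handle_failed_repos(failed_repos, install_repo="instrepo"):
    """Apply the 'one critical repo' policy; return the repos to disable."""
    if install_repo in failed_repos:
        # No install repo means no kernel/initrd for the upgrade: hard error.
        raise UpgradeAbort("install repo %r is unavailable" % install_repo)
    disabled = sorted(set(failed_repos))
    if disabled:
        # Every other missing repo only produces a warning; the user decides
        # whether to re-run or to continue anyway.
        log.warning("No upgrade available for the following repos: %s",
                    " ".join(disabled))
    return disabled
```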

I think the root problem here is that the warning gets printed very early and has gone off-screen by the time the "Reboot to start upgrade" message appears, so it seems somewhat surprising if it happens.

I could make the warning message more prominent by repeating it at the end of the run, before the "Reboot" message?
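One way to do that, sketched in Python under the assumption that all warnings go through a single helper: record each warning when it is first printed, then print them all again just before the final message. `WarningCollector` and `finish()` are hypothetical names, and the final line simply reuses the "Reboot to start upgrade" wording mentioned above.

```python
import sys


class WarningCollector:
    """Print warnings as they happen and remember them for a final recap."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        self.messages = []

    def warn(self, message):
        self.messages.append(message)
        print("WARNING: " + message, file=self.stream)

    def recap(self):
        """Return all earlier warnings as one block, or '' if there were none."""
        if not self.messages:
            return ""
        lines = ["*** WARNING: problems occurred during this run: ***"]
        lines.extend("  " + m for m in self.messages)
        return "\n".join(lines)


def finish(collector, stream=None):
    """Repeat any warnings right before the final reboot prompt."""
    stream = stream if stream is not None else sys.stdout
    recap = collector.recap()
    if recap:
        print(recap, file=stream)
    print("Reboot to start upgrade", file=stream)
```

Because the recap is printed last, it is still on screen when the user decides whether to reboot.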

Comment 12 Will Woods 2012-12-18 17:28:18 UTC
To clarify, the current situation with this bug is:

Due to network problems, one or more of the repo mirrors might fail temporarily.

Fedup will retry until it runs out of mirrors for that repo.

If *all* the mirrors for a given repo fail, a warning message is printed and that repo is disabled.

If the install repo fails, the upgrade cannot proceed.

Otherwise, the upgrade can be started and will complete successfully.

The upgraded system will boot and run - it will work just like a DVD upgrade from previous releases - but some updates will not be installed.

In short: If *all* the mirrors fail for an update repo, you can still run the upgrade, and the resulting system will work fine, but it won't have those updates.
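The retry behaviour described above can be modelled like this. It is a simplified sketch, not yum's actual mirror handling: `fetch` stands in for whatever downloads repo metadata and is assumed to raise `OSError` (of which `urllib.error.URLError` is a subclass) on failure. It also shows the corner case raised in comment 13: if the mirror list itself could not be fetched, there are no mirrors to try, so the repo is disabled immediately.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def fetch_repo_metadata(mirrors: Iterable[str],
                        fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try each mirror in turn and return the first successful result.

    Returns None when every mirror failed. If the mirror list itself could
    not be downloaded, `mirrors` is empty, the loop never runs, and the
    repo counts as failed immediately.
    """
    for url in mirrors:
        try:
            return fetch(url)
        except OSError:
            # One bad mirror is not a problem; move on to the next one.
            continue
    return None


def load_repos(mirrorlists: Dict[str, List[str]],
               fetch: Callable[[str], bytes]
               ) -> Tuple[Dict[str, bytes], List[str]]:
    """Return (loaded metadata by repo id, ids of repos to disable)."""
    loaded: Dict[str, bytes] = {}
    disabled: List[str] = []
    for repo_id, mirrors in mirrorlists.items():
        data = fetch_repo_metadata(mirrors, fetch)
        if data is None:
            disabled.append(repo_id)  # caller warns and disables this repo
        else:
            loaded[repo_id] = data
    return loaded, disabled
```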

Comment 13 Kamil Páral 2012-12-18 22:57:56 UTC
For the sake of any blocker bug meeting or similar process, I'm attaching an IRC log of today's conversation between me and Will. Summary: I clarified the root cause (not all mirrors have to be unavailable; it is sufficient for the mirror list itself to be unavailable). We couldn't agree on which repositories are critical; Will doesn't want to maintain the list on his own. He would be fine with releng declaring it somehow. It seems very likely that fedup will at least visibly warn the user (but not halt) if one of their repos (any, not just fedora+updates) could not be contacted. Even though I'd like the process to be safer, that would definitely improve the situation. Will also said he would think about it more and try to find a more proper solution.

The log follows:
<kparal> wwoods: "If *all* the mirrors for a given repo fail" - I was referring to the situation when you don't get a list of mirrors, so all the mirrors fail instantly
<wwoods> kparal: that shouldn't happen for any of the normal repos, since we have mirrorlists for those
<wwoods> including the automatic instrepo
<kparal> wwoods: yes, but you send a http request to get the list, right? and you might get HTTP 500 server error
<kparal> that's what I received without any simulation, it just happened
<kparal> a second later everything worked OK
<wwoods> welp, that behavior is up to yum
<kparal> I wanted to clarify what I was referring to
<wwoods> I gotcha though
<wwoods> probably the warning for missing repos just needs to be more prominent
<kparal> second, the DVD upgrade method was really frowned upon, kkofler being the loudest opponent, so it's not really great to have it back in some cases. it caused lots of kernel-related issues for F17
<wwoods> kkofler is against it?? you don't say
<kparal> wwoods: and third, the 'fatal' repos are really easy - just instrepo, fedora and updates. nothing else needed
<kparal> we can't catch _all_ cases of something going wrong, but these ones are really easy and we know they have to work
<kparal> otherwise the upgrade is risky
<wwoods> kparal: mmm, and what about when I inevitably have to port to RHEL? which are the core repos there?
<wwoods> and what about when they change names in RHEL, or in Fedora
<kparal> wwoods: how often does that happen? then we change it in fedup too
<wwoods> nah
<wwoods> it's just silly: we already have one actually-critical repo
<wwoods> and it's already a critical failure if that's missing
<kparal> it will be prone to the same problem as the kernel package in F17
<wwoods> in the absence of some data from the repos/distro about which repos are critical and which are optional, I suggest that it's up to the user to decide
<kparal> wwoods: I suggest we set a safe defaults
<wwoods> you suggest that I maintain the list of safe defaults
<wwoods> I basically refuse to be the gatekeeper
<kparal> wwoods: for gods sake, repository names haven't changed in a lifetime!
<wwoods> dude, i get the problem. I'm saying your suggested implementation is weak at best
<kparal> and even if they do change, it won't happen mid-release
<wwoods> get me a guarantee from rel-eng that they'll never change the repo names and that's fine
<kparal> they won't change it mid-release, it would be changed in Rawhide at best
<wwoods> but without that agreed-upon policy the implementation resting on an assumption
<kparal> wwoods: I think you have to assume much worse things in OSS. the question is just what is reasonable
<kparal> wwoods: and I'll personally write you a patch once that happens
<wwoods> or, even better, get them to agree to put "critical=1" in the repos that are critical
<wwoods> then I'm not responsible for maintaining the list
<wwoods> and then it also scales to third party repos / distros
<kparal> wwoods: would you be so kind to put a large warning message as the last line, if some repository fails? something like REPOSITORY "FEDORA" IS UNAVAILABLE, THE UPGRADE MIGHT NOT SUCCEED. YOU MIGHT WANT TO RUN FEDUP AGAIN TO RULE OUT TEMPORARY NETWORK ERRORS
<kparal> also the GUI should display that
<kparal> at least that to help the poor users to avoid broken systems
<kparal> it seems I won't convince you to any safer but no so elegant solution
<kparal> wwoods: the other (better) option would be to add a new option like --skipbrokenrepos that needs to be supplied if any repository error should not be fatal
<kparal> the user still has a choice, but the default is safe
### The discussion continued regarding third-party repositories, but it was long and I decided not to attach it here. Thinking about third-party repos is definitely a step in a good direction, but I feel it goes a step beyond what I asked for, and is therefore not fully relevant to this particular bug report.
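Both `critical=1` and `--skipbrokenrepos` are proposals from the log above, not existing yum or fedup options. The Python sketch below combines them purely for illustration: repos marked with a hypothetical `critical=1` key in their .repo file are fatal on failure, unless the hypothetical `--skipbrokenrepos` switch is given. Note that Kamil's version made *any* repo failure fatal by default; this sketch narrows that to repos marked critical. `critical_repos()` and `fatal_failures()` are made-up names.

```python
import argparse
import configparser


def critical_repos(repo_file_text):
    """Return ids of repo sections carrying the hypothetical critical=1 key."""
    # interpolation=None so '%' characters in URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_file_text)
    return {name for name in parser.sections()
            if parser.getboolean(name, "critical", fallback=False)}


def fatal_failures(failed, critical, skip_broken_repos=False):
    """Return the failed repos that should stop the run.

    Safe default: a failed critical repo is fatal, unless the user opted
    out with the hypothetical --skipbrokenrepos switch.
    """
    if skip_broken_repos:
        return set()
    return set(failed) & set(critical)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="upgrade-sketch")
    parser.add_argument("--skipbrokenrepos", action="store_true",
                        help="treat every repository error as non-fatal")
    return parser.parse_args(argv)
```

Keeping the "critical" marker in the repo files, rather than in the upgrade tool, matches Will's point that the tool should not maintain the list itself, and the same mechanism would work for third-party repos.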

Comment 14 Adam Williamson 2012-12-19 00:17:35 UTC
well, reading c#11 carefully:

"I suggest the following answer, which is what's already implemented:

  There's only one truly *critical* repo, and that's the install repo. 
  The install repo has the base system plus Xorg, GNOME, KDE, etc.
  (It's equivalent to DVD upgrades in previous releases.)
  If it's missing, that's a hard error."

so it seems fedup already treats the install repo, which is what it will be using (not a repo called 'fedora') post-release, as critical, and fails out if it cannot be reached. So I think actually the worst case we can think of - the 'main' repo fails to work but the other repos do work - is actually covered already? I'm just not clear on how the pre-release config doesn't work in this way, but if the worst case is already guarded against for post-release situations, that goes a long way to reassuring me...

Comment 15 Adam Williamson 2012-12-19 00:19:45 UTC
so to simplify further: the question is why fedup, in the config Kamil tested when he first reported this bug, did not behave as Will claims it should. The bug title is 'core fedora repository error must be fatal', and Kamil's description is very clear on the point that contacting the main repo failed, but fedup proceeded to try and upgrade.

Will seems to be clearly claiming that this is not what should happen.

So that is our key remaining discrepancy. If this is just related to the precise temporary config we had for testing fedup at the time Kamil was running his test, and will no longer be the case when we actually ship F18, then that's good.

Comment 16 Kamil Páral 2012-12-19 14:36:07 UTC
I'll summarize. First, the vocabulary:

* instrepo
  - fedup internal repository containing upgrade.img
  - http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-install-18&arch=x86_64
  - currently points to Beta, after GA will point to Final
  - does not contain Everything/, only the DVD-like limited set of packages
* fedora
  - fedora.repo
* updates
  - updates.repo

Now the failure use cases:

1) 'instrepo' fails
In this case fedup halts completely, announcing error. GRUB is not modified. Perfect, no problem.

2) 'fedora' fails
In this case fedup completes OK, GRUB is modified. Some fc18 packages are not downloaded: packages that are in 'fedora', but neither in 'updates' nor in 'instrepo'. Because 'instrepo' contains all critical packages (kernel, Xserver, etc.), the system should probably boot fine.
However, I don't know what happens if dependencies are broken for some affected package. Let's say the user has inkscape.fc17, and inkscape.fc18 is included neither in 'updates' nor in 'instrepo'. The upgrade might break dependencies for inkscape.fc17. Will the upgrade finish successfully? Will it even start?

3) 'updates' fails
In this case fedup completes OK, GRUB is modified. Some fc18 packages are not downloaded: all package updates from 'updates'. Instead their older versions from 'fedora' and 'instrepo' are used. This is very similar to DVD upgrade method used in Fedora 17 and earlier. It can open a can of worms like this:
http://fedoraproject.org/wiki/Common_F17_bugs#Kernel_from_previous_release_may_still_be_used_after_upgrade.2C_prevents_clean_shutdown
From our experience this mostly works, but it has some quirks. My secret wish was that we would progress to safer methods with the new upgrader and this one would become the past.

4) 'fedora' and 'updates' fail
A combination of 2) and 3). This is exactly what DVD upgrade was like. Similar 'might happen' issues as in 2) and 3).
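The mechanism behind cases 1) to 4) can be modelled with plain sets: a package gets upgraded only if some reachable repo carries an F18 build of it. The Python sketch below is a deliberate simplification (package names only, no versions and no dependency resolution, so the "older versions" effect of case 3 is not visible), and `left_behind()` and `UpgradeAbort` are hypothetical names.

```python
class UpgradeAbort(Exception):
    """Case 1): without the install repo, the run stops before touching GRUB."""


def left_behind(installed, repos, failed, install_repo="instrepo"):
    """Return installed packages with no F18 build in any reachable repo.

    `repos` maps repo id -> set of package names available for F18;
    `failed` holds the ids of repos that could not be loaded.
    """
    if install_repo in failed:
        raise UpgradeAbort("install repo unavailable")
    reachable = set()
    for repo_id, packages in repos.items():
        if repo_id not in failed:
            reachable |= set(packages)
    return sorted(set(installed) - reachable)
```

With made-up contents `instrepo = {kernel, systemd, bash}`, `fedora = {kernel, systemd, bash, inkscape}` and `updates = {kernel}`, a failed 'fedora' leaves `['inkscape']` behind at .fc17, matching the inkscape example in case 2).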


When I originally reported this issue, I didn't know that 'instrepo' would be equivalent to the DVD repo. Now that I know it, this issue doesn't seem _so_ blockery. The probability of breaking the system is now similar to the old DVD upgrade method. Also bear in mind that this whole issue is just a corner case and happens only when you manage to download 'instrepo' metadata, but fail to contact 'fedora' and/or 'updates'. It is not very probable, especially since 'instrepo' moved to MirrorManager as well.

Still I think it is infuriatingly easy to fix this corner case, but I would probably have to become a fedup co-maintainer in the process, and I can't really afford that. If Will at least implements the big fat warning printout, it will help a bit.

I think this is no longer fit for a blocker, at least in our Fedora 'no guarantees' world.

Comment 17 Adam Williamson 2012-12-19 17:38:28 UTC
Discussed at 2012-12-19 blocker review meeting: http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-19/f18final-blocker-review-6.2012-12-19-17.02.log.txt . Given the info in comment #16, this is rejected as a blocker, on the basis that the failure case is likely to be fairly rare and even when it fails, you're really only in the situation of doing a DVD upgrade in previous releases.

Comment 18 Will Woods 2013-01-23 19:06:43 UTC
Fedup does print a stronger warning if 'updates' or 'fedora' is missing, since 0.7.2. See https://github.com/wgwoods/fedup/commit/e8a6dc1

