Bug 1228445

Summary: unbound-libs takes 50 minutes to install
Product: [Fedora] Fedora Reporter: York Possemiers <ypossem>
Component: unboundAssignee: Tomáš Hozza <thozza>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: unspecified    
Version: 23CC: gsgatlin, magnus, mmedvede, pj.pandit, psimerda, pwouters, thozza, ypossem
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Anaconda netinst Fedora Workstation configuration x86_64
Last Closed: 2016-01-27 23:20:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Installation log
none
dnf log
none
dnf rpm log
none
unbound-anchor strace hang none

Description York Possemiers 2015-06-05 01:25:51 UTC
Description of problem:
It takes an approximately 50 minutes to install unbound-libs.x86_64 via network installs.

Version-Release number of selected component (if applicable):
1.5.3-6.fc23

How reproducible:
Always for workstation installs on x86_64 (small sample size, see additional info). Does not occur on minimal installs.


Steps to Reproduce:
1. Use Rawhide boot.iso
2. Select Workstation install in software selection
3. Install

Actual results:
Installer needs approximately 50 minutes to install the unbound-libs package

Expected results:
take no more than a few minutes for each package (except for the big ones, such as kernel)


Additional info:

First seen in a yum update for F21 (though in this case, I cancelled the update, started it again, and it didn't recur, maybe an issue in post install script? Was performed approximately a week ago), then on a fresh install of F22 workstation (2 days ago) netinst and now on a fresh install of Rawhide from the boot.iso from 2015-06-04.

Unable to do a very comprehensive test (due to time taken to download packages), the two older versions of fedora from memory. Tested rawhide install on bare metal (btrfs, Workstation install, 16G ram) and in a VM (automatic LVM, Workstation install, 2G allocated).

Comment 1 Tomáš Hozza 2015-06-05 07:10:18 UTC
Hi.

Thank you for the report. The change made in the %post script was only to start the unbound-anchor.timer unit. I can try to reproduce your issue.

Can you please attach installation log from the installation?

Thanks!

Comment 2 York Possemiers 2015-06-08 05:21:41 UTC
Created attachment 1036108 [details]
Installation log

Sorry I am not sure which constitutes the "install log" so I will provide a couple of the files in /var/log/anaconda. I'd prefer to be able to attach multiple files to one response, but that doesn't seem to be possible.

Comment 3 York Possemiers 2015-06-08 05:22:23 UTC
Created attachment 1036109 [details]
dnf log

Comment 4 York Possemiers 2015-06-08 05:22:52 UTC
Created attachment 1036110 [details]
dnf rpm log

Comment 5 Tomáš Hozza 2015-06-08 06:07:09 UTC
(In reply to York Possemiers from comment #2)
> Created attachment 1036108 [details]
> Installation log
> 
> Sorry I am not sure which constitutes the "install log" so I will provide a
> couple of the files in /var/log/anaconda. I'd prefer to be able to attach
> multiple files to one response, but that doesn't seem to be possible.

That's exactly what I meant. I'll have a look to see if there is any clue.

Thank you!

Comment 6 York Possemiers 2015-07-15 01:36:26 UTC
This issue persists with the upgrade (with dnf update) to unbound-libs.x86_64 1.5.4-1.fc23

Comment 7 Jan Kurik 2015-07-15 14:03:27 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could affect also pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 8 Mikhail Medvedev 2015-07-15 16:21:45 UTC
I am observing similar behavior during unbound-libs-1.5.3-4.fc21.ppc64.rpm install on Fedora 21.

I did find that the process causing the delay is showing up as
-unbound-anchor,3656,unbound -a /var/lib/unbound/root.anchor -c /etc/unbound/icannbundle.pem

When listed with 'pstree -panuh'

Killing the process allowed the installation to proceed.

Comment 9 Tomáš Hozza 2015-07-16 07:13:05 UTC
Thanks for the clue.

This is really helpful. I'll try to look at the possible cause and will update the bug once I have some new information.

Comment 10 Tomáš Hozza 2015-07-16 11:37:47 UTC
So the %post section for unbound-libs is executing unbound-anchor on installation and upgrade. I can see that the /var/lib/unbound/root.anchor is one mistake, since it should be /var/lib/unbound/root.key.

The purpose of this is to update the root key. I think it makes sense to run it only on the package installation, or not at all, since there is a systemd timer unit that does this every 24h.

Never the less, the utility should not hang for such long. Could you please attach to the process using gdb and get the traceback? I guess there is some issue with the network communication.

Is the host on which you installed the package connected to the Internet? Is there any network connectivity at all?

Can you please run the command by hand and see if it hangs, too?
unbound-anchor -a /var/lib/unbound/root.anchor -c /etc/unbound/icannbundle.pem

I'm not able to reproduce it. I'll most probably remove the call from %post section completely, but would like to check if there is any real issue in the tool.

Thanks in advance!

Comment 11 Paul Wouters 2015-07-16 12:43:36 UTC
actually the root.key and root.anchor had a different format in which they stored the root key. To prevent apps from breaking on the format change, I used a different file name.

Comment 12 Paul Wouters 2015-07-16 12:47:32 UTC
Maybe we should move the anchor call from %post to a onetime service call on startup? That at least will not cause an install to hang

Comment 13 Tomáš Hozza 2015-07-16 12:51:56 UTC
(In reply to Paul Wouters from comment #12)
> Maybe we should move the anchor call from %post to a onetime service call on
> startup? That at least will not cause an install to hang

That's what the timer unit does... runs the oneshot service. Therefore I don't think it is necessary to call it during the installation at all. It is done regularly, so there is no need for explicit call.

Comment 14 Mikhail Medvedev 2015-07-16 13:35:49 UTC
Tried starting a clean f21 VM and running 'yum install -y unbound-libs'. The long wait is reproducible every time. After killing the offending process and letting the installation to complete, I manually ran:

unbound-anchor -a /var/lib/unbound/root.anchor -c /etc/unbound/icannbundle.pem

Which also hanged.

I got this bt each time with gdb:

#0  0x00003fff8e0a4958 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00003fff8e76a824 in 00000043.plt_call.bufferevent_disable () from /lib64/libevent-2.0.so.5
#2  0x00003fff8e74b61c in .event_base_loop () from /lib64/libevent-2.0.so.5
#3  0x00003fff8e74cec4 in .event_base_dispatch () from /lib64/libevent-2.0.so.5
#4  0x00003fff8e8eaa6c in 0000106d.plt_call.PyTuple_New () from /lib64/libunbound.so.2
#5  0x00003fff8e965fd4 in .ub_resolve () from /lib64/libunbound.so.2
#6  0x0000000045d746b0 in 000002e1.plt_call.ub_ctx_create ()
#7  0x00003fff8dfa438c in .generic_start_main.isra () from /lib64/libc.so.6
#8  0x00003fff8dfa45b4 in .__libc_start_main () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()

The VM I ran the test on has network connectivity.

Just in case, I have also ran the same test on baremetal F21 host, with the
same result of unbound-anchor hanging, with exactly the same gdb backtrace.

Comment 15 Tomáš Hozza 2015-07-20 10:57:10 UTC
Thanks for the traceback. What is strange is the call to PyTuple_New(), since this is part of the Python bindings and unbound-anchor is a C binary.

Reported to upstream:
https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=689

Maybe they will have some ideas...

Comment 16 Tomáš Hozza 2015-07-20 11:18:19 UTC
Would you be able to capture the packet communication (using e.g. tcpdump) from the VM while running unbound-anchor (with the reproduced hang)?

Thanks in advance.

Comment 17 Mikhail Medvedev 2015-07-20 19:55:07 UTC
Created attachment 1054022 [details]
unbound-anchor strace hang

Comment 18 Mikhail Medvedev 2015-07-20 19:56:47 UTC
I have attached an strace captured with like this:

while [ -z "$(ps ax |grep unboun[d]-anch | grep -v runuser)" ]; do sleep 0.1; done; export PID="$(ps ax |grep unboun[d]-anch |grep -v runuser |awk '{print $1}')";echo $PID; strace -p $PID -f -e trace=network -s 10000 2> strace_unbound_anchor

Comment 19 Gary Gatling 2015-08-26 21:48:54 UTC
I am seeing problems in fedora 22, MATE spin installing from DVD. Ran "dnf update" as first thing after install completes. it hung on unbound-libs in two separate installs.

I will leave my "dnf update" command running overnight to see if it ever finishes.

Comment 20 Gary Gatling 2015-08-27 20:47:37 UTC
after leaving my pc overnight it did work. I did a dnf downgrade unbound-libs and then did a "time dnf upgrade unbound-libs" and it took

38m26.542s to run the dnf transaction. Pretty long time in scriptlet.

Comment 21 Magnus 2015-09-03 11:12:44 UTC
I had the exact same problem, and found that the problem was that my firewall (Cisco ASA) was dropping a DNS UDP reply. I have a DNS inspection rule and that rule drops packages lager than 512 bytes. If you have a Cisco ASA, you could either disable the DNS inspection, or edit the rule:
https://supportforums.cisco.com/discussion/11325716/asa-udp-dns-reply-dropped

Comment 22 Gary Gatling 2015-09-08 21:47:53 UTC
We do not control the networking here at NCSU. There is another bureaucracy that deals with that. We cannot log into the devices or change anything about them. I am unsure if they would be willing to change the cisco switches because of one package in fedora. But I will submit a support ticket and ask them.

Comment 23 Paul Wouters 2015-09-09 00:33:18 UTC
That cisco default value has been a bane to DNSSEC. If that is enabled, your DNS is basically getting mangled, Cisco even tells you themselves:

http://www.cisco.com/web/about/security/intelligence/dnssec.html#15

You can confirm this is your problem at:

https://www.dns-oarc.net/oarc/services/replysizetest

However, if this is really the cause of the unbound-libs stall, we should look into fixing that and as Tomas said, we should remove it from the installer and let a one-shot service deal with this.

Comment 24 Gary Gatling 2015-09-09 14:10:56 UTC
Here is what the networking group said.

-------------------------------------------------------------------------------
Hi Gary,

Are you talking about outgoing or incoming UDP DNS packets?  

What is the business justification for enabling this change?  

I see 913 bytes UDP DNS packets coming from the outside to your machine, but they are from international IPs. (Amsterdam, Tokyo, etc.)

Regards,
Early Yu
Network Operations
Communication Technologies
-------------------------------------------------------------------------------
Is the problem with outgoing or incoming UDP DNS packets?

Comment 25 Paul Wouters 2015-09-09 14:29:53 UTC
It's about incoming to you, eg from DNSSEC servers on the outside world to your machine that is installing unbound-libs

business justification: Do not break DNS - The cisco option is wrong and obsoleted. See URL I provided in comment 23.

Comment 26 Gary Gatling 2015-09-09 15:13:27 UTC
Paul, thank you for your help. They changed DNS Inspection and now the package updates in like 7 seconds.

Comment 27 York Possemiers 2016-01-27 23:20:49 UTC
Haven't seen this in a while, I'm presuming it fixed