Bug 1831158
Summary: ipmitool access to HP BMC takes 2 minutes due to "Unable to Get Channel Cipher Suites"
Product: Red Hat Enterprise Linux 8
Component: ipmitool
Version: 8.2
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Bob Fournier <bfournie>
Assignee: Vaclav Dolezal <vdolezal>
QA Contact: Rachel Sibley <rasibley>
CC: cpaquin, dtantsur, kelly.griese, ovasik, rasibley, rvr, sreichar, vdolezal
Target Milestone: rc
Target Release: 8.0
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ipmitool-1.8.18-17.el8
Last Closed: 2020-11-04 03:17:05 UTC
Type: Bug
Bug Blocks: 1831893, 1849038
Description
Bob Fournier
2020-05-04 17:53:36 UTC
Correction - This is related to the ipmitool Cipher Suites - https://bugzilla.redhat.com/show_bug.cgi?id=1749360

I did find that this BMC responds quickly when using Cipher Suite 3

$ time ipmitool -I lanplus -H 10.9.103.29 -C 3 -U ADMINISTRATOR -P password -v -R 12 -N 5 chassis status
Running Get PICMG Properties
my_addr 0x20, transit 0, target 0x20
Error response 0xc1 from Get PICMG Properities
Running Get VSO Capabilities
my_addr 0x20, transit 0, target 0x20
Invalid completion code received: Invalid command
Discovered IPMB address 0x0
System Power : off
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-on
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none

real 0m0.106s
user 0m0.001s
sys 0m0.003s

I also uploaded the iLO firmware to the latest - 2.60 (May 23 2018) but it has the same issue. This version also responds quickly to Cipher Suite 3.

yes, if you specify cipher suite on the commandline, ipmitool won't check which ones are supported and just try the one from commandline

btw. what is the reason to have 12 retries specified?

> btw. what is the reason to have 12 retries specified?
It's a valid question. That is the default that Ironic uses, based on:
command_retry_timeout = 60 (Maximum time in seconds to retry retryable IPMI operations)
min_command_interval = 5 (Minimum time, in seconds, between IPMI operations sent to a server.)
num_retries = max((command_retry_timeout // min_command_interval), 1) = 12
It's worth revisiting these defaults in Ironic. With 6 retries it takes 45 seconds to complete the above command, which should not cause a timeout.
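The retry arithmetic quoted in this comment can be sketched in Python as follows. This is only an illustration of the formula as written above, not Ironic's actual code; the function name `num_retries` is hypothetical, and the default values are the two settings quoted in the comment.

```python
def num_retries(command_retry_timeout: int = 60, min_command_interval: int = 5) -> int:
    """Retry count derived from the two Ironic settings quoted above.

    Hypothetical helper: it mirrors the quoted formula
    max((command_retry_timeout // min_command_interval), 1).
    """
    return max(command_retry_timeout // min_command_interval, 1)


print(num_retries())       # 12, the value passed to ipmitool as -R
print(num_retries(30, 5))  # 6
print(num_retries(3, 5))   # 1, the floor enforced by max(..., 1)
```

Halving `command_retry_timeout` to 30 would give the 6 retries mentioned above.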
> command_retry_timeout = 60 (Maximum time in seconds to retry retryable IPMI
> operations)

Is 60 seconds the timeout enforced by the wrapping script? If so, then you should note that -N and -R apply to each IPMI message that is generated by the ipmitool command used.

Also I found this in ipmitool/src/plugins/lanplus/lanplus.c:

> /* increment session timeout by 1 second each retry */
> session->timeout++;

I have no idea why it is here, but it explains why it takes so long:

>>> R=12
>>> N=5
>>> R * (N + (R - 1)/2)
126.0

> Is 60 seconds the timeout enforced by the wrapping script? If so, then you should note that -N and -R apply to each IPMI message that is generated by the ipmitool
> command used.

It is an Ironic parameter, but it's really just used to generate the number of retries (-R); it is not enforced. We are looking at a change to limit -R.

> Also I found this in ipmitool/src/plugins/lanplus/lanplus.c:

Interesting, yes, that explains why it takes that length of time. What is confusing is that the command does succeed but the retries are still attempted when it can't get the Cipher Suites. Shouldn't it return success after the first attempt?

Folks, let's not concentrate too much on the retry number. While it may be a bit too high, even lower numbers will be very problematic if we hit the timeout on literally every call.

I think it's a bug that ipmitool retries "command not supported". It's not a condition that can just go away, unlike connection problems or insufficient resources.

As to workarounds, is it possible to run the "get suites" command via ipmitool? And is it possible to check if a cipher is supported? We could do it without retries and then use the picked cipher ourselves (although we'll be really doing ipmitool's job in this case).

(In reply to Bob Fournier from comment #7)
> What is confusing is that the command does succeed but the retries are still
> attempted when it can't get the Cipher Suites. Shouldn't it return success
> after the first attempt?
I'm not sure I understand. Are you mixing ipmitool command with underlying IPMI commands? Because there are several IPMI commands used here, including two ("Get Channel Authentication Capabilities" and "Get Channel Cipher Suites") that are sent prior to establishing a session.

(In reply to Dmitry Tantsur from comment #8)
> I think it's a bug that ipmitool retries "command not supported". It's not a
> condition that can just go away, unlike connection problems or insufficient
> resources.

The problem here is about not receiving anything. (I assume. Didn't see output with `-vvvvv` in this case.) You should see multiple entries of

>> Sending IPMI command payload
>> netfn : 0x06
>> command : 0x54
>> data : 0x0e 0x00 0x80

without any response in the `-vvvvv` output if it is really ignored. You would see some `<< IPMI Response Message Header` otherwise.

(In reply to Dmitry Tantsur from comment #8)
> As to work arounds, is it possible to run the "get suites" command via
> ipmitool? And is it possible to check if a cipher is supported? We could do
> it without retries and then use the picked cipher ourself (although we'll be
> really doing ipmitool's job in this case).

While it is possible (`ipmitool channel getciphers ipmi [channel #]`), the tricky part here might be the difference between sending it inside or outside of a session. AFAIK ipmitool doesn't support sending command outside of a session.

(In reply to Dmitry Tantsur from comment #8)
> Folks, let's not concentrate too much on the retry number. While it may be a
> bit too high, even lower numbers will be very problematic if we hit the
> timeout on literally every call.

If it is an option, you can use `ipmitool exec` to run multiple ipmitool commands inside one session so there will be only one "Unable to Get Channel Cipher Suites" timeout per ipmitool invocation.

Anyway, I'll report this issue to upstream. Ask me if you are interested in a quick&dirty solution, otherwise I'll wait for the upstream.
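One workaround raised earlier in the thread is to pass the cipher suite explicitly with `-C`, in which case ipmitool does not probe the BMC for supported suites. A minimal Python sketch of how a caller might assemble such a command line is shown below. The helper name `build_ipmitool_argv` and its parameters are hypothetical; only flags that already appear in this report (`-I`, `-H`, `-U`, `-f`, `-R`, `-N`, `-C`) are used, and whether cipher suite 3 is acceptable for a given BMC or security policy is an assumption to verify locally.

```python
from typing import List, Optional, Sequence


def build_ipmitool_argv(
    host: str,
    user: str,
    password_file: str,
    command: Sequence[str],
    cipher_suite: Optional[int] = None,
    retries: int = 1,
    interval: int = 1,
) -> List[str]:
    """Assemble an ipmitool lanplus command line (hypothetical helper).

    When cipher_suite is given, "-C <n>" is added so that ipmitool uses
    that suite instead of asking the BMC which suites it supports.
    """
    argv = [
        "ipmitool", "-I", "lanplus",
        "-H", host,
        "-U", user,
        "-f", password_file,   # password read from a file, as in the Ironic log
        "-R", str(retries),    # number of retries
        "-N", str(interval),   # seconds between retransmissions
    ]
    if cipher_suite is not None:
        argv += ["-C", str(cipher_suite)]
    return argv + list(command)


argv = build_ipmitool_argv(
    "10.9.103.29", "ADMINISTRATOR", "/path/to/password-file",
    ["chassis", "status"], cipher_suite=3,
)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` on a host where ipmitool is installed; the password file path above is a placeholder.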
Vaclav - yes, we would be interested in a quick and dirty solution. We are working on a workaround with the way we use ipmitool retries but would prefer to have a solution in ipmitool. Thank you.

Thanks! I verified your patch works well. It now takes -N seconds to get a response back when there are failures with the Channel Cipher Suites command, much better than the 2 minutes in the initial description.

$ sudo dnf install http://download.eng.bos.redhat.com/brewroot/work/tasks/8472/28888472/ipmitool-1.8.18-16.0.bz1831158.0.el8.x86_64.rpm
$ time ipmitool -I lanplus -H 10.9.103.29 -U ADMINISTRATOR -P password -v -R 12 -N 5 chassis status
Unable to Get Channel Cipher Suites
Running Get PICMG Properties
my_addr 0x20, transit 0, target 0x20
Error response 0xc1 from Get PICMG Properities
Running Get VSO Capabilities
my_addr 0x20, transit 0, target 0x20
Invalid completion code received: Invalid command
Discovered IPMB address 0x0
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-on
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none

real 0m5.103s
user 0m0.000s
sys 0m0.006s

For QE: this is reproducible on ipmi_sim from OpenIPMI-lanserv package

Created attachment 1694100: quick patch for this issue
ALL TESTS PASSED

Verified the ipmitool chassis status command only took a few seconds after updating to 1.8.18-17, versus over 2m when using 1.8.18-14.

Before:
[root@ci-vm-10-0-139-249 ~]# rpm -q ipmitool
ipmitool-1.8.18-14.el8.x86_64
[root@ci-vm-10-0-139-249 ~]# time ipmitool -I lanplus -H 10.9.103.29 -U ADMINISTRATOR -P password -v -R 12 -N 5 chassis status
Unable to Get Channel Cipher Suites
Running Get PICMG Properties
my_addr 0x20, transit 0, target 0x20
Error response 0xc1 from Get PICMG Properities
Running Get VSO Capabilities
my_addr 0x20, transit 0, target 0x20
Invalid completion code received: Invalid command
Discovered IPMB address 0x0
System Power : off
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-on
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none

real 2m6.305s
user 0m0.002s
sys 0m0.009s

After:
[root@ci-vm-10-0-139-249 ~]# rpm -q ipmitool
ipmitool-1.8.18-17.el8.x86_64
[root@ci-vm-10-0-139-249 ~]# time ipmitool -I lanplus -H 10.9.103.29 -U ADMINISTRATOR -P password -v -R 12 -N 5 chassis status
Unable to Get Channel Cipher Suites
Running Get PICMG Properties
my_addr 0x20, transit 0, target 0x20
Error response 0xc1 from Get PICMG Properities
Running Get VSO Capabilities
my_addr 0x20, transit 0, target 0x20
Invalid completion code received: Invalid command
Discovered IPMB address 0x0
System Power : off
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-on
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none

real 0m5.117s
user 0m0.002s
sys 0m0.002s

(In reply to Rachel Sibley from comment #19)
> After:
> [root@ci-vm-10-0-139-249 ~]# rpm -q ipmitool
> ipmitool-1.8.18-17.el8.x86_64

Hi Rachel,
I'm hitting this problem as well. We are using RHEL 8.2, ipmitool-1.8.18-14.el8.x86_64

Where can I get the rpm for ipmitool-1.8.18-17.el8.x86_64? The RedHat customer portal still shows the latest version is ipmitool-1.8.18-14.el8.x86_64.
https://access.redhat.com/downloads/content/ipmitool/1.8.18-14.el8/x86_64/fd431d51/package

Any help appreciated. Thanks!

When I look at the portal I see the latest version as ipmitool-1.8.18-17.el8.x86_64 - https://access.redhat.com/downloads/content/ipmitool/1.8.18-17.el8/x86_64/fd431d51/package

(In reply to Bob Fournier from comment #21)
> When I look at the portal I see the latest version as
> ipmitool-1.8.18-17.el8.x86_64 -
> https://access.redhat.com/downloads/content/ipmitool/1.8.18-17.el8/x86_64/
> fd431d51/package

Thanks Bob Fournier! We were able to download, and tried the test again.

[root@nfixundercloud ~]# yum list installed | grep ipmitool
ipmitool.x86_64 1.8.18-17.el8 @@commandline

Confirmed the timing is much faster, but please note the first line in the response message shows "Unable to Get Channel Cipher Suites"

[root@nfixundercloud ~]# time ipmitool -I lanplus -H 135.121.21.39 -U ipdlab -P mainstreet -v -R 12 -N 5 chassis status
Unable to Get Channel Cipher Suites
Running Get PICMG Properties
my_addr 0x20, transit 0, target 0x20
Error response 0xc1 from Get PICMG Properities
Running Get VSO Capabilities
my_addr 0x20, transit 0, target 0x20
Invalid completion code received: Invalid command
Discovered IPMB address 0x0
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : previous
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none

real 0m5.105s
user 0m0.000s
sys 0m0.005s

When I run this command, it obviously hits the same error

(undercloud) [stack@nfixundercloud ~]$ openstack overcloud node introspect --all-manageable --provide
Snip from: /var/log/containers/ironic/ironic-conductor.log

2020-09-22 15:18:33.333 8 DEBUG ironic.common.utils [-] Execution completed, command line is "ipmitool -I lanplus -H 135.121.21.45 -L ADMINISTRATOR -U ipdlab -R 1 -N 1 -f /tmp/tmp6zrry8y8 power status" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77
2020-09-22 15:18:33.333 8 DEBUG ironic.common.utils [-] Command stdout is: "Chassis Power is on " execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:78
2020-09-22 15:18:33.334 8 DEBUG ironic.common.utils [-] Command stderr is: "Unable to Get Channel Cipher Suites " execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:79

Any other suggestions on how to work around this? Should I open a separate ticket?

> Confirmed the timing is much faster, but please note the first line in the response message shows "Unable to Get Channel Cipher Suites"
> [root@nfixundercloud ~]# time ipmitool -I lanplus -H 135.121.21.39 -U ipdlab -P mainstreet -v -R 12 -N 5 chassis status
> Unable to Get Channel Cipher Suites
Yes, that is expected. It will still not get the Cipher Suites from this BMC, but that was not the problem. The problem was the delay. It was taking up to two minutes with the previous number of retries and delays (-R 12 -N 5) from Ironic to execute one ipmitool command. That was causing introspection to time out. This fix removes that large delay and introspection will no longer time out.
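The two-minute figure can be reproduced with a short Python sketch. It assumes, based only on the lanplus.c comment quoted earlier in this thread, that one unanswered IPMI message is attempted -R times and that the per-attempt timeout starts at -N seconds and grows by one second on each retry; the function name `worst_case_seconds` is hypothetical and this is not a description of ipmitool's full retry logic.

```python
def worst_case_seconds(retries: int, timeout: int) -> int:
    """Total wait for one IPMI message that never gets a response.

    Assumed model: `retries` attempts, the first waiting `timeout`
    seconds and each later attempt waiting one second longer.
    """
    return sum(timeout + attempt for attempt in range(retries))


print(worst_case_seconds(12, 5))  # 126 seconds, i.e. about 2m6s with -R 12 -N 5
print(worst_case_seconds(6, 5))   # 45 seconds with -R 6 -N 5
```

Both results agree with the numbers reported in this bug: roughly 2m6s before the fix with `-R 12 -N 5`, and 45 seconds with 6 retries.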
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ipmitool bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4720