Bug 1084604
| Summary: | cannot start infiniband connection as dhcp failing | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Vladimir Benes <vbenes> | ||||||
| Component: | NetworkManager | Assignee: | Lubomir Rintel <lrintel> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | Desktop QE <desktop-qa-list> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 7.0 | CC: | aloughla, danw, dcbw, dledford, jklimes, thaller, vbenes | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | 7.1 | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | | Doc Type: | Bug Fix | |||||||
| Doc Text: | | Story Points: | --- | |||||||
| Clone Of: | | Environment: | ||||||||
| Last Closed: | 2014-09-15 10:12:19 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1110708 | ||||||||
Description
Vladimir Benes
2014-04-04 19:33:35 UTC
Created attachment 882831
messages with failure
Apr 4 09:38:58 rdma-qe-11 NetworkManager[12139]: <info> dhclient started with pid 12719
Apr 4 09:38:58 rdma-qe-11 NetworkManager[12139]: <info> Activation (mlx4_ib1) Stage 3 of 5 (IP Configure Start) complete.
Apr 4 09:38:58 rdma-qe-11 dhclient[12719]: Internet Systems Consortium DHCP Client 4.2.5
Apr 4 09:38:58 rdma-qe-11 NetworkManager[12139]: <info> (mlx4_ib1): DHCPv4 client pid 12719 exited with status -1
Apr 4 09:38:58 rdma-qe-11 NetworkManager[12139]: <warn> DHCP client died abnormally

Vlad, can you run with --log-level=debug or set the log level in NetworkManager.conf so we can see exactly what NM is passing to dhclient here?

Dan, it's quite a funny race condition. Thomas advised me to disable journald's rate limiting by running:
sed -i 's/^#\?\(RateLimitInterval *= *\).*/\10/' /etc/systemd/journald.conf
sed -i 's/^#\?\(RateLimitBurst *= *\).*/\10/' /etc/systemd/journald.conf
systemctl restart systemd-journald.service
But this journald restart is what triggers it. When I restart NM afterwards, it works again. See the attached DEBUG logs.
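To make the effect of the two sed commands concrete, here is a minimal sketch that runs them against a throwaway copy instead of /etc/systemd/journald.conf. The sample file contents are an assumption (both keys commented out, with placeholder values); the sed expressions are exactly the ones quoted above. In the replacement `\10`, GNU sed reads `\1` as the captured `Key=` prefix followed by a literal `0`, so both keys end up uncommented and set to 0, which journald.conf(5) defines as disabling rate limiting.

```shell
# Work on a temporary copy; /etc/systemd/journald.conf is never touched.
conf=$(mktemp)

# Assumed sample contents: both keys commented out, with placeholder values.
cat > "$conf" <<'EOF'
[Journal]
#RateLimitInterval=30s
#RateLimitBurst=1000
EOF

# Same expressions as in the report. '^#\?' drops an optional leading '#';
# in the replacement '\10', '\1' is the captured "Key=" text and '0' is literal.
sed -i 's/^#\?\(RateLimitInterval *= *\).*/\10/' "$conf"
sed -i 's/^#\?\(RateLimitBurst *= *\).*/\10/' "$conf"

# Keep the edited text in a variable, clean up, then show it.
result=$(cat "$conf")
rm -f "$conf"
printf '%s\n' "$result"
```

It should print `[Journal]`, `RateLimitInterval=0` and `RateLimitBurst=0` on separate lines. Per this report, the subsequent `systemctl restart systemd-journald.service` is the step that left NetworkManager's DHCP handling broken until NetworkManager itself was restarted.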
Created attachment 882880
DEBUG enabled messages
Hmm, the logs don't make things clearer to me, unfortunately. Can you try something for me? Just leave everything up at that point, and:
1) nmcli dev disconnect mlx4_ib1
2) dhclient -v -d -1 mlx4_ib1
and let's see what dhclient says when it fails. If it actually crashes, would you be able to install gdb on that machine, start it with 'gdb dhclient', and then enter 'run -v -d -1 mlx4_ib1' at the gdb prompt to get a backtrace? Then we can figure out whether this is a dhclient bug or an NM bug.

Vlad, any chance you could grab the info requested in comment 5? Thanks!

(In reply to Vladimir Benes from comment #3)
> Dan, it's quite a funny race condition. Thomas advised me to disable journald's
> rate limiting by running:
> sed -i 's/^#\?\(RateLimitInterval *= *\).*/\10/' /etc/systemd/journald.conf
> sed -i 's/^#\?\(RateLimitBurst *= *\).*/\10/' /etc/systemd/journald.conf
> systemctl restart systemd-journald.service

^^^^ This says it quite clearly: it's a duplicate of bug 1136836.

I reproduced it, then updated NM to the -35 version, and it has just worked since then.

*** This bug has been marked as a duplicate of bug 1136836 ***
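The debugging procedure requested in comment 5 can be bundled into one shell function. This is a hypothetical sketch, not something from the report: the names collect_dhclient_debug, run_or_echo and DRY_RUN are illustrative; `gdb -batch -ex run -ex bt --args ...` is used as a non-interactive equivalent of typing `run -v -d -1 mlx4_ib1` and then `bt` at the gdb prompt; and treating an exit status above 128 as a crash is an assumption based on how POSIX shells report a process killed by a signal.

```shell
# Hypothetical helper bundling the steps from comment 5; run it as root.
# Set DRY_RUN=1 to print the commands instead of executing them.

# Print the command in dry-run mode, otherwise execute it.
run_or_echo() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_dhclient_debug() {
    iface="${1:-mlx4_ib1}"

    # 1) Disconnect the device in NetworkManager so it stays out of the way
    #    while dhclient is run by hand.
    run_or_echo nmcli dev disconnect "$iface"

    # 2) Run dhclient directly: -v verbose, -d stay in the foreground, -1 try only once.
    run_or_echo dhclient -v -d -1 "$iface"
    status=$?

    # 3) A status above 128 means the shell saw dhclient die from a signal
    #    (assumed to indicate a crash); rerun it under gdb without the
    #    interactive prompt and print the backtrace.
    if [ "$status" -gt 128 ]; then
        run_or_echo gdb -batch -ex run -ex bt --args dhclient -v -d -1 "$iface"
    fi
    return "$status"
}
```

For example, `DRY_RUN=1 collect_dhclient_debug mlx4_ib1` only prints the nmcli and dhclient command lines without touching the interface; dropping `DRY_RUN` runs them for real.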