Routing

BGP failed: Broken pipe

‎12-14-2018 04:36 AM

Hello,

We are observing a problem on an MX10T: sometimes BGP sessions go down.

One of them:

Dec 14 13:39:50 sr1 rpd[1297]: bgp_send: sending 90 bytes to xxx.xxx.xxx.xxx (External AS 477xx) failed: Broken pipe
Dec 14 13:39:50 sr1 rpd[1297]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer xxx.xxx.xxx.xxx (External AS 477xx) changed state from Established to Idle (event TransportError)
Dec 14 13:39:59 sr1 rpd[1297]: bgp_pp_recv: rejecting connection from xxx.xxx.xxx.xxx (External AS 477xx), peer in state Idle
Dec 14 13:39:59 sr1 rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to xxx.xxx.xxx.xxx+55030 (proto): code 6 (Cease) subcode 5 (Connection Rejected)

Other sessions fail with the same symptoms: Broken pipe, TransportError, Idle, Connection Rejected.

 

Neighbors are connected through a trunk port, and about 90% of those sessions fail. Sessions connected directly by cable to the MX didn't go down.
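For reference, a quick triage of a flapping peer from the MX CLI might look like this (peer address is a placeholder, and the # notes are annotations, not part of the commands):

```
show bgp summary                                      # which peers are Established vs Idle/Active
show bgp neighbor xxx.xxx.xxx.xxx                     # last error, flap count, negotiated hold time
show log messages | match "TransportError|bgp_send"   # recent transport failures
```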

 

7 REPLIES

Re: BGP failed: Broken pipe

‎12-14-2018 05:52 AM

We need to look for a common element that can cause all the peers to have problems communicating at the same time.

 

If all of the sessions lost are on the same link, what errors, flaps or physical statistics do we see on that link?

 

Are the other peers in those sessions also on the same router or do they go to different routers?

 

What is the full layer 2 path for the connection and what else in common on that path could introduce communications problems?
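A sketch of Junos commands that would answer those questions (the interface name is an example; substitute the trunk port in question):

```
show interfaces xe-0/0/3 extensive | find "Input errors"   # error counters on the trunk
show interfaces xe-0/0/3 | match "Last flapped"            # when the link last bounced
show log messages | match SNMP_TRAP_LINK                   # link up/down history
```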

 

Steve Puluka BSEET - Juniper Ambassador
IP Architect - DQE Communications Pittsburgh, PA (Metro Ethernet & ISP)
http://puluka.com/home

Re: BGP failed: Broken pipe

‎12-14-2018 06:22 AM

Hello, 

 

Port statistics on MX10:

Last flapped   : 2018-10-24 11:11:50 MSK (7w2d 05:57 ago)

 

When BGP sessions go down, a couple of sessions continue to work. They carry the same routes but use different irb interfaces.

 

The L2 path looks like: MX10 - EX4550-32F - EX4200-48T. I have other connections on the last switch and I don't see the problem on them.

 

For example, I have one session with our ISP; they are connected to the second switch (EX4500-32F).

Path: switch port xe-0/0/13 - same switch port xe-0/0/31, trunk to MX (all VLANs) - MX10T port xe-0/0/3 - irb interface

Optics diagnostics:

    Laser output power                        :  0.5910 mW / -2.28 dBm
    Module temperature                        :  39 degrees C / 102 degrees F
    Module voltage                            :  3.3370 V
    Receiver signal average optical power     :  0.5279 mW / -2.77 dBm
    Laser output power                        :  0.5270 mW / -2.78 dBm
    Module temperature                        :  37 degrees C / 98 degrees F
    Module voltage                            :  3.3330 V
    Receiver signal average optical power     :  0.4586 mW / -3.39 dBm

and this session dropped with the same errors:

 

Dec 14 13:45:14  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:45:14  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+30344 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:45:43  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:45:43  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+50268 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:46:28  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:46:28  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+39976 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:46:54  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:46:54  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+36526 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:47:38  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:47:38  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+31796 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:48:24  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:48:24  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+33350 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:48:51  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:48:51  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+51746 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:49:33  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:49:33  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+28659 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:50:03  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:50:03  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+39788 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:50:53  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:50:53  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+28874 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:51:40  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:51:40  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+41956 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:52:07  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:52:07  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+31881 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:52:47  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:52:47  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+31010 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:53:19  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:53:19  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+38220 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:53:54  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:53:54  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+40149 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:54:32  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:54:32  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+24109 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:55:11  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:55:11  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+17577 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:55:44  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:55:44  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+55018 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:56:30  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:56:30  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+24127 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:57:09  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:57:09  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+61292 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:57:55  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:57:55  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+43626 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:58:42  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:58:42  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+22469 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:59:10  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:59:10  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+33366 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 13:59:45  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 13:59:45  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+44166 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 14:00:13  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 14:00:13  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+20126 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 14:00:47  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 14:00:47  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+58556 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 14:01:22  MX10T rpd[1297]: bgp_pp_recv: rejecting connection from XXX.XXX.254.36 (External AS 20XXX), peer in state Idle
Dec 14 14:01:22  MX10T rpd[1297]: bgp_pp_recv:3130: NOTIFICATION sent to XXX.XXX.254.36+22305 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
Dec 14 14:01:51  MX10T rpd[1297]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer XXX.XXX.254.36 (External AS 20XXX) changed state from OpenConfirm to Established (event RecvKeepAlive)

 

Switch error statistics:

 Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 5, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0

 

 

 

The problem repeated. This time I got other session drops. When this happens, my RE CPU load is 100% (screenshot: re0-cpu.png).
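To see which process is responsible while the RE CPU is pegged, the usual commands would be (run during the event):

```
show chassis routing-engine          # RE CPU breakdown: user / kernel / interrupt / idle
show system processes extensive      # top-style per-process list; look for rpd, mgd, etc. at high WCPU
```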


Re: BGP failed: Broken pipe

‎12-14-2018 07:40 AM

Hi Roman,

 

"BGP failed: Broken pipe" is a general error. It means that router got a TCP reset request

from the peer device. For example if the peer cleared the tcp-session but your MX80 hasn't

received the notification (detected that bgp-session is down) and sent any message to peer

it will make the peer send a TCP reset, resulting in broken pipe. This probably points to broken

transmission between these peers.

 

There might be several reasons for it, for example an RPD scheduler slip on your MX or on the peer. Or it might be a bad link between the peers or a bad interface on either side.
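Scheduler slips are logged by rpd, so a quick check would be something like:

```
show log messages | match RPD_SCHED_SLIP   # any slips correlated with the flap times?
```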

 

Thanks,

Alex


Re: BGP failed: Broken pipe

‎12-15-2018 08:48 AM

High CPU could be the root cause. During the event, run this checklist to determine which process is responsible for the high CPU:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261

 

Steve Puluka BSEET - Juniper Ambassador
IP Architect - DQE Communications Pittsburgh, PA (Metro Ethernet & ISP)
http://puluka.com/home

Re: BGP failed: Broken pipe

‎12-17-2018 06:49 AM

Dec 17 17:16:09  xx.xx rpd[1297]: RPD_SCHED_SLIP: 5 sec scheduler slip, user: 0 sec 683289 usec, system: 0 sec, 0 usec

I got this today; after this message, half of my sessions went down.


Re: BGP failed: Broken pipe

‎12-17-2018 04:48 PM

This message is listed in the high-CPU article as indicating problems with the kernel process. The article lists two common causes you can check.

 

Some of the reasons for high kernel CPU are as follows:

 

  • CLI sessions are not closed gracefully on the router.  In this case, one would see mgd running high on CPU, starving the kernel of CPU cycles. 
    1059 root 1 132 0 24344K 18936K RUN 405.0H 43.75% mgd
    26275 root 1 132 0 24344K 18936K RUN 353.5H 43.75% mgd
    
    CPU utilization:
          User                      12 percent
          Kernel                    87 percent
          Interrupt                  0 percent
          Idle                       0 percent
    
    One way to address this issue is to kill the mgd processes eating up the CPU.


  • 'Sampling' is enabled on the router.  This sometimes leads to high kernel CPU; to address this, reduce the rate at which you are sampling on the router.
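If sampling does turn out to be the culprit, the rate can be lowered in configuration mode; the exact stanza varies by Junos release, so treat this as a sketch:

```
set forwarding-options sampling input rate 1000   # sample roughly 1 in 1000 packets
commit
```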

 

 

https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261

 

Steve Puluka BSEET - Juniper Ambassador
IP Architect - DQE Communications Pittsburgh, PA (Metro Ethernet & ISP)
http://puluka.com/home

Re: BGP failed: Broken pipe

‎12-18-2018 08:27 AM

Hi Roman,

 

You should not see BGP flaps from a single short RPD slip like this:

> Dec 17 17:16:09  xx.xx rpd[1297]: RPD_SCHED_SLIP: 5 sec scheduler slip, user: 0 sec 683289 usec, system: 0 sec, 0 usec

 

Maybe you have aggressive BGP timers - a hold time of ~5s.

What was the trigger for this RPD slip? In other words, do you see anything suspicious in the logs around "Dec 17 17:16:04" (slip time - 5 sec)? Maybe a commit or an interface flap?
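Both points can be checked directly on the box; the timestamp below matches the slip log above:

```
show bgp neighbor xxx.xxx.xxx.xxx | match "[Hh]oldtime"   # configured and active (negotiated) hold time
show log messages | find "Dec 17 17:16:0"                 # events in the seconds before the slip
```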

 

Also please note that in your case RPD was starved of CPU resources while some other process(es) consumed CPU, so apparently your MX80 has reached, or soon will reach, its maximum scale.

 

Thanks,

Alex