Hi,
Since 13:50 I have had very high CPU usage on the routing engine. The router doesn't answer some SNMP requests (RE CPU; interface counters are OK), but apart from that it looks good and the router passes traffic. It looks like a problem with the chassisd process (nanslp state?). I don't see any sign of a SYN flood, NTP attack or other DoS. All graphs show normal traffic. What could be the reason? Could it be a bug in 11.4R5.5? I plan to restart the entire router in the next service window, but maybe there is another way to solve the problem? I am thinking about restarting only the chassisd process, but I'm not sure about the traffic impact.
Thanks for any help.
Ted
Logs from the time it began:
Jun 19 13:50:10 CHASSISD_SIGPIPE: SIGPIPE received
Jun 19 13:50:10 CHASSISD_SIGPIPE: SIGPIPE received
Jun 19 13:50:10 CHASSISD_SIGPIPE: SIGPIPE received
Jun 19 13:50:22 CHASSISD_SIGPIPE: SIGPIPE received
@MX240# run show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Master
Election priority Master (default)
Temperature 25 degrees C / 77 degrees F
CPU temperature 25 degrees C / 77 degrees F
DRAM 8960 MB
Memory utilization 46 percent
CPU utilization:
User 53 percent
Background 7 percent
Kernel 23 percent
Interrupt 0 percent
Idle 16 percent
Model RE-S-1800x2
@MX240# run show system processes extensive
last pid: 74857; load averages: 0.91, 1.00, 1.02 up 583+10:51:40 00:02:30
139 processes: 3 running, 120 sleeping, 1 zombie, 15 waiting
Mem: 2070M Active, 181M Inact, 190M Wired, 106M Cache, 214M Buf, 4755M Free
Swap: 8192M Total, 8192M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1442 root 2 8 -88 162M 136M nanslp 200.9H 83.01% chassisd
11 root 1 171 52 0K 16K RUN ??? 11.77% idle
1481 root 1 111 15 245M 243M select 179.3H 0.00% sampled
@MX240# run show chassis alarms
No alarms currently active
Temperature, power and fans are OK.
Update:
In the logs I found:
MX240 /kernel: Process (1442,chassisd) attempted to exceed RLIMIT_DATA: attempted 131076 KB Max 131072 KB
chassisd[1442]: rtslib: ERROR Failed to allocate new block of size 16384
Jun 19 22:10:46 MX240 chassisd[1442]: rtslib: ERROR Allocation Failure
and the process restarted itself:
Jun 20 04:26:20 MX240 chassisd[1442]: rtslib: ERROR Failed to allocate new block of size 16384
Jun 20 04:26:26 MX240 chassisd[1442]: ../../../../src/junos/usr.sbin/chasd/re/re_k2.c:106: insist 'dimm != NULL' failed
Jun 20 04:26:31 MX240 /kernel: pid 1442 (chassisd), uid 0: exited on signal 6 (core dumped)
Jun 20 04:26:31 MX240 init: chassis-control (PID 1442) terminated by signal number 6. Core dumped!
Jun 20 04:26:31 MX240 alarmd[1443]: shutting down chassisd connection: chassisd ipc pipe read error
Jun 20 04:26:31 MX240 fpc1 CMLC: Going disconnected; Routing engine chassis socket closed abruptly
Jun 20 04:26:31 MX240 init: chassis-control (PID 75547) started
Jun 20 04:26:31 MX240 /kernel: setsockopt(RTS_ASYNC_NEED_RESYNC) ignored (chassisd): client already active
The chassisd restart did not affect BGP sessions and there was no traffic disruption.
The router works and everything looks OK. I still don't know the cause; could it just be a bug in version 11.4R5.5?
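For anyone comparing similar logs: the RLIMIT_DATA message above reports a limit of 131072 KB (128 MB), and the rtslib allocation failures that follow would be consistent with chassisd running into that data-segment limit, though that is only my reading of the log. Below is a minimal Python sketch for pulling these messages out of a saved syslog file. The function name `rlimit_events` and the sample list are my own illustration, not anything provided by Junos, and the regular expression assumes the message format shown in this post.

```python
import re

# Sample lines copied from the syslog excerpts in this post.
LOG_LINES = [
    "MX240 /kernel: Process (1442,chassisd) attempted to exceed RLIMIT_DATA: "
    "attempted 131076 KB Max 131072 KB",
    "Jun 20 04:26:31 MX240 /kernel: pid 1442 (chassisd), uid 0: "
    "exited on signal 6 (core dumped)",
]

# Assumed message format, based only on the line quoted above.
RLIMIT_RE = re.compile(
    r"Process \((?P<pid>\d+),(?P<name>[^)]+)\) attempted to exceed RLIMIT_DATA: "
    r"attempted (?P<attempted>\d+) KB Max (?P<limit>\d+) KB"
)


def rlimit_events(lines):
    """Yield (pid, name, attempted_kb, limit_kb) for each RLIMIT_DATA message."""
    for line in lines:
        match = RLIMIT_RE.search(line)
        if match:
            yield (
                int(match["pid"]),
                match["name"],
                int(match["attempted"]),
                int(match["limit"]),
            )


if __name__ == "__main__":
    for pid, name, attempted, limit in rlimit_events(LOG_LINES):
        over_kb = attempted - limit
        print(f"{name}[{pid}]: limit {limit // 1024} MB, request over by {over_kb} KB")
```

Run against the two sample lines, this reports that chassisd (PID 1442) asked for 4 KB more than its 128 MB limit. Counting how often such lines appear over time might help show whether the process memory was growing steadily before the crash.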