  • 1.  MX240 - chassisd process takes whole CPU

    Posted 06-19-2014 15:29

    Hi,

    Since 13:50 I have had very high CPU on the routing engine. The router doesn't answer some SNMP requests (for example RE CPU; interface counters are OK), but otherwise it looks good and passes traffic. The problem appears to be the chassisd process (in the nanslp state?). I don't see any sign of a SYN flood, NTP attack, or other DoS, and all graphs show normal traffic. What could be the reason? Could it be a bug in 11.4R5.5? I plan to restart the entire router in the next service window, but maybe there is another way to solve the problem? I am thinking about restarting only the chassisd process, but I'm not sure about the traffic impact.

    Thanks for any help.
    Ted

    Logs from the time it began:

    Jun 19 13:50:10 CHASSISD_SIGPIPE: SIGPIPE received
    Jun 19 13:50:10 CHASSISD_SIGPIPE: SIGPIPE received
    Jun 19 13:50:10 CHASSISD_SIGPIPE: SIGPIPE received
    Jun 19 13:50:22 CHASSISD_SIGPIPE: SIGPIPE received


    @MX240# run show chassis routing-engine
    Routing Engine status:
      Slot 0:
        Current state                  Master
        Election priority              Master (default)
        Temperature                 25 degrees C / 77 degrees F
        CPU temperature             25 degrees C / 77 degrees F
        DRAM                      8960 MB
        Memory utilization          46 percent
        CPU utilization:
          User                      53 percent
          Background                 7 percent
          Kernel                    23 percent
          Interrupt                  0 percent
          Idle                      16 percent
        Model                          RE-S-1800x2



    @MX240# run show system processes extensive

    last pid: 74857; load averages: 0.91, 1.00, 1.02 up 583+10:51:40 00:02:30
    139 processes: 3 running, 120 sleeping, 1 zombie, 15 waiting

    Mem: 2070M Active, 181M Inact, 190M Wired, 106M Cache, 214M Buf, 4755M Free
    Swap: 8192M Total, 8192M Free


      PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
     1442 root       2   8  -88   162M   136M nanslp 200.9H 83.01% chassisd
       11 root       1 171   52     0K    16K RUN       ??? 11.77% idle
     1481 root       1 111   15   245M   243M select 179.3H  0.00% sampled
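For anyone who wants to watch for this condition automatically, here is a minimal Python sketch that picks out busy processes from top-style output like the one above. It assumes the whitespace-separated column layout shown in this post; the function name `busy_processes` and the 50 percent threshold are hypothetical choices for illustration, not part of Junos.

```python
# Sketch: flag processes whose WCPU exceeds a threshold in top-style output.
# The column layout is assumed from the output quoted in this post.
SAMPLE = """\
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1442 root 2 8 -88 162M 136M nanslp 200.9H 83.01% chassisd
11 root 1 171 52 0K 16K RUN ??? 11.77% idle
1481 root 1 111 15 245M 243M select 179.3H 0.00% sampled
"""

THRESHOLD = 50.0  # percent; hypothetical cutoff, tune as needed


def busy_processes(text, threshold=THRESHOLD):
    """Return (pid, command, wcpu) for non-idle processes above the threshold."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    pid_idx = header.index("PID")
    wcpu_idx = header.index("WCPU")
    cmd_idx = header.index("COMMAND")
    result = []
    for row in rows:
        if len(row) != len(header):
            continue  # skip rows that do not match the header layout
        cpu = float(row[wcpu_idx].rstrip("%"))
        if row[cmd_idx] != "idle" and cpu > threshold:
            result.append((int(row[pid_idx]), row[cmd_idx], cpu))
    return result


print(busy_processes(SAMPLE))
```

With the sample rows from this post, only chassisd is reported.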

     

     @MX240# run show chassis alarms
    No alarms currently active

    Temperature, power, and fans are OK.

     

     

    Update:

    In the logs I found:

    MX240 /kernel: Process (1442,chassisd) attempted to exceed RLIMIT_DATA: attempted 131076 KB Max 131072 KB
    chassisd[1442]: rtslib: ERROR Failed to allocate new block of size 16384
    Jun 19 22:10:46 MX240 chassisd[1442]: rtslib: ERROR Allocation Failure
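To put the numbers in the kernel message into perspective, here is a small Python sketch that parses the RLIMIT_DATA line and reports the limit in MB and how far the request overshot it. The regular expression is written against the exact message text quoted above, and `parse_rlimit` is a hypothetical helper name; other Junos releases may format the message differently.

```python
import re

# Kernel message copied from the log excerpt above.
LINE = ("MX240 /kernel: Process (1442,chassisd) attempted to exceed "
        "RLIMIT_DATA: attempted 131076 KB Max 131072 KB")

# Pattern assumed from the message text in this post.
PATTERN = re.compile(
    r"Process \((?P<pid>\d+),(?P<name>[^)]+)\) attempted to exceed "
    r"RLIMIT_DATA: attempted (?P<attempted>\d+) KB Max (?P<limit>\d+) KB"
)


def parse_rlimit(line):
    """Extract pid, process name, limit in MB, and overshoot in KB."""
    m = PATTERN.search(line)
    if m is None:
        return None
    attempted = int(m["attempted"])
    limit = int(m["limit"])
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "limit_mb": limit / 1024,
        "over_kb": attempted - limit,
    }


print(parse_rlimit(LINE))
```

For this line the limit works out to 128 MB, and the failed request went 4 KB past it.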

     

    Later the process restarted itself:


    Jun 20 04:26:20 MX240 chassisd[1442]: rtslib: ERROR Failed to allocate new block of size 16384
    Jun 20 04:26:26 MX240 chassisd[1442]: ../../../../src/junos/usr.sbin/chasd/re/re_k2.c:106: insist 'dimm != NULL' failed
    Jun 20 04:26:31 MX240 /kernel: pid 1442 (chassisd), uid 0: exited on signal 6 (core dumped)
    Jun 20 04:26:31 MX240 init: chassis-control (PID 1442) terminated by signal number 6. Core dumped!
    Jun 20 04:26:31 MX240 alarmd[1443]: shutting down chassisd connection: chassisd ipc pipe read error
    Jun 20 04:26:31 MX240 fpc1 CMLC: Going disconnected; Routing engine chassis socket closed abruptly
    Jun 20 04:26:31 MX240 init: chassis-control (PID 75547) started
    Jun 20 04:26:31 MX240 /kernel: setsockopt(RTS_ASYNC_NEED_RESYNC) ignored (chassisd): client already active

     

    The restart did not affect the BGP sessions, and there was no traffic disruption.

    The router works and everything looks OK. I still don't know the cause; maybe it is just a bug in version 11.4R5.5?

     

     



  • 2.  RE: MX240 - chassisd process takes whole CPU
    Best Answer

     
    Posted 06-20-2014 06:12

    If you want to find out why chassisd is running at high CPU, you will need to open a JTAC case.

     

    On the other hand, you mentioned that you want to restart the box or the chassisd.

     

    Is this a dual-RE or a single-RE system? Do you have GRES configured?

     

    Usually, when you restart chassisd, the FPCs disconnect from the RE and all RE protocols flap.

     

    You can try a soft restart of chassisd with 'restart chassis-control soft'. This will not disconnect the FPCs, and chassisd will keep the same PID.

     

    In all other situations, the FPCs will disconnect (even if the restart is graceful) and the PID will change.

     

    The soft restart is not as complete a restart as a graceful one, but you can try it first and see whether it fixes your problem.
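Since the PID is the distinguishing sign here (a soft restart keeps it, any other restart changes it), a quick check is to compare the chassis-control PIDs in the init log lines. The Python sketch below does that for the two init lines quoted in the original post; the regular expression and the helper names `chassisd_pids` and `pid_changed` are assumptions for illustration, matched only against that message format.

```python
import re

# init messages copied from the log excerpt in the original post.
LOG = """\
Jun 20 04:26:31 MX240 init: chassis-control (PID 1442) terminated by signal number 6. Core dumped!
Jun 20 04:26:31 MX240 init: chassis-control (PID 75547) started
"""

# Pattern assumed from the two message formats above.
EVENT = re.compile(r"init: chassis-control \(PID (\d+)\) (terminated|started)")


def chassisd_pids(log_text):
    """Return (pid, event) tuples in the order they appear in the log."""
    return [(int(pid), event) for pid, event in EVENT.findall(log_text)]


def pid_changed(log_text):
    """True if more than one distinct chassis-control PID appears."""
    return len({pid for pid, _ in chassisd_pids(log_text)}) > 1


print(chassisd_pids(LOG))
print(pid_changed(LOG))
```

In this excerpt the PID goes from 1442 to 75547, which by the description above would point to a full restart rather than a soft one, though that is only a reading of the log text.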

     

     

     

    =====

    If this worked for you please flag my post as an "Accepted Solution" so others can benefit. A kudo would be cool if you think I earned it.



  • 3.  RE: MX240 - chassisd process takes whole CPU

     
    Posted 06-20-2014 06:15

    I just saw that you updated your initial post.

     

    It looks like a memory leak: chassisd hit its RLIMIT_DATA limit (131072 KB) and then failed to allocate memory.

     

    Do you still have the core dump? If yes, you should open a case to investigate this.

     

     

     


     



  • 4.  RE: MX240 - chassisd process takes whole CPU

    Posted 06-22-2014 03:16

    I know that my Junos is quite old, so first I have to upgrade to a newer stable version. Thanks for the good explanation. I suppose Junos did an automatic soft restart of the chassisd process, because the BGP sessions didn't disconnect.