SRX

last person joined: 2 days ago 

Ask questions and share experiences about the SRX Series, vSRX, and cSRX.
  • 1.  SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-12-2012 15:53
      |   view attached

    This afternoon, an SRX 220 that's been running with average CPU utilization around 10% since it was installed 2 months ago suddenly shot up to over 80% sustained CPU.  (we use Solarwinds Orion to monitor and graph)  This has been going on for several hours, sustained.  We've made no configuration changes recently.

     

    See attached performance graph image.  it's after business hours and our traffic is near zero.

     

    Can anyone suggest what might be causing this, and what the solution might be?  (firmware is 11.2R6.3)

     

     

    root@SRX1> show system processes extensive
    last pid: 65805;  load averages:  3.05,  2.09,  1.78  up 1+01:18:11    18:47:26
    136 processes: 19 running, 104 sleeping, 2 zombie, 11 waiting

    Mem: 131M Active, 113M Inact, 529M Wired, 144M Cache, 112M Buf, 53M Free
    Swap:


      PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
     1149 root        5  76    0   498M 52568K select 0  27.9H 98.44% flowd_octeon_hm
       22 root        1 171   52     0K    16K RUN    0  18.1H 11.23% idle: cpu0
    65799 root        1  -8    0  8288K  3044K RUN    0   0:00  1.46% sshd
    65800 root        1   4    0  7728K  2948K sbwait 0   0:00  1.46% sshd
    65801 sshd        1   4    0  7768K  1728K sbwait 0   0:00  0.73% sshd
       24 root        1 -20 -139     0K    16K RUN    0  14:06  0.00% swi7: clock
     1144 root        1  76    0 20032K  8864K select 0  12:47  0.00% kmd
       23 root        1 -40 -159     0K    16K WAIT   0   9:07  0.00% swi2: net
      894 root        1  76    0  8100K  2708K select 0   8:18  0.00% eventd
        5 root        1 -16    0     0K    16K rtfifo 0   3:51  0.00% rtfifo_kern_recv
     1136 root        1  76    0 17692K 10352K select 0   3:32  0.00% snmpd
     1173 root        1  76    0 10576K  3880K select 0   3:09  0.00% license-check
     1139 root        1  76    0 17500K  6848K select 0   2:59  0.00% l2ald

     

     



  • 2.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 03:39

    What does the following show from the shell command?

     

    user@srx>start shell
    %top -H

     



  • 3.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 04:39

    Hello..... looks like the same process as above (flowd_octeon_hm)


     

     

    ast pid: 66011;  load averages:  0.90,  0.65,  0.50                                                     up 1+14:08:53  07:38:08
    72 processes:  3 running, 68 sleeping, 1 zombie
    CPU states: 64.2% user,  0.0% nice,  5.6% system,  0.0% interrupt, 30.2% idle
    Mem: 132M Active, 113M Inact, 529M Wired, 144M Cache, 112M Buf, 51M Free
    Swap:

      PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
     1149 root     139    0   498M 52444K CPU1   1  42.2H 92.48% flowd_octeon_hm
     1149 root      78    0   498M 52444K select 0  42.2H  5.22% flowd_octeon_hm
    66011 sshd     104    0  7736K  1576K RUN    0   0:00  0.73% sshd
     1149 root      76    0   498M 52444K select 0  42.2H  0.00% flowd_octeon_hm
     1149 root      76    0   498M 52444K select 0  42.2H  0.00% flowd_octeon_hm
     1149 root       8    0   498M 52444K nanslp 0  42.2H  0.00% flowd_octeon_hm
     1144 root      76    0 20076K  8908K select 0  20:27  0.00% kmd
      894 root      76    0  8100K  2708K select 0  12:46  0.00% eventd
     1136 root      76    0 17692K 10352K select 0   5:17  0.00% snmpd
     1173 root      76    0 10576K  3880K select 0   4:44  0.00% license-check
     1139 root      76    0 17500K  6848K select 0   4:29  0.00% l2ald
     1137 root      76    0 17472K  7732K select 0   4:23  0.00% mib2d
     1168 root      76    0 11832K  4676K select 0   3:33  0.00% rtlogd
     1145 root      76    0  9068K  3400K select 0   3:18  0.00% ppmd
     1179 nobody    76    0 10368K  4428K ucondt 0   2:32  0.00% httpd
     1179 nobody    76    0 10368K  4428K ucondt 0   2:32  0.00% httpd



  • 4.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 04:56

    This is normal.  The flowd_octeon thread usually runs at high percentage in order to make sure that all the packets received are processed faster.  This is why flowd reads high always, however it does not mean that system is busy.

     

    The actual CPU usage can be viewed below.

     

    For Routing Engine (Control Plane) CPU usage:

      show chassis routing-engine

     

    For Packet Forwarding Engine (Data Plane) CPU usage:

      show chassis forwarding

      show security monitoring fpc 0

     

    Here is an output from an SRX240, the multicore processor, people sometimes wonder why the CPU% is 270%, but it is normal, 90% per core.

     

    PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
     1095 root        7  76    0   499M 49852K select 0    ??? 277.39% flowd_octeon_hm
       22 root        1 171   52     0K    16K RUN    0 3720.0 87.21% idle: cpu0
       23 root        1 -20 -139     0K    16K WAIT   0  41.1H  0.00% swi7: clock
     1120 root        1  76    0  8788K  3740K select 0 582:56  0.00% license-check
        5 root        1 -84    0     0K    16K rtfifo 0 502:39  0.00% rtfifo_kern_recv
     1086 root        1  76    0 17084K  6696K select 0 325:33  0.00% l2ald
       25 root        1 -40 -159     0K    16K WAIT   0 200:36  0.00% swi2: net
     1099 root        3  20    0 43940K 13416K sigwai 0 173:25  0.00% authd

     

    And the top -H shows each CPU

     

    PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
     1095 root     139    0   499M 49852K CPU3   3    ??? 91.70% flowd_octeon_hm
     1095 root     139    0   499M 49852K CPU2   2    ??? 91.70% flowd_octeon_hm
     1095 root     139    0   499M 49852K CPU1   1    ??? 91.70% flowd_octeon_hm
     1095 root      77    0   499M 49852K RUN    0    ???  1.46% flowd_octeon_hm
     1095 root      76    0   499M 49852K select 0    ???  0.00% flowd_octeon_hm
     1095 root      76    0   499M 49852K select 0    ???  0.00% flowd_octeon_hm
     1095 root       8    0   499M 49852K nanslp 0    ???  0.00% flowd_octeon_hm
     1120 root      76    0  8788K  3740K select 0 582:55  0.00% license-check
     1086 root      76    0 17084K  6696K select 0 325:33  0.00% l2ald
     1099 root      96    0 43940K 13416K ucond  0 173:25  0.00% authd
     1099 root      76    0 43940K 13416K select 0 173:25  0.00% authd
     1099 root      20    0 43940K 13416K sigwai 0 173:25  0.00% authd
     1088 root      76    0 22016K  6800K select 0 155:18  0.00% pfed




  • 5.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 05:15

    But notice the performance graph I attached to the original post.

     

    This CPU spike is sudden an significant for this device, as depoloyed in our company.  We've been monitoring it for 2 months and it's always averaged around 10% for all that time.  Until yesterday.

     

    Something has changed, and it wasn't a config change because we haven't made any.



  • 6.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 06:23

    Yup, I know, from Solad Winds it states:

     

    "This poller should pull back the equivilant of "show security monitoring fpc 0" on an SRX"

     

    What does the following commands return?

     

    For Packet Forwarding Engine (Data Plane) CPU usage:

      show chassis forwarding

      show security monitoring fpc 0



  • 7.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 06:48

    Here you go......

     

    root@SRX1> show chassis forwarding
    FWDD status:
      State                                 Online   
      Microkernel CPU utilization         7 percent
      Real-time threads CPU utilization   1 percent
      Heap utilization                   52 percent
      Buffer utilization                  1 percent
      Uptime:                               1 day, 16 hours, 10 minutes, 5 seconds

    root@SRX1> show security monitoring fpc 0
    FPC 0
      PIC 0
        CPU utilization      :    2 %
        Memory utilization   :   52 %
        Current flow session :  426
        Max flow session     : 98304
    Session Creation Per Second (for last 96 seconds on average):    0

     

     

    Overnight the CPU utilization dropped suddenly from 80% to 40% and is holding steady in the low 40s.  An improvement, but still a far cry from "normal" for this device.

    Also note that uptime of 1 day 16 hours was after a recovery from a power outage, which outlasted the UPS battery.  It was an unplanned un-graceful shutdown.  Could that have anything to do with it?  I didn't mention this before, because the unit ran for about 22 hours at "normal" 10% CPU before the sudden spike to 80% sustained.



  • 8.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization
    Best Answer

    Posted 07-13-2012 07:04

    Hi,

     

    Well I do not know much about Solar Windows but somwhere along the way, either the SRX or the Solar Windows Product is giving/providing/showing incorrect data.

     

    The real readings above are of course correct.  So I would try and restart the SNMP Daemon first.

     

    user@srx>restart snmp

     

    I also notice your graph is "Average" CPU load and not a true current reading.  Im leaning toward an issue with the way the Solar Windows product has handled the unexpected system shutdown.  I expect resetting the data would result in it dropping down again.

     

    Can it show a running CPU graph?



  • 9.  RE: SRX 220 - Sudden Very High Sustained CPU Utilization

    Posted 07-13-2012 11:12
      |   view attached

    See attached graph.

     

    I restarted snmp on the SRX and the solarwinds processes, and reported CPU average utilization is back down around 10%.

     

    It's interesting that the anomolous spike didn't happen right after the router came back up from the power outage, but rather, it happend about 22 hours later.  Hmmmm.

     

    Thanks for you help and suggestions.