SRX


Ask questions and share experiences about the SRX Series, vSRX, and cSRX.

SRX550 High Memory strange issue

  • 1.  SRX550 High Memory strange issue

    Posted 01-18-2019 01:23

    Hi All!

    I have a strange issue with SRX550 High Memory.

    It is connected to 2 ISPs with BGP (full view, filtered to /24).

    Some time after the BGP sessions come up, the log shows:

    Jan 17 22:53:06 srx550-1 fto_new: failed to allocate fto
    Jan 17 22:53:06 srx550-1 RT: IPv4:0 - 205.253/16 (RT: Failed to allocate object for flow)
    Jan 17 22:53:06 srx550-1 RT-HAL,rt_entry_add_msg_proc,3405: rt_halp_vectors->rt_create failed
    Jan 17 22:53:06 srx550-1 RT-HAL,rt_entry_add_msg_proc,3466: proto ipv4,len 16 prefix 205.253/16 nh 1342
    Jan 17 22:53:06 srx550-1 RT-HAL,rt_msg_handler,688: route process failed
    Jan 17 22:53:06 srx550-1 fto_new: failed to allocate fto
    Jan 17 22:53:06 srx550-1 RT: IPv4:0 - 49.40.33/24 (RT: Failed to allocate object for flow)
    Jan 17 22:53:06 srx550-1 RT-HAL,rt_entry_add_msg_proc,3405: rt_halp_vectors->rt_create failed
    Jan 17 22:53:06 srx550-1 RT-HAL,rt_entry_add_msg_proc,3466: proto ipv4,len 24 prefix 49.40.33/24 nh 1342
    Jan 17 22:53:06 srx550-1 RT-HAL,rt_msg_handler,688: route process failed
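    To get a feel for how widespread these failures are, the log can be summarized offline. The following is a minimal sketch, assuming the messages file has been copied off the device; the regular expression is inferred only from the sample lines above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines in this post (an assumption,
# not an official Junos log format definition).
FAILED_ROUTE = re.compile(
    r"RT: IPv4:\d+ - (?P<prefix>[\d.]+/\d+) \(RT: Failed to allocate object for flow\)"
)

def summarize_fto_failures(log_text):
    """Return (number of fto allocation failures, Counter of affected prefix lengths)."""
    alloc_failures = 0
    by_length = Counter()
    for line in log_text.splitlines():
        if "fto_new: failed to allocate fto" in line:
            alloc_failures += 1
        match = FAILED_ROUTE.search(line)
        if match:
            prefix_length = int(match.group("prefix").split("/")[1])
            by_length[prefix_length] += 1
    return alloc_failures, by_length

sample = """\
Jan 17 22:53:06 srx550-1 fto_new: failed to allocate fto
Jan 17 22:53:06 srx550-1 RT: IPv4:0 - 205.253/16 (RT: Failed to allocate object for flow)
Jan 17 22:53:06 srx550-1 fto_new: failed to allocate fto
Jan 17 22:53:06 srx550-1 RT: IPv4:0 - 49.40.33/24 (RT: Failed to allocate object for flow)
"""

failures, lengths = summarize_fto_failures(sample)
print(failures, dict(lengths))
```

    If failures show up across all prefix lengths rather than only the longest ones, that would hint at general memory exhaustion rather than a problem with specific routes.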

     

    The commands "show chassis routing-engine" and "show security flow" do not return any suspicious info.

    I tried filtering routes to /23, but the result is still the same.

    Moreover, another SRX of the same model with the same config works like a charm.

    I assumed that something was wrong with the DRAM, so I took the DRAM modules from the "working" SRX and put them into the "bad" one, but it did not help.

    I will really appreciate any help, thank you!

     



  • 2.  RE: SRX550 High Memory strange issue

    Posted 01-18-2019 02:50

    The memory issue may be because you are exceeding the maximum BGP route table size on the SRX550, which is only 712k routes. With dual full tables you will have a very large RIB and FIB that is likely over this limit.

     

    https://www.juniper.net/us/en/local/pdf/datasheets/1000281-en.pdf

     



  • 3.  RE: SRX550 High Memory strange issue

    Posted 01-18-2019 02:57

    Thx, but that is not the reason, as:

    1. another device of the same model works with the same configuration

    2. reducing route size to /23 (that's less than 712k) makes no difference



  • 4.  RE: SRX550 High Memory strange issue

    Posted 01-18-2019 23:34

    Hello Dmytro,

     

    Can you share a "show route summary" command from both SRXs?

     



  • 5.  RE: SRX550 High Memory strange issue

    Posted 01-19-2019 06:03

    Hi there!

    This is how it looks on a "good" working SRX:

    inet.0: 739887 destinations, 1463164 routes (735990 active, 0 holddown, 5753 hidden)
    Direct: 4 routes, 4 active
    Local: 4 routes, 4 active
    BGP: 1463152 routes, 735978 active
    Static: 3 routes, 3 active
    Aggregate: 1 routes, 1 active
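    The table summary line above can be checked against the 712k datasheet figure mechanically. This is a minimal sketch, assuming the line format shown in this post; the constant and function names are hypothetical, and since the thread has not yet settled whether the limit applies to total or active routes, both are reported.

```python
import re

# Datasheet figure quoted earlier in the thread, used here as a plain constant.
BGP_ROUTE_LIMIT = 712_000

# Format inferred from the "inet.0: ..." line pasted above.
SUMMARY = re.compile(
    r"(?P<table>\S+): (?P<destinations>\d+) destinations, (?P<routes>\d+) routes "
    r"\((?P<active>\d+) active, (?P<holddown>\d+) holddown, (?P<hidden>\d+) hidden\)"
)

def parse_route_summary(line):
    """Parse one routing-table summary line into a dict of integers (plus table name)."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a route table summary line")
    fields = match.groupdict()
    return {key: (value if key == "table" else int(value)) for key, value in fields.items()}

line = "inet.0: 739887 destinations, 1463164 routes (735990 active, 0 holddown, 5753 hidden)"
summary = parse_route_summary(line)
for key in ("routes", "active"):
    status = "over limit" if summary[key] > BGP_ROUTE_LIMIT else "within limit"
    print(f"{summary['table']} {key}: {summary[key]} ({status})")
```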

     

    Unfortunately, I can't provide the same info for the "bad" one, as it is not connected to any upstream right now.

    But even when I cut the number of routes roughly in half (to something like 300-400K), I still see the following messages in the log:

     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
     RT-HAL,rt_msg_handler,673: route check failed 22
    last message repeated 1257 times
     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
    /kernel: last message repeated 1255 times
     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
     RT-HAL,rt_msg_handler,673: route check failed 22
     last message repeated 1167 times
     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
     /kernel: last message repeated 1181 times
     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
    and I no longer see messages like (RT: Failed to allocate object for flow)

     



  • 6.  RE: SRX550 High Memory strange issue

    Posted 01-19-2019 07:51

    I think the first step will be to identify the process(es) responsible.

    Confirm the overall memory usage status during the issue.

    show chassis routing-engine

     

    Look for the processes associated with high memory

    show system processes extensive

     

    If this is not conclusive you can drop to the shell and use top interactively

    start shell

    top -H

     

    Once you know the responsible daemons we can figure out what functions are responsible.

     



  • 7.  RE: SRX550 High Memory strange issue

    Posted 01-20-2019 06:16

    Hello there,

    Do You have Netflow sampling configured on the SRX550 that has memory allocation failures?

    HTH

    Thx
    Alex



  • 8.  RE: SRX550 High Memory strange issue

     
    Posted 01-19-2019 19:28

    Hello Dmytro,

     

    Yes, definitely looks like a capacity issue. Have you tried bringing up one ISP at a time?

     

    Can you get the following command outputs whenever you get a chance to connect the device back in the problem state?

    > request pfe execute target fwdd command "show usp fto stats"

    > request pfe execute target fwdd command "show route summary"

    > show route summary

    > show system memory

    > show version

    > show chassis hardware

     

    Regards,

     

    Vikas

    JTAC - CFTS



  • 9.  RE: SRX550 High Memory strange issue

    Posted 01-21-2019 13:13

    Hi All!

    Sorry for the delay, but I can perform testing only late in the evening.

    Regarding NetFlow, the answer is no.

    All the debugging info is attached:

    24.txt - 2 ISPs connected, accepting routes up to /24

    Then I changed the route filter to /23:

    23.txt - 2 ISPs connected, accepting routes up to /23

    Then I disconnected 1 ISP:

    1isp23.txt - 1 ISP connected, accepting routes up to /23

     

    show version

    Model: srx550m
    Junos: 15.1X49-D110.4
    JUNOS Software Release [15.1X49-D110.4]


    show chassis hardware
    Hardware inventory:
    Item Version Part number Serial number Description
    Chassis xxxxxxxxxxxxx SRX550M
    Midplane REV 10 750-063950 ACPW0869
    Routing Engine REV 06 711-062269 ACPT0458 RE-SRX550M
    FPC 0 FPC
    PIC 0 6x GE, 4x GE SFP Base PIC
    Power Supply 0 Rev 04 740-024283 BF95913 PS 645W AC

     

    With 1 ISP connected I don't see the flow allocation errors, but I still see these errors:

     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
     /kernel: last message repeated 4 times
     /kernel: RT_PFE: RT msg op 2 (PREFIX DELETE) failed, err 5 (Invalid)
     RT-HAL,rt_msg_handler,673: route check failed 22

    Attachment(s)

    txt
    24.txt   12 KB 1 version
    txt
    23.txt   12 KB 1 version
    txt
    1isp23.txt   12 KB 1 version


  • 10.  RE: SRX550 High Memory strange issue

     
    Posted 01-21-2019 19:00

    Hello,

     

    Thanks for sharing the outputs.

    > There are FTO allocation failures, and I see that they have increased.

    FTO_NEW_FAIL 190943 191045

    FTO_ROUTE_UPDATE_CB_MEM 190943 191045

     

    > In the case with 2 ISPs and the /24 filter, the active routes exceed 712K, which is more than what is mentioned in the datasheet.

    > In all three cases, it is a bit strange that there is a big difference between the two "show route summary" outputs, one from the RE and one from the fwdd.

    > It's possible that these messages are normal on the SRX550 considering the size of the Internet routing table.

    > Perhaps you are not seeing the issue on the other SRX simply because of the logging level. Is the logging level the same? (show configuration system syslog)

    > I would also compare the "show usp fto stats" output on the working SRX to see if there is a similar number of failures.

     

    Regards,

     

    Vikas



  • 11.  RE: SRX550 High Memory strange issue

    Posted 01-22-2019 11:18

    Hi!

    Thx for your reply.

    The log level is exactly the same; moreover, both devices have identical configurations.

    On the working SRX, "show usp fto stats" returns:

    ================ master ================
    SENT: Ukern command: show usp fto stats

    Tesla/USP Flow Tracking Object Statistics; currently 736246 FTOs
    FTO_ALLOCATES 1595556
    FTO_DESTROYS 859310
    FTO_FLOW_ADDS 32912650
    FTO_FLOW_ADD_OVERWRITE_FTO 0
    FTO_FLOW_ADD_FREEING_FTO 0
    FTO_FLOW_DELS 32909787
    FTO_FLOW_DEL_ALREADY 0
    FTO_ROUTE_UPDATES 2379477
    FTO_ROUTE_OIF_NOT_CHANGED 0
    FTO_ROUTE_DELETES 859310
    FTO_ROUTE_DELETE_NULL 0
    FTO_NH_WORD_GETS 0
    FTO_MORE_SPEC_ROUTE 1595549
    FTO_ROUTE_RESET_CB_FTO_ALLOC 0
    FTO_ROUTE_UPDATE_CB_FTO_ALLOC 1595556

    FTO errors:
    FTO_GET_1ST_BAD_PARAM 0
    FTO_NEW_FAIL 0
    FTO_DESTROY_FAIL 0
    FTO_ROUTE_UPDATE_BAD_PARAM 0
    FTO_ROUTE_UPDATE_NO_NH 0
    FTO_FLOW_ADD_BAD_PARAM 0
    FTO_FLOW_DEL_BAD_PARAM 0
    FTO_ROUTE_RESET_CB_MEM 0
    FTO_ROUTE_UPDATE_CB_MEM 0

    Route lookup statistics:
    Route table read lock not acquired 1508
    rtt_spinlock acquired 2454873

    L2FTO:
    FTO_MAC_RESET_CB_FTO_ALLOC 0
    FTO_MAC_UPDATE_CB_FTO_ALLOC 0
    FTO_MAC_UPDATE_BAD_PARAM 0
    FTO_MAC_UPDATE_NO_NH 0
    FTO_MAC_RESET_CB_MEM 0
    FTO_MAC_UPDATE_CB_MEM 0

    MCAST FTO:
    FTO_MCAST_ROUTE_CHANGE 0
    FTO_PIM_ROUTE_CHANGE 0
    FTO_MCAST_ROUTE_NOT_READY 0
    FTO_ROUTE_UPDATE_MCAST 0
    FTO_ROUTE_UPDATE_MCAST_RESET_PARENT 0
    FTO_ROUTE_UPDATE_MCAST_CHANGED 0
    FTO_ROUTE_UPDATE_MCAST_NOT_CHANGED 0
    FTO_ROUTE_UPDATE_MCAST_NO_IFL 14
    FTO_ROUTE_MCAST_FANOUT_INIT 0
    FTO_ROUTE_MCAST_LIST_ALLOC_FAIL 0
    FTO_ROUTE_MCAST_NO_IFL 0
    FTO_ROUTE_MCAST_FANOUT_NEW_FTO 14
    FTO_ROUTE_MCAST_FANOUT_OLD_FTO 0
    FTO_ROUTE_MCAST_FANOUT_REINIT 0
    FTO_ROUTE_MCAST_FTRPC_FAIL 0
    FTO_ROUTE_MCAST_SKIP_NOROUTE_SESSION 0

    As you can see, all FTO error counters are zero.
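    Comparing the error counters between devices by eye is error-prone, so here is a minimal sketch of doing it with a script. It assumes the "NAME VALUE" layout of the output above; the helper names are hypothetical, the "failing" sample uses the two counter values quoted earlier in the thread, and the error counter list is copied from the "FTO errors:" section.

```python
ERROR_COUNTERS = [
    "FTO_GET_1ST_BAD_PARAM", "FTO_NEW_FAIL", "FTO_DESTROY_FAIL",
    "FTO_ROUTE_UPDATE_BAD_PARAM", "FTO_ROUTE_UPDATE_NO_NH",
    "FTO_FLOW_ADD_BAD_PARAM", "FTO_FLOW_DEL_BAD_PARAM",
    "FTO_ROUTE_RESET_CB_MEM", "FTO_ROUTE_UPDATE_CB_MEM",
]

def parse_counters(text):
    """Parse 'FTO_NAME VALUE' lines into a dict, ignoring headings and other text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("FTO_") and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def nonzero_errors(counters):
    """Return only the error counters that are present and greater than zero."""
    return {name: counters[name] for name in ERROR_COUNTERS if counters.get(name, 0) > 0}

working = parse_counters("""\
FTO_ALLOCATES 1595556
FTO_DESTROYS 859310
FTO_NEW_FAIL 0
FTO_ROUTE_UPDATE_CB_MEM 0
""")
failing = parse_counters("""\
FTO_NEW_FAIL 191045
FTO_ROUTE_UPDATE_CB_MEM 191045
""")

print("working:", nonzero_errors(working))
print("failing:", nonzero_errors(failing))
# Sanity check: allocations minus destroys should match "currently 736246 FTOs".
print("live FTOs:", working["FTO_ALLOCATES"] - working["FTO_DESTROYS"])
```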

     



  • 12.  RE: SRX550 High Memory strange issue

    Posted 01-22-2019 23:03

    Hi,

     

    Can you get the following command outputs from the working and not working SRX?

     

    > request pfe execute target fwdd command "show arena"

    > request pfe execute target fwdd command "show heap"

    > request pfe execute target fwdd command "show services mum"

    > request pfe execute target fwdd command "show memory"

     



  • 13.  RE: SRX550 High Memory strange issue

    Posted 01-23-2019 12:17

    Hi!

    Please, find requested info attached.

    Attachment(s)

    txt
    not-working.txt   2 KB 1 version
    txt
    working.txt   2 KB 1 version


  • 14.  RE: SRX550 High Memory strange issue

    Posted 01-23-2019 12:48

    Dmytro,

     

    Thanks for the info. Could you reboot the non-working SRX and gather the output of request pfe execute target fwdd command "show services mum" right after bootup and again a few minutes later? I am trying to confirm whether there is a memory problem:

     

    Faulty SRX:

     

    request pfe execute target fwdd command "show services mum"
    ================ master ================
    SENT: Ukern command: show services mum
    
    Memory usage manager:  gsm
    Total free space to start with: 127898960
    Active customers: 1
    Max customers: 12
    Yellow zone limit: 31974740
    Orange zone limit: 23021812
    Red zone limit: 11510906
    Operational zone:      Red
    
    cust_id      in use       limit
    -------------------------------
          0    44153889    63949480
          1         512    63949480
    
    -------------------------------
    
    actual free space =      95048
    est. free space =        95016

     

    Working SRX:

     

    request pfe execute target fwdd command "show services mum"
    ================ master ================
    SENT: Ukern command: show services mum
    
    Memory usage manager:  gsm
    Total free space to start with: 127898960
    Active customers: 1
    Max customers: 12
    Yellow zone limit: 31974740
    Orange zone limit: 23021812
    Red zone limit: 11510906
    Operational zone:      Red
    
    cust_id      in use       limit
    -------------------------------
          0    58731113    63949480
          1         512    63949480
    
    -------------------------------
    
    actual free space =    5695552
    est. free space =      5695552

     



  • 15.  RE: SRX550 High Memory strange issue

    Posted 01-24-2019 05:20

    Hi!

    After being idle for about 15 hours it shows:

     

    actual free space =   31977208
    est. free space =     31977208

    and just after reboot:

    ================ master ================
    SENT: Ukern command: show services mum
    
    Memory usage manager:  gsm
    Total free space to start with: 127898960
    Active customers: 1
    Max customers: 12
    Yellow zone limit: 31974740
    Orange zone limit: 23021812
    Red zone limit: 11510906
    Operational zone:      Green
    
    cust_id      in use       limit
    -------------------------------
          0     2771085    63949480
          1         512    63949480
    
    -------------------------------
    
    actual free space =   43659768
    est. free space =     43659768
    
    Proactive reclaim              : ENABLED
    Seconds between reclaims       : 4
    Number of proactive reclaims   : 0
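    The zone limits in these outputs look like fixed fractions of the starting free space (25%, 18% and 9% of 127898960), and the reported zone is consistent with comparing the actual free space against them. The sketch below encodes that reading. Both the fractions and the comparison rule are assumptions inferred from the numbers in this thread, not documented Junos behavior, and the function names are hypothetical.

```python
def zone_limits(total_free_at_start):
    """Limits as fractions of the starting free space (inferred: 25%, 18%, 9%)."""
    return {
        "yellow": total_free_at_start * 25 // 100,
        "orange": total_free_at_start * 18 // 100,
        "red": total_free_at_start * 9 // 100,
    }

def classify_zone(free_bytes, limits):
    """Assumed rule: report the most severe zone whose limit exceeds the free space."""
    if free_bytes < limits["red"]:
        return "Red"
    if free_bytes < limits["orange"]:
        return "Orange"
    if free_bytes < limits["yellow"]:
        return "Yellow"
    return "Green"

limits = zone_limits(127898960)
samples = [
    ("faulty SRX, under load", 95048),
    ("working SRX, under load", 5695552),
    ("idle for about 15 hours", 31977208),
    ("just after reboot", 43659768),
]
for label, free in samples:
    print(f"{label}: {free} bytes free -> {classify_zone(free, limits)}")
```

    Under this reading, both devices were in the Red zone under load, but the faulty one had only about 95 KB left while the working one still had about 5.7 MB, which may be the difference between allocation failures and none.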

     



  • 16.  RE: SRX550 High Memory strange issue

    Posted 01-24-2019 12:52

    Dmytro,

     

    Can you gather the following information on the non-working SRX? Make sure that the problem is happening and that the BGP routes shown in "show route summary" number fewer than 712k (any number higher than that is not recommended for the SRX550, as shown in the datasheet. I know the working SRX has twice that amount of BGP routes, but that doesn't mean it is a recommended implementation):

     

    > show log messages (to confirm the issue)
    > show route summary
    > show chassis routing-engine
    > show security monitoring fpc 0
    > request pfe execute target fwdd command "show services mum"
    > request pfe execute target fwdd command "show arena"
    > request pfe execute target fwdd command "show heap"
    > request pfe execute target fwdd command "show mbuf host"
    > request pfe execute target fwdd command "show services objcache"
    > request pfe execute target fwdd command "show usp fto stats"

     

    If possible, gather the same commands on the working SRX for comparison purposes.

     

    If you leave the non-working SRX running for around 1 day (connected to both ISPs), will it become stable?

     



  • 17.  RE: SRX550 High Memory strange issue

    Posted 01-25-2019 12:55

    Hi!

    Don't you think that the 712k limit is for active routes only?

    The problem is that I can't reduce the total number of routes, as I'm receiving a full view table from both ISPs.

    So filtering routes on my side affects only the active routes, not the total number of them.

    Anyway, I collected requested info from both SRXs (please see attached files).

    For non-working SRX /22 filter is currently applied for incoming BGP prefixes.
    For working SRX /24 filter is applied.

    With /22 filter for non-working SRX there are no errors in messages log.

    I will leave non-working SRX connected to ISPs and will monitor the logs on weekend.

    If errors come back I will collect the same set of info once again.

     

    P.S. Actually, I have 2 branches connected with SRX550s the same way (2 ISPs, full view /24, etc.) and 3 SRX550 routers in total.

    2 routers work like a charm and only one has these problems.

    Attachment(s)

    txt
    working2.txt   26 KB 1 version
    txt
    not-working2.txt   27 KB 1 version


  • 18.  RE: SRX550 High Memory strange issue

    Posted 01-29-2019 22:32

    Dmytro,

     

    The limit is not for the active routes but for the BGP routes database, so with 1.4 million BGP routes you are exceeding the capacity of the SRX and you could expect problems:

     

     show route summary
    Autonomous system number: xxxxx
    Router ID: x.x.x.x
    
    inet.0: 740559 destinations, 1464998 routes (241210 active, 0 holddown, 985427 hidden)
                  Direct:      4 routes,      4 active
                   Local:      4 routes,      4 active
                     BGP: 1464986 routes, 241198 active
                  Static:      3 routes,      3 active
               Aggregate:      1 routes,      1 active
     request pfe execute target fwdd command "show services mum"
    ================ master ================
    SENT: Ukern command: show services mum
    
    Memory usage manager:  gsm
    Total free space to start with: 127898960
    Active customers: 1
    Max customers: 12
    Yellow zone limit: 31974740
    Orange zone limit: 23021812
    Red zone limit: 11510906
    Operational zone:      Yellow
    request pfe execute target fwdd command "show services objcache"
    ================ master ================
    SENT: Ukern command: show services objcache
    
                                                objs    objs
                          obj   obj     objs  in cpu      in     total       total
    obj cache name       size align   in use  caches   depot   objects       bytes
    ------------------------------------------------------------------------------
    ADVPN Trigger Pool     88     4        0       0       0         0           0
    ALG PST NAT BINDING POOL 16     4        0       0       0         0           0
    Client Group Name      72     4        0       0       0         0           0
    DIP IN pool            76     4        0       0       0         0           0
    FTO pool               76     4   241261      19       0    241280    18337280
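    The "FTO pool" row ties the route count to memory: 241280 objects of 76 bytes each give the 18337280 bytes shown, and the object count tracks the 241198 active BGP routes closely. The sketch below repeats that arithmetic and projects it to the /24 case. The one-FTO-per-active-route relationship and the comparison against the 63949480-byte customer limit from "show services mum" are assumptions drawn from the numbers in this thread, not a documented sizing formula.

```python
FTO_OBJ_SIZE = 76             # bytes, "obj size" column of the "FTO pool" row
CUSTOMER_LIMIT = 63_949_480   # per-customer limit shown by "show services mum"

def fto_pool_bytes(objects, obj_size=FTO_OBJ_SIZE):
    """Memory taken by the FTO pool for a given number of objects."""
    return objects * obj_size

# Total objects = in use + in CPU caches + in depot (values from the row above).
total_objects = 241261 + 19 + 0
print("objects:", total_objects, "bytes:", fto_pool_bytes(total_objects))

# Hypothetical projection, assuming roughly one FTO per active route.
for label, active_routes in (("/22 filter", 241210), ("/24 filter", 735990)):
    needed = fto_pool_bytes(active_routes)
    share = needed / CUSTOMER_LIMIT
    print(f"{label}: {needed} bytes, {share:.0%} of the customer limit")
```

    If that assumption holds, the /24 filter leaves little headroom under the limit even before any other memory consumers are counted, which would fit the Red zone seen on both devices with that filter.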

     

    Maybe this specific SRX has extra firewall features enabled or is processing more traffic, which would consume more memory than on the other firewalls and make exceeding the BGP route limit affect it more? How did the SRX behave when you left it for several days with the problem? Does it eventually become stable?

     

    Note that this problem could eventually trigger high CPU utilization as well. The numbers mentioned in the datasheet are those which have been calculated when running BGP alone on the device and nothing else. 

     

    It looks like you might need to work with your ISPs to reduce the prefix lengths you receive, to try to keep the route database smaller than the specified limit. If all you need is routing with multiple BGP routers sending the complete Internet routing table, then I suggest looking at Juniper's routing platforms (like the MX Series), which are better suited for such requirements.

     

    I don't like to deliver bad news, but Juniper will not support scenarios where the specified limits are exceeded.

     



  • 19.  RE: SRX550 High Memory strange issue

    Posted 01-30-2019 02:29

    Hi!

    Thx for the update.

    I can't keep it in an unstable state, as I can't leave staff without access to the Internet 😞

    So I will leave it with the /22 filter and keep my fingers crossed.

    Btw, the "working" SRX with the /24 filter was in the red zone and moved to green as soon as I changed the filter to /22.

    Anyway, I really appreciate the time you spent and the info you provided; it was very useful for me.

    Thank you very much!

     



  • 20.  RE: SRX550 High Memory strange issue
    Best Answer

    Posted 01-30-2019 06:16
    Dmytro,

    You are very welcome, I am glad that the information was helpful. Please mark the post as Resolved if it applies.