
Curious JunOS PPPoE CoS Log

01-16-2019 12:46 PM

Yet again I am seeing recurring errors in the log of my MX204 PPPoE BNG:

 

Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 COS(cos_ifl_walk:6689): Bad return status IFD:146
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 COS(cos_bind_final_scheduler_on_ifl:1745): Bind TC profile failed IFL:1073787522
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 cos_ifl_tcprofile_add_action_wrapper:1671: IFL TCP add failed for ifl 4000b282 msg COS_IFL_TCP ret 1
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_cos_ifl_sched_node_op:829: vbf_cos_ifl_tcp_add failed
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_var_create:11168: var IFL SCHED NODE: add failed with error=1
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_update_var_list:1304: Failed to create var
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_flow_msg_handle_add:3977: Flow 155991 => Failed to create variable list for tmpl ifl
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_flow_msg_handler:5417: Flow 155991 => Failed to create flow with error=1
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_flow_msg_handle_add:4146: Flow 16829865 => Underlying Flow 155991 not found
Jan 16 13:41:54 dnvrco-cmfl-mx204-1 fpc0 vbf_flow_msg_handler:5417: Flow 16829865 => Failed to create flow with error=15
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 qchip_allocate_l4_node:5120 XQSS-chip(0): all L4 nodes are allocated (allocated:2048 max:2048)
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 COS_HALP_JAM(cos_halp_jam_alloc_l4_node:4379): sched_alloc_l4_node for ifd lt-0/1/0 type IFL 1073787523 failed
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 COS_HALP(cos_halp_alloc_sched_nodes:1303): L4 Scheduler allocation failure IFL:1073787523[IFD:146]
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 COS_HALP(cos_halp_set_sched_config:1765): Scheduler node allocation failure IFL:1073787523[IFD:146]
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 COS_HALP(cos_halp_bind_tc_profile_on_ifl:320): IFL:1073787523[IFD:146] set sched config failed
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 COS_HALP(cos_halp_bind_tc_profile_on_ifl:380): Failed to bind IFL:1073787523[ifd:146] tc profile
Jan 16 13:42:15 dnvrco-cmfl-mx204-1 fpc0 COS(cos_ifl_walk:6689): Bad return status IFD:146

 

Based on that log, it seems as though CoS is failing to configure on some PPPoE interfaces. Here is my dynamic profile:

 

pppoe-profile {
    interfaces {
        pp0 {
            unit "$junos-interface-unit" {
                ppp-options {
                    chap;
                    pap;
                    ipcp-suggest-dns-option;
                }
                pppoe-options {
                    underlying-interface "$junos-underlying-interface";
                    server;
                }
                keepalives interval 30;
                family inet {
                    rpf-check;
                    tcp-mss 1460;
                    filter {
                        input "$junos-input-filter";
                    }
                    unnumbered-address lo0.0;
                }
            }
        }
    }
    class-of-service {
        traffic-control-profiles {
            tcp-dynamic {
                shaping-rate "$junos-cos-shaping-rate" burst-size "$junos-cos-shaping-rate-burst";
                excess-rate proportion 0;
            }
        }
        interfaces {
            pp0 {
                unit "$junos-interface-unit" {
                    output-traffic-control-profile tcp-dynamic;
                }
            }
        }
    }
    routing-options {
        access {
            route $junos-framed-route-ip-address-prefix {
                qualified-next-hop "$junos-interface-name";
            }
        }
    }
}

 

 

JunOS version 17.4R2-S1.2.

Re: Curious JunOS PPPoE CoS Log

01-16-2019 01:51 PM

"fpc0 qchip_allocate_l4_node:5120 XQSS-chip(0): all L4 nodes are allocated (allocated:2048 max:2048)"

 

Why would the max L4 nodes be only 2048? I can't find any documentation on this for the MX204 platform.

 

Re: Curious JunOS PPPoE CoS Log

01-16-2019 06:22 PM

Could you share the subscriber's interface config, and get "show system resource-monitor summary" (a hidden command)?

Also, have you noticed any core dumps yet?

 

 

 

/Karan Dhanak
Re: Curious JunOS PPPoE CoS Log

01-16-2019 08:07 PM

Well, the dynamic-profile is the interface config. Here is the config applied to the PS interfaces where the PPPoE subs enter:

 

    interfaces {
        <ps*> {
            flexible-vlan-tagging;
            auto-configure {
                vlan-ranges {
                    dynamic-profile single-vlan-prof-nocos {
                        accept pppoe;
                        ranges {
                            2-4094;
                        }
                    }
                    access-profile aaa-profile;
                }
                remove-when-no-subscribers;
            }
            mtu 9100;
            no-gratuitous-arp-request;
            unit 0 {
                encapsulation ethernet-ccc;
            }
        }
    }

 

Resource Usage Summary

Throttle                       : Enabled
Load Throttle                  : Enabled
Heap Mem Threshold             : 70  %
IFL Counter Threshold          : 95  %
Round Trip Delay Threshold(ms) : 3000
Filter Counter Threshold       : 100 %
Expansion Threshold            : 95  %
CoS Queue Threshold            : 100 %
MFS threshold                  : 70  %        Used : 0

Slot # 0
     Client allowed                   : Yes
     Service allowed                  : Yes
     Heap memory used                 : 492948640       In % : 18
     Average Round-trip Delay(ms)     : 103  (30  )     Round-trip Delay(ms) : 100

     MAX session rate allowed(%)      : 100
     Client denied                    : 221
     Service Denied                   : 0
     Performance Denial Client        : 0
     Performance Denial Service       : 0

                Filter counter memory      IFL counter memory   Expansion memory
      PFE #         used  |   %             used  |   %          used  |   %
          0       25448208     31           551904      1       9248456      4

     CoS Queue Utilization
     PFE #   Scheduler Block #        Used      %
         0                   0       15408     94

 

 

 

What I did to temporarily 'solve' the problem was create a new dynamic profile that does not use any class-of-service and apply it to one of the PS interfaces, so that approximately 150 of the 2000 customers on the BNG are no longer using CoS. That seemed to stop the non-stop log entries about failed CoS configuration, probably because the total dropped below 2048.

 
I do not see any coredumps in /var/tmp.

 

I am assuming I have something wrong in the class-of-service or the hierarchical-scheduler config on the lt interface that is causing way too many resources to be gobbled up?

 

That CoS resource usage shows 15408 queues used, which is 94%. 15408 is about 94% of 16384, which interestingly is also 2048 * 8. Is it because I have 2048 subs and each is getting 8 queues? How would the CoS (just shaping) be configured so that this scales to the 32,000 subs that the 204 is rated for?
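The arithmetic above can be sanity-checked with a quick sketch. The 16384 queue ceiling and 8-queues-per-subscriber figures come from this thread's output, not from platform documentation, so treat them as assumptions:

```python
# Sketch of the queue accounting described above (numbers from the thread).
QUEUES_PER_SUB = 8   # assumed default queues allocated per subscriber
Q_MAX = 16384        # default queue ceiling implied by the scheduler output
q_used = 15408       # "Used" from the CoS Queue Utilization output

subs_supported = Q_MAX // QUEUES_PER_SUB
utilization_pct = round(100 * q_used / Q_MAX)

print(subs_supported)    # 2048 subscribers before the queue pool is exhausted
print(utilization_pct)   # 94, matching the reported 94%
```

This matches the symptom: with roughly 2000 subscribers online, the box sits right at the 2048-subscriber ceiling and new CoS binds start failing.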

 

Do I really need the "implicit-hierarchy" on the lt interface?

 

Re: Curious JunOS PPPoE CoS Log

01-18-2019 07:43 AM

Nobody has any ideas on this one?

Re: Curious JunOS PPPoE CoS Log

01-18-2019 08:25 AM

Hi,

 

L2, L3, and L4 each have a 2k node maximum, and there are 16k logical queues. Could you get the following info:

 

request pfe execute command "show xq 0 sche info" target fpc0

show subscriber summary port

show configuration chassis

show interface lt-x/y/z

 

 

/Karan Dhanak
Re: Curious JunOS PPPoE CoS Log

01-21-2019 07:33 AM

skennedy@dnvrco-cmfl-mx204-1> request pfe execute command "show xq 0 sche info" target fpc0
SENT: Ukern command: show xq 0 sche info


 Scheduler Enhanced Priority Mode : FALSE

 Scheduler Active pools      : 1

 Scheduler Node Allocation:
     Level        Allocated     Maximum
     -----       -----------   ---------
      L1                  31         127
      L2                  32        2048
      L3                1827        2048
      L4                1829        2048
       Q               14632       16384


skennedy@dnvrco-cmfl-mx204-1> show subscribers su
                                                 ^
'su' is ambiguous.
Possible completions:
  subscriber-state     State of subscriber
  summary              Display subscriber summary
skennedy@dnvrco-cmfl-mx204-1> show subscribers summary port

Interface           Count
ae1: xe-0/1/1       59
ae1: xe-0/1/3       59
ps0                 77
ps1                 133
ps10                41
ps11                31
ps12                82
ps13                20
ps15                11
ps16                16
ps17                15
ps18                19
ps19                70
ps2                 90
ps20                10
ps21                17
ps22                34
ps23                20
ps24                89
ps25                35
ps26                14
ps27                6
ps28                2
ps29                193
ps3                 101
ps30                11
ps31                8
ps32                9
ps33                12
ps34                17
ps35                63
ps36                1
ps37                4
ps39                17
ps4                 78
ps40                11
ps41                5
ps42                6
ps43                2
ps44                10
ps45                96
ps46                2
ps47                1
ps48                6
ps49                2
ps5                 19
ps50                175
ps51                5
ps52                1
ps6                 43
ps7                 78
ps8                 21
ps9                 35

Total Subscribers: 1923

skennedy@dnvrco-cmfl-mx204-1> show configuration chassis
aggregated-devices {
    ethernet {
        device-count 4;
    }
}
pseudowire-service {
    device-count 1024;
}
fpc 0 {
    pic 1 {
        tunnel-services {
            bandwidth 20g;
        }
    }
}
alarm {
    management-ethernet {
        link-down ignore;
    }
}
network-services enhanced-ip;

skennedy@dnvrco-cmfl-mx204-1> show interfaces lt-0/1/0
Physical interface: lt-0/1/0, Enabled, Physical link is Up
  Interface index: 146, SNMP ifIndex: 548
  Type: Logical-tunnel, Link-level type: Logical-tunnel, MTU: Unlimited, Speed: 20000mbps
  Device flags   : Present Running
  Interface flags: Point-To-Point SNMP-Traps Internal: 0x4000
  Physical info  : 13
  Current address: ce:e1:94:67:40:1b, Hardware address: ce:e1:94:67:40:1b
  Last flapped   : 2019-01-03 04:29:15 MST (2w4d 04:03 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 1482641264 bps (138310 pps)

  Logical interface lt-0/1/0.32767 (Index 333) (SNMP ifIndex 621)
    Flags: Up SNMP-Traps 0x4000 VLAN-Tag [ 0x0000.0 ]  Encapsulation: ENET2
    Input packets : 0
    Output packets: 291543538111

skennedy@dnvrco-cmfl-mx204-1>

Re: Curious JunOS PPPoE CoS Log

01-21-2019 07:47 AM

The docs say "Configuring two levels of hierarchy on MPCs that support more levels preserves resources and allows the system to scale higher.  In a two-level scheduling hierarchy, all logical interfaces and interface sets share a single node; no hierarchical relationship is formed."

 

The issue is the docs don't really go into particular use cases for each "level" or what different configs would look like utilizing these different levels. So it's a bit unclear, in my case where I am just doing shaping on PPPoE subs, whether I can switch to a "two level" design.

 

I believe I need to remove "implicit-hierarchy" on my lt interface and set the maximum hierarchy levels to 2, but I am looking for confirmation on that.
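For reference, a minimal sketch of what that change might look like, assuming the lt interface from the logs (lt-0/1/0) and that implicit-hierarchy is currently configured there; verify the statement names against your Junos release before committing:

```
delete interfaces lt-0/1/0 hierarchical-scheduler implicit-hierarchy
set interfaces lt-0/1/0 hierarchical-scheduler maximum-hierarchy-levels 2
```

With two levels, logical interfaces share a single scheduler node rather than each consuming one per level, which is the resource-preserving behavior the quoted docs describe.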

Re: Curious JunOS PPPoE CoS Log

01-22-2019 08:19 PM

...and perhaps I also need to 'set chassis fpc 0 max-queues 256k' ?  I read that the MX204 has 256k queues (NOT only 16k).  So do I just need to set max-queues higher?
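A sketch of that change, using the value from the post above (confirm the supported max-queues values for the MX204 in your release first, since changing max-queues can trigger an FPC restart):

```
set chassis fpc 0 max-queues 256k
```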

 

If each sub gets 8 queues (which apparently can be changed to 4 each instead) then that would come out to 32k subs, which is what Juniper said the box can do. If 16384 is the max then that would only be 2048 subs... which is the issue I am running into. This has got to be configurable?

Re: Curious JunOS PPPoE CoS Log

02-12-2019 04:57 PM

So I updated the max-queues to 256k, and that did work and allow for more queues. However, I still have an issue: L3 shows a maximum of 8191 and I'm already using about 2200 of that:

 

> request pfe execute command "show xq 0 sche info" target fpc0
SENT: Ukern command: show xq 0 sche info


Scheduler Enhanced Priority Mode : FALSE

Scheduler Active pools : 1

 Scheduler Node Allocation:
     Level        Allocated     Maximum
     -----       -----------   ---------
      L1                  31         127
      L2                  32        4095
      L3                2256        8191
      L4                2258       32767
       Q               18064      262136

 

 

 

How can I get the CoS/queues of these subs to NOT use L3? My next issue will be that once I hit 8k subs, my L3 will max out and I'll have the same problem. This is nowhere near the 32k subs it's rated for....
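Assuming each PPPoE subscriber consumes one L3 node, one L4 node, and eight queues (an inference from this thread's numbers, not a documented fact), a quick sketch shows why L3 becomes the binding constraint after the max-queues change:

```python
# Per-level maxima from the post-change scheduler output above.
level_max = {"L3": 8191, "L4": 32767, "Q": 262136}

# Assumed per-subscriber cost at each level: one scheduler node per sub
# at L3 and L4, eight queues per sub.
per_sub = {"L3": 1, "L4": 1, "Q": 8}

# Subscriber ceiling imposed by each level, and the tightest one.
ceiling = {lvl: level_max[lvl] // per_sub[lvl] for lvl in level_max}
bottleneck = min(ceiling, key=ceiling.get)

print(ceiling)     # {'L3': 8191, 'L4': 32767, 'Q': 32767}
print(bottleneck)  # L3 -- caps out around 8191 subscribers, far short of 32k
```

Under these assumptions, raising max-queues moved the bottleneck from the queue pool to the L3 node pool, which is consistent with the question above about keeping subscriber schedulers off L3.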
