Sometimes fxpc on our EX4600 climbs very high (80% or 90%). How can I investigate what is causing this, and what is the solution? Has anyone experienced this? There is also an Interrupt value of 2%. What is this, and how do I troubleshoot it?
>show system processes extensive | except 0.00
last pid: 86889;  load averages:  0.45,  0.50,  0.48  up 217+00:57:54  11:04:23
146 processes: 4 running, 124 sleeping, 18 waiting
>show chassis routing-engine
Routing Engine status:
  Slot 0:
    Current state                Master
    Temperature                  40 degrees C / 104 degrees F
    CPU temperature              40 degrees C / 104 degrees F
    DRAM                         1953 MB
    Memory utilization           52 percent
    CPU utilization:
      User                       13 percent
      Background                  0 percent
      Kernel                     12 percent
      Interrupt                   2 percent  <------------------
      Idle                       74 percent
Is fxpc stuck high, or does it fluctuate? Are there any cores in /var/tmp? Check whether a significant config change or network change coincided with the start of the high CPU, and troubleshoot by reverting it. The following commands are a first step:
show system core-dumps -------------->look for any fxpc cores
show log messages | last 100
show chassis fpc -------------------->look for utilization
Please check for any unusual spikes in interface utilization, and in the outputs above look for log messages related to topology or next-hop changes. Sometimes a loop that forwards unwanted traffic back and forth across the switch can cause this issue. Also look for any other repetitive error log messages; they could give a clue as well.
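If the log is long, spotting repetitive messages by eye is tedious. Here is a minimal Python sketch for counting them offline, assuming you have saved the output of "show log messages" to a text file and that each line starts with the usual "Mon DD HH:MM:SS hostname" prefix. The function name, the threshold, and the sample lines are placeholders of mine, not real switch output:

```python
import re
from collections import Counter

# Assumed prefix: "Mon DD HH:MM:SS  hostname " in front of the message body.
PREFIX = re.compile(r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
# Process IDs such as "[1234]" change between restarts, so drop them before counting.
PID = re.compile(r"\[\d+\]")


def repeated_messages(lines, min_count=3):
    """Return (message, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter()
    for line in lines:
        body = PID.sub("", PREFIX.sub("", line.strip()))
        if body:
            counts[body] += 1
    return [(msg, n) for msg, n in counts.most_common() if n >= min_count]


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "Mar  3 11:04:01  sw1 fxpc[1234]: placeholder message A",
    "Mar  3 11:04:02  sw1 fxpc[1234]: placeholder message A",
    "Mar  3 11:04:03  sw1 fxpc[1234]: placeholder message A",
    "Mar  3 11:04:04  sw1 mgd[5678]: placeholder message B",
]

for msg, n in repeated_messages(sample, min_count=2):
    print(n, msg)
```

With a real file you would pass open("messages.txt") instead of the sample list. If your log uses a different timestamp format, the PREFIX pattern will need adjusting.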
If the above doesn't help narrow it down, or if you see cores, you may need more advanced help, and I'd suggest contacting TAC with this info:
request pfe execute command "show pfe manager statistics" target fpc0 | no-more
request pfe execute command "set dc bc \"show c cpu0\"" target fpc0 | no-more
request pfe execute command "show threads" target fpc0 | no-more
start shell
gcore -c /var/tmp/fxpc.live.core <pid> ------------> If there are no cores already
Some Juniper devices don't give the IRQ-to-device association as shown in the above output. In that case, you can check the following output to determine which devices are attached to the particular IRQ that is getting flooded, and then decide whether something can be done about that device:
show system boot-messages | match "irq"
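If that output is long, a small script can group devices by IRQ for you. This is only a sketch: it assumes FreeBSD-style boot lines of the form "name: <description> irq N at ...", which may not match every platform, and the function name and sample lines are hypothetical placeholders rather than real EX4600 output:

```python
import re
from collections import defaultdict

# Assumed line shape: "<device>: ... irq <number> ...".
IRQ_LINE = re.compile(r"^(?P<dev>\w+):.*\birq (?P<irq>\d+)\b")


def devices_by_irq(lines):
    """Map each IRQ number to the list of device names that mention it."""
    mapping = defaultdict(list)
    for line in lines:
        match = IRQ_LINE.match(line.strip())
        if match:
            mapping[int(match.group("irq"))].append(match.group("dev"))
    return dict(mapping)


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "dev0: <placeholder device> irq 16 at device 0.0 on pci0",
    "dev1: <placeholder device> irq 16 at device 1.0 on pci0",
    "dev2: <placeholder device> irq 18 at device 2.0 on pci0",
    "a line without an interrupt assignment",
]

for irq, devs in sorted(devices_by_irq(sample).items()):
    print(irq, ", ".join(devs))
```

An IRQ shared by several devices is the interesting case, since a flood on that IRQ could come from any of them.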
Kudos are appreciated if this answered part of your query!