Is there any standard documentation that outlines troubleshooting procedures on an EX switch?
We have an EX3200 in production that runs IVRs with VRRP, and OSPF up toward an MX80. For some reason, the backup switch lost connectivity to the MX80. We couldn't even console into the switch. After a reboot or two it all came back up. Now I've been tasked with doing a root cause analysis on the issue, but where would I even begin?
I've looked through the log files (hundreds of them), but it all looks like garbage. Which log could I possibly look at to get the answers I'm looking for? JTAC must follow some sort of procedure to determine the cause of an outage on their hardware. I need to know what to look for and which log to access.
Always collect an RSI (request support information | no-more) while the switch is still in the problem state.
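As a minimal sketch, assuming you are logged in to the switch CLI in operational mode, the RSI can be saved to a file on the switch rather than only printed to the terminal, so it can be copied off later. The file name under /var/tmp is just an example:

```
user@switch> request support information | save /var/tmp/rsi-problem-state.txt
```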
If you can afford to leave the switch in the problem state for a while, call JTAC and open a P1 case so they can troubleshoot it live; otherwise, reboot the switch. Rebooting while in the problem state is not advisable, because the live state that JTAC would examine is lost, but you can do so if you think it will restore service.
In addition to the previous suggestions, I also copy the log files off the switch via SCP (during the problem, or right after restoring service) so that no valuable information is lost and so that I can calmly go over each log file looking for anything interesting.
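A minimal sketch of that workflow, assuming SSH is enabled on the switch; the user name "admin", the address 192.0.2.1, and the archive file name are placeholders. First bundle /var/log into one compressed archive on the switch, then pull it from your workstation with scp:

```
user@switch> file archive compress source /var/log/* destination /var/tmp/varlog-outage.tgz
user@switch> file list /var/tmp

workstation$ scp admin@192.0.2.1:/var/tmp/varlog-outage.tgz .
```

The file list step is only there to confirm the exact name of the archive that was created before copying it.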
In general, the most revealing log has always been "messages", but I have found interesting information in others as well.
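When going through messages on the switch itself, a few operational-mode filters can help narrow things down. The match pattern below is only an example of keywords worth searching for, not an exhaustive list, and chassisd is just one example of another log worth checking:

```
user@switch> show log messages | match "error|fail|panic|crash"
user@switch> show log messages | last 200
user@switch> show log messages.0.gz | last 200
user@switch> show log chassisd | last 200
user@switch> show system core-dumps
user@switch> show chassis routing-engine
```

Rotated copies such as messages.0.gz hold the older entries, which matters here because the reboots will have pushed the outage further back in the log. show system core-dumps lists any core files that were written (JTAC will usually want these), and show chassis routing-engine includes the last reboot reason.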
Logs can be collected in the problem state, although that is not mandatory, as they can also be collected after the issue has been resolved. The RSI is the key to finding the root cause. The log file "messages" collects the log messages selected by the facilities and severities configured under the "system syslog" hierarchy, and that configuration can be modified.
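For reference, the factory-default configuration on EX switches typically includes a messages file like the first fragment below. The set commands after it are a sketch of how it could be modified to log in more detail, keep more history, and send a copy to an external syslog server (192.0.2.10 is a placeholder address) so that logs survive even if the switch itself becomes unreachable, as happened here. The severity, size, and file count are example values, not recommendations:

```
[edit system syslog]
file messages {
    any notice;
    authorization info;
}

set system syslog file messages any info
set system syslog file messages archive size 5m files 10
set system syslog host 192.0.2.10 any notice
commit
```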