05-12-2011 05:29 PM
The release notes for Junos 10.4R3 include the error messages that show up on the console and in the output of "show chassis alarms". What I'm looking for is the syntax of the corresponding syslog message. Sadly, the release notes do not say what shows up in the boot log. Hopefully something gets logged.
I've got my Juniper devices forwarding syslog messages to a STRM box, and I want to set up an alert so that if a switch has a corrupt primary partition and has to boot from the alternate boot partition, I receive an email and we can fix it. The easiest way is to create a STRM rule that looks for the appropriate syntax in the syslog message.
Since this occurs at boot, if no log entry is created anywhere, does anyone have a suggestion on how we can get an email alert if this situation occurs?
Thank you for your time.
05-13-2011 09:35 AM
Not 100% sure, but these might be the messages you are looking for:
<timestamp> <hostname> alarmd: Alarm set: FPC color=YELLOW, class="CHASSIS", reason=Host 0 Boot from backup root
<timestamp> <hostname> craftd: Minor alarm set, Host 0 Boot from backup root
I recently had an EX2200 boot into the alternate root partition when performing an upgrade to 10.4R3. I performed a re-install to fix the primary root partition. These are the log messages that are left from around that time.
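For anyone building a matching rule, here is a minimal Python sketch for testing a pattern against these messages before putting it into a STRM rule. The regular expression is an assumption based only on the two sample lines above, and the function name is hypothetical; the exact wording may differ between Junos releases.

```python
import re

# Pattern assumed from the two sample messages in this thread;
# it is not taken from Juniper documentation.
BACKUP_ROOT_RE = re.compile(
    r"(?P<daemon>alarmd|craftd):.*Host (?P<host>\d+) Boot from backup root"
)


def is_backup_root_alarm(line):
    """Return True if a syslog line reports a boot from the backup root."""
    return BACKUP_ROOT_RE.search(line) is not None


# Sample lines in the same shape as the messages quoted above.
samples = [
    'May 13 11:50:25 sw1 alarmd: Alarm set: FPC color=YELLOW, '
    'class="CHASSIS", reason=Host 0 Boot from backup root',
    "May 13 11:50:25 sw1 craftd: Minor alarm set, Host 0 Boot from backup root",
    'May 13 11:47:38 sw1 alarmd: Alarm set: Configuration color=YELLOW, '
    'class="CHASSIS", reason=Rescue configuration is not set',
]

for line in samples:
    print(is_backup_root_alarm(line), line)
```

The pattern deliberately ignores the color and class fields, so it should keep matching whether or not the class value is quoted.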
05-13-2011 05:22 PM
You're pretty much dead on with the alert.
One problem, and it's a big one, is that these messages aren't making it to STRM. We had an EX2200 that corrupted and I see the log messages in /var/log/messages but going through all the alerts for this switch on STRM these don't show up.
I'm thinking that since these messages are generated at boot, they're written to /var/log/messages before syslog forwarding actually starts. Once forwarding is up, it only sends new messages as they are logged, not anything logged before that point.
Here's my syslog forwarding configuration on the switch:
Here are the log entries that showed up in /var/log/messages:
May 13 11:47:38 HAS-EX2200-5 alarmd: Alarm set: Configuration color=YELLOW, class="CHASSIS", reason=Rescue configuration is not set
May 13 11:47:38 HAS-EX2200-5 alarmd: shutting down chassisd connection: chassisd ipc pipe read error
May 13 11:48:03 HAS-EX2200-5 alarmd: Alarm set: Configuration color=YELLOW, class="CHASSIS", reason=Rescue configuration is not set
May 13 11:48:03 HAS-EX2200-5 alarmd: shutting down chassisd connection: chassisd ipc pipe read error
May 13 11:50:25 HAS-EX2200-5 alarmd: Alarm set: FPC color=YELLOW, class=CHASSIS, reason=Host 0 Boot from backup root
May 13 11:50:25 HAS-EX2200-5 craftd: Minor alarm set, Host 0 Boot from backup root
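If the boot-time messages never reach the collector, one possible workaround is to scan a copy of /var/log/messages once the switch is back up. Below is a minimal Python sketch of that idea, assuming the traditional syslog line layout shown in the entries above. The function name is hypothetical, and how the file is retrieved from the switch and how the alert is sent are left out.

```python
import re

# Assumes lines look like: "May 13 11:50:25 HOSTNAME daemon: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<daemon>alarmd|craftd): "
    r".*Boot from backup root"
)


def find_backup_root_boots(lines):
    """Return sorted, de-duplicated (hostname, timestamp) pairs for
    every line that reports a boot from the backup root partition."""
    hits = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            hits.add((match.group("host"), match.group("ts")))
    return sorted(hits)


# The log entries from this post, used as test input.
LOG = """\
May 13 11:47:38 HAS-EX2200-5 alarmd: Alarm set: Configuration color=YELLOW, class="CHASSIS", reason=Rescue configuration is not set
May 13 11:47:38 HAS-EX2200-5 alarmd: shutting down chassisd connection: chassisd ipc pipe read error
May 13 11:50:25 HAS-EX2200-5 alarmd: Alarm set: FPC color=YELLOW, class=CHASSIS, reason=Host 0 Boot from backup root
May 13 11:50:25 HAS-EX2200-5 craftd: Minor alarm set, Host 0 Boot from backup root
"""

for host, ts in find_backup_root_boots(LOG.splitlines()):
    print(f"{host} booted from backup root at {ts}")
```

The alarmd and craftd lines share a hostname and timestamp, so they collapse into a single hit per boot.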
Any ideas? Maybe Service Now's AI Scripts will generate an alert, but until they release a version recommended for the EX switching line I'm stuck. We've already had at least half a dozen switches become corrupted due to power loss, and while the recovery partition really helps, it would be a real help to receive an alert when one becomes corrupted rather than having to log in to each switch/VC in a building after an unscheduled power outage.
Again, thank you for the help.
05-13-2011 05:55 PM
I wonder if putting a 'backup-router' statement that covers the STRM network into the config would help. It might be that the routing daemon is not yet running and the SRX does not know how to send the packets to STRM - just a thought.
05-17-2011 09:37 AM
Thank you for the response. I have a few questions, though.
You mentioned the SRX. That may have been a typo (I always have about 10 things on my mind at the same time so it happens to me frequently as well) so I just wanted to clarify that this relates to EX switches.
I did a little reading up on the backup-router command. I didn't find much besides a very superficial explanation of it in the Juniper KB, the documentation and the books I have. Do you know where I can find more in-depth information on it and how it works? I'm curious whether it only speeds up routing on strictly L3 switches (especially ones running routing protocols) or whether it also helps L2. The switch I'm using for testing is strictly an L2 switch that has a single IP and a static route configured.
Again, thanks for the help.