sbondhus Posted March 13, 2014

We get watchdog faults fairly often on my R&D project (about once every few days). I'm trying to understand why they happen and how we can prevent them or, at the very least, identify the cause. Right now I'm struggling to tell whether we're having "soft" or "hard" watchdog faults. I believe they're hard watchdog faults.

Setup: Power UMAC rack with a PMAC CPU and a variety of other accessory cards (almost a full rack), connected via Acc5E/fiber to a MACRO16 station with several other cards. Using firmware 1.5.8.0 and IDE 1.5.0.21.

Symptoms:
(1) "PWR" LEDs turn red (on a system with 59E3s) or indicator LEDs turn off ("PWR" and "Enc" on a system with 24E2As),
(2) the PMAC is unresponsive (via Ethernet connection through the company network; we haven't tried RS232 yet),
(3) the "WD" LED shows red on the PMAC CPU card.

As for what is happening when the watchdog trips, I'm not so sure. It definitely happens if we do a "Build and Download" without doing a "$$$***" first (we think this is related to the three different user-written servo algorithms we have running on motors 0-30). Sometimes it also happens when the project is running but we're not heavily taxing the processor (servo tasks, rticplc tasks, and background tasks, including UDP communication with a PC over the company network).

I'd like help understanding:
(1) Which watchdog tripped ("hard" or "soft").
(2) For either type of fault, the best way to find the cause of the fault.
curtwilson Posted March 15, 2014

If you have lost communications and the watchdog LED comes on, you have a hard watchdog failure.

It appears that if you rebuild and download a project while user-written servos are active, the processor can try to execute them after their addresses have changed, and the call to one of these routines can cause the processor to get lost before the pointers are straightened out.

Make sure these routines are not running before you download: the command "I0,31,100=0" is a quick way of setting Motor[x].ServoCtrl (Ix00) to 0 for Motors 0-30. Or just re-initialize as you have been doing.
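To make the shorthand in the command above easier to read, here is a minimal Python sketch that lists which I-variables "I0,31,100=0" would touch. It assumes the three numbers mean start, count, and step, which is consistent with the description of setting Ix00 for Motors 0-30. The helper name `expand_ivar_range` is hypothetical and is not part of any Delta Tau tool; the sketch only does the arithmetic and does not talk to a controller.

```python
def expand_ivar_range(start: int, count: int, step: int) -> list[str]:
    """List the I-variable names covered by 'I{start},{count},{step}'.

    Assumption: the three numbers are start index, number of variables,
    and spacing between them, as implied by the post above.
    """
    return [f"I{start + n * step}" for n in range(count)]


# "I0,31,100=0" under that assumption: one Ix00 variable per motor, 0 through 30.
names = expand_ivar_range(0, 31, 100)
print(names[0], names[1], names[-1])  # I0 I100 I3000
print(len(names))                     # 31
```

Under that reading, the single command replaces 31 separate assignments, one per motor.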
sbondhus Posted March 27, 2014

We're still trying to figure out the hard watchdog issues, but now I have another issue. We occasionally see something that looks similar to a watchdog fault but with different indications.

With a soft watchdog I'd normally expect to still be able to communicate with the system. With a hard watchdog I expect to lose communication and see the red "WD" LED turn on. Occasionally we see a different issue: we lose communication and the WD LED stays off, but ALL accessory cards in the rack lose power (PWR LEDs are off, any Enc lights are off, etc.). The Ethernet ports are still lit up as if they're communicating.

Any idea what this is?
sbondhus Posted April 4, 2014

Yet another similar issue: sometimes if we leave a machine running overnight, even with no software loaded, we come in the next morning and can still communicate with the machine through any processes that were already open, but we cannot open new processes (such as a new GPASCII thread). Telnet connection attempts are refused entirely. If we attempt to SSH in with the 'deltatau' or 'root' usernames, we get the following messages:

login as: deltatau
deltatau@xxx.xxx.xxx.xxx's password:
Linux powerpmac 2.6.30.3 #24 Mon Sep 10 11:26:14 PDT 2012 ppc
---------------------------------
-- PowerPMAC Motion Controller --
---------------------------------
Could not chdir to home directory /home/deltatau: Input/output error
/bin/bash: Input/output error

login as: root
root@xxx.xxx.xxx.xxx's password:
Linux powerpmac 2.6.30.3 #24 Mon Sep 10 11:26:14 PDT 2012 ppc
---------------------------------
-- PowerPMAC Motion Controller --
---------------------------------
Last login: Wed Oct 31 23:56:54 2012 from 10.32.4.132
/bin/bash: Input/output error
steve.milici Posted April 7, 2014

Capture the output of the serial port during boot (115 kbaud). It may identify the problem or at least give clues about it.
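One way to capture that boot output to a file is sketched below in Python. This is a rough sketch under stated assumptions, not a Delta Tau procedure: `log_boot_output` is a hypothetical helper that timestamps each line from any object with a `readline()` method, and the demo feeds it placeholder bytes rather than real controller output. On real hardware the stream would typically be a port opened with the third-party pyserial package; the port name and the reading of "115 kbaud" as 115200 baud are assumptions here.

```python
import io
import time


def log_boot_output(stream, sink, max_idle_reads=3):
    """Copy lines from a serial-like byte stream to sink, one timestamp per line.

    Stops after max_idle_reads consecutive empty reads (timeouts or end of data)
    and returns the number of lines written.
    """
    idle = 0
    written = 0
    while idle < max_idle_reads:
        raw = stream.readline()
        if not raw:  # empty read: timeout on a serial port, EOF on a file
            idle += 1
            continue
        idle = 0
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        sink.write(f"{time.strftime('%H:%M:%S')} {text}\n")
        written += 1
    return written


# Demo with placeholder bytes standing in for the serial port.
# With pyserial (assumption: installed, port name is hypothetical), the stream
# could instead be: serial.Serial("/dev/ttyUSB0", 115200, timeout=1)
fake_port = io.BytesIO(b"placeholder line 1\r\nplaceholder line 2\r\n")
log = io.StringIO()
n = log_boot_output(fake_port, log)
print(n)
```

Keeping the timestamps makes it easier to see where the boot sequence stalls, and writing to a real file instead of `io.StringIO` leaves a log that can be attached to a follow-up post.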