piefum Posted February 22, 2018

Hi all,

I have a strange watchdog error that I cannot debug. The situation is the following:
- I have a complete system (6 degrees of freedom) running on a Power Brick LV IMS.
- The system is expanded with an EtherCAT network to read additional serial encoders and temperature sensors.
- The system is considered finished and ready to be shipped to the customer.
- The system is here in my lab for run-in tests: it is kept moving day and night to detect errors or faults.

The watchdog faults appear at random, and I cannot see where they come from. Sometimes the system stays alive for 80 hours and nothing happens; then I reboot, and after a few minutes the watchdog trips and I need to power-cycle the system.

How can I debug a situation like this? Is there a log inside the PMAC that tells us why the watchdog tripped?

Thanks a lot,
gigi
steve.milici Posted February 23, 2018

If this always happens after a reboot, always log the boot response and send the “offending” one to support@deltatau.com. It may also be useful to have the serial port capture any messages during operation.
Alex Anikstein Posted February 28, 2018

When it does watchdog, what type is it? If it is a "soft" watchdog, there is some debugging that can be done. You can set up a gather of PMAC Status Elements and of status elements for your program (if you have a PLC that functions as a state machine, for instance, you can record which state you are in) and then, in a separate PLC, set Gather.Enable=3 to perform an indefinite gather. Once the status indicates that a watchdog has occurred, the gather can be stopped and the data analyzed on the computer.

The most common cause of a hard watchdog is, ultimately, the 5V supply on the unit dipping too low. On a Clipper this may be easy to troubleshoot (the 5V is brought in directly), but on other form factors it may be harder to address, as the 5V is likely stepped down from an external 24V supply.
piefum (Author) Posted March 1, 2018

Quote: When it does watchdog, what type is it? If it is a "soft" watchdog, there is some debugging that can be done.

At the moment I am debugging the system without the IDE; only our custom software is connected. We see the front watchdog LED switch on, and a few moments later the system reboots by itself. From our software, I can see that sys.uptime restarts from zero.

Debugging is painfully slow, since the watchdog trips only after some hours of work: as an example, the system has just gone into error after 12400 s (about 3.4 hours) of use.

I believe that a power supply fault should power off the system immediately, correct?

Thanks a lot,
gigi
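Since the reboot shows up as the uptime counter restarting from zero, the host software can timestamp each watchdog event by logging uptime samples and flagging any sample that is lower than the previous one. The following is only a minimal sketch: `find_reboots` and the sample layout are hypothetical names for illustration, and how the uptime value is actually read from the controller is left to the existing custom software.

```python
def find_reboots(samples):
    """Find uptime resets in a time-ordered log.

    samples: iterable of (host_time_s, controller_uptime_s) pairs.
    Returns a list of (host_time_s, last_uptime_before_reset_s), one entry
    per detected reboot. Both names are illustrative, not a Delta Tau API.
    """
    reboots = []
    prev_uptime = None
    for host_time, uptime in samples:
        # Uptime only ever counts up, so a decrease means the controller restarted.
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((host_time, prev_uptime))
        prev_uptime = uptime
    return reboots


# Example log: the controller runs for 12400 s, then restarts.
log = [(0, 12380), (10, 12390), (20, 12400), (30, 5), (40, 15)]
reboots = find_reboots(log)
```

Logging the last uptime before each reset makes it easy to see whether the trips cluster shortly after boot or are spread over long runs.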
tecnico Posted March 8, 2018

Ciao Gigi,

it must be something in the Lecco area (joke).

I experienced a similar problem, albeit with a different CPU (UMAC 465), when using an EtherCAT network. After some weeks of debugging we came to the conclusion, together with DTCH, that something in the critical interrupt routine causes a kernel panic under these conditions. Initially I thought it was related to the number of EtherCAT axes (16) I was using, but then it happened (apparently at random) on "lighter" machines too (just the WD, not the reboot).

So it could be an idea to turn the critical interrupt off.

Ciao,
Andrea
piefum (Author) Posted March 16, 2018

Ciao Andrea et al.,

Quote: ... After some weeks of debugging we came to conclusion together with DTCH that there ...

I installed the patch that disables the interrupt one week ago, and since then the system has not encountered any WD trip, reboot, or other strange behavior. I would say the problem is now fixed.

Many thanks for the help; I would never have fixed this in time for delivery.

DeltaTau guys: is it possible to make this update "official" and publicly known?

Ciao,
gigi
J0hann Posted March 22, 2018

Hello guys,

I have the same problem here: two PowerBrickAC-based systems with multiple axes fall into the hardware watchdog state at random. Having a lot of C code, both in background programs and in the RTI, I am familiar with DT software watchdogs, but hardware ones? I have no idea how to debug them.

I am curious about your solution, disabling the critical interrupt. How did you do this? You mentioned a patch; could you tell me where you found it?

Thanks a lot!
Johann
J0hann Posted March 30, 2018

OK, I found what my problem was. Using the Linux "top" and "watch -n 0.5 cat /proc/xenomai/stat" commands, I could see that one of my debug processes was overloading the CPU (idle dropped to less than 1%). Disabling this debug process (basically high-frequency logging) let the idle rise back to 40%: no more hardware WD.
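For unattended run-in tests, the same idle check can be scripted instead of watched by hand. The sketch below is an assumption-laden illustration, not the method used above: it computes the idle percentage from two readings of the aggregate "cpu" line of the standard Linux /proc/stat (it does not parse /proc/xenomai/stat), and the function names, the 5% threshold, and the 0.5 s interval are all hypothetical choices.

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Fields are user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user, so it is not summed again.
    ticks = [int(v) for v in fields[1:9]]
    return ticks[3], sum(ticks)


def idle_percent(prev_line, curr_line):
    """Idle share (0-100) between two 'cpu' lines, or None if no ticks elapsed."""
    prev_idle, prev_total = parse_cpu_line(prev_line)
    curr_idle, curr_total = parse_cpu_line(curr_line)
    total_delta = curr_total - prev_total
    if total_delta <= 0:
        return None
    return 100.0 * (curr_idle - prev_idle) / total_delta


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which holds the aggregate counters."""
    with open(path) as f:
        return f.readline()


def watch_idle(threshold=5.0, interval=0.5, samples=10):
    """Print the idle percentage periodically and mark samples below threshold."""
    prev = read_cpu_line()
    for _ in range(samples):
        time.sleep(interval)
        curr = read_cpu_line()
        pct = idle_percent(prev, curr)
        if pct is not None:
            flag = "  <-- LOW" if pct < threshold else ""
            print(f"idle {pct:5.1f}%{flag}")
        prev = curr
```

Calling `watch_idle()` on the controller's Linux side prints one line per interval; redirecting that output to a file gives a record of CPU headroom leading up to a watchdog trip.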