random watchdog error

piefum · February 22, 2018

Hi all

I have a strange watchdog error that I cannot debug.

The situation is the following:

- I have a complete system (6 degrees of freedom) running on a Power Brick LV IMS

- The system is expanded with an EtherCAT network to read additional serial encoders and temperature sensors

- The system is considered finished and ready to be shipped to the customer

- The system is here in my lab for run-in tests, that is, moving the system day and night to detect any errors or faults

The strange part is that watchdog faults appear at random and I cannot see where they come from. Sometimes the system stays alive for 80 hours and nothing happens; then I reboot, and after a few minutes the watchdog trips and I need to power-cycle the system.

How can I debug a situation like that? Is there a log inside the PMAC that tells us why the watchdog has tripped?

Thanks a lot

gigi

steve.milici · February 23, 2018

If this always happens after a re-boot, always log the boot response and send the “offending” one to support@deltatau.com.

It may also be useful to have the serial port capturing any messages during operation.
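As a rough illustration of the serial capture suggested above, here is a minimal Python sketch that timestamps every line it reads and appends it to a log, so a message printed just before a watchdog trip can be matched to a time. The function name `capture_lines`, the device path and the log file name are assumptions for illustration only, and the sketch assumes the port has already been configured (baud rate and so on) by other means.

```python
import io
from datetime import datetime, timezone


def capture_lines(source, sink, clock=lambda: datetime.now(timezone.utc)):
    """Copy each line from source to sink, prefixed with an ISO 8601 timestamp.

    source and sink are any text streams (an opened serial device, a file,
    or an in-memory buffer). Returns the number of lines copied.
    """
    count = 0
    for line in source:
        # Strip trailing CR/LF so every log entry ends with a single newline.
        sink.write(f"{clock().isoformat()} {line.rstrip()}\n")
        sink.flush()  # keep the log usable even if the host loses power
        count += 1
    return count


# Hypothetical usage (device path and log name are placeholders):
#   with open("/dev/ttyUSB0", "r", errors="replace") as port, \
#        open("pmac_console.log", "a") as log:
#       capture_lines(port, log)
```

Flushing after every line costs some throughput, but it means the last messages before a fault are already on disk.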

Alex Anikstein · February 28, 2018

When it does watchdog, what type is it? If it is a "soft" watchdog, there is some debugging that can be done.

You can set up a gather of PMAC status elements and of status elements for your own program (if you have a PLC that functions as a state machine, for instance, you can record which state you are in), and then, in a separate PLC, set Gather.Enable=3 to perform an indefinite gather. Once the status indicates that a watchdog has occurred, the gather can be stopped and the data analyzed on the computer.

The most common cause of a hard watchdog is, ultimately, the 5V supply on the unit dipping too low. On a Clipper, this may be easy to troubleshoot (as the 5V is brought in directly), but on other form factors, it may be harder to address, as the 5V is likely stepped down from an external 24V supply.

piefum · March 1, 2018

> When it does watchdog, what type is it? If it is a "soft" watchdog, there is some debugging that can be done.

At the moment I am debugging the system without the IDE; only our custom software is connected. We see the front watchdog LED switch on, then after a few moments the system reboots by itself. From our software, I can see that sys.uptime restarts from zero.

Debugging is painfully slow, since the watchdog trips only after some hours of work: as an example, the system has just gone into error after 12400 s (about 3.4 hours) of use. I believe that a power supply fault should power off the system immediately, correct?
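Since a trip can take hours to appear, the uptime check described above could be automated on the host side. The following is a minimal Python sketch, assuming the custom software already records (wall-clock time, uptime) pairs; the function name `find_reboots` and the sample format are hypothetical, not part of any Delta Tau API.

```python
def find_reboots(samples):
    """Find controller reboots in a series of uptime readings.

    samples: iterable of (wall_clock_seconds, uptime_seconds) pairs in
    chronological order. A reboot is assumed whenever the uptime goes
    backwards between two consecutive readings.

    Returns a list of (wall_clock_of_detection, last_uptime_before_reset).
    """
    reboots = []
    prev_uptime = None
    for wall, uptime in samples:
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((wall, prev_uptime))
        prev_uptime = uptime
    return reboots
```

For instance, readings that climb to 12400 s and then drop to a few seconds would yield one entry whose second element is 12400, which gives the run time before the fault without anyone watching the machine.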

thanks a lot

gigi

tecnico · March 8, 2018

Ciao Gigi,

It must be something in the Lecco area (joke).

I experienced a similar problem, albeit with a different CPU (UMAC 465), when using an EtherCAT network.

After some weeks of debugging we came to the conclusion, together with DTCH, that something in the critical interrupt routine causes a kernel panic under these conditions. Initially I thought it was related to the number of EtherCAT axes (16) I was using, but then it also happened, apparently at random, on "lighter" machines (just the WD, not the reboot).

So it could be worth trying to turn the critical interrupt off.

Ciao

Andrea

piefum · March 16, 2018

Ciao Andrea et al.

> ... After some weeks of debugging we came to conclusion together with DTCH that there ...

I installed the patch that disables the interrupt one week ago, and since then the system has not encountered any WD trip, reboot, or other strange behaviour.

I would say now that the problem is fixed.

Many thanks for the help, I would never have fixed that in time for delivery.

DeltaTau guys: is it possible to make this update "official" and known to the public?

Ciao

gigi

J0hann · March 22, 2018

Hello guys,

I have the same problem here: two PowerBrickAC-based systems with multiple axes falling into the hardware watchdog state at random.

Having a lot of C code, both in background programs and RTI, I am familiar with DT software watchdogs, but hardware ones? I have no idea how to debug them.

I am curious about your solution, disabling the critical interrupt. How did you do this? You mentioned a patch; could you tell me where you found it?

Thanks a lot!

Johann

J0hann · March 30, 2018

OK, I found what my problem was. Using the Linux "top" and "watch -n 0.5 cat /proc/xenomai/stat" commands, I could see that one of my debug processes was overloading the CPU (idle dropped to less than 1%). Disabling this debug process (basically high-frequency logging) let the idle rise back to 40%: no more hardware WD.
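For anyone who wants to log the idle figure over a long run instead of watching "top", here is a minimal Python sketch that computes the idle percentage from two snapshots of the standard Linux /proc/stat file. The function names are hypothetical, and this is only the Linux-side view: on a Xenomai system, time spent in real-time tasks may not be fully reflected there, so /proc/xenomai/stat remains the reference for those.

```python
def parse_cpu_line(stat_text):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line.

    stat_text is the full content of /proc/stat. Only the first eight
    counters (user, nice, system, idle, iowait, irq, softirq, steal) are
    summed, because the guest counters are already included in user/nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:9]]
            return values[3], sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def idle_percent(before, after):
    """Idle CPU percentage between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at different times")
    return 100.0 * (idle1 - idle0) / elapsed


# Hypothetical usage on the target: read /proc/stat twice, a short interval
# apart, and log a warning when idle_percent(...) falls below a chosen limit.
```

Logging this value alongside the application's own state would show whether CPU starvation precedes each trip.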
