Ghost in my machine? Randomly stalls out

Scott and Roy! Awesome additional paths to try! Wish I had seen this before the day was up. Excited to try both when I get back in the shop.
Will report back when I’ve tested the ideas.
Thanks everyone for the suggestions

Christopher,

Rolling back to 1.x will require a restoration CD (not a USB drive). We have not stocked restoration CDs for years, and I don’t even know if we’d be able to make one again. But to me it sounds like either your motherboard or hard drive is dying, and you should look into getting a new controller.

Thank you,
Norman

Dear Lord, let’s hope this is not the case. The cost of a new controller compared to the value of a machine this old is not a great ratio.

OK, sitting here watching it ghost run the exact same program over and over again that it originally was failing on. So far no issues! I’m going to throw a bit in next run and watch. The only difference? I pulled the controller out of the cabinet and plugged everything directly into the back of the machine. No more extension cords. No more controller inside the Tormach base.

I am running 2.10.2, which is not far behind the latest, so I’m guessing the software version is not the issue. The machine has never been on a network since there’s no network out in the shop.

I did pull the logs, and admittedly I don’t know what I’m looking for. But in the logs for a day when I know it messed up several times, the following errors and warnings show up several times. So I’m going to assume they’re related, despite not knowing the exact timestamps when the stalls were happening.

  • kernel: [ 9.641762] EXT4-fs (sda3): re-mounted. Opts: data=journal,commit=1,errors=remount-ro
  • udisksd[2214]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD2500AAKX_00ERMA0_WD_WCC2E7CN2D93: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)
  • kernel: [ 2.194697] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
  • This block
    kernel: [ 9.330959] ACPI Warning: SystemIO range 0x0000000000001828-0x000000000000182F conflicts with OpRegion 0x0000000000001800-0x000000000000187F (\PMIO) (20170119/utaddress-247)
    kernel: [ 9.330964] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
    kernel: [ 9.330966] ACPI Warning: SystemIO range 0x0000000000001C40-0x0000000000001C4F conflicts with OpRegion 0x0000000000001C00-0x0000000000001FFF (\GPR) (20170119/utaddress-247)
    kernel: [ 9.330969] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
    kernel: [ 9.330970] ACPI Warning: SystemIO range 0x0000000000001C30-0x0000000000001C3F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C3F (\GPRL) (20170119/utaddress-247)
    kernel: [ 9.330972] ACPI Warning: SystemIO range 0x0000000000001C30-0x0000000000001C3F conflicts with OpRegion 0x0000000000001C00-0x0000000000001FFF (\GPR) (20170119/utaddress-247)
    kernel: [ 9.330974] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
    kernel: [ 9.330974] ACPI Warning: SystemIO range 0x0000000000001C00-0x0000000000001C2F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C3F (\GPRL) (20170119/utaddress-247)
    kernel: [ 9.330976] ACPI Warning: SystemIO range 0x0000000000001C00-0x0000000000001C2F conflicts with OpRegion 0x0000000000001C00-0x0000000000001FFF (\GPR) (20170119/utaddress-247)
    kernel: [ 9.330978] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
    kernel: [ 9.330979] lpc_ich: Resource conflict(s) found affecting gpio_ich
    kernel: [ 9.641762] EXT4-fs (sda3): re-mounted. Opts: data=journal,commit=1,errors=remount-ro
  • mdm[1173]: WARNING: Plymouth is running, asking it to stop…
  • mdm[1173]: WARNING: Plymouth stopped
  • This block
    mate-session[1669]: EggSMClient-WARNING: Invalid Version string ‘0.9.4’ in /home/operator/.config/autostart/xfce-autostart-wm.desktop
    mate-session[1669]: WARNING: Unable to find provider ‘’ of required component ‘panel’
    mate-session[1669]: WARNING: Unable to find provider ‘’ of required component ‘dock’

Does anything look particularly juicy to anybody?
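
For anyone else sifting through logs like these, here is a minimal sketch of one way to narrow them down. The keyword list is an assumption chosen for illustration (phrases that often accompany disk or USB trouble), not anything published by Tormach or LinuxCNC, and a match is only a hint about where to look, not a diagnosis.

```python
import re

# Hypothetical keyword list (an assumption for illustration): phrases that
# often show up alongside disk or USB trouble in Linux system logs.
SUSPECT_PATTERNS = [
    r"Input/output error",
    r"I/O error",
    r"USB disconnect",
    r"Remounting filesystem read-only",
]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def suspect_lines(lines):
    """Return only the log lines that match one of the suspect patterns."""
    return [line.rstrip("\n") for line in lines if SUSPECT_RE.search(line)]


# Three lines copied from the log excerpt in the post above.
sample = [
    "kernel: [ 9.641762] EXT4-fs (sda3): re-mounted. Opts: data=journal,commit=1,errors=remount-ro",
    "udisksd[2214]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD2500AAKX_00ERMA0_WD_WCC2E7CN2D93: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)",
    "mdm[1173]: WARNING: Plymouth stopped",
]

for line in suspect_lines(sample):
    print(line)
```

On the three sample lines, only the udisksd SMART line matches. To scan a whole file, pass an open file object to `suspect_lines` instead of the sample list.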

Has survived several multi-hour runs of the exact program that worked before, then was giving problems, and is now working again. So I’m pretty sure either one of the USB extensions or the monitor extension was the culprit. Not sure why any of those malfunctioning would cause a machine to just up and stop running, but hey what do I know.
Could possibly also be something dealing with micro-vibration and the controller being inside the machine cabinet. Will test that too. But my money is on cables.

And that’s why they call me Dr. Jones
Half the time it’s the USB controller. The other 90% of the time it’s the software.

That seems absolutely wild to me! Why in the world would an autonomous machine care at all whether I suddenly lost connection with a mouse/keyboard/jogwheel/screen? None of those things are required for it to continue working, and I can smash the e-stop if I really need to stop it without them.
Crazy.

OK brief diatribe about USB, since I was involved in its inception way back when.

It’s USB. It’s meant to be “plug and play”, so you can unplug and plug things in willy-nilly and it’ll be fine with that. In this day and age people expect it, but way back when, it was bad juju to yank something out of a PC without turning it off first, because all kinds of bad things could happen.

So, why is that relevant? Your controller is a PC at heart, and when it ‘sees’ a new USB device, or one goes away, it does things under the hood to take care of the sudden appearance or disappearance and keep the juju at bay. Both take time and briefly interrupt whatever else the PC was doing. I imagine PP/LinuxCNC, which is trying to be ‘real time’, gets upset when it’s interrupted by the motherboard dealing with many sudden arrivals or disappearances of a USB device or, worse yet, a device that starts sending it random signals.

“Extension cables” make all this worse. Officially, the USB specification had no such thing as a USB extension cable unless it was separately powered to boost the signals along the wire; otherwise the signals get faint and noisy, like a radio station far away. But people being people, they found that unpowered extension cables ‘worked’ (more or less), and so made and sold them. Then they started cheaping out on the shielding as well, and those ‘worked’ (less well, but usually) and were cheaper, so everyone did that, apart from a few expensive specialty shielded, powered USB extensions. Without one of those, it’s anyone’s guess whether an “extension” will work and for how long, especially in an electrically noisy environment like a shop.

FWIW, if a USB device is noisy enough for long enough, the PC will “hang up” the noisy port and disconnect it. So if a person sees USB devices randomly stop working until they are turned off and on again, or unplugged and plugged back in, it’s a good sign there is too much noise on the wire.
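
One rough way to check for this is to count how often each USB port drops out in the kernel log. The sketch below assumes the usual Linux kernel message wording (`usb <port>: USB disconnect, device number N`). The sample lines are made up for illustration and are not from the logs posted earlier in this thread.

```python
import re
from collections import Counter

# Matches the kernel's usual disconnect message and captures the port path,
# e.g. "1-1.2" in "usb 1-1.2: USB disconnect, device number 5".
DISCONNECT_RE = re.compile(r"usb ([\d.-]+): USB disconnect")


def disconnects_per_port(lines):
    """Count USB disconnect messages per port path."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, invented for this example.
sample = [
    "kernel: [ 120.500000] usb 1-1.2: USB disconnect, device number 5",
    "kernel: [ 121.100000] usb 1-1.2: new full-speed USB device number 6 using ehci-pci",
    "kernel: [ 300.900000] usb 1-1.2: USB disconnect, device number 6",
    "kernel: [ 400.000000] usb 2-1: USB disconnect, device number 3",
]

for port, count in disconnects_per_port(sample).most_common():
    print(port, count)
```

A port that racks up repeated disconnects while nobody is touching the cables would be a reasonable first suspect, particularly if it is the one with an extension on it.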

This right here is the type of knowledge I desire. Thank you so much for explaining this; I understand the issue much better now. Amazing.
