Lijsterbes, curious: what kernel are you using? I have freezing issues with some kernels but not others, and I don't know whether it's an Arch issue or an Obarun issue.
Managing misbehaving programs
I run kernel 6.12.10-arch1-1, but the same behavior has occurred over the past year or more with previous kernels, with no change.
Lijsterbes wrote: "Transcoding a big video file can take 50+ hours,"
Maybe the program you use to transcode leaks memory. Have you tried transcoding a very short file to see what happens with the RAM?
Do you use swap? Does the program use the swap at some point? Does it fill the swap completely?
Have you tried changing the program's nice value?
I don't know how you start your program, but have you tried limiting it with, e.g., s6-softlimit?
Can you start your program in a container? If so, try it there and see whether you get the same behavior; it should then freeze only the container instead of the entire system (normally, at least).
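For anyone wanting to see what the nice value and soft limit suggestions boil down to, here is a rough Python sketch, not a tested recipe. It starts a program with its nice value raised via os.nice and its RLIMIT_AS soft limit lowered via resource.setrlimit, both applied in the child just before exec. s6-softlimit works by adjusting these same setrlimit() resource limits, but the helper names (run_limited, limited_preexec), the 8 GiB cap and the niceness of 10 are my own arbitrary assumptions. Note that RLIMIT_AS caps virtual address space rather than resident RAM, so a program that reserves a lot of address space up front may fail under a cap that looks generous.

```python
import os
import resource
import subprocess
import sys


def limited_preexec(niceness: int, max_bytes: int):
    """Build a function that runs in the child after fork() and before exec()."""
    def apply_limits() -> None:
        # Raise the nice value so the job yields CPU time to everything else.
        os.nice(niceness)
        # Lower only the soft address-space limit and keep the hard limit;
        # an unprivileged process may set its soft limit anywhere up to the hard one.
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        new_soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return apply_limits


def run_limited(cmd: list[str], niceness: int = 10, max_gib: int = 8,
                capture: bool = False) -> subprocess.CompletedProcess:
    """Hypothetical helper: run cmd with a higher nice value and a capped address space."""
    max_bytes = max_gib * 1024 ** 3
    # preexec_fn is POSIX-only and should not be used from a multi-threaded parent.
    return subprocess.run(
        cmd,
        preexec_fn=limited_preexec(niceness, max_bytes),
        capture_output=capture,
        text=True,
    )


if __name__ == "__main__":
    # Demo: a child Python process reports the niceness and soft limit it ended up with.
    probe = "import os, resource; print(os.nice(0), resource.getrlimit(resource.RLIMIT_AS)[0])"
    result = run_limited([sys.executable, "-c", probe], capture=True)
    print(result.stdout.strip())
```

For a real run, pass the actual transcoding command line instead of the probe and leave capture off, so hours of encoder output aren't buffered in memory. If the cap is hit, allocations in that one process fail (it usually exits with an error or crashes) instead of the whole machine running short.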
Memory leaks are unlikely: the crashes happen within a minute of the system still showing 14 out of 16 GB of RAM available. There is swap, but it remains untouched for the entire time the system runs. Short files have a better chance of being transcoded correctly, but if I process a bunch of them, at some point the system still freezes. It seems to happen randomly over time.
I hadn't thought of the nice value yet; I'll try it.
I'm unfamiliar with containers, so I'd have to learn how to use them, and get good enough at it that they don't just add an extra complication.
s6-softlimit sounds like the thing I might be looking for! I'll read up on it and implement it.
Have you tried testing with an alternative to avidemux? Do you use the CLI or the Qt version?
The machine doing the work is headless, so I work from CLI.
Do you mean alternatives as in different versions of the avidemux program?
Lijsterbes wrote: "Do you mean alternatives as in different versions of the avidemux program?"
As in a completely different program, to make sure that your trouble comes from your transcoder (assuming you use avidemux as your encoding program).
I will try that. At the moment I'm still running a new attempt with avidemux, this time launched from s6-softlimit, to see if that works. The freezes are usually several days apart, so I'll inform you when I have more data.
After some experimentation, it seems that I can influence the probability of a freeze by changing which software I run for heavy loads. And that works fine for my purposes.
But what I'm REALLY looking for with this thread:
A misbehaving program shouldn't be able to disable the entire system like this. I understand that nothing is invulnerable, and things like fork bombs, DDoS attacks or hardware failures can't be survived, but it strikes me that a simple machine with hardly anything running on it should not be frozen by a misbehaving program. Linux is renowned for its stability, and it can't even handle a simple problem like this? That seems odd: the system is in charge of handing out time and resources to programs, so it should be able to keep control of them.
Is the system so slavishly devoted to userspace programs that it allows itself to be frozen? Or are there settings that prevent this from happening?
I have experienced similar behavior with Shutter. It leaks memory and blocks the machine entirely; I was forced to do a hard shutdown.
Finding out why a program freezes can be very complicated. The kernel does its best, but it won't stop developers from making certain mistakes.
While experimenting with this problem, I tried to force memory issues to the fore by removing swap from my system. Basically, I ran # swapoff and set swap off in boot@system.
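To double-check that swap is really off after swapoff (and still off after a reboot), or to keep an eye on memory during a long run, something like this Python sketch reads /proc/meminfo. SwapTotal, SwapFree and MemAvailable are real kernel fields reported in kB; parse_meminfo and swap_status are just hypothetical helpers of my own. From the shell, swapon --show or cat /proc/swaps answers the same swap question.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}; sizes are in kB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_status(meminfo_text: str) -> tuple[int, int]:
    """Return (swap total, swap used) in kB; a total of 0 means no swap is active."""
    info = parse_meminfo(meminfo_text)
    total = info.get("SwapTotal", 0)
    used = total - info.get("SwapFree", total)
    return total, used


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        text = meminfo.read_text()
        total_kb, used_kb = swap_status(text)
        available_kb = parse_meminfo(text).get("MemAvailable", 0)
        if total_kb == 0:
            print("No swap is active.")
        else:
            print(f"Swap: {used_kb} of {total_kb} kB in use")
        print(f"MemAvailable: {available_kb} kB")
```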
Instead of making the problem manifest more clearly and more quickly, ever since eliminating swap I have not experienced a single freeze. If anything, the machine has become more responsive and significantly (10-15%) quicker at completing its tasks. I've been shoveling work onto it in a completely unreasonable manner; before I turned off swap, that would have frozen the machine within a day, but since then it has refused to show any bad behavior.
I have no clue what happened, and it doesn't clarify anything about the root of the issue, but I hope this will point others in the right direction.