On November 19 2024, LDLC's off-brand SSD died on me.  RIP.  Re-installed Tumbleweed on the replacement (Kingston SA400S3) on November 28.

Since then, I have been getting uncannily reproducible stuttering and frame drops (60↘40±10) in Hades Ⅱ when moving toward effect- or particle-heavy areas of the hub rooms (Crossroads, Training Grounds).  No idea WTF, those areas ran fine before.

- "High" graphics setting at native 1920×1080 resolution.
  - Tried "Low" graphics, lowered resolution, disabled vsync, switched to Windowed mode: symptoms persist.
- Proton Experimental.
  - Tried a couple of old Proton versions: symptoms persist.
- Reinstalled game & nuked everything under
  - =~/.cache/mesa_shader_cache*=
  - =~/.cache/radv_builtin_shaders*=
  - =~/.config/unity3d=
  - =~/.local/share/Steam=
  - =~/.local/share/vulkan/=
  - =~/.steam*=
  in case "stale shaders" were to blame or something.
- Tumbleweed/Plasma/Wayland session.
  - Tried X11: symptoms persist.
- Reducing noise with
  - ~balooctl6 suspend~
  - ~swapoff -a~ (RAM nowhere near exhausted)

Well then.

* CPU frequency scaling?

(Hey 👋 A warning: this was the first rabbit hole I burrowed into.  Spoiler alert: nothing I learned here solved the problem.  Feel free to skip to the next section if you want to know how this ends {{{narrator(he wrote\, furiously hoping against hope that he would indeed see the end of this someday)}}})

Started by noticing that the Plasma "Power Management" tray widget says "Power Profile" is "Not available".  Not sure whether that was the case with the old installation; maybe I had something configured or installed to enable this?
Internet says "install and enable power-profiles-daemon", except that's already on:

#+begin_example
$ systemctl status power-profiles-daemon.service
● power-profiles-daemon.service - Power Profiles daemon
     Loaded: loaded (/usr/lib/systemd/system/power-profiles-daemon.service; disabled; preset: disabled)
     Active: active (running) since Sun 2024-12-01 11:46:32 CET; 45min ago
 Invocation: b2545a02bc9642b7aeb5f370e8b50e7c
   Main PID: 2289 (power-profiles-)
      Tasks: 4 (limit: 18320)
        CPU: 52ms
     CGroup: /system.slice/power-profiles-daemon.service
             └─2289 /usr/libexec/power-profiles-daemon
#+end_example

But:

#+begin_example
$ powerprofilesctl
,* balanced:
    PlatformDriver: placeholder

  power-saver:
    PlatformDriver: placeholder
#+end_example

Internet says I am missing the right scaling driver, and sounds very keen on enabling =amd_pstate=, which I do not seem to have available.  =/proc/config.gz= suggests the kernel configuration supports it, but =cpupower= does not appear to know about it:

#+begin_example
$ zcat /proc/config.gz | grep -i pstate
CONFIG_X86_INTEL_PSTATE=y
CONFIG_X86_AMD_PSTATE=y
CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3
# CONFIG_X86_AMD_PSTATE_UT is not set

$ cpupower frequency-info
analyzing CPU 5:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 5
  CPUs which need to have their frequency coordinated by software: 5
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.40 GHz - 3.70 GHz
  available frequency steps:  3.70 GHz, 1.70 GHz, 1.40 GHz
  available cpufreq governors: ondemand performance schedutil
  current policy: frequency should be within 1.40 GHz and 3.70 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.30 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: no
#+end_example

=dmesg= offers:

#+begin_example
$ sudo dmesg -H
[…]
amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled
#+end_example

Though:

#+begin_example
$ lscpu | grep -i cppc
Flags:  […] cppc […]
#+end_example

So ACPI problem?  Lots of posts mentioning =amd_= parameters on the kernel command-line, but AFAIU those posts are stale with newer kernels (6.11 here) which automatically (attempt to) load the =amd_pstate= driver.

Went through the UEFI menu and found nothing related to ACPI or [[https://forum.level1techs.com/t/amd-p-state-driver/197885/24][X2APIC]].  Skeptical of UEFI settings anyway, since I did not change them between the old and new installations.

{{{narrator(Some time later)}}}

Probably not ACPI, =dmesg= is chock-full of ACPI noise.  OTOH, using some diagnosis methods from [[https://bugzilla.kernel.org/show_bug.cgi?id=218171][this kernel bug report]]:

#+begin_example
$ find /sys/devices -name '*cppc*'
🦗
#+end_example

(~acpidump ; acpixtract ; iasl ; grep -i cpc *.dsl~ also yields 🦗, but =iasl= complains about "unresolved" "control methods", so 🤷)

{{{narrator(Some time later)}}}

[[https://wiki.archlinux.org/title/CPU_frequency_scaling#amd_pstate][ArchWiki]] does say "Change /Enable CPPC/ […] from /Auto/ to /Enabled/".  My UEFI menu tucks that under /Overclocking → Advanced CPU Configuration → AMD CBS → CPPC CTRL/.
That change *does* convince Linux to enable =amd_pstate=; going over the previous tests in reverse order:

#+begin_example
$ [… acpidump && acpixtract && iasl … ] && grep -i cpc *.dsl
ssdt1.dsl:            Name (_CPC, Package (0x17)  // _CPC: Continuous Performance Control
[… repeats 12 times …]

$ find /sys/devices -name '*cppc*' -o -name '*pstate*' | tr -s '[:digit:]' N | sort -u
/sys/devices/system/cpu/amd_pstate
/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_highest_perf
/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_hw_prefcore
/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_lowest_nonlinear_freq
/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_max_freq
/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_prefcore_ranking
/sys/devices/system/cpu/cpuN/acpi_cppc

$ sudo dmesg -H
[… ominous silence about amd_pstate …]

$ cpupower frequency-info
analyzing CPU 1:
  driver: amd-pstate-epp
  CPUs which run at the same hardware frequency: 1
  CPUs which need to have their frequency coordinated by software: 1
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 4.31 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 2.38 GHz and 4.31 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.57 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 255. Maximum Frequency: 4.31 GHz.
    AMD PSTATE Nominal Performance: 219. Nominal Frequency: 3.70 GHz.
    AMD PSTATE Lowest Non-linear Performance: 141. Lowest Non-linear Frequency: 2.38 GHz.
    AMD PSTATE Lowest Performance: 24. Lowest Frequency: 400 MHz.

$ powerprofilesctl
  performance:
    CpuDriver:  amd_pstate
    Degraded:   no

,* balanced:
    CpuDriver:  amd_pstate
    PlatformDriver: placeholder

  power-saver:
    CpuDriver:  amd_pstate
    PlatformDriver: placeholder
#+end_example

And lo, the 🍃↔🚀 slider appears in the Power Management tray widget.  Nervous about entering the "Overclocking" UEFI zone tho, and concerned about these "Maximum frequencies".

/And does it even help with the game?/

🥁

No.  No it does not; no discernible difference in FPS nor vibes.

Will assume this new baseline cannot hurt: OT1H "overclocking" is scary, OTOH Linux now has a finer handle on the CPU and hopefully will not overwork it to death?

* Sᴇᴠᴇʀᴀʟ Wᴇᴇᴋꜱ Lᴀᴛᴇʀ

- [[https://www.gamingonlinux.com/forum/topic/5475/page=1/][ridge reports]] "bad frame pacing on ADMGPU",
  - when vsync is turned off: a non-factor in my testing,
  - lots of useful information in that thread tho and interesting-sounding pointers,
  - [[https://www.gamingonlinux.com/forum/topic/5475/page=2/#r42519][Shmerl]] says:
    - games can cause stutter by underloading the GPU, causing it to drop out of "high performance mode",
      - (=amdgpu_top= and =radeontop= do confirm that lag spikes correlate with GPU usage drop)
    - see [[https://gitlab.freedesktop.org/drm/amd/-/issues/1500][drm/amd#1500]]:
      - /lots/ of sysfs noodling there; unfortunately, none of the suggested settings for =power_dpm_force_performance_level= & =pp_power_profile_mode= change the symptoms.
- Since this forum seems full of knowledgeable folks, posted [[https://www.gamingonlinux.com/forum/topic/6437/][a new topic]] there… but then [[https://www.gamingonlinux.com/forum/topic/6463/][the UK OSA dropped]].
- In [[https://gitlab.freedesktop.org/drm/amd/-/issues/3618#note_2689087][this drm/amd#3618 thread]], @agd5f suggests "6.11 stable kernels" include a fix for the issue at hand there and a further rework "was submitted to 6.13"; @mattipulkkinen reports happy results with 6.13-rc2 (FTR, symptoms persist here with 6.12.8).
- Piggybacked onto [[https://gitlab.freedesktop.org/mesa/mesa/-/issues/11300][mesa/mesa#11300]]:
  - common: Hades Ⅱ, iGPU, recent kernel & Mesa, Proton Experimental,
  - differences: Fedora, GNOME, X11,
  - noteworthy: good performance on Windows,
  - suggestion by @Venemo: downgrade & bisect Mesa;
    - tempting, though scared of bricking graphical sessions and/or ending up with a frankensystem (installing binaries under a prefix is probably easy, but then keeping track of config tweaks and cache artifacts sounds fraught).
- In [[https://gitlab.freedesktop.org/upower/power-profiles-daemon/-/issues/164][upower/power-profiles-daemon#164]], @Nyan reports problematic iGPU capping; not convinced this is applicable though, given the reported symptoms (video playback is fine here).
- Seen reports of Variable Refresh Rate causing problems:
  - searched high and low to understand why VRR appears nowhere in Plasma settings, despite the start menu turning up "Display Configuration" when searching for "VRR",
  - mystery solved by ~kscreen-doctor -o~: =Vrr: incapable= 🤷
- [[https://www.techpowerup.com/forums/threads/what-fixed-stuttering-and-random-framerate-spikes-in-games-for-me.327264/][aska33j proclaims]] that /disabling CPPC/ "fixed stuttering and random framerate spikes in games for [them]" so… roundtrip to UEFI, disabling that.  The =amd_pstate= warning is back; the "Power Profile" slider is no longer accessible in the systray widget; no discernible effect in-game anyway.
- Looking at Steam forums, [[https://steamcommunity.com/app/1145350/discussions/1/596260472619121965/][some folks]] do report FPS drops /shortly after the update/:
  #+begin_quote
  it started fine after the major update, now suddenly im stuck with 40~50 fps with micro sutters

  -- December 6 2024
  #+end_quote
- After AMD drivers & Mesa, figured I could look at vkd3d's issue tracker.  [[https://github.com/doitsujin/dxvk/issues/4436][doitsujin/dxvk#4436]] and [[https://github.com/ValveSoftware/steam-for-linux/issues/11446][ValveSoftware/steam-for-linux#11446]] looked somewhat promising: reports of lag on "KDE Tumbleweed Wayland", reported not long before my symptoms began (November 2024); alas, ~LD_PRELOAD=~ does not help.
  - {{{narrator(clicks through duplicates\, out of GitHub & into [[https://reddit.com/r/linux_gaming/comments/1htcxfj/system_green_screens_regularly_during_more/m5da9ey/][Reddit]])}}}
    #+begin_quote
    Alternatively, remove the offending line in =/usr/share/drirc.d/00-radv-defaults.conf=
    #+end_quote
    {{{narrator(discovers [[https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/00-radv-defaults.conf][=/usr/share/drirc.d/=]])}}}

    Computers were a mistake.
- Peeked at [[https://github.com/HansKristian-Work/vkd3d-proton/blob/master/.github/ISSUE_TEMPLATE/bug_report.md][vkd3d-proton's issue template]] and idly ran with ~PROTON_LOG=1~.  Over the course of 30 seconds or so, the log file gets flooded with 3MB's worth of =trace:unwind:dump_unwind_info= 🤨
- VRAM usage is always close to full, even when not playing games.  "At rest", the Plasma shell consumes ≈410MB over 512MB available.
  - [[https://forum.kde.org/viewtopic.php%3Ff=111&t=165779.html][Lissanro reported]] in 2020 that changing Plasma's rendering backend to /Software/ freed up some VRAM.
  - Indeed, bringing up the Plasma Renderer menu, switching to /Software/, logging out & back in frees up some VRAM.  It also yields compositing glitches 🤷
  - More to the point, /it has no effect on the symptoms in-game/.
- Figured I would ask ValveSoftware/Proton about the logs; filed [[https://github.com/ValveSoftware/Proton/issues/8424][#8424]]; got dup'd into [[https://github.com/ValveSoftware/Proton/issues/7805][#7805]], per the "one report per game" policy.  That issue is about a /crash on Alt-Tab/, with an /Nvidia dGPU/; unsure how lumping our two reports together will help.  Had to try 🤷
- Found [[https://gitlab.freedesktop.org/drm/amd/-/issues/2516][drm/amd#2516]]; noticed that I have
  - =/sys/module/gpu_sched/parameters/sched_policy=: 1
  - =/sys/module/amdgpu/parameters/sched_policy=: 0
  Changed the kernel command-line to set the former to 0, as suggested in that issue; symptoms persist.  No idea what the latter is about, nor how it differs from the former.  I can find [[https://docs.kernel.org/gpu/amdgpu/module-parameters.html#sched-policy-int][the docs for amdgpu]] but not for gpu_sched.
- [[https://old.reddit.com/r/archlinux/comments/1gzy0xd/amdgpu_regression_on_kernel_612_choppy/m1dn05z/][Some folks]] report =amdgpu.dcdebugmask=0x10= (≡ =DC_DISABLE_PSR=) fixing "choppy performance".  No effect here.  Could try setting [[https://www.kernel.org/doc/html/v6.13/gpu/amdgpu/module-parameters.html#dcdebugmask-uint][other values]]…

* This is insane

Selected subset of moving parts; "testability" considering ease of clean reverts:

| Part         | Testability                                                                         |
|--------------+-------------------------------------------------------------------------------------|
| Linux kernel | 🫣 [[https://en.opensuse.org/SDB:InstallNewerKernel][some distro documentation]]; afraid of side-effects |
| AMD drivers  | 🤷 no clue; maybe inextricable from kernel?                                          |
| Mesa         | 😬 easy to recompile; hard to control transient state in cache & config folders     |
| Steam        | 🫥 under Steam's control                                                             |
| Wine         | 🫥 under Steam's control                                                             |
| Proton       | 👌 as long as I stick to versions under Steam's control; have not considered GE yet |
| vkd3d-proton | 🫥 under Steam's control                                                             |
| Hades Ⅱ      | 🫥 under Steam's control                                                             |

That's looking at software packages as individual blackboxes; config-wise, worth noting:

| Part            | Testability                                       |
|-----------------+---------------------------------------------------|
| AMD pstate      | 😬 UEFI roundtrip                                 |
| sysfs           | OK; worst case: reboot & edit kernel command-line |
| Plasma Renderer | OK                                                |

Let's throw in:

| Part          | Testability                       |
|---------------+-----------------------------------|
| Mobo firmware | 🔥 [[file:maintenance.org::*Firmware updates][reports]] of nuked boot settings |
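
Since sysfs tweaks rate as the easiest thing to revert, here is a minimal sketch of how the knobs poked at above could be snapshotted before each experiment, so a trial can be diffed against a known baseline.  Assumptions that do not come from these notes: the function name =snapshot_knobs=, the =SYSFS_ROOT= override (only there so the function can run against a fake tree), and the =card*= glob used to reach the amdgpu device directory.  =pp_power_profile_mode= is left out on purpose, since it prints a multi-line table rather than a single value.

```shell
#!/bin/sh
# Sketch only: print "path=value" for a handful of tunables mentioned in
# these notes. SYSFS_ROOT is a hypothetical override for testing; it
# defaults to the real /sys.

snapshot_knobs() {
    root="${SYSFS_ROOT:-/sys}"
    for knob in \
        "$root"/module/gpu_sched/parameters/sched_policy \
        "$root"/module/amdgpu/parameters/sched_policy \
        "$root"/class/drm/card*/device/power_dpm_force_performance_level \
        "$root"/devices/system/cpu/cpufreq/policy*/scaling_driver \
        "$root"/devices/system/cpu/cpufreq/policy*/scaling_governor
    do
        # Unmatched globs stay literal, so skip anything that is not a
        # readable file instead of failing.
        [ -r "$knob" ] || continue
        # Strip the root prefix so snapshots stay comparable across roots.
        printf '%s=%s\n' "${knob#"$root"}" "$(cat "$knob")"
    done
}
```

Something like =snapshot_knobs > before.txt= ahead of a trial, then =snapshot_knobs | diff before.txt -= afterwards, would show which values actually changed (and what to write back to undo them).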