author: Kévin Le Gouguec <kevin.legouguec@gmail.com>, 2025-01-18 21:41:30 +0100
committer: Kévin Le Gouguec <kevin.legouguec@gmail.com>, 2025-01-18 21:42:34 +0100
commit: c72fba7717067a5887f41974b055a5ebf40eb1cf
tree: f28fd208681e45a95548b113388171090847768a /guides/sysadmin/machines
parent: 1039d44f0ad2238eb6d15ccf98981c77c552e2b2
Split off "hardware" notes
Into bona-fide hardware notes, and notes specific to desktop maintenance that bear increasingly little relation to "hard"ware.
Diffstat (limited to 'guides/sysadmin/machines')
-rw-r--r-- guides/sysadmin/machines/amdahl30/README.org | 1
-rw-r--r-- guides/sysadmin/machines/amdahl30/assembly.org | 43
-rw-r--r-- guides/sysadmin/machines/amdahl30/maintenance.org | 397
3 files changed, 441 insertions, 0 deletions
diff --git a/guides/sysadmin/machines/amdahl30/README.org b/guides/sysadmin/machines/amdahl30/README.org
new file mode 100644
index 0000000..07cdc45
--- /dev/null
+++ b/guides/sysadmin/machines/amdahl30/README.org
@@ -0,0 +1 @@
+An [[https://www.ldlc.com/fiche/PB00227011.html][LDLC PC Zenifier-SSD]] running Tumbleweed since April 2021.
diff --git a/guides/sysadmin/machines/amdahl30/assembly.org b/guides/sysadmin/machines/amdahl30/assembly.org
new file mode 100644
index 0000000..873caa3
--- /dev/null
+++ b/guides/sysadmin/machines/amdahl30/assembly.org
@@ -0,0 +1,43 @@
+Lots of impedance mismatches between "documentation" and actual
+hardware:
+- CPU cooler (fan) has spring screws; diagrams show retention clips.
+ Had to dig into the [[https://www.amd.com/en/support/kb/faq/cpu-7][AMD knowledge base]] to find that some
+ motherboards come with "speculative" clips, which must be unscrewed
+ and removed in order to install the spring-screw cooler.
+- Diagrams say to add thermal paste, but the fan already comes with a
+ pre-applied layer.
+- Documentation shows RAM clips for both ends of the sticks; the
+ motherboard seems to only have clips for one end.
+- =SYS_FAN1= header has 4 pins; front fan plug has 3 holes. The
+ Internet says it's fine ([[https://old.reddit.com/r/buildapc/comments/4139k8/3_pin_sys_fan2_vs_4_pin_sys_fan1/][[1]​]], [[https://forums.tomshardware.com/threads/sys_fan1-and-sys_fan2.3195778/][[2]​]]).
+- Motherboard has [[https://www.msi.com/Motherboard/B550M-A-PRO/Specification]["8 mounting holes"]] but covers only 6 of the case's
+ standoffs; none of the diagrams in the case's manual match the
+ format of the motherboard.
+- The diagram for inserting the power supply unit leaves a lot to the
+ imagination.
+- The [[https://www.snia.org/forums/cmsi/knowledge/formfactors#U2][SSD dimension nomenclature]] is weird as hell. The SSD's user
+ manual seems to imply that I have a 2.5″ model, but my measuring
+ tape says the drive is 2.75″×3.875″ (diagonal 4.625″).
+- The link to the LDLC guide for mounting the SSD is dead; the page is
+ [[https://web.archive.org/web/20170901191800/http://www.ldlc.com/guides/AL00000817/comment-installer-un-ssd-dans-un-pc/][archived]], and merely contains a link to a [[https://www.youtube.com/watch?v=t1dHVb6VuWU][video]]. No matter though,
+ since it does not describe how to mount the drive on a 2.5″ bay.
+- The case user manual says to use specific screws for the SSD drive;
+ the SSD comes with its own set of screws. Are they meant for the
+ 3.5″ adapter? 🤷
+
+For novices, some steps range from "not very reassuring" to "downright
+hostile":
+- The amount of force needed to connect the CPU fan's first two
+ diagonal screws is terrifying.
+- The fan's case is asymmetric: one side has a small bump featuring
+ the maker's brand. If one does not pay attention when mounting the fan,
+ there is a 50% chance that this bump will get in the way of a RAM
+ stick.
+- No instruction on [[https://www.youtube.com/watch?v=XAWNzd-gc3Q&t=74s][how to force that I/O shield in]].
+- No instruction on how to snap the motherboard into the I/O shield.
+- Holy =$DEITY= that power supply unit has a *lot* of cables. And of
+ course I enthusiastically passed most of the small-headed ones
+ through the designated case hole, and had to pass them back out
+ because there was no room left to pass the 20-pin ATX connector.
+- Power supply user manual was taped to the bubble wrap, so part of
+ the "warnings" section got torn off.
diff --git a/guides/sysadmin/machines/amdahl30/maintenance.org b/guides/sysadmin/machines/amdahl30/maintenance.org
new file mode 100644
index 0000000..538ee38
--- /dev/null
+++ b/guides/sysadmin/machines/amdahl30/maintenance.org
@@ -0,0 +1,397 @@
+* Front panel
+The case's manual has a terse illustration with two arrows to pull the
+front panel "away and up" from the rest of the case.
+
+Here too, the amount of force required to do that is terrifying.
+Notice how [[https://www.youtube.com/watch?v=nUD0HyzVpLg][our friend here]] cuts abruptly at 8:17; that's because the
+levels of violence required to tear that panel off are too graphic for
+YouTube.
+* Front fan
+Remember that fan from earlier, the one with only 3 holes for the
+motherboard's 4 pins? Turns out
+
+1. that last "optional" pin is supposed to allow speed control;
+ without it, the fan always spins at full speed;
+2. the fan itself (ZA1225ASL) is [[https://www.youtube.com/watch?v=pd6gDY7LPlU][complete and utter crap]]: it cannot be
+ disassembled, so no cleaning off the dirt, no greasing.
+
+So the thing is loud, it always spins at full speed, and if one day it
+decides to become even louder than usual, you're SOL.
+* Motherboard
+** Firmware updates
+Quoth ~fwupdmgr get-devices~:
+
+#+begin_example
+WARNING: UEFI capsule updates not available or enabled in firmware setup
+See https://github.com/fwupd/fwupd/wiki/PluginFlag:capsules-unsupported for more information.
+#+end_example
+
+Quoth the wiki:
+
+#+begin_quote
+Most typically entering the firmware setup screen and enabling capsule
+updates will cause this warning to disappear, and also make firmware
+updates possible. The relevant option may be poorly labelled, for
+example "allow Windows UEFI updates".
+#+end_quote
+
+Not seeing any such option in the boot menu.
+
+#+begin_quote
+It is possible, but unlikely, that flashing the latest vendor BIOS,
+using either Windows or a LiveCD, will add support for [the thing that
+correlates with capsule updates being enabled].
+#+end_quote
+
+Well then. [[https://www.msi.com/Motherboard/B550M-A-PRO/support#bios][Vendor says]] "put this on a stick; reboot; ask the menu to
+flash from the stick". Putting some feelers out first:
+
+#+begin_quote
+If you execute a UEFI update, this update might delete the existing
+UEFI boot entries
+
+— [[https://wiki.archlinux.org/title/GRUB#Installation][ArchWiki]], 2024
+#+end_quote
+
+#+begin_quote
+Like others in this forum, I too suffered from a reformatted EFI
+partition following a BIOS update on my desktop pc. I had no idea
+that the MSI BIOS team doesn’t care about Linux installs, so to my
+surprise, following the update, my system booted straight to windows.
+
+[…]
+
+Ultimately, I completely wiped and recreated the EFI partition with
+gparted (fat32), changed the structure to GPT with gdisk, and then
+mounted that partition in the /mnt/efi location, and then proceeded to
+generate a new fstab with genfstab. After arch-chroot’ing into my
+endeavoros install, I ran bootctl install (which complained about boot
+loader not setting esp information) and then reinstall-kernels. I
+updated the loader.conf with the correct default boot ID, and set the
+recommended options. That got me back into my system after quite a
+bit of trial and error.
+
+— [[https://forum.endeavouros.com/t/endeavoros-efi-partition-wiped-by-msi-bios-update/54740][EndeavourOS forums]], May 2024
+#+end_quote
+
+#+begin_quote
+when updating the bios, it cleared all my settings. Apparently, this
+includes clearing the list of boot loaders, which it set back to the
+default of just Windows. Sadly this bios does not provide the tools
+to add boot entries as, apparently, some do. To fix it, I managed to
+boot to a Linux live USB and add the missing entry using the efiboomgr
+command line tool.
+
+— [[https://forum-en.msi.com/index.php?threads/updating-to-bios-7a32v1q1-wont-see-linux-uefi-boot.388109/][MSI AMD forums]], August 2023
+#+end_quote
+
+Welp.
+
+OT1H, I could dedicate a couple of weekends to learning the joys and
+wonders of efibootmgr, gdisk & friends. OTOH I sort of like keeping
+my desktop station… not bricked?
+
+Pity, because otherwise I've had smooth and incident-free firmware
+updates on other stations with ~fwupdmgr~ 🤷
+* SSD
+** Failure
+On November 19 2024, LDLC's off-brand SSD died on me. RIP.
+Re-installed Tumbleweed on the replacement (Kingston SA400S3) on
+November 28. Since then…
+*** Performance loss
+Getting uncannily reproducible frame drops (60 ↘ 40±10, movement
+visibly choppy) in Hades Ⅱ when moving toward effects/particles-heavy
+areas. No idea WTF, those areas ran fine before.
+
+- "High" graphics setting at native 1920×1080 resolution.
+ - Tried "Low" graphics, lowered resolution, disabled vsync: symptoms
+ persist.
+- Not forcing any "compatibility tool" version, assuming this yields
+ "Proton Experimental".
+ - Tried a couple of old Proton versions: symptoms persist.
+- Reinstalled game & nuked everything under
+ - =~/.cache/mesa_shader_cache*=
+ - =~/.cache/radv_builtin_shaders*=
+ - =~/.config/unity3d=
+ - =~/.local/share/Steam=
+ - =~/.local/share/vulkan/=
+ - =~/.steam*=
+ in case "stale shaders" were to blame or something.
+- Tumbleweed/Plasma/Wayland session.
+ - Tried X11: symptoms persist.
+- Reducing noise with =balooctl6 suspend=, =swapoff -a= (RAM nowhere
+ near exhausted).
+
+Well then.
+**** CPU frequency scaling?
+Started by noticing that the Plasma "Power Management" tray widget
+says "Power Profile" is "Not available". Not 100% sure whether that
+was the case with the old installation; maybe I had had something
+configured or installed to enable this?
+
+Internet says "install and enable power-profiles-daemon", except
+that's on:
+
+#+begin_example
+$ systemctl status power-profiles-daemon.service
+● power-profiles-daemon.service - Power Profiles daemon
+ Loaded: loaded (/usr/lib/systemd/system/power-profiles-daemon.service; disabled; preset: disabled)
+ Active: active (running) since Sun 2024-12-01 11:46:32 CET; 45min ago
+ Invocation: b2545a02bc9642b7aeb5f370e8b50e7c
+ Main PID: 2289 (power-profiles-)
+ Tasks: 4 (limit: 18320)
+ CPU: 52ms
+ CGroup: /system.slice/power-profiles-daemon.service
+ └─2289 /usr/libexec/power-profiles-daemon
+#+end_example
+
+But:
+
+#+begin_example
+$ powerprofilesctl
+,* balanced:
+ PlatformDriver: placeholder
+
+ power-saver:
+ PlatformDriver: placeholder
+#+end_example
+
+Internet says I am missing the right scaling driver, and seems very
+keen on enabling =amd_pstate=, which I do not seem to have available:
+
+#+begin_example
+$ cpupower frequency-info
+analyzing CPU 5:
+ driver: acpi-cpufreq
+ CPUs which run at the same hardware frequency: 5
+ CPUs which need to have their frequency coordinated by software: 5
+ maximum transition latency: Cannot determine or is not supported.
+ hardware limits: 1.40 GHz - 3.70 GHz
+ available frequency steps: 3.70 GHz, 1.70 GHz, 1.40 GHz
+ available cpufreq governors: ondemand performance schedutil
+ current policy: frequency should be within 1.40 GHz and 3.70 GHz.
+ The governor "schedutil" may decide which speed to use
+ within this range.
+ current CPU frequency: Unable to call hardware
+ current CPU frequency: 3.30 GHz (asserted by call to kernel)
+ boost state support:
+ Supported: yes
+ Active: no
+
+$ zcat /proc/config.gz | grep -i pstate
+CONFIG_X86_INTEL_PSTATE=y
+CONFIG_X86_AMD_PSTATE=y
+CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3
+# CONFIG_X86_AMD_PSTATE_UT is not set
+#+end_example
+
+=/proc/config.gz= suggests the kernel configuration supports it, but
+=cpupower= does not seem to know about it. =dmesg= offers:
+
+#+begin_example
+$ sudo dmesg -H
+[…] amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled
+#+end_example
+
+Though:
+
+#+begin_example
+$ lscpu | grep -i cppc
+Flags: […] cppc […]
+#+end_example
+
+So ACPI problem? Lots of posts mentioning =amd_= parameters on the
+kernel command-line but AFAIU those are stale with newer kernels (6.11
+here) which automatically (attempt to) load the =amd_pstate= driver.
+
+Went through the UEFI menu and found nothing related to ACPI or
+[[https://forum.level1techs.com/t/amd-p-state-driver/197885/24][X2APIC]]. Skeptical that UEFI settings are to blame anyway, since I did
+not change them between the old and new installations.
+
+/Some time later/
+
+Probably not ACPI, =dmesg= is chock-full of ACPI noise. OTOH, using
+some diagnosis methods from [[https://bugzilla.kernel.org/show_bug.cgi?id=218171][this kernel bug report]]:
+
+#+begin_example
+$ find /sys/devices -name '*cppc*'
+🦗
+#+end_example
+
+(=acpidump ; acpixtract ; iasl ; grep -i cpc *.dsl= also yields 🦗,
+but =iasl= complains about "unresolved" "control methods", so 🤷)
+
+/Some time later/
+
+[[https://wiki.archlinux.org/title/CPU_frequency_scaling#amd_pstate][ArchWiki]] does say "Change /Enable CPPC/ […] from /Auto/ to /Enabled/".
+My UEFI menu tucks that under /Overclocking → Advanced CPU
+Configuration → AMD CBS → CPPC CTRL/. That change *does* convince
+Linux to enable =amd_pstate=; going over the previous tests in reverse
+order:
+
+#+begin_example
+$ [… acpidump && acpixtract && iasl … ] && grep -i cpc *.dsl
+ssdt1.dsl: Name (_CPC, Package (0x17) // _CPC: Continuous Performance Control
+[… repeats 12 times …]
+
+$ find /sys/devices -name '*cppc*' -o -name '*pstate*' | tr -s '[:digit:]' N | sort -u
+/sys/devices/system/cpu/amd_pstate
+/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_highest_perf
+/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_hw_prefcore
+/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_lowest_nonlinear_freq
+/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_max_freq
+/sys/devices/system/cpu/cpufreq/policyN/amd_pstate_prefcore_ranking
+/sys/devices/system/cpu/cpuN/acpi_cppc
+
+$ sudo dmesg -H
+[… ominous silence about amd_pstate …]
+
+$ cpupower frequency-info
+analyzing CPU 1:
+ driver: amd-pstate-epp
+ CPUs which run at the same hardware frequency: 1
+ CPUs which need to have their frequency coordinated by software: 1
+ maximum transition latency: Cannot determine or is not supported.
+ hardware limits: 400 MHz - 4.31 GHz
+ available cpufreq governors: performance powersave
+ current policy: frequency should be within 2.38 GHz and 4.31 GHz.
+ The governor "powersave" may decide which speed to use
+ within this range.
+ current CPU frequency: Unable to call hardware
+ current CPU frequency: 3.57 GHz (asserted by call to kernel)
+ boost state support:
+ Supported: yes
+ Active: yes
+ AMD PSTATE Highest Performance: 255. Maximum Frequency: 4.31 GHz.
+ AMD PSTATE Nominal Performance: 219. Nominal Frequency: 3.70 GHz.
+ AMD PSTATE Lowest Non-linear Performance: 141. Lowest Non-linear Frequency: 2.38 GHz.
+ AMD PSTATE Lowest Performance: 24. Lowest Frequency: 400 MHz.
+
+$ powerprofilesctl
+ performance:
+ CpuDriver: amd_pstate
+ Degraded: no
+
+,* balanced:
+ CpuDriver: amd_pstate
+ PlatformDriver: placeholder
+
+ power-saver:
+ CpuDriver: amd_pstate
+ PlatformDriver: placeholder
+#+end_example
+
+And lo, the 🍃↔🚀 slider appears in the Power Management tray widget.
+
+Nervous about entering the "Overclocking" UEFI zone tho, and concerned
+about these "Maximum frequencies".
+
+/And does it even help with the game?/
+
+🥁
+
+No. No it does not; no discernible difference in FPS nor vibes.
+
+Will assume this new baseline cannot hurt - OT1H "overclocking" is
+scary, OTOH Linux now has a finer handle on the CPU and hopefully will
+not overwork it to death?
+**** Sᴇᴠᴇʀᴀʟ Wᴇᴇᴋꜱ Lᴀᴛᴇʀ
+- [[https://www.gamingonlinux.com/forum/topic/5475/page=1/][ridge reports]] "bad frame pacing on AMDGPU",
+ - when vsync is turned off: a non-factor in my testing,
+ - lots of useful information in that thread tho and
+ interesting-sounding pointers,
+ - [[https://www.gamingonlinux.com/forum/topic/5475/page=2/#r42519][Shmerl]] says:
+ - games can cause stutter by underloading the GPU, causing it to
+ drop out of "high performance mode",
+ - (=amdgpu_top= and =radeontop= do confirm that lag spikes
+ correlate with GPU usage drop)
+ - see [[https://gitlab.freedesktop.org/drm/amd/-/issues/1500][drm/amd#1500]]:
+ - /lots/ of sysfs noodling there; unfortunately, none of the
+ suggested settings for =power_dpm_force_performance_level= &
+ =pp_power_profile_mode= change the symptoms.
+
+- In [[https://gitlab.freedesktop.org/drm/amd/-/issues/3618#note_2689087][this drm/amd#3618 thread]], @agd5f suggests "6.11 stable kernels"
+ include a fix for the issue at hand there and a further rework "was
+ submitted to 6.13"; @mattipulkkinen reports happy results with
+ 6.13-rc2 (FTR, symptoms persist here with 6.12.8).
+
+- Piggybacked onto [[https://gitlab.freedesktop.org/mesa/mesa/-/issues/11300][mesa/mesa#11300]]:
+ - common: Hades Ⅱ, iGPU, recent kernel & Mesa, Proton Experimental,
+ - differences: Fedora, GNOME, X11,
+ - noteworthy: good performance on Windows,
+ - suggestion by @Venemo: downgrade & bisect Mesa;
+ - tempting, though scared of bricking graphical sessions and/or
+ ending up with a frankensystem (installing binaries under a
+ prefix is probably easy, but then keeping track of config tweaks
+ and cache artifacts sounds fraught).
+
+- In [[https://gitlab.freedesktop.org/upower/power-profiles-daemon/-/issues/164][upower/power-profiles-daemon#164]], @Nyan reports problematic iGPU
+ capping; not convinced this is applicable though, given the reported
+ symptoms (video playback is fine here).
+
+- Seen reports of Variable Refresh Rate causing problems:
+ - searched high and low to understand why VRR appears nowhere in
+ Plasma settings, despite the start menu turning up "Display
+ Configuration" when searching for "VRR",
+ - mystery solved by ~kscreen-doctor -o~: =Vrr: incapable= 🤷
+
+- [[https://www.techpowerup.com/forums/threads/what-fixed-stuttering-and-random-framerate-spikes-in-games-for-me.327264/][aska33j proclaims]] that /disabling CPPC/ "fixed stuttering and random
+ framerate spikes in games for [them]" so… roundtrip to UEFI,
+ disabling that. The =amd_pstate= warning is back; the "Power
+ Profile" slider is no longer accessible in the systray widget; no
+ discernible effect in-game anyway.
+
+- Looking at Steam forums, [[https://steamcommunity.com/app/1145350/discussions/1/596260472619121965/][some folks]] do report FPS drops /shortly
+ after the update/:
+ #+begin_quote
+ it started fine after the major update, now suddenly im stuck with 40~50 fps with micro sutters
+ — December 6 2024
+ #+end_quote
+
+- After AMD drivers & Mesa, figured I could look at vkd3d's issue
+ tracker. [[https://github.com/doitsujin/dxvk/issues/4436][doitsujin/dxvk#4436]] and
+ [[ValveSoftware/steam-for-linux#11446]] looked somewhat promising:
+ reports of lag on "KDE Tumbleweed Wayland", reported not long before
+ my symptoms began (November 2024); alas, ~LD_PRELOAD=~ does not
+ help.
+ -
+ #+begin_quote
+ Alternatively, remove the offending line in =/usr/share/drirc.d/00-radv-defaults.conf=
+ #+end_quote
+
+ /discovers [[https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/00-radv-defaults.conf][=/usr/share/drirc.d/=]]/
+
+ Computers were a mistake.
+
+- Peeked at [[https://github.com/HansKristian-Work/vkd3d-proton/blob/master/.github/ISSUE_TEMPLATE/bug_report.md][vkd3d-proton's issue template]] and idly ran with
+ ~PROTON_LOG=1~. Over the course of 30 seconds or so, the log file
+ gets flooded with 3MB's worth of =trace:unwind:dump_unwind_info= 🤨
+**** This is insane
+Selected subset of moving parts; "testability" considering ease of
+clean reverts:
+
+| Part | Testability |
+|--------------+-------------------------------------------------------------------------------------|
+| Linux kernel | 🫣 [[https://en.opensuse.org/SDB:InstallNewerKernel][some distro documentation]]; afraid of side-effects |
+| AMD drivers | 🤷 no clue; maybe inextricable from kernel? |
+| Mesa | 😬 easy to recompile; hard to control transient state in cache & config folders |
+| Steam | 🫥 under Steam's control |
+| Wine | 🫥 under Steam's control |
+| Proton | 👌 as long as I stick to versions under Steam's control; have not considered GE yet |
+| vkd3d-proton | 🫥 under Steam's control |
+| Hades Ⅱ | 🫥 under Steam's control |
+
+That's looking at software packages as individual blackboxes;
+config-wise, worth noting:
+
+| Part | Testability |
+|------------+-------------------|
+| AMD pstate | 😬 UEFI roundtrip |
+| sysfs | OK |
+
+Let's throw in:
+
+| Part | Testability |
+|---------------+-----------------------------------|
+| Mobo firmware | 🔥 reports of nuked boot settings |
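
The "OK" rating given to sysfs in the tables above rests on those knobs being cheap to revert. As a minimal sketch of that save/set/restore cycle (not something the notes themselves contain): the function name =with_knob= is hypothetical, and it assumes the knob reads back as a single value that can be written back verbatim. That should hold for =power_dpm_force_performance_level=, but not for =pp_power_profile_mode=, which reads back as a table.

```shell
#!/bin/sh
# Hypothetical helper: run a command while a sysfs-style knob holds a
# temporary value, then put the previous value back.
# Assumption: reading the knob yields exactly what must be written to
# restore it (true for single-word knobs, not for table-style ones).
with_knob() {
    knob=$1
    value=$2
    shift 2
    # Remember the current setting so the change can be reverted.
    previous=$(cat "$knob") || return 1
    printf '%s\n' "$value" > "$knob" || return 1
    # Run the wrapped command and keep its exit status.
    "$@"
    status=$?
    # Restore the setting whether or not the command succeeded.
    printf '%s\n' "$previous" > "$knob"
    return "$status"
}
```

Possible usage, as root, assuming the GPU shows up as =card0= (the index may differ): =with_knob /sys/class/drm/card0/device/power_dpm_force_performance_level high sleep 120=, then play through a problematic area during those two minutes. The knob returns to its previous value once the command exits.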
+