I wanted to update my machines, and I was using Ansible with the yay module to do it. Afterwards, one machine would not boot.
Truth be told, I have a somewhat esoteric configuration: I mount my EFI partition under /efiroot. (This becomes important once I discover what the problem is.)
First, what is the problem?
Let me boot up a rescue system, mount the partitions and figure out what is wrong.
root@seed ~ # mount /dev/vda2 /mnt
root@seed ~ # mount /dev/vda1 /mnt/efiroot
Re-installing the kernel to fix boot issues:
root@seed ~ # arch-chroot /mnt pacman -S linux-lts
warning: linux-lts-6.6.72-1 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...
Packages (1) linux-lts-6.6.72-1
Total Installed Size: 128.65 MiB
Net Upgrade Size: 0.00 MiB
:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring [####################################################################] 100%
(1/1) checking package integrity [####################################################################] 100%
(1/1) loading package files [####################################################################] 100%
(1/1) checking for file conflicts [####################################################################] 100%
(1/1) checking available disk space [####################################################################] 100%
:: Processing package changes...
(1/1) reinstalling linux-lts [####################################################################] 100%
:: Running post-transaction hooks...
(1/3) Arming ConditionNeedsUpdate...
(2/3) Updating module dependencies...
(3/3) Updating linux initcpios...
==> Building image from preset: /etc/mkinitcpio.d/linux-lts.preset: 'kidpc'
==> Using configuration file: '/etc/mkinitcpio-kidpc.conf'
-> -k /efiroot/EFI/kidpc/vmlinuz-linux-lts -c /etc/mkinitcpio-kidpc.conf -g /efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img
==> ERROR: '/lib/modules/6.6.68-1-lts' is not a valid kernel module directory
error: command failed to execute correctly
And that is the problem. Surprisingly, the above command still returns 0, which I guess is why Ansible reported success.
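Because the exit status alone does not reveal the failure, one option is to scan the captured pacman output for the error markers seen above. The following is a minimal sketch: `hook_failed` is a hypothetical helper name, and the two patterns are simply copied from the transcript above, not taken from any pacman documentation.

```shell
# hook_failed: read captured pacman output on stdin and succeed (exit 0)
# if it contains one of the error markers from the transcript above.
# The helper name and the choice of patterns are illustrative assumptions.
hook_failed() {
    grep -Eq '^==> ERROR:|^error: command failed to execute correctly'
}

# Possible usage (not run here):
#   pacman -S --noconfirm linux-lts 2>&1 | tee /tmp/pacman.log
#   if hook_failed < /tmp/pacman.log; then echo "a hook failed" >&2; fi
```

A check like this could run as a follow-up task after the update, so that a failed initramfs build is not silently reported as success.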
I will hack some scripts to understand what is happening:
[root@seed /]# type mkinitcpio
mkinitcpio is /usr/bin/mkinitcpio
[root@seed /]# cp /usr/bin/mkinitcpio /root/
[root@seed /]# vim /usr/bin/mkinitcpio
[root@seed /]# diff -u {/root,/usr/bin}/mkinitcpio
--- /root/mkinitcpio 2025-01-19 07:05:38.306820602 +0100
+++ /usr/bin/mkinitcpio 2025-01-19 07:06:19.717324923 +0100
@@ -130,6 +130,8 @@
return 1
fi
+ error "kver will be called with $kernel"
+ error "$(type kver)"
kver "$kernel" && return
error "invalid kernel specified: '%s'" "$1"
And running the hacked script:
==> ERROR: kver will be called with /efiroot/EFI/kidpc/vmlinuz-linux-lts
==> ERROR: kver is a function
kver ()
{
local kver re='^[[:digit:]]+(\.[[:digit:]]+)+';
local arch;
arch="$(uname -m)";
if [[ $arch == @(i?86|x86_64) ]]; then
kver="$(kver_x86 "$1")";
else
kver="$(kver_generic "$1")";
fi;
[[ "$kver" =~ $re ]] || return 1;
printf '%s' "$kver"
}
==> ERROR: '/lib/modules/6.6.68-1-lts' is not a valid kernel module directory
So this tells me two things: the kver function is defined somewhere else, not in the mkinitcpio script, and I need to understand how it does its thing.
At the beginning of the mkinitcpio file I see:
# needed files/directories
_f_functions=/usr/lib/initcpio/functions
So, looking into the file /usr/lib/initcpio/functions, I managed to figure out what is happening:
kver_x86() {
local kver
local -i offset
# On x86 (since kernel 1.3.73, 1996), regardless of whether it's
# an Image, a zImage, or a bzImage: The file header is the same,
# and contains the kernel_version string.
#
# scrape the version out of the kernel image. locate the offset
# to the version string by reading 2 bytes out of image at at
# address 0x20E. this leads us to a string of, at most, 128 bytes.
# read the first word from this string as the kernel version.
#
# https://www.kernel.org/doc/html/v6.7/arch/x86/boot.html
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S?h=v6.7
offset="$(od -An -j0x20E -dN2 "$1")" || return
read -r kver _ < \
<(dd if="$1" bs=1 count=127 skip=$((offset + 0x200)) 2>/dev/null)
printf '%s' "$kver"
}
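The steps described in that comment can be tried outside of mkinitcpio. The following is a simplified sketch, not mkinitcpio's own code: `image_version` is a hypothetical name, and it assumes GNU coreutils and a little-endian host (as on x86).

```shell
# image_version: print the version string embedded in an x86 kernel image.
# Hypothetical helper that mirrors the steps in the comment above.
image_version() {
    local img="$1" off
    # 2-byte unsigned value at 0x20E (decimal 526): the offset of the
    # version string, relative to 0x200. od decodes it in host byte order.
    off=$(od -An -tu2 -j526 -N2 "$img" 2>/dev/null | tr -d ' ')
    [ -n "$off" ] || return 1
    # Read up to 127 bytes at off + 0x200, cut at the first NUL byte,
    # and keep only the first space-separated word.
    dd if="$img" bs=1 skip=$((off + 512)) count=127 2>/dev/null |
        tr '\000' '\n' | head -n 1 | cut -d ' ' -f 1
}

# Possible usage (not run here):
#   image_version /efiroot/EFI/kidpc/vmlinuz-linux-lts
```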
But it would be better to trace what is actually happening, so I add set -x:
--- /root/mkinitcpio 2025-01-19 07:05:38.306820602 +0100
+++ /usr/bin/mkinitcpio 2025-01-19 07:27:25.174283041 +0100
@@ -130,6 +130,8 @@
return 1
fi
+ error "kver will be called with $kernel"
+ set -x
kver "$kernel" && return
error "invalid kernel specified: '%s'" "$1"
And see the output:
==> ERROR: kver will be called with /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ kver /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ local kver 're=^[[:digit:]]+(\.[[:digit:]]+)+'
++ local arch
+++ uname -m
++ arch=x86_64
++ [[ x86_64 == @(i?86|x86_64) ]]
+++ kver_x86 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ local kver
+++ local -i offset
++++ od -An -j0x20E -dN2 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ offset=' 17152'
+++ read -r kver _
++++ dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
+++ printf %s 6.6.68-1-lts
++ kver=6.6.68-1-lts
++ [[ 6.6.68-1-lts =~ ^[[:digit:]]+(\.[[:digit:]]+)+ ]]
++ printf %s 6.6.68-1-lts
++ return
==> ERROR: '/lib/modules/6.6.68-1-lts' is not a valid kernel module directory
So the command here is:
dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
And running it says:
[root@seed /]# dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
6.6.68-1-lts (linux-lts@archlinux) #1 SMP PREEMPT_DYNAMIC Fri, 27 Dec 2024 15:16:06 +0000�2�2�2/3�233127+0 records in
127+0 records out
127 bytes copied, 0.00274759 s, 46.2 kB/s
So what does it mean?
mkinitcpio figures out which kernel version to build the initramfs for by inspecting the binary /efiroot/EFI/kidpc/vmlinuz-linux-lts. And that binary, for some reason, is still the old kernel (6.6.68), even though the package that was just installed is 6.6.72. At this point something clicked in my brain: I do have my EFI partition mounted under /efiroot, which is a non-standard configuration, and I do have some hooks that I had to put in place.
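The mismatch itself is easy to test for once the version string is known. The following is a minimal sketch with a hypothetical helper name, `modules_present`; checking only that the directory exists is a simplifying assumption, and mkinitcpio's own validation may be stricter.

```shell
# modules_present: succeed if a module directory for the given kernel
# version exists under the given root (default: /).
# Hypothetical helper; only the directory's existence is checked.
modules_present() {
    local kver="$1" root="${2:-/}"
    [ -d "${root%/}/lib/modules/$kver" ]
}

# Possible usage inside the chroot (not run here):
#   modules_present 6.6.68-1-lts || echo "no modules for 6.6.68-1-lts" >&2
```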
Look at these files:
[root@seed /]# cat /etc/mkinitcpio.d/linux-lts.preset
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"
[[ -e /boot/vmlinuz-linux-lts ]] && cp -af /boot/vmlinuz-linux-lts "/efiroot/EFI/kidpc/"
[[ -e /boot/intel-ucode.img ]] && cp -af /boot/intel-ucode.img "/efiroot/EFI/kidpc/"
PRESETS=('kidpc')
kidpc_config="/etc/mkinitcpio-kidpc.conf"
kidpc_image="/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img"
OK, I need to understand at which points it is being executed, so I add some debug output.
[root@seed /]# cp /etc/mkinitcpio.d/linux-lts.preset /root/
[root@seed /]# vim /etc/mkinitcpio.d/linux-lts.preset
[root@seed /]# diff -u {/root,/etc/mkinitcpio.d}/linux-lts.preset
--- /root/linux-lts.preset 2025-01-19 07:31:57.112404242 +0100
+++ /etc/mkinitcpio.d/linux-lts.preset 2025-01-19 07:35:29.636922941 +0100
@@ -1,6 +1,8 @@
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"
+echo "PRESET EXECUTED"
+ls -la /boot
[[ -e /boot/vmlinuz-linux-lts ]] && cp -af /boot/vmlinuz-linux-lts "/efiroot/EFI/kidpc/"
[[ -e /boot/intel-ucode.img ]] && cp -af /boot/intel-ucode.img "/efiroot/EFI/kidpc/"
Running pacman -S linux-lts:
...
(3/3) Updating linux initcpios...
PRESET EXECUTED
total 117112
drwxr-xr-x 2 root root 4096 Dec 29 15:32 .
drwxr-xr-x 18 root root 4096 Dec 29 15:32 ..
-rw------- 1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw------- 1 root root 9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r-- 1 root root 8139776 Nov 12 18:20 intel-ucode.img
-rw-r--r-- 1 root root 13070848 Dec 29 15:32 vmlinuz-linux-lts
PRESET EXECUTED
total 117112
drwxr-xr-x 2 root root 4096 Dec 29 15:32 .
drwxr-xr-x 18 root root 4096 Dec 29 15:32 ..
-rw------- 1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw------- 1 root root 9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r-- 1 root root 8139776 Nov 12 18:20 intel-ucode.img
-rw-r--r-- 1 root root 13070848 Dec 29 15:32 vmlinuz-linux-lts
==> Building image from preset: /etc/mkinitcpio.d/linux-lts.preset: 'kidpc'
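So the preset is executed twice, and /boot already contains a vmlinuz-linux-lts dated Dec 29 to start with. The cp line in the preset copies that stale kernel over the one under /efiroot. I shall remove that file and re-run.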
[root@seed /]# rm /boot/vmlinuz-linux-lts
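In hindsight, a quick comparison run before the rm would have confirmed that the copy in /boot was stale. The following is a minimal sketch; `images_differ` is a hypothetical helper name, and which file to compare against is an assumption (on Arch the package usually ships its image under /usr/lib/modules/<version>/vmlinuz, which is not shown in this post).

```shell
# images_differ: succeed if both files exist and their contents differ.
# Hypothetical helper; cmp -s compares byte by byte without printing.
images_differ() {
    [ -e "$1" ] && [ -e "$2" ] && ! cmp -s "$1" "$2"
}

# Possible usage (not run here; the second path is an assumption):
#   images_differ /boot/vmlinuz-linux-lts /usr/lib/modules/6.6.72-1-lts/vmlinuz \
#       && echo "stale kernel image in /boot" >&2
```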
Now let's see what is happening:
(3/3) Updating linux initcpios...
PRESET EXECUTED
total 104344
drwxr-xr-x 2 root root 4096 Jan 19 07:38 .
drwxr-xr-x 18 root root 4096 Dec 29 15:32 ..
-rw------- 1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw------- 1 root root 9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r-- 1 root root 8139776 Nov 12 18:20 intel-ucode.img
PRESET EXECUTED
total 104344
drwxr-xr-x 2 root root 4096 Jan 19 07:38 .
drwxr-xr-x 18 root root 4096 Dec 29 15:32 ..
-rw------- 1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw------- 1 root root 9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r-- 1 root root 8139776 Nov 12 18:20 intel-ucode.img
==> Building image from preset: /etc/mkinitcpio.d/linux-lts.preset: 'kidpc'
==> Using configuration file: '/etc/mkinitcpio-kidpc.conf'
-> -k /efiroot/EFI/kidpc/vmlinuz-linux-lts -c /etc/mkinitcpio-kidpc.conf -g /efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img
==> ERROR: kver will be called with /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ kver /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ local kver 're=^[[:digit:]]+(\.[[:digit:]]+)+'
++ local arch
+++ uname -m
++ arch=x86_64
++ [[ x86_64 == @(i?86|x86_64) ]]
+++ kver_x86 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ local kver
+++ local -i offset
++++ od -An -j0x20E -dN2 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ offset=' 17152'
+++ read -r kver _
++++ dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
+++ printf %s 6.6.72-1-lts
++ kver=6.6.72-1-lts
++ [[ 6.6.72-1-lts =~ ^[[:digit:]]+(\.[[:digit:]]+)+ ]]
++ printf %s 6.6.72-1-lts
++ return
==> Starting build: '6.6.72-1-lts'
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [autodetect]
-> Running build hook: [keyboard]
==> WARNING: Possibly missing firmware for module: 'xhci_pci'
-> Running build hook: [keymap]
-> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
-> Running build hook: [modconf]
-> Running build hook: [block]
-> Running build hook: [filesystems]
-> Running build hook: [fsck]
==> Generating module dependencies
==> Creating zstd-compressed initcpio image: '/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img'
-> Early uncompressed CPIO image generation successful
==> Initcpio image generation successful
It looks like the kernel ends up in the right place on its own, and so does the initramfs, so I only need to keep the copy line for the intel-ucode image.
[root@seed /]# rm /boot/initramfs-linux-lts-fallback.img
[root@seed /]# rm /boot/initramfs-linux-lts.img
Here is the modified preset file /etc/mkinitcpio.d/linux-lts.preset:
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"
[[ -e /boot/intel-ucode.img ]] && cp -af /boot/intel-ucode.img "/efiroot/EFI/kidpc/"
PRESETS=('kidpc')
kidpc_config="/etc/mkinitcpio-kidpc.conf"
kidpc_image="/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img"
Lessons learned - microcode
Add the microcode hook. Note that it comes before autodetect, so that the microcode is always included, regardless of which machine builds the image.
[root@seed /]# cat /etc/mkinitcpio-kidpc.conf
MODULES=(virtio virtio_blk virtio_pci virtio_net)
BINARIES=()
FILES=()
HOOKS=(base udev microcode autodetect keyboard keymap consolefont modconf block filesystems fsck)
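Since the ordering matters, it can be checked mechanically. The following is a minimal sketch, with `microcode_before_autodetect` as a hypothetical helper name; it assumes the HOOKS=(...) array sits on a single uncommented line, as in the file above.

```shell
# microcode_before_autodetect: succeed if the HOOKS array in the given
# mkinitcpio config lists "microcode" before "autodetect".
# Hypothetical helper; assumes HOOKS=(...) is on one uncommented line.
microcode_before_autodetect() {
    local conf="$1" hooks hook
    hooks=$(sed -n 's/^HOOKS=(\(.*\))[[:space:]]*$/\1/p' "$conf" | tail -n 1)
    for hook in $hooks; do
        case $hook in
            microcode)  return 0 ;;  # seen first: ordering is as intended
            autodetect) return 1 ;;  # autodetect came first
        esac
    done
    return 1  # microcode is not listed at all
}

# Possible usage (not run here):
#   microcode_before_autodetect /etc/mkinitcpio-kidpc.conf || echo "check HOOKS order" >&2
```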
Get rid of the copying in the preset file:
[root@seed /]# cat /etc/mkinitcpio.d/linux-lts.preset
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"
PRESETS=('kidpc')
kidpc_config="/etc/mkinitcpio-kidpc.conf"
kidpc_image="/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img"
Amend the boot configuration so that it no longer loads the microcode image separately:
[root@seed /]# cat /efiroot/loader/entries/kidpc.conf
title Arch kidpc
linux /EFI/kidpc/vmlinuz-linux-lts
initrd /EFI/kidpc/initramfs-linux-lts-kidpc.img
options root="UUID=aeabb877-72f2-4817-84c6-934206038906" rw
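Before rebooting, it may be worth confirming that every file the boot entry refers to actually exists on the EFI partition. The following is a minimal sketch; `entry_files_exist` is a hypothetical helper name, and it only understands the simple "linux" and "initrd" lines shown in the entry above.

```shell
# entry_files_exist: check that every file named on a "linux" or "initrd"
# line of a loader entry exists below the ESP mount point.
# Prints each missing path to stderr; returns non-zero if any is missing.
# Hypothetical helper, not part of any bootloader tooling.
entry_files_exist() {
    local entry="$1" esp="$2" key path rest missing=0
    # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
    while read -r key path rest || [ -n "$key" ]; do
        case $key in
            linux|initrd)
                if [ ! -e "${esp%/}$path" ]; then
                    echo "missing: ${esp%/}$path" >&2
                    missing=1
                fi ;;
        esac
    done < "$entry"
    return "$missing"
}

# Possible usage (not run here):
#   entry_files_exist /efiroot/loader/entries/kidpc.conf /efiroot
```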