Install error when updating kernel

I wanted to update my machines, and I was using Ansible with the yay module to do it. After the update I was unable to boot my machine.

Truth be told, I have a somewhat esoteric configuration: I mount my EFI partition under /efiroot. (This will become important as I discover what the problem was.)

First, what is the problem?

Let me boot up a rescue system, mount the partitions and figure out what is wrong.

root@seed ~ # mount /dev/vda2 /mnt 
root@seed ~ # mount /dev/vda1 /mnt/efiroot 

Re-installing the kernel to fix boot issues:

root@seed ~ # arch-chroot /mnt pacman -S linux-lts
warning: linux-lts-6.6.72-1 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (1) linux-lts-6.6.72-1

Total Installed Size:  128.65 MiB
Net Upgrade Size:        0.00 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring                                                                                     [####################################################################] 100%
(1/1) checking package integrity                                                                                   [####################################################################] 100%
(1/1) loading package files                                                                                        [####################################################################] 100%
(1/1) checking for file conflicts                                                                                  [####################################################################] 100%
(1/1) checking available disk space                                                                                [####################################################################] 100%
:: Processing package changes...
(1/1) reinstalling linux-lts                                                                                       [####################################################################] 100%
:: Running post-transaction hooks...
(1/3) Arming ConditionNeedsUpdate...
(2/3) Updating module dependencies...
(3/3) Updating linux initcpios...
==> Building image from preset: /etc/mkinitcpio.d/linux-lts.preset: 'kidpc'
==> Using configuration file: '/etc/mkinitcpio-kidpc.conf'
  -> -k /efiroot/EFI/kidpc/vmlinuz-linux-lts -c /etc/mkinitcpio-kidpc.conf -g /efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img
==> ERROR: '/lib/modules/6.6.68-1-lts' is not a valid kernel module directory
error: command failed to execute correctly

And that is the problem. Surprisingly, the above command returns 0, even though the hook failed. I guess this is why Ansible reported success.
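Since the exit status cannot be trusted here, one option is to look at the output instead. Below is a minimal sketch, assuming the two failure lines from the transcript above are representative of what a failed hook prints. The function name pacman_log_has_errors and the log path are made up for illustration.

```sh
# Succeed (return 0) if a saved pacman transcript contains one of the
# failure lines seen in this post. The two patterns are assumptions taken
# from that output; other failures may look different.
pacman_log_has_errors() {
    grep -Eq '^(==> ERROR:|error: command failed to execute correctly)' "$@"
}

# Possible use from the rescue system (not run here):
#   arch-chroot /mnt pacman -S --noconfirm linux-lts 2>&1 | tee /tmp/pacman.log
#   pacman_log_has_errors /tmp/pacman.log && echo "a hook failed"
```

Matching on message text is fragile, so I would treat this as a safety net on top of the exit status, not a replacement for it.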

I will hack some scripts to understand what is happening:

[root@seed /]# type mkinitcpio
mkinitcpio is /usr/bin/mkinitcpio
[root@seed /]# cp /usr/bin/mkinitcpio /root/
[root@seed /]# vim /usr/bin/mkinitcpio 
[root@seed /]# diff -u {/root,/usr/bin}/mkinitcpio
--- /root/mkinitcpio    2025-01-19 07:05:38.306820602 +0100
+++ /usr/bin/mkinitcpio 2025-01-19 07:06:19.717324923 +0100
@@ -130,6 +130,8 @@
         return 1
     fi

+    error "kver will be called with $kernel"
+    error "$(type kver)"
     kver "$kernel" && return

     error "invalid kernel specified: '%s'" "$1"

And running the hacked script:

==> ERROR: kver will be called with /efiroot/EFI/kidpc/vmlinuz-linux-lts
==> ERROR: kver is a function
kver () 
{ 
    local kver re='^[[:digit:]]+(\.[[:digit:]]+)+';
    local arch;
    arch="$(uname -m)";
    if [[ $arch == @(i?86|x86_64) ]]; then
        kver="$(kver_x86 "$1")";
    else
        kver="$(kver_generic "$1")";
    fi;
    [[ "$kver" =~ $re ]] || return 1;
    printf '' "$kver"
}
==> ERROR: '/lib/modules/6.6.68-1-lts' is not a valid kernel module directory

So this means two things: the kver function is defined somewhere else, not in the mkinitcpio script, and I need to understand how it does its thing.

At the beginning of the mkinitcpio file I see:

# needed files/directories
_f_functions=/usr/lib/initcpio/functions

Looking into /usr/lib/initcpio/functions, I managed to figure out what is happening:

kver_x86() {
    local kver
    local -i offset
    # On x86 (since kernel 1.3.73, 1996), regardless of whether it's
    # an Image, a zImage, or a bzImage: The file header is the same,
    # and contains the kernel_version string.
    #
    # scrape the version out of the kernel image. locate the offset
    # to the version string by reading 2 bytes out of image at at
    # address 0x20E. this leads us to a string of, at most, 128 bytes.
    # read the first word from this string as the kernel version.
    #
    # https://www.kernel.org/doc/html/v6.7/arch/x86/boot.html
    # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S?h=v6.7
    offset="$(od -An -j0x20E -dN2 "$1")" || return

    read -r kver _ < \
        <(dd if="$1" bs=1 count=127 skip=$((offset + 0x200)) 2>/dev/null)

    printf '%s' "$kver"
}

But maybe it would be better to print out what is actually happening, so I am adding set -x:

--- /root/mkinitcpio    2025-01-19 07:05:38.306820602 +0100
+++ /usr/bin/mkinitcpio 2025-01-19 07:27:25.174283041 +0100
@@ -130,6 +130,8 @@
         return 1
     fi

+    error "kver will be called with $kernel"
+    set -x
     kver "$kernel" && return

     error "invalid kernel specified: '%s'" "$1"

And here is the output:

==> ERROR: kver will be called with /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ kver /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ local kver 're=^[[:digit:]]+(\.[[:digit:]]+)+'
++ local arch
+++ uname -m
++ arch=x86_64
++ [[ x86_64 == @(i?86|x86_64) ]]
+++ kver_x86 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ local kver
+++ local -i offset
++++ od -An -j0x20E -dN2 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ offset=' 17152'
+++ read -r kver _
++++ dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
+++ printf %s 6.6.68-1-lts
++ kver=6.6.68-1-lts
++ [[ 6.6.68-1-lts =~ ^[[:digit:]]+(\.[[:digit:]]+)+ ]]
++ printf %s 6.6.68-1-lts
++ return
==> ERROR: '/lib/modules/6.6.68-1-lts' is not a valid kernel module directory

So the command here is:

dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664

And running it says:

[root@seed /]# dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
6.6.68-1-lts (linux-lts@archlinux) #1 SMP PREEMPT_DYNAMIC Fri, 27 Dec 2024 15:16:06 +0000222/3233127+0 records in
127+0 records out
127 bytes copied, 0.00274759 s, 46.2 kB/s

So what does this mean?

mkinitcpio works out which kernel version to build the initramfs for by inspecting the image /efiroot/EFI/kidpc/vmlinuz-linux-lts. For some reason that file still contains the old kernel (6.6.68-1-lts), even though pacman has just installed 6.6.72-1, so there is no valid module directory for it any more. At this point something clicked in my brain: I have my EFI partition mounted under /efiroot, which is a non-standard configuration, and I do have some hooks that I had to put in place.
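The same check can be done by hand without patching mkinitcpio. The sketch below repeats the two steps of kver_x86 (read the 2-byte offset at 0x20E, then the first word of the string at offset + 0x200) and then looks for a matching module directory. The function names are my own, and testing only for the existence of the directory is an assumption that is weaker than whatever mkinitcpio itself validates.

```sh
# Print the version string embedded in an x86 kernel image.
kernel_image_version() (
    image=$1
    # 2-byte offset stored at 0x20E (od pads its output with spaces).
    offset=$(od -An -j0x20E -dN2 "$image") || exit 1
    # The string lives at offset + 0x200; keep only its first word.
    dd if="$image" bs=1 count=127 skip=$(($offset + 0x200)) 2>/dev/null |
        tr '\0' '\n' | head -n 1 | cut -d' ' -f1
)

# Succeed only if a module directory exists for the image's version.
# $2 is the module root; /lib/modules is assumed when it is omitted.
image_has_modules() (
    version=$(kernel_image_version "$1") || exit 1
    [ -n "$version" ] && [ -d "${2:-/lib/modules}/$version" ]
)

# Possible use inside the chroot (not run here):
#   image_has_modules /efiroot/EFI/kidpc/vmlinuz-linux-lts || echo "stale kernel image"
```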

Look at these files:

[root@seed /]# cat /etc/mkinitcpio.d/linux-lts.preset
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"

[[ -e /boot/vmlinuz-linux-lts ]] && cp -af /boot/vmlinuz-linux-lts "/efiroot/EFI/kidpc/"
[[ -e /boot/intel-ucode.img ]] && cp -af /boot/intel-ucode.img "/efiroot/EFI/kidpc/"

PRESETS=('kidpc')

kidpc_config="/etc/mkinitcpio-kidpc.conf"
kidpc_image="/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img"

OK, I need to understand at which points it is being executed, so I am adding some debug output.

[root@seed /]# cp /etc/mkinitcpio.d/linux-lts.preset /root/
[root@seed /]# vim /etc/mkinitcpio.d/linux-lts.preset
[root@seed /]# diff -u {/root,/etc/mkinitcpio.d}/linux-lts.preset
--- /root/linux-lts.preset  2025-01-19 07:31:57.112404242 +0100
+++ /etc/mkinitcpio.d/linux-lts.preset  2025-01-19 07:35:29.636922941 +0100
@@ -1,6 +1,8 @@
 ALL_config="/etc/mkinitcpio.conf"
 ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"

+echo "PRESET EXECUTED"
+ls -la /boot
 [[ -e /boot/vmlinuz-linux-lts ]] && cp -af /boot/vmlinuz-linux-lts "/efiroot/EFI/kidpc/"
 [[ -e /boot/intel-ucode.img ]] && cp -af /boot/intel-ucode.img "/efiroot/EFI/kidpc/"

Running pacman -S linux-lts:

...
(3/3) Updating linux initcpios...
PRESET EXECUTED
total 117112
drwxr-xr-x  2 root root     4096 Dec 29 15:32 .
drwxr-xr-x 18 root root     4096 Dec 29 15:32 ..
-rw-------  1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw-------  1 root root  9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r--  1 root root  8139776 Nov 12 18:20 intel-ucode.img
-rw-r--r--  1 root root 13070848 Dec 29 15:32 vmlinuz-linux-lts
PRESET EXECUTED
total 117112
drwxr-xr-x  2 root root     4096 Dec 29 15:32 .
drwxr-xr-x 18 root root     4096 Dec 29 15:32 ..
-rw-------  1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw-------  1 root root  9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r--  1 root root  8139776 Nov 12 18:20 intel-ucode.img
-rw-r--r--  1 root root 13070848 Dec 29 15:32 vmlinuz-linux-lts
==> Building image from preset: /etc/mkinitcpio.d/linux-lts.preset: 'kidpc'

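So the preset is executed twice, and there is already a vmlinuz-linux-lts in /boot to start with. It is dated Dec 29, which suggests it is a stale copy, and the cp line in the preset copies it over the kernel in /efiroot/EFI/kidpc/ on every run. I shall remove it and re-run.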

[root@seed /]# rm /boot/vmlinuz-linux-lts 
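As an aside on why the cp lines ran at all: a preset is a shell file that gets sourced, so every command in it runs each time something reads a variable from it. The sketch below shows this with a stand-in preset. The file content and the function name read_all_kver are hypothetical, and I have not traced which two callers source the real preset.

```sh
# A stand-in preset with one side effect in it.
preset=$(mktemp)
cat > "$preset" <<'EOF'
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"
echo "PRESET EXECUTED"
EOF

# Reading a value from the preset means sourcing it, which runs everything
# in the file. The subshell keeps the variables out of the caller.
read_all_kver() (
    . "$1"
    printf '%s\n' "$ALL_kver"
)

# Two readers, two executions of the echo line.
read_all_kver "$preset"
read_all_kver "$preset"
rm -f "$preset"
```

This is why side effects in a preset are risky: they run once per reader, whether or not that reader wanted them.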

Now let's see what happens:

(3/3) Updating linux initcpios...
PRESET EXECUTED
total 104344
drwxr-xr-x  2 root root     4096 Jan 19 07:38 .
drwxr-xr-x 18 root root     4096 Dec 29 15:32 ..
-rw-------  1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw-------  1 root root  9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r--  1 root root  8139776 Nov 12 18:20 intel-ucode.img
PRESET EXECUTED
total 104344
drwxr-xr-x  2 root root     4096 Jan 19 07:38 .
drwxr-xr-x 18 root root     4096 Dec 29 15:32 ..
-rw-------  1 root root 89059316 Dec 29 15:32 initramfs-linux-lts-fallback.img
-rw-------  1 root root  9636706 Dec 29 15:32 initramfs-linux-lts.img
-rw-r--r--  1 root root  8139776 Nov 12 18:20 intel-ucode.img
==> Building image from preset: /etc/mkinitcpio.d/linux-lts.preset: 'kidpc'
==> Using configuration file: '/etc/mkinitcpio-kidpc.conf'
  -> -k /efiroot/EFI/kidpc/vmlinuz-linux-lts -c /etc/mkinitcpio-kidpc.conf -g /efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img
==> ERROR: kver will be called with /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ kver /efiroot/EFI/kidpc/vmlinuz-linux-lts
++ local kver 're=^[[:digit:]]+(\.[[:digit:]]+)+'
++ local arch
+++ uname -m
++ arch=x86_64
++ [[ x86_64 == @(i?86|x86_64) ]]
+++ kver_x86 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ local kver
+++ local -i offset
++++ od -An -j0x20E -dN2 /efiroot/EFI/kidpc/vmlinuz-linux-lts
+++ offset=' 17152'
+++ read -r kver _
++++ dd if=/efiroot/EFI/kidpc/vmlinuz-linux-lts bs=1 count=127 skip=17664
+++ printf %s 6.6.72-1-lts
++ kver=6.6.72-1-lts
++ [[ 6.6.72-1-lts =~ ^[[:digit:]]+(\.[[:digit:]]+)+ ]]
++ printf %s 6.6.72-1-lts
++ return
==> Starting build: '6.6.72-1-lts'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [keyboard]
==> WARNING: Possibly missing firmware for module: 'xhci_pci'
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [modconf]
  -> Running build hook: [block]
  -> Running build hook: [filesystems]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating zstd-compressed initcpio image: '/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img'
  -> Early uncompressed CPIO image generation successful
==> Initcpio image generation successful

It looks like the kernel package now knows where to put the kernel image, and the initramfs ends up in the right place too, so I only need to keep the copy hook for the intel-ucode image.

[root@seed /]# rm /boot/initramfs-linux-lts-fallback.img 
[root@seed /]# rm /boot/initramfs-linux-lts.img 

Here is the modified preset file /etc/mkinitcpio.d/linux-lts.preset:

ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"

[[ -e /boot/intel-ucode.img ]] && cp -af /boot/intel-ucode.img "/efiroot/EFI/kidpc/"

PRESETS=('kidpc')

kidpc_config="/etc/mkinitcpio-kidpc.conf"
kidpc_image="/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img"

Lessons learned - microcode

Add the microcode hook. Please note that it comes before autodetect, so that the microcode is always included, regardless of where this is running.

[root@seed /]# cat /etc/mkinitcpio-kidpc.conf 
MODULES=(virtio virtio_blk virtio_pci virtio_net)
BINARIES=()
FILES=()
HOOKS=(base udev microcode autodetect keyboard keymap consolefont modconf block filesystems fsck)
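Because the order of the hooks matters here, it can be worth checking it mechanically after editing the file. A minimal sketch, with a made-up function name hook_before, that takes the two hooks to compare followed by the hook list:

```sh
# Succeed if hook $1 appears before hook $2 in the remaining arguments.
hook_before() (
    first=$1
    second=$2
    shift 2
    for hook in "$@"; do
        [ "$hook" = "$first" ] && exit 0    # wanted hook seen first
        [ "$hook" = "$second" ] && exit 1   # the other one came first
    done
    exit 1                                  # neither hook is present
)

# Possible use in bash, where the config file defines HOOKS as an array
# (not run here):
#   . /etc/mkinitcpio-kidpc.conf
#   hook_before microcode autodetect "${HOOKS[@]}" || echo "microcode is not before autodetect"
```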

Get rid of the copying in the preset file:

[root@seed /]# cat /etc/mkinitcpio.d/linux-lts.preset
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/efiroot/EFI/kidpc/vmlinuz-linux-lts"

PRESETS=('kidpc')

kidpc_config="/etc/mkinitcpio-kidpc.conf"
kidpc_image="/efiroot/EFI/kidpc/initramfs-linux-lts-kidpc.img"

Amend the boot configuration so that it does not load the microcode image separately (it is now part of the initramfs):

[root@seed /]# cat /efiroot/loader/entries/kidpc.conf 
title   Arch kidpc
linux   /EFI/kidpc/vmlinuz-linux-lts
initrd  /EFI/kidpc/initramfs-linux-lts-kidpc.img
options root="UUID=aeabb877-72f2-4817-84c6-934206038906" rw
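To make sure no entry still refers to a separate microcode image, a small check like the following could be run over the loader entries. It is only a sketch: the function name entry_loads_ucode is made up, and it assumes a leftover line would be an initrd line with "ucode" in the path.

```sh
# Succeed if the boot entry has an initrd line that mentions "ucode".
entry_loads_ucode() {
    grep -Eq '^[[:space:]]*initrd[[:space:]].*ucode' "$1"
}

# Possible use inside the chroot (not run here):
#   for entry in /efiroot/loader/entries/*.conf; do
#       entry_loads_ucode "$entry" && echo "$entry still loads microcode separately"
#   done
```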