The usual caveats apply: I have made efforts to give an accurate and, hopefully, helpful explanation of things. But use the following with caution and under your own responsibility! There is no warranty, and I decline any responsibility for the results of using any of the information below!

OVH offers a variety of bare-metal servers, differing in hardware, management tools and bandwidth.

In fact I needed only a fixed IPv4 address, so I selected one of the entry-level offers and followed the helpful tips found here:
How to install NetBSD on OVH

Unfortunately, it didn't work: NetBSD never came up in user space, so I could not connect to it with ssh(1). And it was not a network misconfiguration: if it had been, I would have found traces of it by reading /var/log/messages on the disk; but nothing had been written there.

So the question arose: what was going on? Meaning, really: how do I get information about what the problem is?

I will not make this a suspense story: NetBSD can boot on it. What follows is what information I was after and how I managed to get it. The summary sketches it:

  1. Installing NetBSD on an OVH bare-metal server
    1. The server hardware
    2. The web interface and the utils
    3. The questions and how to find the answers
    4. Installing Debian
    5. Installing NetBSD as a dual booting option
    6. Configuring GRUB2 to boot NetBSD
    7. Verifying the boot procedure in the VM
    8. Finding the answers
    9. What I learned
    10. What could be explored to ease things

Installing NetBSD on an OVH bare-metal server

The server hardware

The hardware consists of a quad-core Intel® Xeon® E3-1225 V2 at 3.20 GHz on an Intel DH67BL motherboard, with 16 GB of memory, 3 x 2 TB SATA disks and an Intel® PRO/1000 PCI Express Ethernet card. But the bandwidth is limited, in this entry-level offer, to 100 Mbps.

This is "aging" but functional hardware.

The web management interface says there is a RAID card. This is not the case: there is only "fake RAID" support on the motherboard, and the system images proposed all use software RAID.

The BIOS does not support UEFI; it is a legacy BIOS.

The web interface and the utils

From the web interface (user account), several actions and several options are available.

First, there is a first-stage boot selection to be chosen from the web interface. To name only two options: one can request booting a rescue image, running entirely in memory; or one can choose to boot from the hard disk. In the latter case, the first-stage boot selection hands control over to whatever boot code is on the hard disk.

If one has no network access to the server, the web interface allows rebooting the unresponsive server.

Obviously, one must select the boot behaviour before rebooting for it to be taken into account. And just as obviously, once the boot selection is made, if you have access to the server, you can reboot it directly from an ssh session instead of relying on the web interface.

Several OS images are proposed for installation on the server. But, with this entry-level offer, only Linux systems are proposed (no BSD). You can also choose the number of disks to use before launching the installation. Choosing only one disk, the system is installed on one disk only. Selecting two, you will have software RAID1. Selecting three, you will have software RAID5.

There is also the possibility, via the boot selection, of loading a system image in memory. The "rescue pro" image is a complete Linux system, with the development tools (compiler etc.) and with qemu-system-x86_64 already installed. Since the rescue system runs entirely in memory (with 16 GB of memory, there is room...), one can install whatever is needed on any of the disks.

The questions and how to find the answers

NetBSD doesn't come up. So, at best, the kernel crashes or panics and reboots; at worst, the kernel is not even booted. Since the web interface shows what system has been installed (when selecting an image from the interface), and since at this moment there was none (I had installed not from a provided image but indirectly, from the rescue system), the questions were:

  1. Does the first-stage boot manager refuse to boot from the hard disk at all if no system is registered as installed?
  2. If the first-stage boot manager does boot from the hard disk, does it bypass the MBR and boot instead from an offset, or a partition, defined at install time?
  3. If the NetBSD boot blocks are reached, is the problem related to secure boot and a signature?
  4. If the booting process loads the kernel, does it crash or does it panic and reboot?

The answer to point 3 is very probably no: the BIOS is a legacy BIOS without UEFI, and secure boot is tied to UEFI. The other questions remain.

The first two questions can be addressed by simply installing from a provided image: even if the first-stage boot manager imposed some kind of limit, loading NetBSD from the installed system would bypass the problem. So remains the question of how to get at least a hint about what the problem with the NetBSD kernel is, since it doesn't reach a point where a network connection can be made, and there is no IPMI, so one cannot watch the booting process.

In fact we need at least one binary piece of information: yes or no; at least a difference in behaviour that makes it visible whether something works or not.

We have this basic yes/no information since, with another system installed, there is a visible difference: with the installed system, a connection can be made; with the NetBSD kernel, no connection. So what we need is to try to load NetBSD from the installed system and, setting a kind of breakpoint in the loading of NetBSD, arrange things so that, if this point is reached, the boot process goes back to the installed system. Getting the connection back would mean that we have successfully reached this breakpoint and that, "till then", all is OK.

This can be achieved using GRUB2. The NetBSD boot blocks can be chainloaded from the GRUB2 initial menu. Furthermore, GRUB2 allows booting a given entry just once, the next boot falling back to the default entry. From there, in the NetBSD boot process, we can write a /boot.cfg with several entries, and move the default progressively to the step we want to test, starting with a simple quit that will tell us whether the NetBSD boot loader is loaded and reads its configuration.

If necessary, a NetBSD kernel can then be compiled, inserting a cpu_reboot() call at successive points of the boot process in src/sys/kern/init_main.c until finding the point where the problem lies. (In fact, as we will see, the problem was simpler and I did not have to resort to this; but this was the initial plan.)

Installing Debian

I chose to install an image with a Debian Bullseye system. Since I wanted to leave room, I selected installation on one disk only. All is done via the web interface and, once the installation is done, you get a mail giving instructions for the first connection.

Installing NetBSD as a dual booting option

So NetBSD will be installed alongside Debian.

The question is: where to put it? Well, in order to test everything and to be able to work around any infelicity in the booting process, I chose to install NetBSD twice: a first installation on the same disk as Debian (I had chosen a one-disk, no-RAID Debian installation) and a second installation on an empty disk (there are three 2 TB SATA disks).

We want to resize the root system partition, so this has to be done without the root partition mounted. No problem since, in every case, the installation will be done by choosing as boot option the rescue pro image, running entirely in memory and having both qemu and the development tools (compiler, linker and libraries).

So the remainder of the process will be done with the rescue pro system running.

First, making room on the first disk, where Debian is installed. Since Debian is a recent version (Bullseye), the rescue disk does not have compatible e2fsprogs utilities. Just download the sources somewhere; unzip; cd into the directory; configure; make, and you're done. Then (the utilities have to be invoked with the path where they have just been compiled), the Debian system partition being the first one on sda:

e2fsck/e2fsck -f /dev/sda1
resize/resize2fs /dev/sda1 20G # note the number of sectors

Above, we have resized the filesystem itself, but we also need to adjust the partition table in the MBR. You can do this using parted and the number of sectors now used by the filesystem. Something like this will do (consult the man page):

parted
select /dev/sda
print
resizepart 1
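Since resizepart asks for an end position, a little arithmetic helps. A minimal sketch, with hypothetical numbers (not the values from my disk; also note that resize2fs reports filesystem blocks, which you may have to convert to 512-byte sectors first):

```shell
# Given the partition's start sector (from 'parted ... unit s print') and
# the filesystem's new size expressed in 512-byte sectors, compute the
# last sector the shrunk partition must still contain.
START_SECTOR=2048        # hypothetical start of /dev/sda1
FS_SECTORS=41943040      # hypothetical: 20 GiB in 512-byte sectors
END_SECTOR=$((START_SECTOR + FS_SECTORS - 1))
echo "${END_SECTOR}s"    # value to give at parted's resizepart prompt
```

Giving parted a slightly larger end than the filesystem needs is harmless; giving a smaller one truncates the filesystem.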

Then we install NetBSD. This is done by first downloading an install image of NetBSD, then running the installation under qemu-system-x86_64. The first installation will be done sharing the first disk with Debian:

wget https://cdn.netbsd.org/pub/NetBSD/NetBSD-9.3/images/NetBSD-9.3-amd64-install.img.gz
gzip -d NetBSD-9.3-amd64-install.img.gz
qemu-system-x86_64 \
    -enable-kvm \
    -net nic -net user \
    -m 1G \
    -k fr \
    -hda NetBSD-9.3-amd64-install.img \
    -hdb /dev/sda \
    -vnc :0

In the qemu invocation, -m specifies the amount of memory to use (we have 16 GB on this server, so 1G still leaves plenty of room); the NetBSD image is raw and will be interpreted as is; and the first disk given is the one booted from. The -net options allow access to the Internet from the VM if needed.

Here, because I'm French, I'm using an AZERTY keyboard, so I selected -k fr for the keyboard. Others will have to adapt to whatever they use and whatever qemu accepts for this option.

The last option is essential: with it, one can connect remotely to server_ip:5900 with a VNC viewer and proceed with the installation.

So for this first installation, I installed on the same disk as Debian; I didn't replace the MBR boot code (GRUB2) but installed the boot blocks on the NetBSD partition, configured the system via the configuration menu of the NetBSD installer (at least enable sshd and dhcpcd, add a user and set the passwords) and then went to the utilities and the command line to make the following adjustments:

Added ip6mode="autohost" to /etc/rc.conf (you must have selected dhcpcd=YES and sshd=YES via the configuration menu).

Changed in /etc/fstab all the /dev/wd0* entries into the ROOT.* notation, so that it will work whatever order the bus enumeration and/or the boot information give for the disks.
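For illustration, the beginning of the resulting /etc/fstab could look like this (the partition letters here are those of a default installation and are an assumption; adapt to your own layout):

```
ROOT.a	/	ffs	rw	1 1
ROOT.b	none	none	sw,dp	0 0
```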

Added a /boot.cfg:

menu=Boot normally:rndseed /var/db/entropy-file;boot
menu=Reboot:quit
menu=Booting without SMP:boot -1
menu=Booting without ACPI:boot -2
default=1
timeout=5
clear=1

In the /boot.cfg above, I set a timeout so that, when testing, I have a chance to select another entry than the default. For the moment, since I wanted to validate the procedure step by step, this would do (it will be changed later).

I then stopped the VM.

I then made a second installation, but on another disk (it was supposed to be /dev/sdb, but a curious renumbering appeared and the installation ended up on /dev/sdc... why?). The procedure is the very same, except that I substituted the target empty disk for /dev/sda and, this time, installed the NetBSD MBR boot code.

Configuring GRUB2 to boot NetBSD

We can stay in rescue mode to configure GRUB2 since, except for the numbering of the disks, the boot procedure proper should be the same (I am not speaking about kernel behaviour: in the VM the kernel sees an emulated basic machine), provided the first-stage boot selector indeed hands control to the boot code in the disk MBR.

So we can once more invoke qemu, with /dev/sda as first disk, /dev/sdb as second disk (nothing was installed there in my case) and /dev/sdc as third disk. This will allow verifying that our GRUB2 configuration is theoretically correct before trying it on the real thing.

The configuration of GRUB2 is not done by writing the config file directly, but by completing stubs that a utility combines to create the real config file. The GRUB2 and/or Debian naming of things is not very consistent, so relying on the names of the utilities or of the variables is misleading.

So, back in the VM, we boot normally into Debian. Instead of sudo'ing again and again, switch to root with su - root (the path to the admin binaries is not, by default, in the PATH of a normal user).

We want to be able to test various options, so, in order to be able to do so also in the VM, we will instruct GRUB2 to wait, allowing us to display the menu (this, for example, simplifies finding the ordinal of the entry we want to boot). So we first make this addition to /etc/default/grub:

GRUB_HIDDEN_TIMEOUT=5
GRUB_TIMEOUT=5

Then we add the entries for our two NetBSD installations: one on the very same disk as Debian, the other on another disk (the installation ended up on /dev/sdc in my case, so you may have to adapt). This is done in stubs found in /etc/grub.d/. In our case, since we want to add these entries as a supplement, we put our additions in /etc/grub.d/40_custom:

#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
set fallback=0
# One can use grub-reboot entry_number to boot once on a test kernel/OS.
menuentry 'NetBSD wd0' --id 'netbsd0' {
    insmod part_msdos
    insmod chain
    chainloader (hd0,msdos2)+1
}
menuentry 'NetBSD wd2' --id 'netbsd1' {
    insmod part_msdos
    insmod chain
    set root=(hd2)
    drivemap -s hd0 hd2
    chainloader +1
}
menuentry 'NetBSD wd2 part' --id 'netbsd2' {
    insmod part_msdos
    insmod chain
    set root=(hd2,msdos1)
    drivemap -s hd0 hd2
    chainloader +1
}

In the chunk of code above, we chainload a legacy BIOS boot block; that is, GRUB2 will simply load the block at the defined boot address and jump to its entry point. In the first case, on the shared disk, since the root disk is already set (GRUB2 is on it), we simply chainload the very first block of the NetBSD partition (the second partition of the MBR, identified by msdos2: it is the first block of the NetBSD bootloader). In the second case, we hand control to the MBR bootloader of the other disk (installed by NetBSD), which will in turn, since the NetBSD partition is flagged active in the partition table, hand control to the NetBSD boot blocks proper. But we need to remap the devices because, if we don't, the kernel will be loaded from the second disk but will take its root partition on the first disk! The same applies to the third entry, which simply loads directly the first boot block of the NetBSD partition on that other disk.

Once this is done, we let the update-grub utility construct the real config file /boot/grub/grub.cfg:

update-grub

The utility weaves the chunks together but also adds entries by searching for "other" systems (only Linux ones, though: NetBSD is not recognized). The problem is then to know the ordinals of our NetBSD entries. The ordinals of GRUB2 menu entries start at 0 (while the ordinals of partitions in GRUB2 start at 1). So one can find the number either by counting the lines in /boot/grub/grub.cfg starting, without indentation, with menuentry or submenu; or by rebooting in the VM (since we can watch the booting procedure on the emulated machine with VNC), displaying the GRUB menu by pressing the ESC key during the countdown, and counting the entries (remember: numbering starts at 0).
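The counting can be scripted. A small sketch, assuming the generated file is /boot/grub/grub.cfg (nl numbers the matching lines starting from 0, matching GRUB's ordinals):

```shell
# List the top-level GRUB menu entries with their ordinals (GRUB counts
# from 0). Only unindented lines count: entries nested inside a submenu
# have their own, separate numbering.
grep -E "^(menuentry|submenu) " /boot/grub/grub.cfg | nl -v 0 -w 2 -s ': '
```

Note that a submenu occupies a single ordinal at the top level; its inner entries are addressed with GRUB's "N>M" notation.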

Verifying the boot procedure in the VM

We are still in the rescue system and still using qemu with the three disks attached to the VM.

Except for the device numbering, which may differ between the OSes, the boot process should be the same in the VM as on legacy BIOS bare-metal hardware (once more, not taking the kernels into account: what the kernel sees in the VM is not the real hardware but an emulated generic machine).

So, rebooting inside the VM, one can test whether the GRUB menu indeed gives access to NetBSD. This was the case.

There is one thing that was worth verifying in the VM: whether using the quit command in the NetBSD bootloader would indeed restart from the beginning, allowing us to return to GRUB2. It was worth trying because, with the mapping of the drives when chainloading the other disk, it does not work: quit reboots from the third disk's MBR, not the first one's. Only in the case where NetBSD is on the same disk as Debian does quit'ting in the NetBSD boot loader hand control back to GRUB2 via the first disk's MBR.

Finding the answers

All was in place, and the last piece is GRUB2's boot-once feature: instructing GRUB2 to boot a menu entry just this once, returning to the default entry on the following boot. This is because we needed a simple YES/NO indicator: working meaning rebooting into Debian and then having network access; not working meaning the kernel crashed and we are not back in Debian.

Booting once is done simply, as root (or with sudo), by using:

grub-reboot menu_entry_number

What GRUB2 does is simply write, in a file writable at boot time (/boot/grub/grubenv), a definition of the variable next_entry with the entry to boot. If it is set, before booting this entry, GRUB2 unsets the variable (by rewriting this very file) so that, the next time, the default entry is used.
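One can peek at this state directly. A sketch (grub-editenv list is the official interface, but since the variable definitions inside grubenv are plain text, grep works too):

```shell
# /boot/grub/grubenv is a fixed-size 1024-byte block padded with '#'
# characters, so never edit it with a text editor; reading it is harmless.
grep '^next_entry=' /boot/grub/grubenv || echo 'no boot-once entry pending'
```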

Since (see above) there would be a problem with testing the NetBSD boot command quit on the other disk, the entry to be tried once was always the one booting from the NetBSD partition on the first disk, and it was entry number 2.

So, before shutting down the VM, under Debian we simply run:

grub-reboot 2

One can verify that /boot/grub/grubenv indeed contains next_entry=2.

And after having shut down the VM, before leaving the rescue system, we reset the boot selector to "hard disk" in the web interface in order for GRUB2 to be called. Once done, we reboot from the rescue system.

We will now have the answers:

The first reboot leads GRUB2 to chainload NetBSD from the partition on the same disk. The /boot.cfg boots the first entry (normal boot). Result: no connection. So we reboot from the web interface, and we are back in Debian. What question does this answer? By consulting /boot/grub/grubenv we can see that next_entry has been unset: so yes, booting proceeds normally and GRUB is called.

Then we will try the second entry in the NetBSD /boot.cfg. Since it is not possible under Linux to mount an FFSv2 partition read/write, we need to go back to the rescue disk and use qemu once more, with just the first hard disk, to go under NetBSD and change the default in /boot.cfg to 2, in order to verify that the NetBSD boot loader is indeed called. Rebooting in the VM, we go back to Debian, run grub-reboot 2 once more and shut down the VM. Resetting booting from hard disk via the web interface, we then reboot from the rescue system.

GRUB2 is then once more called, hopefully chainloading once more the NetBSD boot blocks. What is the result? Connection to Debian. Why? Because by quit'ting via the NetBSD /boot.cfg (default set to 2), control was given back to GRUB2, which then defaulted to Linux/Debian. This answers the next question: yes, the NetBSD boot blocks are called, since there is a difference in behaviour and the only thing we changed is the /boot.cfg file; so it is read.

Returning once more to the rescue system, we add this to the /boot.cfg file:

userconf=disable i915drmkms*

and we also change the default to 3, in order to boot without SMP and without KMS. The remainder of the procedure is the same (returning to Debian to set boot-once to the NetBSD partition on the first disk).

Result: SUCCESS! We are now under NetBSD. It was then simple to verify whether one of the options was the culprit. And the culprit was KMS (the framebuffer). So disabling the framebuffer (KMS) with userconf=disable i915drmkms* allows the NetBSD kernel to boot.
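Putting it together, a plausible final /boot.cfg for this hardware would combine the entries shown earlier with the userconf line, the default back to a normal boot (the article does not show the final file, so this is a reconstruction):

```
menu=Boot normally:rndseed /var/db/entropy-file;boot
menu=Reboot:quit
menu=Booting without SMP:boot -1
menu=Booting without ACPI:boot -2
userconf=disable i915drmkms*
default=1
timeout=5
clear=1
```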

What I learned

In fact, when we know that some system works on an unknown piece of hardware, it is always a good idea to start by running that system. Simply because, under Linux, we then have the dmesg output and, from it, we can see whether there is a chance that the hardware is supported.

Furthermore, the clue was indeed in the Linux dmesg: it records the command line the kernel was called with. And it said:

[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-14-amd64 root=UUID=eea6d0a4-03b6-44e6-8588-ff6c4eba2095 ro nomodeset iommu=pt

The clue was nomodeset. KMS means Kernel Mode Setting; nomodeset, with "kernel" implied, means no KMS. Since, for this, a lot of NetBSD code is shared with Linux or derived from it, if even Linux does not handle it, it is a very safe guess that NetBSD will not either.
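This suggests a cheap check worth running on the provider's Linux before attempting another OS; a sketch:

```shell
# Does the provider's own kernel command line carry workaround flags?
# If Linux itself is booted with nomodeset here, a KMS console is suspect.
grep -wo 'nomodeset' /proc/cmdline && echo 'provider disables KMS too'
```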

The second lesson is that it is always a safety measure to set a non-zero timeout in the bootloader, whatever it is: GRUB2 or the NetBSD boot loader. Because if one has to try various things, and using a VM is one way to test, a zero timeout will paint you into a corner: the default entry doesn't boot and there is no time to select another one.

What could be explored to ease things

During the tests, there were several things that were helpful or could have been helpful, and that either do not exist with NetBSD or, to my knowledge, do not exist even with other kernels, and that could be handy, from firmware to kernel via the boot loader:

©2022–2024 Thierry Laronde — FRANCE