Netboot

Fri, Apr 1, 2005

I recently bought an EPIA motherboard (a VIA Nehemiah-based M10000) to play with making a media station. One of the requirements for this piece of kit is that it should be quiet, so I wanted to have as little in the machine itself as possible. Having a big and ugly server in the back room, I decided that I'd try to get the little board booting over the ethernet. This page documents my struggle.

The set-up

My (internal) network setup is pretty simple. Everything is in the carfax.org.uk domain. I've got one main machine, called vlad, which acts as pretty much everything at the moment: X Windows workstation, file server, testbed web server, database server... The new machine is going to be called miyu, and, just for reference, the firewall/router/gateway that connects everything to the outside world is called spike.

Machine	Purpose	IP address
vlad	Server	10.0.0.5
miyu	Client	10.0.0.3
spike	Gateway/Router	10.0.0.1

How it works

The network boot process goes something like this:

The client broadcasts a DHCP request to find out what its TCP/IP configuration should be.
The server responds, giving the client an IP address, netmask, default gateway and "next" server (for the rest of the boot process).
The client downloads a second-stage boot-loader from the "next" server.
The client runs the second-stage boot-loader.
The second-stage boot-loader downloads a configuration file, which contains the name of the kernel file, and the boot options to use.
The boot-loader downloads the kernel, and runs it.
The kernel connects to an NFS server and uses a mount from the server as its root filesystem, booting as normal.

DHCP

The first thing to set up is the DHCP server on vlad. This allows the boot ROM on miyu to find out what IP address it should be using. I use the reference implementation of the v3 DHCP server from the Internet Software Consortium, packaged by Debian as dhcp3-server.

The configuration of the DHCP server looks like this:

# /etc/dhcp3/dhcpd.conf
allow booting;
allow bootp;

option domain-name "carfax.org.uk";
option domain-name-servers 10.0.0.5;
subnet 10.0.0.0 netmask 255.255.255.0 {
    option subnet-mask 255.255.255.0;
    option routers 10.0.0.1;
    default-lease-time 600;
    max-lease-time 7200;
}

option space PXE;
option PXE.mtftp-ip code 1 = ip-address;
option PXE.mtftp-cport code 2 = unsigned integer 16;
option PXE.mtftp-sport code 3 = unsigned integer 16;
option PXE.mtftp-tmout code 4 = unsigned integer 8;
option PXE.mtftp-delay code 5 = unsigned integer 8;

group {
    option vendor-class-identifier "PXEClient";

    next-server 10.0.0.5;
    filename "/tftpboot/pxelinux.0";
    option PXE.mtftp-ip 0.0.0.0;
    vendor-option-space PXE;

    host miyu {
        hardware ethernet 00:40:63:cb:f5:b5;
        fixed-address 10.0.0.3;
    }
}

For the M10000 boards, it would appear that the option space PXE; line and the five following lines are required in order for the boot process to work. They define an additional set of options that can be passed to the client when it asks for DHCP information. These options allow the use of MTFTP (a multicast version of TFTP). We won't be using MTFTP, but we need the options anyway – particularly the option PXE.mtftp-ip 0.0.0.0; line, which tells the boot ROM to use ordinary TFTP, not MTFTP.

You will also need to replace the hardware ethernet address (MAC) in the configuration file with that of your own client machine. The MAC address for miyu is 00:40:63:cb:f5:b5, and was written on a piece of paper stuck to the back of the ethernet socket on the motherboard. It was also printed on the screen when the machine was switched on.
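
If you end up booting more than one client, each one needs its own host entry inside the group block. Here is a minimal sketch of a shell helper that prints such an entry. The function name dhcp_host_entry is my own invention for illustration, not part of the ISC tools, and it does no checking of the values you give it.

```shell
# Print a dhcpd.conf host entry, indented to sit inside the group block.
# dhcp_host_entry is a hypothetical helper, not something shipped with dhcpd.
# $1 = host name, $2 = MAC address, $3 = fixed IP address
dhcp_host_entry() {
    cat <<EOF
    host $1 {
        hardware ethernet $2;
        fixed-address $3;
    }
EOF
}

# Reproduces the entry for miyu from the configuration above.
dhcp_host_entry miyu 00:40:63:cb:f5:b5 10.0.0.3
```

The output still has to be pasted into dhcpd.conf by hand, and the DHCP server restarted afterwards.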

The other thing that I needed (I found out later) was a DHCP relay running on port 4011. The ISC DHCP server has a DHCP relay, available in Debian as dhcp3-relay. Configure this to run on port 4011, and to relay DHCP requests to the main DHCP server on the same machine.

I tested the system at this point by plugging miyu into the network and switching it on. The M10000 boards print out their DHCP configuration when they've found it. I also found that knowing how to use ethereal is pretty helpful in working out where the problems are.

TFTPD

TFTP is the Trivial File Transfer Protocol, and is used in most of the steps of the boot process to transfer files from the server to the client. Setting up the TFTP server is pretty simple. I use H. Peter Anvin's tftpd-hpa, which has been tweaked to work well with network booting.

For booting small numbers of machines, tftpd-hpa is probably best set up under inetd. Add the following line to your /etc/inetd.conf:

tftp    dgram   udp     wait    root    /usr/sbin/in.tftpd in.tftpd -s /usr/local/share/netboot/tftpd -r blksize

Most distributions will probably install the line automatically when you install the tftpd-hpa package. Note that I put the root of my tftpd server in /usr/local/share/netboot/tftpd. You can put yours anywhere you like.

You will need several sections under the tftpd tree. My tree looks something like this:

/usr/local/share/netboot/tftpd/
`-- tftpboot
    |-- miyu-kernel -> vmlinuz-2.4.21-rc7-ac1
    |-- pxelinux.0
    |-- pxelinux.cfg
    |   `-- 0A000003
    `-- vmlinuz-2.4.21-rc7-ac1

We'll deal with each bit of the tree separately below, but it's good to know what it's going to look like in the end.
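
For reference, the skeleton of that tree can be made with a couple of commands. This is only a sketch: TFTP_ROOT and KERNEL are variable names I've made up for the example, and TFTP_ROOT defaults to a scratch directory so that you can try it without touching a real server.

```shell
# TFTP_ROOT defaults to a scratch directory so the commands can be tried
# safely; set it to your real TFTP root (mine is /usr/local/share/netboot/tftpd).
TFTP_ROOT=${TFTP_ROOT:-$(mktemp -d)}
KERNEL=vmlinuz-2.4.21-rc7-ac1

# Create the directory that holds the boot-loader and kernels, and the
# pxelinux.cfg directory for the per-machine configuration files.
mkdir -p "$TFTP_ROOT/tftpboot/pxelinux.cfg"

# pxelinux.0, the kernel image and the 0A000003 configuration file are
# copied in by hand in the later sections.

# Use a relative symlink target, so the link still resolves when the TFTP
# server is confined to its own root directory with -s.
ln -sf "$KERNEL" "$TFTP_ROOT/tftpboot/miyu-kernel"

echo "TFTP tree created under $TFTP_ROOT"
```

The symlink will dangle until the kernel image itself has been copied into place.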

PXELINUX – a second-stage boot-loader

Now that we've got the two "infrastructure" bits set up (DHCP and TFTP), we can work on actually getting the client machine to do things.

The first thing to set up is the second-stage boot-loader. I use PXELINUX from H. Peter Anvin. The PXELINUX web page gives some useful hints on setting up and debugging a network boot configuration, although I found that the main DHCP configuration that HPA provides doesn't seem to work with the M10000.

To set up PXELINUX, put the pxelinux.0 file (the boot-loader executable) in the /tftpboot directory in your TFTP server directory. PXELINUX also needs a configuration file. The config files are kept in the pxelinux.cfg directory, and are named by the IP address (in hex) of the machine that they are intended for. So, since I've configured miyu to be 10.0.0.3, the configuration file should be 0A000003 (10.0.0.3 in hex). The letters should be capitals, not lower case – that one caught me for a few moments as well.
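
If you don't fancy doing the hex conversion in your head, the shell can do it for you. The function below is a small sketch of my own (ip_to_hex is just a name I picked), and it assumes it is given a well-formed dotted-quad address with no leading zeros.

```shell
# Convert a dotted-quad IPv4 address into the 8-digit upper-case hex name
# used for files in the pxelinux.cfg directory.
ip_to_hex() {
    # Turn the dots into spaces so each octet becomes a separate argument;
    # %02X prints each one as two upper-case hex digits.
    printf '%02X%02X%02X%02X\n' $(echo "$1" | tr . ' ')
}

ip_to_hex 10.0.0.3    # prints 0A000003
```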

You can create configuration files for whole classes of machines as well, by truncating the address. So, if you've got a subnet of machines on 10.0.0.*, you can configure all of them at once by creating a configuration file called 0A0000.
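
As I understand the PXELINUX documentation, the boot-loader looks for the full hex name first, then drops one hex digit at a time from the end, and finally falls back to a file called default. The sketch below lists the names in that order for a given hex address. The function name pxe_config_names is made up for the example, and it leaves aside any other names a particular PXELINUX version might try before these.

```shell
# Print the configuration file names tried for an 8-digit hex address:
# the full name first, then one digit shorter each time, then "default".
pxe_config_names() {
    name=$1
    while [ -n "$name" ]; do
        echo "$name"
        name=${name%?}    # drop the last hex digit
    done
    echo default
}

pxe_config_names 0A000003
```

For miyu, 0A0000 is the third name in the list, so a file of that name is only used if neither 0A000003 nor 0A00000 exists.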

My configuration file is pretty simple:

# PXELinux config file for miyu.carfax.org.uk (10.0.0.3)

default miyu

label miyu
  kernel miyu-kernel
  append root=/dev/nfs nfsroot=10.0.0.5:/usr/local/share/netboot/miyu

Don't worry about the append ... line for now. I'll cover that below. All it does is tell the kernel where to get its root filesystem from.
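
If you have several clients, the configuration files will all look much the same, so they can be generated. This is a rough sketch only: pxe_config is a hypothetical helper of my own, and it simply fills the four values into the same layout as my file for miyu.

```shell
# Print a minimal PXELINUX configuration in the same layout as the one above.
# pxe_config is a hypothetical helper, not part of PXELINUX.
# $1 = label, $2 = kernel file name, $3 = NFS server IP, $4 = exported root path
pxe_config() {
    cat <<EOF
default $1

label $1
  kernel $2
  append root=/dev/nfs nfsroot=$3:$4
EOF
}

# Reproduces the body of miyu's configuration file on standard output.
pxe_config miyu miyu-kernel 10.0.0.5 /usr/local/share/netboot/miyu
```

Redirect the output into the appropriately named file under pxelinux.cfg (0A000003 in my case).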

The kernel

Of course, if you want to run Linux, you will need a kernel. The most important bit to have in your kernel, for this setup, is support for an NFS root filesystem. You will find this in the kernel configuration as:

File Systems --->
   Network File Systems --->
      <*> NFS file system support
      [ ]   Provide NFSv3 client support
      [ ]   Allow direct I/O on NFS files (EXPERIMENTAL)
      [*]   Root file system on NFS

If you have an M10000 board, you could also use the actual .config file I used to get started. This leaves out quite a lot of support for all sorts of things, but it's a good starting point, as it's dead simple. This config file is for the 2.4.21-rc7-ac1 kernel, but you should be able to use it with more or less any 2.4-series kernel, provided you do a make oldconfig after you copy it to .config.
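
Before spending time on a build, it's worth checking that the important options really are compiled in rather than built as modules or left out. The sketch below greps a .config for them. The function name check_nfsroot_config is my own. CONFIG_NFS_FS and CONFIG_ROOT_NFS correspond to the two starred menu entries above; CONFIG_IP_PNP (IP: kernel level autoconfiguration, under the networking options) is my addition to the list, since as far as I know the NFS root option is only offered when it is enabled. The driver for your network card also has to be built in, but which option that is depends on your hardware, so it isn't checked here.

```shell
# Report whether each option needed for an NFS root is built in (=y).
# Returns non-zero if any of them is missing or only built as a module.
check_nfsroot_config() {
    config=$1
    status=0
    for opt in CONFIG_NFS_FS CONFIG_ROOT_NFS CONFIG_IP_PNP; do
        if grep -q "^$opt=y" "$config"; then
            echo "$opt ok"
        else
            echo "$opt MISSING"
            status=1
        fi
    done
    return $status
}

# Try it on a small made-up sample rather than a real kernel tree.
sample=$(mktemp)
printf '%s\n' 'CONFIG_NFS_FS=y' 'CONFIG_ROOT_NFS=y' 'CONFIG_IP_PNP=y' > "$sample"
check_nfsroot_config "$sample"
rm -f "$sample"
```

On a real tree you would run check_nfsroot_config .config from the top of the kernel source directory.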

Configure your kernel and build it. Then copy it into the tftpboot directory of your TFTP tree as miyu-kernel, or whatever you want to call it. This is the name used in the kernel ... line in your PXELINUX configuration file.

I've configured my system so that the config file points to a symlink, so I can keep multiple kernel versions if I want to: see the tree, above, to see the exact setup.

You should now be able to test your configuration to this point. Start up the client machine, and you should see PXELINUX start up, load a kernel, and run it. The kernel will initialise its device drivers, and then come to a screeching halt because it can't find its root filesystem. This is what we work on next.

NFS

The next thing to do is to set up an NFS server to host the root filesystem of miyu. This, like TFTP, requires a directory tree in which to set up the filesystem. I've put mine in the /usr/local/share/netboot/miyu/ directory, right next to the TFTP tree.

I use the user-space NFS server shipped with Debian (as nfs-user-server). Basically, I installed the package, and configured /etc/exports to look like this:

# /etc/exports: the access control list for filesystems which may be exported
#       to NFS clients.  See exports(5).
/usr/local/share/netboot/miyu   miyu(rw)

This exports the directory for miyu's root filesystem to miyu only, in a writable state.

Root filesystem

Now, we need some contents for the root filesystem. I tried it with Debian first, which is pretty easy, but caused me some problems with the network boot, and it's all too complicated for what I wanted the machine for anyway. I've left my incomplete notes on installing Debian anyway, if you want to see them.

After my abortive attempt with Debian, I installed Core Linux, which is a very nice minimal distribution with just enough in it to bootstrap whatever you want. There are excellent instructions on how to do things in Core Linux from Tony Whitmore. I used his installation instructions for guidance, and hacked it around a bit until it worked.

A lot of the Core Linux install guide is unnecessary for us, since we've got a working Linux machine to work from anyway, and a working kernel. So, here's what I did. First, mount the Core Linux CD image on the server with a loopback mount:

mount core_iur_disk-1.iso /cdrom -o loop

Now, the tools on the Core CD expect the CD to be mounted as the root filesystem, which in this case it isn't, so you need to… err… fiddle things a little. In this case, we basically rework the Core install script.

export PATH=$PATH:/cdrom/bin:/cdrom/sbin
install_core /usr/local/share/netboot/miyu

(lots of errors ensue)

cd /usr/local/share/netboot/miyu; for f in /cdrom/pkgs/core_*; do corepkg -i $f /usr/local/share/netboot/miyu; done

This installs the base Core Linux distribution in the right place. You can then continue reading Tony's guide from the "Configure boot settings" section. I just ignored the bits about backing up and about lilo, since

we don't use lilo,
all of the config files are already on a machine which boots properly, so we can fix any problems on the server, and
I'm a dangerous lunatic who doesn't back up often enough.

Finally, the /etc/fstab I used was this:

vlad:/usr/local/share/netboot/miyu  /   nfs defaults,rw,hard,intr 0 0
vlad:/home/hrm          /home/hrm   nfs defaults,rw,hard,intr 0 0

none      /proc proc defaults 0 0

where /home/hrm is my own home directory. If you've got multiple users, you could mount /home on /home, or any other scheme you find useful.

At this point, you should have a basic, working, Core Linux installation booting cleanly on your remote computer.

Good luck, and happy installation!