I later wrote about a better, simpler way of doing this. Check it out at Third party files on a custom Debian Live installer.
Continuing my work on a custom Debian distribution, there was a third-party package I wanted to include in it. Unfortunately, this was proprietary software: it wasn’t included in Debian non-free, and it was a pretty large download (about 1GB).
Debian Live doesn’t seem to have a way to handle a case like this, or at least one that doesn’t involve significant drawbacks.
Initial approach
A first approach would be to download the package and store it with the build scripts, handy to be installed during the build process. The problem with this is that I keep my build scripts under source control on GitHub. Adding a 1GB file to a git repository is generally a bad idea, even more so when that repository otherwise weighs under 1MB, including its entire commit history. There’s also the question of whether the software licence would allow uploading it to GitHub or a similar service; legally, that could constitute unauthorised distribution.
What would be better is to keep only a reference (a URL) to this package, downloading it during the build process. Here the problem is that the package would have to be downloaded every time we run a build: a large download that significantly slows down an already slow process and makes it more cumbersome to iterate quickly.
Debian Live keeps a cache directory where it stores the Debian packages that it downloads during a build. It should be possible to download our large package there, and avoid re-downloading it if it already exists… except that the cache directory is not accessible from the chroot jail where custom scripts are run. It’s from one of these scripts that I would run the downloaded file (if it is an installer) or otherwise put it in the correct location, so access from these scripts is necessary. Well, bummer.
I also tried keeping a separate cache directory next to where the chroot hook scripts are kept (that would be config/hooks/cache), but those scripts don’t appear to be run from that location, and again that file hierarchy doesn’t appear to be accessible.
My solution
In the end I went for something a bit more involved. I altered the build scripts to add the following:
- At the start of the build, download the large file to cache, unless it already exists there
- From cache, run a simple HTTP server that can serve the large file (for example, Python’s built-in SimpleHTTPServer)
- Let the build run as normal
- On a chroot hook, use wget or curl to download the large file from the local HTTP server into the chroot jail
- Run any additional setup steps for the large file
In other words, although the chroot jail doesn’t allow us to copy files from the external filesystem, it doesn’t stop us from accessing files that are served over HTTP from that external filesystem.
The scripts
This is what my build script looked like after adding this feature:
```sh
#!/usr/bin/env sh

set -e

BASE_DIR=$(dirname "$0")/..
SCRIPTS_DIR="$BASE_DIR/scripts"
CACHE_DIR="$BASE_DIR/cache"
CONFIG_DIR="$BASE_DIR/config"
DOWNLOADS_DIR="$CACHE_DIR/downloads"
LOCAL_FILES_DIR="$CONFIG_DIR/files"
PID_FILE_PATH=$(mktemp)
mkdir -p "$DOWNLOADS_DIR"

DOWNLOADER_PATH="$SCRIPTS_DIR/downloader"
SERVER_PATH="$SCRIPTS_DIR/server"

DOWNLOADS_LIST_PATH="$CONFIG_DIR/downloads.list"

"$DOWNLOADER_PATH" -d "$DOWNLOADS_DIR" -l "$LOCAL_FILES_DIR" "$DOWNLOADS_LIST_PATH"
"$SERVER_PATH" -P "$PID_FILE_PATH" "$DOWNLOADS_DIR"

lb build noauto "${@}" 2>&1 | tee build.log

PID=$(cat "$PID_FILE_PATH")
kill "$PID"
```
For the final implementation, I divided the files I wanted to copy into two categories: large ones I needed to download from the Internet, and smaller ones that I was OK with adding to the repository. This second group can contain file checksums, customised config files, and probably other things.
There is a “downloader” script, referenced as $DOWNLOADER_PATH, that reads a list of files ($DOWNLOADS_LIST_PATH). For each entry that looks like a URL, a file is downloaded from the Internet. Other entries are expected to be files residing at $LOCAL_FILES_DIR. All files are copied to $DOWNLOADS_DIR, renamed as indicated in each entry of the list.
A server script ($SERVER_PATH) is then run, and we take note of its pid. This way, we can cleanly shut it down after the build is done.
This is the downloader script:
```sh
#!/usr/bin/env sh

while getopts ':d:l:' opt; do
    case $opt in
        d)
            DOWNLOADS_DIR="$OPTARG"
            ;;
        l)
            LOCAL_FILES_DIR="$OPTARG"
            ;;
    esac
done

shift $((OPTIND-1))
DOWNLOADS_LIST_PATH="$1"

while read -r NAME URL; do
    DEST_PATH="$DOWNLOADS_DIR/$NAME"
    if echo "$URL" | grep -qE '^https?://'; then
        wget -c --output-document "$DEST_PATH" "$URL"
    else
        cp "$LOCAL_FILES_DIR/$URL" "$DEST_PATH"
    fi
done < "$DOWNLOADS_LIST_PATH"
```
As mentioned before, it’s fed a list of files ($DOWNLOADS_LIST_PATH) that may refer to URLs or local files (under $LOCAL_FILES_DIR). Whatever is downloaded or copied ends up in $DOWNLOADS_DIR. Using wget’s -c option, we ensure that failed downloads are resumed and existing files are not re-downloaded.
This is an example of a file list:
For each entry, the first word before the space is the name that the downloaded/copied file will receive in the cache directory. The second word is either the URL to retrieve it from, or the name of the file in the local filesystem.
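For illustration, a list following that format could look like this (the file names and URL below are invented, not taken from the actual build):

```
big-file http://example.com/downloads/big-installer.run
small-file local-extras.tar.gz
```

Here big-file would be fetched from the Internet, while small-file would be copied from $LOCAL_FILES_DIR/local-extras.tar.gz.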
This script runs the http server:
There is not much to see in it, apart from the option used to specify a pid file, which will be used to shut down the server cleanly at the end.
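The server script itself is not reproduced here, but its essence can be sketched as a shell function (a minimal sketch: the function name and argument order are my own, and Python 3’s http.server stands in for the SimpleHTTPServer module mentioned earlier):

```shell
# Hypothetical sketch of what the server script does: serve a directory
# over HTTP in the background and record the server's pid in a file, so
# that the build script can kill it once the build is done.
serve_downloads() {
    # $1: directory to serve, $2: pid file path, $3: port
    ( cd "$1" && exec python3 -m http.server "$3" >/dev/null 2>&1 ) &
    echo $! > "$2"
}
```

The build script would then invoke something equivalent to serve_downloads "$DOWNLOADS_DIR" "$PID_FILE_PATH" 12345, and finish with kill "$(cat "$PID_FILE_PATH")".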
Finally, there’s the chroot hook that retrieves the files “over the fence” of the chroot jail and actually installs them. It will be something like this:
```sh
#!/bin/sh

set -e

WORKING_DIR=$(mktemp -d /tmp/tmp.XXXXXXXXXXXXX)

cd "$WORKING_DIR"
wget http://localhost:12345/big-file
wget http://localhost:12345/small-file

# Do something with the downloaded big-file and small-file
# ...

# Finally, we clean up after ourselves
rm -rf "$WORKING_DIR"
```