Self-hosting your code
In mid-2021 I stopped using GitHub for personal projects and started hosting my own code using Gitea. I had been playing around with home servers for several years beforehand, hosting websites, game servers, Vaultwarden and much more. In this article I will describe my journey from cloud-hosted to self-hosted code.
Reasoning behind self-hosting
When GitHub, GitLab, DevOps and other great free-to-use or cheap services exist, why go out of your way to self-host? It is far from hassle-free and one loses out on some great tools and features. Objectivity aside, self-hosting is really fun, and I would be lying if I said personal pleasure was not the foremost reason for my move. Add the fact that one gains full control of one's own data and no more reasons are necessary.
Requirements
Before getting started it's good to know what you want and need. I came up with this list of requirements for the git and CI/CD services that I wanted to host:
- Low resource usage (electricity is crazy expensive, thanks commies)
- Just enough features for a lone programmer; I hate bloat
- Recognizable interface for ease of use for potential associates browsing my projects
- Basic CI/CD for testing and deploying
- Easy to back up
The stack
Cue Gitea: the painless self-hosted git service. The community-driven project was forked from Gogs back in 2016 and has since evolved into its own thing. Trying out Gitea was as easy as downloading the binary and running it as a user named git. The first impressions were superb, beyond my expectations, and thus Gitea was gonna be the git server for me.
At that point the only thing missing was pipelines, as CI/CD is something Gitea does not offer itself. But webhooks and a good API open the door for external build tools. Once again there are a lot of great options, like Jenkins, CircleCI and more. It's also quite common for online services to allow self-hosted runners/agents, but I (obviously) wanted to host the whole package myself. After some searching I found Drone, a fully self-hosted CI service with built-in Gitea integration. Shortly after finding out about Drone going non-libre (and sadly after I had already wasted time configuring it), I discovered Woodpecker, a fork of Drone from before the license change.
Even though it had just under a thousand stars on GitHub when I set it up, I am very impressed with Woodpecker so far. It offers no fancy syntax or advanced conditional statements; it "simply" runs your listed commands one after another (or concurrently if specified), with every step of your pipeline running in its own Docker container with an image of your choosing. This enables you to do anything you would want to do, and it's certainly enough for a simple lad living in his mom's basement.
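For illustration, a pipeline in the format Woodpecker used around that time looked something like this, committed to the repo as .woodpecker.yml (the images and commands below are made-up examples, not my actual pipeline):

```yaml
# Each named step runs in its own container with the image you pick.
pipeline:
  test:
    image: golang:1.17
    commands:
      - go vet ./...
      - go test ./...
  deploy:
    image: alpine:3.15
    commands:
      - echo "deploy commands would go here"
```

Steps run in order by default, and each one gets a clean container, so the pipeline file stays nothing more than a list of commands.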
Why not GitLab
The most popular self-hosted git server is probably GitLab. The Community Edition offers a huge amount of features along with built-in CI/CD. For me, though, the massive extra performance cost for features that I will never use rules it out.
Backups
A good backup solution is the number one priority when hosting something important. A lot can go wrong at home and result in all of your data being lost: drive failure, a lightning strike or power surge, the house burning down, theft, ransomware or, most commonly, a sysadmin fuckup.
As with any hobby project there is a fine balance between time, money and quality of outcome. I am working on some changes to my setup but overall I am quite happy with what I've got going for backing up my shit:
- Not ever using the cloud, I'm self-hosting all the way
- Using IronWolf drives that are designed for 24/7 use in servers
- Having drives set up in RAID 0
- Off-site backups (500 m to 165 km)
- Infrastructure as code for service persistence and validation
- Regular restoring of backups to ensure they work
- Sanity check that alerts me when backups aren't working
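That last sanity check boils down to "is there a recent enough backup file?". A minimal sketch as a shell function, where the freshness window and the plain echo in place of a real alert are my illustrative assumptions:

```shell
# Alert if no backup file in the given directory is fresh enough.
# A real setup would send a notification instead of echoing.
check_backup_freshness() {
    backup_dir="$1"    # where the backup tarballs land
    max_age_min="$2"   # how old the newest backup may be, in minutes

    recent=$(find "$backup_dir" -type f -mmin -"$max_age_min" | head -n 1)
    if [ -z "$recent" ]; then
        echo "ALERT: no backup newer than $max_age_min minutes in $backup_dir"
        return 1
    fi
    echo "ok: $recent"
}
```

Run from cron, a non-zero exit (or the ALERT line in the output) is what triggers the actual notification.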
At the moment I simply use SSH to transfer a GPG-encrypted tarball of my code to my backup servers (all of which are fully mine, fully encrypted and on networks that I control, for the record). This is done several times per day, as long as the files have changed compared to the last sent tarball. The downsides of this are 1) a lot of tarballs on the backup servers and 2) backups are not instant and may thus miss the most recent data. Backing up my Gitea data currently costs around 1 GB of extra storage per day on the backup servers, which means that I can go several years without cleaning up before they are full. No problem. The second problem is more of an annoyance than an issue, as I would most likely still have the code on my local machine in case of failure and would just have to push it again once Gitea is restored.
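The "only if the files have changed" part is the interesting bit. Roughly, it works like this sketch, where the function name and paths are placeholders and the shipping step is stubbed out (the real thing adds GPG encryption and an scp to each backup server):

```shell
# Build a tarball of a directory and only "ship" it when its checksum
# differs from the last one shipped. Placeholder for the real
# gpg --encrypt + scp pipeline.
backup_if_changed() {
    data_dir="$1"      # directory to back up (e.g. Gitea's data dir)
    state_file="$2"    # stores the checksum of the last shipped tarball
    out="$3"           # where to write the tarball

    # Deterministic tar: fixed ordering, mtimes and ownership, so that
    # unchanged content produces a byte-identical archive (GNU tar flags).
    tar --sort=name --mtime='UTC 2020-01-01' --owner=0 --group=0 \
        -cf "$out" -C "$data_dir" .

    new_sum=$(sha256sum "$out" | cut -d' ' -f1)
    old_sum=$(cat "$state_file" 2>/dev/null || true)

    if [ "$new_sum" = "$old_sum" ]; then
        rm -f "$out"
        echo "unchanged"
    else
        echo "$new_sum" > "$state_file"
        # Real setup: gpg --encrypt the tarball, then scp it off-site.
        echo "shipped $out"
    fi
}
```

Normalizing mtimes and ownership matters: without it, tar would produce a different archive every run and every run would look like a change.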
I also use Syncthing for backing up family photos and videos, documents, game data and other personal stuff. Eventually I may use it for code as well, but that will come at a time when my Gitea data is way too large to keep going with how things are right now.
Migrating from cloud services to Gitea
I really like statistics and wanted (needed) to properly migrate everything from my previous git providers. Gitea has great support for migrating from other services, including issues, PRs, releases, wikis and more. However, it does not offer support for migrating "contribution" statistics (the number seen on a user profile, often labelled "activity" or "contributions"), so I made a script to estimate those numbers after the fact.
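Such an estimate essentially boils down to counting commits per day per author across each repository's history. A rough sketch of that counting step, where the function name and email are illustrative:

```shell
# Count how many commits a given author made on each day of a repo's
# history -- roughly the raw data a "contributions" number is built from.
count_daily_commits() {
    repo="$1"
    email="$2"
    git -C "$repo" log --all --author="$email" --date=short --format=%ad \
        | sort | uniq -c
}
```

Summing these per-day counts over every migrated repository gives a number comparable to what the old provider displayed.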
There was also a problem with git user and email, since I had used different user settings on different providers. For that I ran a janky script (didn't save it anywhere, check stackoverflow?) in all repositories to rewrite all commits with the correct author information.
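The usual shape of such a script (mine is lost, so this is a reconstruction, and the identities are placeholders) is a `git filter-branch` pass over every commit:

```shell
# Rewrite author and committer info for all commits matching an old
# email. Run inside the repository to fix; this rewrites history, so a
# force-push is needed afterwards.
rewrite_authors() {
    old_email="$1"
    new_name="$2"
    new_email="$3"

    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --env-filter '
        if [ "$GIT_AUTHOR_EMAIL" = "'"$old_email"'" ]; then
            export GIT_AUTHOR_NAME="'"$new_name"'"
            export GIT_AUTHOR_EMAIL="'"$new_email"'"
            export GIT_COMMITTER_NAME="'"$new_name"'"
            export GIT_COMMITTER_EMAIL="'"$new_email"'"
        fi
    ' -- --all
}
```

Since this changes every matching commit hash, it only makes sense to do once, right before (or right after) the final migration push.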