The internet is a volatile space. Stuff is being removed or censored every minute of the day, servers crash all the time, and people stop paying for hosting. You might not notice it immediately, but information on the internet is not permanent. If only we could make a backup of the internet…
Well, it turns out you can help! Do you have an underutilized internet connection? You can put it to work archiving news stories, videos, social media posts and more. This way we can make sure as little as possible is lost. Archived material can serve as evidence, as research material, as a record of history and culture, and a lot more.
This matters even more in light of the recent events in Ukraine. Footage and social media posts are getting censored heavily in this war and it is really important to make sure none of it gets lost: for evidence, for journalistic reasons, and to learn from it in the future. Follow along, because we are going to start archiving with Archiveteam’s Warrior application.
Archiveteam is not one person, but a loose collective of people dedicated to preserving online history. They have archived a lot of different things, from small websites to large websites and collections. Some notable examples are the YouTube dislike counts, Geocities (remember those?), public Google Drive and Mediafire files and a lot more. They have been working to preserve the online world as we know it since 2009.
Today you could join this loose collective by installing their ‘Warrior’, which is an automated program that accepts work from a central server and archives it. The archived material is (most of the time) added to the Internet Archive, so it can be accessed using the Wayback Machine.
The official documentation lists a docker run command as a one-liner to start the warrior. I really don’t like this approach. It is not easily transferred to other servers, and I like to be able to make small adjustments in a file if something needs to change. A bigger problem, in my opinion, is that it does not store the configuration between updates. This means that you have to set it up again each time the container is updated; not ideal!
That’s why we are going to set up a good ol' docker-compose.yml file. The configuration will be done using environment variables, since there is not a whole lot to set up. The Docker Compose file can also be downloaded from the Selfhosted Heaven GitHub page.
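For reference, here is a minimal sketch of what such a file could look like. It is not the original file from the Selfhosted Heaven GitHub page. The image names, the Watchtower flags and label, and the environment variable names (DOWNLOADER, SELECTED_PROJECT, CONCURRENT_ITEMS) are assumptions based on the Archiveteam and Watchtower documentation as I know it, so check them against the current official docs before relying on this.

```yaml
# docker-compose.yml: an illustrative sketch, not the author's original file.
# Image names, flags and variable names are assumptions; verify them upstream.
version: "3"  # needed by older docker-compose releases; newer ones ignore it with a warning

services:
  watchtower:
    image: containrrr/watchtower
    restart: unless-stopped
    volumes:
      # Watchtower needs the Docker socket to pull images and restart containers
      - /var/run/docker.sock:/var/run/docker.sock
    # Only update containers that carry the enable label, and clean up old images
    command: --label-enable --cleanup --interval 3600

  warrior:
    image: atdr.meo.ws/archiveteam/warrior-dockerfile
    restart: unless-stopped
    labels:
      # Opt this container in to automatic updates by Watchtower
      - com.centurylinklabs.watchtower.enable=true
    ports:
      # Web interface of the warrior
      - "8001:8001"
    environment:
      # Nickname shown on the leaderboard
      - DOWNLOADER=selfhostedheaven
      # Let Archiveteam pick the most urgent project
      - SELECTED_PROJECT=auto
      # Number of items processed at the same time
      - CONCURRENT_ITEMS=6
```

Because the settings live in the file instead of inside the container, they should survive every update that Watchtower performs.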
There are two containers in this Docker Compose file: watchtower and the warrior itself. Watchtower is a container that can automatically update other containers to the newest release. This is needed for the warrior, because you need to have the newest version of the warrior to accept work from the central server.
The configuration is stored in environment variables. I’ve set my nickname for the leaderboard to selfhostedheaven and I let the warrior automatically decide which project is the most urgent. At the moment, for me, it is Reddit, but you could override this with one of the other available projects.
I run a total of 6 concurrent items, since that seems to be a good middle ground for the capabilities of my server and internet connection. Lower this if you have little storage space or bandwidth available.
Now that everything is set up, start it with my favorite command, docker-compose up -d, and you’re done! The warrior is now picking up work and helping to archive the internet.
The warrior’s activity can be monitored using the web interface. You can access it in your browser at http://<ip_address_of_system>:8001. The interface is pretty simple to understand. There is really only one main screen where all the magic happens. On the main screen of the Archiveteam Warrior you can see which project is currently being worked on, what each worker is doing and how much data has already been transferred from the project and to the tracker.
Currently, my warrior is working on archiving Reddit (do you already follow the selfhosted subreddit?), but you could choose one of the other projects they are working on.
Although you won’t see the results of your work directly, you can actually see how you are doing compared to others! Call it ‘gamification’, but it is pretty cool to see all the activity from the workers flying by on the screen in the leaderboard.
I know this is a little bit different than my usual blog posts, but I feel it’s important to get as many people as possible to help archive the internet. Especially since censorship is at an all-time high and we need to be able to see the truth. Archiving is a small step in the big picture, but I believe it really helps in the long term.
Will you help archive the internet?