Why SeaweedFS?

I got an email last week asking if I could explain why I set up a SeaweedFS server. It's been a while since I talked about it, and even then it was mostly in hints in this post from 2022 and this post from 2024. The request gives me an opportunity to expand on it, and to remind myself why I do it.

Stories

In some regards, there is a bit of trauma in me when it comes to losing things. This includes hearing stories that my father told me and realizing that when he died, they were gone forever. These same thoughts came up in my novel Sand and Bone:

The little girl's confessions echoed in Rutejìmo's head in a quiet symphony as he wrote in his book. The hand-bound collections of pages creaked under his hand, the leather thong strained to hold the almost fifty pages of tightly-spaced writing. Over the years, he had added a dozen pages to the collection. It wouldn't be too long before the binding couldn't handle the additional pages, but he thought he had a few more years left before that happened.

Even with his additional pages, he didn't have room to write down all of the stories he had heard over the years. He wanted to detail the joys of the little girl's death, such as the choked story about how she had stolen her brother's toy when he wasn't looking. He had also wanted to write the horrors, like one man's confession for killing his sister. Each one was precious and important. Time would erase their stories and a part of Rutejìmo died every time he forgot one.

Sand and Bone 8: Alone

This fear goes beyond one person losing the stories an old man told about things that happened decades ago. It includes my own stories, which will also be lost when I die. There is little chance I will be “done” with Fedran. There is little chance I will get all of the stories out of my head and onto paper. There isn't enough time, I don't have the self-esteem to do it, and I don't have the drive. When I'm gone, so are those stories. But, even going beyond that, there are so many stories of horror and joy that I feel we are losing with the passage of time. Happy stories of people falling in love, horror stories of Auschwitz, and everything in between. It is that breadth of stories, lessons, and happiness that we lose.

But the big question is, will anyone care? Does anyone want to know about the first time I got caught shoplifting? Or the first time I finally said “I love you” to Partner?

Probably not.

But, you never know.

I'm never going to record them, but when I was given my father's artwork and decades of his effort, I didn't want to lose them yet. I don't want to just chuck them aside. So, I need to keep them.

Hoarding

I come from a long line of hoarders. I realize that when I look at my DVD collection or the boxes of books I have. My LEGO collection isn't huge, but it does fill five or six boxes. I once got rid of it and felt guilty for years afterward, which is why I built it up again. Sometimes, I just play.

My mother did it. She had a massive display case of dragons and cats. She has dozens of sets of china in the basement. An entire room with the bones of a bankrupt lumber mill. A massive library (that I inherited).

My father did it. I spent days going through the papers where he had kept every calendar, every scrap of a note, and every personal letter he sent to his children. I found the research he did when I asked for help getting through college, mainly to prove that I was going down the wrong path and really should just drive two hours one way to go to the college he thought was a better choice for me. I ended up saying “screw you” and went into debt for a couple of decades instead.

My father had terabytes of images that he drew. Decades of him struggling to be a “good” artist, full of self-doubt and pain. But I was so happy to watch him do it. He also did the artwork for the nuclear reactor project he was on, and for the particle accelerator. And birthday cards for all of his children and grandchildren.

For me, there is a good amount of digital hoarding going on. I mean, a couple thousand ebooks is one thing, but I also have PDFs for game systems that go back years. The original PDFs of HERO System 5 and 6. Every scan of Dragon magazine. GURPS. Legal documents, funny stories, and the like. And then there are my DVDs. Back when I didn't have children, I was buying 3-5 a week. When Suncoast Video went out of business, I had a huge refund from the IRS. I walked into the store and said “I'm gunna buy everything I can.” Now there are DVDs I can't buy anymore and can't even find. But they are slowly rotting in my file cabinet (did you know they only have a 20-30 year shelf life?). I don't want to lose them, even if I don't use them every day.

And then there is Partner's photography business. They do large photo shoots, but they also keep the photos around for years “just in case” someone loses their wedding photos. They don't “have to”, but I don't ever want to have to say “sorry, I don't have them” about someone's senior photos or their puppies. Or when family members are lost.

So I want to keep them.

Storage

All this takes space. The nice thing about digital space is that I can make backups. I can take them with me in a few kilograms of hard drives or upload them into the cloud. I can make copies so it's harder to lose anything (like when I was hit with ransomware that took out my media server). The goal is the 3-2-1 backup strategy: three copies of the data, on two different kinds of storage, with at least one copy off-site.
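The rule is most commonly stated as three copies, on at least two kinds of media, with at least one copy off-site. As a minimal sketch (the Copy record and the example locations are hypothetical, made up for illustration), checking a data set against it looks something like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a data set (hypothetical model for illustration)."""
    location: str  # e.g. "media-server", "backblaze"
    media: str     # e.g. "hdd", "cloud", "tape"
    offsite: bool


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 kinds of media, with >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical layout for a photo library.
photos = [
    Copy("media-server", "hdd", offsite=False),
    Copy("cluster-node-2", "hdd", offsite=False),
    Copy("backblaze", "cloud", offsite=True),
]

print(satisfies_3_2_1(photos))      # True
print(satisfies_3_2_1(photos[:2]))  # False: only two local HDD copies
```

The point of the media and off-site checks is that two copies on the same kind of disk in the same house can be taken out by the same event, such as ransomware.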

Now, dealing with storage is something I've done for quite a long time. I remember being pulled out of sixth grade for a day to help my mother recover her RAID 5 box. At the time, drives were in the low hundreds of megabytes, but I had to learn a lot about how RAID worked to figure it out. Then we watched the slow recovery take an entire day… only to have another drive fail about a day after we recovered and have to do it all again. Over the years, I used both hardware and software RAID controllers and watched both kinds fail.
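RAID 5 recovery works because each stripe stores a parity block that is the XOR of the data blocks in that stripe, so any single missing block can be recomputed from the others. Here is a simplified sketch of one stripe with its parity block (real RAID 5 rotates the parity block across drives, and the block contents here are made up):

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (one per drive) plus a parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# The drive holding data[1] fails: rebuild its block from the survivors and parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

It also shows why a second failure during a rebuild is so painful: with two blocks missing from a stripe, the single parity block can't recover both.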

At the time, the data wasn't static. We were processing millions of records to do analysis. The drives would get corrupted by poor power, heavy usage, and the eventual failure of mechanical drives. We had customers lose data and watched disks blow in our data centers. Like Partner's photos, our analysis had to be kept for years afterward because of re-evaluations or the occasional lawsuit.

Over time, it became apparent that there was a maximum size beyond which RAID stopped being useful. After a while, it wasn't a matter of “if” a drive would go bad but “when”. Backblaze (where I keep my backups; I recommend them) publishes a quarterly blog post about drive stats, and their numbers track with my own experience. Yeah, a 1.7% annualized failure rate doesn't seem like much, but when I'm dealing with stuff that needs to be kept for years, it hits a lot faster than you'd think it would. And offline backups fail too.
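To see why a small percentage adds up, here's a back-of-the-envelope sketch. It assumes each drive fails independently at a constant 1.7% per year, which is a simplification (real failure rates vary with drive age and model), and the drive counts are hypothetical:

```python
def chance_of_any_failure(afr: float, drives: int, years: float) -> float:
    """Probability that at least one drive fails, assuming independent failures
    at a constant annualized failure rate (a simplification)."""
    return 1 - (1 - afr) ** (drives * years)


for drives, years in [(1, 1), (8, 1), (8, 5), (8, 10)]:
    p = chance_of_any_failure(0.017, drives, years)
    print(f"{drives} drives for {years:>2} years: {p:.0%}")
```

With eight drives, that works out to roughly a 13% chance of losing at least one in a given year, about even odds over five years, and around 75% over ten.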

When I left my mother's company, I scaled back a little and just threw some drives into a big machine. I didn't bother with RAID because I knew it would fail over the decade of hard drive use I was planning for, so I started with manual mirroring between a couple of Windows partitions. But as the DVDs got ripped, music got purchased from Amazon, and photos kept being taken, I started to run out of space. Soon, Q:\, R:\, and S:\ weren't copies of each other anymore; each one held a portion of everything.

When I got hit with the ransomware attack, I lost my DVD collection (years of ripping) but not most of Partner's photos or the other things. I started to recover them, but one of the drives didn't make it.

Then I lost another drive a few months later.

Then we didn't have the money to replace the drives showing failure indicators, so I just watched the SMART notices pop up, knowing there was nothing to do but watch the slow-motion train wreck.

Then the media server decided to go down for a week.

Not having access to anything was a stark reminder that I had to do “something” if I didn't want to lose everything. And it wasn't just tossing in another drive every once in a while. I had hit the threshold where I needed to spread the data across more than just drives; I needed to spread it across servers.

Ceph

Enter Ceph. I had read about Ceph for many years before that point, and it always appealed to me. It seemed to solve a lot of the problems I'd experienced over the years. It didn't require same-sized disks the way RAID does. And I could add machines whenever I needed more storage.

The biggest thing with Ceph is that it balanced data across multiple servers to reduce the impact of a failure, but it also let you mark data (per pool) as needing one, two, or more copies. Partner's photo library? Two copies. Videos? One is probably good enough. The servers didn't have to be that powerful either, so I could keep using my older machines to hold disks after I upgraded to newer ones. In theory, I could even build a cluster out of Raspberry Pis.

What finally pushed me into it was the media server dying again. I had saved some money from my commissions (earmarked to get a book edited) and used it to buy fresh hard drives. If anything, replacing the 10+ year old drives with something new and bigger would give me some breathing room. I also took the opportunity to switch to NixOS instead of Windows, which I dislike anyway. I mean, NixOS had options for Ceph; how hard could it be?

Apparently, it really wasn't ready for prime time. Eventually the wiki was updated to say so.

It took me a week to figure it out. There was a lot missing from the core (and still is), but I managed to write up some notes for myself on how to make it work. After days of trial and error, of “almost got it” joys only to watch it crash, I finally got something working. And it was glorious. At least until I realized the machine was swapping like mad and that Ceph really needs about 1 GB of RAM per TB of storage. A few frantic purchases later, I had a couple more cheap Dell machines (one of which actually died this week) and a fairly balanced Ceph cluster.
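That rule of thumb is easy to check before buying hardware. This is a rough sketch assuming the 1 GB of RAM per TB figure; the disk sizes and installed RAM are hypothetical, and it ignores what the operating system itself needs. (Newer Ceph releases describe OSD memory as a per-OSD target, osd_memory_target, rather than per TB, so treat this as the older guideline.)

```python
def ram_needed_gb(disks_tb: list[float], gb_per_tb: float = 1.0) -> float:
    """RAM a storage node needs under a simple GB-per-TB rule of thumb."""
    return gb_per_tb * sum(disks_tb)


# Hypothetical node: three data disks and 8 GB of installed RAM.
node_disks_tb = [4, 4, 3]
installed_ram_gb = 8

need = ram_needed_gb(node_disks_tb)
status = "swapping likely" if need > installed_ram_gb else "ok"
print(f"need ~{need:.0f} GB, have {installed_ram_gb} GB: {status}")
# prints: need ~11 GB, have 8 GB: swapping likely
```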

SeaweedFS

I liked Ceph, but I'm not on it anymore. Partly it was for selfish reasons: I wanted to fix the struggles I had bringing a new drive into the cluster, but the community decided to go a different way. But there was also no one who was able to implement that different way while I was down. To test a fix myself, I had to build for twelve hours just to see if something worked (I also work with 10+ year old computers a lot). I was on unstable, so I went weeks unable to build because I couldn't downgrade to stable. I also learned that nixpkgs had some patterns that were really hard to get into, like redefining lua at the package level to mean a specific version of Lua. No one on Matrix or the Discourse forum was able to tell me that, until I happened to find one person who explained it and said I should have just “known” about that mapping.

And then, the same day I realized my efforts to get Ceph working were linked on the NixOS wiki page for Ceph, I also saw a little line:

Another distributed filesystem alternative you may evaluate is SeaweedFS.

I vaguely remembered looking at SeaweedFS before, but I was focused on getting Ceph working. And in that moment, when Ceph was not building and I didn't have the resources to fix the problem myself, I decided to try it out.

SeaweedFS does 80% of what Ceph does. It didn't have all the fancy features, but it had the features I wanted:

  • Distributed across multiple machines
  • Ability to add or remove storage on the fly
  • Variable replication copies
  • Actually built and could be run
  • Can be mounted on Linux
  • Stores data in 30 GB volumes on standard ext4 partitions instead of a custom format I cannot debug
  • Single Go executable (I don't code in Go, but it only takes twenty minutes to build, not twelve hours)
  • Could create S3/cloud tier backup (Ceph couldn't do that)
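SeaweedFS expresses variable replication as a three-digit placement string such as 001 or 010: the digits count extra copies in other data centers, on other racks in the same data center, and on other servers in the same rack, so the total number of copies is one plus the sum of the digits. Below is a small sketch of that arithmetic, not SeaweedFS's own code; the enough_servers check is a simplification that assumes every copy must land on a different server.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    other_dcs: int    # x: extra copies in other data centers
    other_racks: int  # y: extra copies on other racks, same data center
    same_rack: int    # z: extra copies on other servers, same rack

    @property
    def copies(self) -> int:
        return 1 + self.other_dcs + self.other_racks + self.same_rack


def parse_replication(code: str) -> Placement:
    """Parse an "xyz" replication string such as "001"."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected three digits, got {code!r}")
    return Placement(*(int(ch) for ch in code))


def enough_servers(code: str, servers: int) -> bool:
    # Simplification: each copy needs its own volume server.
    return servers >= parse_replication(code).copies


print(parse_replication("001").copies)   # 2: one extra copy in the same rack
print(enough_servers("002", servers=2))  # False: three copies need three servers
```

Under this scheme, “Partner's photos get two copies, videos get one” becomes 001 versus 000.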

Yeah, it didn't have NixOS options, but Google and GitHub gave me a starting point to puzzle it out on my own. I learned a new library and started with a little partition to see if it worked. Like with Ceph, I had a lot of fits and starts, trial and error, and puzzling through it, but I eventually got it working.

The information was more scattered, but it worked.

It didn't do Ceph's “deep scrubbing” to detect bit rot on failing drives.

It blew up when you tried to create too many erasure-coding shards without the space.

It blew up when you tried to replicate without having enough nodes.

Around that time, Ceph started building on NixOS again, but I was already enamored with SeaweedFS. It was “good enough” for me. It took me about a week to migrate hunks of the Ceph data to SeaweedFS, decommission a Ceph drive, and turn it into a SeaweedFS drive. Then I repeated that until everything was moved over.

A few weeks ago, I tried to consolidate my dad's hard drives into the cluster and ran out of space. I ended up buying a new minicomputer and throwing 11 TB worth of drives into it to give me room (two copies of everything, even the media files). This week, I lost one of those old Dells, so I had to shuffle the volumes around with a few commands, and it “just worked”. I'm going to replace the dead computer, and I'm confident that will “just work” too. More importantly, I don't have to spend a week figuring out the commands to make it happen, since I can just copy/paste a bunch of Nix code and redeploy my servers.
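Keeping two copies of everything halves usable space, and SeaweedFS carves that space into fixed-size volumes (30 GB in this setup). A back-of-the-envelope sketch of both numbers, using decimal units and ignoring filesystem overhead:

```python
def usable_tb(raw_tb: float, copies: int) -> float:
    """Unique data that fits when every file is stored `copies` times."""
    return raw_tb / copies


def volume_slots(raw_tb: float, volume_gb: float = 30.0) -> int:
    """How many fixed-size volumes the raw space can hold (decimal units)."""
    return int(raw_tb * 1000 // volume_gb)


print(usable_tb(11, copies=2))  # 5.5
print(volume_slots(11))         # 366
```

So 11 TB of new drives gives room for roughly 5.5 TB of actual files at two copies each.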

Thoughts

I could hoard less. It takes time and energy to keep the computers running, but it means Partner has their Golden Girls, the kids have their videos, and I have my dad's artwork. Also, there is something peaceful about looking at a 29.9 TB partition and seeing it work smoothly.

+---------------------------------------------------------------------------------------------------------------+
| 1 fuse device                                                                                                 |
+-------------+-------+-------+-------+-------------------------------+----------------+------------------------+
| MOUNTED ON  |  SIZE |  USED | AVAIL |              USE%             | TYPE           | FILESYSTEM             |
+-------------+-------+-------+-------+-------------------------------+----------------+------------------------+
| /mnt/home   | 29.9T | 20.9T |  9.0T | [#############.......]  69.8% | fuse.seaweedfs | fs.home:8888:/         |
+-------------+-------+-------+-------+-------------------------------+----------------+------------------------+

Eventually this is all going to go away. When I die, my family isn't going to be able to keep it going. Like Rutejìmo's stories and my dad's artwork, all this will fade. But I'm going to keep it going as long as I can. And try to find more pages for my book.
