﻿<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text" xml:lang="en">SeaweedFS</title>
  <link type="application/atom+xml" href="https://d.moonfire.us/tags/seaweedfs/atom.xml" rel="self" />
  <link type="text/html" href="https://d.moonfire.us/tags/seaweedfs/" rel="alternate" />
  <updated>2026-03-09T17:42:47Z</updated>
  <id>https://d.moonfire.us/tags/seaweedfs/</id>
  <author>
    <name>D. Moonfire</name>
  </author>
  <rights>Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International</rights>
  <entry>
    <title>Why SeaweedFS?</title>
    <link rel="alternate" href="https://d.moonfire.us/blog/2024/11/08/why-seaweedfs/" />
    <updated>2024-11-08T06:00:00Z</updated>
    <id>https://d.moonfire.us/blog/2024/11/08/why-seaweedfs/</id>
    <category term="development" scheme="https://d.moonfire.us/categories/" label="Development" />
    <category term="ceph" scheme="https://d.moonfire.us/tags/" label="Ceph" />
    <category term="seaweedfs" scheme="https://d.moonfire.us/tags/" label="SeaweedFS" />
    <category term="nixos" scheme="https://d.moonfire.us/tags/" label="NixOS" />
    <category term="sand-and-bone" scheme="https://d.moonfire.us/tags/" label="Sand and Bone" />
    <summary type="html">Thoughts on why I take the effort to use a distributed network drive for my home.
</summary>
    <content type="html">&lt;p&gt;I got an email last week asking if I could explain why I set up a Seaweed server. It's been a while since I talked about it, mostly in hints in &lt;a href="/blog/2022/12/10/ceph-and-nixos/"&gt;this post from 2022&lt;/a&gt; and &lt;a href="/blog/2024/03/21/switching-ceph-to-seaweedfs/"&gt;this post from 2024&lt;/a&gt;. The request gives me an opportunity to expand on it, and to remind myself why I do it.&lt;/p&gt;
&lt;h2&gt;Stories&lt;/h2&gt;
&lt;p&gt;In some regards, there is a bit of trauma in me when it comes to losing out on things. This includes hearing stories that my father told me and realizing that when he died, they were gone forever. These same thoughts came up in my novel &lt;a href="/tags/sand-and-bone/"&gt;Sand and Bone&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The little girl's confessions echoed in Rutejìmo's head in a quiet symphony as he wrote in his book. The hand-bound collections of pages creaked under his hand, the leather thong strained to hold the almost fifty pages of tightly-spaced writing. Over the years, he had added a dozen pages to the collection. It wouldn't be too long before the binding couldn't handle the additional pages, but he thought he had a few more years left before that happened.&lt;/p&gt;
&lt;p&gt;Even with his additional pages, he didn't have room to write down all of the stories he had heard over the years. He wanted to detail the joys of the little girl's death, such as the choked story about how she had stolen her brother's toy when he wasn't looking. He had also wanted to write the horrors, like one man's confession for killing his sister. Each one was precious and important. Time would erase their stories and a part of Rutejìmo died every time he forgot one.&lt;/p&gt;
&lt;p&gt;&amp;mdash; &lt;a href="//fedran.com/sand-and-bone/chapter-008/"&gt;Sand and Bone 8: Alone&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This fear goes beyond one person losing the stories of things that happened decades ago, as told by an old man. It includes my own stories, which will also be lost when I die. There is little chance I will be &amp;ldquo;done&amp;rdquo; with &lt;a href="/tags/fedran/"&gt;Fedran&lt;/a&gt;. There is little chance I will get all of the stories out of my head and onto paper. There isn't enough time, I don't have the self-esteem to do it, and I don't have the drive. When I'm gone, so are those stories. But even beyond that, there are so many stories of horror and joy that we are losing with the passage of time. Happy stories of people falling in love, horror stories of &lt;a href="https://www.auschwitz.org/en/"&gt;Auschwitz&lt;/a&gt;, and everything in between. It is that breadth of stories, lessons, and happiness that we lose.&lt;/p&gt;
&lt;p&gt;But the big question is, will anyone care? Does anyone want to know about the first time I got caught shoplifting? Or the first time I finally said &amp;ldquo;I love you&amp;rdquo; to Partner?&lt;/p&gt;
&lt;p&gt;Probably not.&lt;/p&gt;
&lt;p&gt;But, you never know.&lt;/p&gt;
&lt;p&gt;I'm never going to record them all, but when I was given my father's artwork and decades of effort, I didn't want to lose them yet. I didn't want to just chuck them aside. So, I need to keep them.&lt;/p&gt;
&lt;h2&gt;Hoarding&lt;/h2&gt;
&lt;p&gt;I come from a long line of hoarders. I realize that when I look at my DVD collection or the boxes of books I have. My LEGO collection isn't huge, but it does fill five or six boxes. I once got rid of it and felt guilty for years, which is why I built it up again. Sometimes, I just play.&lt;/p&gt;
&lt;p&gt;My mother did it. She had a massive display case of dragons and cats, dozens of sets of china in the basement, an entire room with the bones of a bankrupt lumber mill, and a massive library (that I inherited).&lt;/p&gt;
&lt;p&gt;My father did it. I spent days going through pieces of paper where he kept every calendar, shred of a note, and personal letter he sent to his children. I saw the research he did when I asked for help getting through college, mainly to prove that I was going down the wrong path and really should just drive two hours one way to go to the college he thought was a better choice for me. I ended up saying &amp;ldquo;screw you&amp;rdquo; and went into debt for a couple of decades instead.&lt;/p&gt;
&lt;p&gt;My father has terabytes of images that he drew. Decades of him struggling to be a &amp;ldquo;good&amp;rdquo; artist, full of self-doubt and pain. But I'm so happy to have watched him do it. He also did the artwork for the nuclear reactor project he was on, and for the particle accelerator. And birthday cards for all of his children and grandchildren.&lt;/p&gt;
&lt;p&gt;For me, there is a good amount of digital hoarding going on. I mean, a couple thousand ebooks is one thing, but I have PDFs for game systems that go back years. The original PDFs of HERO System 5 and 6. Every scan of the &lt;em&gt;Dragon&lt;/em&gt; magazine. GURPS. Legal documents, funny stories, and the like. And then there are my DVDs. Back when I didn't have children, I was buying 3-5 a week. When Suncoast Video went out of business, I had a huge refund from the IRS. Walked into the store and said &amp;ldquo;I'm gunna buy everything I can.&amp;rdquo; Now there are DVDs I can't buy anymore and can't find, and they are slowly rotting in my file cabinet (did you know they only have a 20-30 year shelf life?). I don't want to lose them, even if I don't use them every day.&lt;/p&gt;
&lt;p&gt;And then there is Partner's photography business. They do large photo shoots, but they also need to keep them around for years &amp;ldquo;just in case&amp;rdquo; someone loses their wedding photos. They don't &amp;ldquo;have to&amp;rdquo;, but I don't want to ever have to say &amp;ldquo;sorry, I don't have them&amp;rdquo; for senior photos or someone's puppies, or when family members are lost.&lt;/p&gt;
&lt;p&gt;So I want to keep them.&lt;/p&gt;
&lt;h2&gt;Storage&lt;/h2&gt;
&lt;p&gt;All this takes space. The nice thing about digital space is that I can make backups. I can take them with me with a few kilograms of hard drives or upload them into the cloud. I can make copies to make it harder to lose everything (like when I was hit with ransomware that took out my media server). The goal is the &lt;a href="https://www.backblaze.com/blog/the-3-2-1-backup-strategy/"&gt;3-2-1 Backup Strategy&lt;/a&gt;: three copies, on two different media, with at least one off-site.&lt;/p&gt;
&lt;p&gt;Now, dealing with storage is something I've done for quite a long time. I remember being pulled out of sixth grade for a day to help my mother recover her RAID 5 box. At the time, drives were in the low hundreds of megabytes, but I had to learn a lot about how RAID worked to figure it out. Then watching the slow recovery that took an entire day&amp;hellip; then having another drive fail about a day after we recovered from the first and doing it all again. Over the years, I used hardware and software RAID controllers and watched them fail.&lt;/p&gt;
&lt;p&gt;At the time, the data wasn't static. We were processing millions of records to do analysis. The drives would get corrupted by poor power, heavy usage, and the eventual failure of mechanical drives. We had customers lose data and watched our data centers blow disks. Like Partner's photos, we had to keep our analysis for years afterward because of re-evaluations or the occasional lawsuit.&lt;/p&gt;
&lt;p&gt;Over time, it became apparent that there was a maximum size where RAID was useful. After a while, it wasn't a matter of &amp;ldquo;if&amp;rdquo; a drive goes bad but &amp;ldquo;when&amp;rdquo;. Backblaze (where I keep my backups, I recommend them) has a &lt;a href="https://www.backblaze.com/blog/backblaze-drive-stats-for-q2-2024/"&gt;quarterly blog post&lt;/a&gt; about drive stats, and their numbers track with my own experiences. Yeah, a 1.7% annualized failure rate doesn't seem like much, but when I'm dealing with stuff that needs to be kept for years, it hits a lot faster than you'd think. And offline backups fail too.&lt;/p&gt;
&lt;p&gt;When I left my mother's company, I scaled back a little and just threw some drives into a big machine. I didn't bother with RAID because I knew it would fail; I was looking at a decade of hard drive use, so I started with manual mirroring between a couple of Windows partitions. But as the DVDs got ripped, music got purchased from Amazon, and photos kept being taken, I started to run out. Soon, it wasn't just &lt;code&gt;Q:\&lt;/code&gt;, &lt;code&gt;R:\&lt;/code&gt;, and &lt;code&gt;S:\&lt;/code&gt; being copies but each one holding a portion of everything.&lt;/p&gt;
&lt;p&gt;When I got hit with the ransomware attack, I lost my DVD collection (years of ripping), though most of Partner's photos and the other things survived. I started to recover the rips, but one of the drives didn't make it.&lt;/p&gt;
&lt;p&gt;Then I lost another drive a few months later.&lt;/p&gt;
&lt;p&gt;Then we didn't have money to handle the failure indicators, so I was just watching the SMART notices popping up knowing that there was nothing to do but watch the slow-moving train accident.&lt;/p&gt;
&lt;p&gt;Then the media server decided to go down for a week.&lt;/p&gt;
&lt;p&gt;Not having access to anything was a stark reminder that I had to do &amp;ldquo;something&amp;rdquo; if I didn't want to lose everything. And it wasn't just tossing in another drive every once in a while; I had hit the threshold where I needed to spread the data across more than just drives, I needed to spread it across servers.&lt;/p&gt;
&lt;h3&gt;Ceph&lt;/h3&gt;
&lt;p&gt;Enter &lt;a href="/tags/ceph/"&gt;Ceph&lt;/a&gt;. I've read about Ceph for many years before that point and it always appealed to me. It seemed to solve a lot of the problems I've experienced over the years. And it didn't have to require same-sized disks to pull of a RAID. And I could add machines if I needed to add more storage.&lt;/p&gt;
&lt;p&gt;The biggest thing with Ceph is that it balanced data across multiple servers to reduce the impact of failure, but it also let you mark files as needing one, two, or more copies. Partner's photo library? Two copies. Videos? One is probably good enough. The servers didn't have to be that powerful either, so I could keep my older machines holding disks when I had to upgrade to a newer one. In theory, I could even use a Raspberry Pi as a cluster node.&lt;/p&gt;
&lt;p&gt;The tipping point was when the media server died again. I had saved some money from my commissions (earmarked to get a book edited) and used it to buy fresh hard drives. If anything, replacing the 10+ year old drives with something new and bigger would give me some breathing room. I also took the opportunity to switch to &lt;a href="/tags/nixos/"&gt;NixOS&lt;/a&gt; instead of Windows, which I dislike anyway. I mean, NixOS had options for Ceph, how hard could it be?&lt;/p&gt;
&lt;p&gt;Apparently, it really wasn't ready for prime time. Eventually, the wiki even popped up a note saying as much.&lt;/p&gt;
&lt;p&gt;It took me a week to figure it out. There was a lot missing (and still is) from the core, but I managed to write up some notes for myself on how to make it work. After days of trial and error, of &amp;ldquo;almost got it&amp;rdquo; joys only to watch it crash, I finally got something working. And it was glorious. At least until I realized the machine was swapping like mad because Ceph really wants about 1 GB of RAM per TB of storage. A few frantic purchases later, I had a couple more cheap Dell machines (one of which is the one that died this week) and ended up with a fairly balanced Ceph cluster.&lt;/p&gt;
&lt;h3&gt;SeaweedFS&lt;/h3&gt;
&lt;p&gt;I liked Ceph, but I'm not on it anymore. Partially it was for selfish reasons: I wanted to fix the struggles I had bringing a new drive into the cluster, but the community decided to go a different way, and no one was able to do it that different way while I was down. To test a fix myself, I had to build for twelve hours just to see if something worked (I also work with 10+ year old computers a lot). I was on unstable, so I went weeks being unable to build because I couldn't downgrade to stable. I also learned that &lt;code&gt;nixpkgs&lt;/code&gt; had some patterns that were really hard to get into, like redefining &lt;code&gt;lua&lt;/code&gt; at the package level to mean a specific version of &lt;code&gt;lua&lt;/code&gt;, and no one on Matrix or the Discourse forum was able to tell me that until I happened to find one person who explained it and said I should have just &amp;ldquo;known&amp;rdquo; about that mapping.&lt;/p&gt;
&lt;p&gt;And then, the same day I realized my efforts to get Ceph working were linked on the &lt;a href="https://nixos.wiki/wiki/Ceph"&gt;NixOS wiki for Ceph&lt;/a&gt;, I also saw a little line:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Another distributed filesystem alternative you may evaluate is SeaweedFS.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I vaguely remembered looking at SeaweedFS before, but I was focused on getting Ceph working. And in that moment, when Ceph was not building and I didn't have the resources to fix the problem myself, I decided to try it out.&lt;/p&gt;
&lt;p&gt;SeaweedFS does 80% of Ceph. It didn't have all the fancy features, but it had the features I wanted:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distributed across multiple machines&lt;/li&gt;
&lt;li&gt;Ability to add or remove storage on the fly&lt;/li&gt;
&lt;li&gt;Variable replication copies&lt;/li&gt;
&lt;li&gt;Currently builds and can be run&lt;/li&gt;
&lt;li&gt;Can be mounted on Linux&lt;/li&gt;
&lt;li&gt;It uses 30 GB volumes on standard &lt;code&gt;ext4&lt;/code&gt; partitions instead of a custom filesystem I cannot debug&lt;/li&gt;
&lt;li&gt;Single Go executable (I don't code Go, but it only takes twenty minutes to build, not twelve hours)&lt;/li&gt;
&lt;li&gt;Could create S3/cloud tier backup (Ceph couldn't do that)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Yeah, it didn't have NixOS options, but Google and GitHub gave me a starting point to puzzle it out on my own. I learned a new library and started with a little partition to see if it worked. Like Ceph, I had a lot of fits and starts, trials and puzzling through it, but eventually got it working.&lt;/p&gt;
&lt;p&gt;The information was more scattered, but it worked.&lt;/p&gt;
&lt;p&gt;It didn't do Ceph's &amp;ldquo;deep cleaning&amp;rdquo; to detect bit rot on failing drives.&lt;/p&gt;
&lt;p&gt;It blew up when you tried to create too many erasure coding shards without the space for them.&lt;/p&gt;
&lt;p&gt;It blew up when you tried to replicate without having enough nodes.&lt;/p&gt;
&lt;p&gt;Around that time, Ceph started building on NixOS again, but I was already enamored with SeaweedFS. It was &amp;ldquo;good enough&amp;rdquo; for me. It took me about a week to migrate hunks of the Ceph data over to SeaweedFS, decommission a Ceph drive, and make it a SeaweedFS drive. Then repeat until everything was moved over.&lt;/p&gt;
&lt;p&gt;A few weeks ago, I tried to consolidate my dad's hard drives into the cluster and ran out of space. I ended up buying a new minicomputer and throwing 11 TB worth of drives into it to give me room (two copies of everything, even the media files). This week, I lost one of those old Dells, so I had to shuffle the volumes around with a few commands and it &amp;ldquo;just worked&amp;rdquo;. I'm going to replace the dead computer and I'm confident that it will &amp;ldquo;just work&amp;rdquo; then too. And, more importantly, I don't have to spend a week figuring out the commands to make it happen since I can just copy/paste a bunch of Nix code and redeploy my servers.&lt;/p&gt;
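&lt;p&gt;For the curious, the shuffling is just the handful of maintenance commands from my recovery post, run from the weed shell. A minimal sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ weed-shell
&amp;gt; lock
&amp;gt; volume.fix.replication   # re-create the copies that lived on the dead node
&amp;gt; volume.balance -force    # spread the volumes across the remaining servers
&amp;gt; unlock
&lt;/code&gt;&lt;/pre&gt;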
&lt;h2&gt;Thoughts&lt;/h2&gt;
&lt;p&gt;I could hoard less. It takes time and energy to keep the computers running but it makes sure Partner has their &lt;em&gt;Golden Girls&lt;/em&gt;, the kids have their videos, and I have my dad's artwork. Also, there is something peaceful about looking at a 29.9 TB partition and having it working smoothly.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+---------------------------------------------------------------------------------------------------------------+
| 1 fuse device                                                                                                 |
+-------------+-------+-------+-------+-------------------------------+----------------+------------------------+
| MOUNTED ON  |  SIZE |  USED | AVAIL |              USE%             | TYPE           | FILESYSTEM             |
+-------------+-------+-------+-------+-------------------------------+----------------+------------------------+
| /mnt/home   | 29.9T | 20.9T |  9.0T | [#############.......]  69.8% | fuse.seaweedfs | fs.home:8888:/         |
+-------------+-------+-------+-------+-------------------------------+----------------+------------------------+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Eventually this is all going to go away. When I die, my family isn't going to be able to keep it going. Like Rutejìmo's stories and my dad's artwork, all this will fade. But I'm going to keep it going as long as I can. And try to find more pages for my book.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Recovering SeaweedFS</title>
    <link rel="alternate" href="https://d.moonfire.us/blog/2024/10/25/recovering-seaweedfs/" />
    <updated>2024-10-25T05:00:00Z</updated>
    <id>https://d.moonfire.us/blog/2024/10/25/recovering-seaweedfs/</id>
    <category term="development" scheme="https://d.moonfire.us/categories/" label="Development" />
    <category term="ceph" scheme="https://d.moonfire.us/tags/" label="Ceph" />
    <category term="seaweedfs" scheme="https://d.moonfire.us/tags/" label="SeaweedFS" />
    <category term="plex" scheme="https://d.moonfire.us/tags/" label="Plex" />
    <category term="tailscale" scheme="https://d.moonfire.us/tags/" label="Tailscale" />
    <category term="nixos" scheme="https://d.moonfire.us/tags/" label="NixOS" />
    <category term="backblaze" scheme="https://d.moonfire.us/tags/" label="Backblaze" />
    <summary type="html">I accidentally overfilled my SeaweedFS, here is how I recovered my cluster.
</summary>
    <content type="html">&lt;p&gt;Lately, I've been quite fond of &lt;a href="/tags/seaweedfs/"&gt;SeaweedFS&lt;/a&gt;. It isn't as powerful as &lt;a href="/tags/ceph/"&gt;Ceph&lt;/a&gt; but it considerably easier to maintain and manage. There are some tradeoffs, such as finding bit rotting (when the disks start to fail), but I find it not quite as &amp;ldquo;fragile&amp;rdquo; when it comes to using a random collection of Linux machines.&lt;/p&gt;
&lt;p&gt;One of the features I want to play with in SeaweedFS is the ability to upload a directory transparently to an S3 bucket (not AWS though, they are too big). I'm thinking about that for later, when I want to make an extra, off-site backup of critical files, including Partner's photo shoots.&lt;/p&gt;
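&lt;p&gt;From my skim of the docs, that syncing is driven from the weed shell. A sketch of what I believe it looks like, with a made-up remote name, bucket, and keys (double-check the flags before trusting me):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ weed-shell
&amp;gt; remote.configure -name=cloud1 -type=s3 -s3.access_key=KEY -s3.secret_key=SECRET
&amp;gt; remote.mount -dir=/buckets/photo-shoots -remote=cloud1/my-backup-bucket
&lt;/code&gt;&lt;/pre&gt;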
&lt;h2&gt;Overfilling&lt;/h2&gt;
&lt;p&gt;Last week, I worked on one of the tasks I've been stalling on: archiving my dad's artwork. He had a lot of copies of nearly identical files and I didn't have the working storage on my laptop. I figured that since I had this huge (22 TB, though mostly full) cluster, I could use that.&lt;/p&gt;
&lt;p&gt;Yeah&amp;hellip; not the best of ideas.&lt;/p&gt;
&lt;p&gt;I didn't realize I had made a mistake until everything started to fail because all of the nodes were 98% or more full and the system couldn't replicate even the replication logs. And I didn't notice that until Partner said &lt;a href="/tags/plex/"&gt;Plex&lt;/a&gt; was down.&lt;/p&gt;
&lt;p&gt;Well, with replication down, I couldn't even use the weed shell to remove a file. When I tried, it just hung for hours.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ weed-shell
&amp;gt; rm -rf in/dad-pictures
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Nix Shell Scripts&lt;/h2&gt;
&lt;p&gt;Above, I use &lt;code&gt;weed-shell&lt;/code&gt;. This is a custom script I generate with &lt;a href="/tags/nixos/"&gt;NixOS&lt;/a&gt; that is installed on any server that can talk to my SeaweedFS.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-nix"&gt;inputs:
let
  shellScript = (
    pkgs.writeShellScriptBin &amp;quot;weed-shell&amp;quot; ''
      weed shell -filer fs.local:8888 -master fs.local:9333 &amp;quot;$@&amp;quot;
    ''
  );
in
{
  environment.systemPackages = [ shellScript ];
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This lets me handle common functions I use when maintaining things. In this case, I don't have to enter the common parameters needed to talk to my SeaweedFS cluster.&lt;/p&gt;
&lt;h2&gt;Cleaning Up&lt;/h2&gt;
&lt;p&gt;I tried a bunch of things, such as forcing a more aggressive vacuum (cleaning up deleted files):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;&amp;gt; volume.vacuum --help
Usage of volume.vacuum:
  -collection string
    	vacuum this collection
  -garbageThreshold float
    	vacuum when garbage is more than this limit (default 0.3)
  -volumeId uint
    	the volume id
&amp;gt; volume.vacuum -garbageThreshold 0.1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This didn't help as much as I hoped, but it did allow some replication and some commands to go through. I needed to clear up a lot more space so I could remove files properly, do a wholesale &lt;code&gt;rm -rf&lt;/code&gt; to blow away my father's files, and try again later once I had some more space.&lt;/p&gt;
&lt;h2&gt;Replication&lt;/h2&gt;
&lt;p&gt;I have my volumes set to &lt;code&gt;010&lt;/code&gt; replication. The three digits are the number of extra copies at the data center, rack, and host levels.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Center: Make X copies to replicate across multiple data centers. I don't have multiple data centers, so this is always &lt;code&gt;0&lt;/code&gt; for me.&lt;/li&gt;
&lt;li&gt;Rack: This is to replicate across multiple racks. My setup treats each computer as a &amp;ldquo;rack&amp;rdquo;, so my &lt;code&gt;1&lt;/code&gt; means make an extra copy on a different machine. I use the rack level because I also have a DeskPi Super6C, which is six Raspberry Pi CM4s (compute modules) in a single case; I treat all six as one &amp;ldquo;rack&amp;rdquo; with separate hosts.&lt;/li&gt;
&lt;li&gt;Host: The same machine. I don't have much use for two copies on the same machine, so I always set this to &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If I ever had a friend willing to host a local server for me, I would consider setting up a second &amp;ldquo;data center&amp;rdquo; to have an off-site backup. That would probably require &lt;a href="/tags/tailscale/"&gt;Tailscale&lt;/a&gt;, but that's beyond my current scope.&lt;/p&gt;
&lt;h2&gt;Volumes&lt;/h2&gt;
&lt;p&gt;SeaweedFS basically creates multiple 30 GB blobs, each acting as a container with many files inside it. That way, the problems that come with thousands of small files aren't an issue, since everything is done on these 30 GB files called &amp;ldquo;volumes&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Replication is done at the volume level, which means I was able to turn off replication for a series of volumes.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;&amp;gt; lock
&amp;gt; volume.configure.replication -replication 000 -volumeId 1
&amp;gt; volume.configure.replication -replication 000 -volumeId 2
&amp;gt; volume.fix.replication
&amp;gt; volume.balance -force
&amp;gt; unlock
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;lock&lt;/code&gt; and &lt;code&gt;unlock&lt;/code&gt; are important when making changes like this; they prevent some critical operations from corrupting the cluster. The commands will tell you when a lock is needed.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;volume.configure.replication&lt;/code&gt; basically changed those volumes to no replication (risky). Once that is done, &lt;code&gt;volume.fix.replication&lt;/code&gt; and &lt;code&gt;volume.balance -force&lt;/code&gt; delete the excess copies and shuffle things around, giving me some breathing room to get replication running again so I can mass delete files.&lt;/p&gt;
&lt;p&gt;When I'm done, I just go and change all the volumes back to &lt;code&gt;-replication 010&lt;/code&gt; to get my second copy back.&lt;/p&gt;
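&lt;p&gt;That restore is just the mirror of the commands above; a sketch using the same hypothetical volume ids:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;&amp;gt; lock
&amp;gt; volume.configure.replication -replication 010 -volumeId 1
&amp;gt; volume.configure.replication -replication 010 -volumeId 2
&amp;gt; volume.fix.replication   # creates the missing second copies
&amp;gt; volume.balance -force
&amp;gt; unlock
&lt;/code&gt;&lt;/pre&gt;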
&lt;h2&gt;Data Hoarding&lt;/h2&gt;
&lt;p&gt;The ultimate problem is data hoarding. My father and I both have multiple copies of files running around. It isn't great, but when you don't have time to clean out a dying laptop, it is sometimes easier to &lt;code&gt;rsync&lt;/code&gt; the entire thing into a directory on the new machine and then move on.&lt;/p&gt;
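&lt;p&gt;That kind of wholesale dump is nothing fancy; a sketch with made-up paths:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;# -a preserves permissions and timestamps; -P shows progress and lets a
# flaky machine resume partial transfers.
rsync -aP old-laptop:/home/user/ /mnt/seaweed/archives/old-laptop/
&lt;/code&gt;&lt;/pre&gt;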
&lt;p&gt;In this case, I needed to do some trimming of the duplicates from his files. The script is based on the one from a &lt;a href="https://stackoverflow.com/a/19552048"&gt;StackOverflow answer&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;find . -not -empty -type f -printf &amp;quot;%s\n&amp;quot; \
    | sort -rn \
    | uniq -d \
    | xargs -I{} -n1 find . -type f -size {}c -print0 \
    | xargs -0 sha256sum \
    | sort \
    | uniq -w32 --all-repeated=separate
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output is pretty simple because it only lists duplicates and the paths to find them.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ echo one &amp;gt; a.txt
$ echo one &amp;gt; b.txt
$ echo two &amp;gt; c.txt
$ echo two &amp;gt; d.txt
$ echo three &amp;gt; e.txt
$ find . -not -empty -type f -printf &amp;quot;%s\n&amp;quot; \
    | sort -rn \
    | uniq -d \
    | xargs -I{} -n1 find . -type f -size {}c -print0 \
    | xargs -0 sha256sum \
    | sort \
    | uniq -w32 --all-repeated=separate
27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a  ./c.txt
27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a  ./d.txt

2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806  ./a.txt
2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806  ./b.txt
$
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this, I can find the duplicates in my own system and delete them to clear out a few terabytes worth of data. It takes time, but I hadn't done it before, so I pointed it at &lt;code&gt;/mnt/seaweed&lt;/code&gt; and let it run.&lt;/p&gt;
&lt;p&gt;Once that is done, I can turn replication back on, fix replication, rebalance, and I should be good to go.&lt;/p&gt;
&lt;h2&gt;Forward Steps&lt;/h2&gt;
&lt;p&gt;I had known I was running out of storage for a while, so I blew my monthly budget and ordered the fourth server. This one has three 4 TB NVMe sticks (about 11 TB that will be added to the cluster) and should give me enough room to get my dad's files collected and deduplicated, and then to look into uploading them to a cheap S3 storage (&lt;a href="/tags/backblaze/"&gt;Backblaze&lt;/a&gt;) for later.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Switching Ceph to SeaweedFS on NixOS</title>
    <link rel="alternate" href="https://d.moonfire.us/blog/2024/03/21/switching-ceph-to-seaweedfs/" />
    <updated>2024-03-21T05:00:00Z</updated>
    <id>https://d.moonfire.us/blog/2024/03/21/switching-ceph-to-seaweedfs/</id>
    <category term="development" scheme="https://d.moonfire.us/categories/" label="Development" />
    <category term="ceph" scheme="https://d.moonfire.us/tags/" label="Ceph" />
    <category term="seaweedfs" scheme="https://d.moonfire.us/tags/" label="SeaweedFS" />
    <category term="raspberry-pi" scheme="https://d.moonfire.us/tags/" label="Raspberry Pi" />
    <category term="colmena" scheme="https://d.moonfire.us/tags/" label="Colmena" />
    <category term="nixos" scheme="https://d.moonfire.us/tags/" label="NixOS" />
    <category term="restic" scheme="https://d.moonfire.us/tags/" label="Restic" />
    <summary type="html">Over the new year, I decided to get SeaweedFS working on my home lab and eventually took down my Ceph cluster to move everything over.
</summary>
    <content type="html">&lt;p&gt;At the end of 2023, I realized that I was running out of space on my home &lt;a href="/tags/ceph/"&gt;Ceph&lt;/a&gt; cluster and it was time to add another node to it. While I had space for one more 3.5&amp;quot; drive in one of my servers, I was feeling a little adventurous and decided to get a &lt;a href="https://deskpi.com/collections/deskpi-super6c"&gt;DeskPi Super6C&lt;/a&gt;, a &lt;a href="/tags/raspberry-pi/"&gt;Raspberry Pi&lt;/a&gt; CM4, a large NVMe drive, and try to create a new node that way.&lt;/p&gt;
&lt;p&gt;Well, over the following few months, a lot of mistakes were made that are worthy of a dedicated post. But when most of those problems were resolved, I encountered another series of &amp;ldquo;adventures&amp;rdquo; which led me to switch out my home's Ceph cluster for a &lt;a href="/tags/seaweedfs/"&gt;SeaweedFS&lt;/a&gt; one.&lt;/p&gt;
&lt;h2&gt;Unable to Build Ceph&lt;/h2&gt;
&lt;p&gt;Around the time I was working on the Pi setup, my &lt;a href="/tags/nixos/"&gt;NixOS&lt;/a&gt; flake was unable to build the &lt;code&gt;ceph&lt;/code&gt; packages. Part of this is because I was working off unstable, so a few weeks of being unable to build meant I couldn't get Ceph working on the new hardware. I even tried compiling it myself, which takes about six hours on my laptop and longer on the Pi, since I had to build remotely on the Pi itself because I have yet to figure out how to get &lt;a href="/tags/colmena/"&gt;Colmena&lt;/a&gt; to build &lt;code&gt;aarch64&lt;/code&gt; on my laptop.&lt;/p&gt;
&lt;p&gt;Also, I was dreading setting up Ceph since I remember how many manual steps I had to do to get the OSDs working on my machines. While researching it, I was surprised to see my &lt;a href="/blog/2022/12/10/ceph-and-nixos/"&gt;blog post on it&lt;/a&gt; was on the &lt;a href="https://nixos.wiki/wiki/Ceph"&gt;wiki page&lt;/a&gt;, which is kind of cool and a nice egoboo.&lt;/p&gt;
&lt;p&gt;There was &lt;a href="https://github.com/NixOS/nixpkgs/pull/281924"&gt;a PR&lt;/a&gt; on Github for using the Ceph-provided OSD setup that would have hopefully alleviated it. That looked promising, so I was watching that PR with interest because I was right at the point of needing it.&lt;/p&gt;
&lt;p&gt;Sadly, that PR ended up being abandoned for a &amp;ldquo;better&amp;rdquo; approach. Given that it takes me six hours to build Ceph, I couldn't really help with that approach, which meant I was stuck waiting unless I was willing to dedicate a month or so to figuring it all out. Given that the last time I tried to do that, my PR was abandoned for a different reason, I was preparing to keep my Ceph pinned until the next release and just let my Raspberry Pi setup sit there idle.&lt;/p&gt;
&lt;p&gt;I was also being impatient and there was something new to try out.&lt;/p&gt;
&lt;h2&gt;SeaweedFS&lt;/h2&gt;
&lt;p&gt;Then I noticed a little thing on top of the NixOS wiki for Ceph:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Another distributed filesystem alternative you may evaluate is SeaweedFS.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I vaguely remember looking at it when I first set up my 22 TB Ceph cluster, but I'd been dreaming about having a Ceph cluster for so long that I dismissed it; I really wanted to do Ceph.&lt;/p&gt;
&lt;p&gt;Now the need to stick with Ceph was weaker, so I thought I would give it a try. If anything, I still had a running Ceph cluster and I could run them side-by-side.&lt;/p&gt;
&lt;p&gt;A big difference I noticed is that SeaweedFS has a single executable that provides everything. You can run it as an all-in-one process, but the three big services can also be run independently: the master (coordinates everything), the volume servers (where things are stored), and the filer (makes it look like a filesystem).&lt;/p&gt;
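&lt;p&gt;As a rough sketch of what that separation looks like on the command line (these normally run as services; the paths and hostnames are placeholders matching the configuration below):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;# The same binary runs each role.
weed master -mdir=/var/lib/seaweedfs/master -port=9333
weed volume -dir=/mnt/fs-001 -max=0 -mserver=fs.home:9333 -port=9334
weed filer -master=fs.home:9333 -port=8888
&lt;/code&gt;&lt;/pre&gt;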
&lt;p&gt;Also, Ceph likes to work at the block level whereas SeaweedFS wants to be pointed to plain directories. So the plan was to take the 1 TB drive for my Raspberry Pi and turn it into a little cluster to try it out.&lt;/p&gt;
&lt;h2&gt;SeaweedFS and NixOS&lt;/h2&gt;
&lt;p&gt;The first thing was that SeaweedFS doesn't have any NixOS options. I couldn't find any flakes for it either. My attempt to create one took me three days with little success. Instead, I ended up cheating: I just grabbed the &lt;a href="https://hg.sr.ht/%7Edermetfan/seaweedfs-nixos/browse/seaweedfs.nix?rev=tip"&gt;best-looking one I could find&lt;/a&gt; and dumped it directly into my flake. It isn't even an override.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Yeah I would love to have a flake for this but I'm not skilled enough to create it myself.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Masters&lt;/h2&gt;
&lt;p&gt;With that, a little fumbling got a master† server up and running. You only need one of these, so pick a stable server and set it up.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-nix"&gt;# src/nodes/node-0.nix
inputs @ { config
, pkgs
, flakes
, ...
}: {
  imports = [
    ../../services/seaweedfs.nix # the file from dermetfan
  ];

  services.seaweedfs.clusters.default = {
    package = pkgs.seaweedfs;

    masters.main = {
      openFirewall = true;
      ip = &amp;quot;fs.home&amp;quot;; # This is what shows up in the links
      mdir = &amp;quot;/var/lib/seaweedfs/master/main&amp;quot;;
      volumePreallocate = true;

      defaultReplication = {
        dataCenter = 0;
        rack = 0;
        server = 0;
      };
    };
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a really basic setup that doesn't really do anything. The master server is pretty much a coordinator. But what is nice is that it starts up a web server at &lt;code&gt;fs.home:9333&lt;/code&gt; that lets you see that it is up and running (sadly, no dark mode). This site will also let you get to all the other servers through web links.&lt;/p&gt;
&lt;p&gt;Another important part is the &lt;code&gt;defaultReplication&lt;/code&gt;. I made it explicit, but when messing around, setting all three to &lt;code&gt;0&lt;/code&gt; means that you don't get hung up the first time you try to write a file and it tries to replicate to a second node that isn't set up. All zeros is basically &amp;ldquo;treat the cluster as a single large disk.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Later on, you can change that easily. I ended up setting &lt;code&gt;rack = 1;&lt;/code&gt; in the above example because I treat each node as a &amp;ldquo;rack&amp;rdquo; since I don't really have a server rack.&lt;/p&gt;
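&lt;p&gt;As far as I can tell, those three numbers collapse into the same three-digit replication string used by the CLI and the shell commands, so my &lt;code&gt;rack = 1;&lt;/code&gt; setup should be roughly equivalent to this sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;# 010 = no extra data-center copy, one extra copy on another rack, no
# extra copy on the same server.
weed master -mdir=/var/lib/seaweedfs/master/main -defaultReplication=010
&lt;/code&gt;&lt;/pre&gt;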
&lt;p&gt;&lt;em&gt;† I don't like using &amp;ldquo;master&amp;rdquo; and prefer main, but that is the terminology that SeaweedFS uses.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Volumes&lt;/h2&gt;
&lt;p&gt;Next up was configuring a volume server. I ended up doing one per server (I have four nodes in the cluster now) even though three of them had multiple partitions/directories on different physical drives. In each case, I created an &lt;code&gt;ext4&lt;/code&gt; partition and mounted it at a directory like &lt;code&gt;/mnt/fs-001&lt;/code&gt;. I could have used ZFS, but I know and trust &lt;code&gt;ext4&lt;/code&gt; and had trouble with ZFS years ago. But it doesn't matter, just make a drive.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-nix"&gt;# src/nodes/node-0.nix
inputs @ { config
, pkgs
, flakes
, ...
}: {
  imports = [
    ../../services/seaweedfs.nix # the file from dermetfan
  ];

  services.seaweedfs.clusters.default = {
    package = pkgs.seaweedfs;

    volumes.${config.networking.hostName} = {
      openFirewall = true;
      dataCenter = &amp;quot;home&amp;quot;;
      rack = config.networking.hostName;
      ip = &amp;quot;${config.networking.hostName}.home&amp;quot;;
      dir = [ &amp;quot;/mnt/fs-001&amp;quot; ];
      disk = [ &amp;quot;hdd&amp;quot; ]; # Replication gets screwy if these don't match
      max = [ 0 ];
      port = 9334;

      mserver = [
        {
          ip = &amp;quot;fs.home&amp;quot;;
          port = 9333;
        }
      ];
    };
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once started up, this runs a service on &lt;code&gt;http://node-0.home:9334&lt;/code&gt;, connects to the master (which will then show a link on its page), and basically says there is plenty of space.&lt;/p&gt;
&lt;p&gt;The key parts I found are the &lt;code&gt;disk&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Replication is based on &lt;code&gt;dataCenter&lt;/code&gt;, rack, and server, but only between volumes whose disk types agree. So &lt;code&gt;hdd&lt;/code&gt; will only sync to other &lt;code&gt;hdd&lt;/code&gt;, even if half of them are actually SSD or NVMe drives. Because I have a mix of NVMe and HDD, I labeled them all &lt;code&gt;hdd&lt;/code&gt; because it works and I don't really care.&lt;/p&gt;
&lt;p&gt;The value of &lt;code&gt;0&lt;/code&gt; for &lt;code&gt;max&lt;/code&gt; means use all the available space. Otherwise, it only grabs a small number of 30 GB blocks and stops. Since I was dedicating the entire drive over to the cluster, I wanted to use everything.&lt;/p&gt;
&lt;h2&gt;Filers&lt;/h2&gt;
&lt;p&gt;The final service needed is a filer. This is basically the POSIX layer that lets you mount the drive in Linux and start to do fun things with it. Like the others, it just gets put on a server. I only set up one filer and it seems to work; others set up multiples, but I don't really understand why.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-nix"&gt;# src/nodes/node-0.nix
inputs @ { config
, pkgs
, flakes
, ...
}: {
  imports = [
    ../../services/seaweedfs.nix # the file from dermetfan
  ];

  services.seaweedfs.clusters.default = {
    package = pkgs.seaweedfs;

    filers.main = {
      openFirewall = true;
      dataCenter = &amp;quot;home&amp;quot;;
      encryptVolumeData = false;
      ip = &amp;quot;fs.home&amp;quot;;
      peers = [ ];

      master = [ # this is actually in cluster.masters that I import in the real file
        {
          ip = &amp;quot;fs.home&amp;quot;;
          port = 9333;
        }
      ];
    };
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like the others, this starts up a web service at &lt;code&gt;fs.home:8888&lt;/code&gt; that lets you browse the file system, upload files, and do fun things. Once this is all deployed (by your system of choice, mine is Colmena), it should be up and running, which means you should be able to upload a folder through the port 8888 site.&lt;/p&gt;
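&lt;p&gt;The filer also speaks plain HTTP, so a quick smoke test doesn't need the web page at all. This follows the upload pattern from the SeaweedFS docs; the directory and file names are made up:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;# Upload a file into the filer namespace, then read it back.
curl -F file=@photo.jpg &amp;quot;http://fs.home:8888/test/&amp;quot;
curl -o photo-copy.jpg &amp;quot;http://fs.home:8888/test/photo.jpg&amp;quot;
&lt;/code&gt;&lt;/pre&gt;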
&lt;h2&gt;Debugging&lt;/h2&gt;
&lt;p&gt;I found the error messages a little confusing at times, but they weren't too much trouble to find. I just had to tail &lt;code&gt;journalctl&lt;/code&gt; and then try to figure things out.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;journalctl -f | grep seaweed
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you have multiple servers, debugging requires doing this to all of them.&lt;/p&gt;
&lt;h2&gt;Secondary Volumes&lt;/h2&gt;
&lt;p&gt;Adding more volumes is pretty easy. I just add a Nix expression to each node to include its drives.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-nix"&gt;  services.seaweedfs.clusters.default = {
    package = pkgs.seaweedfs;

    volumes.main = {
      openFirewall = true;
      dataCenter = &amp;quot;main&amp;quot;;
      rack = config.networking.hostName;
      mserver = cluster.masters; # I have this expanded out above
      ip = &amp;quot;${config.networking.hostName}.home&amp;quot;;
      dir = [ &amp;quot;/mnt/fs-002&amp;quot; &amp;quot;/mnt/fs-007&amp;quot; ]; # These are two 6 TB red drives
      disk = [ &amp;quot;hdd&amp;quot; &amp;quot;hdd&amp;quot; ]; # Replication gets screwy if these don't match
      max = [ 0 0 ];
      port = 9334;
    };
  };
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As soon as they deploy, they hook up automatically and increase the size of the cluster.&lt;/p&gt;
&lt;h2&gt;Mounting&lt;/h2&gt;
&lt;p&gt;Mounting&amp;hellip; this gave me a lot of trouble. NixOS does not play well with auto-mounting SeaweedFS, so I had to jump through a few hoops. In the end, I created a &lt;code&gt;mount.nix&lt;/code&gt; file that I include on any node that has to mount the cluster, which always goes into &lt;code&gt;/mnt/cluster&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-nix"&gt;inputs @ { config
, pkgs
, ...
}:
let
  mountDir = &amp;quot;/mnt/cluster&amp;quot;;

  # A script to go directly to the shell.
  shellScript = (pkgs.writeShellScriptBin
    &amp;quot;weed-shell&amp;quot;
    ''
      weed shell -filer fs.home:8888 -master fs.home:9333 &amp;quot;$@&amp;quot;
    '');

  # A script to list the volumes.
  volumeListScript = (pkgs.writeShellScriptBin
    &amp;quot;weed-volume-list&amp;quot;
    ''
      echo &amp;quot;volume.list&amp;quot; | weed-shell
    '');

  # A script to allow the file system to be mounted using Nix services.
  mountScript = (pkgs.writeShellScriptBin
    &amp;quot;mount.seaweedfs&amp;quot;
    ''
      if ${pkgs.gnugrep}/bin/grep -q ${mountDir} /proc/self/mountinfo
      then
        echo &amp;quot;already mounted, nothing to do&amp;quot;
        exit 0
      fi

      echo &amp;quot;mounting weed: ${pkgs.seaweedfs}/bin/weed&amp;quot; &amp;quot;$@&amp;quot;
      ${pkgs.seaweedfs}/bin/weed &amp;quot;$@&amp;quot;
      status=$?

      for i in 1 1 2 3 4 8 16
      do
        echo &amp;quot;checking if mounted yet: $i&amp;quot;
        if ${pkgs.gnugrep}/bin/grep -q ${mountDir} /proc/self/mountinfo
        then
          echo &amp;quot;mounted&amp;quot;
          exit 0
        fi

        ${pkgs.coreutils-full}/bin/sleep $i
      done

      echo &amp;quot;gave up: status=$status&amp;quot;
      exit $status
    '');
in
{
  imports = [
    ../../seaweedfs.nix
  ];

  # The `weed fuse` returns too fast and systemd doesn't think it has succeeded
  # so we have a little delay put in here to give the file system a chance to
  # finish mounting and populate /proc/self/mountinfo before returning.
  environment.systemPackages = [
    pkgs.seaweedfs
    shellScript
    volumeListScript
    mountScript
  ];

  systemd.mounts = [
    {
      type = &amp;quot;seaweedfs&amp;quot;;
      what = &amp;quot;fuse&amp;quot;;
      where = &amp;quot;${mountDir}&amp;quot;;
      mountConfig = {
        Options = &amp;quot;filer=fs.home:8888&amp;quot;;
      };
    }
  ];
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, let me break this into its parts. SeaweedFS has a nice little interactive shell where you can query status, change replication, and do lots of little things. However, it requires a few parameters, so the first thing I do is create a shell script called &lt;code&gt;weed-shell&lt;/code&gt; that provides those parameters so I don't have to type them.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ weed-shell
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second thing I wanted while doing this was to see a list of all the volumes. SeaweedFS creates 30 GB blobs for storage instead of thousands of little files. This makes things more efficient in a lot of ways (replication is done on volume blocks).&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ weed-volume-list | head
.&amp;gt; Topology volumeSizeLimit:30000 MB hdd(volume:810/1046 active:808 free:236 remote:0)
  DataCenter main hdd(volume:810/1046 active:808 free:236 remote:0)
    Rack node-0 hdd(volume:276/371 active:275 free:95 remote:0)
      DataNode node-0.home:9334 hdd(volume:276/371 active:275 free:95 remote:0)
        Disk hdd(volume:276/371 active:275 free:95 remote:0)
          volume id:77618  size:31474091232  file_count:16345  replica_placement:10  version:3  modified_at_second:1708137673 
          volume id:77620  size:31501725624  file_count:16342  delete_count:4  deleted_byte_count:7990733  replica_placement:10  version:3  modified_at_second:1708268248 
          volume id:77591  size:31470805832  file_count:15095  replica_placement:10  version:3  modified_at_second:1708104961 
          volume id:77439  size:31489572176  file_count:15067  replica_placement:10  version:3  modified_at_second:1708027468 
          volume id:77480  size:31528095736  file_count:15118  delete_count:1  deleted_byte_count:1133  replica_placement:10  version:3  modified_at_second:1708093312 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When doing things manually, that was all I needed to see things working and get the warm and fuzzy feeling that everything was in order.&lt;/p&gt;
&lt;p&gt;The catch with getting it to automatically mount (or even &lt;code&gt;systemctl start mnt-cluster.mount&lt;/code&gt;) is that the command to do so is &lt;code&gt;weed fuse /mnt/cluster -o &amp;quot;filer=fs.home:8888&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;NixOS doesn't like that.&lt;/p&gt;
&lt;p&gt;So my answer was to write a shell script that fakes a &lt;code&gt;mount.seaweedfs&lt;/code&gt; and calls the right thing. Unfortunately, it rarely worked, and it took me a few days to figure out why. While &lt;code&gt;weed fuse&lt;/code&gt; returns right away, I'm guessing network latency means that &lt;code&gt;/proc/self/mountinfo&lt;/code&gt; doesn't update until a few seconds later. But &lt;code&gt;systemd&lt;/code&gt; had already queried the &lt;code&gt;mountinfo&lt;/code&gt; file, seen that it wasn't mounted, and declared the mount failed.&lt;/p&gt;
&lt;p&gt;But, by the time I (as a slow human) looked at it, the &lt;code&gt;mountinfo&lt;/code&gt; showed success.&lt;/p&gt;
&lt;p&gt;The answer was to delay returning from &lt;code&gt;mount.seaweedfs&lt;/code&gt; until we give SeaweedFS a chance to finish, so &lt;code&gt;systemd&lt;/code&gt; could see it was mounted and wouldn't fail the unit. Hence the loop, grep, and sleeping inside &lt;code&gt;mount.seaweedfs&lt;/code&gt;. That took a lot of reading code and puzzling through things, so hopefully it will help someone else.&lt;/p&gt;
&lt;p&gt;After I did that, though, it has been working pretty smoothly, including recovering on reboot.&lt;/p&gt;
&lt;h2&gt;Changing Replication&lt;/h2&gt;
&lt;p&gt;As I mentioned above, once I was able to migrate the Ceph cluster, I changed replication to &lt;code&gt;rack = 1;&lt;/code&gt; to create one extra copy across all four nodes. However, SeaweedFS doesn't automatically rebalance like Ceph does. Instead, you have to go into the shell and run some commands.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-shell"&gt;$ weed-shell
lock
volume.deleteEmpty -quietFor=24h -force
volume.balance -force
volume.fix.replication
unlock
exit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also set it up to rebalance automatically, but I'm not entirely sure I've done that correctly, so I'm not going to show my attempt.&lt;/p&gt;
&lt;h2&gt;Observations&lt;/h2&gt;
&lt;p&gt;One of the biggest things I noticed is that Ceph does proactive maintenance on drives. It doesn't sound like much, but it made me more comfortable that Ceph would detect errors. It also means that the hard drives are always running in my basement; just the slow grind of physical hardware as Ceph scrubs and shuffles things around.&lt;/p&gt;
&lt;p&gt;SeaweedFS is more passive in that regard. I don't trust it to catch a failing hard drive as fast, but it still avoids the failure modes of RAID and lets me spread data across multiple servers and locations. There is also a feature for uploading to an S3 server if I wanted it, though I use a &lt;a href="/tags/restic/"&gt;Restic&lt;/a&gt; service for my S3 uploads.&lt;/p&gt;
&lt;p&gt;That passivity also means it hasn't been grinding my drives as much and I don't have to worry about the SSDs burning out too quickly.&lt;/p&gt;
&lt;p&gt;Another minor thing is that while there are a lot fewer options with SeaweedFS, it took me about a third of the time to get the cluster up and running. There were a few error messages that threw me, but for the most part, I understood the errors and what SeaweedFS was looking for. That was not always the case with Ceph, where I had a few year-long warnings that I never figured out how to fix and was content to leave as-is.&lt;/p&gt;
&lt;p&gt;I do not like the lack of dark mode on SeaweedFS's websites.&lt;/p&gt;
&lt;h2&gt;Opinions&lt;/h2&gt;
&lt;p&gt;I continue to like Ceph, but I also like SeaweedFS. I would use either, depending on the expected load. If I were running Docker images or doing coding on the cluster, I would use a Ceph cluster. But in my case, I'm using it for long-term storage, video files, assets, and photo shoots. Not to mention my dad's backups. So I don't need the interactivity of Ceph along with its higher level of maintenance.&lt;/p&gt;
&lt;p&gt;Also, it is a relatively simple Go project, doesn't take six hours to build, and uses concepts I already understand (&lt;code&gt;mkfs.ext4&lt;/code&gt;), so I'm more comfortable with it.&lt;/p&gt;
&lt;p&gt;It was also available at the point I wanted to play (though Ceph is building on NixOS unstable again, so that is a moot point; I was just being impatient and wanted to learn something new).&lt;/p&gt;
&lt;p&gt;At the moment, SeaweedFS works out nicely for my use case, so I decided to switch my entire Ceph cluster over. I don't feel as safe with SeaweedFS, but I feel Safe Enough™.&lt;/p&gt;
</content>
  </entry>
</feed>
