BobsAccountant

joined 1 year ago
BobsAccountant@lemmy.world 4 points 9 months ago* (last edited 9 months ago)

My family and I really like it. I invested in a small, physical scanner capable of network file sharing that we have plugged in and always ready to scan. When we get documents or receipts, we scan them and they're immediately added to the database. I also have it checking an email address (mine is custom, but you could really have it check any address), and any time a PDF or similar attachment is sent there, it gets consumed and that email then gets sorted.

There are a few downsides, however. As mentioned in other posts, turning your physical stack of documents into a digital stack of documents is just trading one pile for another. At least with a digital pile, you can sort a little quicker, but you still have to sort the consumed documents and check them to make sure the engine, which is supposed to be learning, has elected to sort the documents correctly.

The compose stack is pretty easy to use, but it does benefit from a little knowledge of Docker/containers, especially when the main container decides it's not healthy. I wouldn't recommend it to a first-time Docker user, is all.

Additionally, and as also previously mentioned, if you're keeping important documents in it, encrypted storage with encrypted backups is important.

BobsAccountant@lemmy.world 2 points 9 months ago* (last edited 9 months ago)

Adding on to this:

These are all great points, but I wanted to share something that I wish I'd known before I spun up my array... The configuration of your array matters a lot. I had originally chosen to use RAIDZ1 as it's the most efficient with capacity while still offering a little fault tolerance. This was a mistake, but in my defense, the hard data on this really wasn't distributed until long after I had moved my large (for me) dataset to the array. I really wish I had gone with a Striped Mirror configuration. The benefits are pretty overwhelming:

  • Performance is better than even RAIDZ2, especially as individual disk size increases.
  • Fault tolerance is better, as you could have up to 50% of the disks fail, so long as at least one disk in each mirrored set remains functional.
  • Fault recovery is better. With traditional arrays with distributed chunks, you have to resilver (rebuild) the entire array, requiring more time, costing performance and shortening the life of the unaffected drives. With mirrors, only the surviving disk in the affected set has to be read to rebuild its replacement.
  • You can stripe mismatched sets of mirrored drives, so long as the drives within each mirrored set are identical, without having the array default to the size of the smallest member. This allows you to grow your array more organically, rather than having to replace every drive, one at a time, resilvering after each change.
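To make the fault tolerance point concrete, here's a rough sketch in Python. It only models the rule described above (the pool survives as long as every mirrored set keeps at least one working disk), and the disk names and helper functions are made up for illustration; this is not anything ZFS itself exposes.

```python
from itertools import combinations

def pool_survives(mirrors, failed):
    """Return True if every mirrored set still has at least one working disk.

    mirrors: list of tuples of (hypothetical) disk names, one tuple per mirror.
    failed:  set of disk names that have failed.
    """
    return all(any(disk not in failed for disk in mirror) for mirror in mirrors)

def survivable_fraction(mirrors, n_failures):
    """Fraction of all possible n-disk failure combinations the pool survives."""
    disks = [disk for mirror in mirrors for disk in mirror]
    combos = list(combinations(disks, n_failures))
    survived = sum(pool_survives(mirrors, set(combo)) for combo in combos)
    return survived / len(combos)

# Three two-way mirrors striped together (six disks total).
mirrors = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]

print(pool_survives(mirrors, {"a1", "b2", "c1"}))  # half the disks gone, one per set
print(pool_survives(mirrors, {"a1", "a2"}))        # both disks of one set gone
print(survivable_fraction(mirrors, 2))             # share of 2-disk failures survived
```

So the "up to 50%" figure is the best case. Losing both disks of the same set takes the pool down, and with six disks that's 3 of the 15 possible two-disk failures.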

Yes, you pay for these gains with less usable space, but platter drives are getting cheaper and cheaper, so the trade seems more worth it than ever. Oh, and I realize that it wasn't obvious, but I am still using ZFS to manage the array, just not in a RAIDZn configuration.
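For a feel of what the space trade looks like, here's a back-of-the-envelope sketch in Python. It assumes a RAIDZ group treats every disk as the size of its smallest member and loses one disk's worth of space per parity level, and that each mirror contributes the size of its smallest disk. It ignores metadata, padding and other filesystem overhead, so treat the numbers as rough. The function names and disk sizes are just examples.

```python
def raidz_usable_tb(disk_sizes_tb, parity=1):
    """Approximate usable space of one RAIDZ group.

    Every disk counts as the smallest member; `parity` disks' worth is lost.
    """
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)

def striped_mirror_usable_tb(mirrors_tb):
    """Approximate usable space of striped mirrors.

    Each mirrored set contributes the size of its smallest disk.
    """
    return sum(min(mirror) for mirror in mirrors_tb)

# Example: four 4 TB disks plus two 8 TB disks.
disks = [4, 4, 4, 4, 8, 8]
mirrors = [(4, 4), (4, 4), (8, 8)]

print("RAIDZ1:         ", raidz_usable_tb(disks, parity=1), "TB")
print("RAIDZ2:         ", raidz_usable_tb(disks, parity=2), "TB")
print("Striped mirrors:", striped_mirror_usable_tb(mirrors), "TB")
```

With mismatched disks like these, the mirrors actually use the full 8 TB pair, while the RAIDZ layouts treat those two drives as 4 TB each, so the gap is smaller than the usual "mirrors cost you half" rule of thumb suggests.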