They don't even trust it themselves. This is from the release notes:
> I'll not instantly switch ntfy.sh over. Instead, I'm kindly asking the community to test the Postgres support and report back to me if things are working
Fuck that.
Classic "test in production" strategy, very solid!
Testing in production is the best. We spent months warning about data bugs and nobody batted an eye (an upstream bug, not our responsibility, but we noticed). When it was launched in prod, we just pointed out that the bug nobody had fixed was still there, and immediately a war room was formed and the bug was fixed within an hour.
It honestly seems more efficient to let shit hit the fan than to fight everybody to do their job.
You're implying a shitty capitalist company that nobody cares for if it burns down. A tool like this though that is self-hosted by a lot of people (29.1k stars on GH!) and that is internet-facing is very different.
Then, let's just call it "massive decentralized surprise testing"
It looks like that tool is more or less built by a single developer (you already trust their judgment anyways!), and even though the code came through in a single PR it was a merge from a branch that had 79 separate commits: https://github.com/binwiederhier/ntfy/pull/1619
Also glancing through it a bit, huge portions of that are straightforward refactors or even just formatting changes caused by adding a new backend option.
I'm not going to say it's fine, but they didn't just throw Claude at a problem and let it rewrite 25k lines of code unnecessarily.
@ueiqkkwhuwjw just this quote at the start of the release notes
> 14,997 added lines of code, and 10,202 lines removed, all from one pull request
This is already a major red flag even without the ai stuff right? Can't believe anyone would flaunt that like this.
The "single pull request" is a merge release from 79 separate commits. It's the sum of all work, it doesn't mean all of it was changed in one go.
Uh. I'd really prefer if people experimented with new technology a bit more cautiously and not directly jump to "the biggest release [...] ever done".
Upvote and comment on: https://github.com/binwiederhier/ntfy/issues/1645
They just replied:
> What gave you the idea that this was a full rewrite? I moved things around with AI and added postgres support for the queries. Nobody has ever reviewed and tested anything more thoroughly than I did with this branch.
> You are twisting what it actually is. You are assuming something that is not true.
This makes me think that they didn't review or test it at all, lmao
This is the biggest release I've ever done on the server. It's 14,997 added lines of code, and 10,202 lines removed
Thanks for the link! As a short aside for the other people here: Try not to spam developers. That usually achieves the opposite and makes them miserable, when we want them to not burn out, and write good software for us. A thumbs-up emoji is the correct reaction for the average person. Or for the pros - a code-review highlighting specific issues within the code.
Yeah, this is now inherently untrustworthy. Better to switch to an alternative.
I'm a developer
I sometimes use AI for an answer to a complicated problem, because normally I'd open up 20 pages and have to go through them all to find the right answer.
AI gets me the answer right away, though it is likely completely wrong or at least partially wrong. Either way, it gives me a general direction, and with that I only have to search through one or two pages to confirm, so it's the same process, just a little faster.
I have also used AI on a couple of occasions to write code for a complicated problem. Again, you don't copy the code, god no, it's always the worst, and in 80% of cases it is still at least riddled with bugs, or just complete bullshit. However, it might give me an alternative idea or a direction to take to implement or fix this complicated feature or problem.
That's the extent to which I've used AI, and for the foreseeable future that won't change, because AI still can't code. It's still wildly flailing around. It might produce something that implements a certain functionality, but it's a guarantee that that functionality will have more bugs and security holes than features.
I am also a developer and agree entirely.
Asking for advice, examples or the occasional boilerplate is at most how I use AI and certainly not integrated directly into my IDE.
I'm assuming this is some sort of canary message to indicate that the code base has been compromised, the author can't talk about it, and everyone should immediately stop using the service. Surely no-one would be unwise enough to commit this otherwise?
Even ignoring the huge red LLM flag, a 25kLOC delta in a single PR should be cause for instant rejection as there's no way to fully understand or test it, let alone in 2-3 weeks.
> 25kLOC delta in a single PR should be cause for instant rejection
Not to pick at nits, but it would be VERY different if it was 1k lines added and 24k lines removed. There's something extremely satisfying about removing 10k+ lines of unnecessary code.
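To put numbers on that distinction, here is a rough sketch that separates total churn (added plus removed) from net growth (added minus removed), using the figures quoted from the release notes. The `diff_stats` helper is a made-up name for illustration, not part of any tool.

```python
def diff_stats(added: int, removed: int) -> dict:
    """Summarize a changeset: total lines touched vs. net change in size."""
    return {"churn": added + removed, "net": added - removed}

# Figures quoted from the release notes.
release = diff_stats(14_997, 10_202)

# The hypothetical from the comment above: 1k added, 24k removed.
hypothetical = diff_stats(1_000, 24_000)

print(release)       # churn of roughly 25k, codebase grows
print(hypothetical)  # same churn, codebase shrinks
```

Both cases are a "25kLOC delta" by churn, but one grows the code base and the other shrinks it, so churn alone says little about what a reviewer is actually facing.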
Definitely share your initial concern. Without strong review processes to ensure that every line of code follows the intent of the human developer, there’s no way of knowing what exactly is in there and the implications for the human users. And I’m not just talking about bugs.
They say it’s reviewed, but the temptation to blindly trust is there. In this case, the developer appears to have taken some care:
> The code was written by Cursor and Claude, but reviewed and heavily tested over 2-3 weeks by me. I created comparison documents, went through all queries multiple times and reviewed the logic over and over again. I also did load tests and manual regression tests, which took lots of evenings.
Let us hope so. Handle with care to ensure responsibility is not offloaded to a machine instead of a person.
The size of that changeset means that it’s inherently unreviewable.
The commit history is something I’ve seen only in the PRs that even the most dysfunctional companies would demand a rewrite for.
Also, 2-3 weeks review? PostgreSQL support could be added in that time without the need for a damn „vibe check”. Hell, it would probably take less time than that.
To be fair they would have needed to spend time testing the manual implementation as well.
The main problem I see is that even if this rolls out perfectly, the erratic and changing nature of LLMs still makes it pointless as a proof of concept. Next time Claude might fuck up in a fringe way that's not covered by unit tests and is missed by manual tests.
On the other hand, I guess I've been guilty on numerous occasions of introducing fringe bugs into production code myself, but at least I learn from it.
I made my statement as a BDD/TDD practitioner.
The core goal of software engineering is not to deliver said code, but to deliver it in a framework that lets others (and consequently me in a week’s time) contribute easily. This makes both future improvements and bug fixes easier.
Dumping a ~25000 lines changeset with a git history that’s almost designed to confuse is antithetical to both engineering and open source.
Definitely time to find an alternative. What the actual fuck is this
Look, if he wanted to introduce AI code, whatever, but doing it all at once in a 14k line change is crazy.
Surely it would be better to introduce AI by letting it handle misc changes here and there instead of starting with the "biggest release ever done" (his words), no?
There is this repo that lists some slopware: https://codeberg.org/small-hack/open-slopware. Maybe someone can add it.
I think there's room for a little bit of nuance that page doesn't do a great job of describing. In my opinion there's a huge difference between volunteer maintainers using AI PR checks as a screening measure to ease their review burden and focusing their actual reviews on PRs that pass the AI checks, and AI-deranged lone developers flooding the code with "AI features" and slopping out 10kloc PRs for no obvious reason.
Just because a project is using AI code reviews or has an AGENTS.md is not necessarily a red flag. A yellow flag, maybe, but the evidence that the Linux Kernel itself is on that list should serve as an example of why you can't just kneejerk anti-AI here. If you know anything about Linus Torvalds you know he has zero tolerance for bad code, and the use of AI is not going to change that despite everyone's fears. If it doesn't work out, Linus will be the first one to throw it under the bus.
we're all so fucked
This doesn't make me uneasy. It makes me resentful, a little angry, and a lot tired. Thanks for bringing it to attention, I will make sure that nothing of that project or from that author will ever cross my ecosystem again.
I just set up a ntfy server for Unified Push earlier this week to use with Matrix. Now I have to turn around and immediately replace it...
If you use ntfy mainly as a Unified Push distributor on Android, then I highly recommend switching to an XMPP client that can do the same.
Fuck, I love ntfy, it's one of the best self hosted push notification systems I've used. It has been flawless so far.
Don't like this.
Oh ffs..
Thanks for the heads-up
I can see the pragmatic appeal. Maintaining a lot of code for an open source project is thankless. Go is designed for idiots like me so it makes sense that an llm should be able to emit code that mostly works. There are classes of errors that are less likely in Go and the compiler and linting will prevent some foot guns and then it would have been tested.
Ethically I hate anything to do with the LLM industry and all it represents. I hate the environmental impacts. The social impacts. The disregard for intellectual property. The devaluing of human effort. The scam economics. I won't use anything touched by it on principle, and if that means walking away from a dead Internet, so be it. There are enough pre-2020s books, audiobooks, movies, music and code to keep me interested for the rest of my life.
ts getting you pinned to 2.17 in the compose file 🥹🤞🥀
I'll embrace the inevitable fork.
That's concerning. If it was "I generated a function with an LLM and reviewed it myself" I'd be much less concerned, but 14k added lines and 10k removed lines is crazy. We already know that LLMs don't generate up to scratch code quality...
I won't use PostgreSQL with ntfy, and keep an eye on it to see if they continue down this path for other parts of ntfy. If so I'll have to switch to another UP provider.
I'm so tired of that.
I'm using it for script notifications + UnifiedPush. I don't know where to start looking for a fitting alternative.