I had an idea for a system sort of like this to reduce moderator burden. Each user would have a score based on the volume and accuracy of their reports of rule-breaking comments/posts, where a report counts as correct if it ultimately results in a moderator action. Content is automatically removed once the cumulative score of the people who have reported it is high enough. Moderators can manually adjust users' scores if needed and undo community mod actions. More complex rules for how scores are determined could be applied as needed.
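Very roughly, something like this; the Reporter/Post names, the scoring formula, and the threshold value are all just placeholders for illustration, not a tuned design:

```python
from dataclasses import dataclass, field

# Placeholder threshold; tuning would be per-community.
REMOVAL_THRESHOLD = 5.0

@dataclass
class Reporter:
    correct: int = 0    # reports that led to a moderator action
    incorrect: int = 0  # reports a moderator ultimately dismissed

    @property
    def score(self) -> float:
        total = self.correct + self.incorrect
        if total == 0:
            return 0.0
        # accuracy weighted by volume, capped so new or low-volume
        # accounts carry little weight and no one account carries a removal
        return (self.correct / total) * min(total, 10) / 10

@dataclass
class Post:
    removed: bool = False
    reporters: set[str] = field(default_factory=set)

def handle_report(post: Post, user_id: str, users: dict[str, Reporter]) -> None:
    post.reporters.add(user_id)
    # remove once the combined standing of all reporters clears the threshold
    if sum(users[u].score for u in post.reporters) >= REMOVAL_THRESHOLD:
        post.removed = True  # community-triggered removal; mods can still undo it
```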
To address the possibility that such a system would be abused, I think the best solution would be secrecy. Just don't let anyone know that this is how it works, or that there is a score attached to their account that could be gamed. Pretend it's a new kind of automod or AI bot or something, and have a short time delay between the report that pushes it over the edge and the actual removal.
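The delay part could be as simple as scheduling the removal a randomized few minutes out, so it can't easily be tied to whichever report tipped the score (the 1 to 10 minute range here is arbitrary):

```python
import random
import threading
from typing import Callable

def schedule_removal(post_id: str, remove: Callable[[str], None]) -> None:
    # randomized delay (1 to 10 minutes, purely illustrative) between the
    # triggering report and the visible removal
    delay_seconds = random.uniform(60, 600)
    threading.Timer(delay_seconds, remove, args=(post_id,)).start()
```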
I guess that's somewhat true if you are sharing an implementation around, but even keeping the feature from being widely known could make a difference. Even if it were known, I think the scoring could work alright on its own. A malicious removal could be quickly reversed manually and all of the reporters' scores zeroed.
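The reversal path, building on the earlier sketch, might look something like this (zeroing scores outright rather than merely penalizing them is a policy choice, not the only option):

```python
def undo_community_removal(post: Post, users: dict[str, Reporter]) -> None:
    # moderator override: restore the post and wipe the standing of everyone
    # who reported it, so a coordinated abuse run loses its future weight
    post.removed = False
    for user_id in post.reporters:
        users[user_id].correct = 0
        users[user_id].incorrect = 0
```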