this post was submitted on 30 Sep 2025
963 points (98.6% liked)

Technology

"No Duh," say senior developers everywhere.

The article explains that vibe-coded output is often close to functional, but not quite there, requiring developers to go in and find where the problems are - resulting in a net slowdown of development rather than productivity gains.

[–] sugar_in_your_tea@sh.itjust.works 17 points 23 hours ago (4 children)

I personally think unit tests are the worst application of AI. Tests are there to ensure the code is correct, so ideally the dev would write the tests to verify that the AI-generated code is correct.

I personally don't use AI to write code, since writing code is the easiest and quickest part of my job. I instead use it to generate examples of using a new library, to compare different options, and so on, and then I write the code myself. Basically, I use it as a replacement for a search engine and blog posts.

[–] Baguette@lemmy.blahaj.zone 2 points 22 hours ago* (last edited 21 hours ago) (2 children)

To preface, I don't actually use AI for anything at my job, which might be a bad metric, but my workflow is 10x slower if I even try using AI.

That said, I want AI to be able to help with unit tests in the sense that I can write some starting ones, and then have it infer which branches aren't covered and help me fill in the rest.

Obviously it's not smart enough, and honestly I highly doubt it ever will be because that's the nature of LLMs, but my peeve with unit tests is that testing branches usually entails just copying the exact same test but changing one field to an invalid value, or making a dependency throw. It's not hard, just tedious. Branch coverage is already enforced, so you should know when you forgot to test a case.

Edit: my vision would be an interactive version rather than my company's current setup, where it just generates whatever it wants instantly. I'd want something to prompt me saying this branch is not covered, and then tell me how it will try to cover it. That eliminates the tedious work but still lets the dev know what they're doing.

I also think you should treat AI code as a pull request and actually review what it writes. My coworkers that do use it don't really proofread, so it ends up having some bad practices and code smells.

[–] sugar_in_your_tea@sh.itjust.works 1 points 19 hours ago (1 children)

testing branches usually entails just copying the exact same test but changing one field to an invalid value, or making a dependency throw

That's what parameterization is for. In unit tests, most dependencies should be mocked, so expecting a dependency to throw shouldn't really be a thing much of the time.
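For the "dependency throws" case specifically, a mocked dependency makes that branch trivial to hit via side_effect. A minimal sketch (the load_settings/repo names are invented for illustration):

from unittest.mock import Mock

# Hypothetical code under test: fall back to defaults when the repo fails.
def load_settings(repo):
    try:
        return repo.fetch()
    except ConnectionError:
        return {"theme": "default"}

def test_load_settings_falls_back_when_repo_raises():
    repo = Mock()
    repo.fetch.side_effect = ConnectionError("boom")  # the mocked dependency "throws"
    assert load_settings(repo) == {"theme": "default"}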

I’d want something to prompt me saying this branch is not covered, and then tell me how it will try to cover it

You can get the first half with coverage tools. The second half should be fairly straightforward, assuming you wrote the code. If a branch is hard to hit (e.g. it only happens if an OS or library function fails), either mock that part or don't bother with the test. I ask my team to hit 70-80% code coverage, because that last 20-30% tends to be extreme corner cases that are hard to hit.

My coworkers that do use it don’t really proofread, so it ends up having some bad practices and code smells.

And this is the problem. Reviewers only know so much about the overall context and often do a surface level review unless you're touching something super important.

We can set all the conventions we want, but people will be lazy and submit crap, especially when deadlines are close.

[–] Baguette@lemmy.blahaj.zone 1 points 17 hours ago (1 children)

The issue with my org is that the push to be CI/CD means 90% line and branch coverage, which means you spend just as much time writing tests as actually developing the feature - which is already on an accelerated schedule, because my org has made promises that end up becoming ridiculous deadlines, like a 2-month project getting a 1-month deadline.

Mocking is easy; almost everything in my team's codebase is designed to be mockable. The only stuff I can think of that isn't mocked is usually just clocks, which you could mock, but I actually like using fixed clocks for unit testing most of the time (see the sketch after the list below). But mocking is also tedious. Lots of mocks end up being:

  1. Change the expected test constant, which usually ends up being almost the same input with just one changed field.
  2. Change the response returned by the mock.
  3. Given that response, expect the result to be x or some exception y.
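The fixed-clock idea mentioned above, as a minimal sketch (the is_expired function is hypothetical): instead of mocking the system clock, the current time is injected, so a test can pin it to a constant.

from datetime import datetime, timedelta, timezone

# Hypothetical code under test: "now" is injected rather than read from the system clock.
def is_expired(expires_at, now=lambda: datetime.now(timezone.utc)):
    return now() >= expires_at

def test_not_expired_just_before_deadline():
    fixed_now = datetime(2025, 9, 30, 12, 0, 0, tzinfo=timezone.utc)
    # The "clock" is just a lambda returning a constant; no mocking framework needed.
    assert not is_expired(fixed_now + timedelta(seconds=1), now=lambda: fixed_now)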

Chances are, if you wrote the code you already know what branches are there; it's just translating that into actual unit tests that's a pain. Branching logic should be easy to read as well: if I read a nested if statement, chances are there's something that could be designed better.

I also think that 90% of actual testing should be done through integ tests. Unit tests, to me, help validate what you expect to happen, but expectations don't necessarily equate to real dependencies and inputs. But that's a preference, mostly because our design philosophy revolves around dependency injection.

[–] sugar_in_your_tea@sh.itjust.works 1 points 16 hours ago* (last edited 16 hours ago)

I also think that 90% of actual testing should be done through integ tests

I think both are essential, and they test different things. Unit tests verify that individual pieces do what you expect, whereas integration tests verify that those pieces are connected properly. Unit tests should be written by the devs and help them prove their solution works as intended, and integration tests should be written by QA to prove that user flows work as expected.

Integration test coverage should be measured in terms of features/capabilities, whereas unit tests are measured in terms of branches and lines. My target is 90% for features/capabilities (we mostly miss the admin bits that end customers don't use), and 70-80% for branches and lines (skipping unlikely errors, simple data-passing code like controllers, etc.). Getting the last bit of coverage for each is nice, but incredibly difficult and low value.

Lots of mocks end up being

I use Python, which allows runtime mocking of existing objects, so most of our mocks are like this:

@patch.object(Object, "method", return_value=value)

Most tests have one or two lines of this above the test function. It's pretty simple and not very repetitive at all. If we need more complex mocks, that's usually a sign we need to refactor the code.
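For context, a fuller sketch of that pattern (the Gateway/checkout names are hypothetical, defined inline so it runs standalone):

from unittest.mock import patch

# Hypothetical code under test.
class Gateway:
    def charge(self, amount):
        raise RuntimeError("would hit the real payment API")

def checkout(gateway, amount):
    return gateway.charge(amount)

@patch.object(Gateway, "charge", return_value={"status": "ok"})
def test_checkout_uses_mocked_gateway(mock_charge):
    # Gateway.charge is replaced for the duration of the test;
    # @patch.object passes the mock in as an extra argument.
    assert checkout(Gateway(), 42) == {"status": "ok"}
    mock_charge.assert_called_once()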

dependency injection

I absolutely hate dependency injection, most of the time. 99% of the time, there are only two implementations of a dependency, the standard one and a mock.

If there's a way to patch things at runtime (e.g. Python's unittest.mock lib), dependency injection becomes a massive waste of time with all the boilerplate.

If there isn't a way to patch things at runtime, I prefer a more functional approach that works off interfaces, where dependencies are simply passed in as data where needed. That way you avoid the boilerplate and still get the benefits of DI.
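A minimal sketch of that style (names invented for illustration): the dependency is just a callable passed in, so a test can hand in a stub directly, with no container and no patching.

# Hypothetical code under test: the "send" dependency is passed in as data.
def notify_overdue(emails, send):
    for email in emails:
        send(email, "Your invoice is overdue")
    return len(emails)

def test_notify_overdue_sends_one_message_per_user():
    sent = []
    count = notify_overdue(["a@example.com", "b@example.com"],
                           send=lambda to, body: sent.append(to))
    assert count == 2
    assert sent == ["a@example.com", "b@example.com"]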

That said, dependency injection has its place if a dependency has several implementations. I find that's pretty rare, but maybe it's more common in your domain.

[–] MangoCats@feddit.it 1 points 21 hours ago (1 children)

A software tester walks into a bar. He orders a beer.

He orders -1 beers.

He orders 0 beers.

He orders 843909245824 beers.

He orders duck beers.

AI can be trained to do that, but if you are in a not-well-trodden space, you'll want to define your own edge cases in addition to whatever the AI comes up with.

[–] ganryuu@lemmy.ca 4 points 13 hours ago* (last edited 13 hours ago)

Way I heard this joke, it continues with:

A real customer enters.

He asks where the toilets are.

The bar explodes.

[–] MangoCats@feddit.it 1 points 21 hours ago (3 children)

Ideally, there are requirements before anything, and some TDD types argue that the tests should come before the code as well.

Ideally, the customer is well represented during requirements development - ideally, not by the code developer.

Ideally, the code developer is not the same person that develops the unit tests.

Ideally, someone other than the test developer reviews the tests to assure that the tests do in-fact provide requirements coverage.

Ideally, the modules that come together to make the system function have similarly tight requirements and unit-tests and reviews, and the whole thing runs CI/CD to notify developers of any regressions/bugs within minutes of code check in.

In reality, some portion of that process (often, most of it) is short-cut for one or many reasons. Replacing the missing bits with AI is better than not having them at all.

[–] sugar_in_your_tea@sh.itjust.works 7 points 21 hours ago (2 children)

Ideally, the code developer is not the same person that develops the unit tests.

Why? The developer is exactly the person I want writing the tests.

There should also be integration tests written by a separate QA, but unit tests should 100% be the responsibility of the dev making the change.

Replacing the missing bits with AI is better than not having them at all.

I disagree. A bad test is worse than no test, because it gives you a false sense of security. I can identify missing tests with coverage reports, I can't easily identify bad tests. If I'm working in a codebase with poor coverage, I'll be extra careful to check for any downstream impacts of my change because I know the test suite won't help me. If I'm working in a codebase with poor tests but high coverage, I may assume a test pass indicates that I didn't break anything else.

If a company is going to rely heavily on AI for codegen, I'd expect tests to be manually written and have very high test coverage.

[–] MangoCats@feddit.it 2 points 18 hours ago

but unit tests should 100% be the responsibility of the dev making the change.

True enough

A bad test is worse than no test

Also agree: if your org has trimmed things down to the point that you're just writing tests to say you have tests, with no review of their efficacy, it will get what it deserves soon enough.

If a company is going to rely heavily on AI for anything, I'd expect a significant traditional human employee backstop to the AI until it has a track record. Not a "buckle up, we're gonna try somethin'" track record - more like two or three full business cycles before starting to divest of the human capital that built the business to where it is today. Though, if your business is on the ropes and likely to tank anyway... why not try something new?

There was a story about IBM letting thousands of workers go, replacing them with AI... then hiring even more workers in other areas with the money saved from the AI retooling. Apparently they let a bunch of HR and other admin staff go and beefed up on sales and product development. There are some jobs where you want predictable algorithms rather than potentially biased people, and HR seems like an area that could have a lot of that.

[–] Nalivai@lemmy.world 1 points 20 hours ago (2 children)

Why? The developer is exactly the person I want writing the tests.

It's better if it's a different developer, so they don't know the nuances of your implementation and test the functionality only; that avoids some mistakes. You're correct on all the other points.

[–] sugar_in_your_tea@sh.itjust.works 2 points 19 hours ago (1 children)

I really disagree here. If someone else is writing your unit tests, that means one of the following is true:

  • the tests are written after the code is merged - there will be gaps, and the second dev will be lazy in writing those tests
  • the tests are written before the code is worked on (TDD) - everything would take twice as long because each dev essentially needs to write the code again, and there's no way you're going to consistently cover everything the first time

Devs should write their tests, and reviewers should ensure the tests do a good job covering the logic. At the end of the day, the dev is responsible for the correctness of their code, so this makes the most sense to me.

[–] Nalivai@lemmy.world 1 points 5 hours ago

the tests are written after the code is merged - there will be gaps, and the second dev will be lazy in writing those tests

I don't really see how this follows. Why does the second one necessarily have to be lazy, and what stops the first one from being lazy as well?
The reason I like it to be different people is so there are two sets of eyes looking at the same problem without doing the job twice. If you miss something while implementing, it's easy to miss it again while writing tests. It's very hard to switch to testing the concept and not the specific implementation, but if you weren't the one implementing it, you're not "married" to the code and it's easier for you to spot the gaps.

[–] MangoCats@feddit.it 1 points 17 hours ago (1 children)

I'm mixed on unit tests - there are some things the developer will know (white box) about edge cases etc. that others likely wouldn't, and they should definitely have input on those tests. On the other hand, independence of review is a very important aspect of "harnessing the power of the team." If you've got one guy who gathers the requirements, implements the code, writes the tests, and declares the requirements fulfilled, that had better be one outstandingly brilliant guy with all the time on his hands that he needs to do the job right. If you're trying to leverage the talents of 20 people to make a better product, having them all be solo virtuosos working independently alongside each other is more likely to create conflict, chaos, duplication, and massive holes of missed opportunities and unforeseen problems in the project.

[–] Nalivai@lemmy.world 1 points 4 hours ago

independence of review is a very important aspect of “harnessing the power of the team.”

Yep, that's basically my rationale

[–] Nalivai@lemmy.world 3 points 20 hours ago (1 children)

Replacing the missing bits with AI is better than not having them at all.

Nah, bullshit tests that pretend to be tests but are essentially "if true == true then pass" are significantly worse than no tests at all.

[–] MangoCats@feddit.it -2 points 17 hours ago (1 children)

bullshit tests that pretend to be tests but are essentially “if true == true then pass” are significantly worse than no tests at all.

Sure. But unsupervised developers who write the code, write their own tests, and change companies every 18 months are even more likely to pull BS like that than AI is.

You can actually get some test validity oversight out of AI review of the requirements and tests. It's not perfect, but it's better than self-supervised new hires.

[–] Nalivai@lemmy.world 1 points 4 hours ago (1 children)

You can actually get some test validity oversight out of AI review

You also will get some bullshit out of it. If you're in a situation where you can't trust your developers because they're changing companies every 18 months, and you can't even supervise your untrustworthy developers, then you sure as shit can't trust whatever an LLM will generate for you. At least your flock of developers will bullshit you predictably, to save time and energy; with an LLM you have zero idea where the lies will come from, and they will be inventive lies.

[–] MangoCats@feddit.it 1 points 2 hours ago

I work in a "tight" industry where we check ALL our code. By contrast, a lot of places I have visited - including some you would think are fairly important, like medical office management and gas pump card reader software makers - are not tight, not tight at all. It's a matter of moving the needle, improving a bad situation. You'll never achieve "perfect" on any dynamic, non-trivial system, but if you can move closer to it for little or no cost, why not?

Of course, when I interviewed with that office management software company, they turned me down - probably because they like their culture the way it is, and they were afraid I'd change things, given my history of working places for at least 2.5 years, sometimes up to 12, and making sure the code is right before it ships instead of giving their sales reps that "hands on, oooh I see why you don't like that, I'll have our people fix that right away - just for you" support culture.

[–] themaninblack@lemmy.world -3 points 21 hours ago

Saved this comment. No notes.

[–] FishFace@lemmy.world 1 points 21 hours ago (2 children)

The reason tests are a good candidate is that there is a lot of boilerplate and no complicated business logic. It can be quite a time saver. You probably know some untested code in some project - you could get an LLM to write some tests that would at least poke some key code paths, which is better than nothing. If the tests are wrong, it's barely worse than having no tests.

[–] theolodis@feddit.org 5 points 21 hours ago (2 children)

Wrong tests will make you feel safe. And in the worst case, the next developer who ports the code will think that somebody wrote those tests with intention, and potentially write broken code to make the tests green.

[–] FishFace@lemmy.world 1 points 11 hours ago

Then write comments in the tests that say they haven't been checked.

That is indeed the absolute worst case, though, and most of the tests produced this way will provide value, because checking a test is easier than checking the code (that's kind of the point of tests), so most will be correct.

The risk of regressions that the good tests would catch outweighs the risk of someone writing code to match the rare bad test that you've marked as suspicious because you (for whatever reason) are not confident in your ability to check it.

Exactly! I've seen plenty of tests where the test code was confidently wrong, and it was obvious the dev just copied the output into the assertion instead of asserting what they expect the output to be. In fact, when I joined my current org, most of the tests were snapshot tests, which automate that process. I've pushed to replace them with better tests, and we caught bugs in the process.
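To illustrate the difference (the function and numbers are hypothetical): a snapshot-style assertion copies whatever the code returned, while an intent-based assertion derives the expected value from the requirement.

def apply_discount(price, percent):
    # Hypothetical code under test, with a bug: it only applies half the discount.
    return price - price * (percent / 2) / 100

def test_discount_snapshot_style():
    # The expected value was copied from the code's own output, so the bug slips through.
    assert apply_discount(200.0, 15) == 185.0

def test_discount_from_requirements():
    # The expected value comes from the spec (15% off 200 is 170), so this test catches the bug.
    assert apply_discount(200.0, 15) == 170.0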

[–] sugar_in_your_tea@sh.itjust.works 2 points 20 hours ago (1 children)

better than nothing

I disagree. I'd much rather have a lower coverage with high quality tests than high coverage with dubious tests.

If your tests are repetitive, you're probably writing your tests wrong, or at least focusing on the wrong logic to test. Unit tests should prove the correctness of business logic and calculations. If there's no significant business logic, there's little priority for writing a test.

[–] FishFace@lemmy.world 1 points 11 hours ago (1 children)

The actual risk of those tests being wrong is low because you're checking them.

If your tests aren't repetitive, they've got no setup or mocking in them, so they don't test very much.

[–] sugar_in_your_tea@sh.itjust.works 1 points 5 hours ago (1 children)

If your test code is repetitive, you're not following DRY sufficiently, or the code under test is overly complicated. We'll generally have a single mock or setup code for several tests, some of which are parameterized. For example, in Python:

from parameterized import parameterized

@parameterized.expand([
    (key, value, ExpectedException,),
    (other_key, other_value, OtherExpectedException,),
])
def test_exceptions(self, key, value, exception_class):
    # Each tuple is one case: set a single field to a bad value, expect the exception.
    obj = setup()
    setattr(obj, key, value)

    with self.assertRaises(exception_class):
        func_to_test(obj)

Mocks are similarly simple:

@unittest.mock.patch.object(Class, "method", return_value=...)

# Passing Class as the spec makes attribute typos on the mock fail loudly.
dynamic_mock = MagicMock(Class)
dynamic_mock...

How this looks will vary in practice, but the idea is to design code such that usage is simple. If you're writing complex mocks frequently, there's probably room for a refactor.

[–] FishFace@lemmy.world 1 points 4 hours ago

I know how to use parametrised tests, but thanks.

Tests are still much more repetitive than application code. If you're testing a wrapper around some API, each test may need you to mock a different underlying API call (mocking all of them at once would hide things). Each mock is different, so you can't just extract it somewhere, but it is still repetitive.

If you need three tests, each of which requires a (real or mock) user, a certain directory structure to be present somewhere, and input data fetched from somewhere, that's three things that, even if you streamline them, need to be done in each test. I have been involved in a project where we originally followed the principle of "if you need a user object in more than one test, put it in setUp or in a shared fixture", and the result was rapidly growing, unwieldy shared setup between tests - and if you ever want to change one of those tests, you'd better hope you only need to add to it, not change what's already there, otherwise you break all the other tests.

For this reason, zealous application of DRY is not a good idea with tests, and so they are a bit repetitive. That is an acceptable trade-off, but also a place where an LLM can save you some time.

If you’re writing complex mocks frequently, there’s probably room for a refactor.

Ah, the end of all coding discussions, "if this is a problem for you, your code sucks." I mean, you're not wrong, because all code sucks.

LLMs are like the junior dev. You have to review their output because they might have screwed up in some stupid way, but that doesn't mean they're not worth having.

[–] Draces@lemmy.world 0 points 19 hours ago (1 children)

What model are you using? I've had such a radically different experience, but I've only bothered with the latest models. The old ones weren't even worth trying.

[–] sugar_in_your_tea@sh.itjust.works 2 points 19 hours ago (1 children)

I'll have to check; we have a few models hosted at our company, and I forget the exact versions and whatnot. They're relatively recent, but not the highest end, since we need to host them locally.

But the issue here isn't directly related to which model it is, but to the way LLMs work. They cannot reason; they can only give believable output. If the goal is code coverage, it'll get coverage, but the tests won't necessarily be well designed.

If both the logic and the tests are automated, humans will be lazy and miss stuff. If only the logic is generated, humans can treat the code as a black box and write good tests that way. Humans will be lazy with whatever is automated, so if I have to pick one to be hand written, it'll be the code that ensures the logic is correct.

[–] wesley@yall.theatl.social 0 points 18 hours ago (1 children)

We're mandated to use it at my work. For unit tests it can really go wild: it'll write thousands of lines of tests to cover a single file or class, for instance, whereas a developer would probably write only a quarter as much. You have to be specific to get any decent output, like "write a test for this function, use inputs x and y, and the expected output is z".

Personally I like writing tests too and I think through what test cases I need based on what the code is supposed to do. Maybe if there are annoying mocks that I need to create I'll let the AI do that part or something.

Generating tests like that would take longer than writing the tests myself...

Nobody is going to thoroughly review thousands of lines of test code.