Ask HN: Are AIs intentionally weak at debugging their code?

Maybe this is done to encourage software engineers to understand the code that AIs write?

5 points | by amichail 2 days ago

4 comments

  • not_your_vase 1 day ago
    Have you noticed that Microsoft, Google, and Apple software is still just as full of bugs as it was 5 years ago, if not more, even though all of them are all-in on AI, pedal to the metal? If LLMs actually understood code with human-like intelligence, it would take only a few minutes to go through all the open bug tickets, evaluate them, fix the valid bugs, and reply to the invalid reports.

    But to this day the best we have are the (unfortunately useless) volunteer replies on the relevant help forums. (And hundreds of unanswered GitHub bugs per project.)

  • Pinkthinker 1 day ago
    When you think of all the effort humans put into producing structured pseudocode for their own flawed software, why would we be surprised that an AI struggles with unstructured text prompts? LLMs will never get there. You need a way to tell the computer exactly what you want, ideally without having to spell out the logic.
    • amichail 1 day ago
      Sometimes the prompt is fine, but the generated code has bugs.

      So you tell the AI about the bugs, and it tries to fix them, and sometimes it fails.

      I don't think LLMs even try to debug their code by running it in a debugger.

      • Pinkthinker 1 day ago
        I don’t think you have ever written financial software. To take an example, you are not going to be able to get ChatGPT to price a specific bond in a mortgage-backed security from a prompt. It’s hard enough to do using structured pseudocode.
  • apothegm 1 day ago
    Uh, no. OpenAI and Anthropic and Google and co really, really, really DNGAF whether or not you understand the code their LLMs write for you.

    LLMs are not capable of reasoning or following code flow. They’re predictors of next tokens. They’re astonishingly good at predicting next tokens, and getting better, to the point that they sometimes appear to be reasoning. But they can’t actually reason.

  • Jeremy1026 1 day ago
    If they do a bad job writing it, what makes you think they'd be good at debugging it? If they could debug it, they'd just write it right the first time.