I just solved a puzzle (61137). When I switched on analysis mode, Stockfish (at depth 21) recommended 18...Qxd4, which is wrong. When I played this move, Stockfish correctly showed a much worse evaluation for black. I'm a new user, so perhaps I'm misunderstanding something about the UI. In case it matters, I'm using Chromium on Ubuntu 18.04. Let me know if you want a screenshot or additional information.

Stockfish shows an evaluation of +1.3 in favor of white at depth 25 before 18...Qxd4. After 18...Qxd4 the evaluation is +4.8 in favor of white at depth 25. This is clearly a blunder for black, which is why it was made into a puzzle in the first place. Stockfish recommends 18...Be7 at depth 25.
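
To put a number on that jump, here is a minimal Python sketch of the arithmetic, using only the two evaluations quoted above. The function names and the 300-centipawn cutoff are my own assumptions for illustration, not how Lichess or Stockfish actually classify blunders.

```python
def eval_swing_cp(before_pawns: float, after_pawns: float) -> int:
    """Change in evaluation, in centipawns (1 pawn = 100 centipawns).

    Both values are from white's point of view, so a positive result
    means the position got better for white (worse for black).
    """
    return round((after_pawns - before_pawns) * 100)


def looks_like_blunder(swing_cp: int, threshold_cp: int = 300) -> bool:
    # threshold_cp is a hypothetical cutoff chosen for this example only.
    return swing_cp >= threshold_cp


# Evaluations quoted in this thread: +1.3 before 18...Qxd4, +4.8 after.
swing = eval_swing_cp(1.3, 4.8)
print(swing)                      # 350
print(looks_like_blunder(swing))  # True
```

So black gives up roughly 3.5 pawns' worth of evaluation with 18...Qxd4, about the value of a minor piece.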

The question is then: why does it take depth 25 to see that 18...Qxd4 is a bad move before it is played, when after 18...Qxd4 it's immediately clear? And if it really takes 25 half-moves to see that 18...Qxd4 is bad, doesn't that mean that the puzzle is really hard? I didn't think it was hard, but perhaps I'm missing something.

I don't understand what you're talking about:
Before 18...Qxd4, there are many good moves: Stockfish recommends ...Ke7, ...Bxe5 (exd5 Nd5) and so on. These moves are all within 50 centipawns of each other. Stockfish IMMEDIATELY sees it, even at depth 15.