I've noticed many problems that follow a similar pattern. Here is an example:
lichess.org/training/nEsBn
There is an obvious candidate move (in this case Nf6), but it's not at all obvious that it's winning. In the natural "human" line, it only becomes clearly winning after quite a few more moves. But in the "computer" line where black just gives up the queen, it's obviously winning after the next move.
Since these problems only have one obvious candidate move, many players end up playing it by default and the problem gets an unnaturally low rating. If they had to find all the moves against the more natural defense, most would fail and it would be rated much higher.
My suggestion would be to do a form of A/B testing by varying the line that's proposed when there are two defenses of similar strength. If players have more trouble with one than the other, then that defense would become the default line for the problem and it would end up rated at its actual human difficulty.
I have no idea how easy or hard this would be to implement, but I thought I'd suggest it anyway.
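To make the idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is not based on how lichess actually stores or serves puzzles: the class names, the `min_attempts` threshold and the line labels are all made up for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LineStats:
    """Attempt counters for one defensive line (hypothetical structure)."""
    attempts: int = 0
    failures: int = 0

    @property
    def failure_rate(self) -> float:
        return self.failures / self.attempts if self.attempts else 0.0


@dataclass
class PuzzleABTest:
    """Serves one of several similar-strength defenses at random and tracks results."""
    lines: List[str]            # labels for the candidate defenses (assumed)
    min_attempts: int = 200     # arbitrary sample size before deciding (assumed)
    stats: Dict[str, LineStats] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.stats = {line: LineStats() for line in self.lines}

    def pick_line(self) -> str:
        # Each player is shown one of the defenses, chosen uniformly at random.
        return random.choice(self.lines)

    def record(self, line: str, solved: bool) -> None:
        s = self.stats[line]
        s.attempts += 1
        if not solved:
            s.failures += 1

    def default_line(self) -> Optional[str]:
        # No decision until every line has been tried often enough.
        if any(s.attempts < self.min_attempts for s in self.stats.values()):
            return None
        # The line players fail most often becomes the default.
        return max(self.lines, key=lambda l: self.stats[l].failure_rate)
```

A real version would probably want some kind of significance check before locking in a default, rather than just comparing raw failure rates once a fixed number of attempts is reached.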