
The LLM Gauntlet: When a 'Pro' Model Fails a Simple Task

I thought the task was simple: push a project to GitHub. Debug any errors. Standard work.

My Vertex AI credits felt like a winning lottery ticket. I had Gemini 2.5 Pro, a model with ‘Pro’ in its name, ready to go. I was initially pleased. The feeling did not last.

The Pro model failed. It couldn’t resolve basic git errors. It got stuck, suggesting fixes that led nowhere, unable to debug its own broken logic. Every attempt circled back to the same failure.

Out of frustration, I switched to the Gemini 3.0 Flash preview. The change was immediate. Flash diagnosed the workflow issues, corrected the path, and pushed the code in minutes. The ‘preview’ model accomplished what the ‘pro’ model could not.

The experience was a lesson. A model’s capability isn’t just about its stated power. It’s about resilience, the ability to navigate the messy reality of a development environment and self-correct.

To bridge that gap for less capable models, I’ve established a set of global rules for my projects—a .cursorrules file. It acts as a guardrail, an instruction manual to keep the AI on track. My hope is that it gives even the more stubborn models a fighting chance.
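A minimal sketch of the kind of rules I mean. Cursor reads a plain-text .cursorrules file from the project root; the specific rules below are illustrative, not a canonical format:

```
# .cursorrules (illustrative example; adapt per project)

- Before running any git command, print the exact command and explain why.
- When a command fails, read the full error output before proposing a fix.
- Never retry the same failing command unchanged more than once.
- If two consecutive fixes fail, stop, summarize the current state, and ask how to proceed.
- Never force-push, rewrite history, or delete branches without explicit approval.
```

The point of rules like these isn’t intelligence. It’s breaking the failure loop before it starts.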

The Vertex AI credits were a false dawn, but the exercise was valuable. It clarified the real challenge in AI collaboration: not just raw intelligence, but the gritty, practical ability to recover from a mistake. Until all models learn that, a good set of rules is the best co-pilot you can ask for.

