Vibe coding — describing what you want in plain language and letting an AI write the code — is genuinely transforming how developers build software in 2025. This post covers what vibe coding actually is, the tools leading the charge (Cursor, Claude, Copilot), real productivity numbers, the hard limits you'll hit, and when you should and shouldn't lean on it.
What Is "Vibe Coding"?
The term was coined by Andrej Karpathy in early 2025: "There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." It describes a workflow where you describe intent in natural language — "add pagination to this API endpoint", "refactor this to use async/await", "write a Redis cache layer for this function" — and let a large language model generate the implementation.
It sounds like a gimmick. It isn't. Within six months of Karpathy's post, every major developer tool had pivoted toward this paradigm. And after building production features this way for the better part of a year, I have a nuanced view of exactly when it works brilliantly, when it fails silently, and what skills you actually need to use it well.
Vibe coding doesn't eliminate the need for engineering skills — it amplifies them. The developers getting the most out of these tools are experienced engineers who can read, evaluate, and refine AI-generated code instantly. Beginners who "vibe code" without understanding what's generated are building on sand.
The Tools Defining Vibe Coding in 2025
Cursor — The IDE That Changed Everything
Cursor is a VS Code fork with AI baked into every layer. Its "Composer" mode lets you describe multi-file changes and watch them execute across your codebase. In my experience, it's the highest-leverage tool in the stack:
- Tab completion that completes entire functions, not just lines — it predicts your next logical move
- Cmd+K inline edits — select a block of code, describe the change, get it instantly
- Composer for cross-file refactors — "add rate limiting middleware to all Flask routes" actually works
- @codebase context — reference any file, function, or doc inline while prompting
The killer feature is that Cursor understands your codebase. After indexing your project, it knows your patterns, your naming conventions, your architecture. The suggestions feel like a senior dev who's been reading your code for weeks.
Claude (Anthropic) — The Best Reasoning Engine
For architectural decisions, complex refactors, and debugging subtle bugs, Claude 3.5 Sonnet and the newer Claude 3.7 Sonnet with extended thinking are unmatched. Where GPT-4o excels at fast completions, Claude excels at careful, structured reasoning. I use Claude for:
- Designing system architecture before writing a line of code
- Debugging race conditions and async issues where the reasoning chain matters
- Writing comprehensive test suites — Claude writes tests that actually test edge cases
- Code review — paste a PR diff and ask "what am I missing?"
GitHub Copilot — The Always-On Pair Programmer
Copilot has evolved from a smart autocomplete into a full agent. Copilot Workspace can take a GitHub issue and generate a full pull request — tests, implementation, docs included. It's deeply integrated into the GitHub ecosystem, which makes it compelling for teams already on that stack.
Real Productivity Numbers
I tracked my development velocity for three months across two projects — one using traditional workflow, one using a full vibe coding stack (Cursor + Claude). The results were significant but not magical:
- Boilerplate & CRUD operations: 5–8x faster. Generating a full REST endpoint with validation, error handling, and tests that I'd have written in 45 minutes now takes 8–10 minutes.
- New feature development: 2–3x faster. The AI handles the implementation; I focus on the architecture and edge cases it misses.
- Debugging complex issues: ~1.2x faster (sometimes slower). The AI is confidently wrong as often as it's right here. It generates plausible-looking fixes that don't solve the root cause.
- Infrastructure and DevOps work: 3–4x faster. Writing Dockerfiles, CI configs, and Terraform templates is where AI shines consistently.
AI is extraordinarily good at writing the 80% of code that is standard, predictable, and well-documented. It struggles with the 20% that requires deep domain knowledge, understanding non-obvious constraints, or reasoning about failure modes. Senior engineers benefit most because they instantly recognize which category they're in.
Where Vibe Coding Falls Apart
I've seen — and made — every mistake. Here are the hard limits:
1. Hallucinated APIs and Libraries
LLMs confidently generate code using library methods that don't exist. In Python especially, they'll invent plausible-sounding function signatures. Always run the code. Don't trust it until tests pass and you've verified the actual API docs for anything non-trivial.
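A trivial smoke test surfaces these immediately. As a hypothetical illustration (the function `parse_payload` is invented for this example), suppose the model suggests a `json.load_string` helper that doesn't exist in the stdlib:

```python
import json

def parse_payload(raw: str) -> dict:
    # AI-suggested call, which raises AttributeError because it doesn't exist:
    #   return json.load_string(raw)
    # The actual stdlib API:
    return json.loads(raw)

# One assertion is enough to catch the hallucination the moment you run it:
assert parse_payload('{"ok": true}') == {"ok": True}
```

This is why "always run the code" is the rule, not a suggestion: a hallucinated API fails loudly on first execution, but only if you execute it.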
2. Security Blind Spots
AI-generated code has a concerning pattern of missing security-critical details: SQL injection through unsanitized inputs, missing authentication checks, overly permissive CORS configs, exposed secrets in error messages. Never merge AI-generated code that touches auth, payments, or data access without a careful security review.
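A minimal sketch of the most common gap, using Python's stdlib `sqlite3` to show why parameterized queries matter (the table and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def get_user_unsafe(name: str):
    # The vulnerable pattern AI tools still emit: interpolating user input into SQL
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def get_user_safe(name: str):
    # Parameterized query: the driver treats the value as data, never as SQL
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

# A classic injection payload returns every row through the unsafe version...
print(get_user_unsafe("' OR '1'='1"))  # [(1, 'alice')]
# ...and nothing through the safe one, because no user has that literal name.
print(get_user_safe("' OR '1'='1"))    # []
```

Both functions "work" on happy-path input, which is exactly why this class of bug slips through when you only skim the generated code.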
3. Context Window Limits Create Inconsistency
In large codebases, the AI doesn't "know" what it generated three sessions ago. It might generate a utility function that already exists, use a different naming convention than your codebase, or introduce an approach that contradicts your architecture. You need to be the consistency layer.
4. Over-Engineering by Default
LLMs tend toward comprehensive solutions. Ask for a simple feature and you'll often get an enterprise-grade implementation with abstractions you don't need. Learn to prompt for simplicity explicitly: "write the simplest possible implementation", "no need for configuration options", "don't add features I didn't ask for".
How to Vibe Code Well: Practical Principles
```text
# Bad prompt (too vague)
"add caching to my app"

# Good prompt (specific, contextual)
"add Redis caching to the get_user_profile() function in services/user.py.
Cache key should be user:{user_id}. TTL 300 seconds.
Use the existing redis_client from utils/cache.py.
Don't add any new dependencies."
```
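For reference, here is roughly the implementation that good prompt should yield. This is a sketch, not real AI output: `fetch_profile_from_db` is a placeholder, and a tiny in-memory stand-in replaces the real `redis_client` so the snippet runs without a Redis server (redis-py's client exposes the same `get`/`set(..., ex=...)` signature):

```python
import json

class FakeRedis:
    """In-memory stand-in for the redis_client in utils/cache.py (ignores TTL)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value, ex=None):
        self._store[key] = value

redis_client = FakeRedis()

def fetch_profile_from_db(user_id):
    # Placeholder for the real database lookup
    return {"id": user_id, "name": "alice"}

def get_user_profile(user_id):
    key = f"user:{user_id}"          # cache key format from the prompt
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)    # cache hit: skip the database entirely
    profile = fetch_profile_from_db(user_id)
    redis_client.set(key, json.dumps(profile), ex=300)  # TTL 300 seconds
    return profile
```

Notice how every constraint in the prompt maps to a line of code: the key format, the TTL, the existing client, no new dependencies. That one-to-one mapping is what a specific prompt buys you.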
The quality of AI output is almost entirely determined by the quality of your prompt. Specific, contextual prompts that reference your existing patterns produce dramatically better results than vague requests.
Other principles I've internalized:
- Read everything it generates. You're accountable for every line of code in the codebase, regardless of who (or what) wrote it.
- Test immediately. AI code tends to look right but fail on edge cases. Run it, break it, verify it.
- Iterate in small steps. Don't ask for a complete feature in one shot. Build it incrementally, verifying at each step.
- Use AI for the boring parts. Preserve your creative energy for architecture, design decisions, and the genuinely hard problems.
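"Test immediately" in practice: here is a plausible AI-generated pagination helper (invented for illustration) and the kind of deliberate edge-case probing that catches its blind spots before they reach production:

```python
def paginate(items, page, per_page):
    # Typical AI-generated helper: looks right at a glance, but what about
    # page 0, negative pages, or a page past the end of the list?
    start = (page - 1) * per_page
    return items[start:start + per_page]

# Break it deliberately before trusting it:
assert paginate([1, 2, 3, 4, 5], 1, 2) == [1, 2]   # happy path
assert paginate([1, 2, 3, 4, 5], 3, 2) == [5]      # partial last page
assert paginate([], 1, 2) == []                    # empty input
assert paginate([1, 2, 3], 99, 2) == []            # far past the end
# page=0 happens to return [] here, but page=-1 gives start=-4 and slices
# from the WRONG END of the list (returns [2, 3]) — exactly the silent
# failure that "looks right" code hides until you probe for it.
```

The happy-path asserts all pass; it's the negative-page behavior that needs a guard clause. An AI won't volunteer that guard unless you ask for it or test for it.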
Is This the End of Traditional Coding?
No — but it's a genuine shift in what "coding" means. The developers who will thrive are those who can think in systems, evaluate code quality rapidly, write precise specifications, and direct AI tools toward correct solutions. The rote work of writing boilerplate is largely automated. The hard work of thinking clearly about what to build is not.
Vibe coding is the most significant productivity shift I've experienced in my career. It's not a replacement for engineering judgment — it's a multiplier for it.
Key Takeaways
- Vibe coding is a real paradigm shift, not a gimmick — productivity gains are measurable and significant
- Cursor, Claude, and Copilot are the leading tools; each has distinct strengths
- 2–8x speedups on standard work; minimal gain on complex debugging or novel architecture
- Hard limits: hallucinated APIs, security gaps, context inconsistency, over-engineering
- Specific, contextual prompts produce dramatically better output than vague requests
- Senior engineers benefit most — AI amplifies good judgment, it doesn't replace it