Vibe-Coding on BinHong Lee's Blog

Shifting from Code Writing to Reviewing

binhong@binhong.me (BinHong Lee) — Thu, 02 Apr 2026 00:00:00 -0800

When I built GlobeTrotte last year, vibe-coding had just begun gaining traction, but I was the “weird guy” who coded everything by hand. I spent a little over 4 weeks building the backend service (Go), iOS app (SwiftUI), Android app (Jetpack Compose), and web (TanStack) all by myself. I like to think that’s pretty fast for an app of that level of complexity. The tradeoff, however, was that I had almost no time to “handle the business side”.

I wrote about my early skepticism on AI coding back in July last year. TL;DR: I tried getting Windsurf’s SWE-1 to refactor navigation on the Android app but ended up watching it spiral into compiler error after compiler error as it tried to “code its way out” by piling on even more code. At the time, I concluded that vibe-coding wasn’t ready, but one thing I missed was how much the model mattered.

The inflection point

Building slogx and Git Navigator was an almost entirely opposite experience. slogx was started because I was curious to explore the capabilities of Google’s AI Studio. It definitely impressed me in terms of building good-looking UI, but its lack of support for anything else was a bottleneck that quickly prompted me to move away (especially to build the SDKs). Git Navigator was started because I wanted to see how well Google’s Antigravity and Claude Sonnet perform. For pretty much an entire month, all my vibe-coding prompts were basically “the graph rendered wrongly in this / that case”. (I probably should’ve intervened earlier but I was lazy and curious if it could eventually figure it out.) Neither could, but Claude Opus (4.5) managed to build an actual working version. I eventually redid the algo anyway with a much simpler / more straightforward idea (just DFS lol), but the fact that it worked when the others didn’t makes this an important inflection point to me.
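For a rough idea of what “just DFS” can mean for ordering a commit graph, here is a minimal sketch. It is not the actual Git Navigator code: the `Commit` shape and the `orderCommits` name are made up for illustration, and it assumes the goal is simply to list every commit before its parents (newest first), which is one common way to use a depth-first search for this.

```typescript
// Hypothetical commit shape: a hash plus the hashes of its parents.
interface Commit {
  hash: string;
  parents: string[];
}

// Depth-first search from each head. Commits are recorded once all of their
// parents have been handled (post-order), then the list is reversed so that
// every commit ends up before its parents.
function orderCommits(commits: Commit[], heads: string[]): string[] {
  const byHash = new Map<string, Commit>();
  for (const c of commits) byHash.set(c.hash, c);

  const visited = new Set<string>();
  const postOrder: string[] = [];

  for (const head of heads) {
    if (visited.has(head) || !byHash.has(head)) continue;
    visited.add(head);
    // Explicit stack instead of recursion so long histories don't overflow the call stack.
    const stack: { hash: string; next: number }[] = [{ hash: head, next: 0 }];
    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const parents = byHash.get(frame.hash)!.parents;
      if (frame.next < parents.length) {
        const parent = parents[frame.next++];
        // Skip parents already seen or missing from the input (e.g. a truncated history).
        if (!visited.has(parent) && byHash.has(parent)) {
          visited.add(parent);
          stack.push({ hash: parent, next: 0 });
        }
      } else {
        postOrder.push(frame.hash);
        stack.pop();
      }
    }
  }
  return postOrder.reverse();
}
```

For a merge commit M with parents A and B that both descend from C, calling `orderCommits` with M as the only head lists M first and C last, with A and B in between.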

Dev Velocity

From here on out, most of the code for both slogx and Git Navigator was written by an LLM. I do review every single line of it, but that doesn’t always say much, especially in areas where I’m not already an expert. It definitely feels faster than building it myself, but I haven’t done any actual testing to know for sure. My suspicion is that it’s faster in the first few days, maybe the first week. One month in, I can’t tell if I’m meaningfully ahead of where I’d be otherwise. The LoC isn’t dramatically higher than GlobeTrotte, but because I didn’t write this code, I’m significantly less familiar with my own codebase. That said, I can definitely see the development cycle being significantly faster if I skip the line-by-line code review and instead stick to just vibe-reviewing(?) it whenever I see something kinda sus.

Context switching tax

Another important shift I noticed is the nature of the work itself. Before, you’d be “wired in” while designing and then coding, doing deep focus work while holding the entire problem in your head. Now though, the prompting-and-waiting cycle breaks that up: you either scroll social media while waiting, or review and prompt a separate session in the meantime. However, this means that you’re constantly context switching between 2 or more projects / features, and you’re less of a maker and more of a manager making sure your “subordinates” complete their deliverables.

Visualizing the product for LLM

One of the trickier things that took me too long to figure out was visualization. Just like humans need to look at the end result (print debugging, hot reload etc.) to easily self-correct and understand what went wrong, LLMs need something like this too. If you’re building a webapp, some providers have a Chrome extension or a built-in browser for this purpose. Since I’m building a VS Code extension, I actually didn’t find much tooling around it (ironically lol). When I first made all the different coding agents build the graphing UI for Git Navigator, I spent a lot of time screenshotting and explaining what was right / wrong with it.

Eventually, I figured out that I could have it write a script that it can run against a given folder / repo to see how the graph is sorted. Since the rendering code orders the commits (as TypeScript objects) before passing them on to be rendered as UI, the script essentially just dumps the ordered output for easy viewing by the LLM. This cuts me out of the loop of repeatedly screenshotting it, as LLMs can run the script, read the output, and identify whether things are ordered correctly or if there’s anything that needs fixing. I think this is probably one of my biggest lessons here: this is like an observability tool, except your target audience is LLMs instead of human developers.
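To make the idea concrete, here is a minimal sketch of what such a dump could look like. It is an illustration under assumptions rather than the real script: `OrderedCommit`, `dumpOrder`, and the row format are hypothetical, and the “listed after its parent” flag assumes the graph is meant to show newer commits first.

```typescript
// Hypothetical shape of a commit once the ordering step has run.
interface OrderedCommit {
  hash: string;
  parents: string[];
  subject: string;
}

// Turn the ordered commits into plain text, one row per commit, so the
// result can be read from a terminal instead of a screenshot.
function dumpOrder(commits: OrderedCommit[]): string {
  const position = new Map<string, number>();
  commits.forEach((c, i) => position.set(c.hash, i));

  return commits
    .map((c, i) => {
      const parents = c.parents.length > 0 ? c.parents.join(",") : "-";
      // Assumes newest-first output: a parent appearing above its child is flagged.
      const misplaced = c.parents.some((p) => {
        const at = position.get(p);
        return at !== undefined && at < i;
      });
      const row = `${i} ${c.hash} parents=${parents} ${c.subject}`;
      return misplaced ? `${row} <-- listed after its parent` : row;
    })
    .join("\n");
}

// Example: b2 is deliberately placed after its parent a1.
console.log(
  dumpOrder([
    { hash: "c3", parents: ["b2"], subject: "Fix typo" },
    { hash: "a1", parents: [], subject: "Initial commit" },
    { hash: "b2", parents: ["a1"], subject: "Add feature" },
  ]),
);
```

The point of the plain-text format is that an agent can diff it between runs or grep for the flag, which is much cheaper than describing a screenshot.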

Catching shortcuts

There have been a few times where I caught the model taking shortcuts that technically worked but were obviously wrong. One time, I asked for a complex git operation (some rebase gymnastics) and Opus 4.5 just used the TypeScript backend to create a temporary bash script and run that, instead of running it programmatically through the TypeScript backend. It worked fine locally on my machine (and on any Unix-based system I think) but it would break in VS Code Remote scenarios where files aren’t local, or on Windows because the path separators are wrong. It’s the kind of solution that passes every test you thought to write yet falls apart the moment a real user touches it. This is where review actually matters, since LLMs (largely) optimize for “does it work right now” rather than “is this the right solution”. If you’re not catching these, you’re accumulating tech debt at LLM speed.
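As a sketch of the difference, the version below passes the git arguments as an array to an injected runner instead of writing them into a temporary shell script. Everything here is hypothetical (`GitRunner`, `rebaseOnto`, the branch names) and it is not the code from Git Navigator; in a real extension the runner would likely wrap something like Node’s `child_process.execFileSync`, and the only git command assumed is the standard `git rebase --onto <newbase> <upstream> <branch>`.

```typescript
// Hypothetical runner signature: executable, argument list, working directory.
// Injecting it keeps the git logic free of any shell or temp-file handling.
type GitRunner = (command: string, args: string[], cwd: string) => string;

// Replay the commits of `branch` that come after `upstream` onto `newBase`.
// Arguments stay a plain array, so no script file, shell quoting, or
// hand-built script path is involved.
function rebaseOnto(
  run: GitRunner,
  repoDir: string,
  newBase: string,
  upstream: string,
  branch: string,
): string {
  return run("git", ["rebase", "--onto", newBase, upstream, branch], repoDir);
}

// A fake runner that only records what would have been executed.
const recorded: string[][] = [];
const fakeRunner: GitRunner = (command, args, cwd) => {
  recorded.push([command, ...args, cwd]);
  return "ok";
};

rebaseOnto(fakeRunner, "/path/to/repo", "main", "feature-a", "feature-b");
console.log(recorded[0].join(" "));
```

Keeping the runner injectable also makes this kind of shortcut easier to catch in review, since a test can assert on the exact arguments without touching a real repository.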

Catching product gaps

There are also times where the feature exists and works, but the placement or interaction is off in a “product sense” kinda way. One example I had was asking Codex to implement line and hunk staging for Git Navigator. We had some good planning discussions about the tri-state checkboxes, “include/exclude” wording instead of git jargon, and inline diff expansion, but it shipped the entire picker in the side panel. This makes for weird UX ergonomics because users are looking at the uncommitted changes block when they’re deciding what to commit. So burying the granular staging controls in a separate panel means most people likely wouldn’t even notice that they exist. I had to ask for it to be moved into the uncommitted changes block itself. Here’s another twist: when it did, the LLM’s first instinct was to build a new diff renderer from scratch instead of reusing the existing one used for conflict resolution. I think this is probably the bull case for Product Managers being the beneficiaries of vibe-coding, considering that this is likely their strong suit.

Delete everything and try again

Fortunately, I haven’t had to rely on this strategy too much but it does happen. I think it was when building one of the stack features (on Git Navigator) that Codex completely misunderstood what I meant and kinda just went off the rails despite being given clear specs to follow. I discarded all the changes (thanks git), started a new session, gave it the same spec, then just let it try again. Funnily enough, it worked perfectly on the second run and only needed some minor tweaks before the feature was ready to be committed. I don’t keep good enough git hygiene to always rely on this 🫣, but it’s definitely something to keep in mind if things go weirdly wrong.

Wrap up

I don’t think I’m going back to handwriting all the code (for now). Not because I’m convinced it’s faster, but because it feels faster. I think it also comes with a level of detachment that makes me more comfortable being ruthless with code I didn’t write while nitpicking for the ideal product experience that I want. It makes it easier to throw things away, cherry-pick only what’s good, and not get precious about any of it. Whether that’s a healthier relationship with code or just a different kind of laziness, you tell me 🫠.

Early takes on vibe-coding

binhong@binhong.me (BinHong Lee) — Thu, 03 Jul 2025 00:00:00 -0800

I keep hearing about vibe-coding, but I’ve always written the majority of my code myself. While at Meta, I got a chance to try out CodeCompose. It worked really well as an autocomplete, but when it tried to do anything more than 5 lines at a time, it would - on many occasions - introduce bugs that aren’t obvious at first sight. Generally, I caught them by looking at the generated code and wondering “huh this isn’t how I’d do this, why?”. That said, it definitely helped me code and ship faster, especially on mundane tasks. Vibe-coding though, seems like taking it to a whole new level (using even less supervision and care on the code being committed).

Perfect for small, isolated problems

I started my attempt by having Claude write a GitHub Actions workflow file. I have a submodule setup (where a repo is shared and imported across multiple other repos) and wanted an automated way to tell how its changes will affect code in the other repos while also creating PRs to keep them updated. Seems like a perfectly fine isolated problem to try this out on. I did run out of tokens a few times (being on a free plan) so I had to get creative, but it largely worked. I’d say it behaved like a normal engineer who writes a first version (which isn’t perfect) but can understand and work their way through debugging and resolving the issue slowly when given clear information on what went wrong.

Not for complex changes in an intern-size project

Note: Using the phrase “intern-size” here because back then, there was a weird rumor that interns were expected to ship 10k LoC as part of their internship to get return offers in FAANG lol. I don’t think it was ever true but definitely a standard people worked towards.

Now that I’ve got it working on an isolated problem, I wanted to see how it might handle a complex change in a pre-existing project. I have an Android app codebase (for GlobeTrotte) with around 8k+ LoC so I decided to try it on there (using SWE-1 from Windsurf). This is the instruction I provided (admittedly a complex one):

add new navhost to edittripactivity and make each of edit day and edit place a separate screen instead (so it push-and-pop for each small edit)

PS: edittripactivity is a file name (technically EditTripAcitivity.kt but I think the LLM understood it), navhost is a concept of how screen navigation works in Jetpack Compose.

The LLM took 20+ minutes before running out of time, which required me to make a continue call not just once but twice before it told me it was done. It was all chaos from there on out. It told me that there were a bunch of errors, so it tried to write more code (?), leading to more errors, so more code, then more errors etc. At some point, I mentioned that there were 88 errors and it figured it should try compiling and reading the compiler errors (instead of looking for them itself) but that barely cut down the number of errors. I just kept telling it that there were more errors and it just kept trying to code itself out of the mess by adding more code and thus more errors. I eventually gave up and ran git checkout . to clean everything up.

Losing track of signatures

At some point, it started making up stuff that either existed under a different name, or that it thought should exist but didn’t (or it forgot to add the implementation for it, I can’t tell). The first example is that it kept calling PlaceItem() even though there’s no object with that name (and none of my “please fix the errors” prompts ever got it to touch those calls). There is however, an object called Place() which I’m assuming is what it was referring to. The second example is where it called updateDay(delete = true) despite the fact that updateDay() has a bunch of other required params and doesn’t have delete as a param at all. I can only assume that it just inferred the functionality of the function without actually understanding if it worked as intended.

Ask clarifying questions

The prompt I provided is a bit vague to be honest. It’s asking for a UX change without actually providing any design example, just describing it in words as if the other person would easily understand it. The LLM went to work immediately with that prompt without asking any clarifying questions like how the screens get triggered, how the layout should work, how the UI should look etc. I think if LLMs can learn to ask clarifying questions, it would be invaluable for situations like this where the ask might be a little too vague to work off of.

Phenomenal auto-complete machine

I’d be remiss if I didn’t mention the auto-complete capabilities of AI coding assistants. In short, they are consistently phenomenal, especially when it comes to boilerplate code needing minor tweaks here and there. The AI would make the necessary tweaks automatically, making it a breeze to go through the more mind-numbing parts of the codebase. This has been a consistent experience both when I was at Meta (using CodeCompose) and now using Windsurf for my personal project.

Is it a mid-level engineer yet?

Short answer, no. Long answer, it depends. In terms of raw coding ability in an isolated environment, I think it’s meeting the mid-level engineer mark just fine (maybe even better due to its breadth, which is generally uncommon among “humans” lol) but it’s the everything else part that’s an issue. For starters, I expect a mid-level engineer to ask for help instead of mindlessly trying to commit code (or send out PRs) that isn’t compiling over and over again. I also don’t (usually) have to nudge them that their code isn’t compiling or is failing tests. They can see it themselves and would go work on debugging and fixing it proactively. This is on top of all the issues mentioned above when working in a codebase that isn’t even that large.

For now though, it seems like it’s still not good enough to take over even just the coding part of my job so I guess I’m going back to implementing the new navhost for my Android app by myself.