Publish: Untitled #3947
Closed
11 commits
d16c681 Create articles/testing.mdx via admin (ComputelessComputer)
2b3b421 Update articles/testing.mdx via admin (ComputelessComputer)
75bfb2c Update articles/testing.mdx via admin (ComputelessComputer)
d55490b Update articles/testing.mdx via admin (goranmoomin)
bcbd887 Update articles/testing.mdx via admin (goranmoomin)
959d605 Update articles/testing.mdx via admin (goranmoomin)
7994144 Update articles/testing.mdx via admin (goranmoomin)
493ceb8 Update articles/testing.mdx via admin (goranmoomin)
ffcf897 Update articles/testing.mdx via admin (goranmoomin)
9200eee Update articles/testing.mdx via admin (goranmoomin)
ed18037 Update articles/testing.mdx via admin (goranmoomin)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,206 @@ | ||
| --- | ||
| meta_title: "Untitled" | ||
| author: "John Jeong" | ||
| featured: false | ||
| date: "2026-02-13" | ||
| --- | ||
|
|
||
| # The $5,000 AI Coding Experiment: What 1,000 Devin Tasks Taught Us | ||
|
|
||
| Display title: Is Devin AI Worth It? We Spent $5,000 to Find Out | ||
|
|
||
| Meta description: Real results from spending $5,000 on Devin AI: how we automated migrations, enabled non-technical teams to ship code, and cut maintenance work in half. | ||
|
|
||
| --- | ||
|
|
||
| In the last two months, my two-person team spent more than $5,000 running roughly 1,000 tasks in [Devin](https://devin.ai/), the AI software engineer. This isn't vibe-coding hype or spinning up 10 Claude Code instances to burn tokens as fast as possible. This is what actually happened when we integrated AI agents into our real-world workflow while building Hyprnote. | ||
|
|
||
| Here's what we learned. | ||
|
|
||
| ## Not a Reader? Watch the Video Instead | ||
|
|
||
| [https://www.youtube.com/watch?v=UojsNSbhm6o](https://www.youtube.com/watch?v=UojsNSbhm6o)[[a]](#cmnt1) | ||
|
|
||
| *Timestamps throughout this post link to specific examples in the video.* | ||
|
|
||
| ## Running AI Agents Where Your Team Already Works | ||
|
|
||
| The single most powerful decision we made was running Devin inside Slack, not our IDE. Here's why this matters: | ||
|
|
||
| Slack is where discussions already happen. We're already getting alerts from Zendesk, Sentry, and Discord. Being able to launch an agent directly inside a thread where the problem is being discussed is incredibly valuable. | ||
|
|
||
| Real example:[[b]](#cmnt2) John, my co-founder, identified an issue with our AI prompts and tagged Devin to fix it. Devin fixed it, but the approach was suboptimal. Since AI prompting is something I work on, I jumped into the same thread with more context. Devin figured it out based on my additional input, finished the PR, and it got merged. | ||
|
|
||
| This is collaborative debugging without context switching. No copying issues into a separate tool. No explaining the same problem twice. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=52s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=52s) | ||
|
|
||
| Another example:[[c]](#cmnt3) I tagged Devin about our 404 page not rendering properly. John, who primarily works on our website, pointed out reference files to look at in the same thread. Based on his input, we got a PR and merged it. | ||
|
|
||
| The agent isn't replacing us—it's joining the conversation where it's already happening. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=74s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=74s) | ||
|
|
||
| ## AI Agents Enable Non-Technical Teams to Ship Code | ||
|
|
||
| Having an agent accessible from Slack opened up tasks that don't necessarily require technical skills. For instance, anyone on the team can ask what we're tracking in analytics or request small adjustments to better understand user behavior. | ||
|
|
||
| John attached some PostHog docs and asked questions about what we're tracking and what we should be tracking long-term. Devin made the changes. Now both John and I know we have analytics updates—super helpful for staying aligned.[[d]](#cmnt4) | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=108s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=108s) | ||
|
|
||
| Since we use GitHub as a CMS for Hyprnote, we can even update landing pages or blog content directly from Slack. John attached a PDF from an internal discussion, and Devin updated our docs based on the actual conversation we had. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=132s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=132s) | ||
|
|
||
| ## Three Types of Tasks to Delegate to Devin AI | ||
|
|
||
| As a small early-stage startup, there's always a lot going on. We're handling day-to-day work while thinking about what's next—new features, product direction, how the codebase should evolve. That's why it's extremely helpful to dump all of this into an async coding agent and let it figure things out. | ||
|
|
||
| Here are three types of tasks that represent different degrees of relevance and urgency: | ||
|
|
||
| ### Degree 1: Exploration (Not Shipping Anytime Soon) | ||
|
|
||
| This is work that isn't planned for the immediate future. We're not going to ship it or even merge it right now, but it's still valuable to explore so we can understand what the work would look like, how complex it is, and roughly how long it might take. | ||
|
|
||
| Example: Even though we're focusing on our macOS desktop app, we had ideas around building a Chrome extension. I asked Devin to research how to make a Chrome extension that works with a desktop app. We learned how 1Password does it and got a rough plan.[[e]](#cmnt5) | ||
|
|
||
| Then we cloned the repository of a popular Chrome extension framework. Based on the docs and actual code examples, we implemented it to see how it would look. We didn't even merge it, but it's still useful to see how it'll look in the future. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=177s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=177s) | ||
|
|
||
| ### Degree 2: Preparation (Relevant, But Not Right Now) | ||
|
|
||
| This is work we'll likely merge, but I won't pull it into my IDE yet. | ||
|
|
||
| Example: Someone asked whether Hyprnote could import data from Apple Notes. That feels like a feature we could support in the future, but it's not a core focus at the moment.[[f]](#cmnt6) | ||
|
|
||
| We did research to see if there was any existing work on that. There was, so we cloned it, ported the test cases, and let Devin implement it. Tests passed, so we safely merged it for a future feature. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=227s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=227s) | ||
|
|
||
| ### Degree 3: Production (Very Relevant Right Now) | ||
|
|
||
| This is work I'll definitely look at, but I'm spawning the agent right now because I'm in the middle of something and want to avoid context switching. Or maybe I'm traveling or about to go to sleep. | ||
|
|
||
| Example: We needed to update test cases around our Tinybase utils—very relevant and important work. We asked Devin to clone the repo, inspect the codebase, and write the test cases.[[g]](#cmnt7) | ||
|
|
||
| One interesting trick: we asked Devin to use the Claude CLI that we already installed on Devin's machine. This way we can offload some AI inference to our Anthropic account and use some credits. | ||
|
|
||
| Pro tip: We encoded this as "offload agent" knowledge that describes how to use the Claude CLI. Mentioning "consult smart friend" (a phrase Devin uses in its internal prompts) helps the Claude CLI get called at the right time. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=263s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=263s) | ||
|
|
||
| ## Good Documentation Enables AI Agents to Ship Code Faster | ||
|
|
||
| In Hyprnote, we focus on supporting multiple providers for language and speech model inference as part of our open-source effort. Early on, we spent time designing and documenting flexible, clean interfaces. This worked well for future contributions and community involvement. | ||
|
|
||
| It turns out these same choices are incredibly helpful when working with coding agents. | ||
|
|
||
| Example: ElevenLabs Support[[h]](#cmnt8) | ||
|
|
||
| We support both WebSocket-based real-time transcription and file upload-based batch transcription. We had a very detailed prompt on how models should be handled, how language should be handled, and other API references in the docs. | ||
|
|
||
| Since we have end-to-end testing support in place, we sent the ElevenLabs API key as credentials (this can be passed in the prompt or through Infisical CLI). With all the documentation, test cases, and API key in place, Devin implemented it in almost one shot, and we safely merged it. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=349s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=349s) | ||
|
|
||
| Example: Mistral Support[[i]](#cmnt9) | ||
|
|
||
| Same story for language models—even easier because there's no WebSocket involved. Since we have infrastructure to support any language provider, Mistral was supported in less than 10 minutes. | ||
|
|
||
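To illustrate why adding a provider can be a small change, here is a minimal sketch of a provider registry. It is not Hyprnote's actual code: the `LanguageProvider` protocol, the `register` decorator, `get_provider`, and the `EchoProvider` class are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, Protocol


class LanguageProvider(Protocol):
    """Hypothetical interface that every language-model provider implements."""

    name: str

    def complete(self, prompt: str) -> str:
        ...


# Registry mapping a provider name to a factory that builds the provider.
_REGISTRY: Dict[str, Callable[[], LanguageProvider]] = {}


def register(name: str):
    """Decorator that adds a provider class to the registry under `name`."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap


def get_provider(name: str) -> LanguageProvider:
    """Look up a provider by name, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


@register("echo")
class EchoProvider:
    """Stand-in provider used only for this example; makes no network calls."""

    name = "echo"

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

With a shape like this, supporting another provider means adding one class and one `@register` line. The rest of the code keeps calling `get_provider(name).complete(...)`, which gives an agent a narrow, well-defined task.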
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=392s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=392s) | ||
|
|
||
| Example: OpenAI Support[[j]](#cmnt10) | ||
|
|
||
| This one was a little harder. We had errors in the client, so we passed Devin the error message and credentials. Because we had API keys and test cases in place, within a few minutes Devin figured out that OpenAI only supports a 24 kHz sample rate, which is why the requests were failing. We fixed it without investing any engineering time of our own. | ||
|
|
||
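A sample-rate mismatch like this is usually resolved by resampling the audio before sending it. The sketch below shows the idea with plain linear interpolation. It is an illustration only: the post does not say what the input rate was or how the fix was implemented, so the 16 kHz source rate and the `resample_linear` helper are assumptions.

```python
from typing import List


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample mono audio by linear interpolation between neighbouring samples."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the final sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Assumed example: 10 ms of 16 kHz audio becomes 10 ms of 24 kHz audio.
chunk_16k = [0.0] * 160
chunk_24k = resample_linear(chunk_16k, 16_000, 24_000)
```

Linear interpolation keeps the example short. A production pipeline would more likely use a dedicated resampling library with proper filtering.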
| The pattern is clear: good docs + clean interfaces + test infrastructure = AI agents that actually ship code. | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=406s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=406s) | ||
|
|
||
| ## Automating Code Maintenance with AI Agents | ||
|
|
||
| If you're a developer, you know that once a codebase reaches a certain size and age, maintenance work alone can take a lot of engineering time and slow the team down. With coding agents, we can offload a lot of that work. | ||
|
|
||
| ### Single-Prompt Migrations[[k]](#cmnt11) | ||
|
|
||
| One common example is doing migrations that have clear documentation. In Hyprnote, we recently completed: | ||
|
|
||
| - AI SDK version 6 migration in a single prompt | ||
| - Tailwind v3 to v4 migration in a single prompt | ||
|
|
||
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=447s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=447s) | ||
|
|
||
| ### Concurrent Multi-PR Migrations[[l]](#cmnt12) | ||
|
|
||
| Things can get more complicated and may require multiple PRs or concurrent work. | ||
|
|
||
| A good example is applying Vercel's recent React best practices agent skills. We attached Vercel's React best practices document, and Devin figured out which changes to make. Since much of the work could be isolated, we prompted Devin to split it across concurrent Devin sessions. | ||
|
|
||
| One way to do this is to ask Devin to make actual API calls. But there's a better way: use the analyze-session task. This lets you spawn multiple Devin sessions that work in parallel and generate a separate PR per task. | ||
|
|
||
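The fan-out idea behind this, splitting isolatable work into independent tasks and running them in parallel, can be sketched as follows. This does not use the Devin API or the analyze-session task. `group_by_rule`, `run_concurrently`, and `fake_spawn_session` are hypothetical helpers, and the finding records are made-up illustrative data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def group_by_rule(findings: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group files by the guideline they violate so each group can become its own PR."""
    groups: Dict[str, List[str]] = {}
    for finding in findings:
        groups.setdefault(finding["rule"], []).append(finding["file"])
    return groups


def run_concurrently(
    groups: Dict[str, List[str]],
    spawn_session: Callable[[str, List[str]], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Start one session per group in parallel and collect the results keyed by rule."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            rule: pool.submit(spawn_session, rule, files)
            for rule, files in groups.items()
        }
        return {rule: future.result() for rule, future in futures.items()}


def fake_spawn_session(rule: str, files: List[str]) -> str:
    """Hypothetical stand-in for starting an agent session; returns a PR title."""
    return f"Apply {rule} to {len(files)} file(s)"
```

Grouping by rule rather than by file keeps each PR focused on one kind of change, which tends to make review easier. Other groupings may suit other codebases.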
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=463s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=463s) | ||
|
|
||
| ### Daily Automated Linting[[m]](#cmnt13) | ||
|
|
||
| This is all very useful, but we're not doing migrations or receiving new guidelines every day. However, if you pair an agent with an automated linting tool, this approach can be applied daily. | ||
|
|
||
| In Hyprnote, we have a large Rust codebase, and since Cargo Clippy is pretty good, we set up a GitHub Action to run Cargo Clippy daily and spawn Devin to apply any fixes based on the output. | ||
|
|
||
| Since Clippy and cargo check take a long time to run, we save a lot of time by applying these guidelines and Clippy fixes asynchronously. | ||
|
|
||
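One piece of such a pipeline is turning Clippy's output into a task description for the agent. The sketch below covers only that step, assuming the job runs `cargo clippy --message-format=json` and feeds the output lines to the script. The `collect_warnings` and `build_prompt` helpers and the prompt wording are hypothetical. The post does not show the workflow file or how the Devin session is started, so neither appears here.

```python
import json
from typing import Dict, Iterable, List


def collect_warnings(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group warnings by file from `cargo clippy --message-format=json` output."""
    by_file: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON record
        record = json.loads(line)
        if record.get("reason") != "compiler-message":
            continue  # ignore build artifacts, build-script output, etc.
        msg = record["message"]
        if msg.get("level") != "warning":
            continue
        primary = next((s for s in msg.get("spans", []) if s.get("is_primary")), None)
        if primary is None:
            continue  # summary messages carry no source location
        lint = (msg.get("code") or {}).get("code", "unknown")
        entry = f"line {primary['line_start']}: [{lint}] {msg['message']}"
        by_file.setdefault(primary["file_name"], []).append(entry)
    return by_file


def build_prompt(by_file: Dict[str, List[str]]) -> str:
    """Render the grouped warnings as a plain-text task description."""
    parts = ["Fix the following Clippy warnings without changing behavior:"]
    for file_name in sorted(by_file):
        parts.append(f"\n{file_name}")
        parts.extend(f"  - {entry}" for entry in by_file[file_name])
    return "\n".join(parts)
```

Grouping by file keeps the prompt compact and lets the agent work through one file at a time.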
| → Watch in video: [https://www.youtube.com/watch?v=UojsNSbhm6o&t=528s](https://www.youtube.com/watch?v=UojsNSbhm6o&t=528s) | ||
|
|
||
| ## Final Verdict: Is Devin AI Worth It? | ||
|
|
||
| If you're expecting AI agents to replace developers, you'll be disappointed. We're not there yet. | ||
|
|
||
| But if you're looking to meaningfully extend what a small team can accomplish? Absolutely worth it. | ||
|
|
||
| Devin AI is worth the investment when you: | ||
|
|
||
| - Have well-documented code with clean interfaces and test coverage | ||
| - Need to explore features before committing engineering time | ||
| - Want to offload maintenance work (migrations, linting, updates) | ||
| - Have non-technical team members who need to ship small changes | ||
| - Run concurrent work that would otherwise bottleneck your team | ||
|
|
||
| Devin AI is NOT worth it if you: | ||
|
|
||
| - Have poorly documented, tightly coupled code | ||
| - Expect it to understand context without clear instructions | ||
| - Want it to make architectural decisions | ||
| - Need it to work in complete isolation without human oversight | ||
|
|
||
| The key insight after 1,000 tasks: You're not buying code generation. You're buying collaboration at scale. | ||
|
|
||
| The best ROI came from tasks we could delegate async (exploration work at 2 AM, maintenance work during travel, migrations while focusing on core features). The agent didn't replace our judgment; it multiplied our capacity to act on it. | ||
|
|
||
| Our recommendation: Start with one well-defined use case (like automated linting or simple migrations), measure the time saved, then expand. Don't try to use it for everything on day one. | ||
|
|
||
| [[a]](#cmnt_ref1)embed this video here: [https://www.youtube.com/watch?v=UojsNSbhm6o](https://www.youtube.com/watch?v=UojsNSbhm6o) | ||
|
|
||
| [[b]](#cmnt_ref2)screenshot 1 | ||
|
|
||
| [[c]](#cmnt_ref3)screenshot 2 | ||
|
|
||
| [[d]](#cmnt_ref4)screenshot 3 | ||
|
|
||
| [[e]](#cmnt_ref5)screenshot 4 | ||
|
|
||
| [[f]](#cmnt_ref6)screenshot 5 | ||
|
|
||
| [[g]](#cmnt_ref7)screenshot 6 | ||
|
|
||
| [[h]](#cmnt_ref8)screenshot 7 | ||
|
|
||
| [[i]](#cmnt_ref9)screenshot 8 | ||
|
|
||
| [[j]](#cmnt_ref10)screenshot 9 | ||
|
|
||
| [[k]](#cmnt_ref11)screenshot 10 | ||
|
|
||
| [[l]](#cmnt_ref12)screenshot 11 | ||
|
|
||
| [[m]](#cmnt_ref13)screenshot 12 | ||
Malformed JSX syntax will cause MDX parsing to fail. The opening fragment `<>` has no closing tag `</>`, which will break page rendering in production.
Fix: close the fragment, or remove it entirely if not needed.
Spotted by Graphite Agent