I recently used OpenAI’s o3-mini-high model to generate a small web application, which I have published on GitHub.
My use case was simple: I wanted a way to send ChatGPT conversations and research sessions to my e-reader so I could read them while avoiding the blue light of screens. I use the application Pocket to send web pages to my e-reader, but ChatGPT conversations sit behind a login wall and therefore cannot be saved via the usual Pocket web extension.
First, I asked the model for different approaches to solving this problem, and it gave me several options, including one where I, as the user, would select text on a page and click a bookmarklet that sends the selection to an API, which saves it in a database. This is the option that seemed best, so I stuck with it.
Next, I wrote a prompt that described the above solution while requiring specific technologies: TypeScript, Node.js, and PostgreSQL. I chose these because they are technologies I’m proficient with, and I wanted to be able to check the output and the quality of any generated code.
I also wrote basic requirements for what the application should do: receive a text snippet via an API, save it to a database, and display any saved snippet as a standalone page.
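To make that concrete, here is roughly the shape I was aiming for (a minimal sketch using Express and node-postgres; the route and table names are illustrative, not the model’s actual output):

```typescript
import express from "express";
import { Pool } from "pg";

const app = express();
const pool = new Pool(); // connection settings come from the PG* environment variables

app.use(express.json());

// Receive a text snippet via the API and save it to the database.
app.post("/api/snippets", async (req, res) => {
  const { title, content } = req.body;
  const result = await pool.query(
    "INSERT INTO snippets (title, content) VALUES ($1, $2) RETURNING id",
    [title, content]
  );
  res.status(201).json({ id: result.rows[0].id });
});

// Display a saved snippet as a standalone page.
app.get("/snippets/:id", async (req, res) => {
  const result = await pool.query(
    "SELECT title, content FROM snippets WHERE id = $1",
    [req.params.id]
  );
  if (result.rows.length === 0) {
    res.status(404).send("Not found");
    return;
  }
  const { title, content } = result.rows[0];
  res.send(`<html><body><h1>${title}</h1><div>${content}</div></body></html>`);
});

app.listen(3000);
```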
I then fed the prompt into o3-mini-high, and it generated code that didn’t work when I first tried to run it. I simply pasted the error message back into the model, which identified the issue and corrected the code. The code then failed with another error, and I repeated the process twice more (three times in total) until it finally ran. Some of the issues were related to database access grants that needed to be run against the database as a one-time preparation step.
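That one-time preparation boiled down to a handful of GRANT statements run with an admin connection. The exact statements depend on your role and table names; they were something along these lines (all names here are hypothetical):

```typescript
// one-time-setup.ts — run once with an admin connection; role and database names are hypothetical
import { Client } from "pg";

async function grantAccess(): Promise<void> {
  const client = new Client(); // connection settings come from the PG* environment variables
  await client.connect();
  // Let the application role connect and use the schema...
  await client.query("GRANT CONNECT ON DATABASE snippets_db TO snippets_app");
  await client.query("GRANT USAGE ON SCHEMA public TO snippets_app");
  // ...and read, insert, and delete the rows it needs.
  await client.query(
    "GRANT SELECT, INSERT, DELETE ON ALL TABLES IN SCHEMA public TO snippets_app"
  );
  await client.query(
    "GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO snippets_app"
  );
  await client.end();
}

grantAccess().catch((err) => {
  console.error(err);
  process.exit(1);
});
```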
To my surprise, the resulting code was simple and well structured. It also contained comments explaining what the different functions were doing. Oh, and the model also generated the JavaScript code for the bookmarklet, and it worked on the first try.
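I won’t reproduce the model’s output here, but the general shape of such a bookmarklet looks like this (illustrative only, and shown in its eventual POST-based form; as I describe below, the first version used GET). You minify the snippet, prefix it with javascript:, and save it as a bookmark:

```typescript
// Grab the current text selection and send it to the snippet API (the URL is a placeholder).
(() => {
  const selection = String(window.getSelection());
  if (!selection) {
    alert("Select some text first");
    return;
  }
  fetch("https://my-server.example/api/snippets", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: document.title, content: selection }),
  })
    .then(() => alert("Snippet saved"))
    .catch(() => alert("Saving failed"));
})();
```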
I had no clue that what I had just done was part of a trend called “vibe coding”; I only discovered that later, when the term started popping up in my feeds on various social media platforms.
Iterative Development
From that point, I decided to add a few other features:
- HTML formatting (the text was initially copied without formatting, which meant all hierarchical structure was lost; I asked the model to preserve any HTML formatting).
- A list page showing all saved snippets.
- Option to delete a saved snippet from the list page (the list and delete routes are sketched right after this list).
- Option to add a snippet and its title manually via an “add snippet” page.
- Support for very large snippets (the model’s initial solution sent the text in a GET request’s query string, which breaks for long texts because of URL length limits, so I asked for it to be swapped for a POST with the text in the request body).
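The list and delete features boil down to two more routes on the same server. Again, this is just a sketch with illustrative names, not the model’s exact code (it continues the earlier sketch, so app and pool refer to the same Express app and connection pool):

```typescript
// Display all saved snippets as a simple list, each with a delete button.
app.get("/snippets", async (_req, res) => {
  const result = await pool.query("SELECT id, title FROM snippets ORDER BY id DESC");
  const items = result.rows
    .map(
      (row) =>
        `<li><a href="/snippets/${row.id}">${row.title}</a>
           <button onclick="removeSnippet(${row.id})">Delete</button></li>`
    )
    .join("");
  res.send(`<html><body><ul>${items}</ul>
    <script>
      function removeSnippet(id) {
        fetch("/api/snippets/" + id, { method: "DELETE" }).then(() => location.reload());
      }
    </script></body></html>`);
});

// Delete a saved snippet by id.
app.delete("/api/snippets/:id", async (req, res) => {
  await pool.query("DELETE FROM snippets WHERE id = $1", [req.params.id]);
  res.status(204).end();
});
```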
I worked on these extra features iteratively over a couple of sessions. Each time, I would re-paste the entire source code into o3-mini-high to make sure it was still within the model’s context window (the code falls out of context if you ask the model many successive questions).
Some of the answers contained only particular endpoints, and I had to add or replace that code in my source file while being careful not to break anything else. This wasn’t great and still needs to be improved.
The experience was overall very pleasant, and it felt like a huge time saver. I could certainly have developed this small application myself, but it would have taken me a lot longer to work through every compilation error, figure out which database queries to run to grant access the right way, and so on.
Reflection on Trust and Limitations
- My use case was very simple: a basic web application with a few endpoints. I would assume that anything more mission-critical or involving more complex computation would be problematic, unless you know exactly which edge cases to test and can make the model generate those tests for you. If not, how would you trust the code?
- It was very important for me to pick technologies I understand, so I could read and explain the code and ensure its quality. With my use case, I need to be sure that I’m not exposing my data in unwanted ways and that the application isn’t going to delete all of my saved data due to a bug. I would absolutely not trust the model to create an application with technologies I do not understand.
- My use case is simple and can run on a small server used by only me, a single user. If this were a more complex application, or anything with data-privacy implications, I don’t know how well LLMs would do at ensuring everything is secure.
- My codebase for this application is very small. I don’t know how an LLM would do with large codebases (I don’t have that experience yet), or how it would ensure consistency of style and best practices across an entire codebase (then again, large teams of developers usually suck at that anyway, so maybe it’s not that big of a problem).
- The whole user experience of pasting code into a conversation UI and copy-pasting the resulting code back into a source file was clumsy and cumbersome. ChatGPT’s conversation UI is still a generic tool; I have no doubt this will improve, and there are already specialized IDEs like Cursor solving the interaction problem for the programming use case.
- Ideally, I want the UX to be a verbal conversation with the code, with me selecting parts of the source and asking for additions or alterations. And I want the inner dev loop managed by tooling for me, so I don’t have to copy-paste code around, restart my Node.js server, and so on.
Future Outlook
Two of my points above revolve around trust. The use of an LLM here assumes that I know enough about the technologies used, and that I could do the work myself if needed, so that I can verify the validity, security, and all other critical aspects of the solution. Now, this was a very simple web app; imagine if it had been anything production-critical or anything involving human safety. No auditor in a regulated industry would allow a company to run code that cannot be trusted to a demonstrable level.
This is how I believe AI will be used at scale for programming: LLMs will not replace all engineers, just as pocket calculators didn’t replace engineers. Like any tool, LLMs will augment engineers.
But I see a caveat here: while I do believe that LLMs will not replace all engineers, they are going to replace some of them. The rate of replacement will be highest in organizations with simple use cases, for example frontend feature development: people who attended a coding bootcamp to learn React.js, and who can do React.js and nothing else, are the first who are going to be fully replaced. Teams made up of bootcamp folks will be replaced by five engineers and an LLM. The LLM will spit out code at the speed of light, and the five engineers will spend 90% of their time reviewing the LLM’s code.
On the other hand, industries with a high degree of risk and heavy regulation will keep a larger number of software engineers on payroll (though headcount will still decrease), and each of them will be augmented with LLMs.
The main takeaway is a question: how does someone quickly retool themselves to remain competitive in the new reality of fewer tech jobs, when those jobs also happen to be more technically demanding?