Building a revenue-generating SaaS now takes hours, not weeks. That changes the earliest phase of startups.
After quitting my Y Combinator-backed startup, I gave myself a challenge: how fast can I turn an idea into a product and get someone to pay me for it?
Coming up with an idea was the hardest part. Given that my goal is to earn revenue quickly, taking a “solution” and trying to find a customer is not a good approach. Instead, the customer and their problem must come first. [1]
Admittedly, it took me a few weeks of aimlessly learning about topics I found interesting before I came across one worth working on. At some point during this process, I stumbled upon Dia-1.6B, a new text-to-speech (TTS) model that supports voice cloning. Could I build a product around it? [2] After searching the internet, I found that plenty of existing products already do the same thing, and I didn’t see what I had to offer. I was also concerned about copyright issues.
While learning more about the TTS model space, I saw a post on X from Martin, a developer who wanted a TTS API specifically for an online survival game he was building. That intrigued me, so I slid into his DMs to learn more about his use case.
Imagine playing a game where every NPC has a deep backstory and a unique personality. Or every player has their own Jarvis/AI assistant to answer questions and provide useful tips. That’s something I think should exist.
So given that:
- Someone else has this specific need,
- I think this would be awesome,
I want to do this.
Day 1 (June 7)
Martin and I discussed the current state of TTS products. Notably, many companies are using this technology for automated customer support. [3] ElevenLabs appears to have the best general-purpose TTS API offering. Their API is nice, but the pricing is ambiguous. Paying for credits upfront and being charged for overages seems to be a turn-off for game developers. Additionally, hyper-realistic voices aren’t necessary for most games, especially if building an in-game AI assistant.
With these ideas in mind, I wanted to put together a landing page that communicates my idea of what a good product in this space would look like. With one prompt, DeepSite generated the raw HTML for a nice landing page. Most of the design and style seen on the currently deployed landing page was generated by DeepSite using the following prompt.
I had some idea of how to build and deploy a product fast, but I used o3 with Deep Research to sketch out a plan with the full tech stack to get another data point.
I received some follow-up questions, including “Should the TTS demo use your own model or can it rely on existing APIs?” I decided to use the ElevenLabs API for the demo (and eventually the v1 product). There’s no sense in spending months developing a custom model before I know whether people are interested in this, let alone willing to pay for it.
o3 outlined a plan and a long justification. I then opened Cursor, started a new project, and added only the raw HTML from DeepSite to index.html. I added this file to the Agent Mode context with Claude 4 and gave the following prompt:
In a few minutes, the landing page was mostly implemented. It took some extra prompting to update it with my own branding, fix fonts, and tweak the copy. [4] The generated README included simple instructions for deploying the project to Vercel. [5] I sent the landing page to Martin and waited for his feedback.
Without these tools, the same work would have taken me up to a week to complete. The bulk of that time would be spent on design, where my skills are most lacking. DeepSite did this better than I could have done, with one quick prompt. The generated text content was also nicely written and saved me time.
Day 3 (June 9)
Monday morning, I woke up to a DM from Martin. The landing page was exactly what he was looking for, and he signed up for the waitlist. I started building an API he could actually use. I also learned more useful information from Martin:
- His use case involves each player having their own AI assistant. He uses another LLM to maintain the context of the user’s recent actions and respond to their questions on the fly.
- Voice streaming is important. Waiting for my API to generate the entire audio clip and return it to the client is not feasible. This adds tremendous latency. Time to First Byte (TTFB) is the metric to optimize for.
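To make the TTFB point concrete, here is a minimal, self-contained Python sketch that simulates an upstream TTS service emitting audio in chunks, then compares a buffered relay (wait for the whole clip) with a streaming relay (forward each chunk as it arrives). The fake upstream, its chunk count, and its delays are made up for illustration; only the shape of the result matters.

```python
import time
from typing import Callable, Iterable, Iterator


def fake_tts_upstream(n_chunks: int = 5, delay_s: float = 0.02) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS service that emits audio chunk by chunk."""
    for i in range(n_chunks):
        time.sleep(delay_s)  # simulated generation time per chunk
        yield f"chunk-{i}".encode()


def buffered_relay(upstream: Iterable[bytes]) -> Iterator[bytes]:
    # Collects the entire clip before sending anything to the client.
    yield b"".join(upstream)


def streaming_relay(upstream: Iterable[bytes]) -> Iterator[bytes]:
    # Forwards each chunk the moment it arrives.
    for chunk in upstream:
        yield chunk


def measure(relay: Callable[[Iterable[bytes]], Iterator[bytes]]) -> tuple[float, float]:
    """Return (time to first byte, total time) in seconds for one simulated request."""
    start = time.perf_counter()
    ttfb = None
    for chunk in relay(fake_tts_upstream()):
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
    total = time.perf_counter() - start
    return ttfb, total


if __name__ == "__main__":
    for name, relay in [("buffered", buffered_relay), ("streaming", streaming_relay)]:
        ttfb, total = measure(relay)
        print(f"{name:9s} TTFB={ttfb * 1000:.0f} ms  total={total * 1000:.0f} ms")
```

Both relays take roughly the same total time; only the time until the first audio reaches the player changes, which is why TTFB rather than total latency is the number to watch.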
To address the first point, I used one of ElevenLabs' built-in robotic voices. As for the second, the ElevenLabs API already supports streaming. I co-opted my landing page’s demo to also serve as an API endpoint. [6] It took some prompting and manual work to wrap ElevenLabs' streaming API with my own Kikashi streaming API: Claude kept generating code that processed the entire request at once while telling me it was streaming chunks. Eventually, it worked.
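Kikashi's actual code isn't shown here, so the following is only a sketch of the pattern that tripped up Claude: read the upstream response incrementally and yield each piece, instead of calling `.read()` once and merely claiming to stream. It uses only the Python standard library. The ElevenLabs endpoint path, the `xi-api-key` header, and the `model_id` value are assumptions to verify against ElevenLabs' API reference; `KIKASHI_API_KEY`, `build_upstream_request`, `relay`, and `handle` are hypothetical names. The single hard-coded key mirrors footnote [6].

```python
import json
import urllib.request
from typing import BinaryIO, Iterator

# Hypothetical: one hard-coded key for the one customer.
KIKASHI_API_KEY = "replace-me"

# Assumption: ElevenLabs' streaming TTS endpoint; check path and body in their docs.
ELEVENLABS_STREAM_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream"


def build_upstream_request(text: str, voice_id: str, elevenlabs_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request to the upstream streaming endpoint."""
    body = json.dumps({"text": text, "model_id": "eleven_multilingual_v2"}).encode()
    return urllib.request.Request(
        ELEVENLABS_STREAM_URL.format(voice_id=voice_id),
        data=body,
        headers={"xi-api-key": elevenlabs_key, "Content-Type": "application/json"},
        method="POST",
    )


def relay(upstream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio piece by piece instead of buffering the whole response."""
    while True:
        chunk = upstream.read(chunk_size)  # b"" once the upstream is exhausted
        if not chunk:
            break
        yield chunk


def _stream(req: urllib.request.Request) -> Iterator[bytes]:
    # The network call only happens once the caller starts iterating.
    with urllib.request.urlopen(req) as resp:
        yield from relay(resp)


def handle(client_key: str, text: str, voice_id: str, elevenlabs_key: str) -> Iterator[bytes]:
    """Reject bad keys immediately, then return a generator that streams upstream audio."""
    if client_key != KIKASHI_API_KEY:
        raise PermissionError("invalid API key")
    return _stream(build_upstream_request(text, voice_id, elevenlabs_key))
```

Keeping the key check outside the generator means an invalid key fails before any upstream request is made, while valid requests start forwarding bytes as soon as the first chunk arrives.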
After letting Martin try out the API, I got more feedback: he wanted a specific voice. He sent me 20 example audio files, which ElevenLabs let me easily clone into a voice and add to my API.
First Revenue (June 10)
Martin is happy with the API. Some latency issues could be improved, but it’s already good enough.
“Will you pay me for this?”
“I’d love to.”
I sent a payment link for $50/month, and Martin paid. [7] Choosing this price was somewhat arbitrary. I chose a fixed pricing model to ease concerns about ambiguous usage-based pricing, and I set the price above what I’m currently paying ElevenLabs. Realistically, I expect two pricing tiers: one for the development phase and another for production. I came up with that on my own, without talking to any developers or potential customers. The most important thing to keep in mind is that pricing is difficult. I don’t know the answer, but I will keep experimenting to see what my customers are willing to pay.
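The appeal of a flat fee is predictability. This small Python sketch compares a flat monthly price with a plan that bundles prepaid credits and bills overages, the model game developers disliked on Day 1. Apart from the $50 flat price, every number (plan price, included characters, overage rate) is a made-up placeholder, not ElevenLabs' or anyone else's actual rates.

```python
def flat_bill(chars_used: int, monthly_price: float = 50.0) -> float:
    """Flat pricing: the bill does not depend on usage."""
    return monthly_price


def credits_with_overage_bill(
    chars_used: int,
    base_price: float = 22.0,        # hypothetical prepaid plan price
    included_chars: int = 100_000,   # hypothetical characters covered by the credits
    overage_per_1k: float = 0.30,    # hypothetical charge per 1,000 extra characters
) -> float:
    """Prepaid credits plus overage: the bill grows once usage exceeds the credits."""
    extra_chars = max(0, chars_used - included_chars)
    return base_price + extra_chars / 1000 * overage_per_1k


for usage in (50_000, 100_000, 200_000, 500_000):
    print(
        f"{usage:>7,} chars  flat=${flat_bill(usage):.2f}  "
        f"credits+overage=${credits_with_overage_bill(usage):.2f}"
    )
```

With these placeholder rates, the credits plan is cheaper for light use and much more expensive for heavy use; the flat price gives up some of that upside in exchange for a bill the developer can predict before shipping.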
Reflections
- Could I have done this faster? Definitely. Much of the time I spent, not detailed in this post, was talking to more indie game devs. That didn’t help me get revenue quickly, but it did point me toward what I need to work on next. The point of this exercise was not simply to get paid quickly, but to do so in a realistic manner that allows me to turn Kikashi into something more.
- I made an API wrapper. So what? From talking to more game devs, I can see that the major value proposition does not lie strictly in TTS, but rather in joining an agent/brain function with text-to-speech capability. Working with the ElevenLabs API provided me with a quick way to provide minor value and get feedback.
- AI tools enable fast, high-quality code generation. For many companies, the bottleneck is no longer building the product, but knowing what to build, going to market, and growing.
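The “agent/brain plus TTS” idea, in the shape of Martin’s per-player assistant, can be sketched roughly as follows: keep a bounded log of the player’s recent actions, fold it into a prompt alongside the player’s question, and hand the model’s reply to TTS one sentence at a time so audio can start before the whole reply is ready. This is a hypothetical illustration, not Martin’s or Kikashi’s code: `RecentActions`, `build_prompt`, and `sentences_for_tts` are made-up names, the LLM and TTS calls are left out, and the regex sentence splitter is deliberately naive.

```python
import re
from collections import deque


class RecentActions:
    """Keeps only the last `limit` player actions as context (hypothetical helper)."""

    def __init__(self, limit: int = 5) -> None:
        self._actions: deque[str] = deque(maxlen=limit)

    def record(self, action: str) -> None:
        self._actions.append(action)  # the oldest action is dropped once the deque is full

    def build_prompt(self, question: str) -> str:
        history = "\n".join(f"- {a}" for a in self._actions)
        return f"Recent player actions:\n{history}\n\nPlayer asks: {question}\nAnswer briefly."


def sentences_for_tts(reply: str) -> list[str]:
    """Naively split a reply into sentences so each can go to TTS as soon as it exists."""
    parts = re.split(r"(?<=[.!?])\s+", reply.strip())
    return [p for p in parts if p]
```

In a real pipeline, each sentence would be sent to a streaming TTS call as soon as the LLM finished producing it; the splitter works on a complete string here only to keep the example short.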
What’s Next?
More broadly, I think it is worth asking how viable this is as a business. Learning more about the gaming industry gives rise to some concerns.
- Game dev is a hits-based business. As in Hollywood, a few very large studios make most of the money. Some indie game devs I talked to have been working on the same game for 4+ years without seeing revenue.
- The most common monetization strategy is “buy-to-play,” where the developer sells their game for a set price and that is it. Popular games might extend this later on with DLCs or microtransactions. The important implication here is that the ongoing cost of maintenance (paying for servers & APIs!) becomes harder to justify for a lot of games.
- Vendor lock-in. It’s a scary idea for indie developers to build a game around a third-party API. This is especially true if that API becomes a core part of the game. Open sourcing parts of the product could help alleviate some of the pain.
- Latency is critical, particularly for AI features; it makes the difference between an awesome implementation and a terrible one. I think this can be solved. If companies are already replacing customer support employees with AI, we can probably reduce latency enough to support gaming.
One company whose business I find interesting is Clockwork Labs, the creators of SpacetimeDB. Martin’s game is built on top of this DB/Server product. It is open source, well-funded (presumably), and generates revenue from hosting the service. This appears to provide a counter-example to most of my concerns.
Notes
[1] Solving a specific problem for a specific person is generally a better way to go about deciding what to work on. If you solve the problem well enough, you automatically have your first customer.
[2] It made me think of my current favourite Instagram account @fullstackpeter, which involves a TTS model generating speech that sounds like Peter and Stewie from Family Guy.
[3] Some great examples are Phonely and Bland.
[4] I still had to update some environment variables with my Supabase, ElevenLabs, and Google Analytics keys. The worst part was figuring out how to create a new “property” in GA. Lesson learned: always use PostHog for analytics.
[5] I used a domain I’ve been sitting on for a while because I thought it sounded cool. I have too many domain names.
[6] With a single hard-coded API key.
[7] First I tried signing up for PayPal and creating a subscription through them. Apparently, the business needs to be approved before that works. Instead, I signed up for Stripe as a sole proprietor and set up the payment link. The process with Stripe took about five minutes. Again, lesson learned.