TL;DR: I switched a Dart API from Cloud Run to Railway for a cold start that is roughly 3.6x faster, simpler DevOps, and a straightforward fee structure.
Problem
I'm working on github.com/daohoangson/flutter_widget_from_html, a pub.dev package that's super handy for Flutter developers who want to seamlessly render HTML in their apps.
Now, when it comes to HTML, it can get pretty dynamic, right? That's why having a playground to showcase features, troubleshoot issues, and tackle bugs is crucial. The Google team has this fantastic tool called dartpad.dev, which is just perfect for this kind of thing. However, there's a little catch - third-party packages like mine usually can't be used there (unless you have thousands of likes, as explained on Medium).
So I decided to take matters into my own hands: I forked DartPad, then deployed try.fwfh.dev with additional package support.
- Initial idea since 2019
- First deployment in 2021
Cloud Run Deployment
Back in 2021, the code was split into two repositories: frontend and backend. I deployed each of them as a Cloud Run service in Google Cloud Platform. For some unknown reason, I couldn't use domain mapping, so a load balancer was required to serve try.fwfh.dev. There wasn't much traffic, so the computing cost was minimal. Still, the bill came to around $20 per month because of the load balancer. This is pain point number 1.
Fast forward to June 2023, and the two repositories were merged into one. I noticed that it was now possible to deploy the frontend to Firebase Hosting, which is cheaper and offers better performance. However, the backend remained on Cloud Run and continued to suffer from an incredibly slow cold start. This has been a persistent issue since the beginning: pain point number 2.
I went in search of a better solution and decided to explore Railway. My goals were twofold:
- Achieve a faster cold start time
- Keep the cost reasonably low, ideally under $10 per month
Railway Experiment
Getting started is a breeze since everything's already containerized. I opted for the most budget-friendly plan, priced at just $5 per month.
Initially, I migrated both the frontend and backend, but it became evident that containerized NGINX couldn't outperform static hosting, so the frontend stuck with Firebase and I only moved the backend. In the final PR, there's just one file.
Metrics
I used time and curl to call the endpoints and get some numbers. Railway was faster in all tests, which is quite a surprise.
| | Cloud Run | Railway |
|---|---|---|
| Cold start | 53,214 ms | 14,862 ms |
| Analyze | 830 ms | 420 ms |
| Compile | 5,880 ms | 5,120 ms |
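For anyone who wants to script this kind of measurement rather than run time and curl by hand, here is a minimal Python sketch. It is a stand-in, not the commands I actually ran: `time_request_ms` is a hypothetical helper name, and the URL in the usage note is a placeholder.

```python
import time
import urllib.request


def time_request_ms(url, opener=urllib.request.urlopen, timeout=120):
    """Return the wall-clock time in milliseconds for one full request.

    `opener` is injectable so the helper can be exercised without a network.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read()  # include the body transfer in the measurement
    return (time.perf_counter() - start) * 1000.0
```

Calling `time_request_ms("https://example.com/")` once after the service has been idle approximates a cold start; calling it again right away approximates a warm request.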
Comparison
Given the significant performance boost, I was concerned about the cost. So I deployed a separate pair of services purely to benchmark them against each other:
- Railway: Hobby plan (8 CPU, 8 GB memory)
- Cloud Run: 8 CPU, 8 GB memory, max 1 instance
- Cold starts measured via uptimerobot.com, with monitors on a 1-hour interval
- Daily stress test via loader.io, maintaining 10 clients compiling for 1 minute
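To illustrate what "10 clients for 1 minute" means in practice, here is a rough Python sketch of that load pattern. It is not how loader.io works internally, only an approximation under stated assumptions: `run_load` is a hypothetical helper, and `call` is whatever function issues one compile request.

```python
import threading
import time


def run_load(call, clients=10, duration_s=60.0):
    """Keep `clients` workers calling `call()` back to back for `duration_s`.

    Returns the per-call latencies in milliseconds.
    """
    deadline = time.monotonic() + duration_s
    latencies = []
    lock = threading.Lock()

    def worker():
        # Each worker issues a new call as soon as the previous one returns.
        while time.monotonic() < deadline:
            start = time.perf_counter()
            call()
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            with lock:
                latencies.append(elapsed_ms)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies
```

Averaging the returned list gives a figure comparable in spirit to a compile time measured under load.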
After a week, the results are in:
| | Cloud Run | Railway |
|---|---|---|
| Cold start | 46,875 ms | 12,780 ms |
| Compile | 30,107 ms | 25,314 ms |
| Cost | $0.47 after -$1.21 free tier | $0.65 |
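To put the timings from both tables on the same footing, the speedups can be worked out with a few lines of Python. The numbers are copied from the tables above; the labels and the `speedup` helper are only for illustration.

```python
# Timings in milliseconds, copied from the two tables above:
# (Cloud Run, Railway)
measurements = {
    "cold start (time + curl)": (53_214, 14_862),
    "analyze (time + curl)": (830, 420),
    "compile (time + curl)": (5_880, 5_120),
    "cold start (week-long benchmark)": (46_875, 12_780),
    "compile (week-long benchmark)": (30_107, 25_314),
}


def speedup(cloud_run_ms, railway_ms):
    """How many times faster Railway was than Cloud Run."""
    return cloud_run_ms / railway_ms


for name, (cloud_run_ms, railway_ms) in measurements.items():
    ratio = speedup(cloud_run_ms, railway_ms)
    saved_pct = 100 * (1 - railway_ms / cloud_run_ms)
    print(f"{name}: {ratio:.2f}x faster, {saved_pct:.0f}% less time")
```

The cold start ratio lands between 3.5x and 3.7x in both data sets, while the compile ratio stays below 1.2x, so most of the gain is in cold starts.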
Further cost breakdown:
| | Cloud Run usage | Cloud Run cost | Railway usage | Railway cost |
|---|---|---|---|---|
| CPU core-sec | 45,835 | $1.1 | 4,351 | $0.0336 |
| Memory GB-sec | 45,823 | $0.11 | 159,960 | $0.6173 |
| Traffic GB | 0.41 (North America) + 0.11 (intercontinental) | $0 + $0.01 | 0.01 | $0.0009 |
Some interesting observations:
- Railway seems to put the service to sleep after 40 minutes of inactivity
- Cloud Run categorizes an instance as active when it serves requests, then it goes idle, and eventually terminates it after approximately 15 minutes
- I didn't include storage costs because each provider has a different billing model. Railway charges for disk usage on a per-GB, per-minute basis, while GCP bills for container registry storage
- GCP has a "Networking Traffic Egress GCP Replication" SKU, which cost around $0.39 during the trial period. I incur this cost when Cloud Build pushes the Docker image to the registry, and again when the data is replicated across regions 🤷
- The providers calculate resource usage differently. Cloud Run's CPU and memory numbers are nearly identical at 45k, whereas Railway's memory usage is significantly higher at 160k, with only 4k in CPU usage. For this particular service, we can potentially tweak Cloud Run to use less CPU and save costs, but with Railway, we don't need to worry since they bill based on actual usage, which is convenient.
- GCP introduces additional billing items such as build time and request count, among others, whereas Railway only charges for CPU, memory, disk and bandwidth.
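The difference in how usage is counted shows up clearly when the cost breakdown is turned into an effective price per unit. This is only arithmetic on the rounded figures from my bills (before the free tier), not either provider's published pricing, and the dictionary layout is just for illustration.

```python
# (usage, cost in USD) copied from the cost breakdown table above.
breakdown = {
    "Cloud Run": {
        "CPU core-sec": (45_835, 1.1),
        "Memory GB-sec": (45_823, 0.11),
    },
    "Railway": {
        "CPU core-sec": (4_351, 0.0336),
        "Memory GB-sec": (159_960, 0.6173),
    },
}


def unit_price(usage, cost):
    """Effective price in USD per billed unit."""
    return cost / usage


for provider, resources in breakdown.items():
    for resource, (usage, cost) in resources.items():
        # Scale to a million units so the numbers are readable.
        per_million = unit_price(usage, cost) * 1_000_000
        print(f"{provider} {resource}: ${per_million:.2f} per million units")
```

By this rough measure, a CPU core-second on Cloud Run cost about three times as much as on Railway, while a memory GB-second was somewhat more expensive on Railway. Railway still came out comparable overall because it billed roughly a tenth of the CPU seconds.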
Conclusion
Given the comparable costs and Railway's exceptional performance, it's a clear choice for now. I'll keep a close eye on this and reconsider if the performance ever takes a dip.
Referral link: https://railway.app?referralCode=daohoangson