How to choose a database
I asked developers on Twitter to share how they chose a database for their production system. Read their advice, along with other interesting news for SaaS developers.
It is rare to encounter a SaaS developer who has never had to pick a database. So rare that when I asked developers on Twitter “What was the last database that you picked, and how did you make this choice?” only three developers mentioned that they never had. One of them never did because he is famous for rewriting MySQL for Facebook scale.
What do all the other developers do? They use Postgres, mostly.
I got 60+ responses, of which 22 picked Postgres. The runner-up was DynamoDB with 6 responses. Then there was an extremely long tail of 27 databases with 1-2 responses each.
Why Postgres? SaaS Developer Community member, Lalit Pagaria, summed it up nicely:
In short, if you want to do mostly normal DB things and do them reasonably well - it is hard to go wrong with Postgres.
Why DynamoDB? Because it is serverless, highly available, and plays well with the AWS ecosystem.
What are the main criteria for choosing a database? This time there wasn’t a “long tail” but rather just 3 main criteria that everyone used to choose their database:
Operational reasons: This category includes easy maintenance, reliability, good tooling and good documentation. I also included familiarity in this category, with the assumption that developers pick a familiar DB because it is easier to operate.
Specific capabilities: This category covers a wide range of less common requirements that very few databases include, and therefore drive the decision to pick a specific database - multi-region active-active, continuous queries, time-series, analytics, CQRS, FaaS support.
Performance: Speed and, to a lesser extent, scale were mentioned almost as often as operations and special features. No one wants a slow DB, and databases often compete by publishing benchmarks.
It is worth mentioning one criterion that didn’t have much influence on the decision:
Cost. Two tweets mentioned that they picked BigQuery because it is cheap. Several more mentioned that they use DynamoDB or Firebase “even though it is expensive at scale”. Overall, I got the impression that cost is a minor factor when picking a DB. Perhaps because self-hosted Postgres is perceived as “free”.
That's it. Picking a DB is easy.
The usual bi-weekly news for SaaS Developers
Picking a DB is easy; designing a schema is (much) harder. There are a few schemas that every SaaS developer needs to design, usually more than once: a user identity model, a tenant model, a customer/account/organization model, a configuration model and, of course, a billing model.
Hacker News had a great discussion on designing a schema for usage-based billing. Lots of thoughts from people who have done this before and from those who have not.
The top advice is:
Design for the ability to audit, especially so you can answer the inevitable customer questions about changes to their monthly bill.
This means collecting raw usage metrics (throughput, storage size) and events (cluster created, job paused), as well as any events that change the billing itself (rate updates, custom rates). Retain the raw data while creating aggregated versions of it for billing.
The reverse approach, using mutable tables as the source and change data capture (with a tool like Debezium) to maintain a log of change events, is also doable, at least in some cases.
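To make the first approach concrete (keep append-only raw events and derive the bill from them), here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and helper functions are all hypothetical and are not taken from the Hacker News discussion. The point is only that when usage and rate changes are both stored as events, any past bill can be recomputed and explained.

```python
import sqlite3


def build_billing_db() -> sqlite3.Connection:
    """Create an in-memory database with two append-only tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Raw usage: rows are only ever appended, never updated or deleted.
        CREATE TABLE usage_events (
            id          INTEGER PRIMARY KEY,
            tenant_id   TEXT NOT NULL,
            metric      TEXT NOT NULL,   -- e.g. 'storage_gb'
            quantity    REAL NOT NULL,
            occurred_at TEXT NOT NULL    -- ISO-8601 timestamp, sorts as text
        );
        -- Changes to billing itself (rate updates, custom rates) are events too.
        CREATE TABLE rate_events (
            id               INTEGER PRIMARY KEY,
            tenant_id        TEXT NOT NULL,
            metric           TEXT NOT NULL,
            unit_price_cents INTEGER NOT NULL,
            effective_from   TEXT NOT NULL
        );
        """
    )
    return conn


def record_usage(conn, tenant_id, metric, quantity, occurred_at):
    """Append one raw usage event."""
    conn.execute(
        "INSERT INTO usage_events (tenant_id, metric, quantity, occurred_at) "
        "VALUES (?, ?, ?, ?)",
        (tenant_id, metric, quantity, occurred_at),
    )


def record_rate(conn, tenant_id, metric, unit_price_cents, effective_from):
    """Append one rate-change event."""
    conn.execute(
        "INSERT INTO rate_events (tenant_id, metric, unit_price_cents, effective_from) "
        "VALUES (?, ?, ?, ?)",
        (tenant_id, metric, unit_price_cents, effective_from),
    )


def monthly_bill_cents(conn, tenant_id, month):
    """Recompute the bill for a 'YYYY-MM' month from raw events.

    Each usage event is priced at the latest rate that was in effect
    when the usage occurred, so the result is reproducible for audits.
    """
    row = conn.execute(
        """
        SELECT COALESCE(SUM(u.quantity * (
            SELECT r.unit_price_cents
            FROM rate_events r
            WHERE r.tenant_id = u.tenant_id
              AND r.metric = u.metric
              AND r.effective_from <= u.occurred_at
            ORDER BY r.effective_from DESC
            LIMIT 1
        )), 0)
        FROM usage_events u
        WHERE u.tenant_id = ? AND substr(u.occurred_at, 1, 7) = ?
        """,
        (tenant_id, month),
    ).fetchone()
    return round(row[0])
```

In a real system the aggregate would probably be materialized into its own table for invoicing, while the raw tables stay untouched as the audit trail.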
Billing may start simple, but it gets more complex with pre-paid credits, different types of discounts, etc. You probably don’t want to build everything yourself. Look for an OSS solution or a service.
Should a SaaS Startup have a bug bounty program? Our community is split on the topic, and some prefer professional pen-testers.
Security expert Alec Muffett explained the topic in detail. In his blog he shares some of the common pitfalls of bug bounty programs and gives advice to startups on how to approach this aspect of security.
Over at the SaaS Developer Slack, our friend Felix GV shared a link to a blog post on a cutting-edge idea: a Database Oriented Operating System.
We're proposing a database-oriented operating system (DBOS): a new operating system that natively supports large-scale distributed applications in the cloud.
We believe the next generation of operating systems should be database-oriented because databases are built to solve the hard problems of modern computing. Databases today can manage petabytes of data, are distributed and increasingly cloud-native, and can secure and govern data with fine-grained access control and provenance tracking.
Our prototypes have convinced us that DBOS is practical, so we are now planning the next phase of the project: implementing a complete database-oriented development stack for distributed applications.
Our community members compared DBOS to other ideas that were steps in that direction, and wondered where this is all leading.