SQL vs NoSQL Databases | O(1) System Design Interview Guide

|

System design is an important skill that experienced (and even junior) engineers are expected to know. As such, it is not surprising that it is an inevitable part of many technical interviews for many companies, such as forthe software engineering roles at Shopee and Bytedance.

However, unlike online assessment or coding interviews where solutions are assessed against objective criteria like time-space complexity, things are not as straightforward when it comes to system design interviews. Often, there’s no right or wrong answer and your interview performance is assessed based on your thought process behind your proposed solution. (That’s right, you cannot grind Leetcode for it!)

Instead, a better way to prepare for system design interviews is to understand the WHAT and HOW so you are better prepared to back up your answers with the WHY.

Table of Contents:

  • What are SQL Databases and NoSQL Databases
  • Difference between SQL and NoSQL Databases
  • SQL v.s. NoSQL: Which is more scalable? (Scalability)
  • SQL v.s. NoSQL: Which is faster? (Performance)
  • SQL v.s. NoSQL: Which is more secure?
  • Pros and Cons of SQL and NoSQL
  • Use cases for SQL and NoSQL: When to Use Which? (with real-world examples)

What are SQL Databases (Relational Databases)?

When people say “SQL” or “SQL Databases”, they are mostly actually referring to relational databases. Data is organized in tables, where columns represent attributes and rows represent records

SQL stands for Structured Query Language, a language used to interact with a relational database. It can be used to read and write (create, update, delete) data in a Relational Database Management System (RDBMS).

Examples of SQL Databases

  • MySQL
  • PostgreSQL
  • Oracle Database
  • Microsoft SQL Server
  • MariaDB

What are NoSQL Databases (Non-relational Databases)?

Unlike SQL, NoSQL (a.k.a. Not Only SQL, Non-SQL) is a database that manages data in a non-relational structure. This means that it is schemaless (no fixed schema) and avoids joins. Due to its scalable, high performance and flexible nature, Its popularity is rising in Big Data and real-time applications.

4 Types and Examples of NoSQL Databases

1. Key-Value Stores

  • Simplest and easiest to implement
  • Like a hash table, everything is stored in a key-value pair - each unique key points to a value
  • Value can be anything from number, string to JSON and BLOB (Binary Large Objects)
  • Thus, they are highly scalable and can handle high volumes of traffic
  • Ideal for session management e.g. localStorage in the browser, shopping cart.
  • Examples: Redis, Aerospike, Memcached, Riak, Couchbase, Amazon DynamoDB, Oracle Berkeley DB

2. Column Stores

  • Column-based database: Keys stored in columns rather than rows
  • Can process large amounts of data 
  • Great at handling analytical queries - high performance on aggregation queries like SUM, AVG etc.
  • Ideal for Data Warehouses and Business Intelligence (B.I.)
  • Examples: Bigtable, Hadoop, Cassandra, HBase, Hypertable, Druid, Azure SQL Data Warehouse

3. Document-oriented Databases

  • Similar to Key-Values Stores
  • Ideal for storing any type of data, including images, videos and even geospatial data
  • Data stored in JSON - Documents that are collections of other key-value collections
  • Examples: MongoDB, ElasticSearch, CouchDB, CosmosDB, Amazon SimpleDB

4. Graph Databases

  • Based on graph theory – designed to identify connections between data points
  • The relationships are pre-captured, thus allowing for rapid traversal since no calculation is needed
  • Ideal for fraud detection, recommendation engineers, network mapping, social networks etc.
  • Examples: Neo4j, Virtuoso, ArangoDB, OrientDB, GraphDB, Amazon Neptune, Fauna, FlockDB

Difference between SQL and NoSQL Databases

SQL v.s. NoSQL: Which is more scalable? (Scalability)

Scalability is an indicator of how well the system can increase or decrease performance and cost in response to the processing demand. This is not to be confused with performance, which will be covered in the next section.

TL;DR: NoSQL scales better

  • Horizontal scaling is cheaper - Adding more machines is usually easier and cheaper than upgrading the hardware of a single machine (vertical scaling).
  • Both vertical and horizontal scaling is possible for NoSQL - However, horizontal scaling is hard for SQL
  • Technological limit to vertical scaling - It is limited by technology and hardware that’s available

Gergely Orosz shared his experience when building a large payments system at Uber:

We also had engineers on our team who have worked at large payment providers where they tried - and failed - to scale vertically on the largest machines that money could buy at their time.”

Source: Iron Man 2 | Marvel

SQL v.s. NoSQL: Which is faster? (Performance)

Performance is an indicator of how well a software system or component meets its requirements for timeliness, measured by response time or throughput. 

TL;DR: NoSQL is generally faster

  • NoSQL is not ACID - It was designed to be faster by forgoing one of the ACID properties
  • NoSQL stores commonly accessed data together - They have better performance on reads/writes on a single data entity.
  • SQL schemas validate data changes - Additional computational required to ensure data changes do not violate predefined schemas. However, the schemaless nature of NoSQL’s skip this step and thus, have higher write operations per second.
  • SQL databases are normalized - Data is broken down and stored in tables. As such, it is faster than NoSQL for complex queries like joins. “Joins” in NoSQL are done by retrieving multiple data entities and joining them “manually”.
  • Indexes are more efficient for SQL - Indexes work by duplicating existing data and efficiently storing them in the RAM. They work better for structured data and thus, SQL benefits more from it.
Source: Avengers: Infinity War | Marvel

SQL v.s. NoSQL: Which is more secure?

When it comes to comparing the security of the databases, we use 3 factors:

  1. Confidentiality - Only authorized users can access
  2. Integrity - Accuracy and Consistency of the data
  3. Availability - Data is available when needed

TL;DR: SQL is more secure

  • SQL guarantees ACID transactions - NoSQL databases do not and thus lack confidentiality and integrity. This means race conditions are an issue and simultaneous transactions can affect each other.
  • NoSQL has minimal in-built security features - Have to build them into your application instead.
  • Horizontal scaling has higher availability - There is no single point of failure unlike vertical scaling
Source: Silicon Valley | HBO

The Pros and Cons of SQL and NoSQL

Pros of SQL

  • Reduced data storage - Normalization removed data redundancy and duplication
  • ACID-compliant - Strong data integrity, security and consistency
  • Normalization - Database engines are better at optimizing queries to fit on-disk representations
  • Great for complex queries - SQL is efficient at processing queries and joining data across tables
  • Predefined schema - Validations are done on the database layer and not the application layer
  • Standardized language across different RDBMS

Cons of SQL

  • Vertical scaling - Relatively more expensive than horizontal scaling
  • Rigid data model - Require up-front design, and harder to make changes (require data migrations and possibly downtime)
  • Single point of failure - Mitigated by replication and failover techniques

Pros of NoSQL

  • Schemaless - Easy to make changes to databases as requirements change
  • Flexible data model - Each entity can have its unique structure
  • Horizontal scaling - Relatively cheaper and easier, especially for Big Data
  • Highly available - Designed to support seamless scaling without a significant single point of failure
  • High performance - Data that is accessed together typically stored together

Cons of NoSQL

  • Less secure - Difficult to verify data integrity and consistency (only eventual consistency achieved)
  • Schemaless - Lacks standardization of data types
  • No normalization - Database engines are not as good at optimizing queries
  • Larger data storage footprint - More data duplication due to lack of normalization. But a relatively small drawback as storage is relatively cheap
  • Different types of NoSQL databases - each have their use cases. E.g. Graph databases are good for analyzing relationships but not range queries

Use cases for SQL and NoSQL: When to Use Which?

If you are looking for textbook answers on whether you should use SQL or NoSQL, I am afraid you will be disappointed. There is no one-size-fits-all approach - it all depends on the use cases. 

Many companies have shown that they can scale and adapt the respective databases to suit their needs. For example, contrary to popular belief that SQL does not scale well, Facebook has proven to us that it is indeed possible to scale MySQL to manage many petabytes of data at their scale.

Besides, there are so many factors at play that one can argue that there are no clear correct or wrong decisions in many cases. What you should know instead, is to understand the requirements of your system and make a decision by evaluating the pros and cons of each type of database.

When to use SQL

  • Working with complex queries - NoSQL is less robust and has limited query functionality
  • ACID transactions are important e.g. financial or accounting transactions
  • Data is highly structured and does not change frequently - Predefined schemas allow for easy navigation

Use cases of SQL (with Examples)

When to use NoSQL

  • Unstructured data - different data entities can have different structure
  • Faster development speed - No need to design database schema
  • Priority is easy scalability and availability - Vertical scaling is expected to be too expensive
  • Priority is not data consistency and integrity - NoSQL is not ACID compliant

Use cases of NoSQL (with Examples)

Find this article useful? Here’s how you can benefit from it more!

While it is no doubt that being more knowledgeable in system design will help you ace your interviews, you also do not want to be caught off guard during an interview because you do not know what to expect. As such, NodeFlair also compiles interview questions and processes so that you know what to expect and can better prepare for them

Got a job offer? Contribute it anonymously!

NodeFlair is improving salary transparency to empower tech talents with the correct salary information so they are not lowballed by companies. Salary is a taboo subject in Singapore, so we have built it with anonymous compensation sharing in mind. Sharing your job offer anonymously will help the tech community inch one step towards salary transparency.

Share it on social media and with your friends

Related Articles