Pastebin System Design

Non-Functional Requirements of the System.

➡️ Durability : The system should be highly reliable. Text created by users should never be lost.
➡️ High Availability : The system needs to be highly available, meaning that it should stay up and keep serving requests even when individual components fail or a large number of requests arrive at the same time.
➡️ Low Latency : Users should be able to post text and get it back in a timely manner.

Functional Requirements of the System.

➡️ Paste Size : A paste should not exceed a size limit, say 10MB.
➡️ Custom Path URL
➡️ Paste Expiry : Let's say 6 months.
➡️ User Login / Anonymous
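
The size limit, custom path, and expiry rules above could be enforced at write time roughly as follows. This is only a sketch: the function names, the custom path pattern, and the choice of 180 days for "6 months" are assumptions made for illustration, not part of the original design.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional

MAX_PASTE_BYTES = 10 * 1000 * 1000   # 10MB limit from the requirements
PASTE_TTL = timedelta(days=180)      # "6 months", approximated as 180 days (assumption)
# Hypothetical rule for custom paths: 3 to 32 letters, digits, "-" or "_".
CUSTOM_PATH_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")


def validate_paste(text: str, custom_path: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the paste is acceptable."""
    errors = []
    # Measure the encoded size, since that is what gets stored.
    if len(text.encode("utf-8")) > MAX_PASTE_BYTES:
        errors.append("paste exceeds 10MB")
    if custom_path is not None and not CUSTOM_PATH_RE.fullmatch(custom_path):
        errors.append("invalid custom path")
    return errors


def expiry_for(created_at: datetime) -> datetime:
    """Compute when a paste created at `created_at` should expire."""
    return created_at + PASTE_TTL
```

Returning a list of errors rather than raising on the first one lets the API report every problem with a request in a single response.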

Capacity Estimation

Assumptions

Estimation	Pastes	Reads	Paste Size
Per Day	1M	10 * 1M	Max 10MB

As per the table above, the system should handle 1M pastes per day, with each paste capped at 10MB. Note also that the system is read-heavy: users open a paste URL far more often than they create a paste (assumed here to be 10 reads per write).

Traffic

Writes : 1M per day	Reads : 10 * 1M per day
1M writes per day would be ~12 requests per second.	10 * 1M reads per day would be ~116 requests per second.

➡️ Along with this, always keep a buffer of about 30%, meaning that the system should be provisioned to handle roughly 30% more requests than the estimated traffic.
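
The traffic arithmetic can be checked with a few lines of Python. This is a sketch only: it treats the 30% buffer as headroom on top of the estimate, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_second(requests_per_day: int) -> float:
    """Average request rate, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def with_buffer(rps: float, buffer: float = 0.30) -> float:
    """Rate to provision for once the buffer is added on top of the estimate."""
    return rps * (1 + buffer)


writes_per_day = 1_000_000
reads_per_day = 10 * writes_per_day

write_rps = requests_per_second(writes_per_day)  # about 11.6 per second
read_rps = requests_per_second(reads_per_day)    # about 115.7 per second

print(f"writes: {write_rps:.1f}/s, provision for {with_buffer(write_rps):.1f}/s")
print(f"reads:  {read_rps:.1f}/s, provision for {with_buffer(read_rps):.1f}/s")
```

These are daily averages. Real traffic has peaks, which is one reason to keep the buffer.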

Data Estimation

❓How much data will the system need to store over a period of time?

Estimation	Pastes	Paste Size Max	Paste Size Average	Total Data Required - Average	Total Data Required - Max
Per Day	1M	10MB	100KB	1M * 100KB = ~100GB	1M * 10MB = ~10TB
Per Month	1M * 30	10MB	100KB	1M * 30 * 100KB = ~3TB	1M * 30 * 10MB = ~300TB
Per Year	1M * 365	10MB	100KB	1M * 365 * 100KB = ~36.5TB	1M * 365 * 10MB = ~3,650TB (~3.65PB)
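
The storage figures can be reproduced with the short calculation below. It is a sketch that assumes decimal units (1MB = 10^6 bytes) and a 30-day month; with these units, 1M pastes at the 10MB maximum come to about 10TB per day.

```python
KB = 10**3
MB = 10**6
GB = 10**9
TB = 10**12

PASTES_PER_DAY = 1_000_000
AVG_PASTE = 100 * KB  # average paste size assumed in the table
MAX_PASTE = 10 * MB   # maximum allowed paste size


def storage_bytes(days: int, paste_size: int) -> int:
    """Bytes needed to store `days` worth of pastes of the given size."""
    return PASTES_PER_DAY * paste_size * days


for label, days in [("day", 1), ("month", 30), ("year", 365)]:
    average = storage_bytes(days, AVG_PASTE) / TB
    worst = storage_bytes(days, MAX_PASTE) / TB
    print(f"per {label}: average {average:g} TB, worst case {worst:g} TB")
```

The worst case assumes every paste hits the 10MB limit, so the average column is the more realistic planning figure.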

❓Which database will you use for the system (SQL or NoSQL)?

❓What is the strategy for storing the text for the system?

➡️ We currently have two sizes to consider for the text: the average (100KB) and the maximum (10MB).
➡️ Storing 100KB of text directly in the database is not a problem and is not a burden on the database.
➡️ Larger texts, however, need to be stored in a way that keeps them easy to access. For that we can store the text as a Blob in an S3 bucket and keep only a reference to the Blob in our DB.
➡️ On top of this we can take a Hybrid approach, where the first part of the text is stored in the DB and the rest is stored in S3. This way we can show a preview of the file right away, and the user won't notice the lag while the full file loads in the background.
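
A minimal sketch of the hybrid approach is shown below. The 100KB inline threshold, the 1KB preview length, the `PasteRecord` type, and the in-memory `blob_store` dictionary standing in for an S3 bucket are all assumptions made for illustration. A real system would call an object storage client instead.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INLINE_LIMIT = 100 * 1024  # assumed threshold: store up to 100KB directly in the DB
PREVIEW_BYTES = 1024       # assumed preview length kept in the DB for large pastes

# Stand-in for an S3 bucket: maps object key -> raw bytes.
blob_store: Dict[str, bytes] = {}


@dataclass
class PasteRecord:
    paste_id: str
    size: int
    content: str             # full text if small, otherwise only the preview
    blob_key: Optional[str]  # reference to the blob, None when stored inline


def save_paste(paste_id: str, text: str) -> PasteRecord:
    data = text.encode("utf-8")
    if len(data) <= INLINE_LIMIT:
        return PasteRecord(paste_id, len(data), text, None)
    key = f"pastes/{paste_id}"
    blob_store[key] = data
    # Drop a trailing partial character if the cut lands inside one.
    preview = data[:PREVIEW_BYTES].decode("utf-8", errors="ignore")
    return PasteRecord(paste_id, len(data), preview, key)


def load_paste(record: PasteRecord) -> str:
    if record.blob_key is None:
        return record.content
    return blob_store[record.blob_key].decode("utf-8")
```

A client can render `record.content` immediately as the preview and call `load_paste` only when the full text is needed.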

❓What is the strategy for caching of the requests?

➡️ As discussed in the earlier point, the Hybrid approach is a good way to access data quickly and hide the loading lag.
➡️ But we need to be careful with the caching strategy. Cache memory is limited, so we cannot hold large chunks of data for every user, and every request that misses the cache still falls through to the DB and adds load there.
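
One way to keep cache memory bounded is a least-recently-used (LRU) cache with a total byte budget that refuses entries above a per-entry limit. The sketch below illustrates the idea. The class name and both limits are hypothetical, and a production system would more likely use a dedicated cache service.

```python
from collections import OrderedDict
from typing import Optional


class PasteCache:
    """LRU cache bounded by total bytes; oversized pastes are never cached."""

    def __init__(self, capacity_bytes: int, max_entry_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.max_entry_bytes = max_entry_bytes
        self.used_bytes = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        value = self._items.get(key)
        if value is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> bool:
        if len(value) > self.max_entry_bytes:
            return False  # too large: serve it from the DB or blob store instead
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = value
        self.used_bytes += len(value)
        # Evict least recently used entries until we are back under budget.
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
        return True
```

With the hybrid storage described earlier, the cache would typically hold small pastes and previews, while large blobs are fetched from object storage on demand.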

❓What is Blob?

➡️ BLOB stands for Binary Large Object, a data type that stores binary data. BLOBs can hold complex files such as large text, images, or videos, unlike ordinary string types that only store letters and numbers. A BLOB column lets multimedia objects be stored in a database; however, not all databases support BLOB storage.

Database Schema

Paste Table

Column	Type	Description
id	INT	Primary Key
content	TEXT	Text Content
size	INT	Size of the text
s3BucketURL	TEXT	S3 Bucket URL
created_at	DATETIME	Date and Time of creation
expires_at	DATETIME	Date and Time of expiry of the Paste

User Table

Column	Type
id	INT
name	TEXT
created_at	DATETIME
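
As an illustration, the two tables could be created in SQLite as follows. This is a sketch only: the choice of SQLite, the table names `paste` and `user`, the `NOT NULL` constraints, the 180-day expiry, and the helper functions are assumptions. The column names come from the tables above.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paste (
    id          INTEGER PRIMARY KEY,
    content     TEXT,               -- full text or preview
    size        INTEGER NOT NULL,   -- size of the text in bytes
    s3BucketURL TEXT,               -- set only when the text lives in S3
    created_at  DATETIME NOT NULL,
    expires_at  DATETIME NOT NULL
);
CREATE TABLE user (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    created_at  DATETIME NOT NULL
);
""")


def fmt(moment: datetime) -> str:
    # ISO-style strings compare in chronological order, so plain text works here.
    return moment.isoformat(sep=" ", timespec="seconds")


def add_paste(text: str, now: datetime, ttl_days: int = 180) -> int:
    cursor = conn.execute(
        "INSERT INTO paste (content, size, s3BucketURL, created_at, expires_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (text, len(text.encode("utf-8")), None, fmt(now),
         fmt(now + timedelta(days=ttl_days))),
    )
    return cursor.lastrowid


def get_live_paste(paste_id: int, now: datetime) -> Optional[str]:
    """Return the paste content, or None if it is missing or expired."""
    row = conn.execute(
        "SELECT content FROM paste WHERE id = ? AND expires_at > ?",
        (paste_id, fmt(now)),
    ).fetchone()
    return row[0] if row else None
```

Filtering on `expires_at` at read time hides expired pastes immediately. A separate cleanup job could then delete the rows and any associated blobs.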

