NoSQL Part 3 – Infrastructure and Pricing

Infrastructure – how much responsibility do you want to leave with an individual?

Pricing is always a topic of concern when we build cloud architecture. This is an area we want to consistently improve as our systems or applications grow.

A common mistake made with cloud is the lack of monitoring of resource consumption. This mirrors the long-standing neglect of monitoring for on-premises infrastructure. If you were to visit some companies and ask questions about expenditure, consumption, throughput, data size, etc., you would receive a response like “Oh! Our IT guys know the answer to this, but they aren’t in until….”

Sound familiar?

This is the great thing about IaaS, PaaS, and SaaS in the cloud: you can’t turn your back on what you’re consuming. We must always monitor, analyse, evaluate, re-architect (MAER – don’t you love acronyms?). The more you automate monitoring and analytics, the more efficient you will become in this space. Cloud makes it easy for anyone to monitor and analyse through metrics, alerting, autoscaling, and much more. With the range of cloud monitoring services available, we no longer need dedicated infrastructure specialists paid exorbitant amounts of money.

Being Cost Effective with NoSQL

We always want to be cost effective, right? Let’s look at how we can minimise consumption with NoSQL.

Here are a small set of steps:

  1. Requirements
  2. Pricing Comparison – AWS, Google, or Azure
  3. Spin up – start small and grow
  4. Autoscaling
  5. Review, Rework and Expand (RRE)

1. Requirements

We know what you’re thinking: “c’mon, no one likes doing requirements analysis”. Well, let us hit back and say, “do you want to save or waste money?” No matter how many IT books we look over, everywhere we are told to understand requirements. REQUIREMENTS ARE IMPORTANT! In the cloud, this could not be more important. The consequence is MONEY: you will spend more than you actually need. Remember, if the service is spinning, so are the dollars.

NoSQL Requirements

What are we looking for?

  • What is the size of the data set?
  • Is it going to be regularly accessed, or is it simply archival – hot/cold/archive?
  • Are we serving globally where latency matters?
  • Are we aiming for a single cloud or multi-cloud strategy?

Questions like these will ensure you spin up the most suitable services – most suitable being MOST COST EFFECTIVE.

Here is a rough guide:

Azure

Blob Storage + CDN

  • Media streaming scenarios serving videos or audio globally

Fig 1. Blob storage serving cold storage for machine learning to perform predictive analytics

Cosmos DB

  • Gaming – session state
  • Retail – serving product catalogs for globally hot read access

Fig 2. Apps keeping session state for global reach.

AWS

DynamoDB

  • User sessions
  • Real-time lookup tables for invoicing

Fig 3. Invoicing table for real-time lookup.

GCP

Datastore

  • Static website content

Fig 4. Web app setup for hosting static content

2. Pricing Comparison – AWS, Google, or Azure

It’s very hard to give a general price across the board, as pricing will differ based on items such as dataset size, item size, and read/write throughput, among other considerations. Let’s use a scenario where we have 10GB of data, each item or blob is 50KB, and our throughput is 50 reads and 50 writes per second with no deletions.

Note

This is a very high throughput with a large data set, so all results are going to be expensive. Fifty writes per second is a lot, but not uncommon. Write throughput is going to be more costly than reads.

Using the pricing calculator for each cloud platform, let’s break down some rough estimates:

  • Dataset size = 10 GB * 1,000 MB/GB * 1,000 KB/MB = 10,000,000 KB
  • Number of entities = 10,000,000 KB / 50 KB = 200,000 entities
  • Total reads per month = 50 * 60 seconds * 60 minutes * 24 hours * 31 days (the longest months) = 133,920,000
  • Total writes per month = 50 * 60 seconds * 60 minutes * 24 hours * 31 days = 133,920,000
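
If you want to recompute these figures as requirements change, the arithmetic is trivial to script. Here is a minimal Python sketch of the same workload estimate (all inputs are the scenario’s assumptions):

```python
# Workload estimate: 10 GB dataset, 50 KB items,
# 50 reads/s and 50 writes/s over a 31-day month.
DATASET_GB = 10
ITEM_KB = 50
READS_PER_SEC = WRITES_PER_SEC = 50
SECONDS_PER_MONTH = 60 * 60 * 24 * 31

dataset_kb = DATASET_GB * 1_000 * 1_000                # 10,000,000 KB
entities = dataset_kb // ITEM_KB                       # 200,000 entities
reads_per_month = READS_PER_SEC * SECONDS_PER_MONTH    # 133,920,000
writes_per_month = WRITES_PER_SEC * SECONDS_PER_MONTH  # 133,920,000

print(entities, reads_per_month, writes_per_month)
```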

ALL PRICES BELOW ARE PER MONTH

Azure Cosmos DB

Cosmos DB is charged differently from the rest, as we are priced on request units (RUs); we are not charged separately for reads and writes.

  • SSD Storage (per GB) – $0.25 GB/month
  • Reserved RUs/second (per 100 RUs, 400 RUs minimum) – $0.008/hour

Total cost comes to $586.50 USD – try the Azure pricing calculator for Cosmos DB.
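
Given those two rates, the monthly bill is just storage plus reserved throughput. A minimal sketch, assuming the Azure calculator’s 730-hour month and a 10,000 RU reservation (our assumption; it reproduces the quoted total):

```python
def cosmos_monthly_cost(storage_gb, reserved_rus,
                        gb_rate=0.25, ru_rate_per_100_per_hr=0.008,
                        hours_per_month=730):
    # 730 hours is the Azure calculator's standard month; the RU
    # reservation passed in below is our assumption, not a quoted figure.
    storage = storage_gb * gb_rate
    throughput = (reserved_rus / 100) * ru_rate_per_100_per_hr * hours_per_month
    return storage + throughput

print(cosmos_monthly_cost(10, 10_000))  # -> 586.5
```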

Note

Throughput on Cosmos DB will increase as we increase the number of replicated regions. This means our throughput weight will expand, causing an increase in the number of RUs – BE CAREFUL

Fig 5. Cosmos DB pricing calculation

Azure Blob Storage

In our estimation we are going with hot storage, because we have a high throughput.

  • Write Operations (per 10,000) – $0.05
  • List and Create Container Operations (per 10,000) – $0.05
  • Read Operations (per 10,000) – $0.004
  • All other Operations (per 10,000), except Delete, which is free – $0.004
  • Data Retrieval (per GB) – Free
  • Data Write (per GB) – Free

We get DATA WRITE AND RETRIEVAL FREE for hot storage, as well as free support. This is great, because support via AWS costs $123.71 USD.

Total cost comes to $723.38 USD – try the Azure pricing calculator for storage.
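
The operation charges dominate here because writes are priced at $0.05 per 10,000. A minimal sketch of the total (operation rates are from the list above; the per-GB hot storage rate is our assumption, chosen to match the quoted figure):

```python
def blob_monthly_cost(reads, writes, storage_gb,
                      write_rate=0.05, read_rate=0.004,
                      storage_gb_rate=0.0212):
    # Operation rates are per 10,000 operations; data write/retrieval per GB
    # are free on hot storage. The per-GB storage rate is an assumption.
    ops = (writes / 10_000) * write_rate + (reads / 10_000) * read_rate
    return ops + storage_gb * storage_gb_rate

print(round(blob_monthly_cost(133_920_000, 133_920_000, 10), 2))  # -> 723.38
```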

Fig 6. Blob Storage pricing calculation

AWS DynamoDB

AWS provides a sweet FREE TIER option giving:

  • 25 GB of storage per month
  • 200 million requests per month
  • 2.5 million stream read requests per month
  • the ability to deploy DynamoDB global tables in two AWS regions

That’s really good for a FREE TIER; plus, you only pay for the resources you provision beyond these free tier limits. But there is a catch: once you tip over the free tier, pricing becomes very expensive.

Our consumption cost for the above scenario is $1,237.08 USD, and this doesn’t include support. If we want support included for this consumption, the total cost becomes $1,360.79 USD – try the AWS pricing calculator for DynamoDB.
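
The expense comes from DynamoDB pricing provisioned capacity units rather than raw operations: one write capacity unit (WCU) covers a 1KB write per second, and one read capacity unit (RCU) covers a strongly consistent 4KB read per second, so 50KB items multiply the units required. A rough sketch of the unit maths (the hourly rates are illustrative assumptions, not quoted prices):

```python
import math

ITEM_KB, OPS_PER_SEC = 50, 50
wcus = math.ceil(ITEM_KB / 1) * OPS_PER_SEC  # 2,500 WCUs for 50 writes/s
rcus = math.ceil(ITEM_KB / 4) * OPS_PER_SEC  # 650 RCUs for 50 reads/s

# Illustrative per-unit hourly rates (assumptions, not quoted prices).
WCU_RATE, RCU_RATE, HOURS = 0.00065, 0.00013, 730
print(wcus * WCU_RATE * HOURS + rcus * RCU_RATE * HOURS)  # same ballpark as above
```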

Fig 7. DynamoDB pricing calculation

Google Cloud Datastore

Google Cloud offers 1GB of free stored data, along with 50,000 free reads, 20,000 free writes, and 20,000 free deletions per day.

We are looking at a total estimate of $319.40 USD per month – try the GCP pricing calculator for Datastore.
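
The free tier is netted off per day before the paid rates apply, which is why the bill stays comparatively low. A minimal sketch (the per-100,000-operation rates are our assumptions based on Datastore’s price list at the time):

```python
# Daily operations minus the daily free quota, billed over 31 days.
OPS_PER_DAY = 50 * 60 * 60 * 24               # 4,320,000 reads and writes each
billable_reads = (OPS_PER_DAY - 50_000) * 31
billable_writes = (OPS_PER_DAY - 20_000) * 31

# Assumed rates: $0.06 per 100k reads, $0.18 per 100k writes.
cost = (billable_reads / 100_000) * 0.06 + (billable_writes / 100_000) * 0.18
print(round(cost, 2))  # ~319 USD/month before storage
```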

Fig 8. Google Cloud Datastore pricing calculation

GCP seems to come in at the lowest for the above scenario. But keep in mind the feature sets we looked at EARLIER.

Note

These figures were taken purely from the calculators provided by each platform without considering autoscaling.

3. Spin up – start small and grow

See that title, read it again. START SMALL.

Everything should begin with a small POC: test a small data set, connect it to an app, try pumping in some data, and watch it autoscale. Make sure you don’t go overboard and cause rapid growth with autoscaling. Play with the service and become familiar with it.
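
As a concrete example of starting small, here is a minimal boto3 sketch that provisions a DynamoDB table with the smallest sensible throughput; the table and key names are hypothetical:

```python
import boto3

# POC table with minimal provisioned throughput - let autoscaling (or a
# deliberate manual bump) grow it once the prototype proves itself.
dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="poc-sessions",  # hypothetical name
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```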

Do you know how often we see developers spin up services without even knowing how to use them?

All the time. Companies are getting better at prototyping, but still not enough.

4. Autoscaling

We saw the above prices for each provider, but what about autoscaling?

We’ve done the base price comparison without considering autoscaling, but that’s the perfect world. In real-world scenarios, apps require scaling to manage throughput load.

If we take these prices as a base, what happens if our reads double in a given month? How will that affect the cost?

Take an app that exists today, understand how its traffic spikes or sinks on a day-to-day basis, and factor that influence into the base cost. Your app’s scaling will be a case-by-case affair. MONITORING NoSQL metrics is advantageous because we can use them to predict our traffic.
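
As a back-of-envelope example using the Blob Storage scenario above: retrieval is free on the hot tier, so doubling reads only grows the read-operation line item:

```python
# One extra month's worth of reads at $0.004 per 10,000 operations.
base_reads = 133_920_000
extra_cost = (base_reads / 10_000) * 0.004
print(round(extra_cost, 2))  # ~53.57 USD added if reads double
```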

5. Review, Rework and Expand (RRE)

Let’s talk about MONITORING. Having analytics on your consumed resources allows you to measure your spending, giving you insight into what you actually need. Services such as AZURE’S APPLICATION INSIGHTS or AWS CLOUDWATCH can provide that insight. These should be spun up in parallel with your NoSQL services at the POC stage – start monitoring straight away so the insight is there from day one. In NoSQL, we can monitor I/O on the service itself, but we also want metrics gathered and presented in tables or graphs, so every developer has the ability to review them.
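
As a rough sketch of pulling those metrics programmatically, here is boto3 fetching a week of consumed read capacity for a DynamoDB table from CloudWatch (the “Invoices” table name is hypothetical):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "Invoices"}],  # hypothetical table
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=3600,                 # hourly datapoints
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```

Graphing these sums week over week gives you the traffic prediction mentioned above.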

As our applications expand, our NoSQL datasets grow and our throughput increases. This is where we must review the analytics above and determine whether the current service still fits our new requirements, e.g.:

  • Dataset size has increased
  • Throughput has doubled
  • Maybe we need to move data to cold/archival storage and clear our cache
  • Maybe we need a cache or a CDN to reduce I/O on the storage service

See why we need to regularly review our architecture?

In our next article, we will look at how to set up a POC for a NoSQL service and measure our spending at the POC stage.
