Azure Storage

The following is from the Azure Administrator training lab for AZ-103.
These are reference notes on the Azure Storage services as of 12/2018. The main storage services are listed below and explained further in the sections that follow.

  • Azure Blobs (objects/media)
  • Azure Data Lake Storage (gen2)
  • Azure Files (File Server)
  • Azure Queues
  • Azure Tables
  • Azure Cosmos DB
  • Azure Disks (VM Images)

For each of these services, Azure provides the following benefits:

  • Durable and highly available
  • Secure
  • Scalable
  • Managed
  • Accessible

Each storage service requires a storage account. A storage account can contain one or many of the above services. This is explained further below.

Note that relational databases are not listed among the storage services above. Databases are in a category of their own, which is briefly described at the bottom of this article.

 

Azure Storage Accounts

Azure storage accounts contain each of the Azure Storage services listed above. A storage account serves as a namespace, and each account can contain one or more of those services. Storage accounts define and control the following when using Azure Storage services:

  • Billing
  • Access
  • Authorization
  • Data Redundancy (multi-tier)
  • Encryption (at rest)
  • Scaling
  • Performance
  • Disaster Recovery definitions
  • Data Migration

Each storage account also provides the access URLs (endpoints) for its services. For example:

  • http://mystorageaccount.blob.core.windows.net
  • http://mystorageaccount.table.core.windows.net
  • http://mystorageaccount.queue.core.windows.net
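The endpoint pattern above can be sketched in Python. This is a minimal illustration, assuming the default public-cloud domain `core.windows.net` and the standard service subdomains (`blob`, `file`, `queue`, `table`, `dfs`); the function `service_endpoint` is hypothetical and not part of any Azure SDK.

```python
import re

# Service subdomains used in default public-cloud endpoints (assumed set).
SERVICE_SUBDOMAINS = {"blob", "file", "queue", "table", "dfs"}

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"^[a-z0-9]{3,24}$")


def service_endpoint(account: str, service: str, scheme: str = "https") -> str:
    """Build the default endpoint URL for one service in a storage account (hypothetical helper)."""
    if not ACCOUNT_NAME_RE.match(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if service not in SERVICE_SUBDOMAINS:
        raise ValueError(f"unknown service: {service!r}")
    return f"{scheme}://{account}.{service}.core.windows.net"


if __name__ == "__main__":
    for svc in ("blob", "table", "queue"):
        print(service_endpoint("mystorageaccount", svc, scheme="http"))
```

Because the account name becomes part of a DNS host name, it must be globally unique as well as valid, which is why the name check comes first.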

There are 3 types of storage accounts, each with a different pricing model. The account type needs to be considered first when creating an account, although some types can be upgraded after creation (for example, a General Purpose v1 account can be upgraded to v2).

  • General Purpose v2 Account
    • This type of account provides the lowest per-gigabyte capacity pricing in Azure. It has all the features of the v1 account plus the latest features released in Azure, such as blob access tiers, which allow more granular control over cost. Microsoft recommends the General Purpose v2 account for most scenarios. The account supports the following storage services:
      • Blobs
      • Files
      • Disks
      • Queues
      • Tables
  • General Purpose v1 Account
    • This account type does not include all of the latest Azure features and can therefore be more costly than the v2 account. Microsoft recommends this account type for applications that use the Azure classic deployment model, applications that are transaction intensive but do not require large capacity, and applications that use a Storage Services REST API version earlier than 2014-02-14. The following storage services are supported under this account:
      • Blobs
      • Files
      • Disks
      • Queues
      • Tables
  • Blob Storage Account
    • This account type is specialized for blob storage only. It has the same blob features as the General Purpose v2 account but supports only block and append blobs (no page blobs).
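One way to keep the three account types straight is as a lookup table. The Python sketch below simply encodes the lists above; the dictionary keys are the Azure Resource Manager `kind` values for each type (`StorageV2`, `Storage`, `BlobStorage`), while the helper `supports` is hypothetical.

```python
from typing import Optional

# Storage services per account kind, encoding the lists above.
ACCOUNT_SERVICES = {
    "StorageV2": {"blobs", "files", "disks", "queues", "tables"},  # General Purpose v2
    "Storage": {"blobs", "files", "disks", "queues", "tables"},    # General Purpose v1
    "BlobStorage": {"blobs"},                                      # Blob Storage
}

# Blob types each kind can hold; Blob Storage accounts have no page blobs.
ACCOUNT_BLOB_TYPES = {
    "StorageV2": {"block", "append", "page"},
    "Storage": {"block", "append", "page"},
    "BlobStorage": {"block", "append"},
}


def supports(kind: str, service: str, blob_type: Optional[str] = None) -> bool:
    """Return True if an account of `kind` can host `service` (and `blob_type`, for blobs)."""
    if service not in ACCOUNT_SERVICES.get(kind, set()):
        return False
    if service == "blobs" and blob_type is not None:
        return blob_type in ACCOUNT_BLOB_TYPES[kind]
    return True
```

For example, `supports("BlobStorage", "blobs", "page")` returns `False`, which is why VM disks (stored as page blobs) need a general-purpose account.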

There are 2 performance tiers for storage accounts:

  • Standard
  • Premium

The following articles walk through creating a storage account and a blob container, and then uploading some data into it:

https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal

 

Azure Blobs

Blob storage is optimized for massive amounts of unstructured data, such as text or binary data. This service is ideal for serving media files, distributed access, streaming audio or video, backup, archiving, and disaster recovery. Data can be accessed over HTTPS.

Blob storage involves 3 types of resources:

  • Storage Account
  • Container (in a storage account)
  • Blob (in a container)

When creating a blob container, you choose one of 3 public access levels:

  • Private (default)
  • Blob (anonymous read for the blob only)
  • Container (anonymous read for blobs and the container – the public can list all blobs in the container)
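The three access levels differ only in what an anonymous (unauthenticated) caller may do. A minimal Python model of that rule, assuming just two anonymous operations (reading one blob and listing a container's blobs); the names `AccessLevel` and `anonymous_allowed` are hypothetical.

```python
from enum import Enum


class AccessLevel(Enum):
    PRIVATE = "private"      # no anonymous access (default)
    BLOB = "blob"            # anonymous read of individual blobs
    CONTAINER = "container"  # anonymous read of blobs plus listing the container


def anonymous_allowed(level: AccessLevel, operation: str) -> bool:
    """Return True if an unauthenticated request may perform `operation`.

    `operation` is either "read_blob" or "list_blobs".
    """
    if operation == "read_blob":
        return level in (AccessLevel.BLOB, AccessLevel.CONTAINER)
    if operation == "list_blobs":
        return level is AccessLevel.CONTAINER
    raise ValueError(f"unknown operation: {operation!r}")
```

Authenticated requests (account key, SAS) are unaffected by these levels; they only open or close the anonymous path.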

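Blob storage has 3 access tiers:

  • Hot – data readily accessible at all times
  • Cool – same access speed as Hot, but lower storage cost and higher access cost; meant for infrequently accessed data kept for at least 30 days
  • Archive – data is stored offline and must be rehydrated before it can be read, which can take hours (up to 15 hours), but it has the lowest storage cost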
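Choosing between these tiers often comes down to how recently the data was accessed. The Python sketch below is a hypothetical tiering rule: the 30- and 180-day thresholds are illustrative assumptions (they match the minimum storage durations of the Cool and Archive tiers, after which early-deletion charges no longer apply), not a recommendation, and `choose_tier` is not an Azure API.

```python
from datetime import date

# Hypothetical thresholds (days since last access) for moving a blob down a tier.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick an access tier from the number of days since the blob was last read."""
    idle_days = (today - last_accessed).days
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "Archive"
    if idle_days >= COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


if __name__ == "__main__":
    today = date(2018, 12, 31)
    for last in (date(2018, 12, 20), date(2018, 10, 1), date(2018, 1, 1)):
        print(last, "->", choose_tier(last, today))
```

Keep in mind that a blob moved to Archive must be rehydrated to Hot or Cool before it can be read again, so this kind of rule suits data that is genuinely unlikely to be needed quickly.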

When uploading an object into the blob storage, we can define the following:

  • Authentication Type
    • OAuth (in preview at the time of writing)
    • SAS
  • Blob Type – cannot be changed once set
    • Block blob – text/binary up to 4.7TB. Each block can be managed individually
    • Append blob – similar to a block blob but optimized for append operations (e.g. log files)
    • Page blob – random access files up to 8TB
  • Block Size
    • 64 KB up to 100 MB (the size of each block used to upload a block blob)
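The block size and the maximum block blob size are linked. Assuming the limits in effect at the time of writing (at most 50,000 blocks per block blob, at most 100 MiB per block), the ceiling is 50,000 × 100 MiB ≈ 4.77 TiB, which is where the 4.7 TB figure for block blobs comes from. A small Python check of that arithmetic, plus a hypothetical helper that counts the blocks needed for a given file:

```python
MAX_BLOCKS_PER_BLOB = 50_000   # service limit on blocks in one block blob
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def max_blob_size(block_size: int) -> int:
    """Largest block blob, in bytes, that can be built from blocks of `block_size` bytes."""
    return block_size * MAX_BLOCKS_PER_BLOB


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of blocks an upload of `file_size` bytes is split into (ceiling division)."""
    count = max(1, -(-file_size // block_size))
    if count > MAX_BLOCKS_PER_BLOB:
        raise ValueError("too many blocks; choose a larger block size")
    return count


if __name__ == "__main__":
    print(f"{max_blob_size(100 * MiB) / TiB:.2f} TiB")  # 100 MiB blocks -> 4.77 TiB
    print(f"{max_blob_size(64 * KiB) / GiB:.2f} GiB")   # 64 KiB blocks -> 3.05 GiB
    print(blocks_needed(1 * GiB, 4 * MiB))              # -> 256
```

In other words, the block size chosen at upload time caps how large the blob can grow: with the smallest 64 KB blocks, a single blob tops out around 3 GiB.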

Once an object has been uploaded, you can view its properties to get its URL.

 

 

Azure Files

Using SMB (Server Message Block), the Azure Files service provides network file shares, much like a network file server. Access can be controlled using SAS (Shared Access Signature) tokens; support for Active Directory-based authentication and ACLs (Access Control Lists) was announced as coming soon at the time of writing.

With this service, users can ‘lift and shift’ file shares into the cloud: mappings to the previous NAS are migrated, so end users see virtually no change. Shares can be mounted from Windows, Linux and macOS.
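A file share is reached at an SMB path built from the storage account name and the share name. A Python sketch of the path forms used on each client OS, assuming the default `core.windows.net` domain; `share_paths` and the example names are hypothetical, and the credentials (storage account name and key) are supplied separately when mounting.

```python
def share_paths(account: str, share: str) -> dict:
    """Return the SMB path of an Azure file share in the form each client OS expects."""
    host = f"{account}.file.core.windows.net"
    return {
        "windows": f"\\\\{host}\\{share}",  # UNC path, e.g. for `net use`
        "linux": f"//{host}/{share}",       # source argument for `mount -t cifs`
        "macos": f"smb://{host}/{share}",   # Finder "Connect to Server"
    }


if __name__ == "__main__":
    for os_name, path in share_paths("mystorageaccount", "myshare").items():
        print(f"{os_name:8} {path}")
```

Mounting over SMB from outside Azure also requires outbound TCP port 445 to be open, which some ISPs and corporate networks block.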

 

Queue Storage

The Queue Storage service provides message-based storage where each message can be up to 64 KB. A single queue can contain millions of messages. Messages are generally processed asynchronously, which lets a producer hand off work without waiting for it to finish. For example, when an image is uploaded into Azure Blob storage, we could drop a message in the queue instructing Azure Functions to extract metadata from the image and write it to another database.
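The image-upload example can be sketched as a producer/consumer pair. This is an in-memory stand-in written with Python's `collections.deque`, not the Azure Queue Storage SDK; it only enforces the 64 KB message limit described above, and the names `enqueue`, `on_blob_uploaded` and `process_next` are hypothetical.

```python
import json
from collections import deque

MAX_MESSAGE_BYTES = 64 * 1024  # Queue Storage message size limit

work_queue: deque = deque()
metadata_db: dict = {}  # stand-in for "another database"


def enqueue(message: dict) -> None:
    """Serialize a message and add it to the queue, rejecting anything over 64 KB."""
    body = json.dumps(message).encode("utf-8")
    if len(body) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds 64 KB; store the payload in a blob and enqueue a reference")
    work_queue.append(body)


def on_blob_uploaded(container: str, blob_name: str) -> None:
    """Producer: runs after an image upload and enqueues a small work item, not the image."""
    enqueue({"container": container, "blob": blob_name})


def process_next() -> bool:
    """Consumer: handles one message, as a queue-triggered function might. False when empty."""
    if not work_queue:
        return False
    work = json.loads(work_queue.popleft())
    # Placeholder "metadata extraction"; a real worker would open the image blob.
    metadata_db[work["blob"]] = {
        "container": work["container"],
        "extension": work["blob"].rsplit(".", 1)[-1],
    }
    return True


if __name__ == "__main__":
    on_blob_uploaded("images", "cat.jpg")
    while process_next():
        pass
    print(metadata_db)
```

The message carries only a reference to the blob, which keeps it well under the size limit and leaves the heavy data in Blob storage.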

 

Table Storage / Cosmos DB

Azure Table Storage is a NoSQL solution similar to Cosmos DB. It is a key/value storage service (each entity is addressed by a PartitionKey and a RowKey), in the same NoSQL family as MongoDB. It uses a schemaless design and provides flexibility and speed for quick data access. Cosmos DB is a premium service and can be configured to perform much faster than Table Storage. When using Cosmos DB, throughput is provisioned up front in Request Units per second (RU/s), which cover both reads and writes. This defines the performance but also affects the overall cost of the service.
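The schemaless key/value model can be sketched in memory: every entity is addressed by the pair (PartitionKey, RowKey), and entities in the same table may carry different properties. This is a toy Python model, not the Azure SDK; `InMemoryTable` and the sample entities are hypothetical.

```python
from typing import Dict, Iterator, Tuple


class InMemoryTable:
    """Toy model of a Table Storage table: entities keyed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def upsert(self, entity: dict) -> None:
        """Insert or replace an entity; no schema is enforced beyond the two keys."""
        key = (entity["PartitionKey"], entity["RowKey"])
        self._rows[key] = dict(entity)

    def get(self, partition_key: str, row_key: str) -> dict:
        """Point lookup on both keys (raises KeyError if absent)."""
        return self._rows[(partition_key, row_key)]

    def query_partition(self, partition_key: str) -> Iterator[dict]:
        """All entities sharing one PartitionKey."""
        return (e for (pk, _), e in self._rows.items() if pk == partition_key)


table = InMemoryTable()
table.upsert({"PartitionKey": "customers-east", "RowKey": "001", "Name": "Contoso"})
table.upsert({"PartitionKey": "customers-east", "RowKey": "002", "Name": "Fabrikam", "Tier": "Gold"})
```

The two entities share a partition but not a shape: only the second has a `Tier` property, which is what "schemaless" means in practice.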

 

Disk Storage

The Disk Storage service is used for virtual machine disks and images. Disks are stored as page blobs and are available in both the Standard and Premium (SSD) performance tiers.

 

Azure SQL Data Warehouse

This is a data warehouse for data used by business intelligence tools (like Power BI) and other data analytics tools. It is a relational database optimized for reporting over very large data sizes (petabytes). It also provides encryption of data at rest.

 

Azure Data Lake Storage

The Data Lake Storage service is an extension of Azure Blob storage for big data analytics. It is designed for solutions requiring petabytes of data with gigabits of throughput. It is compatible with HDFS (Hadoop Distributed File System) and can be accessed from other services such as Azure HDInsight, Databricks and SQL Data Warehouse.
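Data Lake Storage Gen2 exposes an account through its `dfs` endpoint, and Hadoop-based tools address it with the ABFS driver's URI scheme, `abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>`. A small Python helper that builds such a URI; the function name and the example account, file system and paths are hypothetical.

```python
from urllib.parse import quote


def abfs_uri(account: str, filesystem: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a file or directory in Data Lake Storage Gen2."""
    scheme = "abfss" if secure else "abfs"  # abfss = over TLS
    encoded = quote(path.lstrip("/"))       # percent-encode spaces etc.; "/" is kept
    return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{encoded}"


if __name__ == "__main__":
    print(abfs_uri("mydatalake", "raw", "/sales/2018/12/data.csv"))
```

A URI like this is what you would pass as an input path to a Spark job on HDInsight or Databricks.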

Data Lake Storage supports structured or unstructured data and does not require a schema definition at write time. Like the data warehouse, it is optimized for analytics and reporting. There are no limits on file sizes, and data can be stored in its native format.

 

OLTP vs OLAP

Azure SQL Data Warehouse and Data Lake Storage are OLAP (Online Analytical Processing) systems. All the other data storage services, including the databases, are OLTP (Online Transaction Processing) systems.

OLTP databases focus on fast query processing and on handling short transactions (INSERT, UPDATE, DELETE). The schema is usually structured in third normal form (3NF) so that data integrity remains high while transaction processing times stay low.

OLAP databases focus on historical or archived data and therefore have a low volume of transactions. However, OLAP queries can be very complex and involve deep aggregations. The schema is designed to support these queries (usually as a star schema), and it must also support aggregations whose results are often very large data sets.
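The contrast can be shown on a tiny scale with Python's built-in `sqlite3` module, used here only as a stand-in for a real OLTP or OLAP system. The first part is OLTP-style: short transactions that insert and update single rows. The second is OLAP-style: a star schema with a fact table joined to dimension tables and aggregated. The table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# --- OLTP: a normalized table and short write transactions ---------------
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)")
with conn:  # one short transaction: commits on success, rolls back on error
    cur.execute("INSERT INTO orders (customer, amount, status) VALUES (?, ?, ?)", ("Contoso", 120.0, "new"))
with conn:
    cur.execute("UPDATE orders SET status = ? WHERE id = ?", ("shipped", 1))

# --- OLAP: star schema (fact table + dimension tables) --------------------
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, quantity INTEGER, revenue REAL);
""")
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2018, 11), (2, 2018, 12)])
cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Audio"), (2, "Video")])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 3, 30.0), (1, 2, 1, 50.0), (2, 1, 2, 20.0), (2, 2, 4, 200.0)],
)
conn.commit()

# Revenue by month and category: join the fact table out to each dimension, then aggregate.
report = cur.execute("""
    SELECT d.year, d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, d.month, p.category
    ORDER BY d.year, d.month, p.category
""").fetchall()

for row in report:
    print(row)
```

The OLTP statements each touch one row and finish immediately; the OLAP query reads every fact row and returns a summary, which is the workload a star schema is shaped for.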

Note that although Azure SQL Data Warehouse and Data Lake Storage are both OLAP systems, the two are very distinct. The data lake stores raw data in its native format, whereas the data warehouse requires upfront processing to clean and organize the data before it is loaded. With a data lake, this processing and organizing is done at read time instead. See my other article on schema on read vs schema on write.
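The difference can be sketched in a few lines of Python. On the schema-on-write path, records are validated and shaped before they are stored (as in a warehouse load). On the schema-on-read path, raw lines are stored untouched and a schema is applied only when a query reads them (as with a data lake). The sample records and function names are hypothetical.

```python
import json
from typing import List

RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "ts": "2018-12-01"}',
    '{"user": "bob", "amount": "7", "extra": "kept in the raw data"}',
    'not even json',
]

# --- Schema on write: clean and validate before storing --------------------
warehouse: List[dict] = []


def load_into_warehouse(line: str) -> bool:
    """Store only records that fit the fixed schema (user: str, amount: float)."""
    try:
        record = json.loads(line)
        row = {"user": str(record["user"]), "amount": float(record["amount"])}
    except (ValueError, KeyError):
        return False  # rejected at write time
    warehouse.append(row)
    return True


# --- Schema on read: store raw, interpret at query time ---------------------
lake: List[str] = list(RAW_EVENTS)  # native format, nothing rejected


def total_amount_from_lake() -> float:
    """Apply the schema while reading; malformed lines are skipped by this query only."""
    total = 0.0
    for line in lake:
        try:
            total += float(json.loads(line)["amount"])
        except (ValueError, KeyError):
            continue
    return total


accepted = [load_into_warehouse(line) for line in RAW_EVENTS]
```

Both paths arrive at the same total here, but the warehouse has permanently dropped the malformed line and the `extra` field, while the lake still holds them for a future query with a different schema.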

 

Azure Databases

The following table describes some of the database services available in Azure, including both relational and non-relational databases. Several of them are described further below.

| If you want… | Use this |
|---|---|
| A globally distributed multi-model database, with support for NoSQL choices, with industry-leading performance and SLAs | Azure Cosmos DB |
| A fully managed relational database that provisions quickly, scales on the fly, and includes built-in intelligence and security | Azure SQL Database |
| A fully managed, scalable MySQL relational database with high availability and security built in at no extra cost | Azure Database for MySQL |
| A fully managed, scalable PostgreSQL relational database with high availability and security built in at no extra cost | Azure Database for PostgreSQL |
| To host enterprise SQL Server apps in the cloud | SQL Server on Virtual Machines |
| A fully managed, elastic data warehouse with security at every level of scale at no extra cost | SQL Data Warehouse |
| Help migrating your databases to the cloud with no application code changes | Azure Database Migration Service |
| High throughput and consistent low-latency data access to power fast, scalable applications | Azure Cache for Redis |
| A NoSQL key-value store for rapid development using massive semi-structured datasets | Table Storage |
| A fully managed, scalable MariaDB relational database with high availability and security built in at no extra cost | Azure Database for MariaDB |

 

Azure SQL Database

This is a SQL Server service fully managed by Azure. It is scalable and reliable (99.99% SLA), with geo-replication, automatic tuning, threat detection and dynamic data masking. It is also backed up automatically by Azure every 5 minutes. Tools such as SSMS (SQL Server Management Studio) and Visual Studio can be used with it as well.

Azure Database for MySQL

This is a MySQL service fully managed by Azure. It is also scalable and reliable (99.99% SLA) and is a popular option for LAMP stack applications.

Azure Database for PostgreSQL

Fully managed, scalable and reliable (99.99% SLA). Azure supports PostgreSQL extensions such as string encryption, and supports multiple languages for writing functions.

Azure Database for MariaDB

This is similar to MySQL and the other relational database services. It is fully managed, scalable and reliable (99.99% SLA). MariaDB can be used to support Apache Cassandra.

Cosmos DB

This is Azure’s non-relational database service, which means no predefined schema is needed. Like the other database services, it is fully managed, scalable and reliable (99.99% SLA), with geo-replication. On top of these standard offerings, Azure Cosmos DB also guarantees latencies of <10 ms for reads and <15 ms for writes at the 99th percentile.

It can store data using the following APIs:

  • DocumentDB API (now called the SQL API)
  • MongoDB API
  • Table API
  • Graph API
  • Apache Cassandra API

 
