Azure develop solution for Cosmos DB

The following is from Azure Developer Training lab for AZ-203

Azure Cosmos DB

Microsoft Azure Cosmos DB is a database service native to Azure that focuses on providing a high-performance database regardless of your selected API or data model. Azure Cosmos DB offers multiple APIs and models that can be used interchangeably for various application scenarios.

 

Core functionality

Global replication

Azure Cosmos DB has a feature referred to as turnkey global distribution that automatically replicates data to other Azure datacenters across the globe without the need to manually write code or build a replication infrastructure.

 

Consistency levels

Commercial distributed databases fall into two categories: databases that do not offer well-defined, provable consistency choices at all, and databases that offer two extreme programmability choices (strong versus eventual consistency). The former burdens application developers with the minutiae of their replication protocols and expects them to make difficult tradeoffs among consistency, availability, latency, and throughput. The latter pressures them to choose one of the two extremes.

Azure Cosmos DB provides five consistency levels: strong, bounded-staleness, session, consistent prefix, and eventual. Bounded-staleness, session, consistent prefix, and eventual are referred to as relaxed consistency models, because they provide less consistency than strong, which is the most highly consistent model available.

The consistency levels range from strong consistency, where a write becomes visible to readers only after it has been committed across all replicas, to eventual consistency, where writes are readable immediately and replicas eventually become consistent with the primary.

Consistency Level | Description
Strong | When a write operation is performed on your primary database, the write operation is replicated to the replica instances. The write operation is committed (and visible) on the primary only after it has been committed and confirmed by all replicas.
Bounded Staleness | This level is similar to the Strong level, with the major difference that you can configure how stale documents can be within replicas. Staleness refers to the amount of time (or the version count) a replica document can be behind the primary document.
Session | This level guarantees that all read and write operations are consistent within a user session. Within the user session, all reads and writes are monotonic and guaranteed to be consistent across primary and replica instances.
Consistent Prefix | This level has loose consistency but guarantees that when updates show up in replicas, they will show up in the correct order (that is, as prefixes of other updates) without any gaps.
Eventual | This level has the loosest consistency and essentially commits any write operation against the primary immediately. Replica transactions are handled asynchronously and will eventually (over time) be consistent with the primary. This tier has the best performance, because the primary database does not need to wait for replicas to commit to finalize its transactions.

Choose the right consistency level for your application

Distributed databases that rely on replication for high availability, low latency, or both make a fundamental tradeoff between read consistency on one side and availability, latency, and throughput on the other. Most commercially available distributed databases ask developers to choose between two extreme consistency models: strong consistency and eventual consistency. Azure Cosmos DB allows developers to choose among five well-defined consistency models: strong, bounded staleness, session, consistent prefix, and eventual. Each of these consistency models is intuitive and can be used for specific real-world scenarios. Each provides its own availability and performance tradeoffs and is backed by comprehensive SLAs. The following simple considerations will help you make the right choice in many common scenarios.

 

SQL API and Table API

Consider the following points if your application is built by using the Cosmos DB SQL API or Table API:

  • For many real-world scenarios, session consistency is optimal and it’s the recommended option.
  • If your application requires strong consistency, it is recommended that you use the bounded staleness consistency level.
  • If you need stricter consistency guarantees than the ones provided by session consistency and single-digit-millisecond latency for writes, it is recommended that you use the bounded staleness consistency level.
  • If your application requires eventual consistency, it is recommended that you use the consistent prefix consistency level.
  • If you need less strict consistency guarantees than the ones provided by session consistency, it is recommended that you use the consistent prefix consistency level.
  • If you need the highest availability and lowest latency, use the eventual consistency level.
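
The recommendations in the list above can be condensed into a small lookup table. The following Python sketch is illustrative only: the function name `recommend_consistency` and the requirement labels are hypothetical names chosen for this example and are not part of any Azure SDK.

```python
def recommend_consistency(requirement: str) -> str:
    """Return the consistency level suggested by the guidance above.

    The requirement labels are invented for this sketch.
    """
    recommendations = {
        "default": "session",
        "strong": "bounded staleness",
        "stricter_than_session": "bounded staleness",
        "eventual": "consistent prefix",
        "looser_than_session": "consistent prefix",
        "max_availability_min_latency": "eventual",
    }
    if requirement not in recommendations:
        raise ValueError(f"unknown requirement: {requirement!r}")
    return recommendations[requirement]


print(recommend_consistency("default"))                # session
print(recommend_consistency("stricter_than_session"))  # bounded staleness
```

Keeping the mapping in one dictionary makes it easy to review against the list and to adjust if the guidance changes.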

 

Consistency guarantees in practice

You may get stronger consistency guarantees in practice. Consistency guarantees for a read operation correspond to the freshness and ordering of the database state that you request. Read-consistency is tied to the ordering and propagation of the write/update operations.

  • When the consistency level is set to bounded staleness, Cosmos DB guarantees that the clients always read the value of a previous write, with a lag bounded by the staleness window.
  • When the consistency level is set to strong, the staleness window is equivalent to zero, and the clients are guaranteed to read the latest committed value of the write operation.
  • For the remaining three consistency levels, the staleness window is largely dependent on your workload. For example, if there are no write operations on the database, a read operation with eventual, session, or consistent prefix consistency levels is likely to yield the same results as a read operation with strong consistency level.
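
The idea of a staleness window can be sketched as a simple version comparison. This is a toy model and not how Cosmos DB is implemented: the function name `is_within_staleness_window` and the use of a plain integer version counter are assumptions made for illustration.

```python
def is_within_staleness_window(primary_version: int,
                               replica_version: int,
                               max_lag_versions: int) -> bool:
    """Return True if the replica is at most max_lag_versions behind the primary."""
    if replica_version > primary_version:
        raise ValueError("a replica cannot be ahead of the primary")
    return primary_version - replica_version <= max_lag_versions


# Bounded staleness: the replica is 2 versions behind and the bound is 5.
print(is_within_staleness_window(10, 8, max_lag_versions=5))   # True
# Strong consistency behaves like a window of zero, so any lag is too much.
print(is_within_staleness_window(10, 8, max_lag_versions=0))   # False
# With no outstanding writes, even a zero window is satisfied.
print(is_within_staleness_window(10, 10, max_lag_versions=0))  # True
```

The last call mirrors the point made above: when nothing is being written, a read at a relaxed level is likely to return the same result as a strong read.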

If your Cosmos DB account is configured with a consistency level other than strong, you can find out the probability that your clients get strongly consistent reads for your workloads by looking at the Probabilistic Bounded Staleness (PBS) metric. This metric is exposed in the Azure portal.

Probabilistic bounded staleness shows how eventual your eventual consistency is. This metric provides insight into how often you can get stronger consistency than the level currently configured on your Cosmos DB account. In other words, you can see the probability of getting strongly consistent reads, over time measured in milliseconds, for a combination of write and read regions.

 

Consistency levels and Azure Cosmos DB APIs

The five consistency models offered by Azure Cosmos DB are natively supported by the Azure Cosmos DB SQL API. When you use Azure Cosmos DB, the SQL API is the default.

Azure Cosmos DB also provides native support for wire protocol-compatible APIs for popular databases. Databases include MongoDB, Apache Cassandra, Gremlin, and Azure Table storage. These databases don’t offer precisely defined consistency models or SLA-backed guarantees for consistency levels. They typically provide only a subset of the five consistency models offered by Azure Cosmos DB. For the SQL API, Gremlin API, and Table API, the default consistency level configured on the Azure Cosmos DB account is used.

The following sections show the mapping between the data consistency requested by an OSS client driver for Apache Cassandra 4.x and MongoDB 3.4. This document also shows the corresponding Azure Cosmos DB consistency levels for Apache Cassandra and MongoDB.

 

Mapping between Apache Cassandra and Azure Cosmos DB consistency levels

This table shows the “read consistency” mapping between the Apache Cassandra 4.x client and the default consistency level in Azure Cosmos DB. The table shows multi-region and single-region deployments.

Apache Cassandra 4.x | Azure Cosmos DB (multi-region) | Azure Cosmos DB (single region)
ONE, TWO, THREE | Consistent prefix | Consistent prefix
LOCAL_ONE | Consistent prefix | Consistent prefix
QUORUM, ALL, SERIAL | Bounded staleness is the default. Strong is in private preview. | Strong
LOCAL_QUORUM | Bounded staleness | Strong
LOCAL_SERIAL | Bounded staleness | Strong

Mapping between MongoDB 3.4 and Azure Cosmos DB consistency levels

The following table shows the “read concerns” mapping between MongoDB 3.4 and the default consistency level in Azure Cosmos DB. The table shows multi-region and single-region deployments.

MongoDB 3.4 | Azure Cosmos DB (multi-region) | Azure Cosmos DB (single region)
Linearizable | Strong | Strong
Majority | Bounded staleness | Strong
Local | Consistent prefix | Consistent prefix
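
The two mapping tables above can be expressed as plain dictionaries, which is handy when you want to check what a driver setting will translate to. This is a sketch built only from the rows shown in this document; the names `CASSANDRA`, `MONGODB`, and `cosmos_consistency` are hypothetical, and the private-preview note for QUORUM, ALL, and SERIAL is not modeled.

```python
def _rows(names, multi, single):
    """Build table rows that share the same pair of Cosmos DB levels."""
    return {name: {"multi-region": multi, "single-region": single} for name in names}


# Read consistency requested by an Apache Cassandra 4.x client.
CASSANDRA = {
    **_rows(["ONE", "TWO", "THREE", "LOCAL_ONE"],
            "consistent prefix", "consistent prefix"),
    **_rows(["QUORUM", "ALL", "SERIAL", "LOCAL_QUORUM", "LOCAL_SERIAL"],
            "bounded staleness", "strong"),
}

# Read concern requested by a MongoDB 3.4 client.
MONGODB = {
    **_rows(["linearizable"], "strong", "strong"),
    **_rows(["majority"], "bounded staleness", "strong"),
    **_rows(["local"], "consistent prefix", "consistent prefix"),
}


def cosmos_consistency(api: str, requested: str, multi_region: bool) -> str:
    """Look up the Cosmos DB level for a requested driver-level setting."""
    tables = {"cassandra": CASSANDRA, "mongodb": MONGODB}
    if api not in tables:
        raise ValueError(f"unsupported API: {api!r}")
    key = requested.upper() if api == "cassandra" else requested.lower()
    deployment = "multi-region" if multi_region else "single-region"
    return tables[api][key][deployment]


print(cosmos_consistency("cassandra", "LOCAL_QUORUM", multi_region=True))  # bounded staleness
print(cosmos_consistency("mongodb", "Majority", multi_region=False))       # strong
```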

Azure Cosmos DB supported APIs

Today, Azure Cosmos DB can be accessed by using five different APIs. The underlying data structure in Azure Cosmos DB is a data model based on atom record sequences, which enables Azure Cosmos DB to support multiple data models. Because of the flexible nature of atom record sequences, Azure Cosmos DB will be able to support many more models and APIs over time.

MongoDB API

The MongoDB API in Azure Cosmos DB acts as a massively scalable MongoDB service powered by the Azure Cosmos DB platform. It is compatible with existing MongoDB libraries, drivers, tools, and applications.

Table API

The Table API in Azure Cosmos DB is a key-value database service built to provide premium capabilities (for example, automatic indexing, guaranteed low latency, and global distribution) to existing Azure Table storage applications without making any app changes.

Gremlin API

The Gremlin API in Azure Cosmos DB is a fully managed, horizontally scalable graph database service that makes it easy to build and run applications that work with highly connected datasets supporting Open Graph APIs (based on the Apache TinkerPop specification, Apache Gremlin).

Apache Cassandra API

The Cassandra API in Azure Cosmos DB is a globally distributed Apache Cassandra service powered by the Azure Cosmos DB platform. It is compatible with existing Apache Cassandra libraries, drivers, tools, and applications.

SQL API

The SQL API in Azure Cosmos DB is a JavaScript and JavaScript Object Notation (JSON) native API based on the Azure Cosmos DB database engine. The SQL API also provides query capabilities rooted in the familiar SQL query language. Using SQL, you can query for documents based on their identifiers or make deeper queries based on properties of the document, complex objects, or even the existence of specific properties. The SQL API supports the execution of JavaScript logic within the database in the form of stored procedures, triggers, and user-defined functions.

Migrating from NoSQL

Many NoSQL database engines are simple to get started with, but they present problems as you scale, including:

  • Tedious set-up and maintenance requirements for a multiple-server database cluster
  • Expensive and complex high-availability solutions
  • Challenges in achieving end-to-end security, including encryption at rest and in flight
  • Required resource overprovisioning and unpredictable costs to achieve scale

Azure Cosmos DB has a MongoDB API and a Cassandra API to provide a NoSQL service offering for two of the most popular NoSQL database platforms. Both APIs are protocol compatible, with the Cassandra API supporting CQLv4 and the MongoDB API supporting MongoDB v5. Many applications can be “lifted and shifted” to Azure Cosmos DB without the need to rewrite code.

To achieve a successful migration, it is important to keep a few tips in mind:

  • Instead of writing custom code, you should use native tools, such as the Cassandra shell, mongodump, and mongoexport.
  • Azure Cosmos DB containers should be allocated prior to the migration with the appropriate throughput levels set. Many of the tools will create containers for you with default settings that are not ideal.
  • Prior to migrating, you should increase the container’s throughput to at least 1,000 Request Units (RUs) per second so that the import tools are not throttled. The throughput can be reverted to its typical value after the import is complete.

 

Managing Containers and Items

Resource hierarchy

The JSON documents stored in the Azure Cosmos DB SQL API are managed through a well-defined hierarchy of database resources. The Azure Cosmos DB hierarchical resource model consists of sets of resources under a database account, each addressable via a logical and stable URI. A set of resources is referred to as a feed.

Resource | Description
Account | A database account is associated with a set of databases and a fixed amount of large object (blob) storage for attachments. You can create one or more database accounts by using your Azure subscription. For more information, visit the pricing page.
Database | A database is a logical container of document storage partitioned across collections. It is also a users container.
Collection (container) | A collection is a container of JSON documents and the associated JavaScript application logic. Collections can span one or more partitions or servers and can scale to handle practically unlimited volumes of storage or throughput.
Document (item) | User-defined (arbitrary) JSON content. By default, no schema needs to be defined nor do secondary indexes need to be provided for all the documents added to a collection.
Stored procedure (sproc) | Application logic written in JavaScript that is registered with a collection and executed within the database engine as a transaction.
Trigger | Application logic written in JavaScript executed before or after either an insert, replace, or delete operation.
User-defined function | Application logic written in JavaScript. User-defined functions enable you to model a custom query operator and thereby extend the core SQL API query language.
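
Because each resource in the hierarchy is addressable by a stable URI, a relative link to any resource can be assembled from the IDs of its ancestors. The following sketch assumes the ID-based path segments commonly used with the SQL API (`dbs`, `colls`, `docs`, `sprocs`, `triggers`, `udfs`); the helper name `resource_link` and the sample IDs are hypothetical.

```python
# Path segment for each resource kind that lives inside a collection.
# These segment names are an assumption of this sketch.
_SEGMENTS = {
    "document": "docs",
    "sproc": "sprocs",
    "trigger": "triggers",
    "udf": "udfs",
}


def resource_link(database, collection=None, kind=None, resource_id=None):
    """Build a relative link such as dbs/<db>/colls/<coll>/docs/<id>."""
    link = f"dbs/{database}"
    if collection is None:
        if kind is not None or resource_id is not None:
            raise ValueError("a collection is required to address a child resource")
        return link
    link += f"/colls/{collection}"
    if kind is None and resource_id is None:
        return link
    if kind not in _SEGMENTS or resource_id is None:
        raise ValueError("both a known kind and a resource_id are required")
    return f"{link}/{_SEGMENTS[kind]}/{resource_id}"


print(resource_link("store"))                                   # dbs/store
print(resource_link("store", "orders"))                         # dbs/store/colls/orders
print(resource_link("store", "orders", "document", "order-1"))  # dbs/store/colls/orders/docs/order-1
```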

Collections

In the Azure Cosmos DB SQL API, databases are essentially containers for collections. Collections are where you place individual documents. A collection is intrinsically elastic—it automatically grows and shrinks as you add or remove documents.

Each collection is assigned a throughput value, and that value dictates the maximum throughput for that collection and its corresponding documents. Alternatively, you can assign the throughput at the database level and share the throughput values among the collections in the database. If you have a set of documents that needs throughput beyond the limits of an individual collection, you can distribute the documents among multiple collections. Each collection has its own distinct throughput level.

If a particular collection is seeing spikes in throughput, you can manage its throughput level in isolation by increasing or decreasing the value. This change to the throughput level of a particular collection will not cause side effects for the other collections. This allows you to adjust to meet the performance needs of any workload in isolation.

You can also scale workloads across collections. If you have a workload that needs to be partitioned, you can scale it by distributing its associated documents across multiple collections. The SQL API for Azure Cosmos DB includes a client-side partition resolver that lets you manage transactions and direct them in code to the correct partition based on a partition key field.
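
To make the idea of a client-side partition resolver concrete, here is a minimal sketch that hashes a partition key field to pick one of several collections. It is not the SDK's resolver: the class name `ToyHashPartitionResolver`, the collection links, and the choice of SHA-256 with a modulo are all assumptions made for illustration.

```python
import hashlib


class ToyHashPartitionResolver:
    """Pick a target collection for a document by hashing its partition key."""

    def __init__(self, collection_links, partition_key_field):
        if not collection_links:
            raise ValueError("at least one collection is required")
        self.collection_links = list(collection_links)
        self.partition_key_field = partition_key_field

    def resolve(self, document: dict) -> str:
        """Return the collection that should hold this document."""
        key = str(document[self.partition_key_field])
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        # Use the first 8 bytes as an integer and map it onto the collections.
        index = int.from_bytes(digest[:8], "big") % len(self.collection_links)
        return self.collection_links[index]


resolver = ToyHashPartitionResolver(
    ["dbs/store/colls/orders-0", "dbs/store/colls/orders-1"],
    partition_key_field="customerId",
)
target = resolver.resolve({"id": "order-1", "customerId": "contoso"})
print(target)
```

Every document with the same `customerId` resolves to the same collection, which is the property a resolver needs so that reads and writes for one key go to one place. A plain modulo reshuffles most keys when a collection is added, so a production design would need a more stable scheme.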

Collection types

Azure Cosmos DB containers can be created as fixed or unlimited in the Azure portal. Fixed-size containers have a maximum limit of 10 GB and a 10,000 RU/s throughput. To create a container as unlimited, you must specify a partition key and a minimum throughput of 1,000 RU/s. Azure Cosmos DB containers can also be configured to share throughput among the containers in a database.

If you created a fixed container with no partition key or a throughput less than 1,000 RU/s, the container will not automatically scale. To migrate the data from a fixed container to an unlimited container, you need to use the data migration tool or the Change Feed library.
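
The rules above for fixed and unlimited containers can be written down as a short check. This sketch encodes only the limits stated in this section (10 GB and 10,000 RU/s for fixed containers, a partition key and at least 1,000 RU/s for unlimited ones); the function names are hypothetical and the limits may differ in current versions of the service.

```python
from typing import Optional

FIXED_MAX_STORAGE_GB = 10
FIXED_MAX_THROUGHPUT_RUS = 10_000
UNLIMITED_MIN_THROUGHPUT_RUS = 1_000


def container_type(partition_key: Optional[str], throughput_rus: int) -> str:
    """Classify a container as 'unlimited' or 'fixed' using the rules above."""
    if partition_key and throughput_rus >= UNLIMITED_MIN_THROUGHPUT_RUS:
        return "unlimited"
    if throughput_rus > FIXED_MAX_THROUGHPUT_RUS:
        raise ValueError("fixed containers are capped at 10,000 RU/s")
    return "fixed"


def fits_fixed_container(storage_gb: float, throughput_rus: int) -> bool:
    """Return True if a workload stays within the fixed-container limits."""
    return (storage_gb <= FIXED_MAX_STORAGE_GB
            and throughput_rus <= FIXED_MAX_THROUGHPUT_RUS)


print(container_type("/customerId", 1_000))  # unlimited
print(container_type(None, 400))             # fixed
print(fits_fixed_container(12.5, 400))       # False
```

A workload for which `fits_fixed_container` returns False is a candidate for migration to an unlimited container.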

 

Partitioning

Azure Cosmos DB provides containers for storing data called collections (for documents), graphs, or tables. Containers are logical resources and can span one or more physical partitions or servers. The number of partitions is determined by Azure Cosmos DB based on the storage size and throughput provisioned for a container or set of containers.

If you are already familiar with the sharding pattern, the idea of dynamic partitioning is not very different.

 

A physical partition is a fixed amount of reserved solid-state drive (SSD) backend storage combined with a variable amount of compute resources (CPU and memory). Each physical partition is replicated for high availability. A physical partition is an internal concept of Azure Cosmos DB, and physical partitions are transient. Azure Cosmos DB will automatically scale the number of physical partitions based on your workload.

A logical partition is a partition within a physical partition that stores all the data associated with a single partition key value. Partition ranges can be dynamically subdivided to seamlessly grow the database as the application grows while simultaneously maintaining high availability. When a container meets the partitioning prerequisites, partitioning is completely transparent to your application. Azure Cosmos DB handles distributing data across physical and logical partitions and routing query requests to the right partition.
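
The relationship between logical and physical partitions can be illustrated with a toy model in which each physical partition owns a contiguous range of a hash space and a range can be split in half as data grows. The real placement and split logic is internal to Azure Cosmos DB; the names `key_hash` and `ToyPartitionMap`, the 32-bit hash space, and the use of SHA-256 are assumptions of this sketch.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32  # size of the toy hash space (an assumption of this sketch)


def key_hash(partition_key_value: str) -> int:
    """Map a partition key value to a point in the toy hash space."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyPartitionMap:
    """Physical partitions modeled as contiguous ranges of the hash space."""

    def __init__(self):
        # Sorted lower bounds; partition i owns [starts[i], starts[i + 1]).
        self.starts = [0]

    def partition_for(self, partition_key_value: str) -> int:
        """Return the index of the physical partition that owns this key value."""
        return bisect_right(self.starts, key_hash(partition_key_value)) - 1

    def split(self, index: int) -> None:
        """Subdivide one range in half, adding a physical partition."""
        low = self.starts[index]
        high = self.starts[index + 1] if index + 1 < len(self.starts) else HASH_SPACE
        if high - low < 2:
            raise ValueError("range is too small to split")
        self.starts.insert(index + 1, low + (high - low) // 2)


partitions = ToyPartitionMap()
print(partitions.partition_for("tenant-a"))  # 0 (only one physical partition exists)
partitions.split(0)
print(len(partitions.starts))                # 2
```

Because the physical partition is derived from the hash of the partition key value, all items that share a value (one logical partition) always land on the same physical partition, both before and after a split.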