Developing scalable apps in Azure

The following is from the Azure Developer Training lab for AZ-203.

Common autoscale patterns

Note: Azure Monitor autoscale currently applies only to Virtual Machine Scale Sets, Cloud Services, App Service – Web Apps, and API Management services.

Scale based on CPU

You have a web app (or VMSS or cloud service role) and

  • You want to scale out/scale in based on CPU.
  • Additionally, you want to ensure there is a minimum number of instances.
  • Also, you want to ensure that you set a maximum limit to the number of instances you can scale to.

Scale differently on weekdays vs weekends

You have a web app (or VMSS or cloud service role) and

  • You want 3 instances by default (on weekdays)
  • You don’t expect traffic on weekends and hence you want to scale down to 1 instance on weekends.

Scale differently during holidays

You have a web app (or VMSS or cloud service role) and

  • You want to scale up/down based on CPU usage by default
  • However, during holiday season (or specific days that are important for your business) you want to override the defaults and have more capacity at your disposal.

Scale based on custom metric

You have a web front end and an API tier that communicates with the backend.

  • You want to scale the API tier based on custom events in the front end (for example, you want to scale your checkout process based on the number of items in the shopping cart)

 

Understand Autoscale settings

Autoscale settings help ensure that you have the right amount of resources running to handle the fluctuating load of your application. You can configure Autoscale settings to be triggered based on metrics that indicate load or performance, or triggered at a scheduled date and time. This article takes a detailed look at the anatomy of an Autoscale setting. The article begins with the schema and properties of a setting, and then walks through the different profile types that can be configured. Finally, the article discusses how the Autoscale feature in Azure evaluates which profile to execute at any given time.

Autoscale setting schema

To illustrate the Autoscale setting schema, the following Autoscale setting is used. It is important to note that this Autoscale setting has:

  • One profile.
  • Two metric rules in this profile: one for scale out, and one for scale in.
    • The scale-out rule is triggered when the virtual machine scale set’s average percentage CPU metric is greater than 85 percent for the past 10 minutes.
    • The scale-in rule is triggered when the virtual machine scale set’s average percentage CPU metric is less than 60 percent for the past 10 minutes.
{
  "id": "/subscriptions/s1/resourceGroups/rg1/providers/microsoft.insights/autoscalesettings/setting1",
  "name": "setting1",
  "type": "Microsoft.Insights/autoscaleSettings",
  "location": "East US",
  "properties": {
    "enabled": true,
    "targetResourceUri": "/subscriptions/s1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachineScaleSets/vmss1",
    "profiles": [
      {
        "name": "mainProfile",
        "capacity": {
          "minimum": "1",
          "maximum": "4",
          "default": "1"
        },
        "rules": [
          {
            "metricTrigger": {
              "metricName": "Percentage CPU",
              "metricResourceUri": "/subscriptions/s1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachineScaleSets/vmss1",
              "timeGrain": "PT1M",
              "statistic": "Average",
              "timeWindow": "PT10M",
              "timeAggregation": "Average",
              "operator": "GreaterThan",
              "threshold": 85
            },
            "scaleAction": {
              "direction": "Increase",
              "type": "ChangeCount",
              "value": "1",
              "cooldown": "PT5M"
            }
          },
          {
            "metricTrigger": {
              "metricName": "Percentage CPU",
              "metricResourceUri": "/subscriptions/s1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachineScaleSets/vmss1",
              "timeGrain": "PT1M",
              "statistic": "Average",
              "timeWindow": "PT10M",
              "timeAggregation": "Average",
              "operator": "LessThan",
              "threshold": 60
            },
            "scaleAction": {
              "direction": "Decrease",
              "type": "ChangeCount",
              "value": "1",
              "cooldown": "PT5M"
            }
          }
        ]
      }
    ]
  }
}
| Section | Element name | Description |
| --- | --- | --- |
| Setting | ID | The Autoscale setting’s resource ID. Autoscale settings are an Azure Resource Manager resource. |
| Setting | name | The Autoscale setting name. |
| Setting | location | The location of the Autoscale setting. This location can be different from the location of the resource being scaled. |
| properties | targetResourceUri | The resource ID of the resource being scaled. You can only have one Autoscale setting per resource. |
| properties | profiles | An Autoscale setting is composed of one or more profiles. Each time the Autoscale engine runs, it executes one profile. |
| profile | name | The name of the profile. You can choose any name that helps you identify the profile. |
| profile | Capacity.maximum | The maximum capacity allowed. It ensures that Autoscale, when executing this profile, does not scale your resource above this number. |
| profile | Capacity.minimum | The minimum capacity allowed. It ensures that Autoscale, when executing this profile, does not scale your resource below this number. |
| profile | Capacity.default | If there is a problem reading the resource metric (in this case, the CPU of “vmss1”), and the current capacity is below the default, Autoscale scales out to the default. This is to ensure the availability of the resource. If the current capacity is already higher than the default capacity, Autoscale does not scale in. |
| profile | rules | Autoscale automatically scales between the maximum and minimum capacities, by using the rules in the profile. You can have multiple rules in a profile. Typically there are two rules: one to determine when to scale out, and the other to determine when to scale in. |
| rule | metricTrigger | Defines the metric condition of the rule. |
| metricTrigger | metricName | The name of the metric. |
| metricTrigger | metricResourceUri | The resource ID of the resource that emits the metric. In most cases, it is the same as the resource being scaled. In some cases, it can be different. For example, you can scale a virtual machine scale set based on the number of messages in a storage queue. |
| metricTrigger | timeGrain | The metric sampling duration. For example, timeGrain = “PT1M” means that the metrics should be aggregated every 1 minute, by using the aggregation method specified in the statistic element. |
| metricTrigger | statistic | The aggregation method within the timeGrain period. For example, statistic = “Average” and timeGrain = “PT1M” means that the metrics should be aggregated every 1 minute, by taking the average. This property dictates how the metric is sampled. |
| metricTrigger | timeWindow | The amount of time to look back for metrics. For example, timeWindow = “PT10M” means that every time Autoscale runs, it queries metrics for the past 10 minutes. The time window allows your metrics to be normalized, and avoids reacting to transient spikes. |
| metricTrigger | timeAggregation | The aggregation method used to aggregate the sampled metrics. For example, timeAggregation = “Average” should aggregate the sampled metrics by taking the average. In the preceding case, take the ten 1-minute samples, and average them. |
| rule | scaleAction | The action to take when the metricTrigger of the rule is triggered. |
| scaleAction | direction | “Increase” to scale out, or “Decrease” to scale in. |
| scaleAction | value | How much to increase or decrease the capacity of the resource. |
| scaleAction | cooldown | The amount of time to wait after a scale operation before scaling again. For example, if cooldown = “PT10M”, Autoscale does not attempt to scale again for another 10 minutes. The cooldown is to allow the metrics to stabilize after the addition or removal of instances. |

Autoscale profiles

There are three types of Autoscale profiles:

  • Regular profile: The most common profile. If you don’t need to scale your resource based on the day of the week, or on a particular day, you can use a regular profile. This profile can then be configured with metric rules that dictate when to scale out and when to scale in. You should only have one regular profile defined. The example profile used earlier in this article is an example of a regular profile. Note that it is also possible to set a profile to scale to a static instance count for your resource.
  • Fixed date profile: This profile is for special cases. For example, let’s say you have an important event coming up on December 26, 2017 (PST). You want the minimum and maximum capacities of your resource to be different on that day, but still scale on the same metrics. In this case, you should add a fixed date profile to your setting’s list of profiles. The profile is configured to run only on the event’s day. For any other day, Autoscale uses the regular profile.
    "profiles": [{ "name": " regularProfile", "capacity": { ... }, "rules": [{ ... }, { ... }] }, { "name": "eventProfile", "capacity": { ... }, "rules": [{ ... }, { ... }], "fixedDate": { "timeZone": "Pacific Standard Time", "start": "2017-12-26T00:00:00", "end": "2017-12-26T23:59:00" }} ]
  • Recurrence profile: This type of profile enables you to ensure that this profile is always used on a particular day of the week. Recurrence profiles only have a start time. They run until the next recurrence profile or fixed date profile is set to start. An Autoscale setting with only one recurrence profile runs that profile, even if there is a regular profile defined in the same setting. The following example illustrates a way this profile is used:
Weekdays vs. weekends

Let’s say that on weekends, you want your maximum capacity to be 4. On weekdays, because you expect more load, you want your maximum capacity to be 10. In this case, your setting would contain two recurrence profiles, one to run on weekends and the other on weekdays. The setting looks like this:

"profiles": [ { "name": "weekdayProfile", "capacity": { ... }, "rules": [{ ... }], "recurrence": { "frequency": "Week", "schedule": { "timeZone": "Pacific Standard Time", "days": [ "Monday" ], "hours": [ 0 ], "minutes": [ 0 ] } }} }, { "name": "weekendProfile", "capacity": { ... }, "rules": [{ ... }] "recurrence": { "frequency": "Week", "schedule": { "timeZone": "Pacific Standard Time", "days": [ "Saturday" ], "hours": [ 0 ], "minutes": [ 0 ] } } }]

The preceding setting shows that each recurrence profile has a schedule. This schedule determines when the profile starts running. The profile stops when it’s time to run another profile.

For example, in the preceding setting, “weekdayProfile” is set to start on Monday at 12:00 AM. That means this profile starts running on Monday at 12:00 AM. It continues until Saturday at 12:00 AM, when “weekendProfile” is scheduled to start running.

Autoscale evaluation

Given that Autoscale settings can have multiple profiles, and each profile can have multiple metric rules, it is important to understand how an Autoscale setting is evaluated. Each time the Autoscale job runs, it begins by choosing the profile that is applicable. Then Autoscale evaluates the minimum and maximum values, and any metric rules in the profile, and decides if a scale action is necessary.

Which profile will Autoscale pick?

Autoscale uses the following sequence to pick the profile:

  1. It first looks for any fixed date profile that is configured to run now. If there is, Autoscale runs it. If there are multiple fixed date profiles that are supposed to run, Autoscale selects the first one.
  2. If there are no fixed date profiles, Autoscale looks at recurrence profiles. If a recurrence profile is found, it runs it.
  3. If there are no fixed date or recurrence profiles, Autoscale runs the regular profile.
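This selection order can be sketched in a few lines of C#. The Profile record, ProfileKind enum, and IsActive delegate below are hypothetical stand-ins for illustration; they are not the actual Autoscale engine types:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; not part of the Azure Monitor API.
enum ProfileKind { Regular, FixedDate, Recurrence }

record Profile(string Name, ProfileKind Kind, Func<DateTimeOffset, bool> IsActive);

static class ProfilePicker
{
    // Mirrors the three-step selection order described above.
    public static Profile Choose(DateTimeOffset now, IReadOnlyList<Profile> profiles)
    {
        // 1. Any fixed date profile configured to run now; the first match wins.
        var fixedDate = profiles.FirstOrDefault(p => p.Kind == ProfileKind.FixedDate && p.IsActive(now));
        if (fixedDate != null) return fixedDate;

        // 2. Otherwise, a recurrence profile that is active now.
        var recurrence = profiles.FirstOrDefault(p => p.Kind == ProfileKind.Recurrence && p.IsActive(now));
        if (recurrence != null) return recurrence;

        // 3. Otherwise, fall back to the regular profile.
        return profiles.First(p => p.Kind == ProfileKind.Regular);
    }
}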
How does Autoscale evaluate multiple rules?

After Autoscale determines which profile to run, it evaluates all the scale-out rules in the profile (these are rules with direction = “Increase”).

If one or more scale-out rules are triggered, Autoscale calculates the new capacity determined by the scaleAction of each of those rules. Then it scales out to the maximum of those capacities, to ensure service availability.

For example, let’s say there is a virtual machine scale set with a current capacity of 10. There are two scale-out rules: one that increases capacity by 10 percent, and one that increases capacity by 3 counts. The first rule would result in a new capacity of 11, and the second rule would result in a capacity of 13. To ensure service availability, Autoscale chooses the action that results in the maximum capacity, so the second rule is chosen.

If no scale-out rules are triggered, Autoscale evaluates all the scale-in rules (rules with direction = “Decrease”). Autoscale only takes a scale-in action if all of the scale-in rules are triggered.

Autoscale calculates the new capacity determined by the scaleAction of each of those rules. Then it chooses the scale action that results in the maximum of those capacities to ensure service availability.

For example, let’s say there is a virtual machine scale set with a current capacity of 10. There are two scale-in rules: one that decreases capacity by 50 percent, and one that decreases capacity by 3 counts. The first rule would result in a new capacity of 5, and the second rule would result in a capacity of 7. To ensure service availability, Autoscale chooses the action that results in the maximum capacity, so the second rule is chosen.
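The capacity calculation itself is simple enough to sketch. The helper below is illustrative only, not the engine’s actual code: it applies each triggered rule’s action to the current capacity and picks the maximum result, which matches both examples above:

using System;
using System.Linq;

static class ScaleMath
{
    // Illustrative only: apply each triggered rule's action to the current
    // capacity, then choose the maximum result to favor availability.
    public static int NewCapacity(int currentCapacity, params Func<int, int>[] triggeredActions) =>
        triggeredActions.Select(action => action(currentCapacity)).Max();
}

// Scale-out example from the text, current capacity 10:
//   ScaleMath.NewCapacity(10, c => c + c / 10, c => c + 3) returns 13 (11 vs. 13).
// Scale-in example, current capacity 10:
//   ScaleMath.NewCapacity(10, c => c / 2, c => c - 3) returns 7 (5 vs. 7).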

 

How to set autoscale by using a custom metric

Getting started

This lesson assumes that you have a web app with Application Insights configured. If you don’t have one already, you can set up Application Insights for your ASP.NET website.

  1. Open the Azure portal.
  2. Click on the Azure Monitor icon in the left navigation pane.
  3. Click on Autoscale setting to view all the resources for which autoscale is applicable, along with their current autoscale status.
  4. Open the Autoscale blade in Azure Monitor and select a resource you want to scale. Note: The steps below use an app service plan associated with a web app that has App Insights configured.
  5. In the scale setting blade for the resource, note the current instance count, and then click on Enable autoscale.
  6. Provide a name for the scale setting, and then click on Add a rule. Notice the scale rule options that open as a context pane on the right-hand side. By default, the rule is set to scale your instance count by 1 if the CPU percentage of the resource exceeds 70%. Change the metric source at the top to Application Insights, select the App Insights resource in the Resource dropdown, and then select the custom metric based on which you want to scale.
  7. Similar to the step above, add a scale rule that will scale in and decrease the scale count by 1 if the custom metric is below a threshold.
  8. Set your instance limits. For example, if you want to scale between 2-5 instances depending on the custom metric fluctuations, set Minimum to ‘2’, Maximum to ‘5’ and Default to ‘2’. Note: In case there is a problem reading the resource metrics and the current capacity is below the default capacity, then to ensure the availability of the resource, Autoscale will scale out to the default value. If the current capacity is already higher than the default capacity, Autoscale will not scale in.
  9. Click on Save.

 

Best practices for Autoscale

Autoscale concepts

  • A resource can have only one autoscale setting
  • An autoscale setting can have one or more profiles and each profile can have one or more autoscale rules.
  • An autoscale setting scales instances horizontally: out by increasing the number of instances, and in by decreasing it. An autoscale setting has a maximum, minimum, and default number of instances.
  • An autoscale job always reads the associated metric to scale by, checking if it has crossed the configured threshold for scale-out or scale-in. You can view a list of metrics that autoscale can scale by at Azure Monitor autoscaling common metrics.
  • All thresholds are calculated at an instance level. For example, “scale out by one instance when average CPU > 80% when instance count is 2”, means scale-out when the average CPU across all instances is greater than 80%.
  • All autoscale failures are logged to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there is an autoscale failure.
  • Similarly, all successful scale actions are posted to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there is a successful autoscale action. You can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting.

Autoscale best practices

Use the following best practices as you use autoscale.

Ensure the maximum and minimum values are different and have an adequate margin between them

If you have a setting that has minimum=2, maximum=2 and the current instance count is 2, no scale action can occur. Keep an adequate margin between the maximum and minimum instance counts, which are inclusive. Autoscale always scales between these limits.

Manual scaling is reset by autoscale min and max

If you manually update the instance count to a value above or below the configured range, the autoscale engine automatically scales back to the minimum (if below) or the maximum (if above). For example, you set the range between 3 and 6. If you have one running instance, the autoscale engine scales to three instances on its next run. Likewise, if you manually set the scale to eight instances, autoscale scales it back to six instances on its next run. Manual scaling is temporary unless you also reset the autoscale rules.
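In other words, on each run the engine effectively clamps the instance count back into the configured range. A tiny illustrative sketch, not the actual engine code:

using System;

static class AutoscaleRange
{
    // Illustrative: a manually set count outside [min, max] is pulled back in.
    public static int Reconcile(int currentCount, int min, int max) =>
        Math.Clamp(currentCount, min, max);
}

// With the range 3..6 from the example:
//   AutoscaleRange.Reconcile(1, 3, 6) == 3 and AutoscaleRange.Reconcile(8, 3, 6) == 6.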

Always use a scale-out and scale-in rule combination that performs an increase and decrease

If you use only one part of the combination, autoscale will only take action in a single direction (scale out, or in) until it reaches the maximum or minimum instance count defined in the profile. This is not optimal; ideally you want your resource to scale out at times of high usage to ensure availability. Similarly, at times of low usage you want your resource to scale in, so you can realize cost savings.

Choose the appropriate statistic for your diagnostics metric

For diagnostics metrics, you can choose among Average, Minimum, Maximum and Total as a metric to scale by. The most common statistic is Average.

Choose the thresholds carefully for all metric types

We recommend carefully choosing different thresholds for scale-out and scale-in based on practical situations.

We do not recommend autoscale settings like the examples below with the same or very similar threshold values for out and in conditions:

  • Increase instances by 1 count when Thread Count >= 600
  • Decrease instances by 1 count when Thread Count <= 600

Let’s look at an example of what can lead to a behavior that may seem confusing. Consider the following sequence.

  1. Assume there are two instances to begin with and then the average number of threads per instance grows to 625.
  2. Autoscale scales out adding a third instance.
  3. Next, assume that the average thread count across instances falls to 575.
  4. Before scaling down, autoscale tries to estimate what the final state will be if it scaled in. For example, 575 x 3 (current instance count) = 1,725 total threads; 1,725 / 2 (final number of instances when scaled down) = 862.5 threads per instance. This means autoscale would have to immediately scale out again even after it scaled in, if the average thread count remains the same or even falls only a small amount. However, if it scaled out again, the whole process would repeat, leading to an infinite loop.
  5. To avoid this situation (termed “flapping”), autoscale does not scale down at all. Instead, it skips and reevaluates the condition again the next time the service’s job executes. This can confuse many people because autoscale wouldn’t appear to work when the average thread count was 575.

Estimation during a scale-in is intended to avoid “flapping” situations, where scale-in and scale-out actions continually go back and forth. Keep this behavior in mind when you choose the same thresholds for scale-out and in.

We recommend choosing an adequate margin between the scale-out and in thresholds. As an example, consider the following better rule combination.

  • Increase instances by 1 count when CPU% >= 80
  • Decrease instances by 1 count when CPU% <= 60

In this case:

  1. Assume there are 2 instances to start with.
  2. If the average CPU% across instances goes to 80, autoscale scales out adding a third instance.
  3. Now assume that over time the CPU% falls to 60.
  4. Autoscale’s scale-in rule estimates the final state if it were to scale in. For example, 60 x 3 (current instance count) = 180; 180 / 2 (final number of instances when scaled down) = 90. So autoscale does not scale in, because it would have to scale out again immediately. Instead, it skips scaling down.
  5. The next time autoscale checks, the CPU has fallen to 50. It estimates again: 50 x 3 instances = 150; 150 / 2 instances = 75, which is below the scale-out threshold of 80, so it scales in successfully to 2 instances.
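The estimate autoscale makes before scaling in can be expressed in a few lines. The sketch below assumes, as both walkthroughs above do, that the projected per-instance value is the current average multiplied by the current instance count and divided by the post-scale-in count; it is an illustration, not the engine’s actual implementation:

static class FlappingCheck
{
    // Illustrative: project the per-instance metric after scale-in and skip the
    // action if the projection would immediately re-trigger the scale-out rule.
    public static bool ShouldScaleIn(double averageMetric, int currentCount, int decreaseBy, double scaleOutThreshold)
    {
        int finalCount = currentCount - decreaseBy;
        double projected = averageMetric * currentCount / finalCount;
        return projected < scaleOutThreshold;
    }
}

// From the CPU example: FlappingCheck.ShouldScaleIn(60, 3, 1, 80) is false (projection 90),
// while FlappingCheck.ShouldScaleIn(50, 3, 1, 80) is true (projection 75).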
Considerations for scaling threshold values for special metrics

For special metrics such as Storage or Service Bus Queue length metric, the threshold is the average number of messages available per current number of instances. Carefully choose the threshold value for this metric.

Let’s illustrate it with an example to ensure you understand the behavior better.

  • Increase instances by 1 count when Storage Queue message count >= 50
  • Decrease instances by 1 count when Storage Queue message count <= 10

Consider the following sequence:

  1. There are two storage queue instances.
  2. Messages keep coming and when you review the storage queue, the total count reads 50. You might assume that autoscale should start a scale-out action. However, note that it is still 50/2 = 25 messages per instance. So, scale-out does not occur. For the first scale-out to happen, the total message count in the storage queue should be 100.
  3. Next, assume that the total message count reaches 100.
  4. A 3rd storage queue instance is added due to a scale-out action. The next scale-out action will not happen until the total message count in the queue reaches 150 because 150/3 = 50.
  5. Now the number of messages in the queue gets smaller. With three instances, the first scale-in action happens when the total messages in the queue add up to 30, because 30/3 = 10 messages per instance, which is the scale-in threshold.
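The per-instance arithmetic driving this sequence is straightforward; the sketch below is illustrative only:

static class QueueThresholds
{
    // Illustrative: queue-length thresholds compare the average number of
    // messages per instance, not the total queue length.
    public static double MessagesPerInstance(int totalMessages, int instanceCount) =>
        (double)totalMessages / instanceCount;
}

// 50 messages / 2 instances = 25: below the scale-out threshold of 50, so no action.
// 100 / 2 = 50: scale out to 3 instances; the next scale-out needs 150 (150 / 3 = 50).
// 30 / 3 = 10: hits the scale-in threshold, so the first scale-in happens at 30 messages.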
Considerations for scaling when multiple rules are configured in a profile

There are cases where you may have to set multiple rules in a profile. The following set of autoscale rules is used by services when multiple rules are set.

On scale-out, autoscale runs if any rule is met. On scale-in, autoscale requires all rules to be met.

To illustrate, assume that you have the following four autoscale rules:

  • If CPU < 30 %, scale-in by 1
  • If Memory < 50%, scale-in by 1
  • If CPU > 75%, scale-out by 1
  • If Memory > 75%, scale-out by 1

Then the following occurs:

  • If CPU is 76% and Memory is 50%, we scale out.
  • If CPU is 50% and Memory is 76%, we scale out.

On the other hand, if CPU is 25% and Memory is 51%, autoscale does not scale in. In order to scale in, both conditions must be met; for example, CPU at 29% and Memory at 49% would trigger a scale-in.
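Expressed in code, the combination logic is an OR across scale-out rules and an AND across scale-in rules. This is an illustrative sketch, not the engine’s implementation:

using System.Collections.Generic;
using System.Linq;

static class RuleCombination
{
    // Illustrative: any triggered scale-out rule causes a scale-out...
    public static bool ShouldScaleOut(IEnumerable<bool> scaleOutRules) => scaleOutRules.Any(fired => fired);

    // ...but every scale-in rule must be triggered before scaling in.
    public static bool ShouldScaleIn(IEnumerable<bool> scaleInRules) => scaleInRules.All(fired => fired);
}

// From the example above: CPU 25% (< 30, fired) but Memory 51% (not < 50, not fired):
//   RuleCombination.ShouldScaleIn(new[] { true, false }) == false, so no scale-in occurs.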

Always select a safe default instance count

The default instance count is important because autoscale scales your service to that count when metrics are not available. Therefore, select a default instance count that’s safe for your workloads.

Configure autoscale notifications

Autoscale will post to the Activity Log if any of the following conditions occur:

  • Autoscale issues a scale operation
  • Autoscale service successfully completes a scale action
  • Autoscale service fails to take a scale action.
  • Metrics are not available for autoscale service to make a scale decision.
  • Metrics are available (recovery) again to make a scale decision.

You can also use an Activity Log alert to monitor the health of the autoscale engine. In addition to using activity log alerts, you can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting.

 

Querying resources using Azure CLI

The Azure Command-Line Interface (Azure CLI) is the Microsoft cross-platform command-line experience for managing Azure resources. You can use it in your browser with Azure Cloud Shell or install it on macOS, Linux, or Windows and run it from the command line. Azure CLI is optimized for managing and administering Azure resources from the command line and for building automation scripts that work against the Azure Resource Manager.

The Azure CLI uses the --query argument to execute a JMESPath query on the results of commands. JMESPath is a query language for JavaScript Object Notation (JSON) that gives you the ability to select and present data from Azure CLI output. These queries are executed on the JSON output before any other display formatting is applied. The --query argument is supported by all commands in the Azure CLI.

Many CLI commands will return more than one value. These commands always return a JSON array instead of a JSON document. Arrays can have their elements accessed by index, but there’s never an order guarantee from the Azure CLI. To make the arrays easier to query, we can flatten them using the JMESPath [] operator.

In the following example, we use the az vm list command to query for a list of virtual machine (VM) instances:

az vm list

The command returns an array containing a large JSON object for each VM in your subscription:

[
  {
    "availabilitySet": null,
    "diagnosticsProfile": null,
    "hardwareProfile": {
      "vmSize": "Standard_B1s"
    },
    "id": "/subscriptions/9103844d-1370-4716-b02b-69ce936865c6/resourceGroups/VM/providers/Microsoft.Compute/virtualMachines/simple",
    "identity": null,
    "instanceView": null,
    "licenseType": null,
    "location": "eastus",
    "name": "simple",
    "networkProfile": {
      "networkInterfaces": [
        {
          "id": "/subscriptions/9103844d-1370-4716-b02b-69ce936865c6/resourceGroups/VM/providers/Microsoft.Network/networkInterfaces/simple159",
          "primary": null,
          "resourceGroup": "VM"
        }
      ]
    },
    "osProfile": {
      "adminPassword": null,
      "adminUsername": "simple",
      "computerName": "simple",
      "customData": null,
      "linuxConfiguration": {
        "disablePasswordAuthentication": false,
        "ssh": null
      },
      "secrets": [],
      "windowsConfiguration": null
    },
    "plan": null,
    "provisioningState": "Creating",
    "resourceGroup": "VM",
    "resources": null,
    "storageProfile": {
      "dataDisks": [],
      "imageReference": {
        "id": null,
        "offer": "UbuntuServer",
        "publisher": "Canonical",
        "sku": "17.10",
        "version": "latest"
      },
      "osDisk": {
        "caching": "ReadWrite",
        "createOption": "FromImage",
        "diskSizeGb": 30,
        "encryptionSettings": null,
        "image": null,
        "managedDisk": {
          "id": "/subscriptions/9103844d-1370-4716-b02b-69ce936865c6/resourceGroups/VM/providers/Microsoft.Compute/disks/simple_OsDisk_1_4da948f5ef1a4232ad2f632077326d0a",
          "resourceGroup": "VM",
          "storageAccountType": "Premium_LRS"
        },
        "name": "simple_OsDisk_1_4da948f5ef1a4232ad2f632077326d0a",
        "osType": "Linux",
        "vhd": null,
        "writeAcceleratorEnabled": null
      }
    },
    "tags": null,
    "type": "Microsoft.Compute/virtualMachines",
    "vmId": "6aed2e80-64b2-401b-a8a0-b82ac8a6ed5c",
    "zones": null
  },
  { ... }
]
Using the --query argument, we can select and project specific fields to make the JSON output more useful and easier to read. This is useful if you are deserializing the JSON object into a specific type in your code:

az vm list --query '[].{name:name, image:storageProfile.imageReference.offer}'

[
  {
    "image": "UbuntuServer",
    "name": "linuxvm"
  },
  {
    "image": "WindowsServer",
    "name": "winvm"
  }
]

Using the [ ] operator, you can create queries that filter your result set by comparing the values of various JSON properties:

az vm list --query "[?starts_with(storageProfile.imageReference.offer, 'WindowsServer')]"

You can even combine filtering and projection to create custom queries that only return the resources you need and project only the fields that are useful to your application:

az vm list --query "[?starts_with(storageProfile.imageReference.offer, 'Ubuntu')].{name:name, id:vmId}"

[
  {
    "name": "linuxvm",
    "id": "6aed2e80-64b2-401b-a8a0-b82ac8a6ed5c"
  }
]

Querying resources using the fluent Azure SDK

In a manner similar to how you use the Azure CLI, you can use the Azure SDK to query resources in your subscription. The SDK may be a better option if you intend to write code to find connection information for a specific application instance. For example, you may need to write code to get the IP address of a specific VM in your subscription.

Connecting using the fluent Azure SDK

To use the APIs in the Azure management libraries for Microsoft .NET, as the first step, you need to create an authenticated client. The Azure SDK requires that you invoke the Azure.Authenticate static method to return an object that can fluently query resources and access their metadata. The Authenticate method requires a parameter that specifies an authorization file:

Azure azure = Azure.Authenticate("azure.auth").WithDefaultSubscription();

The authentication file, referenced as azure.auth above, contains information necessary to access your subscription using a service principal. The authorization file will look similar to the format below:

{
  "clientId": "b52dd125-9272-4b21-9862-0be667bdf6dc",
  "clientSecret": "ebc6e170-72b2-4b6f-9de2-99410964d2d0",
  "subscriptionId": "ffa52f27-be12-4cad-b1ea-c2c241b6cceb",
  "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

If you do not already have a service principal, you can generate a service principal and this file using the Azure CLI:

az ad sp create-for-rbac --sdk-auth > azure.auth

Listing virtual machines using the fluent Azure SDK

Once you have a variable of type IAzure, you can access various resources by using properties of the IAzure interface. For example, you can access VMs using the VirtualMachines property in the manner displayed below:

azure.VirtualMachines

The properties have both synchronous and asynchronous versions of methods to perform actions such as Create, Delete, List, and Get. If we wanted to get a list of VMs asynchronously, we could use the ListAsync method:

var vms = await azure.VirtualMachines.ListAsync();
foreach (var vm in vms)
{
    Console.WriteLine(vm.Name);
}

You can also use a language-integrated query mechanism, such as LINQ in C#, to filter your VM list to the specific subset of VMs that match your filter criteria:

var allvms = await azure.VirtualMachines.ListAsync();
IVirtualMachine targetvm = allvms.Where(vm => vm.Name == "simple").SingleOrDefault();
Console.WriteLine(targetvm?.Id);

Gathering virtual machine metadata to determine the IP address

Now that we can filter to a specific VM, we can access various properties of the IVirtualMachine interface and other related interfaces to get that resource’s IP address.

To start, the IVirtualMachine.GetPrimaryNetworkInterface method implementation will return the network adapter that we need to access the VM:

INetworkInterface targetnic = targetvm.GetPrimaryNetworkInterface();

The INetworkInterface interface has a property named PrimaryIPConfiguration that will get the configuration of the primary IP address for the current network adapter:

INicIPConfiguration targetipconfig = targetnic.PrimaryIPConfiguration;

The INicIPConfiguration interface has a method named GetPublicIPAddress that will get the IP address resource that is public and associated with the current specified configuration:

IPublicIPAddress targetipaddress = targetipconfig.GetPublicIPAddress();

Finally, the IPublicIPAddress interface has a property named IPAddress that contains the current IP address as a string value:

Console.WriteLine($"IP Address:\t{targetipaddress.IPAddress}");

Your application can now use this specific IP address to communicate directly with the intended compute instance.
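Putting the preceding steps together, the whole lookup reads as a short sequence. This simply combines the snippets shown above:

// Authenticate, locate the VM by name, then read its public IP address.
var azure = Azure.Authenticate("azure.auth").WithDefaultSubscription();

var allvms = await azure.VirtualMachines.ListAsync();
IVirtualMachine targetvm = allvms.Where(vm => vm.Name == "simple").SingleOrDefault();

INetworkInterface targetnic = targetvm.GetPrimaryNetworkInterface();
INicIPConfiguration targetipconfig = targetnic.PrimaryIPConfiguration;
IPublicIPAddress targetipaddress = targetipconfig.GetPublicIPAddress();

Console.WriteLine($"IP Address:\t{targetipaddress.IPAddress}");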

 

Transient errors

An application that communicates with elements running in the cloud has to be sensitive to the transient faults that can occur in this environment. Faults include the momentary loss of network connectivity to components and services, the temporary unavailability of a service, or timeouts that occur when a service is busy.

These faults are typically self-correcting, and if the action that triggered a fault is repeated after a suitable delay, it’s likely to be successful. For example, a database service that’s processing a large number of concurrent requests can implement a throttling strategy that temporarily rejects any further requests until its workload has eased. An application trying to access the database might fail to connect, but if it tries again after a delay, it might succeed.

 

Handling transient errors

In the cloud, transient faults aren’t uncommon, and an application should be designed to handle them elegantly and transparently. This minimizes the effects faults can have on the business tasks the application is performing.

If an application detects a failure when it tries to send a request to a remote service, it can handle the failure using the following strategies:

  • Cancel: If the fault indicates that the failure isn’t transient or is unlikely to be successful if repeated, the application should cancel the operation and report an exception. For example, an authentication failure caused by providing invalid credentials is not likely to succeed no matter how many times it’s attempted.
  • Retry: If the specific fault reported is unusual or rare, it might have been caused by unusual circumstances, such as a network packet becoming corrupted while it was being transmitted. In this case, the application could retry the failing request immediately, because the same failure is unlikely to be repeated, and the request will probably be successful.
  • Retry after a delay: If the fault is caused by one of the more commonplace connectivity or busy failures, the network or service might need a short period of time while the connectivity issues are corrected or the backlog of work is cleared. The application should wait for a suitable amount of time before retrying the request.

For the more common transient failures, the period between retries should be chosen to spread requests from multiple instances of the application as evenly as possible. This reduces the chance of a busy service continuing to be overloaded. If many instances of an application are continually overwhelming a service with retry requests, it’ll take the service longer to recover.

If the request still fails, the application can wait and make another attempt. If necessary, this process can be repeated with increasing delays between retry attempts, until some maximum number of requests have been attempted. The delay can be increased incrementally or exponentially, depending on the type of failure and the probability that it’ll be corrected during this time.
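A sketch of this "retry after an increasing delay" strategy appears below. The doubling factor, the cap of five attempts, and the random jitter are illustrative choices, not prescribed values:

using System;
using System.Threading.Tasks;

static class Backoff
{
    private static readonly Random Jitter = new Random();

    // Illustrative exponential backoff: wait 1s, 2s, 4s, ... between attempts,
    // plus up to one second of jitter so many clients don't retry in lockstep.
    public static async Task RetryWithBackoffAsync(Func<Task> operation, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                double delaySeconds = Math.Pow(2, attempt - 1) + Jitter.NextDouble();
                await Task.Delay(TimeSpan.FromSeconds(delaySeconds));
            }
        }
    }
}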

 

Retrying after a transient error

The following diagram illustrates invoking an operation in a hosted service using this pattern. If the request is unsuccessful after a predefined number of attempts, the application should treat the fault as an exception and handle it accordingly.

  1. The application invokes an operation on a hosted service. The request fails, and the service host responds with HTTP response code 500 (internal server error).
  2. The application waits for a short interval and tries again. The request still fails with HTTP response code 500.
  3. The application waits for a longer interval and tries again. The request succeeds with HTTP response code 200 (OK).

The application should wrap all attempts to access a remote service in code that implements a retry policy matching one of the strategies listed above. Requests sent to different services can be subject to different policies. Some vendors provide libraries that implement retry policies, where the application can specify the maximum number of retries, the amount of time between retry attempts, and other parameters.
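For example, in the .NET ecosystem the Polly library implements such policies. A minimal sketch, assuming the Polly NuGet package is referenced and treating HttpRequestException as the transient fault:

using System;
using System.Net.Http;
using Polly;

// Retry up to 3 times, doubling the wait between attempts (2s, 4s, 8s).
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

using var client = new HttpClient();
string body = await retryPolicy.ExecuteAsync(() => client.GetStringAsync("https://example.com/"));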

An application should log the details of faults and failing operations. This information is useful to operators. If a service is frequently unavailable or busy, it’s often because the service has exhausted its resources. You can reduce the frequency of these faults by scaling out the service. For example, if a database service is continually overloaded, it might be beneficial to partition the database and spread the load across multiple servers.

Handling transient errors in code

This example in C# illustrates an implementation of this pattern. The OperationWithBasicRetryAsync method, shown below, invokes an external service asynchronously through the TransientOperationAsync method. The details of the TransientOperationAsync method will be specific to the service and are omitted from the sample code:

private int retryCount = 3;
private readonly TimeSpan delay = TimeSpan.FromSeconds(5);

public async Task OperationWithBasicRetryAsync()
{
    int currentRetry = 0;

    for (;;)
    {
        try
        {
            // Call the external service.
            await TransientOperationAsync();

            // The call succeeded; exit the loop.
            break;
        }
        catch (Exception ex)
        {
            Trace.TraceError("Operation Exception");

            currentRetry++;

            // If this isn't a transient error, or the retry count has been
            // exceeded, rethrow the exception to the caller.
            if (currentRetry > this.retryCount || !IsTransient(ex))
            {
                throw;
            }
        }

        // Wait before retrying the operation.
        await Task.Delay(delay);
    }
}

// Async method that wraps a call to a remote service (details not shown).
private async Task TransientOperationAsync()
{
    ...
}

The statement that invokes this method is contained in a try/catch block wrapped in a for loop. The for loop exits if the call to the TransientOperationAsync method succeeds without throwing an exception. If the TransientOperationAsync method fails, the catch block examines the reason for the failure. If it’s believed to be a transient error, the code waits for a short delay before retrying the operation.

The for loop also tracks the number of times that the operation has been attempted, and if the code fails three times, the exception is assumed to be more long lasting. If the exception isn’t transient or it’s long lasting, the catch handler rethrows the exception. This exception exits the for loop and should be caught by the code that invokes the OperationWithBasicRetryAsync method.

 

Detecting if an error is transient in code

The IsTransient method, shown below, checks for a specific set of exceptions that are relevant to the environment the code is run in. The definition of a transient exception will vary according to the resources being accessed and the environment the operation is being performed in:

private bool IsTransient(Exception ex)
{
    // Determine if the exception is transient for this environment.
    if (ex is OperationTransientException)
        return true;

    var webException = ex as WebException;
    if (webException != null)
    {
        // These WebException status values are typically self-correcting.
        return new[]
        {
            WebExceptionStatus.ConnectionClosed,
            WebExceptionStatus.Timeout,
            WebExceptionStatus.RequestCanceled
        }.Contains(webException.Status);
    }

    return false;
}