The following is from the Azure Developer Training lab for AZ-203.
Common autoscale patterns
Note: Azure Monitor autoscale currently applies only to Virtual Machine Scale Sets, Cloud Services, App Service – Web Apps, and API Management services.
Scale based on CPU
You have a web app (or VMSS or cloud service role) and:
- You want to scale out/scale in based on CPU.
- Additionally, you want to ensure there is a minimum number of instances.
- Also, you want to ensure that you set a maximum limit to the number of instances you can scale to.
Scale differently on weekdays vs weekends
You have a web app (or VMSS or cloud service role) and:
- You want 3 instances by default (on weekdays)
- You don’t expect traffic on weekends and hence you want to scale down to 1 instance on weekends.
Scale differently during holidays
You have a web app (or VMSS or cloud service role) and:
- You want to scale up/down based on CPU usage by default
- However, during holiday season (or specific days that are important for your business) you want to override the defaults and have more capacity at your disposal.
Scale based on custom metric
You have a web front end and an API tier that communicates with the backend.
- You want to scale the API tier based on custom events in the front end (for example, you want to scale your checkout process based on the number of items in the shopping cart).
Understand Autoscale settings
Autoscale settings help ensure that you have the right amount of resources running to handle the fluctuating load of your application. You can configure Autoscale settings to be triggered based on metrics that indicate load or performance, or triggered at a scheduled date and time. This article takes a detailed look at the anatomy of an Autoscale setting. The article begins with the schema and properties of a setting, and then walks through the different profile types that can be configured. Finally, the article discusses how the Autoscale feature in Azure evaluates which profile to execute at any given time.
Autoscale setting schema
To illustrate the Autoscale setting schema, the following Autoscale setting is used. It is important to note that this Autoscale setting has:
- One profile.
- Two metric rules in this profile: one for scale out, and one for scale in.
- The scale-out rule is triggered when the virtual machine scale set’s average percentage CPU metric is greater than 85 percent for the past 10 minutes.
- The scale-in rule is triggered when the virtual machine scale set’s average percentage CPU metric is less than 60 percent for the past 10 minutes.
{ "id": "/subscriptions/s1/resourceGroups/rg1/providers/microsoft.insights/autoscalesettings/setting1", "name": "setting1", "type": "Microsoft.Insights/autoscaleSettings", "location": "East US", "properties": { "enabled": true, "targetResourceUri": "/subscriptions/s1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachineScaleSets/vmss1", "profiles": [ { "name": "mainProfile", "capacity": { "minimum": "1", "maximum": "4", "default": "1" }, "rules": [ { "metricTrigger": { "metricName": "Percentage CPU", "metricResourceUri": "/subscriptions/s1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachineScaleSets/vmss1", "timeGrain": "PT1M", "statistic": "Average", "timeWindow": "PT10M", "timeAggregation": "Average", "operator": "GreaterThan", "threshold": 85 }, "scaleAction": { "direction": "Increase", "type": "ChangeCount", "value": "1", "cooldown": "PT5M" } }, { "metricTrigger": { "metricName": "Percentage CPU", "metricResourceUri": "/subscriptions/s1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachineScaleSets/vmss1", "timeGrain": "PT1M", "statistic": "Average", "timeWindow": "PT10M", "timeAggregation": "Average", "operator": "LessThan", "threshold": 60 }, "scaleAction": { "direction": "Decrease", "type": "ChangeCount", "value": "1", "cooldown": "PT5M" } } ] } ] } }
| Section | Element name | Description |
| --- | --- | --- |
| Setting | id | The Autoscale setting’s resource ID. Autoscale settings are an Azure Resource Manager resource. |
| Setting | name | The Autoscale setting name. |
| Setting | location | The location of the Autoscale setting. This location can be different from the location of the resource being scaled. |
| properties | targetResourceUri | The resource ID of the resource being scaled. You can only have one Autoscale setting per resource. |
| properties | profiles | An Autoscale setting is composed of one or more profiles. Each time the Autoscale engine runs, it executes one profile. |
| profile | name | The name of the profile. You can choose any name that helps you identify the profile. |
| profile | capacity.maximum | The maximum capacity allowed. It ensures that Autoscale, when executing this profile, does not scale your resource above this number. |
| profile | capacity.minimum | The minimum capacity allowed. It ensures that Autoscale, when executing this profile, does not scale your resource below this number. |
| profile | capacity.default | If there is a problem reading the resource metric (in this case, the CPU of “vmss1”), and the current capacity is below the default, Autoscale scales out to the default. This ensures the availability of the resource. If the current capacity is already higher than the default capacity, Autoscale does not scale in. |
| profile | rules | Autoscale automatically scales between the maximum and minimum capacities, by using the rules in the profile. You can have multiple rules in a profile. Typically there are two rules: one to determine when to scale out, and the other to determine when to scale in. |
| rule | metricTrigger | Defines the metric condition of the rule. |
| metricTrigger | metricName | The name of the metric. |
| metricTrigger | metricResourceUri | The resource ID of the resource that emits the metric. In most cases, it is the same as the resource being scaled. In some cases, it can be different. For example, you can scale a virtual machine scale set based on the number of messages in a storage queue. |
| metricTrigger | timeGrain | The metric sampling duration. For example, timeGrain = “PT1M” means that the metrics should be aggregated every 1 minute, by using the aggregation method specified in the statistic element. |
| metricTrigger | statistic | The aggregation method within the timeGrain period. For example, statistic = “Average” and timeGrain = “PT1M” means that the metrics should be aggregated every 1 minute, by taking the average. This property dictates how the metric is sampled. |
| metricTrigger | timeWindow | The amount of time to look back for metrics. For example, timeWindow = “PT10M” means that every time Autoscale runs, it queries metrics for the past 10 minutes. The time window allows your metrics to be normalized, and avoids reacting to transient spikes. |
| metricTrigger | timeAggregation | The aggregation method used to aggregate the sampled metrics. For example, timeAggregation = “Average” should aggregate the sampled metrics by taking the average. In the preceding case, take the ten 1-minute samples, and average them. |
| rule | scaleAction | The action to take when the metricTrigger of the rule is triggered. |
| scaleAction | direction | “Increase” to scale out, or “Decrease” to scale in. |
| scaleAction | value | How much to increase or decrease the capacity of the resource. |
| scaleAction | cooldown | The amount of time to wait after a scale operation before scaling again. For example, if cooldown = “PT10M”, Autoscale does not attempt to scale again for another 10 minutes. The cooldown is to allow the metrics to stabilize after the addition or removal of instances. |
Autoscale profiles
There are three types of Autoscale profiles:
- Regular profile: The most common profile. If you don’t need to scale your resource based on the day of the week, or on a particular day, you can use a regular profile. This profile can then be configured with metric rules that dictate when to scale out and when to scale in. You should only have one regular profile defined. The example profile used earlier in this article is an example of a regular profile. Note that it is also possible to set a profile to scale to a static instance count for your resource.
- Fixed date profile: This profile is for special cases. For example, let’s say you have an important event coming up on December 26, 2017 (PST). You want the minimum and maximum capacities of your resource to be different on that day, but still scale on the same metrics. In this case, you should add a fixed date profile to your setting’s list of profiles. The profile is configured to run only on the event’s day. For any other day, Autoscale uses the regular profile.
"profiles": [{ "name": " regularProfile", "capacity": { ... }, "rules": [{ ... }, { ... }] }, { "name": "eventProfile", "capacity": { ... }, "rules": [{ ... }, { ... }], "fixedDate": { "timeZone": "Pacific Standard Time", "start": "2017-12-26T00:00:00", "end": "2017-12-26T23:59:00" }} ]
- Recurrence profile: This type of profile enables you to ensure that this profile is always used on a particular day of the week. Recurrence profiles only have a start time. They run until the next recurrence profile or fixed date profile is set to start. An Autoscale setting with only one recurrence profile runs that profile, even if there is a regular profile defined in the same setting. The following example illustrates a way this profile is used:
Weekdays vs. weekends
Let’s say that on weekends, you want your maximum capacity to be 4. On weekdays, because you expect more load, you want your maximum capacity to be 10. In this case, your setting would contain two recurrence profiles, one to run on weekends and the other on weekdays. The setting looks like this:
"profiles": [ { "name": "weekdayProfile", "capacity": { ... }, "rules": [{ ... }], "recurrence": { "frequency": "Week", "schedule": { "timeZone": "Pacific Standard Time", "days": [ "Monday" ], "hours": [ 0 ], "minutes": [ 0 ] } }} }, { "name": "weekendProfile", "capacity": { ... }, "rules": [{ ... }] "recurrence": { "frequency": "Week", "schedule": { "timeZone": "Pacific Standard Time", "days": [ "Saturday" ], "hours": [ 0 ], "minutes": [ 0 ] } } }]
The preceding setting shows that each recurrence profile has a schedule. This schedule determines when the profile starts running. The profile stops when it’s time to run another profile.
For example, in the preceding setting, “weekdayProfile” is set to start on Monday at 12:00 AM. That means this profile starts running on Monday at 12:00 AM. It continues until Saturday at 12:00 AM, when “weekendProfile” is scheduled to start running.
Autoscale evaluation
Given that Autoscale settings can have multiple profiles, and each profile can have multiple metric rules, it is important to understand how an Autoscale setting is evaluated. Each time the Autoscale job runs, it begins by choosing the profile that is applicable. Then Autoscale evaluates the minimum and maximum values, and any metric rules in the profile, and decides if a scale action is necessary.
Which profile will Autoscale pick?
Autoscale uses the following sequence to pick the profile:
- It first looks for any fixed date profile that is configured to run now. If there is one, Autoscale runs it. If there are multiple fixed date profiles that are supposed to run, Autoscale selects the first one.
- If there are no fixed date profiles, Autoscale looks at recurrence profiles. If a recurrence profile is found, it runs it.
- If there are no fixed date or recurrence profiles, Autoscale runs the regular profile.
How does Autoscale evaluate multiple rules?
After Autoscale determines which profile to run, it evaluates all the scale-out rules in the profile (these are rules with direction = “Increase”).
If one or more scale-out rules are triggered, Autoscale calculates the new capacity determined by the scaleAction of each of those rules. Then it scales out to the maximum of those capacities, to ensure service availability.
For example, let’s say there is a virtual machine scale set with a current capacity of 10. There are two scale-out rules: one that increases capacity by 10 percent, and one that increases capacity by 3 counts. The first rule would result in a new capacity of 11, and the second rule would result in a capacity of 13. To ensure service availability, Autoscale chooses the action that results in the maximum capacity, so the second rule is chosen.
If no scale-out rules are triggered, Autoscale evaluates all the scale-in rules (rules with direction = “Decrease”). Autoscale only takes a scale-in action if all of the scale-in rules are triggered.
Autoscale calculates the new capacity determined by the scaleAction of each of those rules. Then it chooses the scale action that results in the maximum of those capacities to ensure service availability.
For example, let’s say there is a virtual machine scale set with a current capacity of 10. There are two scale-in rules: one that decreases capacity by 50 percent, and one that decreases capacity by 3 counts. The first rule would result in a new capacity of 5, and the second rule would result in a capacity of 7. To ensure service availability, Autoscale chooses the action that results in the maximum capacity, so the second rule is chosen.
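To make this arithmetic concrete, here is a minimal C# sketch of the evaluation, using the rule values from the two examples above (illustrative code only, not an Azure SDK API):

```csharp
using System;
using System.Linq;

class CapacityEvaluation
{
    static void Main()
    {
        int current = 10;

        // Scale-out example: +10 percent vs. +3 count; the larger result wins.
        int[] scaleOutCandidates =
        {
            current + (int)Math.Ceiling(current * 0.10), // 11
            current + 3                                  // 13
        };
        Console.WriteLine($"Scale out to: {scaleOutCandidates.Max()}"); // 13

        // Scale-in example: -50 percent vs. -3 count; Autoscale again keeps
        // the MAXIMUM of the candidate capacities to protect availability.
        int[] scaleInCandidates =
        {
            current - (int)Math.Round(current * 0.50), // 5
            current - 3                                // 7
        };
        Console.WriteLine($"Scale in to: {scaleInCandidates.Max()}"); // 7
    }
}
```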
How to set autoscale by using a custom metric
Getting started
This lesson assumes that you have a web app with Application Insights configured. If you don’t have one already, you can set up Application Insights for your ASP.NET website.
- Open the Azure portal.
- Click on the Azure Monitor icon in the left navigation pane.
- Click on Autoscale setting to view all the resources for which autoscale is applicable, along with their current autoscale status.
- Open the Autoscale blade in Azure Monitor and select a resource you want to scale. Note: The steps below use an app service plan associated with a web app that has Application Insights configured.
- In the scale setting blade for the resource, notice the current instance count, and then click on Enable autoscale.
- Provide a name for the scale setting, and then click on Add a rule. Notice the scale rule options that open as a context pane on the right-hand side. By default, it sets the option to scale your instance count by 1 if the CPU percentage of the resource exceeds 70%. Change the metric source at the top to Application Insights, select the Application Insights resource in the Resource dropdown, and then select the custom metric based on which you want to scale.
- Similar to the step above, add a scale rule that will scale in and decrease the instance count by 1 if the custom metric is below a threshold.
- Set your instance limits. For example, if you want to scale between 2-5 instances depending on the custom metric fluctuations, set Minimum to ‘2’, Maximum to ‘5’, and Default to ‘2’. Note: In case there is a problem reading the resource metrics and the current capacity is below the default capacity, then to ensure the availability of the resource, Autoscale will scale out to the default value. If the current capacity is already higher than the default capacity, Autoscale will not scale in.
- Click on Save
Best practices for Autoscale
Autoscale concepts
- A resource can have only one autoscale setting
- An autoscale setting can have one or more profiles and each profile can have one or more autoscale rules.
- An autoscale setting scales instances horizontally: out by increasing the number of instances, and in by decreasing it. An autoscale setting has a maximum, minimum, and default number of instances.
- An autoscale job always reads the associated metric to scale by, checking if it has crossed the configured threshold for scale-out or scale-in. You can view a list of metrics that autoscale can scale by at Azure Monitor autoscaling common metrics.
- All thresholds are calculated at an instance level. For example, “scale out by one instance when average CPU > 80% when instance count is 2”, means scale-out when the average CPU across all instances is greater than 80%.
- All autoscale failures are logged to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there is an autoscale failure.
- Similarly, all successful scale actions are posted to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there is a successful autoscale action. You can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting.
Autoscale best practices
Use the following best practices as you use autoscale.
Ensure the maximum and minimum values are different and have an adequate margin between them
If you have a setting that has minimum=2, maximum=2 and the current instance count is 2, no scale action can occur. Keep an adequate margin between the maximum and minimum instance counts, which are inclusive. Autoscale always scales between these limits.
Manual scaling is reset by autoscale min and max
If you manually update the instance count to a value above or below the maximum, the autoscale engine automatically scales back to the minimum (if below) or the maximum (if above). For example, you set the range between 3 and 6. If you have one running instance, the autoscale engine scales to three instances on its next run. Likewise, if you manually set the scale to eight instances, autoscale will scale it back to six instances on its next run. Manual scaling is temporary unless you reset the autoscale rules as well.
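As a rough sketch of this clamping behavior, using the 3-to-6 range from the example (illustrative only, not an Azure API):

```csharp
using System;

class AutoscaleClamp
{
    // On its next run, autoscale pulls a manually set instance count
    // back inside the [min, max] range of the active profile.
    static int NextRunCount(int manualCount, int min, int max) =>
        Math.Clamp(manualCount, min, max);

    static void Main()
    {
        Console.WriteLine(NextRunCount(1, 3, 6)); // 3: below the minimum, scaled out to min
        Console.WriteLine(NextRunCount(8, 3, 6)); // 6: above the maximum, scaled in to max
    }
}
```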
Always use a scale-out and scale-in rule combination that performs an increase and decrease
If you use only one part of the combination, autoscale will only take action in a single direction (scale out, or in) until it reaches the maximum or minimum instance count defined in the profile. This is not optimal; ideally you want your resource to scale out at times of high usage to ensure availability. Similarly, at times of low usage you want your resource to scale in, so you can realize cost savings.
Choose the appropriate statistic for your diagnostics metric
For diagnostics metrics, you can choose among Average, Minimum, Maximum and Total as a metric to scale by. The most common statistic is Average.
Choose the thresholds carefully for all metric types
We recommend carefully choosing different thresholds for scale-out and scale-in based on practical situations.
We do not recommend autoscale settings like the examples below with the same or very similar threshold values for out and in conditions:
- Increase instances by 1 count when Thread Count <= 600
- Decrease instances by 1 count when Thread Count >= 600
Let’s look at an example of what can lead to a behavior that may seem confusing. Consider the following sequence.
- Assume there are two instances to begin with and then the average number of threads per instance grows to 625.
- Autoscale scales out adding a third instance.
- Next, assume that the average thread count across instances falls to 575.
- Before scaling down, autoscale tries to estimate what the final state will be if it scaled in. For example, 575 x 3 (current instance count) = 1,725 threads; 1,725 / 2 (final number of instances when scaled down) = 862.5 threads per instance. This means autoscale would have to immediately scale out again even after it scaled in, if the average thread count remains the same or falls only a small amount. However, if it scaled out again, the whole process would repeat, leading to an infinite loop.
- To avoid this situation (termed “flapping”), autoscale does not scale down at all. Instead, it skips and reevaluates the condition again the next time the service’s job executes. This can confuse many people, because autoscale wouldn’t appear to work when the average thread count was 575.
Estimation during a scale-in is intended to avoid “flapping” situations, where scale-in and scale-out actions continually go back and forth. Keep this behavior in mind when you choose the same thresholds for scale-out and in.
We recommend choosing an adequate margin between the scale-out and in thresholds. As an example, consider the following better rule combination.
- Increase instances by 1 count when CPU% >= 80
- Decrease instances by 1 count when CPU% <= 60
In this case
- Assume there are 2 instances to start with.
- If the average CPU% across instances goes to 80, autoscale scales out adding a third instance.
- Now assume that over time the CPU% falls to 60.
- Autoscale’s scale-in rule estimates the final state if it were to scale in. For example, 60 x 3 (current instance count) = 180; 180 / 2 (final number of instances when scaled down) = 90 per instance. So autoscale does not scale in, because it would have to scale out again immediately. Instead, it skips scaling down.
- The next time autoscale checks, the CPU has fallen to 50. It estimates again: 50 x 3 instances = 150; 150 / 2 instances = 75, which is below the scale-out threshold of 80, so it scales in successfully to 2 instances. The sketch after this list shows the same estimation in code.
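Here is a minimal C# sketch of that anti-flapping estimate (the method and values are illustrative, not an Azure API). It projects the per-instance metric after a proposed scale-in and skips the action if the projection would immediately re-trigger the scale-out rule:

```csharp
using System;

class FlappingCheck
{
    // Returns true if scaling in from 'current' to 'target' instances is safe,
    // i.e. the projected per-instance metric stays below the scale-out threshold.
    static bool CanScaleIn(double avgMetric, int current, int target, double scaleOutThreshold)
    {
        // The total load stays the same; it is just spread over fewer instances.
        double projected = avgMetric * current / target;
        return projected < scaleOutThreshold;
    }

    static void Main()
    {
        // The CPU example from the text: scale-out threshold of 80%.
        Console.WriteLine(CanScaleIn(60, 3, 2, 80)); // False: 60 * 3 / 2 = 90, skip scale-in
        Console.WriteLine(CanScaleIn(50, 3, 2, 80)); // True:  50 * 3 / 2 = 75, scale in
    }
}
```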
Considerations for scaling threshold values for special metrics
For special metrics such as the Storage queue or Service Bus queue length metric, the threshold is the average number of messages available per instance (the total divided by the current number of instances). Carefully choose the threshold value for this metric.
Let’s illustrate it with an example to ensure you understand the behavior better.
- Increase instances by 1 count when Storage Queue message count >= 50
- Decrease instances by 1 count when Storage Queue message count <= 10
Consider the following sequence:
- There are two instances to begin with.
- Messages keep coming, and when you review the storage queue, the total count reads 50. You might assume that autoscale should start a scale-out action. However, note that it is still 50/2 = 25 messages per instance. So, scale-out does not occur. For the first scale-out to happen, the total message count in the storage queue should be 100.
- Next, assume that the total message count reaches 100.
- A third instance is added by the scale-out action. The next scale-out action will not happen until the total message count in the queue reaches 150, because 150/3 = 50.
- Now the number of messages in the queue gets smaller. With three instances, the first scale-in action happens when the total messages in the queue add up to 30, because 30/3 = 10 messages per instance, which is the scale-in threshold. The sketch below shows this per-instance arithmetic.
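The following minimal C# sketch (illustrative names, not an Azure API) prints the total queue lengths that trigger each action at a given instance count:

```csharp
using System;

class QueueThresholds
{
    static void Main()
    {
        int scaleOutThreshold = 50; // messages per instance
        int scaleInThreshold = 10;  // messages per instance

        // Total queue length needed to trigger each action at each instance count.
        for (int instances = 2; instances <= 3; instances++)
        {
            Console.WriteLine(
                $"{instances} instances: scale out at >= {scaleOutThreshold * instances} " +
                $"total messages, scale in at <= {scaleInThreshold * instances}");
        }
        // 2 instances: scale out at >= 100 total messages, scale in at <= 20
        // 3 instances: scale out at >= 150 total messages, scale in at <= 30
    }
}
```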
Considerations for scaling when multiple rules are configured in a profile
There are cases where you may have to set multiple rules in a profile. The service uses the following rules when multiple rules are set:
On scale-out, autoscale runs if any rule is met. On scale-in, autoscale requires all rules to be met.
To illustrate, assume that you have the following four autoscale rules:
- If CPU < 30 %, scale-in by 1
- If Memory < 50%, scale-in by 1
- If CPU > 75%, scale-out by 1
- If Memory > 75%, scale-out by 1
Then the following occurs:
- If CPU is 76% and Memory is 50%, we scale out.
- If CPU is 50% and Memory is 76%, we scale out.
On the other hand, if CPU is 25% and Memory is 51%, autoscale does not scale in. In order to scale in, CPU must be below 30% and Memory below 50% (for example, 29% and 49%). The sketch below encodes this any/all evaluation.
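Here is a minimal C# sketch of this any/all evaluation, hard-coding the four example rules (illustrative only, not an Azure API):

```csharp
using System;

class MultiRuleEvaluation
{
    static void Main()
    {
        Evaluate(cpu: 76, memory: 50); // scale out (one scale-out rule met)
        Evaluate(cpu: 25, memory: 51); // no action (not ALL scale-in rules met)
        Evaluate(cpu: 29, memory: 49); // scale in (all scale-in rules met)
    }

    static void Evaluate(double cpu, double memory)
    {
        // Scale out if ANY scale-out rule is met.
        bool scaleOut = cpu > 75 || memory > 75;
        // Scale in only if ALL scale-in rules are met, and no scale-out applies.
        bool scaleIn = !scaleOut && cpu < 30 && memory < 50;

        string action = scaleOut ? "scale out by 1" : scaleIn ? "scale in by 1" : "no action";
        Console.WriteLine($"CPU {cpu}%, Memory {memory}%: {action}");
    }
}
```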
Always select a safe default instance count
The default instance count is important because autoscale scales your service to that count when metrics are not available. Therefore, select a default instance count that’s safe for your workloads.
Configure autoscale notifications
Autoscale will post to the Activity Log if any of the following conditions occur:
- Autoscale issues a scale operation.
- Autoscale service successfully completes a scale action.
- Autoscale service fails to take a scale action.
- Metrics are not available for the autoscale service to make a scale decision.
- Metrics are available (recovery) again to make a scale decision.
You can also use an Activity Log alert to monitor the health of the autoscale engine. In addition to using activity log alerts, you can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting.
Querying resources using Azure CLI
The Azure Command-Line Interface (Azure CLI) is the Microsoft cross-platform command-line experience for managing Azure resources. You can use it in your browser with Azure Cloud Shell or install it on macOS, Linux, or Windows and run it from the command line. Azure CLI is optimized for managing and administering Azure resources from the command line and for building automation scripts that work against the Azure Resource Manager.
The Azure CLI uses the --query argument to execute a JMESPath query on the results of commands. JMESPath is a query language for JavaScript Object Notation (JSON) that gives you the ability to select and present data from Azure CLI output. These queries are executed on the JSON output before any other display formatting is applied. The --query argument is supported by all commands in the Azure CLI.
Many CLI commands will return more than one value. These commands always return a JSON array instead of a JSON document. Arrays can have their elements accessed by index, but there’s never an order guarantee from the Azure CLI. To make the arrays easier to query, we can flatten them using the JMESPath [] operator.
In the following example, we use the az vm list command to query for a list of virtual machine (VM) instances:
```bash
az vm list
```
The query will return an array of large JSON objects for each VM in your subscription:
[ { "availabilitySet": null, "diagnosticsProfile": null, "hardwareProfile": { "vmSize": "Standard_B1s" }, "id": "/subscriptions/9103844d-1370-4716-b02b-69ce936865c6/resourceGroups/VM/providers/Microsoft.Compute/virtualMachines/simple", "identity": null, "instanceView": null, "licenseType": null, "location": "eastus", "name": "simple", "networkProfile": { "networkInterfaces": [{ "id": "/subscriptions/9103844d-1370-4716-b02b-69ce936865c6/resourceGroups/VM/providers/Microsoft.Network/networkInterfaces/simple159", "primary": null, "resourceGroup": "VM" }] }, "osProfile": { "adminPassword": null, "adminUsername": "simple", "computerName": "simple", "customData": null, "linuxConfiguration": { "disablePasswordAuthentication": false, "ssh": null }, "secrets": [], "windowsConfiguration": null }, "plan": null, "provisioningState": "Creating", "resourceGroup": "VM", "resources": null, "storageProfile": { "dataDisks": [], "imageReference": { "id": null, "offer": "UbuntuServer", "publisher": "Canonical", "sku": "17.10", "version": "latest" }, "osDisk": { "caching": "ReadWrite", "createOption": "FromImage", "diskSizeGb": 30, "encryptionSettings": null, "image": null, "managedDisk": { "id": "/subscriptions/9103844d-1370-4716-b02b-69ce936865c6/resourceGroups/VM/providers/Microsoft.Compute/disks/simple_OsDisk_1_4da948f5ef1a4232ad2f632077326d0a", "resourceGroup": "VM", "storageAccountType": "Premium_LRS" }, "name": "simple_OsDisk_1_4da948f5ef1a4232ad2f632077326d0a", "osType": "Linux", "vhd": null, "writeAcceleratorEnabled": null } }, "tags": null, "type": "Microsoft.Compute/virtualMachines", "vmId": "6aed2e80-64b2-401b-a8a0-b82ac8a6ed5c", "zones": null }, { ... } ]
Using the --query argument, we can project specific fields to make the JSON output more useful and easier to read. This is useful if you are deserializing the JSON object into a specific type in your code:
```bash
az vm list --query '[].{name:name, image:storageProfile.imageReference.offer}'
```
[ { "image": "UbuntuServer", "name": "linuxvm" }, { "image": "WindowsServer", "name": "winvm" } ]
Using the [?] filter expression, you can create queries that filter your result set by comparing the values of various JSON properties:
az vm list --query "[?starts_with(storageProfile.imageReference.offer, 'WindowsServer')]"
You can even combine filtering and projection to create custom queries that only return the resources you need and project only the fields that are useful to your application:
az vm list --query "[?starts_with(storageProfile.imageReference.offer, 'Ubuntu')].{name:name, id:vmId}"
[ { "name": "linuxvm", "id": "6aed2e80-64b2-401b-a8a0-b82ac8a6ed5c" } ]
Querying resources using the fluent Azure SDK
In a manner similar to how you use the Azure CLI, you can use the Azure SDK to query resources in your subscription. The SDK may be a better option if you intend to write code to find connection information for a specific application instance. For example, you may need to write code to get the IP address of a specific VM in your subscription.
Connecting using the fluent Azure SDK
To use the APIs in the Azure management libraries for Microsoft .NET, as the first step, you need to create an authenticated client. The Azure SDK requires that you invoke the Azure.Authenticate static method to return an object that can fluently query resources and access their metadata. The Authenticate method requires a parameter that specifies an authorization file:
```csharp
IAzure azure = Azure.Authenticate("azure.auth").WithDefaultSubscription();
```
The authentication file, referenced as azure.auth above, contains information necessary to access your subscription using a service principal. The authorization file will look similar to the format below:
{ "clientId": "b52dd125-9272-4b21-9862-0be667bdf6dc", "clientSecret": "ebc6e170-72b2-4b6f-9de2-99410964d2d0", "subscriptionId": "ffa52f27-be12-4cad-b1ea-c2c241b6cceb", "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47", "activeDirectoryEndpointUrl": "https://login.microsoftonline.com", "resourceManagerEndpointUrl": "https://management.azure.com/", "activeDirectoryGraphResourceId": "https://graph.windows.net/", "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/", "galleryEndpointUrl": "https://gallery.azure.com/", "managementEndpointUrl": "https://management.core.windows.net/" }
If you do not already have a service principal, you can generate a service principal and this file using the Azure CLI:
```bash
az ad sp create-for-rbac --sdk-auth > azure.auth
```
Listing virtual machines using the fluent Azure SDK
Once you have a variable of type IAzure, you can access various resources by using properties of the IAzure interface. For example, you can access VMs using the VirtualMachines property in the manner displayed below:
```csharp
azure.VirtualMachines
```
The properties have both synchronous and asynchronous versions of methods to perform actions such as Create, Delete, List, and Get. If we wanted to get a list of VMs asynchronously, we could use the ListAsync method:
```csharp
var vms = await azure.VirtualMachines.ListAsync();
foreach (var vm in vms)
{
    Console.WriteLine(vm.Name);
}
```
You can also use any language-integrated query mechanism, such as LINQ in C#, to filter your VM list to the specific subset of VMs that match your filter criteria:
```csharp
var allvms = await azure.VirtualMachines.ListAsync();
IVirtualMachine targetvm = allvms.Where(vm => vm.Name == "simple").SingleOrDefault();
Console.WriteLine(targetvm?.Id);
```
Gathering virtual machine metadata to determine the IP address
Now that we can filter to a specific VM, we can access various properties of the IVirtualMachine interface and other related interfaces to get that resource’s IP address.
To start, the IVirtualMachine.GetPrimaryNetworkInterface method implementation will return the network adapter that we need to access the VM:
```csharp
INetworkInterface targetnic = targetvm.GetPrimaryNetworkInterface();
```
The INetworkInterface interface has a property named PrimaryIPConfiguration that will get the configuration of the primary IP address for the current network adapter:
```csharp
INicIPConfiguration targetipconfig = targetnic.PrimaryIPConfiguration;
```
The INicIPConfiguration interface has a method named GetPublicIPAddress that will get the IP address resource that is public and associated with the current specified configuration:
```csharp
IPublicIPAddress targetipaddress = targetipconfig.GetPublicIPAddress();
```
Finally, the IPublicIPAddress interface has a property named IPAddress that contains the current IP address as a string value:
Console.WriteLine($"IP Address:\t{targetipaddress.IPAddress}");
Your application can now use this specific IP address to communicate directly with the intended compute instance.
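Putting the preceding steps together, a minimal end-to-end sketch might look like the following. It assumes the fluent Azure management libraries (the Microsoft.Azure.Management.Fluent NuGet package) and the azure.auth file described earlier; the VM name "simple" is just the example name used above:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Management.Compute.Fluent;
using Microsoft.Azure.Management.Fluent;
using Microsoft.Azure.Management.Network.Fluent;

class Program
{
    static async Task Main()
    {
        // Authenticate with the service principal stored in azure.auth.
        IAzure azure = Azure.Authenticate("azure.auth").WithDefaultSubscription();

        // Find the VM named "simple" (the example VM from earlier).
        var allvms = await azure.VirtualMachines.ListAsync();
        IVirtualMachine targetvm = allvms.Where(vm => vm.Name == "simple").SingleOrDefault();
        if (targetvm == null) return;

        // Walk from the VM to its primary NIC, the NIC's primary IP
        // configuration, and finally the associated public IP address.
        INetworkInterface targetnic = targetvm.GetPrimaryNetworkInterface();
        INicIPConfiguration targetipconfig = targetnic.PrimaryIPConfiguration;
        IPublicIPAddress targetipaddress = targetipconfig.GetPublicIPAddress();

        Console.WriteLine($"IP Address:\t{targetipaddress.IPAddress}");
    }
}
```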
Transient errors
An application that communicates with elements running in the cloud has to be sensitive to the transient faults that can occur in this environment. Faults include the momentary loss of network connectivity to components and services, the temporary unavailability of a service, or timeouts that occur when a service is busy.
These faults are typically self-correcting, and if the action that triggered a fault is repeated after a suitable delay, it’s likely to be successful. For example, a database service that’s processing a large number of concurrent requests can implement a throttling strategy that temporarily rejects any further requests until its workload has eased. An application trying to access the database might fail to connect, but if it tries again after a delay, it might succeed.
Handling transient errors
In the cloud, transient faults aren’t uncommon, and an application should be designed to handle them elegantly and transparently. This minimizes the effects faults can have on the business tasks the application is performing.
If an application detects a failure when it tries to send a request to a remote service, it can handle the failure using the following strategies:
- Cancel: If the fault indicates that the failure isn’t transient or is unlikely to be successful if repeated, the application should cancel the operation and report an exception. For example, an authentication failure caused by providing invalid credentials is not likely to succeed no matter how many times it’s attempted.
- Retry: If the specific fault reported is unusual or rare, it might have been caused by unusual circumstances, such as a network packet becoming corrupted while it was being transmitted. In this case, the application could retry the failing request again immediately, because the same failure is unlikely to be repeated, and the request will probably be successful.
- Retry after a delay: If the fault is caused by one of the more commonplace connectivity or busy failures, the network or service might need a short period of time while the connectivity issues are corrected or the backlog of work is cleared. The application should wait for a suitable amount of time before retrying the request.
For the more common transient failures, the period between retries should be chosen to spread requests from multiple instances of the application as evenly as possible. This reduces the chance of a busy service continuing to be overloaded. If many instances of an application are continually overwhelming a service with retry requests, it’ll take the service longer to recover.
If the request still fails, the application can wait and make another attempt. If necessary, this process can be repeated with increasing delays between retry attempts, until some maximum number of requests have been attempted. The delay can be increased incrementally or exponentially depending on the type of failure and the probability that it’ll be corrected during this time.
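As a rough sketch of the increasing-delay idea (the values are illustrative; production code often uses a retry library instead), an exponentially growing delay with a cap and a little random jitter might be computed like this:

```csharp
using System;

class Backoff
{
    static readonly Random jitter = new Random();

    // Exponential backoff: the delay doubles on each attempt, is capped so a
    // long outage doesn't produce absurd waits, and gets up to one second of
    // random jitter so many clients don't retry in lockstep.
    static TimeSpan DelayForAttempt(int attempt, double baseSeconds = 2, double maxSeconds = 60)
    {
        double seconds = Math.Min(baseSeconds * Math.Pow(2, attempt), maxSeconds);
        return TimeSpan.FromSeconds(seconds + jitter.NextDouble());
    }

    static void Main()
    {
        // Roughly 2s, 4s, 8s, 16s, 32s (plus jitter) for attempts 0 through 4.
        for (int attempt = 0; attempt < 5; attempt++)
        {
            Console.WriteLine($"Attempt {attempt}: wait {DelayForAttempt(attempt)}");
        }
    }
}
```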
Retrying after a transient error
The following sequence illustrates invoking an operation in a hosted service using this pattern. If the request is unsuccessful after a predefined number of attempts, the application should treat the fault as an exception and handle it accordingly.
- The application invokes an operation on a hosted service. The request fails, and the service host responds with HTTP response code 500 (internal server error).
- The application waits for a short interval and tries again. The request still fails with HTTP response code 500.
- The application waits for a longer interval and tries again. The request succeeds with HTTP response code 200 (OK).
The application should wrap all attempts to access a remote service in code that implements a retry policy matching one of the strategies listed above. Requests sent to different services can be subject to different policies. Some vendors provide libraries that implement retry policies, where the application can specify the maximum number of retries, the amount of time between retry attempts, and other parameters.
An application should log the details of faults and failing operations. This information is useful to operators. If a service is frequently unavailable or busy, it’s often because the service has exhausted its resources. You can reduce the frequency of these faults by scaling out the service. For example, if a database service is continually overloaded, it might be beneficial to partition the database and spread the load across multiple servers.
Handling transient errors in code
This example in C# illustrates an implementation of this pattern. The OperationWithBasicRetryAsync method, shown below, invokes an external service asynchronously through the TransientOperationAsync method. The details of the TransientOperationAsync method will be specific to the service and are omitted from the sample code:
```csharp
private int retryCount = 3;
private readonly TimeSpan delay = TimeSpan.FromSeconds(5);

public async Task OperationWithBasicRetryAsync()
{
    int currentRetry = 0;

    for (;;)
    {
        try
        {
            // Call the external service.
            await TransientOperationAsync();

            // The call succeeded; exit the loop.
            break;
        }
        catch (Exception ex)
        {
            Trace.TraceError("Operation Exception");

            currentRetry++;

            // If this isn't a transient error, or the retry budget is
            // exhausted, rethrow so the caller can handle the failure.
            if (currentRetry > this.retryCount || !IsTransient(ex))
            {
                throw;
            }
        }

        // Wait before retrying the operation.
        await Task.Delay(delay);
    }
}

// Async method that wraps a call to the remote service; the details are
// specific to the service and omitted here.
private async Task TransientOperationAsync()
{
    ...
}
```
The statement that invokes the TransientOperationAsync method is contained in a try/catch block wrapped in a for loop. The for loop exits if the call to the TransientOperationAsync method succeeds without throwing an exception. If the TransientOperationAsync method fails, the catch block examines the reason for the failure. If it’s believed to be a transient error, the code waits for a short delay before retrying the operation.
The for loop also tracks the number of times that the operation has been attempted, and if the code fails three times, the exception is assumed to be longer lasting. If the exception isn’t transient or it’s long lasting, the catch handler rethrows the exception. This exception exits the for loop and should be caught by the code that invokes the OperationWithBasicRetryAsync method.
Detecting if an error is transient in code
The IsTransient method, shown below, checks for a specific set of exceptions that are relevant to the environment the code is run in. The definition of a transient exception will vary according to the resources being accessed and the environment the operation is being performed in:
```csharp
private bool IsTransient(Exception ex)
{
    // A custom exception type that this application treats as transient.
    if (ex is OperationTransientException)
        return true;

    // Some WebException status codes typically indicate transient conditions.
    var webException = ex as WebException;
    if (webException != null)
    {
        return new[]
        {
            WebExceptionStatus.ConnectionClosed,
            WebExceptionStatus.Timeout,
            WebExceptionStatus.RequestCanceled
        }.Contains(webException.Status);
    }

    return false;
}
```