[Memo] Setting Up an Azure Scale Set, Part 2

■Introduction

Last time, I got as far as writing the Terraform files for the VM Scale Set.

This time I add the autoscaling definition and actually run apply.

[Caution]
A Scale Set is fairly expensive, so don't leave it running.
I run terraform destroy as soon as I have finished checking the build (I'm the cautious type).
azure.microsoft.com
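One thing that caught my attention: even after I stopped the Scale Set manually from the Azure Portal, the CPU load did not drop to 0.
My guess is that autoscale keeps the number of VMs set as minimum under capacity running. I'll look into it at some point.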

■Auto Scaling settings

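The goal of introducing a Scale Set here is to scale horizontally when Selenoid comes under heavy load.
In other words, rather than giving a single VM more resources (scaling up), I add more VM instances (scaling out).
The Terraform definition for that is as follows.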

# Autoscale rules
resource "azurerm_monitor_autoscale_setting" "selenoid_autoscale_setting" {
  name                = "selenoid_autoscale_setting"
  resource_group_name = azurerm_resource_group.selenoid_rg.name
  location            = var.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.selenoid_vm01.id

  profile {
    name = "defaultProfile"

    capacity {
      default = 1
      minimum = 1
      maximum = 2
    }

    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.selenoid_vm01.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "GreaterThan"
        threshold          = 75
      }

      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT1M"
      }
    }

    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.selenoid_vm01.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "LessThan"
        threshold          = 25
      }

      scale_action {
        direction = "Decrease"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT1M"
      }
    }
  }

  notification {
    email {
      send_to_subscription_administrator    = true
      send_to_subscription_co_administrator = true
      custom_emails                         = ["user@example.com"]
    }
  }
}

I used the following page as a reference.

www.terraform.io
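I'm using it almost as-is, lol.
The reason: getting things working end to end comes first, so I start with a definition that should work and tune it later while operating it.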
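Since the plan is to tune the values later, one option is to pull the numbers out into variables so that they can be changed in one place. The following is only a minimal sketch and not part of the original configuration: the variable names (scale_out_cpu_threshold, scale_in_cpu_threshold, autoscale_max_instances) are hypothetical, and the defaults simply mirror the values used above.

```hcl
# Hypothetical variables (names are assumptions, not from the original config).
variable "scale_out_cpu_threshold" {
  description = "Average CPU (%) above which one instance is added"
  type        = number
  default     = 75
}

variable "scale_in_cpu_threshold" {
  description = "Average CPU (%) below which one instance is removed"
  type        = number
  default     = 25
}

variable "autoscale_max_instances" {
  description = "Upper bound for the number of Scale Set instances"
  type        = number
  default     = 2
}

# Inside azurerm_monitor_autoscale_setting the literals would then become:
#   capacity block:  maximum   = var.autoscale_max_instances
#   scale-out rule:  threshold = var.scale_out_cpu_threshold
#   scale-in rule:   threshold = var.scale_in_cpu_threshold
```

With this in place, a value could be overridden per run, for example with terraform apply -var="scale_out_cpu_threshold=60", without editing the resource itself.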

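As the reference page also notes, the time-related settings (time_grain, time_window, cooldown) must be written as ISO 8601 durations.
This page was a helpful reference. Thanks.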

qiita.com
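For quick reference, here is how the duration strings in the definition above (PT1M, PT5M) read, along with a few other common forms. This is an illustration only: the local names are hypothetical and are not used anywhere in the actual configuration.

```hcl
# Illustration only: the local names below are hypothetical.
# ISO 8601 duration = "P" [date part] "T" [time part]
locals {
  one_minute     = "PT1M"    # used above for time_grain and cooldown
  five_minutes   = "PT5M"    # used above for time_window
  one_hour       = "PT1H"
  ninety_minutes = "PT1H30M"
  one_day        = "P1D"     # date part only, so no "T"
}
```

Note that the T matters: P1M means one month, not one minute. The allowed range for each argument is defined by the provider, so check its documentation before changing the values.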

■Running terraform

Run terraform apply.

data.azurerm_resource_group.image: Refreshing state...
azurerm_resource_group.selenoid_rg: Refreshing state... [id=/subscriptions/<subscription ID>/resourceGroups/selenoid_rg]
data.azurerm_image.image: Refreshing state...

(snip)

azurerm_linux_virtual_machine_scale_set.selenoid_vm01: Creation complete after 2m9s [id=/subscriptions/<subscription ID>/resourceGroups/selenoid_rg/providers/Microsoft.Compute/virtualMachineScaleSets/selenoid01]
azurerm_monitor_autoscale_setting.selenoid_autoscale_setting: Creating...
azurerm_monitor_autoscale_setting.selenoid_autoscale_setting: Still creating... [10s elapsed]
azurerm_monitor_autoscale_setting.selenoid_autoscale_setting: Creation complete after 13s [id=/subscriptions/<subscription ID>/resourceGroups/selenoid_rg/providers/microsoft.insights/autoscalesettings/selenoid_autoscale_setting]

Apply complete! Resources: 17 added, 0 changed, 0 destroyed.

The apply succeeded and the image built with packer is assigned to selenoid, but the ggr VM is still empty, so I run the Ansible playbook again.

* Only the key points are shown here. See this entry for the full definition.

(omitted)

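    # Copy the ggr Docker files to the VM (the task name means "copy ggr")
    - name: ggrのコピー
      copy:
        src: ../../container_template/ggr
        dest: /usr/local/
        mode: '0644'
    # Build the images and start the ggr containers with docker-compose
    - name: ggr docker-compose
      docker_compose:
        project_src: /usr/local/ggr/
        build: yes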

Ansible transfers the Docker-related files for ggr to the VM and runs docker-compose up on it.

TASK [ggrのコピー] ********************************************************************************************************************************************************************
changed: [example-grid.com]

TASK [ggr docker-compose] *********************************************************************************************************************************************************
changed: [example-grid.com]

PLAY RECAP ************************************************************************************************************************************************************************
example-grid.com : ok=15   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

When I kicked off an E2E test from Jenkins as a trial, I could confirm that load was being applied.

[Screenshot omitted]
Next time, I'll raise the test load and confirm that the Scale Set actually scales out once the threshold is exceeded.

See you next time.