Support OpenAI Dynamic Quota (DynamicThrottling) in azure-native.cognitiveservices.Deployment #3564

Open
onordberg opened this issue Sep 7, 2024 · 4 comments
Labels
impact/missing-api kind/enhancement Improvements or new features

Comments

@onordberg

Hello!

  • Vote on this issue by adding a 👍 reaction
  • If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)

Issue details

Dynamic quota is an Azure OpenAI feature that lets a standard (pay-as-you-go) deployment opportunistically use more quota when extra capacity is available. In the GUI it is enabled by default, as there is little downside to enabling it. More about the feature: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/dynamic-quota

Based on a similar feature request in the Terraform project (hashicorp/terraform-provider-azurerm#23988) and the Azure REST API specification (https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices/preview/2024-06-01-preview/cognitiveservices.json), it seems this can be set with the dynamicThrottlingEnabled key.

Example of a TypeScript constructor with this configuration exposed:

const exampledeploymentResourceResourceFromCognitiveservices = new azure_native.cognitiveservices.Deployment("exampledeploymentResourceResourceFromCognitiveservices", {
    accountName: "string",
    resourceGroupName: "string",
    deploymentName: "string",
    properties: {
        model: {
            format: "string",
            name: "string",
            source: "string",
            version: "string",
        },
        raiPolicyName: "string",
        scaleSettings: {
            capacity: 0,
            scaleType: "string",
        },
        dynamicThrottlingEnabled: true, // proposed property, not currently exposed on Deployment
        versionUpgradeOption: "string",
    },
    sku: {
        name: "string",
        capacity: 0,
        family: "string",
        size: "string",
        tier: "string",
    },
});

Affected area/feature

azure-native.cognitiveservices.Deployment

onordberg added the kind/enhancement and needs-triage labels on Sep 7, 2024
@thomas11
Contributor

thomas11 commented Sep 9, 2024

Hi @onordberg, the Azure spec defines dynamicThrottlingEnabled as a property of accounts, not deployments. Accordingly, the Pulumi provider exposes it as a property of Account. Does that help?
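
For reference, a minimal sketch of setting the flag at the account level in a Pulumi TypeScript program. The resource names are hypothetical, and the kind and SKU are placeholders; whether the service accepts the setting depends on the resource kind and SKU, as the error reported later in this thread shows for OpenAI with S0.

```typescript
import * as azure_native from "@pulumi/azure-native";

// Hypothetical resource group for illustration.
const resourceGroup = new azure_native.resources.ResourceGroup("example-rg");

// dynamicThrottlingEnabled sits under the account's `properties`,
// not on the Deployment resource.
const account = new azure_native.cognitiveservices.Account("exampleAccount", {
    resourceGroupName: resourceGroup.name,
    kind: "OpenAI",        // placeholder kind; the service may reject the flag for some kinds
    sku: { name: "S0" },   // placeholder SKU
    properties: {
        dynamicThrottlingEnabled: true,
    },
});

// Export the account name so it can be referenced from other stacks or the CLI.
export const accountName = account.name;
```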

thomas11 added the awaiting-feedback label and removed the needs-triage label on Sep 9, 2024
@onordberg
Author

onordberg commented Sep 10, 2024

Ah! Thanks for spotting that, @thomas11. I think it is defined as a property of both accounts and deployments. I will test right away what happens if we define it at the account level. Maybe it then becomes the default for all deployments, which strictly speaking would suffice for us.

pulumi-bot added the needs-triage label and removed the awaiting-feedback label on Sep 10, 2024
@onordberg
Author

I get the following error, which leads me to believe that this setting needs to be configured at the Deployment level.

Diagnostics:
  azure-native:cognitiveservices:Account (name):
    error: Code="DynamicThrottlingNotSupported" Message="Thank you for your interest in Dynamic Throttling for Cognitive Services. This feature is currently not supported for the resource kind OpenAI and sku S0."

@thomas11
Contributor

That's unfortunate. The dynamicThrottlingEnabled definition for deployments you linked has "readOnly": true, meaning it's only an output. Same for dynamicThrottlingEnabled in QuotaLimit and CallRateLimit.

There are some hints on the web that HTTP PATCH needs to be used to update an existing deployment. If that's the case, this provider cannot support it out of the box but we could add it with a manual addition.
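
To make the PATCH idea concrete, here is a rough sketch that only builds such a request without sending it. It is unverified: the body shape (properties.dynamicThrottlingEnabled) is an assumption, since the spec marks that field as read-only, and the helper name and types are hypothetical. The caller would still need to add an Authorization header with a bearer token.

```typescript
// Identifies the deployment to patch. All values passed in are placeholders.
interface DeploymentRef {
    subscriptionId: string;
    resourceGroupName: string;
    accountName: string;
    deploymentName: string;
}

interface PatchRequest {
    method: "PATCH";
    url: string;
    headers: Record<string, string>;
    body: string;
}

// Hypothetical helper: builds (but does not send) an ARM PATCH request
// for a Cognitive Services deployment.
// ASSUMPTION: the service accepts properties.dynamicThrottlingEnabled in a
// PATCH body. The published spec marks the field readOnly, so this may fail.
function buildDynamicQuotaPatch(
    ref: DeploymentRef,
    enabled: boolean,
    apiVersion: string = "2024-06-01-preview",
): PatchRequest {
    // Standard ARM resource path for a deployment under an account.
    const path = [
        "subscriptions", ref.subscriptionId,
        "resourceGroups", ref.resourceGroupName,
        "providers", "Microsoft.CognitiveServices",
        "accounts", ref.accountName,
        "deployments", ref.deploymentName,
    ].map((segment) => encodeURIComponent(segment)).join("/");

    return {
        method: "PATCH",
        url: `https://management.azure.com/${path}?api-version=${apiVersion}`,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ properties: { dynamicThrottlingEnabled: enabled } }),
    };
}

// Example with placeholder identifiers.
const patchRequest = buildDynamicQuotaPatch(
    {
        subscriptionId: "00000000-0000-0000-0000-000000000000",
        resourceGroupName: "example-rg",
        accountName: "example-account",
        deploymentName: "example-deployment",
    },
    true,
);
console.log(patchRequest.method, patchRequest.url);
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before trying them against a real subscription.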

I filed an upstream issue.

thomas11 added the impact/missing-api label and removed the needs-triage label on Sep 11, 2024