Azure OpenAI SDK for .NET: Build Your First AI Feature in C#


Azure OpenAI gives you the same models as OpenAI’s API — GPT-4o, GPT-4o-mini, DALL-E 3, Whisper — but through your own Azure subscription. That means your data stays within your Azure tenant, you get enterprise SLA guarantees, you can use private networking to keep traffic off the public internet, and Azure’s content filtering is built in.

For enterprise .NET developers, this is the right way to ship AI features. Here’s how to build your first one.

Table of Contents

Setting Up Azure OpenAI Resource
SDK Setup in .NET
Chat Completion: The Foundation
Streaming Responses
System Prompts and Personas
Embeddings for Semantic Search
Function Calling
Image Generation with DALL-E 3
Production Best Practices
Using with Microsoft.Extensions.AI
FAQ
Conclusion

Prerequisites

  • Azure subscription with Azure OpenAI access (request access at azure.microsoft.com if needed)
  • .NET 8 or later
  • Azure CLI or Azure Portal access

Create the Resource

# Azure CLI
az group create --name rg-myapp-ai --location eastus

az cognitiveservices account create \
  --name aoai-myapp \
  --resource-group rg-myapp-ai \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Deploy a model
az cognitiveservices account deployment create \
  --resource-group rg-myapp-ai \
  --name aoai-myapp \
  --deployment-name gpt-4o-mini \
  --model-name gpt-4o-mini \
  --model-version "2024-07-18" \
  --model-format OpenAI \
  --sku-capacity 10 \
  --sku-name Standard

Get Your Credentials

# Get endpoint and key
az cognitiveservices account show \
  --name aoai-myapp \
  --resource-group rg-myapp-ai \
  --query "properties.endpoint" -o tsv

az cognitiveservices account keys list \
  --name aoai-myapp \
  --resource-group rg-myapp-ai \
  --query "key1" -o tsv

Store these in appsettings.Development.json or user secrets for local dev. Never commit API keys to source control.
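For local development, the user-secrets store keeps credentials out of the repo entirely. Assuming the resource names from the CLI steps above, it might look like:

```shell
# One-time setup for the project
dotnet user-secrets init

# Store endpoint, key, and deployment name outside source control
dotnet user-secrets set "AzureOpenAI:Endpoint" "https://aoai-myapp.openai.azure.com/"
dotnet user-secrets set "AzureOpenAI:ApiKey" "<your-key>"
dotnet user-secrets set "AzureOpenAI:DeploymentName" "gpt-4o-mini"
```

These values surface through `builder.Configuration["AzureOpenAI:..."]` exactly like appsettings entries, so the DI registration below works unchanged.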

SDK Setup in .NET

# Install the Azure OpenAI SDK
dotnet add package Azure.AI.OpenAI

# Optional: Microsoft.Extensions.AI abstraction and its OpenAI adapter
# (the adapter's extension methods work on AzureOpenAIClient as well)
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI

Register in DI (ASP.NET Core / MAUI / Worker Service)

// appsettings.json
{
  "AzureOpenAI": {
    "Endpoint": "https://aoai-myapp.openai.azure.com/",
    "ApiKey": "your-key-here",
    "DeploymentName": "gpt-4o-mini"
  }
}

// Program.cs
var builder = WebApplication.CreateBuilder(args);

var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"]!;
var deployment = builder.Configuration["AzureOpenAI:DeploymentName"]!;

// Register AzureOpenAIClient as singleton
builder.Services.AddSingleton(_ =>
    new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey)));

// Register specific clients scoped to deployment
builder.Services.AddSingleton(sp =>
    sp.GetRequiredService<AzureOpenAIClient>().GetChatClient(deployment));

builder.Services.AddSingleton(sp =>
    sp.GetRequiredService<AzureOpenAIClient>().GetEmbeddingClient("text-embedding-3-small"));

Managed Identity (Preferred for Production)

// No API key needed — uses Azure Managed Identity
builder.Services.AddSingleton(_ =>
    new AzureOpenAIClient(
        new Uri(endpoint),
        new DefaultAzureCredential())); // Reads from environment, workload identity, etc.
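Managed Identity only works once the identity has a data-plane role on the resource. A sketch of the role assignment, assuming the resource names from earlier (`<principal-id>` is a placeholder for the object ID of your app's managed identity):

```shell
# Grant the identity access to the Azure OpenAI data plane
az role assignment create \
  --assignee <principal-id> \
  --role "Cognitive Services OpenAI User" \
  --scope $(az cognitiveservices account show \
      --name aoai-myapp \
      --resource-group rg-myapp-ai \
      --query id -o tsv)
```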

Chat Completion: The Foundation

using Azure.AI.OpenAI;
using OpenAI.Chat;

public class DocumentAnalysisService
{
    private readonly ChatClient _chatClient;

    public DocumentAnalysisService(ChatClient chatClient)
    {
        _chatClient = chatClient;
    }

    public async Task<string> SummarizeAsync(string documentText, CancellationToken ct = default)
    {
        var messages = new List<ChatMessage>
        {
            new SystemChatMessage(
                "You are a document analyst. Summarize documents concisely in 3-5 bullet points."),
            new UserChatMessage(
                $"Please summarize the following document:\n\n{documentText}")
        };

        var options = new ChatCompletionOptions
        {
            MaxOutputTokenCount = 500,
            Temperature = 0.3f  // Lower = more focused/deterministic
        };

        var response = await _chatClient.CompleteChatAsync(messages, options, ct);
        return response.Value.Content[0].Text;
    }

    public async Task<string> ClassifyAsync(string text, string[] categories, CancellationToken ct = default)
    {
        var categoryList = string.Join(", ", categories);

        var messages = new List<ChatMessage>
        {
            new SystemChatMessage(
                $"Classify the input into exactly one of these categories: {categoryList}. " +
                "Respond with only the category name, nothing else."),
            new UserChatMessage(text)
        };

        var options = new ChatCompletionOptions { Temperature = 0f };
        var response = await _chatClient.CompleteChatAsync(messages, options, ct);
        return response.Value.Content[0].Text.Trim();
    }
}
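Registered in DI, the service drops straight into a minimal API endpoint. A sketch, where the route and the `SummarizeRequest` DTO are illustrative:

```csharp
// Program.cs — register the service and expose it over HTTP
builder.Services.AddSingleton<DocumentAnalysisService>();

var app = builder.Build();

app.MapPost("/documents/summarize", async (
    SummarizeRequest request,
    DocumentAnalysisService analysis,
    CancellationToken ct) =>
{
    var summary = await analysis.SummarizeAsync(request.Text, ct);
    return Results.Ok(new { summary });
});

app.Run();

// Hypothetical request body
public record SummarizeRequest(string Text);
```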

Streaming Responses

public async IAsyncEnumerable<string> StreamSummaryAsync(
    string text,
    [EnumeratorCancellation] CancellationToken ct = default)
{
    var messages = new List<ChatMessage>
    {
        new SystemChatMessage("Summarize the provided text."),
        new UserChatMessage(text)
    };

    await foreach (var update in _chatClient.CompleteChatStreamingAsync(messages, cancellationToken: ct))
    {
        foreach (var part in update.ContentUpdate)
        {
            if (!string.IsNullOrEmpty(part.Text))
                yield return part.Text;
        }
    }
}

// ASP.NET Core streaming endpoint
app.MapGet("/summarize/stream", async (string text, ChatClient client, HttpResponse response) =>
{
    response.Headers["Content-Type"] = "text/event-stream";
    response.Headers["Cache-Control"] = "no-cache";

    var messages = new List<ChatMessage>
    {
        new SystemChatMessage("You are a helpful summarizer."),
        new UserChatMessage(text)
    };

    await foreach (var update in client.CompleteChatStreamingAsync(messages))
    {
        foreach (var part in update.ContentUpdate)
        {
            if (!string.IsNullOrEmpty(part.Text))
            {
                await response.WriteAsync($"data: {part.Text}\n\n");
                await response.Body.FlushAsync();
            }
        }
    }
});

System Prompts and Personas

System prompts define the model’s behavior and personality. They’re the most important thing to get right in a production AI feature.

public static class SystemPrompts
{
    public const string CustomerSupport = """
        You are a customer support assistant for Contoso Software.

        Your rules:
        - Only answer questions about Contoso products
        - If asked about competitors, politely decline to comment
        - Escalate billing issues by saying "I'll connect you with our billing team"
        - Always be professional and concise (under 100 words per response)
        - Never make up product features — if unsure, say you'll check

        Today's date: {0}
        """;

    public const string CodeReviewer = """
        You are a senior C# developer performing code review.
        Focus on: correctness, performance, security, and .NET best practices.
        Format feedback as: [ISSUE TYPE]: description. Suggest fixes with code examples.
        Skip comments on style unless they violate C# conventions.
        """;
}

// Usage
var systemPrompt = string.Format(SystemPrompts.CustomerSupport, DateTime.Today.ToShortDateString());
var messages = new List<ChatMessage>
{
    new SystemChatMessage(systemPrompt),
    new UserChatMessage(userInput)
};

Embeddings for Semantic Search

public class KnowledgeBaseService
{
    private readonly EmbeddingClient _embeddingClient;
    // In-memory store for demo purposes; use a vector database (e.g., Azure AI Search) in production
    private readonly List<(string Text, float[] Embedding)> _knowledgeBase = new();

    public KnowledgeBaseService(EmbeddingClient embeddingClient)
    {
        _embeddingClient = embeddingClient;
    }

    public async Task IndexDocumentAsync(string text)
    {
        var response = await _embeddingClient.GenerateEmbeddingAsync(text);
        var embedding = response.Value.ToFloats().ToArray();
        _knowledgeBase.Add((text, embedding));
    }

    public async Task<List<string>> SearchAsync(string query, int topK = 3)
    {
        var queryResponse = await _embeddingClient.GenerateEmbeddingAsync(query);
        var queryEmbedding = queryResponse.Value.ToFloats().ToArray();

        return _knowledgeBase
            .Select(item => new
            {
                item.Text,
                Score = CosineSimilarity(queryEmbedding, item.Embedding)
            })
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .Select(x => x.Text)
            .ToList();
    }

    // Batch indexing — more efficient than one-by-one
    public async Task IndexDocumentsAsync(IList<string> texts)
    {
        var response = await _embeddingClient.GenerateEmbeddingsAsync(texts);
        for (int i = 0; i < texts.Count; i++)
        {
            var embedding = response.Value[i].ToFloats().ToArray();
            _knowledgeBase.Add((texts[i], embedding));
        }
    }

    private static float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
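Putting the pieces together, a small retrieval-augmented flow indexes a few snippets and feeds the top matches into the chat prompt. The snippet texts below are illustrative, and `embeddingClient` is assumed to come from the DI registration shown earlier:

```csharp
// Index a few knowledge snippets, then retrieve the most relevant ones
var kb = new KnowledgeBaseService(embeddingClient);

await kb.IndexDocumentsAsync(new[]
{
    "Contoso Pro licenses cover up to 5 machines per user.",
    "Refunds are available within 30 days of purchase.",
    "Contoso Cloud syncs settings across devices automatically."
});

var matches = await kb.SearchAsync("How many computers can I install on?", topK: 2);

// Ground the chat model in the retrieved context
var context = string.Join("\n", matches);
var messages = new List<ChatMessage>
{
    new SystemChatMessage($"Answer using only this context:\n{context}"),
    new UserChatMessage("How many computers can I install on?")
};
```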

Function Calling

// Define tools the model can call
private static readonly ChatTool GetOrderStatusTool = ChatTool.CreateFunctionTool(
    functionName: "get_order_status",
    functionDescription: "Gets the current status of an order by order ID",
    functionParameters: BinaryData.FromString("""
    {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order ID, e.g. ORD-12345"
            }
        },
        "required": ["order_id"]
    }
    """));

public async Task<string> HandleSupportQueryAsync(string userQuery)
{
    var messages = new List<ChatMessage>
    {
        new SystemChatMessage("You are a support agent. Use tools to look up order information."),
        new UserChatMessage(userQuery)
    };

    var options = new ChatCompletionOptions();
    options.Tools.Add(GetOrderStatusTool);

    // Agentic loop — model may call tools multiple times
    while (true)
    {
        var response = await _chatClient.CompleteChatAsync(messages, options);
        var completion = response.Value;

        if (completion.FinishReason == ChatFinishReason.ToolCalls)
        {
            messages.Add(new AssistantChatMessage(completion));

            foreach (var toolCall in completion.ToolCalls)
            {
                var result = await ExecuteToolAsync(toolCall);
                messages.Add(new ToolChatMessage(toolCall.Id, result));
            }
        }
        else
        {
            return completion.Content[0].Text;
        }
    }
}

private async Task<string> ExecuteToolAsync(ChatToolCall toolCall)
{
    if (toolCall.FunctionName == "get_order_status")
    {
        using var doc = JsonDocument.Parse(toolCall.FunctionArguments);
        var orderId = doc.RootElement.GetProperty("order_id").GetString()!;
        var status = await _orderService.GetStatusAsync(orderId);
        return JsonSerializer.Serialize(status);
    }
    return "Tool not found";
}

Image Generation with DALL-E 3

// Register image client
builder.Services.AddSingleton(sp =>
    sp.GetRequiredService<AzureOpenAIClient>().GetImageClient("dall-e-3"));

// Generate an image
public async Task<string> GenerateProductImageAsync(string productDescription)
{
    var options = new ImageGenerationOptions
    {
        Quality = GeneratedImageQuality.High,
        Size = GeneratedImageSize.W1024xH1024,
        Style = GeneratedImageStyle.Natural,
        ResponseFormat = GeneratedImageFormat.Uri
    };

    var response = await _imageClient.GenerateImageAsync(
        $"Professional product photo of: {productDescription}. " +
        "White background, studio lighting, high resolution.",
        options);

    return response.Value.ImageUri.ToString();
}

// Generate with revised prompt (see what DALL-E actually used)
var result = await _imageClient.GenerateImageAsync(prompt, options);
Console.WriteLine($"Revised prompt: {result.Value.RevisedPrompt}");

Production Best Practices

Retry Policy

// AzureOpenAIClient includes built-in retry for 429 (rate limit) and 5xx errors
// Configure custom retry behavior:
builder.Services.AddSingleton(_ =>
    new AzureOpenAIClient(
        new Uri(endpoint),
        new AzureKeyCredential(apiKey),
        new AzureOpenAIClientOptions
        {
            RetryPolicy = new ClientRetryPolicy(maxRetries: 5),
            NetworkTimeout = TimeSpan.FromSeconds(30)
        }));

Content Filtering

// Check content filter results in the response
var response = await _chatClient.CompleteChatAsync(messages);
var completion = response.Value;

// Azure automatically filters harmful content — check if response was filtered
if (completion.FinishReason == ChatFinishReason.ContentFilter)
{
    _logger.LogWarning("Response filtered by content policy");
    return "I can't help with that request.";
}
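Filtering can also trigger on the input side, in which case the call throws instead of returning a filtered completion. A hedged sketch; the exact exception type depends on SDK version, but recent Azure.AI.OpenAI releases surface service errors as `ClientResultException`:

```csharp
using System.ClientModel;

try
{
    var response = await _chatClient.CompleteChatAsync(messages);
    // ... check FinishReason.ContentFilter on the completion as above
}
catch (ClientResultException ex) when (ex.Status == 400)
{
    // A 400 whose error code is content_filter means the prompt itself was blocked
    _logger.LogWarning(ex, "Prompt rejected by content policy");
    return "I can't help with that request.";
}
```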

Token Usage Tracking

var response = await _chatClient.CompleteChatAsync(messages);
var usage = response.Value.Usage;

_metrics.RecordTokenUsage(
    inputTokens: usage.InputTokenCount,
    outputTokens: usage.OutputTokenCount,
    model: deployment,
    feature: "document-summary");

Using with Microsoft.Extensions.AI

// Use Azure OpenAI through the MEAI abstraction
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var aoaiClient = sp.GetRequiredService<AzureOpenAIClient>();
    return aoaiClient
        .AsChatClient(deployment)
        .AsBuilder()
        .UseLogging(sp.GetRequiredService<ILoggerFactory>())
        .UseOpenTelemetry()
        .Build(sp);
});

// Now your service uses IChatClient — works with Azure OpenAI, OpenAI, or Ollama
public class MyAiService
{
    private readonly IChatClient _client; // Provider-agnostic

    public MyAiService(IChatClient client) => _client = client;
}

FAQ

What’s the difference between Azure OpenAI and OpenAI’s direct API?

Same models, different hosting. Azure OpenAI runs in your Azure tenant — data doesn’t leave Microsoft’s datacenters, you can use VNet integration, Managed Identity, Azure Monitor, and Azure Policy. OpenAI’s direct API sends data to OpenAI’s infrastructure. For enterprise compliance requirements, Azure OpenAI is almost always the right choice.

How do I handle Azure OpenAI rate limits (429 errors)?

The SDK retries automatically with exponential backoff. For applications needing higher throughput, deploy your model with a higher “Tokens Per Minute” quota in Azure Portal → Azure OpenAI → Deployments. For burst scenarios, implement request queuing with a Channel<T> to smooth out spikes.
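A minimal sketch of that Channel-based smoothing, assuming a background worker drains requests at a controlled rate (the class and pacing values are illustrative):

```csharp
using System.Threading.Channels;

// Bounded channel: producers wait when the queue is full instead of spiking the API
public class ChatRequestQueue
{
    private readonly Channel<(string Prompt, TaskCompletionSource<string> Result)> _channel =
        Channel.CreateBounded<(string, TaskCompletionSource<string>)>(capacity: 100);

    public async Task<string> EnqueueAsync(string prompt, CancellationToken ct = default)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        await _channel.Writer.WriteAsync((prompt, tcs), ct);
        return await tcs.Task;
    }

    // Run from a BackgroundService: one request at a time with a small delay
    public async Task ProcessAsync(ChatClient client, CancellationToken ct)
    {
        await foreach (var (prompt, tcs) in _channel.Reader.ReadAllAsync(ct))
        {
            try
            {
                var response = await client.CompleteChatAsync(
                    new ChatMessage[] { new UserChatMessage(prompt) }, cancellationToken: ct);
                tcs.SetResult(response.Value.Content[0].Text);
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
            }
            await Task.Delay(TimeSpan.FromMilliseconds(200), ct); // crude pacing
        }
    }
}
```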

Can I use Azure OpenAI with private endpoints?

Yes — Azure OpenAI supports private endpoints that route all traffic through your VNet, never touching the public internet. Configure in Azure Portal → your Azure OpenAI resource → Networking → Private endpoint connections.
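The same setup is scriptable. A rough CLI sketch, where the VNet and subnet names are placeholders (for Azure OpenAI resources, the private endpoint group ID is `account`):

```shell
az network private-endpoint create \
  --name pe-aoai-myapp \
  --resource-group rg-myapp-ai \
  --vnet-name vnet-myapp \
  --subnet snet-private-endpoints \
  --private-connection-resource-id $(az cognitiveservices account show \
      --name aoai-myapp --resource-group rg-myapp-ai --query id -o tsv) \
  --group-id account \
  --connection-name aoai-myapp-connection
```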

How do I monitor token usage and costs?

Azure Monitor and the Azure OpenAI resource’s “Metrics” blade show token consumption by deployment. Set budget alerts in Azure Cost Management. Log Usage.TotalTokenCount from each response to your own metrics system for fine-grained per-feature tracking.

What’s the difference between GPT-4o and GPT-4o-mini for most use cases?

GPT-4o-mini is approximately 15x cheaper and slightly faster. For classification, summarization, data extraction, and customer support bots: GPT-4o-mini is usually sufficient. For complex reasoning, code generation, and nuanced content creation: GPT-4o produces noticeably better results. Start with mini and upgrade only specific features that need the full model.

Conclusion

The Azure OpenAI SDK for .NET gives enterprise developers a production-grade path to AI features with the security and compliance guarantees that Azure provides. The SDK is comprehensive — chat, streaming, embeddings, function calling, and image generation all work through a consistent API.

For most .NET projects, wrap the SDK behind IChatClient from Microsoft.Extensions.AI so your application code stays provider-agnostic. Use Managed Identity in production, log token usage from day one, and start with GPT-4o-mini before deciding whether the full model is needed.

[INTERNAL_LINK: Microsoft.Extensions.AI explained guide] [INTERNAL_LINK: Semantic Kernel for .NET developers guide]

