Teaching OpenClaw to Draw with MAI-Image-2

Want your OpenClaw assistant to generate images directly from chat prompts? OpenClaw is an open-source multi-channel chat agent gateway, and this guide shows how to connect Microsoft's MAI-Image-2 model to it and make the output arrive reliably on Telegram, LINE, and WhatsApp.

What You Will End Up With

Once this is wired up, here is what you get:

  • A user can say, "Draw a space cat," in any chat channel and get an image back.
  • Image generation runs through MAI-Image-2, without managing your own GPU infrastructure, and billing stays usage-based.
  • Telegram can receive the image directly.
  • LINE and WhatsApp can consume the image through a public URL.
  • Generated images can be cleaned up automatically after 7 days.

Prerequisites

Step 1: Deploy MAI-Image-2

1.1 Add support for Microsoft model format in Bicep

MAI-Image-2 uses the Microsoft model format, not the OpenAI format used by GPT deployments. Update the deployment loop in openai.bicep so the format can be overridden:

properties: {
  model: {
    format: deployment.?modelFormat ?? 'OpenAI'
    name: deployment.modelName
    version: deployment.modelVersion
  }
}

This is backward-compatible. Existing GPT deployments without modelFormat will still fall back to OpenAI.
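
For readers more at home in JavaScript than Bicep, the fallback behaves much like nullish coalescing on a property that may be absent. The sketch below is only an analogy: `resolveModelFormat` and the sample GPT entry are hypothetical and are not part of the template.

```javascript
// Analogy for the Bicep expression `deployment.?modelFormat ?? 'OpenAI'`.
// resolveModelFormat is a hypothetical helper used only for illustration.
function resolveModelFormat(deployment) {
  // A missing (undefined) or null modelFormat falls back to "OpenAI".
  return deployment.modelFormat ?? "OpenAI";
}

// Hypothetical existing entry that never declared modelFormat.
const gptEntry = { name: "example-gpt", modelName: "example-gpt-model" };
// The MAI-Image-2 entry from the parameter file.
const maiEntry = { name: "mai-image-2", modelName: "MAI-Image-2", modelFormat: "Microsoft" };

console.log(resolveModelFormat(gptEntry));
console.log(resolveModelFormat(maiEntry));
```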

1.2 Add MAI-Image-2 to your parameter file

In the openaiModelDeployments array inside prod.bicepparam, add this block:

{
  name: 'mai-image-2'
  modelName: 'MAI-Image-2'
  modelVersion: '2026-02-20'
  modelFormat: 'Microsoft'
  skuName: 'GlobalStandard'
  skuCapacity: 3
}

[!IMPORTANT] modelName must be exactly MAI-Image-2. If you use lowercase mai-image-2, Azure may return the misleading SpecialFeatureOrQuotaIdRequired error.

1.3 Deploy and verify

az deployment group create \
  --resource-group oc-family-rg \
  --template-file infra/bicep/main.bicep \
  --parameters infra/bicep/params/prod.bicepparam

Then verify the deployment:

az cognitiveservices account deployment list \
  --name <your Foundry resource name> \
  --resource-group oc-family-rg \
  --query "[].{name:name, model:properties.model.name, format:properties.model.format}" \
  -o table

You should see mai-image-2 | MAI-Image-2 | Microsoft.

Step 2: Add Azure Blob Storage for public image hosting

This is the piece that makes multi-channel delivery practical. Telegram can handle direct image uploads, but LINE and WhatsApp are much easier to support when the generated image is available through a public HTTPS URL.

2.1 Create media-storage.bicep

Provision a dedicated Storage Account instead of reusing the Foundry resource:

resource mediaStorage 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
    allowBlobPublicAccess: true
    minimumTlsVersion: 'TLS1_2'
  }
}

resource imagesContainer '...' = {
  name: 'images'
  properties: {
    publicAccess: 'Blob'
  }
}

Add a 7-day lifecycle policy as well:

resource lifecyclePolicy '...' = {
  properties: {
    policy: {
      rules: [
        {
          name: 'auto-delete-7d'
          type: 'Lifecycle'
          definition: {
            filters: {
              blobTypes: ['blockBlob']
              prefixMatch: ['images/']
            }
            actions: {
              baseBlob: {
                delete: {
                  daysAfterCreationGreaterThan: 7
                }
              }
            }
          }
        }
      ]
    }
  }
}
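
As a rough mental model of the rule above, the following sketch checks whether a blob would be eligible for deletion at a given moment. `matchesDeleteRule` and its arguments are hypothetical, and the sketch assumes the prefix is compared against the container-qualified path (`images/<blob name>`). The storage service evaluates the real policy on its own schedule, so actual deletion can lag behind eligibility.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper mirroring the rule's filter and delete condition.
// blobPath is assumed to be "<container>/<blob name>".
function matchesDeleteRule(blobPath, createdAt, now = new Date(), days = 7, prefix = "images/") {
  // prefixMatch filter: only blobs under the images container qualify.
  if (!blobPath.startsWith(prefix)) return false;
  // daysAfterCreationGreaterThan: age must exceed the threshold.
  const ageInDays = (now.getTime() - createdAt.getTime()) / DAY_MS;
  return ageInDays > days;
}

module.exports = { matchesDeleteRule };
```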

2.2 Wire it into main.bicep

param enableMediaStorage bool = true

module mediaStorage './modules/media-storage.bicep' = if (enableMediaStorage) {
  name: 'mediaStorage'
  params: {
    location: location
    prefix: prefix
  }
}

output mediaStorageEndpoint string = enableMediaStorage
  ? mediaStorage.outputs.storageEndpoint
  : ''

2.3 Deploy and store the key in Key Vault

Bash, on a Linux VM, WSL, or Azure Cloud Shell:

az deployment group create \
  --resource-group oc-family-rg \
  --template-file infra/bicep/main.bicep \
  --parameters infra/bicep/params/prod.bicepparam

STORAGE_NAME=$(az storage account list -g oc-family-rg \
  --query "[?contains(name,'media')].name" -o tsv)

STORAGE_KEY=$(az storage account keys list -g oc-family-rg \
  -n "$STORAGE_NAME" --query "[0].value" -o tsv)

az keyvault secret set \
  --vault-name <your Key Vault name> \
  --name media-storage-key \
  --value "$STORAGE_KEY"

Step 3: Build the OpenClaw plugin

The plugin does four things: take a drawing request, call MAI-Image-2, upload the result to Blob Storage, and return both the image and a public URL.

3.1 Suggested project structure

extensions/mai-image/
├── index.js
├── lib/
│   ├── api.js
│   └── blob.js
├── openclaw.plugin.json
├── package.json
└── test/
    ├── tool.test.js
    └── blob.test.js

3.2 Core registration logic

const crypto = require("crypto");
const { generateImage } = require("./lib/api");
const { uploadToBlob } = require("./lib/blob");

function register(api) {
  const cfg = Object.assign({
    endpoint: "",
    deploymentName: "mai-image-2",
    defaultWidth: 1024,
    defaultHeight: 1024,
    mediaStorageAccount: "",
    mediaStorageKey: "",
    mediaStorageContainer: "images",
  }, api.pluginConfig || {});

  api.registerTool({
    name: "mai_image_generate",
    label: "mai_image_generate",
    description: "Generate an image from a text prompt using MAI-Image-2.",
    parameters: {
      type: "object",
      required: ["prompt"],
      properties: {
        prompt: { type: "string" },
        width: { type: "integer" },
        height: { type: "integer" },
      },
    },
    execute: async (_toolCallId, params) => {
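      // Fall back to the configured defaults when the caller omits a size.
      const result = await generateImage({
        ...cfg,
        prompt: params.prompt,
        width: params.width ?? cfg.defaultWidth,
        height: params.height ?? cfg.defaultHeight,
      });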
      const buffer = Buffer.from(result.b64_json, "base64");

      const blobName = `${Date.now()}-${crypto.randomUUID()}.png`;
      const publicUrl = await uploadToBlob({
        accountName: cfg.mediaStorageAccount,
        accountKey: cfg.mediaStorageKey,
        containerName: cfg.mediaStorageContainer,
        blobName,
        buffer,
        contentType: "image/png",
      });

      return {
        content: [
          { type: "image", data: result.b64_json, mimeType: "image/png" },
          { type: "text", text: `Image generated: ${publicUrl}` },
        ],
        details: { status: "ok", publicUrl },
      };
    },
  });

  api.on(
    "before_prompt_build",
    () => ({
      appendSystemContext:
        "You have a mai_image_generate tool. When the user asks to draw or generate an image, use it. After calling the tool, include the returned URL in your reply.",
    }),
    { priority: 20 }
  );
}

module.exports = register;

3.3 What api.js does

lib/api.js can stay very small. It only needs to shape the request for the MAI-Image-2 HTTP API:

// POST https://<resource>.cognitiveservices.azure.com/mai/v1/images/generations
// Headers: api-key, Content-Type: application/json
// Body: { model: "mai-image-2", prompt, width, height }
// Response: { data: [{ b64_json: "<base64 PNG>" }] }

If you want minimal dependencies, Node's built-in https is enough here.
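
A minimal sketch of what `lib/api.js` could look like with only Node's built-in `https` module follows. It assumes the request and response shapes in the comments above. The `apiKey` config field and the `buildImageRequest` helper are hypothetical, since the article does not show where the key is stored, and the function resolves with the first entry of `data` so that `result.b64_json` works as in the registration code.

```javascript
const https = require("https");

// Build the request options and JSON body for the image endpoint.
// cfg.apiKey is an assumption: adapt it to wherever your key actually lives.
function buildImageRequest(cfg) {
  const url = new URL("/mai/v1/images/generations", cfg.endpoint);
  const body = JSON.stringify({
    model: cfg.deploymentName,
    prompt: cfg.prompt,
    width: cfg.width ?? cfg.defaultWidth,
    height: cfg.height ?? cfg.defaultHeight,
  });
  const options = {
    method: "POST",
    hostname: url.hostname,
    path: url.pathname,
    headers: {
      "api-key": cfg.apiKey,
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    },
  };
  return { options, body };
}

// Send the request and resolve with the first item of `data`,
// which is expected to carry the base64 PNG in `b64_json`.
function generateImage(cfg) {
  const { options, body } = buildImageRequest(cfg);
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => {
        const text = Buffer.concat(chunks).toString("utf8");
        if (res.statusCode < 200 || res.statusCode >= 300) {
          reject(new Error(`Image API returned ${res.statusCode}: ${text}`));
          return;
        }
        try {
          resolve(JSON.parse(text).data[0]);
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}

module.exports = { buildImageRequest, generateImage };
```

Keeping the request builder separate from the network call makes it possible to unit-test the payload without reaching the service.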

3.4 What blob.js does

lib/blob.js uploads the generated image through the Azure Blob Storage REST API:

// PUT https://<account>.blob.core.windows.net/<container>/<blob>
// Authorization: SharedKey <account>:<HMAC-SHA256 signature>
// Public URL:
// https://<account>.blob.core.windows.net/<container>/<blob>

Same idea here: you can skip the storage SDK and keep the plugin lightweight.
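
One way to do that is sketched below, using only `crypto` and `https`. It follows the Shared Key string-to-sign layout for the Blob service as I understand it. The `signPutBlob` helper is hypothetical, the service version `2021-08-06` is an assumed choice, and the blob name is assumed to be URL-safe (the timestamp-plus-UUID names from the registration code are). Treat it as a starting point and verify it against your own storage account.

```javascript
const crypto = require("crypto");
const https = require("https");

const API_VERSION = "2021-08-06"; // assumed service version

// Build the Shared Key Authorization header for a Put Blob request.
function signPutBlob({ accountName, accountKey, containerName, blobName, contentLength, contentType, date }) {
  // x-ms-* headers, lowercase, sorted by name, each terminated by a newline.
  const canonicalizedHeaders =
    `x-ms-blob-type:BlockBlob\nx-ms-date:${date}\nx-ms-version:${API_VERSION}\n`;
  const canonicalizedResource = `/${accountName}/${containerName}/${blobName}`;
  const stringToSign = [
    "PUT",
    "", // Content-Encoding
    "", // Content-Language
    contentLength === 0 ? "" : String(contentLength), // Content-Length
    "", // Content-MD5
    contentType, // Content-Type
    "", // Date (left empty because x-ms-date is sent instead)
    "", // If-Modified-Since
    "", // If-Match
    "", // If-None-Match
    "", // If-Unmodified-Since
    "", // Range
  ].join("\n") + "\n" + canonicalizedHeaders + canonicalizedResource;
  const signature = crypto
    .createHmac("sha256", Buffer.from(accountKey, "base64"))
    .update(stringToSign, "utf8")
    .digest("base64");
  return { stringToSign, authorization: `SharedKey ${accountName}:${signature}` };
}

// Upload the buffer as a block blob and resolve with its public URL.
function uploadToBlob({ accountName, accountKey, containerName, blobName, buffer, contentType }) {
  const date = new Date().toUTCString();
  const { authorization } = signPutBlob({
    accountName, accountKey, containerName, blobName,
    contentLength: buffer.length, contentType, date,
  });
  const hostname = `${accountName}.blob.core.windows.net`;
  const path = `/${containerName}/${blobName}`;
  const headers = {
    Authorization: authorization,
    "Content-Length": buffer.length,
    "Content-Type": contentType,
    "x-ms-blob-type": "BlockBlob",
    "x-ms-date": date,
    "x-ms-version": API_VERSION,
  };
  return new Promise((resolve, reject) => {
    const req = https.request({ method: "PUT", hostname, path, headers }, (res) => {
      res.resume(); // discard the response body
      res.on("end", () => {
        if (res.statusCode === 201) resolve(`https://${hostname}${path}`);
        else reject(new Error(`Blob upload failed with status ${res.statusCode}`));
      });
    });
    req.on("error", reject);
    req.end(buffer);
  });
}

module.exports = { signPutBlob, uploadToBlob };
```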

Step 4: Deploy the plugin to the VM

4.1 Copy the files

scp -r extensions/mai-image/ weijen@family-claw.multiagentai.co:~/.openclaw/extensions/mai-image/

[!NOTE] If you are working from Codespaces and cannot use port 22 directly, az vm run-command invoke plus base64 file transfer is a reasonable fallback.

4.2 Update OpenClaw configuration

Merge the following into ~/.openclaw/openclaw.json, keeping your existing allow and load.paths entries (shown here as "..."):

{
  "plugins": {
    "allow": ["...", "mai-image"],
    "entries": {
      "mai-image": {
        "enabled": true,
        "config": {
          "endpoint": "https://<your Foundry>.cognitiveservices.azure.com",
          "deploymentName": "mai-image-2",
          "defaultWidth": 1024,
          "defaultHeight": 1024,
          "mediaStorageAccount": "<Storage Account name>",
          "mediaStorageKey": "<key from Key Vault>",
          "mediaStorageContainer": "images"
        }
      }
    },
    "load": {
      "paths": ["...", "/home/weijen/.openclaw/extensions/mai-image"]
    }
  }
}

4.3 Restart the gateway and confirm the plugin loads

openclaw gateway restart
journalctl --user -u openclaw-gateway.service --since "30 seconds ago" | grep mai-image

You should see a log line similar to mai-image plugin ready.

Step 5: Test the flow

CLI test

openclaw agent \
  --message "Use mai_image_generate to draw a cute cat" \
  --session-id test \
  --json \
  --timeout 120

Channel test

Send this message through Telegram, LINE, or WhatsApp:

Draw a cat reading a book on the moon.

Expected behavior:

| Channel | Expected result |
| --- | --- |
| Telegram | The user receives the image directly. |
| WhatsApp | The user receives a message containing a clickable image URL. |
| LINE | The user receives a clickable image URL, similar to WhatsApp. |

Cost Notes

| Item | Notes |
| --- | --- |
| MAI-Image-2 | Charged per generated image. |
| Blob Storage (Standard LRS) | Usually inexpensive for low-volume personal use. |
| 7-day lifecycle cleanup | No separate feature charge. |

For a small personal or family bot, storage cost is usually negligible. Model usage is the part worth watching. See "Trace MAI-Image-2 calls and per-image cost in Azure AI Foundry" for the OpenTelemetry setup that makes per-request usage visible.

Pitfalls I Hit Along the Way

Pitfall 1: Model name casing

| Symptom | Fix |
| --- | --- |
| Bicep deployment returns SpecialFeatureOrQuotaIdRequired | Make sure modelName is MAI-Image-2, not mai-image-2. |

Use az cognitiveservices account list-models if you want to confirm the exact model spelling.

Pitfall 2: Wrong plugin return shape

| Symptom | Fix |
| --- | --- |
| TypeError: Cannot read properties of undefined (reading 'trim'), followed by broken LLM calls | The execute function passed to registerTool must return { content: [...], details: {...} }, not a custom shape like { media, text }. |
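
A small guard can catch this class of mistake during development instead of at chat time. The sketch below is a hypothetical helper, not an OpenClaw API: it only checks the fields that the registration code in this guide returns (`content` items of type `text` or `image`, plus a `details` object).

```javascript
// Hypothetical development-time guard for tool results.
// Throws a TypeError when the result does not have the expected shape.
function assertToolResult(result) {
  if (!result || !Array.isArray(result.content)) {
    throw new TypeError("Tool result must include a content array");
  }
  for (const item of result.content) {
    if (item.type === "text" && typeof item.text !== "string") {
      throw new TypeError("Text content items need a string `text` field");
    }
    if (item.type === "image" &&
        (typeof item.data !== "string" || typeof item.mimeType !== "string")) {
      throw new TypeError("Image content items need string `data` and `mimeType` fields");
    }
  }
  if (typeof result.details !== "object" || result.details === null) {
    throw new TypeError("Tool result must include a details object");
  }
  return result;
}

module.exports = { assertToolResult };
```

Wrapping the return value of `execute` in `assertToolResult(...)` while testing should surface a shape problem with a clear message.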

Pitfall 3: API key is not ready during registration

| Symptom | Fix |
| --- | --- |
| 401 Access denied | Do not rely on api.resolveSecret() during plugin registration. Read the provider configuration during each execute call instead. |
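
The general pattern is to defer the lookup until the tool actually runs. The sketch below illustrates it under several assumptions: it takes `api.pluginConfig` as the place the configuration becomes available, and the `apiKey` field and `readConfig` helper are hypothetical. Substitute whatever source your OpenClaw setup exposes for the key.

```javascript
// Hypothetical helper: snapshot the plugin config at call time.
function readConfig(api) {
  return Object.assign({ deploymentName: "mai-image-2" }, api.pluginConfig || {});
}

function register(api) {
  api.registerTool({
    name: "mai_image_generate",
    execute: async (_toolCallId, params) => {
      // Resolved on every call, so a key that was not ready
      // at registration time is picked up once it exists.
      const cfg = readConfig(api);
      if (!cfg.apiKey) {
        return {
          content: [{ type: "text", text: "Image generation is not configured yet." }],
          details: { status: "error", reason: "missing_api_key" },
        };
      }
      // The real plugin would call the image API here with cfg and params.prompt.
      return {
        content: [{ type: "text", text: `Ready to generate: ${params.prompt}` }],
        details: { status: "ok" },
      };
    },
  });
}

module.exports = register;
```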

Pitfall 4: registerImageGenerationProvider does not solve every channel

| Symptom | Fix |
| --- | --- |
| Telegram works, but WhatsApp and LINE only receive text | Use registerTool plus a Blob Storage URL so every channel can fetch the image over HTTPS. |

registerImageGenerationProvider stores the image on local disk. Telegram can work with that. WhatsApp and LINE are much better served by a public URL.

Pitfall 5: Session memory contamination

| Symptom | Fix |
| --- | --- |
| The plugin is fixed, but the model still says it cannot draw images | Clear old session files and workspace memory, then restart the gateway. |

find ~/.openclaw/agents/main/sessions/ -name "*.jsonl" \
  -exec grep -l "<peer-id>" {} \; -delete

rm -f ~/.openclaw/workspace/memory/$(date +%Y-%m-%d)*.md
openclaw gateway restart

References