Teaching OpenClaw to Draw with MAI-Image-2

Want your OpenClaw assistant to generate images directly from chat prompts? OpenClaw is an open-source multi-channel chat agent gateway, and this guide shows how to connect Microsoft's MAI-Image-2 model to it and make the output arrive reliably on Telegram, LINE, and WhatsApp.

What You Will End Up With

Once this is wired up, your OpenClaw setup gives you the following:

A user can say "Draw a space cat" in any chat channel and get an image back.
Image generation runs on the hosted MAI-Image-2 model, so there is no GPU infrastructure to manage, and you pay per generated image.
Telegram users receive the image directly.
LINE and WhatsApp users receive a public HTTPS URL to the image.
Generated images are cleaned up automatically after 7 days.

Prerequisites

You already have OpenClaw running on an Azure VM. If not, start with Building a Family AI Chat Bot on Azure with OpenClaw.
Azure AI Foundry is already deployed with kind: AIServices.
Your Bicep infrastructure can deploy successfully today.

Start with OpenClaw Overview if you want the broader architecture and plugin model before diving into image generation.
Read Building a Natural-Language Reminder and Scheduling Plugin for OpenClaw for a second plugin example that focuses on scheduling, task dispatch, and delivery routing.
Use Building a Family AI Chat Bot on Azure with OpenClaw when you need the full Azure VM, Key Vault, and AI Foundry setup behind this guide.
Read Teaching OpenClaw to Understand Voice with MAI-Transcribe-1 if you want to add the matching speech pipeline for voice notes across the same chat channels.

Step 1: Deploy MAI-Image-2

1.1 Add support for `Microsoft` model format in Bicep

MAI-Image-2 uses the Microsoft model format, not the OpenAI format used by GPT deployments. Update the deployment loop in openai.bicep so the format can be overridden:

properties: {
  model: {
    format: deployment.?modelFormat ?? 'OpenAI'
    name: deployment.modelName
    version: deployment.modelVersion
  }
}

This is backward-compatible. Existing GPT deployments without modelFormat will still fall back to OpenAI.

1.2 Add MAI-Image-2 to your parameter file

In the openaiModelDeployments array inside prod.bicepparam, add this block:

{
  name: 'mai-image-2'
  modelName: 'MAI-Image-2'
  modelVersion: '2026-02-20'
  modelFormat: 'Microsoft'
  skuName: 'GlobalStandard'
  skuCapacity: 3
}

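[!IMPORTANT] modelName must be exactly MAI-Image-2. If you use lowercase mai-image-2, Azure may return the misleading SpecialFeatureOrQuotaIdRequired error. This applies to modelName only; the deployment name (name: 'mai-image-2') is lowercase in this guide and works as shown.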

1.3 Deploy and verify

az deployment group create \
  --resource-group oc-family-rg \
  --template-file infra/bicep/main.bicep \
  --parameters infra/bicep/params/prod.bicepparam

Then verify the deployment:

az cognitiveservices account deployment list \
  --name <your Foundry resource name> \
  --resource-group oc-family-rg \
  --query "[].{name:name, model:properties.model.name, format:properties.model.format}" \
  -o table

You should see mai-image-2 | MAI-Image-2 | Microsoft.

Step 2: Add Azure Blob Storage for public image hosting

This is the piece that makes multi-channel delivery practical. Telegram can handle direct image uploads, but LINE and WhatsApp are much easier to support when the generated image is available through a public HTTPS URL.

2.1 Create `media-storage.bicep`

Provision a dedicated Storage Account instead of reusing the Foundry resource. This account allows anonymous blob reads, so it should hold nothing but generated images:

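// Passed in from main.bicep.
param location string
param prefix string

// Assumed naming scheme: account names must be 3-24 lowercase letters and digits;
// keeping 'media' in the name lets the CLI lookup in step 2.3 find the account.
var storageAccountName = take(toLower(replace('${prefix}media${uniqueString(resourceGroup().id)}', '-', '')), 24)

resource mediaStorage 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
    allowBlobPublicAccess: true
    minimumTlsVersion: 'TLS1_2'
  }
}

// Consumed by main.bicep as mediaStorage.outputs.storageEndpoint.
output storageEndpoint string = mediaStorage.properties.primaryEndpoints.blob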

resource imagesContainer '...' = {
  name: 'images'
  properties: {
    publicAccess: 'Blob'
  }
}

Then add a lifecycle management policy that deletes blobs under images/ once they are more than 7 days old:

resource lifecyclePolicy '...' = {
  properties: {
    policy: {
      rules: [
        {
          name: 'auto-delete-7d'
          type: 'Lifecycle'
          definition: {
            filters: {
              blobTypes: ['blockBlob']
              prefixMatch: ['images/']
            }
            actions: {
              baseBlob: {
                delete: {
                  daysAfterCreationGreaterThan: 7
                }
              }
            }
          }
        }
      ]
    }
  }
}

2.2 Wire it into `main.bicep`

param enableMediaStorage bool = true

module mediaStorage './modules/media-storage.bicep' = if (enableMediaStorage) {
  name: 'mediaStorage'
  params: {
    location: location
    prefix: prefix
  }
}

output mediaStorageEndpoint string = enableMediaStorage
  ? mediaStorage.outputs.storageEndpoint
  : ''

2.3 Deploy and store the key in Key Vault

Run the following in Bash (on a Linux VM, in WSL, or in Azure Cloud Shell):

az deployment group create \
  --resource-group oc-family-rg \
  --template-file infra/bicep/main.bicep \
  --parameters infra/bicep/params/prod.bicepparam

STORAGE_NAME=$(az storage account list -g oc-family-rg \
  --query "[?contains(name,'media')].name" -o tsv)

STORAGE_KEY=$(az storage account keys list -g oc-family-rg \
  -n "$STORAGE_NAME" --query "[0].value" -o tsv)

az keyvault secret set \
  --vault-name <your Key Vault name> \
  --name media-storage-key \
  --value "$STORAGE_KEY"

Step 3: Build the OpenClaw plugin

The plugin does four things: take a drawing request, call MAI-Image-2, upload the result to Blob Storage, and return both the image and a public URL.

3.1 Suggested project structure

extensions/mai-image/
├── index.js
├── lib/
│   ├── api.js
│   └── blob.js
├── openclaw.plugin.json
├── package.json
└── test/
    ├── tool.test.js
    └── blob.test.js

3.2 Core registration logic

const crypto = require("crypto");
const { generateImage } = require("./lib/api");
const { uploadToBlob } = require("./lib/blob");

function register(api) {
  const cfg = Object.assign({
    endpoint: "",
    deploymentName: "mai-image-2",
    defaultWidth: 1024,
    defaultHeight: 1024,
    mediaStorageAccount: "",
    mediaStorageKey: "",
    mediaStorageContainer: "images",
  }, api.pluginConfig || {});

  api.registerTool({
    name: "mai_image_generate",
    label: "mai_image_generate",
    description: "Generate an image from a text prompt using MAI-Image-2.",
    parameters: {
      type: "object",
      required: ["prompt"],
      properties: {
        prompt: { type: "string" },
        width: { type: "integer" },
        height: { type: "integer" },
      },
    },
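    execute: async (_toolCallId, params) => {
      // Honor an explicit size from the model, else fall back to the configured defaults.
      // The Foundry API key is not part of cfg: read it from configuration inside
      // execute, not at registration time (see Pitfall 3), and pass it to generateImage.
      const result = await generateImage({
        ...cfg,
        prompt: params.prompt,
        width: params.width ?? cfg.defaultWidth,
        height: params.height ?? cfg.defaultHeight,
      });
      // generateImage resolves with data[0] of the API response, i.e. { b64_json }.
      const buffer = Buffer.from(result.b64_json, "base64");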

      const blobName = `${Date.now()}-${crypto.randomUUID()}.png`;
      const publicUrl = await uploadToBlob({
        accountName: cfg.mediaStorageAccount,
        accountKey: cfg.mediaStorageKey,
        containerName: cfg.mediaStorageContainer,
        blobName,
        buffer,
        contentType: "image/png",
      });

      return {
        content: [
          { type: "image", data: result.b64_json, mimeType: "image/png" },
          { type: "text", text: `Image generated: ${publicUrl}` },
        ],
        details: { status: "ok", publicUrl },
      };
    },
  });

  api.on(
    "before_prompt_build",
    () => ({
      appendSystemContext:
        "You have a mai_image_generate tool. When the user asks to draw or generate an image, use it. After calling the tool, include the returned URL in your reply.",
    }),
    { priority: 20 }
  );
}

module.exports = register;

3.3 What `api.js` does

lib/api.js can stay very small. It only needs to build the request for the MAI-Image-2 HTTP API and return the first item of the response's data array:

// POST https://<resource>.cognitiveservices.azure.com/mai/v1/images/generations
// Headers: api-key, Content-Type: application/json
// Body: { model: "mai-image-2", prompt, width, height }
// Response: { data: [{ b64_json: "<base64 PNG>" }] }

If you want minimal dependencies, Node's built-in https is enough here.
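Here is one way lib/api.js could look: a minimal sketch using only Node's built-in https module, not the author's actual file. It assumes the endpoint path, headers, body, and response shape from the comment above, and it introduces a hypothetical apiKey option that you would fill from configuration read at execute time. The request is assembled by a separate function so it can be unit-tested without network access.

```javascript
// lib/api.js: minimal sketch built on Node's https module (no dependencies).
// Assumption: opts.apiKey is a hypothetical field holding the Foundry key.
const https = require("https");

// Build URL, headers, and JSON body separately so they can be tested offline.
function buildGenerationRequest({ endpoint, apiKey, deploymentName, prompt, width, height }) {
  const url = new URL("/mai/v1/images/generations", endpoint);
  const body = JSON.stringify({ model: deploymentName, prompt, width, height });
  return {
    url,
    body,
    headers: {
      "api-key": apiKey,
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    },
  };
}

// POST the request and resolve with data[0], i.e. { b64_json: "<base64 PNG>" }.
function generateImage(opts) {
  const { url, body, headers } = buildGenerationRequest(opts);
  return new Promise((resolve, reject) => {
    const req = https.request(url, { method: "POST", headers }, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => {
        const text = Buffer.concat(chunks).toString("utf8");
        if (res.statusCode < 200 || res.statusCode >= 300) {
          return reject(new Error(`MAI-Image-2 returned ${res.statusCode}: ${text}`));
        }
        try {
          resolve(JSON.parse(text).data[0]);
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}

module.exports = { buildGenerationRequest, generateImage };
```

Because generateImage resolves with data[0], the result.b64_json access in index.js works unchanged.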

3.4 What `blob.js` does

lib/blob.js uploads the generated image through the Azure Blob Storage REST API:

// PUT https://<account>.blob.core.windows.net/<container>/<blob>
// Authorization: SharedKey <account>:<HMAC-SHA256 signature>
// Public URL:
// https://<account>.blob.core.windows.net/images/<blob>.png

Same idea here: you can skip the storage SDK and keep the plugin lightweight.
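A minimal sketch of lib/blob.js, assuming Node 18+ for the global fetch and storage service version 2020-10-02. It signs a Put Blob request with Shared Key: an HMAC-SHA256 over the verb, the standard header fields, the sorted x-ms-* headers, and the canonicalized resource, keyed with the base64-decoded account key. This is an illustration rather than the author's file, and buildPutBlobRequest is a hypothetical helper name.

```javascript
// lib/blob.js: sketch of Put Blob with Shared Key auth and no storage SDK.
// Assumptions: Node 18+ (global fetch), block blobs, service version 2020-10-02.
const crypto = require("crypto");

const API_VERSION = "2020-10-02";

function buildPutBlobRequest({ accountName, accountKey, containerName, blobName, buffer, contentType, date = new Date() }) {
  const path = `/${containerName}/${encodeURIComponent(blobName)}`;
  const msHeaders = {
    "x-ms-blob-type": "BlockBlob",
    "x-ms-date": date.toUTCString(),
    "x-ms-version": API_VERSION,
  };
  // Canonicalized headers: lowercase x-ms-* names, sorted, one "name:value\n" each.
  const canonicalHeaders = Object.keys(msHeaders).sort()
    .map((name) => `${name}:${msHeaders[name]}\n`).join("");
  // Content-Length is signed as an empty string when the body is empty.
  const contentLength = buffer.length > 0 ? String(buffer.length) : "";
  // Field order: verb, Content-Encoding, Content-Language, Content-Length,
  // Content-MD5, Content-Type, Date, If-Modified-Since, If-Match,
  // If-None-Match, If-Unmodified-Since, Range. Date is empty because x-ms-date is set.
  const stringToSign =
    ["PUT", "", "", contentLength, "", contentType, "", "", "", "", "", ""].join("\n") +
    "\n" + canonicalHeaders + `/${accountName}${path}`;
  const signature = crypto
    .createHmac("sha256", Buffer.from(accountKey, "base64"))
    .update(stringToSign, "utf8")
    .digest("base64");
  return {
    url: `https://${accountName}.blob.core.windows.net${path}`,
    // Content-Length is left to fetch, which derives it from the Buffer body.
    headers: {
      ...msHeaders,
      "Content-Type": contentType,
      Authorization: `SharedKey ${accountName}:${signature}`,
    },
  };
}

// Upload the image and return its public URL (the container allows anonymous blob reads).
async function uploadToBlob(opts) {
  const { url, headers } = buildPutBlobRequest(opts);
  const res = await fetch(url, { method: "PUT", headers, body: opts.buffer });
  if (res.status !== 201) {
    throw new Error(`Put Blob failed: ${res.status} ${await res.text()}`);
  }
  return url;
}

module.exports = { buildPutBlobRequest, uploadToBlob };
```

The call signature matches the uploadToBlob call in index.js, so the returned URL is what ends up in the chat reply.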

Step 4: Deploy the plugin to the VM

4.1 Copy the files

scp -r extensions/mai-image weijen@family-claw.multiagentai.co:~/.openclaw/extensions/

[!NOTE] If you are working from Codespaces and cannot use port 22 directly, az vm run-command invoke plus base64 file transfer is a reasonable fallback.

4.2 Update OpenClaw configuration

Add this block to the plugins section of ~/.openclaw/openclaw.json:

{
  "plugins": {
    "allow": ["...", "mai-image"],
    "entries": {
      "mai-image": {
        "enabled": true,
        "config": {
          "endpoint": "https://<your Foundry>.cognitiveservices.azure.com",
          "deploymentName": "mai-image-2",
          "defaultWidth": 1024,
          "defaultHeight": 1024,
          "mediaStorageAccount": "<Storage Account name>",
          "mediaStorageKey": "<key from Key Vault>",
          "mediaStorageContainer": "images"
        }
      }
    },
    "load": {
      "paths": ["...", "/home/weijen/.openclaw/extensions/mai-image"]
    }
  }
}

4.3 Restart the gateway and confirm the plugin loads

openclaw gateway restart
journalctl --user -u openclaw-gateway.service --since "30 seconds ago" | grep mai-image

You should see a log line similar to mai-image plugin ready.

Step 5: Test the flow

CLI test

openclaw agent \
  --message "Use mai_image_generate to draw a cute cat" \
  --session-id test \
  --json \
  --timeout 120

Channel test

Send this message through Telegram, LINE, or WhatsApp:

Draw a cat reading a book on the moon.

Expected behavior:

Channel	Expected result
Telegram	The user receives the image directly.
WhatsApp	The user receives a message containing a clickable image URL.
LINE	The user receives a clickable image URL, similar to WhatsApp.
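If LINE or WhatsApp show a link that does not open, check whether the returned URL can be fetched without credentials. A minimal sketch, assuming Node 18+ (global fetch); checkPublicImageUrl is a hypothetical helper, and its fetchImpl parameter exists only so the check can be exercised offline.

```javascript
// Smoke test for the image URL the bot replied with (Node 18+, global fetch).
// fetchImpl is injectable so the logic can be tested without network access.
async function checkPublicImageUrl(url, fetchImpl = fetch) {
  if (new URL(url).protocol !== "https:") {
    return { ok: false, reason: "URL is not HTTPS" };
  }
  // Anonymous HEAD request: the same view an outside client gets of the blob.
  const res = await fetchImpl(url, { method: "HEAD" });
  if (res.status !== 200) {
    return { ok: false, reason: `HTTP ${res.status}` };
  }
  const type = res.headers.get("content-type") || "";
  if (!type.startsWith("image/")) {
    return { ok: false, reason: `unexpected content type ${type}` };
  }
  return { ok: true, reason: "publicly readable image" };
}
```

For example, call checkPublicImageUrl(url).then(console.log) with the URL from the bot's reply.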

Cost Notes

Item	Notes
MAI-Image-2	Charged per generated image.
Blob Storage (Standard LRS)	Usually inexpensive for low-volume personal use.
7-day lifecycle cleanup	No separate feature charge.

For a small personal or family bot, storage cost is usually negligible. Model usage is the part worth watching: see the companion guide on tracing MAI-Image-2 calls and per-image cost in Azure AI Foundry for the OpenTelemetry setup that makes per-request usage visible.

Pitfalls I Hit Along the Way

Pitfall 1: Model name casing

Symptom	Fix
Bicep deployment returns `SpecialFeatureOrQuotaIdRequired`	Make sure `modelName` is `MAI-Image-2`, not `mai-image-2`.

To confirm the exact model spelling, run az cognitiveservices account list-models --name <your Foundry resource name> --resource-group oc-family-rg.

Pitfall 2: Wrong plugin return shape

Symptom	Fix
`TypeError: Cannot read properties of undefined (reading 'trim')`, followed by broken LLM calls	`registerTool` must return `{ content: [...], details: {...} }`, not a custom shape like `{ media, text }`.
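To catch a wrong return shape in test/tool.test.js before it reaches the gateway, a small shape check helps. This is a hypothetical helper that only knows the two content item types used in index.js above; OpenClaw's real schema may accept more.

```javascript
// Hypothetical test helper: verifies the { content: [...], details: {...} } shape
// and accepts only the text and image items that index.js returns.
function assertToolResultShape(result) {
  if (!result || !Array.isArray(result.content)) {
    throw new TypeError("tool result must have a content array");
  }
  if (typeof result.details !== "object" || result.details === null) {
    throw new TypeError("tool result must have a details object");
  }
  for (const item of result.content) {
    if (item.type === "text" && typeof item.text === "string") continue;
    if (item.type === "image" && typeof item.data === "string" && typeof item.mimeType === "string") continue;
    throw new TypeError(`unsupported content item: ${JSON.stringify(item)}`);
  }
  return true;
}
```

A custom shape like { media, text } fails the first check immediately.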

Pitfall 3: API key is not ready during registration

Symptom	Fix
`401 Access denied`	Do not rely on `api.resolveSecret()` during plugin registration. Read the provider configuration during each `execute` call instead.
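The fix comes down to capturing a resolver rather than a value when the plugin registers. A minimal sketch of that pattern; resolveApiKey and generate are hypothetical stand-ins for however your setup reads provider configuration and calls lib/api.js.

```javascript
// Capture a resolver, not a value, at registration time.
// resolveApiKey and generate are hypothetical stand-ins.
function makeExecute(resolveApiKey, generate) {
  return async (_toolCallId, params) => {
    // Resolved on every call, so a key that only became available after
    // registration is still picked up.
    const apiKey = await resolveApiKey();
    if (!apiKey) {
      throw new Error("MAI-Image-2 API key is not configured yet");
    }
    return generate({ apiKey, prompt: params.prompt });
  };
}
```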

Pitfall 4: `registerImageGenerationProvider` does not solve every channel

Symptom	Fix
Telegram works, but WhatsApp and LINE only receive text	Use `registerTool` plus a Blob Storage URL so every channel can fetch the image over HTTPS.

registerImageGenerationProvider stores the image on the VM's local disk. Telegram can upload that file directly, but in this setup WhatsApp and LINE only received text, so they are served through a public URL instead.

Pitfall 5: Session memory contamination

Symptom	Fix
The plugin is fixed, but the model still says it cannot draw images	Clear old session files and workspace memory, then restart the gateway.

find ~/.openclaw/agents/main/sessions/ -name "*.jsonl" \
  -exec grep -l "<peer-id>" {} \; -delete

rm -f ~/.openclaw/workspace/memory/$(date +%Y-%m-%d)*.md
openclaw gateway restart