Teaching OpenClaw to Draw with MAI-Image-2
Want your OpenClaw assistant to generate images directly from chat prompts? OpenClaw is an open-source multi-channel chat agent gateway, and this guide shows how to connect Microsoft's MAI-Image-2 model to it and make the output arrive reliably on Telegram, LINE, and WhatsApp.
What You Will End Up With
Once this is wired up, this is what you get:
- A user can say, "Draw a space cat," in any chat channel and get an image back.
- Image generation runs through MAI-Image-2, without managing your own GPU infrastructure, and billing stays usage-based.
- Telegram can receive the image directly.
- LINE and WhatsApp can consume the image through a public URL.
- Generated images can be cleaned up automatically after 7 days.
Prerequisites
- You already have OpenClaw running on an Azure VM. If not, start with Building a Family AI Chat Bot on Azure with OpenClaw.
- Azure AI Foundry is already deployed with kind: AIServices.
- Your Bicep infrastructure can deploy successfully today.
Related OpenClaw Guides
- Start with OpenClaw Overview if you want the broader architecture and plugin model before diving into image generation.
- Read Building a Natural-Language Reminder and Scheduling Plugin for OpenClaw for a second plugin example that focuses on scheduling, task dispatch, and delivery routing.
- Use Building a Family AI Chat Bot on Azure with OpenClaw when you need the full Azure VM, Key Vault, and AI Foundry setup behind this guide.
- Read Teaching OpenClaw to Understand Voice with MAI-Transcribe-1 if you want to add the matching speech pipeline for voice notes across the same chat channels.
Step 1: Deploy MAI-Image-2
1.1 Add support for Microsoft model format in Bicep
MAI-Image-2 uses the Microsoft model format, not the OpenAI format used by GPT deployments. Update the deployment loop in openai.bicep so the format can be overridden:
properties: {
model: {
format: deployment.?modelFormat ?? 'OpenAI'
name: deployment.modelName
version: deployment.modelVersion
}
}
This is backward-compatible. Existing GPT deployments without modelFormat will still fall back to OpenAI.
1.2 Add MAI-Image-2 to your parameter file
In the openaiModelDeployments array inside prod.bicepparam, add this block:
{
name: 'mai-image-2'
modelName: 'MAI-Image-2'
modelVersion: '2026-02-20'
modelFormat: 'Microsoft'
skuName: 'GlobalStandard'
skuCapacity: 3
}
[!IMPORTANT]
modelName must be exactly MAI-Image-2. If you use lowercase mai-image-2, Azure may return the misleading SpecialFeatureOrQuotaIdRequired error.
1.3 Deploy and verify
az deployment group create \
--resource-group oc-family-rg \
--template-file infra/bicep/main.bicep \
--parameters infra/bicep/params/prod.bicepparam
Then verify the deployment:
az cognitiveservices account deployment list \
--name <your Foundry resource name> \
--resource-group oc-family-rg \
--query "[].{name:name, model:properties.model.name, format:properties.model.format}" \
-o table
You should see mai-image-2 | MAI-Image-2 | Microsoft.
Step 2: Add Azure Blob Storage for public image hosting
This is the piece that makes multi-channel delivery practical. Telegram can handle direct image uploads, but LINE and WhatsApp are much easier to support when the generated image is available through a public HTTPS URL.
2.1 Create media-storage.bicep
Provision a dedicated Storage Account instead of reusing the Foundry resource:
resource mediaStorage 'Microsoft.Storage/storageAccounts@2023-05-01' = {
name: storageAccountName
location: location
sku: {
name: 'Standard_LRS'
}
kind: 'StorageV2'
properties: {
accessTier: 'Hot'
allowBlobPublicAccess: true
minimumTlsVersion: 'TLS1_2'
}
}
resource imagesContainer '...' = {
name: 'images'
properties: {
publicAccess: 'Blob'
}
}
Add a 7-day lifecycle policy as well:
resource lifecyclePolicy '...' = {
properties: {
policy: {
rules: [
{
name: 'auto-delete-7d'
type: 'Lifecycle'
definition: {
filters: {
blobTypes: ['blockBlob']
prefixMatch: ['images/']
}
actions: {
baseBlob: {
delete: {
daysAfterCreationGreaterThan: 7
}
}
}
}
}
]
}
}
}
2.2 Wire it into main.bicep
param enableMediaStorage bool = true
module mediaStorage './modules/media-storage.bicep' = if (enableMediaStorage) {
name: 'mediaStorage'
params: {
location: location
prefix: prefix
}
}
output mediaStorageEndpoint string = enableMediaStorage
? mediaStorage.outputs.storageEndpoint
: ''
2.3 Deploy and store the key in Key Vault
Bash, on a Linux VM, WSL, or Azure Cloud Shell:
az deployment group create \
--resource-group oc-family-rg \
--template-file infra/bicep/main.bicep \
--parameters infra/bicep/params/prod.bicepparam
STORAGE_NAME=$(az storage account list -g oc-family-rg \
--query "[?contains(name,'media')].name" -o tsv)
STORAGE_KEY=$(az storage account keys list -g oc-family-rg \
-n "$STORAGE_NAME" --query "[0].value" -o tsv)
az keyvault secret set \
--vault-name <your Key Vault name> \
--name media-storage-key \
--value "$STORAGE_KEY"
Step 3: Build the OpenClaw plugin
The plugin does four things: take a drawing request, call MAI-Image-2, upload the result to Blob Storage, and return both the image and a public URL.
3.1 Suggested project structure
extensions/mai-image/
├── index.js
├── lib/
│ ├── api.js
│ └── blob.js
├── openclaw.plugin.json
├── package.json
└── test/
├── tool.test.js
└── blob.test.js
3.2 Core registration logic
const crypto = require("crypto");
const { generateImage } = require("./lib/api");
const { uploadToBlob } = require("./lib/blob");
function register(api) {
const cfg = Object.assign({
endpoint: "",
deploymentName: "mai-image-2",
defaultWidth: 1024,
defaultHeight: 1024,
mediaStorageAccount: "",
mediaStorageKey: "",
mediaStorageContainer: "images",
}, api.pluginConfig || {});
api.registerTool({
name: "mai_image_generate",
label: "mai_image_generate",
description: "Generate an image from a text prompt using MAI-Image-2.",
parameters: {
type: "object",
required: ["prompt"],
properties: {
prompt: { type: "string" },
width: { type: "integer" },
height: { type: "integer" },
},
},
execute: async (_toolCallId, params) => {
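const result = await generateImage({
...cfg,
prompt: params.prompt,
// Pass the requested size through, falling back to the configured defaults.
width: params.width ?? cfg.defaultWidth,
height: params.height ?? cfg.defaultHeight,
});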
const buffer = Buffer.from(result.b64_json, "base64");
const blobName = `${Date.now()}-${crypto.randomUUID()}.png`;
const publicUrl = await uploadToBlob({
accountName: cfg.mediaStorageAccount,
accountKey: cfg.mediaStorageKey,
containerName: cfg.mediaStorageContainer,
blobName,
buffer,
contentType: "image/png",
});
return {
content: [
{ type: "image", data: result.b64_json, mimeType: "image/png" },
{ type: "text", text: `Image generated: ${publicUrl}` },
],
details: { status: "ok", publicUrl },
};
},
});
api.on(
"before_prompt_build",
() => ({
appendSystemContext:
"You have a mai_image_generate tool. When the user asks to draw or generate an image, use it. After calling the tool, include the returned URL in your reply.",
}),
{ priority: 20 }
);
}
module.exports = register;
3.3 What api.js does
lib/api.js can stay very small. It only needs to shape the request for the MAI-Image-2 HTTP API:
// POST https://<resource>.cognitiveservices.azure.com/mai/v1/images/generations
// Headers: api-key, Content-Type: application/json
// Body: { model: "mai-image-2", prompt, width, height }
// Response: { data: [{ b64_json: "<base64 PNG>" }] }
If you want minimal dependencies, Node's built-in https is enough here.
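As a minimal sketch of that idea, the module below shapes the request from the summary above and sends it with the built-in https module. The function names, the apiKey option, and the fallback to defaultWidth and defaultHeight are assumptions made for illustration; only the URL path, the headers, and the body fields come from this guide.

```javascript
// lib/api.js (sketch). Path, headers, and body fields follow the summary above.
// `apiKey` and the function names are assumptions, not part of OpenClaw.
const https = require("https");

// Build the request options and JSON body without sending anything,
// so the shaping logic can be tested offline.
function buildImageRequest({ endpoint, apiKey, deploymentName, prompt,
                             width, height, defaultWidth, defaultHeight }) {
  const url = new URL("/mai/v1/images/generations", endpoint);
  const body = JSON.stringify({
    model: deploymentName,
    prompt,
    width: width ?? defaultWidth,
    height: height ?? defaultHeight,
  });
  return {
    options: {
      method: "POST",
      hostname: url.hostname,
      path: url.pathname,
      headers: {
        "api-key": apiKey,
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
    },
    body,
  };
}

// Send the request and resolve with the first entry of `data`,
// which is expected to carry `b64_json`.
function generateImage(params) {
  const { options, body } = buildImageRequest(params);
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => {
        const text = Buffer.concat(chunks).toString("utf8");
        if (res.statusCode < 200 || res.statusCode >= 300) {
          reject(new Error(`MAI-Image-2 returned HTTP ${res.statusCode}: ${text}`));
          return;
        }
        try {
          resolve(JSON.parse(text).data[0]);
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}

module.exports = { generateImage, buildImageRequest };
```

Keeping buildImageRequest separate from the network call means test/tool.test.js can check the request shape without touching Azure.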
3.4 What blob.js does
lib/blob.js uploads the generated image through the Azure Blob Storage REST API:
// PUT https://<account>.blob.core.windows.net/<container>/<blob>
// Authorization: SharedKey <account>:<HMAC-SHA256 signature>
// Public URL:
// https://<account>.blob.core.windows.net/images/<blob>.png
Same idea here: you can skip the storage SDK and keep the plugin lightweight.
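A rough sketch of the signing and upload steps, using only Node's crypto and https modules. The x-ms-version value and the function names are assumptions, and the string-to-sign layout reflects my reading of the Shared Key scheme, so check it against the Azure Storage documentation before relying on it. Blob names in this guide are timestamps plus a UUID, so the sketch does not handle percent-encoding.

```javascript
// lib/blob.js (sketch). API_VERSION and the function names are assumptions.
const crypto = require("crypto");
const https = require("https");

const API_VERSION = "2021-08-06"; // assumed service version

// Build the signed headers for a Put Blob request. `date` is passed in so the
// signature is reproducible in tests.
function signPutBlob({ accountName, accountKey, containerName, blobName,
                       contentLength, contentType, date }) {
  const msHeaders = {
    "x-ms-blob-type": "BlockBlob",
    "x-ms-date": date,
    "x-ms-version": API_VERSION,
  };
  // Canonicalized headers: x-ms-* names sorted, one "name:value\n" per header.
  const canonicalHeaders = Object.keys(msHeaders).sort()
    .map((name) => `${name}:${msHeaders[name]}\n`).join("");
  // Canonicalized resource: /<account>/<container>/<blob>, no query string here.
  const canonicalResource = `/${accountName}/${containerName}/${blobName}`;
  const stringToSign = [
    "PUT",
    "", // Content-Encoding
    "", // Content-Language
    contentLength > 0 ? String(contentLength) : "", // Content-Length
    "", // Content-MD5
    contentType,
    "", // Date stays empty because x-ms-date is sent instead
    "", // If-Modified-Since
    "", // If-Match
    "", // If-None-Match
    "", // If-Unmodified-Since
    "", // Range
  ].join("\n") + "\n" + canonicalHeaders + canonicalResource;
  // The account key is base64; the signature is base64(HMAC-SHA256(key, string)).
  const signature = crypto
    .createHmac("sha256", Buffer.from(accountKey, "base64"))
    .update(stringToSign, "utf8")
    .digest("base64");
  return {
    stringToSign,
    headers: {
      ...msHeaders,
      "Content-Type": contentType,
      "Content-Length": contentLength,
      Authorization: `SharedKey ${accountName}:${signature}`,
    },
  };
}

// Upload the buffer and resolve with the public URL of the new blob.
function uploadToBlob({ accountName, accountKey, containerName, blobName, buffer, contentType }) {
  const { headers } = signPutBlob({
    accountName, accountKey, containerName, blobName, contentType,
    contentLength: buffer.length,
    date: new Date().toUTCString(),
  });
  const hostname = `${accountName}.blob.core.windows.net`;
  const path = `/${containerName}/${blobName}`;
  return new Promise((resolve, reject) => {
    const req = https.request({ method: "PUT", hostname, path, headers }, (res) => {
      res.resume(); // drain the response so "end" fires
      res.on("end", () => {
        if (res.statusCode === 201) resolve(`https://${hostname}${path}`);
        else reject(new Error(`Blob upload failed with HTTP ${res.statusCode}`));
      });
    });
    req.on("error", reject);
    req.end(buffer);
  });
}

module.exports = { uploadToBlob, signPutBlob };
```

A 403 response from the upload usually means the string to sign does not match what the service computed, so logging stringToSign is the quickest way to debug it.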
Step 4: Deploy the plugin to the VM
4.1 Copy the files
scp -r extensions/mai-image/ weijen@family-claw.multiagentai.co:~/.openclaw/extensions/mai-image/
[!NOTE] If you are working from Codespaces and cannot use port 22 directly, az vm run-command invoke plus base64 file transfer is a reasonable fallback.
4.2 Update OpenClaw configuration
Add this block to the plugins section of ~/.openclaw/openclaw.json:
{
"plugins": {
"allow": ["...", "mai-image"],
"entries": {
"mai-image": {
"enabled": true,
"config": {
"endpoint": "https://<your Foundry>.cognitiveservices.azure.com",
"deploymentName": "mai-image-2",
"defaultWidth": 1024,
"defaultHeight": 1024,
"mediaStorageAccount": "<Storage Account name>",
"mediaStorageKey": "<key from Key Vault>",
"mediaStorageContainer": "images"
}
}
},
"load": {
"paths": ["...", "/home/weijen/.openclaw/extensions/mai-image"]
}
}
}
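Because the plugin has to appear in three places (allow, entries, and load.paths), it is easy to miss one. The helper below is a hypothetical pre-flight check, not part of OpenClaw; it only inspects the fields shown in the block above and flags values that still contain angle-bracket placeholders.

```javascript
// Hypothetical pre-flight check for the plugins section of openclaw.json.
// Returns a list of human-readable problems; an empty list means it looks wired up.
function checkPluginConfig(config, pluginId = "mai-image") {
  const problems = [];
  const plugins = (config && config.plugins) || {};

  if (!(plugins.allow || []).includes(pluginId)) {
    problems.push(`plugins.allow does not list "${pluginId}"`);
  }

  const entry = (plugins.entries || {})[pluginId];
  if (!entry) {
    problems.push(`plugins.entries has no "${pluginId}" entry`);
  } else {
    if (entry.enabled !== true) {
      problems.push(`plugins.entries.${pluginId}.enabled is not true`);
    }
    for (const key of ["endpoint", "mediaStorageAccount", "mediaStorageKey"]) {
      const value = (entry.config || {})[key];
      // Treat empty values and leftover "<...>" placeholders as unset.
      if (!value || /[<>]/.test(value)) {
        problems.push(`config.${key} is empty or still a placeholder`);
      }
    }
  }

  const paths = (plugins.load || {}).paths || [];
  if (!paths.some((p) => p.endsWith(`/extensions/${pluginId}`))) {
    problems.push(`plugins.load.paths has no path ending in /extensions/${pluginId}`);
  }
  return problems;
}

module.exports = { checkPluginConfig };
```

You could run it against the parsed contents of ~/.openclaw/openclaw.json before restarting the gateway and print whatever it returns.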
4.3 Restart the gateway and confirm the plugin loads
openclaw gateway restart
journalctl --user -u openclaw-gateway.service --since "30 seconds ago" | grep mai-image
You should see a log line similar to mai-image plugin ready.
Step 5: Test the flow
CLI test
openclaw agent \
--message "Use mai_image_generate to draw a cute cat" \
--session-id test \
--json \
--timeout 120
Channel test
Send this message through Telegram, LINE, or WhatsApp:
Draw a cat reading a book on the moon.
Expected behavior:
| Channel | Expected result |
|---|---|
| Telegram | The user receives the image directly. |
| WhatsApp | The user receives a message containing a clickable image URL. |
| LINE | The user receives a clickable image URL, similar to WhatsApp. |
Cost Notes
| Item | Notes |
|---|---|
| MAI-Image-2 | Charged per generated image. |
| Blob Storage (Standard LRS) | Usually inexpensive for low-volume personal use. |
| 7-day lifecycle cleanup | No separate feature charge. |
For a small personal or family bot, storage cost is usually negligible. Model usage is the part worth watching. See Trace MAI-Image-2 Calls and Per-Image Cost in Azure AI Foundry for the OpenTelemetry setup that makes per-request usage visible.
Pitfalls I Hit Along the Way
Pitfall 1: Model name casing
| Symptom | Fix |
|---|---|
| Bicep deployment returns SpecialFeatureOrQuotaIdRequired | Make sure modelName is MAI-Image-2, not mai-image-2. |
Use az cognitiveservices account list-models if you want to confirm the exact model spelling.
Pitfall 2: Wrong plugin return shape
| Symptom | Fix |
|---|---|
| TypeError: Cannot read properties of undefined (reading 'trim'), followed by broken LLM calls | The execute function passed to registerTool must return { content: [...], details: {...} }, not a custom shape like { media, text }. |
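One way to catch this before it reaches the gateway is a small shape check in the plugin's tests. The validator below is a hypothetical helper, based only on the return value shown in section 3.2; OpenClaw may enforce more or fewer rules than this.

```javascript
// Hypothetical test helper: checks a tool result against the
// { content: [...], details: {...} } shape used in section 3.2.
function validateToolResult(result) {
  if (result === null || typeof result !== "object") {
    return ["result must be an object"];
  }
  const errors = [];
  if (!Array.isArray(result.content)) {
    errors.push("content must be an array");
  } else {
    result.content.forEach((part, i) => {
      if (part === null || typeof part !== "object") {
        errors.push(`content[${i}] must be an object`);
        return;
      }
      if (part.type === "text" && typeof part.text !== "string") {
        errors.push(`content[${i}].text must be a string`);
      }
      if (part.type === "image" &&
          (typeof part.data !== "string" || typeof part.mimeType !== "string")) {
        errors.push(`content[${i}] needs string data and mimeType`);
      }
    });
  }
  if (result.details === null || typeof result.details !== "object") {
    errors.push("details must be an object");
  }
  return errors;
}

module.exports = { validateToolResult };
```

Passing the { media, text } shape through this check reports both the missing content array and the missing details object.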
Pitfall 3: API key is not ready during registration
| Symptom | Fix |
|---|---|
| 401 Access denied | Do not rely on api.resolveSecret() during plugin registration. Read the provider configuration during each execute call instead. |
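The fix comes down to resolving the key inside execute rather than capturing it once at registration. A minimal sketch of that pattern follows; getApiKey and generate are hypothetical stand-ins for however your setup reads the provider configuration and calls the model, not OpenClaw APIs.

```javascript
// Sketch of per-call key resolution. `getApiKey` and `generate` are
// hypothetical callbacks supplied by the plugin, not OpenClaw APIs.
function makeExecute(getApiKey, generate) {
  return async (_toolCallId, params) => {
    // Resolved on every call, so a key that was not ready at
    // registration time is picked up as soon as it exists.
    const apiKey = await getApiKey();
    if (!apiKey) {
      throw new Error("MAI-Image-2 API key is not available yet");
    }
    return generate({ apiKey, prompt: params.prompt });
  };
}

module.exports = { makeExecute };
```

Failing with a clear error when the key is missing also makes the problem easier to spot in the logs than a 401 from the service.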
Pitfall 4: registerImageGenerationProvider does not solve every channel
| Symptom | Fix |
|---|---|
| Telegram works, but WhatsApp and LINE only receive text | Use registerTool plus a Blob Storage URL so every channel can fetch the image over HTTPS. |
registerImageGenerationProvider stores the image on local disk. Telegram can work with that. WhatsApp and LINE are much better served by a public URL.
Pitfall 5: Session memory contamination
| Symptom | Fix |
|---|---|
| The plugin is fixed, but the model still says it cannot draw images | Clear old session files and workspace memory, then restart the gateway. |
find ~/.openclaw/agents/main/sessions/ -name "*.jsonl" \
-exec grep -l "<peer-id>" {} \; -delete
rm -f ~/.openclaw/workspace/memory/$(date +%Y-%m-%d)*.md
openclaw gateway restart