Oliver Wolfson
Building a Multimodal NL Input Playground
How I built a natural-language and image-driven demo builder that generates schemas, forms, and item detail views on the fly.
February 4, 2026 • O. Wolfson

I built a demo page that shows the full loop of "natural language in, structured data out" across different industries. The goal was not a one-off form but a demo builder: it generates a schema, renders a form from that schema, and then accepts both text and images to populate it.

This post walks through the architecture and the minimal implementation I shipped.

What the playground does

  • The user describes the demo they want.
  • The system generates a schema and form.
  • The user can submit either NL text or an image.
  • The input is parsed into the schema.
  • A human reviews and submits.
  • An item detail view is shown.

Route and UI

The route lives at src/app/input-playground/page.tsx. The UI is a client component because it manages state for the schema, form data, image previews, and entries.

At a high level the page has five stages:

  • Demo prompt
  • Generated schema
  • Multimodal input
  • Review form
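  • Item detail

// src/app/input-playground/page.tsx
"use client"; // schema, form data, image previews, and entries live in client state

export default function InputPlaygroundPage() {
  // 1) user describes the demo
  // 2) generate schema
  // 3) parse NL + image into schema
  // 4) render form for review
  // 5) show item detail
  return null; // stage UI omitted in this outline
}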
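
One way to keep the five stages in order is a small state machine. The sketch below is an assumption about how such a page could track progress, not the playground's actual code; the stage names and the `nextStage` and `previousStage` helpers are hypothetical.

```typescript
// Hypothetical stage model for the five-step flow described above.
const STAGES = ["prompt", "schema", "input", "review", "detail"] as const;
type Stage = (typeof STAGES)[number];

// Advance to the following stage; the last stage stays where it is.
function nextStage(current: Stage): Stage {
  const index = STAGES.indexOf(current);
  return STAGES[Math.min(index + 1, STAGES.length - 1)];
}

// Step back (for example, to edit the demo prompt) without going past the first stage.
function previousStage(current: Stage): Stage {
  const index = STAGES.indexOf(current);
  return STAGES[Math.max(index - 1, 0)];
}
```

In a client component, the current stage could sit in a single `useState<Stage>("prompt")` and each section would render only when its stage is active.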

Schema generation (NL → schema)

Schema creation is handled with a server action. I send a strict schema prompt and require JSON output. The result is sanitized and capped at a reasonable field count.

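// src/app/actions/input-playground.ts
"use server";

import OpenAI from "openai";

// Assumed setup: the SDK client reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// `model`, `system` (the strict schema prompt), and `sanitizeSchema`
// are defined elsewhere in this module.
export async function generateDemoSchema(prompt: string) {
  const completion = await openai.chat.completions.create({
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: `Demo description: ${prompt}` },
    ],
    response_format: { type: "json_object" },
  });

  // JSON mode returns the object as a string in the first choice.
  const parsed = JSON.parse(completion.choices[0]?.message?.content ?? "{}");

  return { success: true, schema: sanitizeSchema(parsed) };
}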
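
The sanitizing step is not shown in the post, so here is a minimal sketch of what a `sanitizeSchema` could look like. The `DemoField` and `DemoSchema` shapes, the key normalization, and the `MAX_FIELDS` value of 12 are all assumptions for illustration; only `schema.name` and the field-type union appear in the original code.

```typescript
// Assumed shapes: the post only shows `schema.name` and the field-type union.
type DemoFieldType =
  | "string" | "number" | "integer" | "boolean" | "date" | "enum" | "text";

interface DemoField {
  key: string;
  label: string;
  type: DemoFieldType;
  required: boolean;
  options?: string[]; // only present for "enum" fields
}

interface DemoSchema {
  name: string;
  fields: DemoField[];
}

const FIELD_TYPES: readonly string[] = [
  "string", "number", "integer", "boolean", "date", "enum", "text",
];
const MAX_FIELDS = 12; // assumed cap; the post does not state the real limit

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Turn arbitrary model output into a schema the form renderer can trust.
function sanitizeSchema(raw: unknown): DemoSchema {
  const source = isRecord(raw) ? raw : {};
  const rawName = source["name"];
  const name =
    typeof rawName === "string" && rawName.trim() !== ""
      ? rawName.trim()
      : "Untitled demo";

  const rawFields = source["fields"];
  const candidates: unknown[] = Array.isArray(rawFields) ? rawFields : [];
  const seen = new Set<string>();
  const fields: DemoField[] = [];

  for (const candidate of candidates) {
    if (fields.length >= MAX_FIELDS) break; // enforce the field cap
    if (!isRecord(candidate)) continue;

    // Normalize the key and skip blanks and duplicates.
    const key = String(candidate["key"] ?? "")
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "_")
      .replace(/^_+|_+$/g, "");
    if (key === "" || seen.has(key)) continue;

    // Unknown types fall back to a plain string field.
    const typeText = String(candidate["type"]);
    const type = (FIELD_TYPES.includes(typeText) ? typeText : "string") as DemoFieldType;

    const rawLabel = candidate["label"];
    const rawOptions = candidate["options"];
    const options = Array.isArray(rawOptions)
      ? rawOptions.filter((o): o is string => typeof o === "string")
      : [];

    seen.add(key);
    fields.push({
      key,
      label: typeof rawLabel === "string" ? rawLabel : key,
      type,
      required: candidate["required"] === true,
      ...(type === "enum" ? { options } : {}),
    });
  }

  return { name, fields };
}
```

Treating the model output as `unknown` and rebuilding the schema field by field means a malformed reply degrades to fewer fields rather than a broken form.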

Parsing NL text (NL → structured data)

The same schema is used to map the user’s text input into structured data. I return only the keys that exist in the schema, and I coerce values into the correct field types for the form.

export async function parseDemoInputText(schema: DemoSchema, input: string) {
  const completion = await openai.chat.completions.create({
    model,
    messages: [
      { role: "system", content: buildSchemaInstructions(schema) },
      { role: "user", content: `Input: ${input}` },
    ],
    response_format: { type: "json_object" },
  });

  // JSON mode returns the object as a string in the first choice.
  const parsed = JSON.parse(completion.choices[0]?.message?.content ?? "{}");

  return { success: true, data: coerceParsedData(parsed, schema) };
}
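
The key filtering and type coercion can be sketched as follows. This is an illustrative take on `coerceParsedData`, not the shipped implementation: the `DemoField` shape, the `FormValue` type, the accepted boolean spellings, the rounding of integers, and the ISO-only date rule are all assumptions.

```typescript
type DemoFieldType =
  | "string" | "number" | "integer" | "boolean" | "date" | "enum" | "text";

// Assumed shapes for illustration.
interface DemoField { key: string; type: DemoFieldType; options?: string[] }
interface DemoSchema { name: string; fields: DemoField[] }
type FormValue = string | number | boolean;

// Coerce one raw value to the field's type, or return undefined to drop it.
function coerceValue(field: DemoField, value: unknown): FormValue | undefined {
  if (value === null || value === undefined) return undefined;
  const text = String(value).trim();

  switch (field.type) {
    case "number":
    case "integer": {
      const n = typeof value === "number" ? value : Number(text);
      if (text === "" || !Number.isFinite(n)) return undefined;
      return field.type === "integer" ? Math.round(n) : n;
    }
    case "boolean": {
      if (typeof value === "boolean") return value;
      if (["true", "yes", "1"].includes(text.toLowerCase())) return true;
      if (["false", "no", "0"].includes(text.toLowerCase())) return false;
      return undefined;
    }
    case "date": {
      // Only ISO-style dates are accepted; the prompt is assumed to request YYYY-MM-DD.
      const match = /^\d{4}-\d{2}-\d{2}/.exec(text);
      if (!match || Number.isNaN(Date.parse(match[0]))) return undefined;
      return match[0];
    }
    case "enum":
      // Match case-insensitively but return the schema's own spelling.
      return (field.options ?? []).find(
        (option) => option.toLowerCase() === text.toLowerCase(),
      );
    default:
      return String(value); // "string" and "text"
  }
}

// Walk the schema, not the model output, so unknown keys never reach the form.
function coerceParsedData(
  parsed: unknown,
  schema: DemoSchema,
): Record<string, FormValue> {
  const source =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : {};
  const data: Record<string, FormValue> = {};
  for (const field of schema.fields) {
    const value = coerceValue(field, source[field.key]);
    if (value !== undefined) data[field.key] = value;
  }
  return data;
}
```

Iterating over `schema.fields` rather than over the parsed object is what enforces "only the keys that exist in the schema": anything the model invents is simply never read.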

Parsing images (image → structured data)

Images flow through the same schema and mapping layer. The UI accepts uploads and also enables mobile camera capture. The image is sent as a base64 data URL.

export async function parseDemoInputImage(
  schema: DemoSchema,
  imageDataUrl: string,
) {
  const completion = await openai.chat.completions.create({
    model,
    messages: [
      { role: "system", content: buildSchemaInstructions(schema) },
      {
        role: "user",
        content: [
          { type: "text", text: "Extract the data from this image." },
          {
            type: "image_url",
            image_url: { url: imageDataUrl, detail: "high" },
          },
        ],
      },
    ],
    response_format: { type: "json_object" },
  });

  // JSON mode returns the object as a string in the first choice.
  const parsed = JSON.parse(completion.choices[0]?.message?.content ?? "{}");

  return { success: true, data: coerceParsedData(parsed, schema) };
}
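
Because the image arrives as a string from the client, it may be worth checking it before spending a model call. The guard below is a hypothetical addition, not something the post describes: `checkImageDataUrl`, the MIME allowlist, and the 5 MB limit are all assumptions.

```typescript
// Hypothetical guard; the MIME list and size limit are assumptions.
const ALLOWED_MIME = ["image/png", "image/jpeg", "image/webp", "image/gif"];
const MAX_BYTES = 5 * 1024 * 1024;

interface ImageCheck {
  ok: boolean;
  mime?: string;
  bytes?: number;
  reason?: string;
}

function checkImageDataUrl(dataUrl: string): ImageCheck {
  // Expect the form data:<mime>;base64,<payload>
  const match = /^data:([a-z]+\/[a-z0-9.+-]+);base64,([A-Za-z0-9+/]*={0,2})$/.exec(
    dataUrl,
  );
  if (!match) return { ok: false, reason: "not a base64 data URL" };

  const [, mime, payload] = match;
  if (!ALLOWED_MIME.includes(mime)) {
    return { ok: false, mime, reason: "unsupported image type" };
  }

  // Every 4 base64 characters encode 3 bytes; "=" padding marks unused bytes.
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  const bytes = Math.floor((payload.length * 3) / 4) - padding;
  if (bytes > MAX_BYTES) {
    return { ok: false, mime, bytes, reason: "image too large" };
  }

  return { ok: true, mime, bytes };
}
```

A server action could call this first and return `{ success: false }` with the reason instead of forwarding an unusable payload.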

The form is generated from a shared schema

The schema is intentionally small and opinionated. It supports only seven field types, which keeps rendering predictable and the demo fast.

// src/lib/input-playground/types.ts
export type DemoFieldType =
  | "string"
  | "number"
  | "integer"
  | "boolean"
  | "date"
  | "enum"
  | "text";
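
With so few field types, rendering can reduce to a single lookup from type to control. The mapping below is a sketch under assumptions: the `Control` type and `controlFor` helper are hypothetical, and the actual component may choose different elements.

```typescript
type DemoFieldType =
  | "string" | "number" | "integer" | "boolean" | "date" | "enum" | "text";

// Hypothetical description of the HTML control a field should render as.
type Control =
  | { element: "input"; inputType: "text" | "number" | "checkbox" | "date"; step?: string }
  | { element: "select" }
  | { element: "textarea" };

// An exhaustive switch: adding a new field type becomes a compile error here
// until it is given a control.
function controlFor(type: DemoFieldType): Control {
  switch (type) {
    case "string":
      return { element: "input", inputType: "text" };
    case "number":
      return { element: "input", inputType: "number", step: "any" };
    case "integer":
      return { element: "input", inputType: "number", step: "1" };
    case "boolean":
      return { element: "input", inputType: "checkbox" };
    case "date":
      return { element: "input", inputType: "date" };
    case "enum":
      return { element: "select" };
    case "text":
      return { element: "textarea" };
  }
}
```

A form component could then loop over the schema's fields and render whichever element `controlFor(field.type)` returns.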

Item detail view

On submit, the entry is stored in ephemeral client state. The detail view renders from that entry, so the experience feels complete without needing a database.

const entry: DemoEntry = {
  id: crypto.randomUUID(),
  createdAt: new Date().toISOString(),
  schemaName: schema.name,
  data: formData,
  imageDataUrl: imagePreview || undefined,
};

Why this architecture works

  • It’s fast to demo because nothing depends on a database.
  • The schema provides a single source of truth for UI + parsing.
  • Multimodal input is additive, not divergent. Text and image both target the same schema.
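
The single-source-of-truth point can be made concrete: both parsers call `buildSchemaInstructions(schema)` for their system prompt. Its body is not shown in the post, so the version below is a guess at the idea, with an assumed `DemoField` shape and assumed prompt wording.

```typescript
type DemoFieldType =
  | "string" | "number" | "integer" | "boolean" | "date" | "enum" | "text";

// Assumed shapes for illustration.
interface DemoField {
  key: string;
  label: string;
  type: DemoFieldType;
  required?: boolean;
  options?: string[];
}
interface DemoSchema { name: string; fields: DemoField[] }

// Build one system prompt that both the text and image parsers can share.
function buildSchemaInstructions(schema: DemoSchema): string {
  const lines = schema.fields.map((field) => {
    const options =
      field.type === "enum" && field.options?.length
        ? ` (one of: ${field.options.join(", ")})`
        : "";
    const required = field.required ? ", required" : "";
    return `- "${field.key}": ${field.type}${options}${required}`;
  });

  // The prompt mentions JSON explicitly, which JSON mode expects.
  return [
    `You extract data for the "${schema.name}" form.`,
    "Reply with a single JSON object using only these keys:",
    ...lines,
    "Use ISO dates (YYYY-MM-DD). Omit any key you cannot determine.",
  ].join("\n");
}
```

Because the prompt is derived from the schema, a new field shows up in the form and in both parsers without touching either server action.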

What I want to add next

  • Schema validation summaries and required-field warnings before submit.
  • A simple “save as JSON” download so the output feels real.
  • Optional persistence (Supabase) for real inventory demos.

If you want to see it in action, visit /input-playground and describe the demo you want to generate.

Tags
#nextjs #openai #forms #schema #multimodal