Oliver Wolfson | AI Automation & Local-First AI

The Core Problem

Text-to-image generators like Flux or Midjourney can produce stunning, photorealistic results. But ask them to render the same object twice from different angles and you will get two different objects. The geometry shifts, details change, proportions drift. For any product-focused use case — e-commerce, advertising, product visualization — this is a dealbreaker.

The standard industry solution is real photography or 3D modelling by hand. Both are slow and expensive. The emerging solution is a three-stage pipeline: generate a 3D model first, render it as reference images from multiple angles, then use those reference images to drive photorealistic generation. The 3D model enforces consistency; the image model provides the photorealism.

That sounds like it would require a separate account and a separate billing relationship for every model vendor involved. With fal.ai, it does not.

The Stack

Everything in this pipeline is available on fal.ai under a single API key and pay-as-you-go billing. You can query the pricing for any endpoint programmatically:

curl "https://api.fal.ai/v1/models/pricing?endpoint_id=fal-ai/flux-pro/v1.1" \
  -H "Authorization: Key $FAL_KEY"
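
The same request can be made from Node.js. The following is a minimal sketch, assuming Node 18+ (for the global `fetch`) and the URL and `Authorization: Key` header shown in the curl call above. The helper names `pricingUrl` and `fetchPricing` are hypothetical, and the response is returned as raw JSON because its field layout is not described here.

```javascript
// Base URL copied from the curl example above.
const PRICING_BASE = "https://api.fal.ai/v1/models/pricing";

// Build the pricing URL for one endpoint ID (hypothetical helper).
// URLSearchParams takes care of encoding the slashes in the ID.
function pricingUrl(endpointId) {
  const url = new URL(PRICING_BASE);
  url.searchParams.set("endpoint_id", endpointId);
  return url.toString();
}

// Fetch pricing for one endpoint (hypothetical helper).
// Returns the parsed JSON body without assuming any field names.
async function fetchPricing(endpointId, apiKey = process.env.FAL_KEY) {
  const response = await fetch(pricingUrl(endpointId), {
    headers: { Authorization: `Key ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`Pricing request failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

Calling `fetchPricing("fal-ai/flux-pro/v1.1")` and logging the result is a quick way to inspect the actual response shape before building cost tracking on top of it.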

The four layers of the pipeline:

Stage	Model	fal.ai Endpoint
3D generation	Meshy-6	`fal-ai/meshy/v6/...`
Reference rendering	Three.js / Blender (local)	—
Photorealistic image	Flux 1.1 Pro / Flux Kontext	`fal-ai/flux-pro/v1.1` / `fal-ai/flux-kontext/dev`
Animated video	Kling 3.0 Pro	`fal-ai/kling-video/v3/pro/...`

Stage 1: 3D Model Generation with Meshy

Meshy-6 is the current default on fal.ai and generates production-ready 3D models from either a text prompt or a reference image (or both). For product workflows, image-to-3D is almost always the better path — you start from something real.

The multi-image endpoint accepts up to four photos of the same object from different angles, which dramatically improves geometry accuracy on the back and sides:

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/meshy/v5/multi-image-to-3d", {
  input: {
    image_urls: [
      "https://yourdomain.com/product-front.jpg",
      "https://yourdomain.com/product-side.jpg",
      "https://yourdomain.com/product-back.jpg",
    ],
    ai_model: "meshy-6",
    should_texture: true,
    enable_pbr: true,      // metallic, roughness, normal maps
    texture_richness: "high",
  },
});

// result.data.model_urls.glb — download and save this
console.log(result.data.model_urls.glb);

The output is a GLB file with full PBR texture maps: base color, roughness, metallic, and normal. PBR is what makes metal look like metal and glass look like glass in downstream rendering. Do not skip it.

Cost: approximately $0.05–0.15 per model depending on texture resolution. Failed generations are not charged.
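
Because the multi-image endpoint is described as accepting at most four photos, it can help to validate the list before submitting a job. The following is a minimal sketch: `buildMultiImageInput` is a hypothetical helper, and the option values are simply copied from the example call above rather than taken from the endpoint's schema.

```javascript
// Limit stated in the text above: up to four photos per object.
const MAX_IMAGES = 4;

// Validate the photo URLs and build the input object (hypothetical helper).
function buildMultiImageInput(imageUrls) {
  if (!Array.isArray(imageUrls) || imageUrls.length === 0) {
    throw new Error("Provide at least one image URL");
  }
  if (imageUrls.length > MAX_IMAGES) {
    throw new Error(`At most ${MAX_IMAGES} images are accepted, got ${imageUrls.length}`);
  }
  for (const candidate of imageUrls) {
    // new URL() throws a TypeError on malformed input.
    const { protocol } = new URL(candidate);
    if (protocol !== "http:" && protocol !== "https:") {
      throw new Error(`Unsupported URL scheme: ${candidate}`);
    }
  }
  // Option values copied from the example call above.
  return {
    image_urls: [...imageUrls],
    ai_model: "meshy-6",
    should_texture: true,
    enable_pbr: true,
    texture_richness: "high",
  };
}
```

The returned object can be passed as the `input` field of the `fal.subscribe` call shown earlier.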

Stage 2: Multi-Angle Reference Rendering

This is the only stage that happens off fal.ai. You need to render your GLB from several camera positions to create the reference images that will drive the photorealistic generation stage.

The fastest path is Three.js in a Node.js environment. For higher fidelity — particularly for objects with complex reflections — Blender's Python API gives you more lighting control and production-quality output.

Minimal Three.js render script:

const angles = [
  { name: "front",        azimuth: 0,   elevation: 15 },
  { name: "three-quarter", azimuth: 45,  elevation: 25 },
  { name: "side",         azimuth: 90,  elevation: 15 },
  { name: "top",          azimuth: 0,   elevation: 75 },
  { name: "detail",       azimuth: 30,  elevation: 5  },
];

// Load GLB, position camera for each angle, render to PNG
// Output: front.png, three-quarter.png, side.png, top.png, detail.png
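
The azimuth and elevation pairs above still have to be turned into camera coordinates. The following is a sketch of that conversion, assuming a Y-up scene with the object centred at the origin, the front of the object facing +Z, and an orbit radius of 3 units. All three are assumptions to adjust for your model, and `cameraPosition` is a hypothetical helper.

```javascript
// Convert an orbit angle (in degrees) to a camera position.
// Assumptions: Y is up, the object sits at the origin, its front faces +Z.
function cameraPosition(azimuthDeg, elevationDeg, radius = 3) {
  const azimuth = (azimuthDeg * Math.PI) / 180;
  const elevation = (elevationDeg * Math.PI) / 180;
  return {
    x: radius * Math.cos(elevation) * Math.sin(azimuth),
    y: radius * Math.sin(elevation),
    z: radius * Math.cos(elevation) * Math.cos(azimuth),
  };
}

// Example: the camera position for the "three-quarter" view.
const threeQuarter = cameraPosition(45, 25);
console.log(threeQuarter);
```

In Three.js, the result would typically be applied with `camera.position.set(x, y, z)` followed by `camera.lookAt(0, 0, 0)` before each render.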

Use a neutral HDRI or three-point studio lighting for the renders. The goal at this stage is not photorealism — it is accurate geometry and silhouette. The image model handles the photorealism in the next step.

Recommended output: 1024×1024 PNG, white or neutral grey background, no compression.

Stage 3: Photorealistic Image Generation with Flux Kontext

Flux Kontext is the model to use here. Unlike standard text-to-image, Kontext accepts a reference image and generates a new image that maintains the structure, composition, and identity of the reference while dramatically improving surface quality, lighting, and realism.

You feed it one of your rendered reference PNGs and prompt for the lighting and surface quality you want. The 3D render constrains the geometry; Flux provides the photorealism.

import fs from "node:fs";

async function generatePhotorealistic(renderPath, angle) {
  // Upload the reference render
  const imageFile = fs.readFileSync(renderPath);
  const uploaded = await fal.storage.upload(
    new Blob([imageFile], { type: "image/png" })
  );

  const result = await fal.subscribe("fal-ai/flux-kontext/dev", {
    input: {
      image_url: uploaded,
      prompt: `photorealistic product photography, studio lighting, 
               clean white background, sharp focus, commercial quality, 
               8K detail, ${angle} angle`,
      negative_prompt: "3D render, CGI look, plastic, toy, amateur",
      num_inference_steps: 30,
      guidance_scale: 7.5,
    },
  });

  return result.data.images[0].url;
}

// Run for all angles
const angles = ["front", "three-quarter", "side", "top", "detail"];
for (const angle of angles) {
  const url = await generatePhotorealistic(`./renders/${angle}.png`, angle);
  console.log(`${angle}: ${url}`);
}

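Cost: approximately $0.025–0.05 per image on Flux 1.1 Pro. For a five-angle set, that works out to roughly $0.13–0.25 total. The code above calls the Flux Kontext endpoint, so query the pricing API for that endpoint rather than relying on the Flux 1.1 Pro figure.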

Prompting notes:

Be explicit about lighting style: "studio softbox", "dramatic side lighting", "product photography on white"
Include negative prompts that push away from the CGI look: "3D render, plastic, fake, artificial"
For reflective or metallic subjects, add: "realistic metal reflections, physically accurate materials"
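
These notes can be folded into a small prompt builder so every angle gets the same wording. The following is a sketch only: `buildProductPrompt` and its option names are hypothetical, and the phrases are the ones quoted in the notes above.

```javascript
// Assemble prompt and negative prompt from the notes above (hypothetical helper).
function buildProductPrompt({ angle, lighting = "studio softbox", reflective = false }) {
  const parts = [
    "photorealistic product photography",
    lighting,
    "sharp focus",
    `${angle} angle`,
  ];
  if (reflective) {
    // Extra phrases suggested above for reflective or metallic subjects.
    parts.push("realistic metal reflections", "physically accurate materials");
  }
  return {
    prompt: parts.join(", "),
    negative_prompt: "3D render, plastic, fake, artificial",
  };
}

// Example: a metallic product seen from the side under dramatic lighting.
console.log(
  buildProductPrompt({ angle: "side", lighting: "dramatic side lighting", reflective: true })
);
```

Keeping the wording in one place makes it easier to change the lighting style for a whole set without the angles drifting apart in look.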

Stage 4: Animated Video with Kling 3.0

Once you have photorealistic reference images, you can drive video generation from them. This is where the pipeline earns its consistency advantage — because the video model is animating from a real-looking reference rather than generating from text alone, the subject stays stable across frames.

Kling 3.0 Pro on fal.ai supports image-to-video with strong subject consistency and native audio generation in a single pass. For product visualization, the most useful motion types are slow turntable rotations, dramatic light sweeps, and close-up detail reveals.

async function generateProductVideo(imageUrl, motionType) {
  const prompts = {
    turntable: "slow 360-degree rotation, studio lighting, product photography, smooth camera movement, no camera shake",
    lightSweep: "studio light sweeping across surface, dramatic specular highlights, slow motion, cinematic",
    detailReveal: "slow macro zoom into surface detail, shallow depth of field, studio environment",
  };

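  // Kling 3.0 Pro endpoint, as listed in the stack table and quick reference.
  // Check the endpoint's input schema on fal.ai before relying on these field names.
  const result = await fal.subscribe("fal-ai/kling-video/v3/pro/image-to-video", {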
    input: {
      image_url: imageUrl,
      prompt: prompts[motionType],
      negative_prompt: "blur, shaky camera, distortion, artifacts, flickering",
      duration: "5",          // 5 or 10 seconds
      aspect_ratio: "1:1",    // square for product use
      cfg_scale: 0.5,
    },
    logs: true,
    onQueueUpdate: (update) => {
      if (update.status === "IN_PROGRESS") {
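        // Print each log message on its own line (passing console.log directly
        // to forEach would also print the index and the whole array).
        update.logs.forEach((log) => console.log(log.message));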
      }
    },
  });

  return result.data.video.url;
}

// Generate from your best angle reference
const videoUrl = await generateProductVideo(frontImageUrl, "turntable");

Cost: approximately $0.10/second on Kling 3.0 via fal.ai. A 5-second clip is roughly $0.50.

Notes on jewelry and highly reflective subjects: Keep clips short (3–5 seconds) and camera motion slow. Refractions and complex reflections are still the hardest thing for video models to maintain frame-to-frame. Simple, deliberate motion hides any temporal inconsistency that occurs.

The Complete Pipeline

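import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

// generate3DModel and renderAngles wrap Stage 1 and Stage 2;
// they are not defined in this article.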

async function runPipeline(productImagePaths) {
  console.log("Stage 1: Generating 3D model...");
  const model3d = await generate3DModel(productImagePaths);
  
  console.log("Stage 2: Rendering reference angles...");
  const renders = await renderAngles(model3d.glbUrl); // local Blender/Three.js
  
  console.log("Stage 3: Generating photorealistic images...");
  const images = await Promise.all(
    renders.map((render) => generatePhotorealistic(render.path, render.angle))
  );
  
  console.log("Stage 4: Generating video...");
  // generatePhotorealistic returns a URL string, so index the array directly.
  const video = await generateProductVideo(images[1], "turntable"); // three-quarter angle
  
  return { images, video };
}
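
Picking the video source by array position only works while the render order never changes. One alternative, sketched here with hypothetical helpers `generateImageSet` and `pickAngle`, is to keep each angle name next to its generated URL and look the source frame up by name. The sketch assumes each render is an object with `path` and `angle` fields, as in the pipeline above, and that the `generate` callback resolves to a URL string.

```javascript
// Generate one image per render and keep each angle name with its URL.
// `generate` is any async function (path, angle) => URL string.
async function generateImageSet(renders, generate) {
  const urls = await Promise.all(
    renders.map((render) => generate(render.path, render.angle))
  );
  // Promise.all preserves input order, so urls[i] belongs to renders[i].
  return renders.map((render, i) => ({ angle: render.angle, url: urls[i] }));
}

// Look up the URL for a named angle, failing loudly if it is missing.
function pickAngle(images, angle) {
  const match = images.find((image) => image.angle === angle);
  if (!match) {
    throw new Error(`No generated image for angle "${angle}"`);
  }
  return match.url;
}
```

With these in place, Stage 4 could call `pickAngle(images, "three-quarter")` instead of relying on an index.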

Cost Summary

For a complete 5-angle image set plus one 5-second video, running entirely through fal.ai:

Stage	Cost
3D model (Meshy-6, with PBR)	~$0.10
Photorealistic images × 5 (Flux 1.1 Pro)	~$0.15
Animated video, 5s (Kling 3.0 Pro)	~$0.50
Total per product	~$0.75

For high-volume workflows, fal.ai offers enterprise pricing with per-endpoint volume discounts. The pricing API lets you query costs before committing to a generation, which is useful for building cost estimates into your application.
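
The arithmetic behind the table can be sketched as a small estimator. The rates below are the approximate figures from this article, not live prices: the per-image rate of $0.03 is derived from the table's ~$0.15 for five images, and `estimateCost` is a hypothetical helper. In a real application the rates would come from the pricing API.

```javascript
// Approximate per-unit rates in USD, taken from the cost table above.
const RATES = {
  model3d: 0.10,        // one Meshy-6 model with PBR
  imagePerShot: 0.03,   // ~$0.15 / 5 images
  videoPerSecond: 0.10, // Kling 3.0 via fal.ai
};

// Estimate the cost of one product run, rounded to the nearest cent.
function estimateCost({ angles = 5, videoSeconds = 5 } = {}, rates = RATES) {
  const total =
    rates.model3d +
    angles * rates.imagePerShot +
    videoSeconds * rates.videoPerSecond;
  return Math.round(total * 100) / 100;
}

console.log(estimateCost());                     // five angles, 5-second clip
console.log(estimateCost({ videoSeconds: 10 })); // five angles, 10-second clip
```

The video dominates the total, so clip length is the first thing to tune when the per-product budget is tight.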

Why fal.ai as the Backbone

The case for routing everything through fal.ai rather than managing separate vendor relationships:

Single API key. Meshy, Flux, and Kling each have their own native APIs, their own authentication patterns, their own billing dashboards. fal.ai consolidates all of them behind one key and one billing account.

Pay-as-you-go. No monthly minimums, no per-vendor subscriptions. You pay for what you generate. Credits do not expire.

Model switching without re-integration. When Flux 2 ships or Kling 4.0 launches, you change one string in your endpoint call. The integration pattern stays identical.

Speed. fal.ai runs custom CUDA kernels for Flux-family models and is consistently the fastest inference option for image generation. For pipelines where you are generating multiple angles in parallel, this matters.

Queryable pricing. You can hit the pricing API before every generation to get current costs and build accurate cost tracking into your application.

The one trade-off: if you need the absolute latest Seedance or Wan models on day one of release, a platform like WaveSpeed sometimes has earlier access through direct ByteDance partnerships. For Flux and Kling — the two workhorses of this pipeline — fal.ai is the right home.

Use Cases This Pattern Fits

The pipeline is generic. The same approach that works for product photography works for:

E-commerce catalogues — consistent multi-angle shots of physical products at scale
Jewelry and luxury goods — the consistency advantage is largest where brand accuracy matters most
Architecture and interiors — render building models from multiple viewpoints, upscale to photorealism
Character design — generate a consistent character model, produce reference sheets and animated clips
Industrial and technical products — CAD exports to GLB, then photorealistic marketing imagery without a photography budget

The underlying logic is the same in every case: use 3D to enforce geometry and consistency, use diffusion models to enforce photorealism, and let fal.ai hold the whole thing together.

Quick Reference: Key fal.ai Endpoints

3D Generation:    fal-ai/meshy/v5/multi-image-to-3d
                  fal-ai/meshy/v6/text-to-3d

Image (quality):  fal-ai/flux-pro/v1.1
Image (editing):  fal-ai/flux-kontext/dev
Image (fast):     fal-ai/flux/schnell

Video:            fal-ai/kling-video/v1/pro/image-to-video
                  fal-ai/kling-video/v3/pro/image-to-video
                  fal-ai/veo3 (if cinematic quality + audio needed)

Pricing API:      https://api.fal.ai/v1/models/pricing?endpoint_id={endpoint}
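
One way to keep model switching down to a one-string change is to hold the endpoint IDs in a single lookup table. The following is a minimal sketch: the stage keys and the `endpointFor` helper are hypothetical, and the endpoint strings are the ones listed above.

```javascript
// Endpoint IDs from the quick reference above, keyed by pipeline stage.
const ENDPOINTS = Object.freeze({
  model3d: "fal-ai/meshy/v5/multi-image-to-3d",
  imageEdit: "fal-ai/flux-kontext/dev",
  imageFast: "fal-ai/flux/schnell",
  video: "fal-ai/kling-video/v3/pro/image-to-video",
});

// Resolve a stage name to its endpoint ID, failing loudly on typos.
function endpointFor(stage) {
  const endpoint = ENDPOINTS[stage];
  if (!endpoint) {
    throw new Error(`Unknown pipeline stage: ${stage}`);
  }
  return endpoint;
}

console.log(endpointFor("video"));
```

Each stage function can then call `fal.subscribe(endpointFor("video"), ...)`, so upgrading a model means editing one entry in `ENDPOINTS`.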

The models in this space move fast. Check fal.ai/explore/models for the current roster and fal.ai/docs for endpoint-level documentation. The pipeline pattern stays stable even as the individual models improve.