The gap between a mediocre Kling output and something that looks like it belongs in a real ad campaign comes down almost entirely to how you structure the prompt. This is not about finding magic words - it is about understanding what information Kling needs to make a decision and giving it that information before it makes the wrong one.
Months of testing Kling for consumer product ad content point to a clear structure. Every prompt that works has four components, in this order: environment, lighting, camera movement, and product behavior. Leave any of them out and Kling fills the gaps itself. It usually fills them in wrong.
The difference is stark:
Weak prompt: "a bottle of perfume on a table"
Strong prompt: "a glass perfume bottle on a dark marble surface, soft directional studio lighting from the left creating a single highlight along the bottle edge, slow push in toward the bottle, light mist rising from the cap"
Same subject. Completely different output. Here is how each component works.
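As a rough sketch of that ordering, the four components can be kept as separate fields and joined only at the end, which makes a missing component obvious before anything is generated. The `ProductPrompt` class and its field names are hypothetical helpers for illustration, not part of Kling or any Kling API; the example only builds the text of the strong prompt above.

```python
from dataclasses import dataclass

# Assumed field names, in the order the article recommends.
COMPONENT_ORDER = ("environment", "lighting", "camera", "behavior")


@dataclass(frozen=True)
class ProductPrompt:
    """Hypothetical container for the four prompt components."""
    environment: str
    lighting: str
    camera: str
    behavior: str

    def render(self) -> str:
        # Collect the components in the fixed order.
        parts = [getattr(self, name) for name in COMPONENT_ORDER]
        # Refuse to build a prompt with an empty component, since the
        # model would otherwise fill that gap on its own.
        missing = [
            name for name, value in zip(COMPONENT_ORDER, parts)
            if not value.strip()
        ]
        if missing:
            raise ValueError(f"missing prompt components: {', '.join(missing)}")
        return ", ".join(part.strip() for part in parts)


perfume = ProductPrompt(
    environment="a glass perfume bottle on a dark marble surface",
    lighting=(
        "soft directional studio lighting from the left creating "
        "a single highlight along the bottle edge"
    ),
    camera="slow push in toward the bottle",
    behavior="light mist rising from the cap",
)
print(perfume.render())
```

The rendered string is plain text to paste into whatever prompt field you use. The point of the structure is only that an empty component fails loudly instead of silently producing a weak prompt.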
Environment
Be specific about surface materials. Marble, raw concrete, aged oak, brushed steel, white acrylic. Kling responds well to material descriptions because they carry implicit lighting and texture information. "A kitchen counter" tells it almost nothing. "A white quartz countertop with subtle veining" gives it something to render against.
For lifestyle product shots, describe the environment the way a set designer would. What is in the background? How far back is it? Is it in focus or soft? "Out of focus warm kitchen interior in the background, depth of field shallow" gets you much closer to a real ad look than "kitchen setting."
The more physically grounded your environment description is, the less Kling has to invent - and the less it invents, the more consistent and controllable the output becomes.
Lighting
Lighting is the single biggest lever for making a product look premium rather than cheap. Put the most prompt detail here.
Terms that consistently produce clean results in Kling: soft box lighting, single source directional light, rim lighting, golden hour window light, dark studio with specular highlights, overcast diffused light.
For most product ads you are choosing between two setups. Either clean studio with controlled highlights - which reads as premium - or natural environmental light, which reads as lifestyle. Mixing them in the same prompt usually looks off. Pick one and commit to it.
For anything glass, liquid, or reflective: always include where the light source is and what it is hitting. "Backlit, light passing through the liquid creating a warm amber glow" gets you something cinematic. Without that instruction Kling tends to flatten the lighting on reflective surfaces, and flattened glass looks like plastic.
Camera Movement
Kling handles camera movement well but needs explicit instruction. Vague direction like "cinematic movement" produces inconsistent results. Be literal.
Movements that work well for product ads:
- Slow push in
- Slow pull back
- Orbit right to left
- Low angle push in
- Top down slow zoom
- Handheld subtle drift
For a reveal shot: "camera starts tight on the texture of the label, slowly pulls back to reveal the full bottle against the background"
For a hero shot: "camera orbits slowly around the product from right to left, product stays centered in frame, movement is slow and deliberate"
The more precisely you describe the start position, direction, and speed, the more Kling can execute what you actually want rather than what it thinks "cinematic" means.
Product Behavior
This is where most prompts fall short. If your product can do something, describe it happening. Liquid pouring, steam rising, fabric moving, powder dispersing, condensation forming on glass. These micro-moments are what make a product ad feel alive rather than a rotating 3D render.
For food and beverage: "condensation forming on the outside of the glass" and "slow pour with bubbles rising" do significant work for perceived quality. These details read as craft and freshness even when the viewer does not consciously register them.
For skincare and beauty: "a single drop falling in slow motion toward the surface of the serum" is a reliable go-to that works across most product shapes.
For apparel: "fabric moving with a light breeze from off screen, movement is slow and natural" beats any static product placement. The movement implies real-world physics and that is what makes AI apparel video not look like AI apparel video.
Composition and Negative Space
Kling tends to fill the frame. If you want the clean ad aesthetic with breathing room, you need to ask for it explicitly. Use phrasing such as "product occupying the lower third of the frame, upper two thirds clean background" or "centered composition with significant negative space on either side."
Aspect ratio matters too. For feed ads, 9:16 with the product centered and negative space at top and bottom for text overlay gives you something actually usable in a campaign without additional editing.
Maintaining Consistency Across Shots
If you are building a multi-shot ad and need the product to look the same across cuts, do not reference previous clips. Describe the product in identical physical terms in every single prompt. Treat each prompt as if the model has never seen the product before - because effectively it has not.
This means writing out "glass bottle with brushed silver cap, label in matte black with white serif typography" in every single prompt, not "the same bottle as before." The redundancy feels wrong but the output is consistent. That consistency is what makes multiple shots feel like a campaign rather than a collection of loosely related clips.
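One way to enforce that redundancy is to store the product description once and prepend it to every shot, so the wording cannot drift between prompts. This is a minimal sketch: `PRODUCT`, `SHOTS`, and `build_shot_prompts` are hypothetical names, and the shot descriptions are the reveal and hero examples from the camera movement section.

```python
# Full physical description of the product, reused verbatim in every prompt.
PRODUCT = (
    "glass bottle with brushed silver cap, "
    "label in matte black with white serif typography"
)

# Only the parts that should change between cuts live here.
SHOTS = [
    "camera starts tight on the texture of the label, slowly pulls back "
    "to reveal the full bottle against the background",
    "camera orbits slowly around the product from right to left, "
    "product stays centered in frame, movement is slow and deliberate",
]


def build_shot_prompts(product: str, shots: list[str]) -> list[str]:
    """Prefix every shot with the same full product description."""
    return [f"{product}, {shot}" for shot in shots]


prompts = build_shot_prompts(PRODUCT, SHOTS)
for prompt in prompts:
    print(prompt)
```

Each resulting prompt is self-contained. None of them relies on the model remembering an earlier clip.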
Quick Reference by Product Category
Beverages - backlit, condensation, pour or bubble movement, dark or white studio, slow push in
Skincare - soft box from above, drop or texture close up, clean white or stone surface, slow macro push in
Apparel - natural window light, fabric movement, lifestyle background out of focus, handheld drift
Supplements and wellness - dark moody studio, rim light, product centered, mist or powder element if relevant
Home goods - environmental context, warm natural light, lifestyle background, slow orbit
The structure is consistent across categories. What changes is which lighting setup, which surface material, and which product behavior makes sense for what the product is actually selling. A skincare serum is selling purity and science - that is clean studio, precise macro, single drop. A home fragrance candle is selling warmth and atmosphere - that is warm environmental light, soft focus background, gentle flame and smoke movement.
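If you work across several categories, the quick reference can be kept as a small lookup table so each new prompt starts from the same cues. This is a sketch under assumptions: `CATEGORY_CUES` and `cues_for` are hypothetical names, and the cue lists are copied from the quick reference above rather than from any Kling documentation.

```python
# Starting cues per category, copied from the quick reference.
CATEGORY_CUES = {
    "beverages": [
        "backlit", "condensation", "pour or bubble movement",
        "dark or white studio", "slow push in",
    ],
    "skincare": [
        "soft box from above", "drop or texture close up",
        "clean white or stone surface", "slow macro push in",
    ],
    "apparel": [
        "natural window light", "fabric movement",
        "lifestyle background out of focus", "handheld drift",
    ],
    "supplements and wellness": [
        "dark moody studio", "rim light", "product centered",
        "mist or powder element if relevant",
    ],
    "home goods": [
        "environmental context", "warm natural light",
        "lifestyle background", "slow orbit",
    ],
}


def cues_for(category: str) -> list[str]:
    """Return a copy of the starting cues for a category (case-insensitive)."""
    key = category.strip().lower()
    if key not in CATEGORY_CUES:
        raise KeyError(f"no cues recorded for category: {category!r}")
    return list(CATEGORY_CUES[key])


print(", ".join(cues_for("Skincare")))
```

The cues are starting points, not finished prompts. Entries like "pour or bubble movement" still need to be resolved into one specific, physically described behavior before they go into a prompt.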
Once the four-component structure is muscle memory, the prompting work becomes faster and the output becomes predictable. Predictable output from an AI video model is the goal - it means you have creative control.