Tencent SkillHub · AI

Sogni Gen

Generate images **and videos** using Sogni AI's decentralized network, with local credential/config files and optional local media inputs. Ask the agent to "...



Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
LICENSE, README.md, SKILL.md, llm.txt, skill-package.json, version.mjs

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
1.5.16

Documentation

Primary doc: SKILL.md (30 sections)

Sogni Image & Video Generation

Generate images and videos using Sogni AI's decentralized GPU network.

Setup

Get Sogni credentials at https://app.sogni.ai/

Create the credentials file:

```
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
# or:
# SOGNI_USERNAME=your_username
# SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
```

You can also export SOGNI_API_KEY, or SOGNI_USERNAME + SOGNI_PASSWORD, instead of writing the file.

Install dependencies (if cloned):

```
cd /path/to/sogni-gen
npm i
```

Or install from npm (no git clone):

```
mkdir -p ~/.clawdbot/skills
cd ~/.clawdbot/skills
npm i sogni-gen
ln -sfn node_modules/sogni-gen sogni-gen
```

When this skill is distributed via ClawHub, it bootstraps its local runtime dependencies from skill-package.json during install. That avoids relying on a root package.json being present in the published skill artifact.
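As a quick pre-flight check, you can verify which credential source a run would pick up. The `check_sogni_creds` helper below is ours (not part of the CLI); it only mirrors the lookup order described above: environment variables first, then the credentials file.

```shell
# check_sogni_creds: report which credential source would be used.
# Helper name is ours; it mirrors the documented lookup order.
check_sogni_creds() {
  if [ -n "$SOGNI_API_KEY" ] || { [ -n "$SOGNI_USERNAME" ] && [ -n "$SOGNI_PASSWORD" ]; }; then
    echo "env"
  elif [ -f "${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}" ]; then
    echo "file"
  else
    echo "none"
  fi
}
```

If it prints `none`, create the credentials file as shown above before invoking sogni-gen.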

Filesystem Paths and Overrides

Default file paths used by this skill:

- Credentials file (read): ~/.config/sogni/credentials
- Last render metadata (read/write): ~/.config/sogni/last-render.json
- OpenClaw config (read): ~/.openclaw/openclaw.json
- Media listing for --list-media (read): ~/.clawdbot/media/inbound
- MCP local result copies (write): ~/Downloads/sogni

Path override environment variables:

- SOGNI_CREDENTIALS_PATH
- SOGNI_LAST_RENDER_PATH
- SOGNI_MEDIA_INBOUND_DIR
- OPENCLAW_CONFIG_PATH
- SOGNI_DOWNLOADS_DIR (MCP)
- SOGNI_MCP_SAVE_DOWNLOADS=0 to disable MCP local file writes
- SOGNI_ALLOWED_DOWNLOAD_HOSTS to override which HTTPS hosts the MCP server may auto-download and save locally
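The override variables make it easy to point the skill at a throwaway layout, for example when testing. A minimal sketch (the sandbox directory structure here is illustrative, not required by the skill):

```shell
# Point the skill at a disposable config layout via the documented overrides
sandbox=$(mktemp -d)
export SOGNI_CREDENTIALS_PATH="$sandbox/credentials"
export SOGNI_LAST_RENDER_PATH="$sandbox/last-render.json"
export SOGNI_MEDIA_INBOUND_DIR="$sandbox/media/inbound"
mkdir -p "$SOGNI_MEDIA_INBOUND_DIR"
```

Any sogni-gen invocation in the same shell will then read and write inside the sandbox instead of the defaults.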

Usage (Images & Video)

```
# Generate and get URL
node sogni-gen.mjs "a cat wearing a hat"

# Save to file
node sogni-gen.mjs -o /tmp/cat.png "a cat wearing a hat"

# JSON output (for scripting)
node sogni-gen.mjs --json "a cat wearing a hat"

# Check token balances (no prompt required)
node sogni-gen.mjs --balance

# Check token balances in JSON
node sogni-gen.mjs --json --balance

# Quiet mode (suppress progress)
node sogni-gen.mjs -q -o /tmp/cat.png "a cat wearing a hat"
```

Options

| Flag | Description | Default |
|---|---|---|
| -o, --output <path> | Save to file | prints URL |
| -m, --model <id> | Model ID | z_image_turbo_bf16 |
| -w, --width <px> | Width | 512 |
| -h, --height <px> | Height | 512 |
| -n, --count <num> | Number of images | 1 |
| -t, --timeout <sec> | Timeout seconds | 30 (300 for video) |
| -s, --seed <num> | Specific seed | random |
| --last-seed | Reuse seed from last render | - |
| --seed-strategy <s> | Seed strategy: random \| prompt-hash | prompt-hash |
| --multi-angle | Multiple angles LoRA mode (Qwen Image Edit) | - |
| --angles-360 | Generate 8 azimuths (front -> front-left) | - |
| --angles-360-video | Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) | - |
| --azimuth <key> | front \| front-right \| right \| back-right \| back \| back-left \| left \| front-left | front |
| --elevation <key> | low-angle \| eye-level \| elevated \| high-angle | eye-level |
| --distance <key> | close-up \| medium \| wide | medium |
| --angle-strength <n> | LoRA strength for multiple_angles | 0.9 |
| --angle-description <text> | Optional subject description | - |
| --steps <num> | Override steps (model-dependent) | - |
| --guidance <num> | Override guidance (model-dependent) | - |
| --output-format <f> | Image output format: png \| jpg | png |
| --sampler <name> | Sampler (model-dependent) | - |
| --scheduler <name> | Scheduler (model-dependent) | - |
| --lora <id> | LoRA id (repeatable, edit only) | - |
| --loras <ids> | Comma-separated LoRA ids | - |
| --lora-strength <n> | LoRA strength (repeatable) | - |
| --lora-strengths <n> | Comma-separated LoRA strengths | - |
| --token-type <type> | Token type: spark \| sogni | spark |
| --balance, --balances | Show SPARK/SOGNI balances and exit | - |
| -c, --context <path> | Context image for editing | - |
| --last-image | Use last generated image as context/ref | - |
| --video, -v | Generate video instead of image | - |
| --workflow <type> | Video workflow (t2v \| i2v \| s2v \| ia2v \| a2v \| v2v \| animate-move \| animate-replace) | inferred |
| --fps <num> | Frames per second (video) | 16 |
| --duration <sec> | Duration in seconds (video) | 5 |
| --frames <num> | Override total frames (video) | - |
| --auto-resize-assets | Auto-resize video assets | true |
| --no-auto-resize-assets | Disable auto-resize | - |
| --estimate-video-cost | Estimate video cost and exit (requires --steps) | - |
| --photobooth | Face transfer mode (InstantID + SDXL Turbo) | - |
| --cn-strength <n> | ControlNet strength (photobooth) | 0.8 |
| --cn-guidance-end <n> | ControlNet guidance end point (photobooth) | 0.3 |
| --ref <path> | Reference image for video or photobooth face | required for video/photobooth |
| --ref-end <path> | End frame for i2v interpolation | - |
| --ref-audio <path> | Reference audio for s2v | - |
| --ref-video <path> | Reference video for animate/v2v workflows | - |
| --controlnet-name <name> | ControlNet type for v2v: canny \| pose \| depth \| detailer | - |
| --controlnet-strength <n> | ControlNet strength for v2v (0.0-1.0) | 0.8 |
| --sam2-coordinates <coords> | SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) | - |
| --trim-end-frame | Trim last frame for seamless video stitching | - |
| --first-frame-strength <n> | Keyframe strength for start frame (0.0-1.0) | - |
| --last-frame-strength <n> | Keyframe strength for end frame (0.0-1.0) | - |
| --last | Show last render info | - |
| --json | JSON output | false |
| --strict-size | Do not auto-adjust i2v video size for reference resizing constraints | false |
| -q, --quiet | No progress output | false |
| --extract-last-frame <video> <image> | Extract last frame from video (safe ffmpeg wrapper) | - |
| --concat-videos <out> <clips...> | Concatenate video clips (safe ffmpeg wrapper) | - |
| --list-media [type] | List recent inbound media (images \| audio \| all) | images |

OpenClaw Config Defaults

When installed as an OpenClaw plugin, sogni-gen will read defaults from ~/.openclaw/openclaw.json:

```json
{
  "plugins": {
    "entries": {
      "sogni-gen": {
        "enabled": true,
        "config": {
          "defaultImageModel": "z_image_turbo_bf16",
          "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
          "videoModels": {
            "t2v": "wan_v2.2-14b-fp8_t2v_lightx2v",
            "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
            "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
            "ia2v": "ltx2-19b-fp8_ia2v_distilled",
            "a2v": "ltx2-19b-fp8_a2v_distilled",
            "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
            "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
            "v2v": "ltx2-19b-fp8_v2v_distilled"
          },
          "defaultVideoWorkflow": "t2v",
          "defaultNetwork": "fast",
          "defaultTokenType": "spark",
          "seedStrategy": "prompt-hash",
          "modelDefaults": {
            "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
            "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
          },
          "defaultWidth": 768,
          "defaultHeight": 768,
          "defaultCount": 1,
          "defaultFps": 16,
          "defaultDurationSec": 5,
          "defaultImageTimeoutSec": 30,
          "defaultVideoTimeoutSec": 300,
          "credentialsPath": "~/.config/sogni/credentials",
          "lastRenderPath": "~/.config/sogni/last-render.json",
          "mediaInboundDir": "~/.clawdbot/media/inbound"
        }
      }
    }
  }
}
```

CLI flags always override these defaults. If your OpenClaw config lives elsewhere, set OPENCLAW_CONFIG_PATH. Seed strategies: prompt-hash (deterministic) or random.

Image Models

| Model | Speed | Use Case |
|---|---|---|
| z_image_turbo_bf16 | Fast (~5-10s) | General purpose, default |
| flux1-schnell-fp8 | Very fast | Quick iterations |
| flux2_dev_fp8 | Slow (~2min) | High quality |
| chroma-v.46-flash_fp8 | Medium | Balanced |
| qwen_image_edit_2511_fp8 | Medium | Image editing with context (up to 3) |
| qwen_image_edit_2511_fp8_lightning | Fast | Quick image editing |
| coreml-sogniXLturbo_alpha1_ad | Fast | Photobooth face transfer (SDXL Turbo) |

WAN 2.2 Models

| Model | Speed | Use Case |
|---|---|---|
| wan_v2.2-14b-fp8_i2v_lightx2v | Fast | Default video generation |
| wan_v2.2-14b-fp8_i2v | Slow | Higher quality video |
| wan_v2.2-14b-fp8_t2v_lightx2v | Fast | Text-to-video |
| wan_v2.2-14b-fp8_s2v_lightx2v | Fast | Sound-to-video |
| wan_v2.2-14b-fp8_animate-move_lightx2v | Fast | Animate-move |
| wan_v2.2-14b-fp8_animate-replace_lightx2v | Fast | Animate-replace |

LTX-2 / LTX-2.3 Models

| Model | Speed | Use Case |
|---|---|---|
| ltx2-19b-fp8_t2v_distilled | Fast (~2-3min) | Text-to-video, 8-step |
| ltx2-19b-fp8_t2v | Medium (~5min) | Text-to-video, 20-step quality |
| ltx2-19b-fp8_i2v_distilled | Fast (~2-3min) | Image-to-video, 8-step |
| ltx2-19b-fp8_i2v | Medium (~5min) | Image-to-video, 20-step quality |
| ltx2-19b-fp8_ia2v_distilled | Fast (~2-3min) | Image+audio-to-video |
| ltx2-19b-fp8_a2v_distilled | Fast (~2-3min) | Audio-to-video |
| ltx2-19b-fp8_v2v_distilled | Fast (~3min) | Video-to-video with ControlNet |
| ltx2-19b-fp8_v2v | Medium (~5min) | Video-to-video with ControlNet, quality |
| ltx23-22b-fp8_t2v_distilled | Fast (~2-3min) | Text-to-video, LTX-2.3 |

Image Editing with Context

Edit images using reference images (Qwen models support up to 3):

```
# Single context image
node sogni-gen.mjs -c photo.jpg "make the background a beach"

# Multiple context images (subject + style)
node sogni-gen.mjs -c subject.jpg -c style.jpg "apply the style to the subject"

# Use last generated image as context
node sogni-gen.mjs --last-image "make it more vibrant"
```

When context images are provided without -m, the model defaults to qwen_image_edit_2511_fp8_lightning.

Photobooth (Face Transfer)

Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use --photobooth with --ref pointing to their face image.

```
# Basic photobooth
node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"

# Multiple outputs
node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"

# Custom ControlNet tuning
node sogni-gen.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"
```

Uses SDXL Turbo (coreml-sogniXLturbo_alpha1_ad) at 1024x1024 by default. The face image is passed via --ref and styled according to the prompt. Cannot be combined with --video or -c/--context.

Agent usage:

```
# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Multiple photobooth outputs
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"
```

Multiple Angles (Turnaround)

Generate specific camera angles from a single reference image using the Multiple Angles LoRA:

```
# Single angle
node sogni-gen.mjs --multi-angle -c subject.jpg \
  --azimuth front-right --elevation eye-level --distance medium \
  --angle-strength 0.9 \
  "studio portrait, same person"

# 360 sweep (8 azimuths)
node sogni-gen.mjs --angles-360 -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

# 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
node sogni-gen.mjs --angles-360 --angles-360-video /tmp/turntable.mp4 \
  -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"
```

The prompt is auto-built with the required <sks> token plus the selected camera angle keywords. --angles-360-video generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
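For reference, --angles-360 covers eight azimuths. The loop below is our own sketch of expanding them into one --multi-angle command each (it only prints the commands; the `azimuths` order follows the flag's documented front -> front-left sweep):

```shell
# Print one --multi-angle command per azimuth of the documented 360 sweep
azimuths="front front-right right back-right back back-left left front-left"
n=0
for az in $azimuths; do
  echo "node sogni-gen.mjs --multi-angle -c subject.jpg --azimuth $az --elevation eye-level --distance medium 'studio portrait, same person'"
  n=$((n + 1))
done
```

In practice, prefer the built-in --angles-360 flag, which also keeps the <sks> token and angle keywords consistent across frames.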

360 Video Best Practices

When a user requests a "360 video", follow this workflow.

Default camera parameters (do not ask unless the user specifies them):

- Elevation: default to eye-level
- Distance: default to medium

Map user terms to flags:

| User says | Flag value |
|---|---|
| "high" angle | --elevation high-angle |
| "medium" angle | --elevation eye-level |
| "low" angle | --elevation low-angle |
| "close" | --distance close-up |
| "medium" distance | --distance medium |
| "far" | --distance wide |

Always use first-frame/last-frame stitching: the --angles-360-video flag handles this automatically by generating i2v clips between consecutive angles (including last→first) for seamless looping.

Example command:

```
node sogni-gen.mjs --angles-360 --angles-360-video /tmp/output.mp4 \
  -c /path/to/image.png --elevation eye-level --distance medium \
  "description of subject"
```

Transition Video Rule

For any transition video work, always use the Sogni skill/plugin (not raw ffmpeg or other shell commands). Use the built-in --extract-last-frame, --concat-videos, and --looping flags for video manipulation.

Insufficient Funds Handling

When you see "Debit Error: Insufficient funds", reply: "Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/"

Video Generation

Generate videos from a text prompt or a reference image:

```
# Text-to-video (t2v)
node sogni-gen.mjs --video "ocean waves at sunset"

# Basic video from image
node sogni-gen.mjs --video --ref cat.jpg -o cat.mp4 "cat walks around"

# Use last generated image as reference
node sogni-gen.mjs --last-image --video "gentle camera pan"

# Custom duration and FPS
node sogni-gen.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"

# Sound-to-video (s2v)
node sogni-gen.mjs --video --ref face.jpg --ref-audio speech.m4a \
  -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"

# Image+audio-to-video (ia2v, LTX)
node sogni-gen.mjs --video --workflow ia2v --ref cover.jpg --ref-audio song.mp3 \
  "music video with synchronized motion"

# Audio-to-video (a2v, LTX)
node sogni-gen.mjs --video --workflow a2v --ref-audio song.mp3 \
  "abstract audio-reactive visualizer"

# LTX-2.3 text-to-video
node sogni-gen.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
  "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."

# Animate (motion transfer)
node sogni-gen.mjs --video --ref subject.jpg --ref-video motion.mp4 \
  --workflow animate-move "transfer motion"
```
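The --fps and --duration flags together determine clip length; --frames overrides the total outright. Assuming total frames is roughly fps × seconds (our approximation, not a formula documented by the skill), the defaults of 16 fps and 5 seconds work out as:

```shell
# Rough frame budget for the default video settings
# (frames = fps * duration is our assumption; --frames overrides the CLI's own computation)
fps=16
duration=5
frames=$(( fps * duration ))
echo "$frames"   # 80
```

This back-of-the-envelope number is useful when deciding whether a custom --frames value is shorter or longer than what the defaults would produce.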

Video-to-Video (V2V) with ControlNet

Transform an existing video using LTX-2 models with ControlNet guidance:

```
# Basic v2v with canny edge detection
node sogni-gen.mjs --video --workflow v2v --ref-video input.mp4 \
  --controlnet-name canny "stylized anime version"

# V2V with pose detection and custom strength
node sogni-gen.mjs --video --workflow v2v --ref-video dance.mp4 \
  --controlnet-name pose --controlnet-strength 0.7 "robot dancing"

# V2V with depth map
node sogni-gen.mjs --video --workflow v2v --ref-video scene.mp4 \
  --controlnet-name depth "watercolor painting style"
```

ControlNet types: canny (edge detection), pose (body pose), depth (depth map), detailer (detail enhancement).

Photo Restoration

Restore damaged vintage photos using Qwen image editing:

```
# Basic restoration
sogni-gen -c damaged_photo.jpg -o restored.png \
  "professionally restore this vintage photograph, remove damage and scratches"

# Detailed restoration with preservation hints
sogni-gen -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
  "restore this vintage photo, remove peeling, tears and wear marks, \
  preserve natural features and expression, maintain warm nostalgic color tones"
```

Tips for good restorations:

- Describe the damage: "peeling", "scratches", "tears", "fading"
- Specify what to preserve: "natural features", "eye color", "hair", "expression"
- Mention the era for color tones: "1970s warm tones", "vintage sepia"

Finding received images (Telegram, etc.):

```
node {{skillDir}}/sogni-gen.mjs --json --list-media images
```

Do NOT use ls, cp, or other shell commands to browse user files. Always use --list-media to find inbound media.

IMPORTANT KEYWORD RULE

If the user message includes the word "photobooth" (case-insensitive), always use --photobooth mode with --ref set to the user-provided face image. Prioritize this rule over generic image-edit flows (-c) for that request.

LTX-2.3 Prompt Rule

Whenever the chosen video model is ltx23-22b-fp8_t2v_distilled, do not pass the user's short request through unchanged. Rewrite it into an LTX-2.3-safe prompt before calling sogni-gen:

- Output one single paragraph only. No line breaks, bullet points, section labels, tag lists, or screenplay formatting.
- Use 4-8 flowing present-tense sentences describing one continuous shot. No cuts, montage, or unrelated scene jumps.
- Start with shot scale plus the scene's visual identity, then describe environment, time of day, atmosphere, textures, and specific light sources.
- Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
- Give the scene one main action thread from start to finish. Use connectors like "as", "while", and "then" so motion reads as a continuous filmed moment.
- If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
- Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
- Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like "beautiful" or "nice", or structural markup such as [DIALOGUE].
- Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
- Preserve the user's request, but expand it into cinematic prose. Do not invent a different story just to make the prompt longer.
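Two of these constraints are mechanically checkable before dispatch: no line breaks and no bracketed structural markup. The `ltx_prompt_ok` helper below is our own minimal lint, not part of the CLI, and it deliberately checks only those two rules:

```shell
# ltx_prompt_ok: minimal lint for the single-paragraph rule (helper is ours).
# Prints "no" if the prompt contains line breaks or [BRACKETED] markup, else "yes".
ltx_prompt_ok() {
  if [ "$(printf '%s' "$1" | wc -l)" -gt 0 ]; then
    echo no   # embedded line breaks are disallowed
  else
    case "$1" in
      *"["*"]"*) echo no ;;  # structural markup such as [DIALOGUE]
      *) echo yes ;;
    esac
  fi
}
```

The stylistic rules (present tense, one action thread, positive phrasing) still require human or model judgment; this only catches formatting slips.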

Duration-Aware Pacing

Match scene density to clip length so prompts stay filmable:

- About 1-4s: describe exactly 1 action or moment.
- About 5-8s: describe about 2 sequential actions.
- About 9-12s: describe about 3 sequential actions.
- Longer clips: add only a small number of additional sequential beats.

Do not turn the prompt into a montage or a full story arc unless the duration clearly supports it.
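The brackets above can be captured as a tiny lookup. The helper name and the cap of 4 beats for longer clips are our own reading of "a small number of additional beats", not a documented value:

```shell
# beats_for_duration: rough number of sequential actions a clip of $1 seconds can hold
# (the >12s cap of 4 is our approximation of "a small number of additional beats")
beats_for_duration() {
  if [ "$1" -le 4 ]; then echo 1
  elif [ "$1" -le 8 ]; then echo 2
  elif [ "$1" -le 12 ]; then echo 3
  else echo 4
  fi
}
```

Use the result as a ceiling when rewriting prompts: a 5-second clip should not narrate five separate events.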

Orientation Mapping

When the user explicitly asks for an orientation or aspect ratio, map it to safe LTX dimensions:

- vertical, portrait, story, reel, tiktok -> -w 1088 -h 1920
- landscape, horizontal, widescreen, youtube, 16:9 -> -w 1920 -h 1088
- square, 1:1 -> -w 1088 -h 1088
- 4:3 portrait -> -w 832 -h 1088
- 4:3 landscape -> -w 1088 -h 832
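The mapping is a pure keyword-to-flags lookup, so it can be sketched as a shell function. The `map_orientation` name is ours; the dimension pairs are exactly the ones listed above:

```shell
# map_orientation: translate a loose orientation keyword into "-w W -h H" flags.
# Helper name is ours; the mappings follow the Orientation Mapping rules.
map_orientation() {
  case "$1" in
    vertical|portrait|story|reel|tiktok)          echo "-w 1088 -h 1920" ;;
    landscape|horizontal|widescreen|youtube|"16:9") echo "-w 1920 -h 1088" ;;
    square|"1:1")                                  echo "-w 1088 -h 1088" ;;
    "4:3 portrait")                                echo "-w 832 -h 1088" ;;
    "4:3 landscape")                               echo "-w 1088 -h 832" ;;
    *)                                             echo "" ;;  # unknown: let CLI defaults apply
  esac
}
```

For example, `node sogni-gen.mjs --video $(map_orientation tiktok) "..."` expands to the vertical 1088x1920 flags.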

Camera Language Normalization

When the user uses loose camera language, translate it into concrete motion phrasing inside the prose prompt:

- zoom in -> slow push-in
- zoom out -> slow pull-back
- pan left / pan right -> smooth pan left / smooth pan right
- orbit / circle around -> slow arc left or slow arc right
- follow -> tracking follow

Short example. User asks: "4k video of a woman in a neon alley". Use this shape instead:

"A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
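This normalization is another straight lookup. The `normalize_camera` helper below is illustrative (name is ours); where the rule offers a choice ("slow arc left or slow arc right") it arbitrarily picks left:

```shell
# normalize_camera: map loose camera language to concrete motion phrasing.
# Helper name is ours; where the rule allows left or right we pick left.
normalize_camera() {
  case "$1" in
    "zoom in")              echo "slow push-in" ;;
    "zoom out")             echo "slow pull-back" ;;
    "pan left")             echo "smooth pan left" ;;
    "pan right")            echo "smooth pan right" ;;
    orbit|"circle around")  echo "slow arc left" ;;
    follow)                 echo "tracking follow" ;;
    *)                      echo "$1" ;;  # already concrete: pass through
  esac
}
```

The normalized phrase then gets woven into the prose prompt ("The camera performs a slow push-in as ..."), never passed as a flag.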

Agent Usage

When the user asks to generate/draw/create an image:

```
# Generate and save locally
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "user's prompt"

# Edit an existing image
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"

# Generate video from image
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."

# Generate text-to-video
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."

# HD / "4K" text-to-video: prefer LTX-2.3
node {{skillDir}}/sogni-gen.mjs -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."

# HD / "4K" image-to-video: prefer LTX i2v
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -m ltx2-19b-fp8_i2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Check current SPARK/SOGNI balances (no prompt required)
node {{skillDir}}/sogni-gen.mjs --json --balance

# Find user-sent images/audio
node {{skillDir}}/sogni-gen.mjs --json --list-media images
# Then send via message tool with filePath
```

High-Res Video Routing

When the user asks for video in "hd", "1080p", "4k", "uhd", or "high-res", do not use the default WAN video models:

- For text-to-video, use -m ltx23-22b-fp8_t2v_distilled.
- For image-to-video, use -m ltx2-19b-fp8_i2v_distilled.
- Prefer LTX-sized dimensions such as -w 1920 -h 1088. If the user explicitly asks for vertical, portrait, story, reel, tiktok, square, or 4:3, apply the matching dimensions from the Orientation Mapping rules instead of defaulting to 16:9.
- Rewrite the user's request using the LTX-2.3 Prompt Rule before invoking the command. Do not send short slogan-style prompts to LTX.
- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even if the exact output is not literal 3840x2160.

Security: agents must use the CLI's built-in flags (--extract-last-frame, --concat-videos, --list-media) for all file operations and video manipulation. Never run raw shell commands (ffmpeg, ls, cp, etc.) directly.
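The routing decision reduces to two inputs: the workflow and the quality hint. The `pick_video_model` helper is our own sketch of that rule (the model ids are the ones named above; "default" stands in for the WAN defaults from the config):

```shell
# pick_video_model: route hd/high-res requests to LTX models per the rule above.
# Helper name is ours; "default" means fall back to the configured WAN models.
pick_video_model() {
  workflow=$1  # t2v or i2v
  hint=$2      # user quality keyword, lowercased
  case "$hint" in
    hd|1080p|4k|uhd|high-res)
      if [ "$workflow" = "i2v" ]; then
        echo "ltx2-19b-fp8_i2v_distilled"
      else
        echo "ltx23-22b-fp8_t2v_distilled"
      fi
      ;;
    *) echo "default" ;;
  esac
}
```

For example, `-m $(pick_video_model t2v 4k)` selects the LTX-2.3 text-to-video model, while a request with no quality hint leaves model selection to the config defaults.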

Animate Between Two Images (First-Frame / Last-Frame)

When a user asks to animate between two images, use --ref (first frame) and --ref-end (last frame) to create a creative interpolation video:

```
# Animate from image A to image B
node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/imageA.png --ref-end /tmp/imageB.png \
  -o /tmp/transition.mp4 "descriptive prompt of the transition"
```

Animate a Video to an Image (Scene Continuation)

When a user asks to animate from a video to an image (or "continue" a video into a new scene):

1. Extract the last frame of the existing video using the built-in safe wrapper:

```
node {{skillDir}}/sogni-gen.mjs --extract-last-frame /tmp/existing.mp4 /tmp/lastframe.png
```

2. Generate a new video using the last frame as --ref and the target image as --ref-end:

```
node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/lastframe.png --ref-end /tmp/target.png \
  -o /tmp/continuation.mp4 "scene transition prompt"
```

3. Concatenate the videos using the built-in safe wrapper:

```
node {{skillDir}}/sogni-gen.mjs --concat-videos /tmp/full_sequence.mp4 /tmp/existing.mp4 /tmp/continuation.mp4
```

This ensures visual continuity: the new clip picks up exactly where the previous one ended. Do NOT run raw ffmpeg commands; always use --extract-last-frame and --concat-videos for video manipulation.

Always apply this pattern when:

- User says "animate image A to image B" → use --ref A --ref-end B
- User says "animate this video to this image" → extract last frame, use as --ref, target image as --ref-end, then stitch
- User says "continue this video" with a target image → same as above

JSON Output

On success:

```json
{
  "success": true,
  "prompt": "a cat wearing a hat",
  "model": "z_image_turbo_bf16",
  "width": 512,
  "height": 512,
  "urls": ["https://..."],
  "localPath": "/tmp/cat.png"
}
```

On error (with --json), the script returns a single JSON object like:

```json
{
  "success": false,
  "error": "Video width and height must be divisible by 16 (got 500x512).",
  "errorCode": "INVALID_VIDEO_SIZE",
  "hint": "Choose --width/--height divisible by 16. For i2v, also match the reference aspect ratio."
}
```

Balance check example (--json --balance):

```json
{ "success": true, "type": "balance", "spark": 12.34, "sogni": 0.56 }
```
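For robust scripting against this output you would normally reach for a JSON parser (jq, or `node -e` since the skill already requires Node). As a dependency-free sketch, a cheap success check can pattern-match the `"success"` field; the `render_ok` helper name is ours:

```shell
# render_ok: cheap success check on --json output without a JSON parser.
# Helper name is ours; for real scripting prefer jq or node to parse the object.
render_ok() {
  case "$1" in
    *'"success": true'*|*'"success":true'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use: `out=$(node sogni-gen.mjs --json "a cat wearing a hat"); render_ok "$out" || echo "render failed"`.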

Cost

Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient.

Troubleshooting

- Auth errors: check SOGNI_API_KEY or the credentials in ~/.config/sogni/credentials.
- i2v sizing gotchas: video sizes are constrained (min 480px, max 1536px, divisible by 16). For i2v, the client wrapper resizes the reference (fit: inside) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size (example: 1024x1536 requested, but the reference becomes 1024x1535).
- Auto-adjustment: with a local --ref, the script auto-adjusts the requested size to avoid resized reference dimensions that are not divisible by 16. If you want it to fail instead of adjusting, pass --strict-size and it will print a suggested --width/--height.
- Timeouts: try a faster model or increase the -t timeout.
- No workers: check https://sogni.ai for network status.
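The documented size constraints (clamp to 480-1536, divisible by 16) are easy to pre-check before a request. The `snap16` helper below is our own sketch, not what the CLI does internally; it clamps first, then rounds down to a multiple of 16 (both 480 and 1536 are already multiples of 16, so clamping never breaks divisibility):

```shell
# snap16: clamp one dimension to [480, 1536] and floor it to a multiple of 16
# (helper name and the floor-rounding choice are ours; limits are from the docs)
snap16() {
  d=$1
  [ "$d" -lt 480 ] && d=480
  [ "$d" -gt 1536 ] && d=1536
  echo $(( (d / 16) * 16 ))
}
```

For example, `snap16 500` yields 496, a valid --width; running both requested dimensions through it avoids the INVALID_VIDEO_SIZE error for simple cases (i2v reference-resizing rounding can still require --strict-size inspection).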


Package contents

  • SKILL.md (primary doc)
  • README.md (docs)
  • version.mjs (script)
  • skill-package.json (config)
  • LICENSE
  • llm.txt