prunaai/p-image-trainer

Fast LoRA trainer for p-image, a super fast text-to-image model developed by Pruna AI. Use LoRAs here: https://replicate.com/prunaai/p-image-lora. Find or contribute LoRAs here: https://huggingface.co/collections/PrunaAI/p-image


Dataset Preparation Guide

Text-to-Image Training (AI Toolkit)

This trainer endpoint fine-tunes a text-to-image diffusion model using ai-toolkit in the background. To achieve good results, your dataset must follow the structure described below.


1. Dataset Format (Required)

Your dataset must be a single folder containing images, where each image has a matching caption file.

Folder Structure

dataset/
├── image_001.jpg
├── image_001.txt
├── image_002.png
├── image_002.txt
├── image_003.webp
├── image_003.txt

Rules

  • Every image must have a .txt caption file
  • Caption files must share the exact same base filename as the image
  • Images without captions will be ignored
  • Caption files without images will be ignored
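The pairing rules above are easy to verify before uploading. Here is a minimal stdlib-only sketch; the `DATASET` path is a placeholder for your own folder:

```python
from pathlib import Path

DATASET = Path("dataset")  # placeholder -- point this at your own folder
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def check_pairs(root: Path) -> tuple[list[str], list[str]]:
    """Return (image stems missing captions, caption stems missing images)."""
    images = {p.stem for p in root.iterdir() if p.suffix.lower() in IMAGE_EXTS}
    captions = {p.stem for p in root.iterdir() if p.suffix.lower() == ".txt"}
    return sorted(images - captions), sorted(captions - images)

if __name__ == "__main__" and DATASET.exists():
    missing_caps, orphan_caps = check_pairs(DATASET)
    print("Images without captions (will be ignored):", missing_caps)
    print("Captions without images (will be ignored):", orphan_caps)
```

Both returned lists should be empty for a clean dataset; anything listed will simply be ignored by the trainer.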

2. Caption Files (.txt)

Each .txt file contains the text description of the image, similar to a prompt used at inference time.

Example

image_001.txt

a photo of <my_concept> wearing a black hoodie, studio lighting, high detail

Caption Guidelines

  • Describe what you want the model to learn
  • Be clear and concise
  • Natural language works best
  • Multi-line captions are allowed, but treated as plain text

3. Trigger Words

If you are training a:

  • person
  • character
  • product
  • specific visual concept

use a unique trigger word in every caption.

Example

a portrait photo of <my_concept>, 35mm lens, shallow depth of field

Later, you can prompt the trained model with:

a cinematic portrait of <my_concept>

Trigger Word Rules

  • Must be unique
  • Should not exist in the base model vocabulary
  • Must be used consistently in all captions
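Since the trigger word must appear in every caption, a quick consistency check before training can catch stray files. This sketch assumes your trigger word is `<my_concept>` and your folder is `dataset/`; adjust both to your setup:

```python
from pathlib import Path

DATASET = Path("dataset")   # placeholder path
TRIGGER = "<my_concept>"    # your trigger word

def captions_missing_trigger(root: Path, trigger: str) -> list[str]:
    """List caption files that do not contain the trigger word."""
    return sorted(
        p.name
        for p in root.glob("*.txt")
        if trigger not in p.read_text(encoding="utf-8")
    )

if __name__ == "__main__" and DATASET.exists():
    print("Captions missing the trigger word:", captions_missing_trigger(DATASET, TRIGGER))
```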

4. Image Requirements

ai-toolkit automatically handles resizing and bucketing, so manual preprocessing is not required.

  • Minimum resolution: 512×512
  • Formats: .jpg, .png, .webp
  • High-quality images with varied:
      • angles
      • lighting
      • backgrounds
      • poses or expressions

Avoid

  • Duplicates
  • Watermarks or logos
  • Text overlays
  • Very low-resolution or blurry images
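Exact duplicates are the easiest of these problems to catch automatically: byte-identical files share the same content hash. A stdlib-only sketch (the `DATASET` path is again a placeholder; near-duplicates and blurriness would need an image library and are not covered here):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

DATASET = Path("dataset")  # placeholder path
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def find_exact_duplicates(root: Path) -> list[list[str]]:
    """Group byte-identical image files by their SHA-256 hash."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in sorted(root.iterdir()):
        if p.suffix.lower() in IMAGE_EXTS:
            groups[hashlib.sha256(p.read_bytes()).hexdigest()].append(p.name)
    return [names for names in groups.values() if len(names) > 1]

if __name__ == "__main__" and DATASET.exists():
    for group in find_exact_duplicates(DATASET):
        print("Duplicate set:", group)
```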

5. Dataset Size Recommendations

Use Case             Recommended Images
Person / Character   15–40
Product              20–50
Style / Aesthetic    30–100
General Concept      50+

Quality and diversity matter more than raw quantity.


6. What Not to Include

  • Nested folders inside the dataset
  • Missing or mismatched caption files
  • Reused or copy-pasted captions
  • Copyrighted material you do not own the rights to
  • NSFW or disallowed content

7. Uploading the Dataset

Once your dataset folder is ready:

  1. Upload the dataset folder to the trainer endpoint
  2. Specify your trigger word (if applicable)
  3. Start training

The trainer will validate the dataset before launching the training job.


8. Minimal Example Dataset

my_dataset/
├── img1.jpg
├── img1.txt   → a photo of <my_concept> smiling, outdoor lighting
├── img2.jpg
├── img2.txt   → side profile of <my_concept>, soft light, 85mm lens
├── img3.jpg
├── img3.txt   → studio portrait of <my_concept>, neutral background

Need Help?

If you are unsure whether your dataset is correctly structured or want feedback on captions, reach out before starting training — fixing dataset issues early saves time and compute.
