2025 Computer Vision and Pattern Recognition (CVPR 2025)
AIpparel is a multimodal foundation model for digital garments, trained by fine-tuning a large multimodal model on a custom sewing pattern dataset using a novel tokenization scheme for these patterns.
AIpparel generates complex, diverse, high-quality sewing patterns based on multimodal inputs, such as text and images, and it unlocks new applications such as language-instructed sewing pattern editing.
The generated sewing patterns can be directly used to simulate the corresponding 3D garments.
Abstract
Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style.
Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them.
To simplify this process, we introduce AIpparel, a large multimodal model for generating and editing sewing patterns.
Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns.
Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and it enables novel multimodal garment generation applications such as interactive garment editing.
Dataset
The GarmentCodeData-Multimodal dataset extends GarmentCodeData (GCD) with additional, rich annotations.
Specifically, we provide textual descriptions of garments: a descriptive text that describes the garment's style in detail,
and a speculative text that describes a suitable occasion for the garment. We also provide pairs of garments that are edited versions of each other,
each paired with the corresponding editing instruction.
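As a rough illustration of the annotations described above, one sample and one editing pair could be represented as follows. This is a minimal sketch, not the dataset's actual schema: the class names, field names, file paths, and prompt layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GarmentSample:
    """One annotated garment (hypothetical field names)."""
    pattern_path: str                  # assumed location of the sewing pattern file
    descriptive_text: str              # describes the garment's style in detail
    speculative_text: str              # describes a suitable occasion
    image_path: Optional[str] = None   # assumed location of a rendered image


@dataclass
class EditingPair:
    """Two garments that are edited versions of each other."""
    source: GarmentSample
    target: GarmentSample
    instruction: str                   # text that turns `source` into `target`


def build_editing_prompt(pair: EditingPair) -> str:
    """Format an editing pair as a text prompt (the layout is an assumption)."""
    return (
        f"Original garment: {pair.source.descriptive_text}\n"
        f"Instruction: {pair.instruction}"
    )


source = GarmentSample(
    pattern_path="patterns/dress_a.json",
    descriptive_text="The dress is of midi length and features short sleeves.",
    speculative_text="Suitable for a summer garden party.",
)
target = GarmentSample(
    pattern_path="patterns/dress_b.json",
    descriptive_text="The dress is of midi length and features long sleeves.",
    speculative_text="Suitable for an autumn dinner.",
)
pair = EditingPair(source=source, target=target, instruction="Make the sleeves long.")
print(build_editing_prompt(pair))
```

Keeping the editing instruction on the pair rather than on either garment reflects that the same garment may appear in several pairs with different instructions.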
Dataset Samples
The midi dress features a strapless top connected to a narrow waistband,
creating a defined high-rise waist. The skirt extends from the waistband,
maintaining a fitted silhouette, and includes a back slit for ease of movement.
The upper garment features a mini length with long sleeves and an elevated waistline.
It includes a curvy neckline and a similarly contoured back, enhancing its distinctive shape.
The garment is constructed with lapels at the front.
This upper garment is a mini-length style featuring short sleeves.
It is constructed with a short, narrow trapezoidal neckline.
The garment includes a waistband that cinches the waist and connects the upper and lower body panels.
The dress features an asymmetric neckline extending into an asymmetric top.
It has a single short sleeve on the right side.
The skirt portion is knee-length and constructed from five horizontal, tiered levels.
The midi jumpsuit features a strapless top with a bodice constructed from multiple panels. The garment extends to a midi length with wide-leg trousers.
The dress is of midi length and features short sleeves.
A waistband is positioned at the high-rise waist. The skirt section includes both a back slit and a side slit.
The upper garment features long sleeves and a short oval neckline in the front,
with a similarly shaped back neckline. It includes a godet skirt with 8 panels.
The dress features an asymmetric top with a neckline of differing heights on either side.
Both sleeves are long, with the right sleeve creating an angled, looser fit.
Method
AIpparel uses a novel sewing pattern tokenizer (light blue region) to tokenize each panel into a set of special tokens (light green region).
Panel vertex positions and 3D transformations are incorporated by adding positional embeddings (colored arrows) to the tokens.
AIpparel takes in multimodal inputs, such as images and text (light orange region), and outputs sewing patterns using autoregressive sampling (light grey region).
Finally, the output is decoded to produce simulation-ready sewing patterns (light pink region).
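The general idea of separating discrete structure tokens from continuous geometry can be sketched as below. This is a simplified illustration under stated assumptions, not AIpparel's tokenizer: the token names are hypothetical, each panel is reduced to a list of 2D vertices, and 3D transformations, edge types, and stitches are omitted. The continuous values are kept in a parallel list, standing in for the quantities that a model would inject as embeddings rather than as vocabulary entries.

```python
from typing import List, Optional, Tuple

Vertex = Tuple[float, float]

# Hypothetical special-token names, chosen for illustration only.
PATTERN_START, PATTERN_END = "<pattern>", "</pattern>"
PANEL_START, PANEL_END = "<panel>", "</panel>"
EDGE = "<edge>"


def tokenize_pattern(
    panels: List[List[Vertex]],
) -> Tuple[List[str], List[Optional[Vertex]]]:
    """Return discrete tokens plus, per token, the vertex it carries (or None)."""
    tokens: List[str] = [PATTERN_START]
    params: List[Optional[Vertex]] = [None]
    for vertices in panels:
        tokens.append(PANEL_START)
        params.append(None)
        for vertex in vertices:
            # One edge token per vertex; the coordinate stays continuous
            # instead of being quantized into the vocabulary.
            tokens.append(EDGE)
            params.append(vertex)
        tokens.append(PANEL_END)
        params.append(None)
    tokens.append(PATTERN_END)
    params.append(None)
    return tokens, params


def detokenize_pattern(
    tokens: List[str], params: List[Optional[Vertex]]
) -> List[List[Vertex]]:
    """Invert tokenize_pattern: rebuild the per-panel vertex lists."""
    panels: List[List[Vertex]] = []
    current: Optional[List[Vertex]] = None
    for token, param in zip(tokens, params):
        if token == PANEL_START:
            current = []
        elif token == EDGE and current is not None and param is not None:
            current.append(param)
        elif token == PANEL_END and current is not None:
            panels.append(current)
            current = None
    return panels


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangle = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)]
tokens, params = tokenize_pattern([square, triangle])
restored = detokenize_pattern(tokens, params)
```

In this sketch the token sequence grows with the number of panels and edges rather than with the numeric precision of the coordinates, which is one way a pattern encoding can stay concise.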
Image to Sewing Pattern Reconstruction
Left: our model can reconstruct suitable sewing patterns from the input image alone.
In contrast, SewFormer does not produce simulation-ready sewing patterns despite fine-tuning.
Right: our model also achieves state-of-the-art performance on the existing SewFactory dataset.
Text to Sewing Pattern Generation
Our model can generate sewing patterns from text descriptions. The generated sewing patterns closely follow the textual details, as highlighted.
Sewing Pattern Editing
Our model can take an existing sewing pattern and edit it based on textual instructions.
The resulting sewing pattern closely follows the style of the original garment while applying the desired edit.
Citation
@article{nakayama2024aipparel,
title={AIpparel: A Large Multimodal Generative Model for Digital Garments},
author={Kiyohiro Nakayama and Jan Ackermann and Timur Levent Kesdogan
and Yang Zheng and Maria Korosteleva and Olga Sorkine-Hornung and Leonidas Guibas
and Guandao Yang and Gordon Wetzstein},
journal = {Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}