AIpparel: A Large Multimodal Generative Model for Digital Garments

Kiyohiro Nakayama*,1          Jan Ackermann*,2          Timur Levent Kesdogan*,2          Yang Zheng1          Maria Korosteleva2          Olga Sorkine-Hornung2          Leonidas Guibas1          Guandao Yang1          Gordon Wetzstein1

1Stanford University 2ETH Zürich

AIpparel is a multimodal generative model for digital garments trained by fine-tuning a large multimodal model on a custom sewing pattern dataset using a novel tokenization scheme for these patterns. AIpparel generates complex, diverse, high-quality sewing patterns based on multimodal inputs, such as text and images, and it unlocks new applications such as language-instructed sewing pattern editing. The generated sewing patterns can be directly used to simulate the corresponding 3D garments.

Abstract

Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a large multimodal model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LMMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and it enables novel multimodal garment generation applications such as interactive garment editing.

Method
AIpparel uses a novel sewing pattern tokenizer (light blue region) to tokenize each panel into a set of special tokens (light green region). Panel vertex positions and 3D transformations are encoded via positional embeddings (colored arrows) added to these tokens. AIpparel takes multimodal inputs, such as images and text (light orange region), and outputs sewing patterns via autoregressive sampling (light grey region). Finally, the output is decoded to produce simulation-ready sewing patterns (light pink region).
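To make the tokenization idea concrete, the sketch below shows one way a panel could be turned into special tokens with positional embeddings attached. This is a minimal illustrative sketch, not AIpparel's implementation: the token vocabulary, the embedding function, and the `tokenize_panel` helper are all hypothetical, and the actual scheme (token set, embedding dimensions, how 3D transformations are encoded) is defined in the paper.

```python
import numpy as np

# Hypothetical special-token vocabulary; AIpparel's actual token set differs.
SPECIAL_TOKENS = {"<panel_start>": 0, "<edge>": 1, "<panel_end>": 2}

def sinusoidal_embedding(value, dim=8):
    """Standard sinusoidal embedding of a scalar coordinate."""
    freqs = np.exp(np.linspace(0.0, 4.0, dim // 2))
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def tokenize_panel(vertices_2d, transform_3d, dim=8):
    """Emit special tokens for one panel, attaching positional embeddings
    that encode its 2D vertex positions and its 3D placement (illustrative)."""
    token_ids, embeddings = [], []
    # The panel-opening token carries the panel's 3D transformation.
    token_ids.append(SPECIAL_TOKENS["<panel_start>"])
    embeddings.append(sum(sinusoidal_embedding(t, dim) for t in transform_3d))
    # One edge token per vertex, with its 2D position embedded.
    for (x, y) in vertices_2d:
        token_ids.append(SPECIAL_TOKENS["<edge>"])
        embeddings.append(sinusoidal_embedding(x, dim) + sinusoidal_embedding(y, dim))
    token_ids.append(SPECIAL_TOKENS["<panel_end>"])
    embeddings.append(np.zeros(dim))
    return token_ids, np.stack(embeddings)

# Example: a unit-square panel placed with a simple 3D translation.
ids, embs = tokenize_panel([(0, 0), (1, 0), (1, 1), (0, 1)],
                           transform_3d=(0.0, 1.2, 0.0))
```

In this sketch the per-token embeddings would be added to the LMM's learned token embeddings before the transformer, which is the usual way continuous geometry is injected alongside discrete tokens.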



Citation
@article{nakayama2024aipparel,
  title={AIpparel: A Large Multimodal Generative Model for Digital Garments},
  author={Kiyohiro Nakayama and Jan Ackermann and Timur Levent Kesdogan
          and Yang Zheng and Maria Korosteleva and Olga Sorkine-Hornung
          and Leonidas Guibas and Guandao Yang and Gordon Wetzstein},
  journal={arXiv preprint},
  year={2024}
}