Junfei Xiao

I am a Computer Science Ph.D. student in the CCVL research group at Johns Hopkins University, advised by Bloomberg Distinguished Professor Dr. Alan Yuille.

I received my M.S.E. degree in Computer Science from Johns Hopkins University. Before that, I received a B.E. in Mechanical Engineering and a double degree in Mathematics from Beihang University.

I build multi-modal generative systems for movie generation and scalable world modeling. I do product-driven research.

I am on the job market.

I host visiting undergraduate and graduate students in the CCVL Lab.



CV / Google Scholar / GitHub

Recent News
  • [July 2025] My new work Captain Cinema on short movie generation is out! We build a state-of-the-art movie generation system through interleaved unified video generation.
  • [July 2025] The project VLV is released! It is a scalable knowledge distillation framework that builds a state-of-the-art vision-language captioner at negligible cost.
  • [June 2025] The project ViGaL is released! It demonstrates that MLLMs can develop transferable reasoning skills by playing simple arcade games like Snake and achieve strong math problem-solving performance.
  • [May 2025] VideoAuteur is accepted to ICCV 2025! Check out the cool demos on the project page and paper!
  • [Dec 2024] My latest work GenEx: Generating an Explorable World is out! Check out the cool demos on the project page and paper!
  • [May 2024] Starting my internship at TikTok, working on video generation with Dr. Lu Jiang!
  • [May 2024] ProLab is accepted to ECCV 2024! Check out the paper and code!
  • [April 2024] Gave a talk on PaLM2-VAdapter at Google Research. Check out the slides!
  • [Feb 2024] Check out my latest work at Google, PaLM2-VAdapter, here!

Publications

Captain Cinema: Towards Short Movie Generation
Junfei Xiao*, Ceyuan Yang, Lvmin Zhang, Shengqu Cai, Yang Zhao, Yuwei Guo, Gordon Wetzstein, Maneesh Agrawala, Alan Yuille,
Lu Jiang
(*: Project Lead)
arXiv preprint, 2025

arXiv / project page / bibtex
@article{xiao2025captain,
  title={Captain Cinema: Towards Short Movie Generation},
  author={Xiao, Junfei and Yang, Ceyuan and Zhang, Lvmin and Cai, Shengqu and Zhao, Yang and Guo, Yuwei and Wetzstein, Gordon and Agrawala, Maneesh and Yuille, Alan and Jiang, Lu},
  journal={arXiv preprint arXiv:2507.18634},
  year={2025}
}

VLV: Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-cheng Chou, Jieneng Chen,
Alan L. Yuille, Chen Wei, Junfei Xiao*
(*: Project Lead)
arXiv preprint, 2025

arXiv / code / huggingface / dataset / bibtex
@article{zhang2025vlv,
  title={VLV: Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models},
  author={Zhang, Tiezheng and Li, Yitong and Chou, Yu-cheng and Chen, Jieneng and Yuille, Alan L and Wei, Chen and Xiao, Junfei},
  journal={arXiv preprint arXiv:2507.07104},
  year={2025}
}

Play to Generalize: Learning to Reason Through Game Play
Yunfei Xie, Yinsong Ma, Shiyi Lan, Alan Yuille,
Junfei Xiao*, Chen Wei
(*: Project Lead)
arXiv preprint, 2025

arXiv / project page / bibtex
@article{xie2025vigal,
  title={Play to Generalize: Learning to Reason Through Game Play},
  author={Xie, Yunfei and Ma, Yinsong and Lan, Shiyi and Yuille, Alan and Xiao, Junfei and Wei, Chen},
  journal={arXiv preprint arXiv:2506.08011},
  year={2025}
}

VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang
ICCV, 2025

paper / project page / bibtex
@article{xiao2024narrative,
  title={VideoAuteur: Towards Long Narrative Video Generation},
  author={Xiao, Junfei and Cheng, Feng and Qi, Lu and Gui, Liangke and Cen, Jiepeng and Ma, Zhibei and Yuille, Alan and Jiang, Lu},
  journal={arXiv preprint},
  year={2024}
}

GenEx: Generating an Explorable World
Taiming Lu*, Tianmin Shu*, Junfei Xiao*, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen
(*: Core Contributors)
Tech Report, 2024

arXiv / project page / bibtex
@article{lu2024genex,
  title={GenEx: Generating an Explorable World},
  author={Lu, Taiming and Shu, Tianmin and Xiao, Junfei and Ye, Luoxin and Wang, Jiahao and Peng, Cheng and Wei, Chen and Khashabi, Daniel and Chellappa, Rama and Yuille, Alan and Chen, Jieneng},
  journal={arXiv preprint arXiv:2412.09624},
  year={2024}
}

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang
arXiv preprint, 2024

arXiv / slides / bibtex
@article{palm2vadapter2024,
  title={PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter},
  author={Xiao, Junfei and Xu, Zheng and Yuille, Alan and Yan, Shen and Wang, Boyu},
  journal={arXiv preprint arXiv:2402.10896},
  year={2024}
}

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu,
Bingchen Zhao, Alan Yuille, Yuyin Zhou, Cihang Xie
ECCV, 2024

arXiv / code / bibtex
@article{xiao2023semantic,
  title={A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties},
  author={Xiao, Junfei and Zhou, Ziqi and Li, Wenxuan and Lan, Shiyi and Mei, Jieru and Yu, Zhiding and Yuille, Alan and Zhou, Yuyin and Xie, Cihang},
  journal={arXiv preprint arXiv:2312.13764},
  year={2023}
}

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
Jie Liu, Yixiao Zhang, Jieneng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou
ICCV, 2023

arXiv / code / bibtex
@inproceedings{liu2023clip,
  title={{CLIP}-Driven Universal Model for Organ Segmentation and Tumor Detection},
  author={Liu, Jie and Zhang, Yixiao and Chen, Jie-Neng and Xiao, Junfei and Lu, Yongyi and Landman, Bennett A and Yuan, Yixuan and Yuille, Alan and Tang, Yucheng and Zhou, Zongwei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={21152--21164},
  year={2023}
}

Label-Free Liver Tumor Segmentation
Qixin Hu, Yixiong Chen, Junfei Xiao, Shuwen Sun, Jie-Neng Chen, Alan Yuille, Zongwei Zhou
CVPR, 2023

arXiv / code / bibtex
@inproceedings{hu2023labelfree,
  title={Label-Free Liver Tumor Segmentation},
  author={Hu, Qixin and Chen, Yixiong and Xiao, Junfei and Sun, Shuwen and Chen, Jie-Neng and Yuille, Alan and Zhou, Zongwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Masked Autoencoders Enable Efficient Knowledge Distillers
Yutong Bai, Zeyu Wang, Junfei Xiao*, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
(*: Technical Lead)
CVPR, 2023

arXiv / code / bibtex
@article{bai2022masked,
  title={Masked autoencoders enable efficient knowledge distillers},
  author={Bai, Yutong and Wang, Zeyu and Xiao, Junfei and Wei, Chen and Wang, Huiyu and Yuille, Alan and Zhou, Yuyin and Xie, Cihang},
  journal={arXiv preprint arXiv:2208.12256},
  year={2022}
}

Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
WACV, 2023

arXiv / code / bibtex
@article{xiao2022delving,
  title={Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification},
  author={Xiao, Junfei and Bai, Yutong and Yuille, Alan and Zhou, Zongwei},
  journal={arXiv preprint arXiv:2210.12843},
  year={2022}
}

Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, Yingwei Li
CVPR, 2022

arXiv / code / bibtex
@inproceedings{xiao2022learning,
  title={Learning from temporal gradient for semi-supervised action recognition},
  author={Xiao, Junfei and Jing, Longlong and Zhang, Lin and He, Ju and She, Qi and Zhou, Zongwei and Yuille, Alan and Li, Yingwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3252--3262},
  year={2022}
}

CateNorm: Categorical Normalization for Robust Medical Image Segmentation
Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou
MICCAI Workshop - Domain Adaptation and Representation Transfer, 2022
Best Paper Honorable Mention

arXiv / code / bibtex
@inproceedings{xiao2022catenorm,
  title={CateNorm: Categorical Normalization for Robust Medical Image Segmentation},
  author={Xiao, Junfei and Yu, Lequan and Zhou, Zongwei and Bai, Yutong and Xing, Lei and Yuille, Alan and Zhou, Yuyin},
  booktitle={MICCAI Workshop on Domain Adaptation and Representation Transfer},
  pages={129--146},
  year={2022},
  organization={Springer}
}

A bio-robotic remora disc with attachment and detachment capabilities for reversible underwater hitchhiking
Siqi Wang, Lei Li, Yufeng Chen, Yueping Wang, Wenguang Sun, Junfei Xiao, Dylan Wainwright, Tianmiao Wang, Robert J. Wood, Li Wen
ICRA, 2019

arXiv / bibtex
@inproceedings{wang2019bio,
  title={A bio-robotic remora disc with attachment and detachment capabilities for reversible underwater hitchhiking},
  author={Wang, Siqi and Li, Lei and Chen, Yufeng and Wang, Yueping and Sun, Wenguang and Xiao, Junfei and Wainwright, Dylan and Wang, Tianmiao and Wood, Robert J and Wen, Li},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)},
  pages={4653--4659},
  year={2019},
  organization={IEEE}
}


Honors and Awards

1st Place in the Robust Vision Challenge - Semantic Segmentation Track



Webpage template borrowed from Jon Barron