Sihan Chen
Title
Cited by
Year
CPTR: Full transformer network for image captioning
W Liu, S Chen, L Guo, X Zhu, J Liu
arXiv preprint arXiv:2101.10804, 2021
Cited by: 165; Year: 2021
VALOR: Vision-audio-language omni-perception pretraining model and dataset
S Chen, X He, L Guo, X Zhu, W Wang, J Tang, J Liu
arXiv preprint arXiv:2304.08345, 2023
Cited by: 55; Year: 2023
VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset
S Chen, H Li, Q Wang, Z Zhao, M Sun, X Zhu, J Liu
Advances in Neural Information Processing Systems 36, 2024
Cited by: 35; Year: 2024
ChatBridge: Bridging modalities with large language model as a language catalyst
Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu
arXiv preprint arXiv:2305.16103, 2023
Cited by: 30; Year: 2023
Global-local propagation network for RGB-D semantic segmentation
S Chen, X Zhu, W Liu, X He, J Liu
arXiv preprint arXiv:2101.10801, 2021
Cited by: 19; Year: 2021
VLAB: Enhancing video language pre-training by feature adapting and blending
X He, S Chen, F Ma, Z Huang, X Jin, Z Liu, D Fu, Y Yang, J Liu, J Feng
arXiv preprint arXiv:2305.13167, 2023
Cited by: 18; Year: 2023
VL-Mamba: Exploring state space models for multimodal learning
Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu
arXiv preprint arXiv:2403.13600, 2024
Cited by: 8; Year: 2024
MM21 pre-training for video understanding challenge: Video captioning with pretraining techniques
S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu
Proceedings of the 29th ACM International Conference on Multimedia, 4853-4857, 2021
Cited by: 5; Year: 2021
COSA: Concatenated sample pretrained vision-language foundation model
S Chen, X He, H Li, X Jin, J Feng, J Liu
The Twelfth International Conference on Learning Representations, 2023
Cited by: 3; Year: 2023
Sounding video generator: A unified framework for text-guided sounding video generation
J Liu, W Wang, S Chen, X Zhu, J Liu
IEEE Transactions on Multimedia, 2023
Cited by: 3; Year: 2023
GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER
M Sun, W Wang, Z Qin, J Sun, S Chen, J Liu
Advances in Neural Information Processing Systems 36, 2024
Cited by: 2; Year: 2024
Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
Y Yan, X He, S Chen, J Liu
arXiv preprint arXiv:2404.08281, 2024
2024
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Z Liu, S Chen, L Guo, H Li, X He, J Liu
Proceedings of the 31st ACM International Conference on Multimedia, 5120-5131, 2023
2023
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
Y Yan, X He, W Wang, S Chen, J Liu
arXiv preprint arXiv:2308.09779, 2023
2023