HIVE: Harnessing human feedback for instructional visual editing. S Zhang, X Yang, Y Feng, C Qin, CC Chen, N Yu, Z Chen, H Wang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024. Cited by 84.
BOLAA: Benchmarking and orchestrating LLM-augmented autonomous agents. Z Liu, W Yao, J Zhang, L Xue, S Heinecke, R Murthy, Y Feng, Z Chen, ... arXiv preprint arXiv:2308.05960, 2023. Cited by 68.
Tackling data heterogeneity in federated learning with class prototypes. Y Dai, Z Chen, J Li, S Heinecke, L Sun, R Xu. Proceedings of the AAAI Conference on Artificial Intelligence 37 (6), 7314-7322, 2023. Cited by 66.
Retroformer: Retrospective large language agents with policy gradient optimization. W Yao, S Heinecke, JC Niebles, Z Liu, Y Feng, L Xue, R Murthy, Z Chen, ... arXiv preprint arXiv:2308.02151, 2023. Cited by 52.
High resolution face completion with multiple controllable attributes via fully end-to-end progressive generative adversarial networks. Z Chen, S Nie, T Wu, CG Healey. arXiv preprint arXiv:1801.07632 1 (4), 6, 2018. Cited by 46.
xGen-MM (BLIP-3): A family of open large multimodal models. L Xue, M Shu, A Awadalla, J Wang, A Yan, S Purushwalkam, H Zhou, ... arXiv preprint arXiv:2408.08872, 2024. Cited by 30.
GlueGen: Plug and play multi-modal encoders for X-to-image generation. C Qin, N Yu, C Xing, S Zhang, Z Chen, S Ermon, Y Fu, C Xiong, R Xu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023. Cited by 22.
LayoutDETR: Detection transformer is a good multimodal layout designer. N Yu, CC Chen, Z Chen, R Meng, G Wu, P Josel, JC Niebles, C Xiong, ... European Conference on Computer Vision, 169-187, 2025. Cited by 10.
xLAM: A family of large action models to empower AI agent systems. J Zhang, T Lan, M Zhu, Z Liu, T Hoang, S Kokane, W Yao, J Tan, ... arXiv preprint arXiv:2409.03215, 2024. Cited by 8.
REX: Rapid exploration and exploitation for AI agents. R Murthy, S Heinecke, JC Niebles, Z Liu, L Xue, W Yao, Y Feng, Z Chen, ... arXiv preprint arXiv:2307.08962, 2023. Cited by 7.
Robustness evaluation of transformer-based form field extractors via form attacks. L Xue, M Gao, Z Chen, C Xiong, R Xu. International Conference on Document Analysis and Recognition, 167-184, 2023. Cited by 6.
Burn after reading: Online adaptation for cross-domain streaming data. L Yang, M Gao, Z Chen, R Xu, A Shrivastava, C Ramaiah. European Conference on Computer Vision, 404-422, 2022. Cited by 6.
Performance Characteristics of a Camera-Based Tangible Input Device for Manipulation of 3D Information. Z Chen, CG Healey, RS Amant. Graphics Interface, 74-81, 2017. Cited by 5.
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant. G Sun, C Qin, J Wang, Z Chen, R Xu, Z Tao. European Conference on Computer Vision, 156-172, 2025. Cited by 4.
Field extraction from forms with unlabeled data. M Gao, Z Chen, N Naik, K Hashimoto, C Xiong, R Xu. arXiv preprint arXiv:2110.04282, 2021. Cited by 3.
Large image collection visualization using perception-based similarity with color features. Z Chen, CG Healey. Advances in Visual Computing: 12th International Symposium, ISVC 2016, Las …, 2016. Cited by 2.
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models. J Zhang, L Xue, L Song, J Wang, W Huang, M Shu, A Yan, Z Ma, ... arXiv preprint arXiv:2412.07012, 2024. Cited by 1.
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations. C Qin, C Xia, K Ramakrishnan, M Ryoo, L Tu, Y Feng, M Shu, H Zhou, ... arXiv preprint arXiv:2408.12590, 2024. Cited by 1.
Supplementary Material of “SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant”. G Sun, C Qin, J Wang, Z Chen, R Xu, Z Tao.
BOLAA: Benchmarking and orchestrating LLM autonomous agents. Z Liu, W Yao, J Zhang, L Xue, S Heinecke, RN Rithesh, Y Feng, Z Chen, ... ICLR 2024 Workshop on Large Language Model (LLM) Agents.