Hello! 👋
I’m a 4th-year Ph.D. candidate in Computer Science at Wayne State University, working at the intersection of Computer Vision, Image Segmentation, and multimodal machine learning. Since joining the Trustworthy AI Lab in 2022, I have focused on vision–language models (VLMs) and foundation models for segmentation across medical imaging, mobility infrastructure, and remote sensing.
My recent work spans AI for social good, including mobility-infrastructure segmentation using vision foundation models and text-assisted medical image segmentation using VLMs. I also develop encoder–decoder architectures for coronary artery segmentation and grounded conversational systems for pedestrian navigation that combine segmentation, depth understanding, and language reasoning.
As a Graduate Research Assistant, I build practical, interpretable, and trustworthy AI systems with real-world impact. My long-term goal is to develop multimodal models that bridge visual understanding with actionable decision-making.
Always open to collaboration and new ideas—feel free to connect! 🚀
RESEARCH:
My research focuses on Computer Vision, Image Segmentation, and multimodal AI, in particular the intersection of vision foundation models and large vision–language models (LVLMs) for grounded, real-world scene understanding across the medical imaging and mobility-infrastructure domains.
Pedestrian Accessibility & Grounded Multimodal Reasoning:
A major part of my current research centers on WalkGPT, a grounded vision–language model designed for pedestrian navigation and accessibility. This system integrates segmentation, depth estimation, and language reasoning to identify sidewalks, crosswalks, curb ramps, and accessibility barriers directly from real-world pedestrian-view imagery. WalkGPT is built as a multimodal conversational agent capable of providing step-by-step, context-aware navigation guidance. This work is currently under review at CVPR 2026.
My earlier contribution in this domain, GeoSAM, introduced sparse- and dense-prompt fine-tuning of SAM for large-scale mobility-infrastructure segmentation using aerial and street-level imagery. GeoSAM was accepted to ECAI 2025 and has since been adopted as a benchmark in subsequent NeurIPS research.
Medical Image Segmentation & Multimodal Learning:
I develop foundation-model–based segmentation and VLM-driven fusion models for CT and MRI analysis. My recent work, BiPVL-Seg, proposes progressive vision–language alignment that improves organ and tumor segmentation by combining visual features with structured medical text. I also collaborate with Henry Ford Hospital on Left Anterior Descending (LAD) artery segmentation using novel encoder–decoder architectures; accurate LAD delineation is critical for treatment planning because the artery is highly sensitive to radiation injury. This ongoing work was accepted as an abstract at AAPM 2025.
Across both domains, my long-term goal is to build trustworthy, grounded multimodal systems that combine segmentation, depth understanding, and language reasoning to connect visual perception with actionable, real-world decision-making.
Full publication list available on Google Scholar.
Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu, "GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation", European Conference on Artificial Intelligence (ECAI), 2025.
Chengyin Li, Prashant Khanduri, Yao Qiang, Rafi Ibn Sultan, Indrin Chetty, Dongxiao Zhu, "Enhancing CT Image Segmentation Accuracy Through Ensemble Loss Function Optimization", Medical Physics, 2025.
Chengyin Li, Hui Zhu, Rafi Ibn Sultan, Dongxiao Zhu, "AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation", WACV, 2025.
Chengyin Li, Hui Zhu, Rafi Ibn Sultan, Dongxiao Zhu, "MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding", WACV, 2025.
Chengyin Li, Hassan Bagher-Ebadian, Rafi Ibn Sultan, Dongxiao Zhu, Indrin Chetty, "A New Architecture Combining Convolutional and Transformer-Based Networks for Automatic 3D Segmentation of Pelvic Anatomy", Medical Physics, 2023.
Chengyin Li, Yao Qiang, Rafi Ibn Sultan, Hassan Bagher-Ebadian, Prashant Khanduri, Indrin J. Chetty, Dongxiao Zhu, "FocalUNETR: A Focal Transformer for Boundary-Aware Prostate Segmentation Using CT Images", MICCAI, 2023.
Md. Simul Hasan Talukder, Md. Nahid Hasan, Rafi Ibn Sultan, Ajay Krishno Sarkar, Mahabubur Rahman, "An Enhanced Method for Encrypting Image and Text Data Using AES and LSB Steganography", ICAEEE, 2022.
Rafi Ibn Sultan, Md. Nahid Hasan, Mohammad Kasedullah, "Recognition of Basic Handwritten Math Symbols Using CNN with Data Augmentation", ICEEICT, 2021.
Md. Nahid Hasan, Rafi Ibn Sultan, Mohammad Kasedullah, "Automated Recognition of Isolated Handwritten Bangla Characters Using Deep CNN", ISCAIE, 2021.
Md. Jamil-Ur Rahman, Rafi Ibn Sultan, Firoz Mahmud, Sazid Al Ahsan, Abdul Matin, "Automatic Detection of Invasive Ductal Carcinoma Using Convolutional Neural Networks", TENCON, 2018.
Md. Jamil-Ur Rahman, Rafi Ibn Sultan, Firoz Mahmud, Ashadullah Shawon, Afsana Khan, "Ensemble of Multiple Models for Intelligent Heart Disease Prediction", ICEEICT, 2018.
Rafi Ibn Sultan et al., "BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion for Medical Image Segmentation", arXiv:2503.23534, 2025.
Rafi Ibn Sultan et al., "WalkGPT: Grounded Vision–Language Conversation with Depth-Aware Segmentation for Pedestrian Navigation", under review at CVPR 2026.
Rafi Ibn Sultan et al., "A Neighborhood Attention Transformer Network for 3D LAD Artery Segmentation", under review at Medical Physics.