Rafi Ibn Sultan

Graduate Research Assistant, Wayne State University | Detroit, Michigan, USA |
hm4013@wayne.edu | rafi.ruet13@gmail.com

Hello! 👋
I’m a 4th-year Ph.D. candidate in Computer Science at Wayne State University, working at the intersection of Computer Vision, Image Segmentation, and multimodal machine learning. Since joining the Trustworthy AI Lab in 2022, my research has focused on vision–language models (VLMs) and foundation models for segmentation across medical imaging, mobility infrastructure, and remote sensing.

My recent work spans AI for social good, including mobility-infrastructure segmentation using vision foundation models and text-assisted medical image segmentation using VLMs. I also develop encoder–decoder architectures for coronary artery segmentation and grounded conversational systems for pedestrian navigation that combine segmentation, depth understanding, and language reasoning.

As a Graduate Research Assistant, I explore practical, interpretable, and trustworthy AI systems with real-world impact. My long-term goal is to build multimodal models that bridge visual understanding with actionable decision-making.

Always open to collaboration and new ideas—feel free to connect! 🚀




News

Here are some announcements and news about my work:
  • (7/11/2025) Excited to share that our paper "GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation" has been accepted to the 28th European Conference on Artificial Intelligence (ECAI 2025), to be held in Bologna, Italy.
  • (5/16/2025) Invited to serve as a reviewer for Pattern Recognition, my third reviewing invitation from a Q1-ranked journal, following Biomedical Signal Processing and Control and Computer Vision and Image Understanding.
  • (4/25/2025) Our work "NA-Unetr: A Neighborhood Attention Transformer Network for Enhanced 3D Segmentation of the Left Anterior Descending Artery" will be presented as a poster at AAPM 2025.
  • (4/1/2025) Our recent work BiPVL-Seg, a multimodal segmentation model, is now on arXiv.
  • (2/27/2025 - 3/3/2025) Attending and presenting our paper AutoProSAM at WACV 2025! Here is the presentation.
  • (2/11/2025) I have been invited to serve as a reviewer for IJCNN 2025.
  • (1/17/2025) Our work GeoSAM has reached 12 citations on Google Scholar and 73 stars on GitHub.
  • (10/28/2024) Two of our papers were accepted to WACV!
  • (4/3/2024) Our lab and the work GeoSAM got featured on Detroit PBS! Check it out: link.
  • (3/7/2024) I passed my Ph.D. qualifying exam and am now a Ph.D. candidate! Read my report here.

Research

Current Research

I am a 4th-year Ph.D. candidate specializing in Computer Vision, Image Segmentation, and multimodal AI. My work focuses on the intersection of vision foundation models and large vision–language models (LVLMs) for grounded, real-world scene understanding across medical imaging and mobility-infrastructure domains.

Pedestrian Accessibility & Grounded Multimodal Reasoning:
A major part of my current research centers on WalkGPT, a grounded vision–language model designed for pedestrian navigation and accessibility. This system integrates segmentation, depth estimation, and language reasoning to identify sidewalks, crosswalks, curb ramps, and accessibility barriers directly from real-world pedestrian-view imagery. WalkGPT is built as a multimodal conversational agent capable of providing step-by-step, context-aware navigation guidance. This work is currently under review at CVPR 2026.

My earlier contribution in this domain, GeoSAM, introduced sparse- and dense-prompt fine-tuning of SAM for large-scale mobility-infrastructure segmentation using aerial and street-level imagery. GeoSAM was accepted to ECAI 2025 and has since been used as a benchmark model in subsequent NeurIPS research.

Medical Image Segmentation & Multimodal Learning:
I develop foundation-model–based segmentation and VLM-driven fusion models for CT and MRI analysis. My recent work, BiPVL-Seg, proposes progressive vision–language alignment to improve organ and tumor segmentation by combining visual features with structured medical text. I also collaborate with Henry Ford Hospital on Left Anterior Descending (LAD) artery segmentation using novel encoder–decoder architectures—critical for improving treatment planning due to the LAD’s sensitivity to radiation injury. This ongoing work was accepted as an abstract at AAPM 2025.

Across both domains, my long-term goal is to build trustworthy, grounded multimodal systems that combine segmentation, depth understanding, and language reasoning to connect visual perception with actionable, real-world decision making.


Reviewing Experience

Work Experience

Graduate Research Assistant

Department of Computer Science
Wayne State University
Trustworthy AI Lab
(Room: 2211, Department of Computer Science, 5057 Woodward Ave)
Detroit, MI 48202

(May 18, 2022 - current)

Graduate Teaching Assistant

Department of Computer Science
Wayne State University
(August 17, 2022 - May 17, 2023)

Lecturer

Department of Computer Science and Engineering
Varendra University
532, Jahangir Sarani, Talaimari
Rajshahi 6204, Bangladesh
(October 29, 2019 - August 16, 2022)

Education

Wayne State University

Ph.D. in Computer Science
September 2022 - Current

Rajshahi University of Engineering & Technology (RUET)

Bachelor of Science in Computer Science & Engineering
April 2014 - November 2018

Rajshahi College

Higher Secondary School Certificate (HSC)
2013

Shiroil Government High School

Secondary School Certificate (SSC)
2011

Publications

Full publication list available on Google Scholar.

Peer-Reviewed Publications

Preprints & Under Review


Additional

Outside of work, you can find me doing many things:
  • Soccer (a loyal fan of Real Madrid)
  • A beginner acoustic guitarist
  • Gaming enthusiast (playing FIFA since FIFA 98, a Killjoy main in Valorant, and a new CS2 player!)
  • Traveler: the goal is to visit all 50 states!