Rafi Ibn Sultan

Graduate Research Assistant, Wayne State University | Detroit, Michigan, USA | CV
rafis@wayne.edu rafi.ruet13@gmail.com

Hi there! 👋
I’m a Ph.D. candidate in Computer Science at Wayne State University, where I spend most of my time teaching machines how to see, segment, and reason about the world. I work at the intersection of Computer Vision and multimodal AI, building systems that connect images and language in meaningful ways.

Since joining the Trustworthy AI Lab in 2022, I’ve been exploring vision–language models and foundation models for segmentation across medical imaging, mobility infrastructure, and remote sensing. From coronary artery segmentation to pedestrian navigation systems that understand depth and spatial relationships, my work focuses on making multimodal models smarter and more grounded in reality.

Right now, I’m working on improving the spatial understanding of vision–language models at scale — pushing them beyond surface-level descriptions toward deeper spatial reasoning and more reliable logical thinking across complex scenes.

I’m particularly excited about AI for social good — building models that don’t just describe scenes, but actually reason about them and support real-world decision-making. My long-term goal is simple: bridge visual understanding with actionable intelligence.

If you’re interested in multimodal AI, spatial reasoning, or impactful applications of computer vision, let’s connect! 🚀


News

Here are some announcements and news about my work:

Research

Current Research

I am a 4th-year Ph.D. candidate specializing in Computer Vision, Image Segmentation, and multimodal AI. My work focuses on the intersection of vision foundation models and large vision–language models (LVLMs) for grounded, real-world scene understanding across medical imaging and mobility-infrastructure domains.

Pedestrian Accessibility & Grounded Multimodal Reasoning:
My most recent work is WalkGPT (accepted to CVPR 2026), a grounded vision–language model designed for pedestrian navigation and accessibility. The system integrates segmentation, depth estimation, and language reasoning to identify sidewalks, crosswalks, curb ramps, and accessibility barriers directly from real-world pedestrian-view imagery. WalkGPT is built as a multimodal conversational agent capable of providing step-by-step, context-aware navigation guidance. I am currently working on improving the spatial reasoning capabilities of multimodal vision–language models.

My earlier contribution in this domain, GeoSAM, introduced sparse- and dense-prompt fine-tuning of SAM for large-scale mobility-infrastructure segmentation using aerial and street-level imagery. GeoSAM was accepted to ECAI 2025 and has since been recognized as a benchmark model in later NeurIPS research.

Medical Image Segmentation & Multimodal Learning:
I develop foundation-model–based segmentation and VLM-driven fusion models for CT and MRI analysis. My recent work, BiPVL-Seg, proposes progressive vision–language alignment to improve organ and tumor segmentation by combining visual features with structured medical text. I also collaborate with Henry Ford Hospital on Left Anterior Descending (LAD) artery segmentation using novel encoder–decoder architectures—critical for improving treatment planning due to the LAD’s sensitivity to radiation injury. This ongoing work was accepted as an abstract at AAPM 2025.

Across both domains, my long-term goal is to build trustworthy, grounded multimodal systems that combine segmentation, depth understanding, and language reasoning to connect visual perception with actionable, real-world decision making.

Always open to collaboration and new ideas—feel free to connect! 🚀

Reviewing Experience

Work Experience

Graduate Research Assistant

Department of Computer Science
Wayne State University
Trustworthy AI Lab
(Room: 2211, Department of Computer Science, 5057 Woodward Ave)
Detroit, MI 48202

(May 18, 2022 - current)

Graduate Teaching Assistant

Department of Computer Science
Wayne State University
(August 17, 2022 - May 17, 2023)

Lecturer

Department of Computer Science and Engineering
Varendra University
532, Jahangir Sarani, Talaimari
Rajshahi 6204, Bangladesh
(October 29, 2019 - August 16, 2022)

Education

Wayne State University

Ph.D. in Computer Science
September 2022 - Current

Rajshahi University of Engineering & Technology (RUET)

Bachelor of Science in Computer Science & Engineering
April 2014 - November 2018

Rajshahi College

Higher Secondary School Certificate (HSC)
2013

Shiroil Government High School

Secondary School Certificate (SSC)
2011

Publications

Full publication list available on Google Scholar.

Peer-Reviewed Publications

Preprints & Under Review


Additional

Outside of work, you can find me enjoying many things:
  • Soccer (a loyal fan of Real Madrid)
  • Acoustic guitar (still a beginner)
  • Gaming (playing FIFA since '98, a Killjoy main in Valorant, and a new CS2 player!)
  • Travel: the goal is to visit all 50 states!