K. Selim Engin

Home

I'm a Research Scientist at Sony US R&D Labs. I received my Ph.D. in Computer Science from the University of Minnesota, where I worked with Prof. Volkan Isler.

My research interests lie in robotics and computer vision. In particular, I am interested in enabling autonomous agents to operate in unstructured environments. Towards this goal, I use techniques from 3D perception, motion planning, control, and deep learning in my work.

I hold a B.Sc. in Mechatronics Engineering from Sabanci University in Istanbul. From January to August 2019, I interned at Samsung AI Center, and in the summer of 2021 I interned at Sony.

For more information, please refer to my CV.

Publications

FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
H. Choi*, I. Kasahara*, S. Engin, M. Graule, N. Chavan-Dafle, V. Isler
WACV 2025
[pdf] [webpage]
Text and pose embeddings can be composed using attention masks during the diffusion process to achieve instance-level control in images.

VioLA: Aligning Videos to 2D LiDAR Scans
J.J. Chao*, S. Engin*, N. Chavan-Dafle, B. Lee, V. Isler
ICRA 2024
[pdf] [webpage]
User-captured videos can be aligned to 2D LiDAR maps obtained by mobile robots to augment the map with semantics and to register multiple videos to each other using the map as a common frame.

RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction
I. Kasahara, S. Agrawal, S. Engin, N. Chavan-Dafle, S. Song, V. Isler
ICRA 2024
[pdf] [webpage] [code]
Text-to-image models pre-trained on a large corpus of data can be used to perform generalizable 3D scene reconstruction from a single view.

Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp Prediction
S. Agrawal, N. Chavan-Dafle, I. Kasahara, S. Engin, J. Huh, V. Isler
IROS 2023
[pdf] [webpage] [code]
Performing multi-object scene understanding and dense grasp estimation simultaneously allows inference at real-time speed.

Physics-Aware Object Pose Refinement using Differentiable Simulation
S. Engin, B. Lee, V. Isler
CVPRW 2023 (CV4MR short paper)
Differentiable simulation coupled with rendering can be used to penalize physical artifacts in pose estimation, and refine initial coarse estimates.

Neural Optimal Control using Learned System Dynamics
S. Engin, V. Isler
ICRA 2023
[pdf] [video] [code]
Loss functions adapted from the HJB equations can be used to train neural controllers to control systems whose dynamics are learned from data.

Category-Level Global Camera Pose Estimation with Multi-Hypothesis Point Cloud Correspondences
J.J. Chao, S. Engin, N. Hani, V. Isler
ICRA 2023
[pdf]
Soft feature assignments through one-to-many matching allow for registering partial depth scans to category-level template point clouds.

Learning to Play Pursuit-Evasion with Visibility Constraints
S. Engin, Q. Jiang, V. Isler
IROS 2021
[webpage] [link]
Image-based compressed representations can be used as belief states for agents playing pursuit-evasion games with visibility constraints.

Establishing Fault-Tolerant Connectivity of Mobile Robot Networks
S. Engin, V. Isler
IEEE TCNS 2021
[link]
An approximation algorithm for establishing mobile communication networks that are robust to the removal of a given number of nodes.

Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision
N. Hani, S. Engin, J.J. Chao, V. Isler
NeurIPS 2020
[pdf] [webpage] [code] [video]
Continuous 3D object representations can be used to learn view synthesis from extreme viewpoints given a single image, even in low-data regimes.

Active Localization of Multiple Targets from Noisy Relative Measurements
S. Engin, V. Isler
WAFR 2020
[pdf] [webpage] [code] [video]
An active strategy to locate targets can be learned with 2D histograms that represent the belief state.

Higher Order Function Networks for View Planning and Multi-View Reconstruction
S. Engin, E. Mitchell, D. Lee, V. Isler, D.D. Lee
ICRA 2020
[pdf] [video]
A single-view 3D reconstruction model can be used for planning the next views to refine the shape estimate from multiple viewpoints.

Higher-Order Function Networks for Learning Composable 3D Object Representations
E. Mitchell, S. Engin, V. Isler, D.D. Lee
ICLR 2020
[pdf] [webpage]
Neural networks can encode 3D shapes of objects into the parameters of other (surprisingly small) neural nets.

Asynchronous Network Formation in Unknown Unbounded Environments
S. Engin, V. Isler
ICRA 2019
[pdf] [link] [video]
An online search algorithm with a bounded competitive ratio for establishing the connectivity of an initially disconnected network of mobile robots.

Minimizing Movement to Establish the Connectivity of Randomly Deployed Robots
S. Engin, V. Isler
ICAPS 2018
[pdf] [link]
An O(sqrt(n))-approximation algorithm exists for establishing the connectivity of a group of n robots whose initial positions are sampled uniformly.

Tracking Wildlife with Multiple UAVs: System Design, Safety and Field Experiments
H. Bayram, N. Stefas, S. Engin, V. Isler
MRS 2017
[pdf] [link]
A group of coordinating aerial vehicles equipped with directional antennas can be used to localize radio-tagged animals.