Lin Huang

I am currently a Ph.D. student at the University at Buffalo, advised by Prof. Junsong Yuan. I received my M.S. from the University of Southern California. Prior to that, I was an undergraduate at Lanzhou University.

I am broadly interested in computer vision and machine learning. My current research focuses on pose estimation, 3D reconstruction, human behavior analysis, and human-computer interaction.

E-mail  /  CV  /  Google Scholar  /  GitHub


Research Projects


[New] HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation
Lin Huang, Jianchao Tan, Jingjing Meng, Ji Liu, Junsong Yuan
ACM MM, 2020

PDF  /  Abstract  /  Presentation  /  Bibtex
As we use our hands frequently in daily activities, the analysis of hand-object interactions plays a critical role in many multimedia understanding and interaction applications. Different from conventional 3D hand-only and object-only pose estimation, estimating 3D hand-object pose is more challenging due to the mutual occlusions between hand and object, as well as the physical constraints between them. To overcome these issues, we propose to fully utilize the structural correlations among hand joints and object corners in order to obtain more reliable poses. Our work is inspired by structured output learning models in the sequence transduction field, such as the Transformer encoder-decoder framework. Besides modeling inherent dependencies from the extracted 2D hand-object pose, our proposed Hand-Object Transformer Network (HOT-Net) also captures the structural correlations among 3D hand joints and object corners. As in the Transformer’s autoregressive decoder, considering structured output patterns helps better constrain the output space and leads to more robust pose estimation. However, different from the Transformer’s sequential modeling mechanism, HOT-Net adopts a novel non-autoregressive decoding strategy for 3D hand-object pose estimation. Specifically, our model removes the Transformer’s dependence on previously generated results and explicitly feeds a reference 3D hand-object pose into the decoding process to provide equivalent target pose patterns for localizing each 3D keypoint in parallel. To further improve the physical validity of the estimated hand pose, besides anatomical constraints, we propose a cooperative pose constraint, which enables the hand pose to cooperate with the hand shape to generate a hand mesh. We demonstrate real-time speed and state-of-the-art performance on benchmark hand-object datasets for both 3D hand and object poses.
  title={HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation},
  author={Huang, Lin and Tan, Jianchao and Meng, Jingjing and Liu, Ji and Yuan, Junsong},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},

Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation
Lin Huang, Jianchao Tan, Ji Liu, Junsong Yuan
ECCV, 2020

PDF  /  Abstract  /  Video  /  Bibtex
3D hand pose estimation is still far from a well-solved problem, mainly due to the highly nonlinear dynamics of hand pose and the difficulties of modeling its inherent structural dependencies. To address this issue, we connect this structured output learning problem with the structured modeling framework in the sequence transduction field. Standard transduction models like the Transformer adopt an autoregressive connection to capture dependencies from previously generated tokens and further correlate this information with the input sequence in order to prioritize the set of relevant input tokens for current token generation. To borrow wisdom from this structured learning framework while avoiding sequential modeling for hand pose, we take a 3D point set as input and propose to leverage the Transformer architecture with a novel non-autoregressive structured decoding mechanism. Specifically, instead of using previously generated results, our decoder utilizes a reference hand pose to provide equivalent dependencies among hand joints for each output joint generation. By imposing the reference structural dependencies, we can correlate the information with the input 3D points through a multi-head attention mechanism, aiming to discover informative points from different perspectives for localizing each hand joint. We demonstrate our model’s effectiveness on multiple challenging hand pose datasets, in comparison with several state-of-the-art methods.
  title={Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation},
  author={Huang, Lin and Tan, Jianchao and Liu, Ji and Yuan, Junsong},
  booktitle={European Conference on Computer Vision},
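
The decoding idea in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: it is a minimal single-head cross-attention written with NumPy, and the function name `decode_joints_in_parallel`, the random features, and the choice of using raw point coordinates as attention values are all assumptions made for illustration. It only shows the non-autoregressive property: queries come from a fixed reference pose, so every joint is decoded in one matrix product and no joint waits for another joint's output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_joints_in_parallel(reference_pose_feats, point_feats, point_xyz):
    """Single-head cross-attention from reference joints to input points.

    reference_pose_feats: (J, d) queries, one per joint of a reference pose
    point_feats:          (N, d) keys, one per input 3D point
    point_xyz:            (N, 3) values (assumption: raw point coordinates)
    Returns (J, 3) joint estimates and (J, N) attention weights.
    """
    d = reference_pose_feats.shape[1]
    scores = reference_pose_feats @ point_feats.T / np.sqrt(d)  # (J, N)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    joints = weights @ point_xyz         # weighted average of input points
    return joints, weights

# Toy data (hypothetical sizes: 21 joints, 256 points, 32-dim features).
rng = np.random.default_rng(0)
queries = rng.normal(size=(21, 32))
keys = rng.normal(size=(256, 32))
points = rng.normal(size=(256, 3))

joints, weights = decode_joints_in_parallel(queries, keys, points)
print(joints.shape, weights.shape)
```

Because no query depends on a previously decoded joint, decoding any subset of joints on its own gives the same values as decoding all of them together, which is what makes parallel decoding possible in this sketch.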

Learning Progressive Joint Propagation for Human Motion Prediction
Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann
ECCV, 2020

PDF  /  Abstract  /  Bibtex
Despite the great progress in human motion prediction, it remains a challenging task due to the complicated structural dynamics of human behaviors. In this paper, we address this problem in three aspects. First, to capture the long-range spatial correlations and temporal dependencies, we apply a transformer-based architecture with a global attention mechanism. Specifically, we feed the network with the sequential joints encoded with temporal information for spatial and temporal exploration. Second, to further exploit the inherent kinematic chains for better 3D structures, we apply a progressive-decoding strategy, which proceeds from central to peripheral joints according to the structural connectivity. Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in the training data to guide the predictions. We evaluate the proposed method on two challenging benchmark datasets (Human3.6M and CMU-Mocap). Experimental results show that our method outperforms state-of-the-art approaches.
  title={Learning progressive joint propagation for human motion prediction},
  author={Cai, Yujun and Huang, Lin and Wang, Yiwei and others},
  booktitle={European Conference on Computer Vision},
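
As a rough illustration of the central-to-peripheral ordering mentioned in the abstract above, the sketch below groups the joints of a kinematic tree into stages by their distance from the root. It is an assumption-laden toy, not the paper's decoder: the skeleton in `PARENT` is hypothetical and does not follow the joint layout of Human3.6M or CMU-Mocap, and `decoding_stages` only computes an ordering, not any prediction.

```python
# Hypothetical toy skeleton given as child -> parent (None marks the root).
PARENT = {
    "spine": None,
    "neck": "spine", "l_hip": "spine", "r_hip": "spine",
    "head": "neck", "l_shoulder": "neck", "r_shoulder": "neck",
    "l_elbow": "l_shoulder", "r_elbow": "r_shoulder",
    "l_wrist": "l_elbow", "r_wrist": "r_elbow",
    "l_knee": "l_hip", "r_knee": "r_hip",
    "l_ankle": "l_knee", "r_ankle": "r_knee",
}

def decoding_stages(parent):
    """Group joints into stages by depth in the kinematic tree.

    Stage 0 holds the root; each later stage holds the children of the
    previous stage, so a joint always appears after its parent.
    """
    children = {joint: [] for joint in parent}
    root = None
    for joint, par in parent.items():
        if par is None:
            root = joint
        else:
            children[par].append(joint)

    stages, frontier = [], [root]
    while frontier:
        stages.append(frontier)
        # Next stage: all children of the joints decoded in this stage.
        frontier = [c for joint in frontier for c in children[joint]]
    return stages

stages = decoding_stages(PARENT)
for depth, joints_at_depth in enumerate(stages):
    print(depth, joints_at_depth)
```

In this toy ordering, the torso joint comes first and the wrists come last, so each stage could condition on joints that were already produced closer to the body center.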


Simple Demo for Hand PointNet-based Real-Time 3D Hand Pose Estimation
Lin Huang, Pranav Sankhe, Yiheng Li



Research Intern: Y-tech Lab, Kwai Inc., Seattle, USA
May 2020 - Aug. 2020. Supervisors: Dr. Jianchao Tan and Dr. Ji Liu


Reviewer: CVPR, ICCV, TIP


TA: Introduction to Pattern Recognition (CSE555)
Fall 2020

TA: Data Intensive Computing (CSE587)
Spring 2020
