📝 Selected Publications
For a complete list of publications, please visit my Google Scholar profile.
🔮 Research Interest 1: Uncovering NLP & LLM Internal Mechanisms and Interpretability
ZeroTuning: Unlocking the Initial Token’s Power to Enhance Large Language Models Without Training
Feijiang Han, Xiaodong Yu, Jianheng Tang, Delip Rao, Weihua Du, Lyle Ungar
Paper | Code & Demo | Blog | Poster
Key Points:
- Novel training-free optimization via initial token attention steering, supporting both supervised and unsupervised calibration
- Lightweight implementation (four lines of code modification) achieves substantial gains: 19.9% on classification, 4.5% on QA, and 2.1% on multi-turn dialogue
- Explains why this method works through: (1) theoretical analysis; (2) output entropy and accuracy analysis; (3) error pattern analysis; (4) fine-grained layer/head analysis
Read Before You Think: Mitigating LLM Comprehension Failures with Step-by-Step Reading
Feijiang Han, Hengtao Cui, Licheng Guo, Zelong Wang, Zhiyuan Lyu
Key Points:
- Identified semantic misunderstanding as the core bottleneck in LLM reasoning, even with strong methods like CoT
- Designed the SSR series to resolve this issue by: (1) applying step-by-step reading logic (SSR), (2) enforcing attention on key tokens via self-reference (SSR+), and (3) resolving backward dependencies through iterative re-contextualization (SSR++)
🔍 Research Interest 2: Domain-Adapted Language Models for Code, Document, and Scientific Automation
Feijiang Han, Jiaming Zhang, Chuyi Deng, Jianheng Tang, Yunhuai Liu
Key Points:
- First comprehensive study of LLMs’ capabilities in WebShell detection
- Novel BFAD framework improves LLM detection by 13.82% through function-aware analysis
- Enables both large and small LLMs to outperform traditional SOTA methods
LaTeX2Layout: High-Fidelity, Scalable Document Layout Annotation Pipeline for Layout Detection
Feijiang Han, Zelong Wang, Bowen Wang, Xinxin Liu, Skyler Cheung, Delip Rao, Chris Callison-Burch, Lyle Ungar
[Paper] | [Code & Dataset] (Coming Soon)
Key Points:
- Novel pipeline extracting PDF layout information directly from LaTeX compilation (no human annotations or PDF parsers)
- Custom LaTeX packages for precise element tracking and accurate layout extraction
- 200% relative improvement over zero-shot baselines through curriculum learning and synthetic data augmentation
Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification
Feijiang Han
[Paper] (Coming Soon)
Key Points:
- First systematic study automating WebShell family classification through representation learning
- Novel dynamic function call trace extraction and LLM-based synthetic trace generation for behavioral analysis
- Comprehensive evaluation of representation methods (sequence, graph, and tree-based models) across multiple datasets with practical insights for optimal model selection
🌟 Research Interest 3: Other Topics (HCI, Big Data Visualization, IoT, Federated and Continual Learning)
Credit and quality intelligent learning based multi-armed bandit scheme for unknown worker selection in multimedia MCS
Jianheng Tang, Feijiang Han, Kejia Fan, et al.
Key Points:
- Novel Credit and Quality Learning based Multi-Armed Bandit (CQL-MAB) scheme for solving the Post-Unknown Worker Recruitment problem in MCS
- Integrates credit identification and quality calculation for worker selection
- Theoretically proven truthfulness and efficiency in reverse auction settings
- UBICOMP 2025 | CALM: A Ubiquitous Crowdsourced Analytic Learning Mechanism for Continual Service Construction with Data Privacy Preservation
  Kejia Fan, Yuwei Huang, Jiayi He, Feijiang Han, Jianheng Tang, et al.
- arXiv 2025 | APFL: Analytic Personalized Federated Learning via Dual-Stream Least Squares
  Kejia Fan, Jianheng Tang, Zixuan Yang, Feijiang Han, Jiayi Li, et al.
- arXiv 2025 | ACU: Analytic Continual Unlearning for Efficient and Exact Forgetting with Privacy Preservation
  Jianheng Tang, Haotian Zhuang, Dongxiao Fang, Jiayi Li, Feijiang Han, et al.
- Information Sciences 2024 | MAB-RP: A Multi-Armed Bandit based workers selection scheme for accurate data collection in crowdsensing
  Yuwei Lou, Jianheng Tang, Feijiang Han, Anfeng Liu, et al.
- Information and Software Technology 2024 | Fctree: Visualization of function calls in execution
  Fei Zhou, Yifan Fan, Shengchao Lv, Lingxiao Jiang, Zhuo Chen, Jingui Yuan, Feijiang Han, et al.
- IEEE IoT Journal 2023 | CRL-MABA: a completion rate learning-based accurate data collection scheme in large-scale energy internet
  Kejia Fan, Jianheng Tang, Wenbin Xie, Feijiang Han, Yuwei Huang, et al.
- IEEE IoT Journal 2023 | BTV-CMAB: A bi-directional trust verification-based combinatorial multiarmed bandit scheme for mobile crowdsourcing
  Jianheng Tang, Kejia Fan, Wenbin Xie, Feijiang Han, et al.
- Computer Communications 2023 | A Semi-supervised Sensing Rate Learning based CMAB scheme to combat COVID-19 by trustful data collection in the crowd
  Jianheng Tang, Kejia Fan, Wenbin Xie, Lingxiao Zeng, Feijiang Han, et al.





