This is the official implementation of "Decouple and Ground: Hierarchy-Preserving Dual-Geometry Similarity for Weakly-Supervised Generalized Referring Expression Comprehension". We present the Linguistic Instance-Split Hyperbolic-Euclidean (LIHE) framework, the first weakly supervised framework for GREC. A frozen vision-language model (VLM) decomposes the input expression into single-target phrases, and a lightweight one-stage detector grounds each phrase. To preserve both fine-grained matching and global structure, we introduce the Hyperbolic–Euclidean Mix similarity (HEMix), which combines local Euclidean similarity with global hyperbolic distance. On the GREC benchmarks gRefCOCO and RefZOM, LIHE establishes the first weakly supervised baseline and achieves strong performance despite using only image-level captions. Beyond GREC, HEMix yields consistent gains over prior weakly supervised methods on the REC datasets RefCOCO, RefCOCO+, and RefCOCOg, improving mAP by up to 2.5.
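The description above states only that HEMix combines a local Euclidean similarity with a global hyperbolic distance. As a rough illustration of that idea, here is a minimal sketch under stated assumptions: cosine similarity as the Euclidean term, the Poincaré ball with curvature -1 (reached through the exponential map at the origin) as the hyperbolic model, and a hypothetical linear mixing weight `alpha`. The function names and the mixing rule are illustrative assumptions, not the code in the LIHE folder.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Local Euclidean term (assumption: plain cosine similarity)."""
    na, nb = l2_norm(a), l2_norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def expmap0(v: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Map a Euclidean vector into the unit Poincaré ball (curvature -1)
    with the exponential map at the origin: tanh(|v|) * v / |v|."""
    n = l2_norm(v)
    if n < eps:
        return [0.0] * len(v)
    scale = math.tanh(n) / n
    return [scale * x for x in v]


def poincare_distance(x: Sequence[float], y: Sequence[float], eps: float = 1e-6) -> float:
    """Geodesic distance in the unit Poincaré ball:
    arcosh(1 + 2|x - y|^2 / ((1 - |x|^2)(1 - |y|^2)))."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    # Clamp squared norms so points numerically on the boundary stay inside the ball.
    nx = min(sum(a * a for a in x), 1.0 - eps)
    ny = min(sum(b * b for b in y), 1.0 - eps)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - nx) * (1.0 - ny)))


def hemix_similarity(region: Sequence[float], phrase: Sequence[float], alpha: float = 0.5) -> float:
    """Hypothetical HEMix-style score: a higher value means a better match.
    Mixes the Euclidean similarity with the negated hyperbolic distance."""
    euclidean = cosine_similarity(region, phrase)
    hyperbolic = poincare_distance(expmap0(region), expmap0(phrase))
    return alpha * euclidean - (1.0 - alpha) * hyperbolic
```

For example, with the default `alpha=0.5`, identical region and phrase features score 0.5 (cosine 1, hyperbolic distance 0), while opposite features are penalized by both terms. Subtracting the distance keeps "higher is better" for both parts, so the score can be used directly to rank candidate boxes for a phrase.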
We open-source three code folders:
✅| -- RefCLIP: This folder contains RefCLIP+HEMix.
✅| -- APL: This folder contains APL+HEMix.
✅| -- LIHE: This folder contains our LIHE.
For more details, see the README.md in each folder.
- Clone this repo: download the code from https://anonymous.4open.science/r/LIHE
# git clone https://github.com/xxx/LIHE.git
cd LIHE

We thank the authors of APL and RefCLIP for their open-source code.