AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

ServiceNow

3 weeks ago
