The field of multimodal large language models (MLLMs) is moving toward stronger reasoning capabilities, particularly in visual text grounding, ordinal understanding, and multimodal explanation. Researchers are exploring novel approaches, such as introducing visual keypoints, Chain-of-Thought (CoT) distillation, and hybrid optimization strategies, to enhance MLLM performance. Noteworthy papers include OrderChain, which presents a prompting paradigm that improves the ordinal understanding ability of MLLMs, and Skywork R1V, which introduces a multimodal reasoning model with an efficient multimodal transfer method. Additionally, benchmarks such as MDK12-Bench and V-MAGE are being developed to evaluate the reasoning capabilities of MLLMs across diverse domains.
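
To make the CoT distillation idea mentioned above concrete, the following is a minimal, illustrative sketch of the general technique: a student model is fine-tuned to reproduce a teacher-generated rationale and answer, with the loss applied only to the rationale/answer tokens. It is not the implementation of any paper cited here, and it assumes a text-only GPT-2 student as a stand-in for an MLLM; a real multimodal setup would also condition on image features.

```python
# Illustrative sketch of Chain-of-Thought (CoT) distillation, not the method of any
# specific paper cited above. Uses a text-only GPT-2 student as a stand-in for an MLLM.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

# Hypothetical distillation pair: a question about an image and a teacher-generated
# rationale plus final answer (in practice produced by a stronger teacher model).
question = "Q: The sign in the image reads 'Gate 12'. Which gate is shown?"
teacher_rationale = (
    "Reasoning: the visible text says 'Gate 12', so the gate number is 12. "
    "Answer: Gate 12"
)

# The student is trained to generate the rationale and answer given the question.
prompt_ids = tokenizer(question, return_tensors="pt").input_ids
target_ids = tokenizer(question + "\n" + teacher_rationale, return_tensors="pt").input_ids
labels = target_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # mask prompt tokens; supervise only the rationale/answer

outputs = student(input_ids=target_ids, labels=labels)
outputs.loss.backward()
optimizer.step()
print(f"distillation loss: {outputs.loss.item():.4f}")
```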