Modeling of temporal and spatial data has advanced significantly in recent years, driven largely by the integration of deep learning with point process theory. Researchers are increasingly developing generative models that flexibly capture the interdependencies between event timestamps and event types, as well as the dynamics of gaze deployment in space and time. These models have proven effective across applications ranging from decoding reading goals from eye movements to generating spatial and spatiotemporal point processes. Notably, diffusion-based latent variable models for point processes improve both efficiency and flexibility, enabling faster sampling and more accurate conditional generation. In addition, coupling neural temporal point processes with models of gaze dynamics has yielded superior predictions of both the spatial and temporal structure of visual scanpaths. Together, these developments extend what is possible in the analysis and generation of complex, time-dependent event data.
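To make the underlying machinery concrete, the temporal point processes referenced above are typically defined through a conditional intensity function. Below is a minimal sketch of a classical Hawkes process with an exponential self-excitation kernel, the standard textbook precursor to the neural variants discussed here; the parameter values (`mu`, `alpha`, `beta`) and function names are illustrative choices, not taken from any specific model in the literature.

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity lambda(t) of a univariate Hawkes process:
    lambda(t) = mu + alpha * sum over past events t_i < t of exp(-beta * (t - t_i)).
    Each past event transiently raises the rate of future events."""
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)

def hawkes_loglik(events, T, mu=0.5, alpha=0.8, beta=1.0):
    """Log-likelihood of event times on [0, T]:
    sum_i log lambda(t_i) - integral_0^T lambda(s) ds.
    The compensator integral has a closed form for the exponential kernel."""
    ll = 0.0
    for i, ti in enumerate(events):
        ll += math.log(hawkes_intensity(ti, events[:i], mu, alpha, beta))
    # Closed-form compensator: mu*T + (alpha/beta) * sum_i (1 - exp(-beta*(T - t_i)))
    comp = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - ti)) for ti in events
    )
    return ll - comp
```

Neural temporal point processes replace this fixed parametric kernel with a learned function of the event history (for example, an RNN or transformer encoding), and the diffusion-based latent variable models mentioned above instead generate entire event sequences from a learned latent representation rather than sampling one event at a time from the intensity.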