Enhancing Software Engineering with Personalized and Proactive LLM Applications

The integration of Large Language Models (LLMs) into software engineering practice continues to evolve, with a particular focus on enhancing productivity and reducing cognitive load. Recent work points to more nuanced approaches to applying LLMs to tasks such as code review, repository mining, and collaborative problem-solving. Methods for fine-tuning and personalizing LLM outputs are emerging so that results match individual developer preferences, for example in code readability evaluation, where collaborative filtering is combined with LLM-based scoring. There is also growing emphasis on detecting and mitigating multi-parameter constraint inconsistencies in API documentation by combining symbolic execution with LLM-assisted analysis. Generative AI is likewise being explored for root cause analysis in legacy systems, offering a proactive approach to resolving recurring incidents. Challenges remain in the reliability and cost-effectiveness of LLM applications, and ongoing research is needed to address hallucinations and model biases. Overall, the field is moving toward more personalized, accurate, and proactive uses of LLMs, with a strong focus on improving human-AI interaction and system reliability.
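
To make the personalization idea concrete, the sketch below shows one way a generic LLM readability score could be blended with a user-based collaborative-filtering estimate drawn from other developers' ratings. It is a minimal illustration, not the implementation from the cited paper: the rating matrix, the blending weight, and all function names are hypothetical.

```python
# Minimal sketch (hypothetical, not the paper's implementation): personalizing an
# LLM's code readability score with user-based collaborative filtering.
import numpy as np

# Rows = developers, columns = code snippets; entries are readability ratings
# on a 1-5 scale, with 0 marking "not yet rated". Values are made up.
ratings = np.array([
    [5, 3, 0, 4],
    [4, 0, 4, 3],
    [1, 1, 5, 0],
], dtype=float)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity over the snippets both developers have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(ratings: np.ndarray, dev: int, snippet: int) -> float:
    """Similarity-weighted average of other developers' ratings for a snippet."""
    sims, vals = [], []
    for other in range(ratings.shape[0]):
        if other == dev or ratings[other, snippet] == 0:
            continue
        sims.append(cosine_similarity(ratings[dev], ratings[other]))
        vals.append(ratings[other, snippet])
    if not sims or sum(sims) == 0:
        return float(ratings[ratings > 0].mean())  # fall back to the global mean
    return float(np.average(vals, weights=sims))

def personalized_score(llm_score: float, cf_score: float, alpha: float = 0.5) -> float:
    """Blend the generic LLM score with the collaborative-filtering estimate."""
    return alpha * llm_score + (1 - alpha) * cf_score

# Example: developer 0 has not rated snippet 2; combine a (hypothetical)
# LLM readability score of 3.2 with the collaborative-filtering prediction.
cf = predict_rating(ratings, dev=0, snippet=2)
print(round(personalized_score(llm_score=3.2, cf_score=cf), 2))
```

In this toy setup the collaborative-filtering term pulls the LLM's generic estimate toward the ratings of developers with similar past judgments, which is the general intuition behind personalizing readability evaluation; the actual method in the cited source may weight or combine these signals differently.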

Sources

LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering

Experiences from Using LLMs for Repository Mining Studies in Empirical Software Engineering

Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation

Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT

Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering

Detecting Multi-Parameter Constraint Inconsistencies in Python Data Science Libraries

Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

Breaking the Cycle of Recurring Failures: Applying Generative AI to Root Cause Analysis in Legacy Banking Systems

Utilizing Large Language Models to Synthesize Product Desirability Datasets

SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs

Evidence is All We Need: Do Self-Admitted Technical Debts Impact Method-Level Maintenance?

FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs