Software Engineering and Large Language Models

Report on Current Developments in Software Engineering and Large Language Models

General Direction of the Field

Recent advances at the intersection of software engineering and large language models (LLMs) are expanding how these technologies can be applied and evaluated. The field is shifting towards more automated and scalable approaches to evaluating and improving software quality, with a particular focus on code summarization, debugging, and instruction tuning.

One of the key trends is the exploration of LLMs as automated evaluators for software engineering tasks. This approach aims to reduce reliance on human evaluators, whose assessments are time-consuming to collect and prone to fatigue, by using LLMs to judge the quality of software artifacts such as bug reports and code summaries. Beyond making evaluation more scalable, this shift opens new possibilities for reproducibility and consistency in software assessment.
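
To make the idea concrete, the sketch below shows a minimal LLM-as-evaluator loop for bug report summaries. The rubric, scoring scale, and model name are illustrative assumptions and do not reproduce the protocol of the cited paper.

    # Minimal sketch of an LLM-as-evaluator loop for bug report summaries.
    # The rubric, model choice, and scoring scale are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    RUBRIC = (
        "Rate the summary of the bug report on a 1-5 scale for each criterion: "
        "accuracy, completeness, and conciseness. Reply exactly as "
        "'accuracy=<n>, completeness=<n>, conciseness=<n>'."
    )

    def evaluate_summary(bug_report: str, summary: str) -> str:
        """Ask the model to grade one candidate summary against its bug report."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical choice; any chat model could be used
            temperature=0,        # deterministic scoring aids reproducibility
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user",
                 "content": f"Bug report:\n{bug_report}\n\nSummary:\n{summary}"},
            ],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        print(evaluate_summary(
            "App crashes when saving a file larger than 2 GB to a network drive.",
            "Saving files over 2 GB to network drives crashes the application.",
        ))

Because the same rubric and temperature-zero settings can be reapplied to every artifact, this pattern is what makes automated evaluation repeatable in a way that ad hoc human judging is not.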

Another significant development is the emphasis on high-quality data for pretraining and fine-tuning LLMs, particularly for code generation. Researchers increasingly recognize that data quality, not just volume, is decisive for state-of-the-art performance on coding benchmarks. This has motivated novel data pruning strategies and models that outperform existing ones while using fewer training resources.
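
As a rough illustration of the general idea, not the specific strategy proposed in the XCoder work, a pruning pass might score each instruction-tuning sample with cheap quality heuristics and keep only the top-ranked fraction. The heuristics and threshold below are assumptions for demonstration only.

    # Illustrative sketch of quality-based data pruning for code instruction tuning.
    from dataclasses import dataclass

    @dataclass
    class Sample:
        instruction: str
        response: str

    def quality_score(sample: Sample) -> float:
        """Cheap proxy score: reward responses that contain code and are non-trivial."""
        score = 0.0
        if "```" in sample.response or "def " in sample.response:
            score += 1.0                                      # response contains code
        score += min(len(sample.response), 2000) / 2000       # longer (up to a cap) is better
        if len(sample.instruction.split()) < 3:
            score -= 1.0                                      # penalize near-empty instructions
        return score

    def prune(samples: list[Sample], keep_ratio: float = 0.5) -> list[Sample]:
        """Keep the highest-scoring fraction of the dataset."""
        ranked = sorted(samples, key=quality_score, reverse=True)
        return ranked[: max(1, int(len(ranked) * keep_ratio))]

Real pruning pipelines typically replace these hand-written heuristics with learned quality classifiers or model-based scoring, but the select-then-train structure is the same.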

The field is also seeing growing interest in practical applications of LLMs in software development, such as debugging. While LLM-assisted debugging has shown promise, more rigorous evaluations are needed, especially of open-source models that can be deployed locally and therefore used without violating code-sharing policies.
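
The appeal of local deployment is that no source code leaves the developer's machine. A minimal sketch using the Hugging Face transformers pipeline is shown below; the model name is an assumption, and any locally runnable instruction-tuned code model could be substituted.

    # Sketch of locally hosted LLM-assisted debugging with Hugging Face transformers.
    # The model name is an assumed example, not a recommendation from the cited study.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="deepseek-ai/deepseek-coder-1.3b-instruct",  # assumed example model
    )

    buggy_code = '''
    def average(xs):
        return sum(xs) / len(xs)   # raises ZeroDivisionError on empty input
    '''

    prompt = (
        "The following Python function raises ZeroDivisionError for empty lists. "
        "Explain the bug and propose a fix:\n" + buggy_code
    )

    # Inference runs entirely on the local machine, so no code is sent to a hosted service.
    print(generator(prompt, max_new_tokens=200)[0]["generated_text"])

Evaluating such setups rigorously means measuring not only whether the suggested fix is correct, but also how often the model misdiagnoses the fault.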

Finally, there is a growing focus on understanding and improving the quality of code itself through the identification and analysis of code smells in simulation modelling software. This work highlights the distinctive characteristics and challenges of simulation software, providing valuable insights into how code quality issues can be addressed in this specialized domain.
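
For readers unfamiliar with automated smell detection, the toy detector below flags one classic smell, the overly long method, using Python's ast module. The threshold and scope are assumptions for illustration and do not reflect the methodology or smell catalogue of the cited study.

    # Toy detector for the "long method" code smell (illustration only).
    import ast
    import sys

    MAX_LINES = 40  # assumed threshold; real tools make this configurable

    def long_functions(source: str) -> list[tuple[str, int]]:
        """Return (function name, length in lines) for functions exceeding MAX_LINES."""
        tree = ast.parse(source)
        smells = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                length = node.end_lineno - node.lineno + 1
                if length > MAX_LINES:
                    smells.append((node.name, length))
        return smells

    if __name__ == "__main__":
        path = sys.argv[1]
        with open(path) as fh:
            for name, length in long_functions(fh.read()):
                print(f"{path}:{name} spans {length} lines (> {MAX_LINES})")

Studies of smell prevalence and evolution run detectors like this across many revisions of a codebase and track how the flagged instances change over time.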

Noteworthy Papers

  • LLMs as Evaluators: Demonstrates the potential of LLMs to evaluate bug report summarization effectively, suggesting a scalable and less fatiguing alternative to human evaluators.

  • Arctic-SnowCoder: Introduces a data-efficient code model that achieves state-of-the-art performance, highlighting the importance of high-quality data in pretraining.

  • XCoder: Presents a novel data pruning strategy for code instruction tuning, achieving new state-of-the-art performance with less training data.

These papers represent significant strides in automating software evaluation, optimizing data for model training, and enhancing the practical utility of LLMs in software engineering.

Sources

LLMs as Evaluators: A Novel Approach to Evaluate Bug Report Summarization

Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries

Lecture Notes from the NaijaCoder Summer Camp

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

Debugging with Open-Source Large Language Models: An Evaluation

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

On the Prevalence, Evolution, and Impact of Code Smells in Simulation Modelling Software