Publication Details
Revisiting the Non-Determinism of Code Generation by the GPT-3.5 Large Language Model
Article link:
Discipline: Computer and Information Sciences
Author(s): Salimata Sawadogo, Aminata Sabané, Rodrique Kafando, Tegawendé F. Bissyandé
Tagged author(s): SABANE Aminata
Entered by: BISSYANDE T. François D'Assise
Abstract

Despite recent advancements in Large Language Models (LLMs) for code generation, their inherent non-determinism remains a significant obstacle for reliable and reproducible software engineering research. Prior work has highlighted the high degree of variability in LLM-generated code, even when prompted with identical inputs. This non-deterministic behavior can undermine the validity of scientific conclusions drawn from LLM-based experiments. In contrast to prior research, this paper showcases the Tree of Thoughts (ToT) prompting strategy as a promising alternative for improving the predictability and quality of code generation results. By guiding the LLM through a structured thought process, ToT aims to reduce the randomness inherent in the generation process and improve the consistency of the output. Our experimental results on the GPT-3.5 Turbo model, using 829 code generation problems from benchmarks such as CodeContests, APPS (Automated Programming Progress Standard), and HumanEval, demonstrate a substantial reduction in non-determinism compared to previous findings. Specifically, we observed a significant decrease in the number of coding tasks that produced inconsistent outputs across multiple requests. Nevertheless, we show that the reduction in semantic variability was less pronounced for HumanEval (69%), indicating unique challenges present in this dataset that are not fully mitigated by ToT.
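To make the measurement concrete, the sketch below shows one way the non-determinism described above could be probed: the same code generation prompt is sent to GPT-3.5 Turbo several times and the distinct completions are counted. This is a minimal illustration, not the authors' experimental pipeline; the prompt text, the repetition count, and the helper names (`generate_once`, `count_distinct_outputs`) are assumptions made for the example.

```python
# Minimal sketch (not the paper's pipeline): probe non-determinism by sending
# an identical code-generation prompt to GPT-3.5 Turbo several times and
# counting how many distinct completions come back.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Write a Python function that returns the n-th Fibonacci number."
N_REQUESTS = 5  # illustrative repetition count, not the paper's setting


def generate_once(prompt: str) -> str:
    """Request a single completion for the given prompt."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # even at temperature 0, outputs can differ across calls
    )
    return response.choices[0].message.content


def count_distinct_outputs(prompt: str, n: int) -> int:
    """Send the same prompt n times and count distinct (verbatim) outputs."""
    outputs = {generate_once(prompt) for _ in range(n)}
    return len(outputs)


if __name__ == "__main__":
    distinct = count_distinct_outputs(PROMPT, N_REQUESTS)
    print(f"{distinct} distinct outputs out of {N_REQUESTS} identical requests")
```

Comparing verbatim strings captures only syntactic variability; the abstract also refers to semantic variability (whether the generated programs behave the same), which this sketch does not attempt to measure.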

Keywords

LLM, ToT, GPT
