Publication Details
Revisiting the Non-Determinism of Code Generation by the GPT-3.5 Large Language Model
Article link:
Discipline: Computer and Information Sciences
Author(s): Salimata Sawadogo, Aminata Sabané, Rodrique Kafando, Tegawendé F. Bissyandé
Tagged author(s): SABANE Aminata
Entered by: BISSYANDE T. François D'Assise
Abstract

Despite recent advancements in Large Language Models (LLMs) for code generation, their inherent non-determinism remains a significant obstacle for reliable and reproducible software engineering research. Prior work has highlighted the high degree of variability in LLM-generated code, even when prompted with identical inputs. This non-deterministic behavior can undermine the validity of scientific conclusions drawn from LLM-based experiments. In contrast to prior research, this paper showcases the Tree of Thoughts (ToT) prompting strategy as a promising alternative for improving the predictability and quality of code generation results. By guiding the LLM through a structured thought process, ToT aims to reduce the randomness inherent in the generation process and improve the consistency of the output. Our experimental results on the GPT-3.5 Turbo model, using 829 code generation problems from benchmarks such as CodeContests, APPS (Automated Programming Progress Standard), and HumanEval, demonstrate a substantial reduction in non-determinism compared to previous findings. Specifically, we observed a significant decrease in the number of coding tasks that produced inconsistent outputs across multiple requests. Nevertheless, we show that the reduction in semantic variability was less pronounced for HumanEval (69%), indicating unique challenges present in this dataset that are not fully mitigated by ToT.
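To make the measurement concrete, the sketch below shows one way the non-determinism described above could be probed: the same code generation prompt is sent to GPT-3.5 Turbo several times and the distinct completions are counted. This is a minimal illustration, not the authors' experimental pipeline; the prompt text, the repetition count, and the helper names (`generate_once`, `count_distinct_outputs`) are assumptions made for the example.

```python
# Minimal sketch (not the paper's pipeline): probe non-determinism by sending
# an identical code-generation prompt to GPT-3.5 Turbo several times and
# counting how many distinct completions come back.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Write a Python function that returns the n-th Fibonacci number."
N_REQUESTS = 5  # illustrative repetition count, not the paper's setting


def generate_once(prompt: str) -> str:
    """Request a single completion for the given prompt."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # even at temperature 0, outputs can differ across calls
    )
    return response.choices[0].message.content


def count_distinct_outputs(prompt: str, n: int) -> int:
    """Send the same prompt n times and count distinct (verbatim) outputs."""
    outputs = {generate_once(prompt) for _ in range(n)}
    return len(outputs)


if __name__ == "__main__":
    distinct = count_distinct_outputs(PROMPT, N_REQUESTS)
    print(f"{distinct} distinct outputs out of {N_REQUESTS} identical requests")
```

Comparing verbatim strings captures only syntactic variability; the abstract also refers to semantic variability (whether the generated programs behave the same), which this sketch does not attempt to measure.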

Keywords

LLM, ToT, GPT
