An exposition of the research questions behind 5uGUE

5uGUE is an LLM-driven agent that autonomously fuzzes closed-source 5G baseband firmware (the modem inside a phone) to discover crashes. This site lets interested readers inspect the evidence behind each research question, down to every individual experimental run.

Research Question: one scientific question (one axis of experimentation)

→

Experiment: one configuration variant

→

Runs: 5 repeated fuzzing sessions; reported numbers are the mean

We study three axes, each a research question: (7.1) the effect of the agent's capability version, (7.2) generalisation across different phone modems, and (7.3) the choice of underlying LLM. Within each research question one variable changes while the others are held fixed. We repeat each experiment five times under identical configurations and analyze the aggregated results across all runs to account for variability introduced by stochastic LLM outputs and fuzzing processes. Click any experiment to expand its five individual runs, then click a run to go all the way down the rabbit hole: its turn-by-turn timeline, every test the agent generated (with the actual exploit code), and the crash signatures it triggered.
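As a rough illustration of how five runs collapse into one reported number, the sketch below averages per-run crash counts and also reports their spread. The counts are made-up placeholder values, and `summarize` is a hypothetical helper, not part of 5uGUE.

```python
from statistics import mean, stdev

# Hypothetical crash counts from the five runs of one experiment.
# These values are illustrative only, not results from 5uGUE.
runs = [4, 6, 5, 3, 7]


def summarize(counts):
    """Return the mean, sample standard deviation, and number of runs."""
    return {
        "mean": mean(counts),    # the single number reported per experiment
        "stdev": stdev(counts),  # run-to-run variability
        "n": len(counts),
    }


summary = summarize(runs)
print(summary)
```

Reporting the spread next to the mean makes it easier to judge whether a difference between two experiments exceeds the run-to-run noise from stochastic LLM outputs and fuzzing.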

Note on crash counts

In some cases the crash string cannot be recovered. The crash is still detected and included in the total, but without its string it cannot be assigned to a category, so the category breakdown can sum to less than the total. These cases are flagged inline as "unclassified crash". They occur in the following tests:

863ef044-53 (7.2, UNISOC UD710).
2cefbe6f-1 (7.3, Kimi K2.6).
7105d3d1-36 and 7105d3d1-98 (7.2, MediaTek Dimensity 9000).
2e28f2ea-46 (7.2, MediaTek Dimensity 9000).
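The relationship between the total and the category breakdown can be sketched as follows. The sketch assumes a crash with an unrecovered string is represented as `None`. The category names and the `breakdown` helper are hypothetical, not taken from the 5uGUE data.

```python
from collections import Counter

# Hypothetical crash records for one test; None marks a crash whose
# string could not be recovered. Category names are invented.
crashes = ["assert_fail", "null_deref", None, "assert_fail"]


def breakdown(records):
    """Split crash records into per-category counts and an unclassified count."""
    classified = Counter(c for c in records if c is not None)
    unclassified = sum(1 for c in records if c is None)
    return classified, unclassified


classified, unclassified = breakdown(crashes)
total = len(crashes)

# The categories alone can sum to less than the total; adding the
# unclassified crashes closes the gap.
print(total, sum(classified.values()), unclassified)
```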