support/testing: test_aichat: improve test reliability

Since the llama.cpp update in Buildroot commit [1], test_aichat can
fail for several reasons:

The loop checking for llama-server availability can fail if curl
succeeds but the returned JSON data is not formatted as expected.
This can happen when the server is up but the model is not yet fully
loaded. In that case, the server returns:

    {"error":{"message":"Loading model","type":"unavailable_error","code":503}}

This commit ignores Python KeyError exceptions during the server
availability check, so the test does not fail when this message is
received.
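
For illustration only, here is a minimal, self-contained sketch of
such a poll loop (hypothetical: the endpoint URL, the timeout value
and the use of urllib instead of the curl call made by the real test
are assumptions):

    # Hypothetical sketch of the readiness poll; the real test runs
    # curl on the target through the test infrastructure instead.
    import json
    import time
    import urllib.error
    import urllib.request

    def wait_for_model(url, hf_model, timeout=120):
        """Return True once the server lists hf_model as loaded."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                try:
                    body = urllib.request.urlopen(url).read()
                except urllib.error.HTTPError as e:
                    # e.g. HTTP 503 while the model is loading: the
                    # body still carries the JSON error payload.
                    body = e.read()
                models = json.loads(body)
                # While loading, the reply is {"error": {...}}: the
                # 'models' key is missing and this raises KeyError.
                if models['models'][0]['name'] == hf_model:
                    return True
            except (KeyError, IndexError, ValueError, OSError):
                pass  # server not up or model not loaded yet; retry
            time.sleep(1)
        return False

A caller would use something like
wait_for_model("http://127.0.0.1:8080/v1/models", hf_model), with the
URL being a placeholder here.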

Also, this new llama-server version introduced prompt caching, which
uses too much memory. This commit completely disables prompt caching
by adding "--cache-ram 0" to the llama-server options.
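
As a companion sketch, this is roughly how the option string is built
and the server launched in the background (the placeholder model repo
and the use of subprocess are assumptions for illustration; the real
test drives the target board through the Buildroot test
infrastructure):

    # Hypothetical sketch of the llama-server invocation; the real
    # test assembles the same option string but runs it on the target.
    import shlex
    import subprocess

    hf_model = "some-user/some-model-GGUF"  # placeholder repo name

    llama_opts = "--log-file /tmp/llama-server.log"
    # We set a fixed seed, to reduce variability of the test
    llama_opts += " --seed 123456789"
    # We disable prompt caching to reduce RAM usage
    llama_opts += " --cache-ram 0"
    llama_opts += f" --hf-repo {hf_model}"

    server = subprocess.Popen(shlex.split(f"llama-server {llama_opts}"))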

[1] 05c36d5d87

Signed-off-by: Julien Olivain <ju.o@free.fr>

@@ -70,6 +70,8 @@ class TestAiChat(infra.basetest.BRTest):
         llama_opts = "--log-file /tmp/llama-server.log"
         # We set a fixed seed, to reduce variability of the test
         llama_opts += " --seed 123456789"
+        # We disable prompt caching to reduce RAM usage
+        llama_opts += " --cache-ram 0"
         llama_opts += f" --hf-repo {hf_model}"
         # We start a llama-server in background, which will expose an
@@ -91,9 +93,12 @@ class TestAiChat(infra.basetest.BRTest):
             if ret == 0:
                 models_json = "".join(out)
                 models = json.loads(models_json)
-                model_name = models['models'][0]['name']
-                if model_name == hf_model:
-                    break
+                try:
+                    model_name = models['models'][0]['name']
+                    if model_name == hf_model:
+                        break
+                except KeyError:
+                    pass
         else:
             self.fail("Timeout while waiting for llama-server.")