support/testing: test_aichat: improve test reliability

Since the llama.cpp update in Buildroot commit [1], test_aichat can
fail for several reasons:

The loop checking for llama-server availability can fail if curl
succeeds but the returned JSON data is not formatted as expected.
This can happen when the server is up but the model is not yet fully
loaded. In that case, the server returns:

    {"error":{"message":"Loading model","type":"unavailable_error","code":503}}

This commit ignores Python KeyError exceptions during the server
availability check, so the test does not fail when this message is
received.
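
For illustration only, here is a minimal, self-contained sketch of
such a poll loop (hypothetical: the endpoint URL, the timeout value
and the use of urllib instead of the curl call made by the real test
are assumptions):

    # Hypothetical sketch of the readiness poll; the real test runs
    # curl on the target through the test infrastructure instead.
    import json
    import time
    import urllib.error
    import urllib.request

    def wait_for_model(url, hf_model, timeout=120):
        """Return True once the server lists hf_model as loaded."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                try:
                    body = urllib.request.urlopen(url).read()
                except urllib.error.HTTPError as e:
                    # e.g. HTTP 503 while the model is loading: the
                    # body still carries the JSON error payload.
                    body = e.read()
                models = json.loads(body)
                # While loading, the reply is {"error": {...}}: the
                # 'models' key is missing and this raises KeyError.
                if models['models'][0]['name'] == hf_model:
                    return True
            except (KeyError, IndexError, ValueError, OSError):
                pass  # server not up or model not loaded yet; retry
            time.sleep(1)
        return False

A caller would use something like
wait_for_model("http://127.0.0.1:8080/v1/models", hf_model), with the
URL being a placeholder here.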

Also, this new llama-server version introduced prompt caching, which
uses too much memory. This commit completely disables prompt caching
by adding "--cache-ram 0" to the llama-server options.
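
As a companion sketch, this is roughly how the option string is built
and the server launched in the background (the placeholder model repo
and the use of subprocess are assumptions for illustration; the real
test drives the target board through the Buildroot test
infrastructure):

    # Hypothetical sketch of the llama-server invocation; the real
    # test assembles the same option string but runs it on the target.
    import shlex
    import subprocess

    hf_model = "some-user/some-model-GGUF"  # placeholder repo name

    llama_opts = "--log-file /tmp/llama-server.log"
    # We set a fixed seed, to reduce variability of the test
    llama_opts += " --seed 123456789"
    # We disable prompt caching to reduce RAM usage
    llama_opts += " --cache-ram 0"
    llama_opts += f" --hf-repo {hf_model}"

    server = subprocess.Popen(shlex.split(f"llama-server {llama_opts}"))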

[1] 05c36d5d87

Signed-off-by: Julien Olivain <ju.o@free.fr>

@@ -70,6 +70,8 @@ class TestAiChat(infra.basetest.BRTest):
         llama_opts = "--log-file /tmp/llama-server.log"
         # We set a fixed seed, to reduce variability of the test
         llama_opts += " --seed 123456789"
+        # We disable prompt caching to reduce RAM usage
+        llama_opts += " --cache-ram 0"
         llama_opts += f" --hf-repo {hf_model}"
         # We start a llama-server in background, which will expose an
@@ -91,9 +93,12 @@ class TestAiChat(infra.basetest.BRTest):
             if ret == 0:
                 models_json = "".join(out)
                 models = json.loads(models_json)
-                model_name = models['models'][0]['name']
-                if model_name == hf_model:
-                    break
+                try:
+                    model_name = models['models'][0]['name']
+                    if model_name == hf_model:
+                        break
+                except KeyError:
+                    pass
         else:
             self.fail("Timeout while waiting for llama-server.")