Runtime error
Exit code: 1. Reason:
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some("/data"),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
    graceful_termination_timeout: 90,
}

Attempt 1/120 - waiting... (TGI PID: 14)
2026-01-10T13:36:16.668095Z  WARN text_generation_launcher::gpu: Cannot determine GPU compute capability: ModuleNotFoundError: No module named 'torch'
2026-01-10T13:36:16.668122Z  INFO text_generation_launcher: Using attention flashinfer - Prefix caching true
2026-01-10T13:36:16.668736Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4096
2026-01-10T13:36:16.668749Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2026-01-10T13:36:16.668753Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `NousResearch/Hermes-3-Llama-3.1-8B` do not contain malicious code.
2026-01-10T13:36:16.668864Z  INFO download: text_generation_launcher: Starting check and download process for NousResearch/Hermes-3-Llama-3.1-8B
2026-01-10T13:36:16.670055Z ERROR download: text_generation_launcher: Permission denied (os error 13)
Error: DownloadError

TGI process died! Check logs above for errors
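The fatal line is the download-stage ERROR, `Permission denied (os error 13)`, which leads to `Error: DownloadError` and the launcher exiting. The config dump shows the Hugging Face cache is set to `/data` (`huggingface_hub_cache: Some("/data")`), so the likely cause is that the process inside the container cannot write to whatever is mounted there. A minimal host-side sketch of a check and fix, assuming `/data` is a bind-mounted host directory named `./data` (the host path, and the permissions chosen, are assumptions and not taken from these logs):

```shell
# Hypothetical host-side check, assuming ./data on the host is the
# directory bind-mounted as /data (the TGI Hugging Face cache).
DATA_DIR=./data

mkdir -p "$DATA_DIR"          # make sure the host directory exists
ls -ld "$DATA_DIR"            # inspect current ownership and permissions

# Broad fix: make the directory writable so the container user can
# download model weights into it. Tighten this (e.g. chown to the
# container's UID) for anything beyond local debugging.
chmod -R a+rwX "$DATA_DIR"
```

Alternatively, matching the container's UID (for example `chown -R 1000:1000 ./data`, if the image runs as UID 1000) is the tighter fix; the separate `No module named 'torch'` WARN only affects GPU capability detection and is not what killed the process here.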
Container logs:
Fetching error logs...