Description

Running `bash run.sh 3 3` fails to bring up the Triton server: the three Python backend models (`speaker_embedding`, `cosyvoice2`, `token2wav`) all fail to load with a `ModuleNotFoundError` (`cosyvoice` for two of them, `matcha` for the third), and startup aborts with "failed to load all models". Full log:
root@jdev-3090:/mnt/runtime/triton_trtllm# bash run.sh 3 3
Starting Triton server
I0130 06:34:46.490943 7929 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7fbb44000000' with size 268435456"
I0130 06:34:46.493070 7929 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0130 06:34:46.497172 7929 model_lifecycle.cc:473] "loading: audio_tokenizer:1"
I0130 06:34:46.497192 7929 model_lifecycle.cc:473] "loading: cosyvoice2:1"
I0130 06:34:46.497201 7929 model_lifecycle.cc:473] "loading: speaker_embedding:1"
I0130 06:34:46.497221 7929 model_lifecycle.cc:473] "loading: tensorrt_llm:1"
I0130 06:34:46.497230 7929 model_lifecycle.cc:473] "loading: token2wav:1"
I0130 06:34:47.470954 7957 pb_stub.cc:320] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'cosyvoice'
At:
  /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/speaker_embedding/1/model.py(35): <module>
  <frozen importlib._bootstrap>(488): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(995): exec_module
  <frozen importlib._bootstrap>(950): _load_unlocked
  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1360): _find_and_load
E0130 06:34:47.479864 7929 model_lifecycle.cc:654] "failed to load 'speaker_embedding' version 1: Internal: ModuleNotFoundError: No module named 'cosyvoice'\n\nAt:\n  /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/speaker_embedding/1/model.py(35): <module>\n  <frozen importlib._bootstrap>(488): _call_with_frames_removed\n  <frozen importlib._bootstrap_external>(995): exec_module\n  <frozen importlib._bootstrap>(950): _load_unlocked\n  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked\n  <frozen importlib._bootstrap>(1360): _find_and_load\n"
I0130 06:34:47.479899 7929 model_lifecycle.cc:789] "failed to load 'speaker_embedding'"
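Note: the `speaker_embedding` stub dies before auto-complete because `speaker_embedding/1/model.py` imports `cosyvoice` at line 35, and the Python backend stub only sees whatever `sys.path`/`PYTHONPATH` the `tritonserver` process was launched with. A minimal sanity check, assuming the CosyVoice sources live at `/mnt/runtime/CosyVoice` (that path is a guess; adjust to the actual checkout):

```bash
# Run in the same shell/container that runs run.sh.
# /mnt/runtime/CosyVoice is an assumed checkout location, not from the log.
python3 -c "import cosyvoice; print(cosyvoice.__file__)" \
  || export PYTHONPATH=/mnt/runtime/CosyVoice:$PYTHONPATH
```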
I0130 06:34:47.614020 7929 libtensorrtllm.cc:55] "TRITONBACKEND_Initialize: tensorrtllm"
I0130 06:34:47.614039 7929 libtensorrtllm.cc:62] "Triton TRITONBACKEND API version: 1.19"
I0130 06:34:47.614042 7929 libtensorrtllm.cc:66] "'tensorrtllm' TRITONBACKEND API version: 1.19"
I0130 06:34:47.614044 7929 libtensorrtllm.cc:86] "backend configuration:\n{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}"
[TensorRT-LLM][WARNING] gpu_device_ids is not specified, will be automatically set
[TensorRT-LLM][WARNING] participant_ids is not specified, will be automatically set
I0130 06:34:47.620198 7929 libtensorrtllm.cc:114] "TRITONBACKEND_ModelInitialize: tensorrt_llm (version 1)"
[TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000
[TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0
[TensorRT-LLM][WARNING] normalize_log_probs is not specified, will be set to true
[TensorRT-LLM][WARNING] cross_kv_cache_fraction is not specified, error if it's encoder-decoder model, otherwise ok
[TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0
[TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true
[TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value
[TensorRT-LLM][WARNING] enable_chunked_context is not specified, will be set to false.
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64
[TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8
[TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05
[TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB
[TensorRT-LLM][INFO] num_nodes is not specified, will be set to 1
[TensorRT-LLM][WARNING] multi_block_mode is not specified, will be set to true
[TensorRT-LLM][WARNING] enable_context_fmha_fp32_acc is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_mode is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_cache_size is not specified, will be set to 0
[TensorRT-LLM][INFO] speculative_decoding_fast_logits is not specified, will be set to false
[TensorRT-LLM][WARNING] decoding_mode parameter is invalid or not specified(must be one of the {top_k, top_p, top_k_top_p, beam_search, medusa, redrafter, lookahead, eagle}).Using default: top_k_top_p if max_beam_width == 1, beam_search otherwise
[TensorRT-LLM][WARNING] gpu_weights_percent parameter is not specified, will use default value of 1.0
[TensorRT-LLM][INFO] recv_poll_period_ms is not set, will use busy loop
[TensorRT-LLM][WARNING] encoder_model_path is not specified, will be left empty
[TensorRT-LLM][INFO] Engine version 0.20.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 16
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 16
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 32768
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (2560) * 24
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 32768
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 32767 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: min(maxSequenceLen, maxNumTokens).
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Loaded engine size: 1215 MiB
[TensorRT-LLM][INFO] Engine load time 589 ms
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1024.03 MiB for execution context memory.
[TensorRT-LLM][INFO] gatherContextLogits: 0
[TensorRT-LLM][INFO] gatherGenerationLogits: 0
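(Side note: the "profiling verbosity" message above is informational only; if those deeper engine diagnostics were ever wanted, the engine would presumably need rebuilding with `trtllm-build --profiling_verbosity detailed`, keeping the rest of the build flags unchanged.)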
I0130 06:34:48.374294 7954 pb_stub.cc:320] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'matcha'
At:
  /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/cosyvoice2/1/model.py(42): <module>
  <frozen importlib._bootstrap>(488): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(995): exec_module
  <frozen importlib._bootstrap>(950): _load_unlocked
  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1360): _find_and_load
E0130 06:34:48.385283 7929 model_lifecycle.cc:654] "failed to load 'cosyvoice2' version 1: Internal: ModuleNotFoundError: No module named 'matcha'\n\nAt:\n  /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/cosyvoice2/1/model.py(42): <module>\n  <frozen importlib._bootstrap>(488): _call_with_frames_removed\n  <frozen importlib._bootstrap_external>(995): exec_module\n  <frozen importlib._bootstrap>(950): _load_unlocked\n  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked\n  <frozen importlib._bootstrap>(1360): _find_and_load\n"
I0130 06:34:48.385305 7929 model_lifecycle.cc:789] "failed to load 'cosyvoice2'"
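Note: `matcha` is the Matcha-TTS package, which CosyVoice vendors as a git submodule under `third_party/Matcha-TTS` rather than declaring as a pip dependency, so it has to be importable as well. A sketch, under the same assumed `/mnt/runtime/CosyVoice` checkout as above:

```bash
# Fetch the vendored Matcha-TTS and expose it; the checkout path is assumed.
cd /mnt/runtime/CosyVoice && git submodule update --init --recursive
export PYTHONPATH=/mnt/runtime/CosyVoice/third_party/Matcha-TTS:$PYTHONPATH
```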
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1209 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 13.87 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 46.10 MB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 23.57 GiB, available: 20.74 GiB, extraCostMemory: 0.00 GiB
[TensorRT-LLM][WARNING] Both freeGpuMemoryFraction (aka kv_cache_free_gpu_mem_fraction) and maxTokens (aka max_tokens_in_paged_kv_cache) are set (to 0.500000 and 2560, respectively). The smaller value will be used.
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 80
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] before Create KVCacheManager cacheTransPreAllocaSize:0
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 1024 [window size=2560]
[TensorRT-LLM][INFO] Number of tokens per block: 32.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 0.03 GiB for max tokens in paged KV cache (2560).
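(The KV cache numbers are self-consistent: 80 primary blocks × 32 tokens per block = 2560 tokens, so the explicit `max_tokens_in_paged_kv_cache=2560` cap, not the 0.5 free-memory fraction over the 20.74 GiB reported available, is what sized the 0.03 GiB pool, exactly as the warning above says.)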
[TensorRT-LLM][WARNING] cancellation_check_period_ms is not specified, will be set to 100 (ms)
[TensorRT-LLM][WARNING] stats_check_period_ms is not specified, will be set to 100 (ms)
I0130 06:34:48.517434 7929 libtensorrtllm.cc:184] "TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_0_0"
I0130 06:34:48.517569 7929 model_lifecycle.cc:849] "successfully loaded 'tensorrt_llm'"
I0130 06:34:48.553925 8103 pb_stub.cc:320] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'cosyvoice'
At:
  /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/token2wav/1/model.py(39): <module>
  <frozen importlib._bootstrap>(488): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(995): exec_module
  <frozen importlib._bootstrap>(950): _load_unlocked
  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1360): _find_and_load
E0130 06:34:48.562428 7929 model_lifecycle.cc:654] "failed to load 'token2wav' version 1: Internal: ModuleNotFoundError: No module named 'cosyvoice'\n\nAt:\n  /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/token2wav/1/model.py(39): <module>\n  <frozen importlib._bootstrap>(488): _call_with_frames_removed\n  <frozen importlib._bootstrap_external>(995): exec_module\n  <frozen importlib._bootstrap>(950): _load_unlocked\n  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked\n  <frozen importlib._bootstrap>(1360): _find_and_load\n"
I0130 06:34:48.562449 7929 model_lifecycle.cc:789] "failed to load 'token2wav'"
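Note: same root cause as `speaker_embedding` above; `token2wav/1/model.py` imports `cosyvoice` at line 39, so the single `PYTHONPATH` fix sketched earlier should cover both models.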
I0130 06:34:48.872798 7929 python_be.cc:2289] "TRITONBACKEND_ModelInstanceInitialize: audio_tokenizer_0_0 (CPU device 0)"
I0130 06:34:50.673777 7929 model_lifecycle.cc:849] "successfully loaded 'audio_tokenizer'"
I0130 06:34:50.673925 7929 server.cc:611]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0130 06:34:50.673944 7929 server.cc:638]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python      | /opt/tritonserver/backends/python/libtriton_python.so          | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0130 06:34:50.673974 7929 server.cc:681]
+-------------------+---------+-----------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                        |
+-------------------+---------+-----------------------------------------------------------------------------------------------+
| audio_tokenizer   | 1       | READY                                                                                         |
| cosyvoice2        | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'matcha'                          |
|                   |         |                                                                                               |
|                   |         | At:                                                                                           |
|                   |         |   /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/cosyvoice2/1/model.py(42): <module>        |
|                   |         |   <frozen importlib._bootstrap>(488): _call_with_frames_removed                               |
|                   |         |   <frozen importlib._bootstrap_external>(995): exec_module                                    |
|                   |         |   <frozen importlib._bootstrap>(950): _load_unlocked                                          |
|                   |         |   <frozen importlib._bootstrap>(1334): _find_and_load_unlocked                                |
|                   |         |   <frozen importlib._bootstrap>(1360): _find_and_load                                         |
| speaker_embedding | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'cosyvoice'                       |
|                   |         |                                                                                               |
|                   |         | At:                                                                                           |
|                   |         |   /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/speaker_embedding/1/model.py(35): <module> |
|                   |         |   <frozen importlib._bootstrap>(488): _call_with_frames_removed                               |
|                   |         |   <frozen importlib._bootstrap_external>(995): exec_module                                    |
|                   |         |   <frozen importlib._bootstrap>(950): _load_unlocked                                          |
|                   |         |   <frozen importlib._bootstrap>(1334): _find_and_load_unlocked                                |
|                   |         |   <frozen importlib._bootstrap>(1360): _find_and_load                                         |
| tensorrt_llm      | 1       | READY                                                                                         |
| token2wav         | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'cosyvoice'                       |
|                   |         |                                                                                               |
|                   |         | At:                                                                                           |
|                   |         |   /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/token2wav/1/model.py(39): <module>         |
|                   |         |   <frozen importlib._bootstrap>(488): _call_with_frames_removed                               |
|                   |         |   <frozen importlib._bootstrap_external>(995): exec_module                                    |
|                   |         |   <frozen importlib._bootstrap>(950): _load_unlocked                                          |
|                   |         |   <frozen importlib._bootstrap>(1334): _find_and_load_unlocked                                |
|                   |         |   <frozen importlib._bootstrap>(1360): _find_and_load                                         |
+-------------------+---------+-----------------------------------------------------------------------------------------------+
I0130 06:34:50.730851 7929 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090"
I0130 06:34:50.733660 7929 metrics.cc:783] "Collecting CPU metrics"
I0130 06:34:50.733709 7929 tritonserver.cc:2598]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.59.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | ./model_repo_cosyvoice2                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0130 06:34:50.733754 7929 server.cc:312] "Waiting for in-flight requests to complete."
I0130 06:34:50.733758 7929 server.cc:328] "Timeout 30: Found 0 model versions that have in-flight inferences"
I0130 06:34:50.733919 7929 server.cc:343] "All models are stopped, unloading models"
I0130 06:34:50.733922 7929 server.cc:352] "Timeout 30: Found 2 live models and 0 in-flight non-inference requests"
[TensorRT-LLM][INFO] Refreshed the MPI local session
I0130 06:34:50.867384 7929 model_lifecycle.cc:636] "successfully unloaded 'tensorrt_llm' version 1"
I0130 06:34:51.734022 7929 server.cc:352] "Timeout 29: Found 1 live models and 0 in-flight non-inference requests"
I0130 06:34:52.104783 7929 model_lifecycle.cc:636] "successfully unloaded 'audio_tokenizer' version 1"
I0130 06:34:52.734213 7929 server.cc:352] "Timeout 28: Found 0 live models and 0 in-flight non-inference requests"
error: creating server: Internal - failed to load all models
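All three load failures reduce to the same thing: the environment that launches `tritonserver` cannot import the CosyVoice sources or their vendored Matcha-TTS, so the three Python backend models never come up and the server exits. A hedged end-to-end workaround sketch; every path below is an assumption about the local layout, only the repo URL and `run.sh 3 3` come from the setup itself:

```bash
# Clone CosyVoice with its submodules; /mnt/runtime/CosyVoice is an assumed location.
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git /mnt/runtime/CosyVoice

# Make both the repo and the vendored Matcha-TTS visible to the Python backend stubs.
export PYTHONPATH=/mnt/runtime/CosyVoice:/mnt/runtime/CosyVoice/third_party/Matcha-TTS:$PYTHONPATH

# Retry the failing stage.
cd /mnt/runtime/triton_trtllm && bash run.sh 3 3
```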