Running bash run.sh 3 3 keeps failing with this error. I tried installing matcha-tts, but the Docker image uses Python 3.12, which conflicts with matcha, so it cannot be installed #1810

Description

@suyansong

root@jdev-3090:/mnt/runtime/triton_trtllm# bash run.sh 3 3
Starting Triton server
I0130 06:34:46.490943 7929 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7fbb44000000' with size 268435456"
I0130 06:34:46.493070 7929 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0130 06:34:46.497172 7929 model_lifecycle.cc:473] "loading: audio_tokenizer:1"
I0130 06:34:46.497192 7929 model_lifecycle.cc:473] "loading: cosyvoice2:1"
I0130 06:34:46.497201 7929 model_lifecycle.cc:473] "loading: speaker_embedding:1"
I0130 06:34:46.497221 7929 model_lifecycle.cc:473] "loading: tensorrt_llm:1"
I0130 06:34:46.497230 7929 model_lifecycle.cc:473] "loading: token2wav:1"
I0130 06:34:47.470954 7957 pb_stub.cc:320] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'cosyvoice'

At:
/mnt/runtime/triton_trtllm/model_repo_cosyvoice2/speaker_embedding/1/model.py(35):
(488): _call_with_frames_removed
(995): exec_module
(950): _load_unlocked
(1334): _find_and_load_unlocked
(1360): _find_and_load

E0130 06:34:47.479864 7929 model_lifecycle.cc:654] "failed to load 'speaker_embedding' version 1: Internal: ModuleNotFoundError: No module named 'cosyvoice'\n\nAt:\n /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/speaker_embedding/1/model.py(35): \n (488): _call_with_frames_removed\n (995): exec_module\n (950): _load_unlocked\n (1334): _find_and_load_unlocked\n (1360): _find_and_load\n"
I0130 06:34:47.479899 7929 model_lifecycle.cc:789] "failed to load 'speaker_embedding'"
I0130 06:34:47.614020 7929 libtensorrtllm.cc:55] "TRITONBACKEND_Initialize: tensorrtllm"
I0130 06:34:47.614039 7929 libtensorrtllm.cc:62] "Triton TRITONBACKEND API version: 1.19"
I0130 06:34:47.614042 7929 libtensorrtllm.cc:66] "'tensorrtllm' TRITONBACKEND API version: 1.19"
I0130 06:34:47.614044 7929 libtensorrtllm.cc:86] "backend configuration:\n{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}"
[TensorRT-LLM][WARNING] gpu_device_ids is not specified, will be automatically set
[TensorRT-LLM][WARNING] participant_ids is not specified, will be automatically set
I0130 06:34:47.620198 7929 libtensorrtllm.cc:114] "TRITONBACKEND_ModelInitialize: tensorrt_llm (version 1)"
[TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000
[TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0
[TensorRT-LLM][WARNING] normalize_log_probs is not specified, will be set to true
[TensorRT-LLM][WARNING] cross_kv_cache_fraction is not specified, error if it's encoder-decoder model, otherwise ok
[TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0
[TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true
[TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value
[TensorRT-LLM][WARNING] enable_chunked_context is not specified, will be set to false.
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64
[TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8
[TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05
[TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB
[TensorRT-LLM][INFO] num_nodes is not specified, will be set to 1
[TensorRT-LLM][WARNING] multi_block_mode is not specified, will be set to true
[TensorRT-LLM][WARNING] enable_context_fmha_fp32_acc is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_mode is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_cache_size is not specified, will be set to 0
[TensorRT-LLM][INFO] speculative_decoding_fast_logits is not specified, will be set to false
[TensorRT-LLM][WARNING] decoding_mode parameter is invalid or not specified(must be one of the {top_k, top_p, top_k_top_p, beam_search, medusa, redrafter, lookahead, eagle}).Using default: top_k_top_p if max_beam_width == 1, beam_search otherwise
[TensorRT-LLM][WARNING] gpu_weights_percent parameter is not specified, will use default value of 1.0
[TensorRT-LLM][INFO] recv_poll_period_ms is not set, will use busy loop
[TensorRT-LLM][WARNING] encoder_model_path is not specified, will be left empty
[TensorRT-LLM][INFO] Engine version 0.20.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 16
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 16
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 32768
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (2560) * 24
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 32768
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 32767 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: min(maxSequenceLen, maxNumTokens).
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Loaded engine size: 1215 MiB
[TensorRT-LLM][INFO] Engine load time 589 ms
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1024.03 MiB for execution context memory.
[TensorRT-LLM][INFO] gatherContextLogits: 0
[TensorRT-LLM][INFO] gatherGenerationLogits: 0
I0130 06:34:48.374294 7954 pb_stub.cc:320] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'matcha'

At:
/mnt/runtime/triton_trtllm/model_repo_cosyvoice2/cosyvoice2/1/model.py(42):
(488): _call_with_frames_removed
(995): exec_module
(950): _load_unlocked
(1334): _find_and_load_unlocked
(1360): _find_and_load

E0130 06:34:48.385283 7929 model_lifecycle.cc:654] "failed to load 'cosyvoice2' version 1: Internal: ModuleNotFoundError: No module named 'matcha'\n\nAt:\n /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/cosyvoice2/1/model.py(42): \n (488): _call_with_frames_removed\n (995): exec_module\n (950): _load_unlocked\n (1334): _find_and_load_unlocked\n (1360): _find_and_load\n"
I0130 06:34:48.385305 7929 model_lifecycle.cc:789] "failed to load 'cosyvoice2'"
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1209 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 13.87 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 46.10 MB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 23.57 GiB, available: 20.74 GiB, extraCostMemory: 0.00 GiB
[TensorRT-LLM][WARNING] Both freeGpuMemoryFraction (aka kv_cache_free_gpu_mem_fraction) and maxTokens (aka max_tokens_in_paged_kv_cache) are set (to 0.500000 and 2560, respectively). The smaller value will be used.
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 80
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] before Create KVCacheManager cacheTransPreAllocaSize:0
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 1024 [window size=2560]
[TensorRT-LLM][INFO] Number of tokens per block: 32.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 0.03 GiB for max tokens in paged KV cache (2560).
[TensorRT-LLM][WARNING] cancellation_check_period_ms is not specified, will be set to 100 (ms)
[TensorRT-LLM][WARNING] stats_check_period_ms is not specified, will be set to 100 (ms)
I0130 06:34:48.517434 7929 libtensorrtllm.cc:184] "TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_0_0"
I0130 06:34:48.517569 7929 model_lifecycle.cc:849] "successfully loaded 'tensorrt_llm'"
I0130 06:34:48.553925 8103 pb_stub.cc:320] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'cosyvoice'

At:
/mnt/runtime/triton_trtllm/model_repo_cosyvoice2/token2wav/1/model.py(39):
(488): _call_with_frames_removed
(995): exec_module
(950): _load_unlocked
(1334): _find_and_load_unlocked
(1360): _find_and_load

E0130 06:34:48.562428 7929 model_lifecycle.cc:654] "failed to load 'token2wav' version 1: Internal: ModuleNotFoundError: No module named 'cosyvoice'\n\nAt:\n /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/token2wav/1/model.py(39): \n (488): _call_with_frames_removed\n (995): exec_module\n (950): _load_unlocked\n (1334): _find_and_load_unlocked\n (1360): _find_and_load\n"
I0130 06:34:48.562449 7929 model_lifecycle.cc:789] "failed to load 'token2wav'"
I0130 06:34:48.872798 7929 python_be.cc:2289] "TRITONBACKEND_ModelInstanceInitialize: audio_tokenizer_0_0 (CPU device 0)"
I0130 06:34:50.673777 7929 model_lifecycle.cc:849] "successfully loaded 'audio_tokenizer'"
I0130 06:34:50.673925 7929 server.cc:611]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0130 06:34:50.673944 7929 server.cc:638]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0130 06:34:50.673974 7929 server.cc:681]
+-------------------+---------+-----------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-------------------+---------+-----------------------------------------------------------------------------------------------+
| audio_tokenizer | 1 | READY |
| cosyvoice2 | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'matcha' |
| | | |
| | | At: |
| | | /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/cosyvoice2/1/model.py(42): |
| | | (488): _call_with_frames_removed |
| | | (995): exec_module |
| | | (950): _load_unlocked |
| | | (1334): _find_and_load_unlocked |
| | | (1360): _find_and_load |
| speaker_embedding | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'cosyvoice' |
| | | |
| | | At: |
| | | /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/speaker_embedding/1/model.py(35): |
| | | (488): _call_with_frames_removed |
| | | (995): exec_module |
| | | (950): _load_unlocked |
| | | (1334): _find_and_load_unlocked |
| | | (1360): _find_and_load |
| tensorrt_llm | 1 | READY |
| token2wav | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'cosyvoice' |
| | | |
| | | At: |
| | | /mnt/runtime/triton_trtllm/model_repo_cosyvoice2/token2wav/1/model.py(39): |
| | | (488): _call_with_frames_removed |
| | | (995): exec_module |
| | | (950): _load_unlocked |
| | | (1334): _find_and_load_unlocked |
| | | (1360): _find_and_load |
+-------------------+---------+-----------------------------------------------------------------------------------------------+

I0130 06:34:50.730851 7929 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090"
I0130 06:34:50.733660 7929 metrics.cc:783] "Collecting CPU metrics"
I0130 06:34:50.733709 7929 tritonserver.cc:2598]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.59.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | ./model_repo_cosyvoice2 |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0130 06:34:50.733754 7929 server.cc:312] "Waiting for in-flight requests to complete."
I0130 06:34:50.733758 7929 server.cc:328] "Timeout 30: Found 0 model versions that have in-flight inferences"
I0130 06:34:50.733919 7929 server.cc:343] "All models are stopped, unloading models"
I0130 06:34:50.733922 7929 server.cc:352] "Timeout 30: Found 2 live models and 0 in-flight non-inference requests"
[TensorRT-LLM][INFO] Refreshed the MPI local session
I0130 06:34:50.867384 7929 model_lifecycle.cc:636] "successfully unloaded 'tensorrt_llm' version 1"
I0130 06:34:51.734022 7929 server.cc:352] "Timeout 29: Found 1 live models and 0 in-flight non-inference requests"
I0130 06:34:52.104783 7929 model_lifecycle.cc:636] "successfully unloaded 'audio_tokenizer' version 1"
I0130 06:34:52.734213 7929 server.cc:352] "Timeout 28: Found 0 live models and 0 in-flight non-inference requests"
error: creating server: Internal - failed to load all models
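
All three load failures above are Python import errors inside Triton's Python backend: speaker_embedding and token2wav cannot import cosyvoice, and cosyvoice2 cannot import matcha. Since pip-installing matcha-tts reportedly conflicts with the image's Python 3.12, one possible workaround is to skip the PyPI package entirely and instead put a CosyVoice checkout on PYTHONPATH before launching the server; the CosyVoice repo vendors Matcha-TTS as a git submodule. This is only a sketch, not a confirmed fix, and the /workspace paths below are assumptions to be adjusted to wherever CosyVoice lives in your container.

```bash
# Hypothetical workaround sketch: make 'cosyvoice' and 'matcha' importable
# without pip-installing matcha-tts. The /workspace paths are assumptions.
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git /workspace/CosyVoice

# The vendored Matcha-TTS lives in third_party/Matcha-TTS. Put both the repo
# root (for 'import cosyvoice') and the submodule (for 'import matcha') on
# PYTHONPATH so the Triton Python-backend stubs can resolve them.
export PYTHONPATH=/workspace/CosyVoice:/workspace/CosyVoice/third_party/Matcha-TTS:$PYTHONPATH

# Quick sanity check before restarting Triton. Note this still requires
# CosyVoice's own Python dependencies to be installed in the container.
python3 -c "import cosyvoice, matcha; print('imports OK')"

# Then relaunch:
bash run.sh 3 3
```

One caveat: if run.sh launches tritonserver in a fresh shell, PYTHONPATH may need to be exported inside run.sh (or baked into the Dockerfile) rather than set in the interactive session, otherwise the Python stub processes will not inherit it.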
