|
35 | 35 | "source": [ |
36 | 36 | "## Basics: Generating Speech with Riva TTS APIs\n", |
37 | 37 | "\n", |
38 | | - "The Riva TTS service is based on a two-stage pipeline: Riva generates a mel spectrogram using the first model, then uses the mel spectrogram to generate speech using the second model. This pipeline forms a text-to-speech system that enables you to synthesize natural sounding speech from raw transcripts without any additional information such as patterns or rhythms of speech.\n", |
| 38 | + "The Riva TTS service is based on a two-stage pipeline: Riva models like FastPitch and RadTTS++ first generates a mel-spectrogram, and then generates\n", |
| 39 | + "speech using the HifiGAN model while MagpieTTS Multilingual generates tokens and then generates speech using the Audio Codec model. This pipeline forms a text-to-speech system that enables you to synthesize natural sounding speech from raw transcripts without any additional information such as patterns or rhythms of speech.\n", |
39 | 40 | "\n", |
40 | 41 | "Riva provides two state-of-the-art voices (one male and one female) for English, that can easily be deployed with the Riva Quick Start scripts. Riva also supports easy customization of TTS in various ways, to meet your specific needs. \n", |
41 | 42 | "Subsequent Riva releases will include features such as model registration to support multiple languages/voices with the same API and support for resampling to alternative sampling rates. \n", |
|
114 | 115 | "source": [ |
115 | 116 | "### TTS modes\n", |
116 | 117 | "\n", |
117 | | - "Riva TTS supports both streaming and batch inference modes. In batch mode, audio is not returned until the full audio sequence for the requested text is generated and can achieve higher throughput. But when making a streaming request, audio chunks are returned as soon as they are generated, significantly reducing the latency (as measured by time to first audio) for large requests. <br> \n", |
| 118 | + "Riva TTS supports both streaming and offline inference modes. In offline mode, audio is not returned until the full audio sequence for the requested text is generated and can achieve higher throughput. But when making a streaming request, audio chunks are returned as soon as they are generated, significantly reducing the latency (as measured by time to first audio) for large requests. <br> \n", |
118 | 119 | "\n", |
119 | 120 | "\n", |
120 | 121 | "\n", |
|
153 | 154 | "- ``language_code`` - Language of the generated audio. ``en-US`` represents English (US) and is currently the only language supported OOTB.\n", |
154 | 155 | "- ``encoding`` - Type of audio encoding to generate. ``LINEAR_PCM`` and ``OGGOPUS`` encodings are supported.\n", |
155 | 156 | "- ``sample_rate_hz`` - Sample rate of the generated audio. Depends on the microphone and is usually ``22khz`` or ``44khz``.\n", |
156 | | - "- ``voice_name`` - Voice used to synthesize the audio. Currently, Riva offers two OOTB voices (``English-US.Female-1``, ``English-US.Male-1``)." |
| 157 | + "- ``voice_name`` - Voice used to synthesize the audio. Currently, Riva offers two OOTB voices (``English-US.Female-1``, ``English-US.Male-1``).\n", |
| 158 | + "- ``custom_pronunciation`` - Dictionary of words and their custom pronunciations. For ease of use, the python API accepts a dictionary of words and their custom pronunciations. While the gRPC API accepts a string of comma seperated entries of words and their custom pronunciations with the format ``word1 pronunciation1,word2 pronunciation2``." |
157 | 159 | ] |
158 | 160 | }, |
159 | 161 | { |
|
227 | 229 | "Let's look at customization of Riva TTS with these SSML tags in some detail." |
228 | 230 | ] |
229 | 231 | }, |
| 232 | + { |
| 233 | + "cell_type": "markdown", |
| 234 | + "metadata": {}, |
| 235 | + "source": [ |
| 236 | + "\n", |
| 237 | + "##### Note\n", |
| 238 | + "Magpie TTS Multilingual supports only ``phoneme`` tag." |
| 239 | + ] |
| 240 | + }, |
230 | 241 | { |
231 | 242 | "attachments": {}, |
232 | 243 | "cell_type": "markdown", |
|
332 | 343 | "<audio controls src=\"https://raw.githubusercontent.com/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_0.wav\" type=\"audio/ogg\"></audio>\n", |
333 | 344 | "\n", |
334 | 345 | "#### Note\n", |
335 | | - "If the audio controls are not seen throughout notebook. Open the notebook in github dev or view it in the [riva docs](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/tts-python-basics-and-customization-with-ssml.html)\n" |
| 346 | + "If the audio controls are not seen throughout notebook. Open the notebook in github dev or view it in the [riva docs](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/tts-basics-customize-ssml.html)\n" |
336 | 347 | ] |
337 | 348 | }, |
338 | 349 | { |
|
457 | 468 | "#### Arpabet\n", |
458 | 469 | "The full list of phonemes in the CMUdict can be found at [cmudict.phone](https://github.com/cmusphinx/cmudict/blob/master/cmudict.phones). The list of supported symbols with stress can be found at [cmudict.symbols](https://github.com/cmusphinx/cmudict/blob/master/cmudict.symbols). For a mapping of these phones to English sounds, refer to the [ARPABET Wikipedia page](https://en.wikipedia.org/wiki/ARPABET).\n", |
459 | 470 | "\n", |
| 471 | + "#### Custom pronunciations\n", |
| 472 | + "\n", |
| 473 | + "We also support passing custom pronunciations for words with the request which will override the default pronunciation for the word for the request. For ease of use, the python API accepts a dictionary of words and their custom pronunciations. While the gRPC API accepts a string of comma seperated entries of words and their custom pronunciations with the format ``word1 pronunciation1,word2 pronunciation2``.\n", |
| 474 | + "\n", |
460 | 475 | "Let's look at an example showing this custom pronunciation for Riva TTS:" |
461 | 476 | ] |
462 | 477 | }, |
|
481 | 496 | "ssml_text = '<speak>You say <phoneme alphabet=\"ipa\" ph=\"təˈmeɪˌtoʊ\">tomato</phoneme>, I say <phoneme alphabet=\"ipa\" ph=\"təˈmɑˌtoʊ\">tomato</phoneme>.</speak>'\n", |
482 | 497 | "# Older arpabet version\n", |
483 | 498 | "# ssml_text = '<speak>You say <phoneme alphabet=\"x-arpabet\" ph=\"{@T}{@AH0}{@M}{@EY1}{@T}{@OW2}\">tomato</phoneme>, I say <phoneme alphabet=\"x-arpabet\" ph=\"{@T}{@AH0}{@M}{@AA1}{@T}{@OW2}\">tomato</phoneme>.</speak>'\n", |
| 499 | + "custom_pronunciation = {\n", |
| 500 | + " \"tomato\": \"təˈmeɪˌtoʊ\"\n", |
| 501 | + "}\n", |
| 502 | + "print(\"Raw Text: \", raw_text)\n", |
| 503 | + "print(\"SSML Text: \", ssml_text)\n", |
| 504 | + "\n", |
| 505 | + "req[\"text\"] = ssml_text\n", |
| 506 | + "# Request to Riva TTS to synthesize audio\n", |
| 507 | + "resp = riva_tts.synthesize(**req)\n", |
| 508 | + "\n", |
| 509 | + "# Playing the generated audio from Riva TTS request\n", |
| 510 | + "audio_samples = np.frombuffer(resp.audio, dtype=np.int16)\n", |
| 511 | + "ipd.display(ipd.Audio(audio_samples, rate=sample_rate_hz))\n", |
| 512 | + "\n", |
| 513 | + "# Passing custom pronunciation dictionary\n", |
| 514 | + "ssml_text = '<speak>You say tomato, I say <phoneme alphabet=\"ipa\" ph=\"təˈmɑˌtoʊ\">tomato</phoneme>.</speak>'\n", |
484 | 515 | "\n", |
485 | 516 | "print(\"Raw Text: \", raw_text)\n", |
486 | 517 | "print(\"SSML Text: \", ssml_text)\n", |
487 | 518 | "\n", |
488 | 519 | "req[\"text\"] = ssml_text\n", |
| 520 | + "req[\"custom_pronunciation\"] = custom_pronunciation\n", |
489 | 521 | "# Request to Riva TTS to synthesize audio\n", |
490 | 522 | "resp = riva_tts.synthesize(**req)\n", |
491 | 523 | "\n", |
|
500 | 532 | "source": [ |
501 | 533 | "#### Expected results if you run the tutorial:\n", |
502 | 534 | "`You say <phoneme alphabet=\"ipa\" ph=\"təˈmeɪˌtoʊ\">tomato</phoneme>, I say <phoneme alphabet=\"ipa\" ph=\"təˈmɑˌtoʊ\">tomato</phoneme>.` \n", |
| 535 | + "\n", |
| 536 | + "<audio controls src=\"https://raw.githubusercontent.com/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_9.wav\" type=\"audio/wav\"></audio> \n", |
| 537 | + "\n", |
| 538 | + "`You say tomato, I say <phoneme alphabet=\"ipa\" ph=\"təˈmɑˌtoʊ\">tomato</phoneme>.`\n", |
| 539 | + "\n", |
503 | 540 | "<audio controls src=\"https://raw.githubusercontent.com/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_9.wav\" type=\"audio/wav\"></audio> \n" |
504 | 541 | ] |
505 | 542 | }, |
|