What other problems have you gotten your implementation to converge with?

I'm just running your example (Python 3.6, TensorFlow 1.6, NVIDIA TITAN X, CUDA 9 with CuDNN):

git clone git@github.com:hannw/sgrnn.git
cd sgrnn
pip install .

wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
tar xvf simple-examples.tgz
rm simple-examples.tgz

python sgrnn/main.py --model=small --data_path=simple-examples/data/ \
    --num_gpus=0 --rnn_mode=BASIC --save_path=/tmp/sgrnn

Unfortunately, the training diverges every time:

Epoch: 1 Learning rate: 1.000
0.001 perplexity: 3462.042 speed: 628 wps
0.101 perplexity: 8128272430211485696.000 speed: 10973 wps
0.201 perplexity: 120207729490494550265823232.000 speed: 12286 wps
0.301 perplexity: 13565869158556072395952619520000.000 speed: 12739 wps
0.401 perplexity: 56232215737478433981029280594788352.000 speed: 12937 wps
0.501 perplexity: 12288526736969419778825356413524508672.000 speed: 13170 wps
0.601 perplexity: 632620625034758550604580899158593896448.000 speed: 13319 wps
0.701 perplexity: 10080105677426370330298322327795790774272.000 speed: 13431 wps
0.801 perplexity: 115000120853598177122656739292984641585152.000 speed: 13453 wps
0.901 perplexity: 857297091568710692624085315481136757997568.000 speed: 13512 wps
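
To put the numbers above in perspective, here is a rough sketch that converts the logged perplexities back into a per-word loss. It assumes the script reports perplexity as exp(mean per-word cross-entropy in nats), as the TensorFlow PTB tutorial does, and that the vocabulary has 10,000 words; I have not checked either against the sgrnn code, and `per_word_loss` and `VOCAB_SIZE` are just names picked for this illustration.

```python
import math

# Assumption: the PTB data in simple-examples has a 10,000-word vocabulary.
VOCAB_SIZE = 10000


def per_word_loss(perplexity):
    """Invert perplexity = exp(loss) to get the mean per-word loss in nats."""
    return math.log(perplexity)


# (epoch progress, perplexity) pairs copied from the log above.
logged = [
    (0.001, 3462.042),
    (0.101, 8128272430211485696.000),
    (0.901, 857297091568710692624085315481136757997568.000),
]

# Loss of a model that guesses uniformly over the vocabulary.
uniform_loss = math.log(VOCAB_SIZE)

for progress, ppl in logged:
    loss = per_word_loss(ppl)
    print("%.3f loss/word: %.2f nats (uniform guess: %.2f)"
          % (progress, loss, uniform_loss))
```

Under those assumptions, the first logged value is already close to the uniform-guess loss, and every later value is far above it, so the model ends up much worse than guessing at random rather than just learning slowly.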

Training diverges #1
