I trained on the metaworld_bin-picking task with two different seeds, seed0 and seed1, and observed a large difference in success rates. seed1 reached a maximum success rate of 0.7, while seed0 stayed well below that throughout training, peaking at 0.4. The model was trained using 10 sample datasets. What could cause this discrepancy?
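
To help judge whether a gap like this could come from evaluation noise alone, here is a minimal sketch that puts a 95% Wilson confidence interval around each peak success rate. The number of evaluation episodes is not stated above, so `EVAL_EPISODES = 20` is purely an assumption for illustration, and `wilson_interval` is a hypothetical helper, not part of any training codebase.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# ASSUMPTION: episodes per evaluation; replace with the real value.
EVAL_EPISODES = 20

# Peak success rates reported in the question.
peak_success = {"seed0": 0.4, "seed1": 0.7}

intervals = {}
for name, rate in peak_success.items():
    successes = round(rate * EVAL_EPISODES)
    intervals[name] = wilson_interval(successes, EVAL_EPISODES)
    low, high = intervals[name]
    print(f"{name}: rate={rate:.2f}, 95% CI=({low:.3f}, {high:.3f})")
```

If the intervals overlap at the real episode count, part of the gap may be evaluation noise rather than a genuine seed effect; running more seeds and more evaluation episodes would narrow this down. Note that peak values taken over training are optimistically biased, so this check is only a rough guide.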
