Reproducing command line outputs in Python

Hi @giulioisac,

Thanks for the tool! I am interested in getting mainly `pgen` but also the other 2 values and for some reason I can't seem to reproduce the values I get from the CLI output when running the tool in python.

CLI:

```
sonnia-evaluate --humanTRB -I examples/data_seqs.txt --ppost -m 100 -d ';'

Load file
Evaluate
Q, Pgen, Ppost
1.786677 4.942290497358142e-10 8.830276773314599e-10
1.6456201 4.3237277280217545e-09 7.115213289239722e-09
2.0089138 3.5477274260014983e-09 7.1270784265640965e-09
1.0092895 9.910969183031456e-17 1.0003037161956793e-16
1.798311 1.4803340841545318e-09 2.662101059959238e-09
5.0797396 3.746403591525355e-08 1.9030754571375525e-07
0.14478417 1.37841409574113e-13 1.9957253719663077e-14
```

Python:

```
data_seqs1 = pd.read_csv("examples/data_seqs.txt", sep="\t", header=None)
data_seqs1 = data_seqs1[0].str.split(";", expand=True)
data_seqs1.columns = ["amino_acid", "v_gene", "j_gene"] # Needed to rename sequence column to "amino_acid" for filtering below to work
data_seqs = data_seqs1

processor=Processing(pgen_model="humanTRB")
filtered=processor.filter_dataframe(data_seqs)
data_seqs=filtered.values.astype(str)

qm1 = SoNNia(data_seqs=data_seqs, pgen_model="humanTRB")
Q_data1, pgen_data1, ppost_data1=qm1.evaluate_seqs(qm1.data_seqs[:100])

# Differences small
pgen_data1
array([4.94229245e-10, 1.99541171e-08, 3.54772768e-09, 9.91096918e-17,
       1.48033408e-09, 3.74640359e-08, 1.37841410e-13, 1.98526937e-07,
       1.43200163e-11, 3.46324736e-10, 4.16811824e-09, 1.03809061e-08,
       1.72376642e-08, 4.82014923e-10, 6.20020725e-11, 9.42590252e-11,
       2.25398698e-09, 3.65392727e-10, 5.27807584e-09, 5.84623461e-11,
       3.17030155e-08, 3.96359164e-10, 9.29037599e-10, 1.76579645e-12,
       1.08268917e-08, 2.56016679e-13, 1.30235963e-10, 3.19233825e-10,
       6.75090265e-08, 2.35117709e-07, 3.89271870e-10, 9.98558613e-16,
       3.37153493e-09, 6.24969884e-11, 2.93591180e-11, 6.43661884e-10,
       5.08752822e-08, 1.94376216e-09, 1.10084859e-07, 5.34738132e-09,
       5.60749429e-14, 3.35795675e-11, 1.19253638e-07, 2.65969580e-13,
       5.11265175e-11, 7.16134374e-11, 2.41560496e-12, 1.06468133e-11,
       4.43674340e-14, 1.51565540e-11, 4.47998545e-08, 2.39192875e-11,
       9.52311065e-10, 1.92318629e-08, 1.97742070e-10, 1.50782577e-12,
       4.21257471e-07, 1.19164572e-11, 1.63224208e-09, 4.56943828e-08,
       3.14277779e-11, 5.81003913e-12, 2.21937534e-10, 3.73076110e-13,
       1.50830456e-14, 5.26059728e-09, 1.50432240e-10, 3.40850325e-13,
       3.37399715e-11, 1.55534010e-11, 1.01344702e-08, 3.89670014e-10,
       1.30991358e-09, 3.41665007e-09, 5.25815722e-09, 2.02214481e-09,
       1.12841860e-11, 2.72105890e-10, 2.12228048e-08, 2.55940514e-13,
       1.18065346e-08, 6.44081186e-10, 2.19239232e-12, 5.52554413e-08,
       1.15042186e-07, 2.30651662e-15, 7.22324101e-08, 6.90871974e-09,
       3.13131778e-12, 3.69202511e-09, 7.67474164e-09, 3.91507583e-08,
       5.02981302e-12, 1.53827722e-10, 4.60381726e-09, 1.74514168e-13,
       3.77775917e-10, 5.54589410e-12, 1.01835989e-08, 2.64841347e-07])

# The post and Q changes even when the seed is set.
```

I may need to do a bit more reading to fully understand the tool, so apologies if I’m missing something basic here!

Best,
Liezel


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reproducing command line outputs in Python #34

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reproducing command line outputs in Python #34

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions