Skip to content

Reproducing command line outputs in Python #34

@liezeltamon

Description

@liezeltamon

Hi @giulioisac,

Thanks for the tool! I am interested in getting mainly pgen but also the other 2 values and for some reason I can't seem to reproduce the values I get from the CLI output when running the tool in python.

CLI:

sonnia-evaluate --humanTRB -I examples/data_seqs.txt --ppost -m 100 -d ';'

Load file
Evaluate
Q, Pgen, Ppost
1.786677 4.942290497358142e-10 8.830276773314599e-10
1.6456201 4.3237277280217545e-09 7.115213289239722e-09
2.0089138 3.5477274260014983e-09 7.1270784265640965e-09
1.0092895 9.910969183031456e-17 1.0003037161956793e-16
1.798311 1.4803340841545318e-09 2.662101059959238e-09
5.0797396 3.746403591525355e-08 1.9030754571375525e-07
0.14478417 1.37841409574113e-13 1.9957253719663077e-14

Python:

data_seqs1 = pd.read_csv("examples/data_seqs.txt", sep="\t", header=None)
data_seqs1 = data_seqs1[0].str.split(";", expand=True)
data_seqs1.columns = ["amino_acid", "v_gene", "j_gene"] # Needed to rename sequence column to "amino_acid" for filtering below to work
data_seqs = data_seqs1

processor=Processing(pgen_model="humanTRB")
filtered=processor.filter_dataframe(data_seqs)
data_seqs=filtered.values.astype(str)

qm1 = SoNNia(data_seqs=data_seqs, pgen_model="humanTRB")
Q_data1, pgen_data1, ppost_data1=qm1.evaluate_seqs(qm1.data_seqs[:100])

# Differences small
pgen_data1
array([4.94229245e-10, 1.99541171e-08, 3.54772768e-09, 9.91096918e-17,
       1.48033408e-09, 3.74640359e-08, 1.37841410e-13, 1.98526937e-07,
       1.43200163e-11, 3.46324736e-10, 4.16811824e-09, 1.03809061e-08,
       1.72376642e-08, 4.82014923e-10, 6.20020725e-11, 9.42590252e-11,
       2.25398698e-09, 3.65392727e-10, 5.27807584e-09, 5.84623461e-11,
       3.17030155e-08, 3.96359164e-10, 9.29037599e-10, 1.76579645e-12,
       1.08268917e-08, 2.56016679e-13, 1.30235963e-10, 3.19233825e-10,
       6.75090265e-08, 2.35117709e-07, 3.89271870e-10, 9.98558613e-16,
       3.37153493e-09, 6.24969884e-11, 2.93591180e-11, 6.43661884e-10,
       5.08752822e-08, 1.94376216e-09, 1.10084859e-07, 5.34738132e-09,
       5.60749429e-14, 3.35795675e-11, 1.19253638e-07, 2.65969580e-13,
       5.11265175e-11, 7.16134374e-11, 2.41560496e-12, 1.06468133e-11,
       4.43674340e-14, 1.51565540e-11, 4.47998545e-08, 2.39192875e-11,
       9.52311065e-10, 1.92318629e-08, 1.97742070e-10, 1.50782577e-12,
       4.21257471e-07, 1.19164572e-11, 1.63224208e-09, 4.56943828e-08,
       3.14277779e-11, 5.81003913e-12, 2.21937534e-10, 3.73076110e-13,
       1.50830456e-14, 5.26059728e-09, 1.50432240e-10, 3.40850325e-13,
       3.37399715e-11, 1.55534010e-11, 1.01344702e-08, 3.89670014e-10,
       1.30991358e-09, 3.41665007e-09, 5.25815722e-09, 2.02214481e-09,
       1.12841860e-11, 2.72105890e-10, 2.12228048e-08, 2.55940514e-13,
       1.18065346e-08, 6.44081186e-10, 2.19239232e-12, 5.52554413e-08,
       1.15042186e-07, 2.30651662e-15, 7.22324101e-08, 6.90871974e-09,
       3.13131778e-12, 3.69202511e-09, 7.67474164e-09, 3.91507583e-08,
       5.02981302e-12, 1.53827722e-10, 4.60381726e-09, 1.74514168e-13,
       3.77775917e-10, 5.54589410e-12, 1.01835989e-08, 2.64841347e-07])

# The post and Q changes even when the seed is set.

I may need to do a bit more reading to fully understand the tool, so apologies if I’m missing something basic here!

Best,
Liezel

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions