# controller.py (forked from Hendricks-Laboratory/OT2Control)
# 3471 lines (3083 loc) · 163 KB
'''
This module contains everything that the server needs to run. It is partly separate from the
OT2 because it needs different packages (the OT2 uses historic packages) and partly for
organizational purposes.
The core of this module is the ProtocolExecutor class. The ProtocolExecutor is responsible for
interfacing with the robot, the platereader, and Google Sheets. Its purpose is to load a reaction
protocol from Google Sheets and then execute that protocol line by line by communicating with the
robot and platereader. It attempts to do as much computation as possible before sending commands
to those applications.
The ProtocolExecutor uses a PlateReader.
PlateReader is a custom class that is built for controlling the platereader.
In order to control the platereader, its software should be closed when PlateReader
is instantiated, and (obviously) the software must be installed on the machine you're running on.
This module also contains two launchers.
launch_protocol_exec runs a protocol from the sheets using a protocol executor
launch_auto runs in automatic machine learning mode
A main method is supplied that will run if you run this script. It will call one of the launchers
based on command line args. (run this script with -h)
'''
from abc import ABC
from abc import abstractmethod
from collections import defaultdict
from collections import namedtuple
import socket
import json
import dill
import math
import os
import shutil
import webbrowser
from tempfile import NamedTemporaryFile
import logging
import asyncio
import threading
import time
import argparse
import re
import functools
import datetime
from bidict import bidict
import gspread
from df2gspread import df2gspread as d2g
from df2gspread import gspread2df as g2d
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
import numpy as np
import opentrons.execute
import opentrons.simulate
from opentrons import protocol_api, types
from boltons.socketutils import BufferedSocket
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import Lasso
from Armchair.armchair import Armchair
from ot2_robot import launch_eve_server
from df_utils import make_unique, df_popout, wslpath, error_exit
from ml_models import DummyMLModel, LinReg
from exceptions import ConversionError
def init_parser():
parser = argparse.ArgumentParser()
mode_help_str = 'mode=auto runs in ml, mode=protocol or not supplied runs protocol'
parser.add_argument('-m','--mode',help=mode_help_str,default='protocol')
parser.add_argument('-n','--name',help='the name of the google sheet')
parser.add_argument('-c','--cache',help='flag. if supplied, uses cache',action='store_true')
parser.add_argument('-s','--simulate',help='runs robot and pr in simulation mode',action='store_true')
parser.add_argument('--no-sim',help='won\'t run simulation at the start.',action='store_true')
parser.add_argument('--no-pr', help='won\'t invoke platereader, even in simulation mode',action='store_true')
return parser
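A minimal sketch of how the parser built by init_parser behaves. The flag names are taken
from the code above; the sheet name 'my_sheet' is a hypothetical placeholder.

```python
import argparse

# Rebuild the same parser shape as init_parser() above.
parser = argparse.ArgumentParser()
parser.add_argument('-m', '--mode', default='protocol')
parser.add_argument('-n', '--name')
parser.add_argument('-c', '--cache', action='store_true')
parser.add_argument('-s', '--simulate', action='store_true')
parser.add_argument('--no-sim', action='store_true')
parser.add_argument('--no-pr', action='store_true')

# argparse maps '--no-sim' to the attribute 'no_sim' (dashes become underscores),
# which is why main() can read args.no_sim below.
args = parser.parse_args(['-m', 'auto', '-n', 'my_sheet', '--no-sim'])
assert args.mode == 'auto' and args.name == 'my_sheet'
assert args.no_sim and not args.no_pr and not args.cache
```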
def main(serveraddr):
'''
prompts for input and then calls appropriate launcher
'''
parser = init_parser()
args = parser.parse_args()
if args.mode == 'protocol':
print('launching in protocol mode')
launch_protocol_exec(serveraddr,args.name,args.cache,args.simulate,args.no_sim,args.no_pr)
elif args.mode == 'auto':
print('launching in auto mode')
launch_auto(serveraddr,args.name,args.cache,args.simulate,args.no_sim,args.no_pr)
else:
print("invalid argument to mode, '{}'".format(args.mode))
parser.print_help()
def launch_protocol_exec(serveraddr, rxn_sheet_name, use_cache, simulate, no_sim, no_pr):
'''
main function to launch a controller and execute a protocol
'''
#instantiate a controller
if not rxn_sheet_name:
rxn_sheet_name = input('<<controller>> please input the sheet name ')
my_ip = socket.gethostbyname(socket.gethostname())
controller = ProtocolExecutor(rxn_sheet_name, my_ip, serveraddr, use_cache=use_cache)
if not no_sim:
controller.run_simulation(no_pr=no_pr)
if input('would you like to run the protocol? [yn] ').lower() == 'y':
controller.run_protocol(simulate, no_pr)
def launch_auto(serveraddr, rxn_sheet_name, use_cache, simulate, no_sim, no_pr):
'''
main function to launch an auto scientist that designs its own experiments
'''
if not rxn_sheet_name:
rxn_sheet_name = input('<<controller>> please input the sheet name ')
my_ip = socket.gethostbyname(socket.gethostname())
auto = AutoContr(rxn_sheet_name, my_ip, serveraddr, use_cache=use_cache)
#note shorter iterations for testing
model = MultiOutputRegressor(Lasso(warm_start=True, max_iter=int(1e1)))
final_spectra = np.loadtxt(
"test_target_1.csv", delimiter=',', dtype=float).reshape(1,-1)
Y_SHAPE = 1 #number of reagents to learn on
ml_model = LinReg(model, final_spectra, y_shape=Y_SHAPE, max_iters=3,
scan_bounds=(540,560), duplication=2)
if not no_sim:
auto.run_simulation(ml_model, no_pr=no_pr)
if input('would you like to run on robot and pr? [yn] ').lower() == 'y':
model = MultiOutputRegressor(Lasso(warm_start=True, max_iter=int(1e4)))
ml_model = LinReg(model, final_spectra, y_shape=Y_SHAPE, max_iters=24,
scan_bounds=(540,560),duplication=2)
auto.run_protocol(simulate=simulate, model=ml_model,no_pr=no_pr)
class Controller(ABC):
'''
This class is a shared interface for the ProtocolExecutor and the AutoContr
ATTRIBUTES:
armchair.Armchair portal: the Armchair object to ship files across
rxn_sheet_name: the name of the reaction sheet
str cache_path: path to a directory for all cache files
bool use_cache: read from cache if possible
str eve_files_path: the path to put files from eve
str debug_path: the path to place debugging information
str my_ip: the ip of this controller
str server_ip: the ip of the server. This is modified for simulation, but returned to
original state at the end of simulation
dict<str:object> robo_params: convenient place for the parameters for the robot
+ bool using_temp_ctrl: True if the temperature control is being used
+ float temp: the temperature in Celsius to keep the temp control at
+ df reagent_df: holds information about reagents
+ float conc: the concentration
+ str loc: location on labware
+ int deck_pos: the position on the deck
+ float mass: the mass of the tube with reagent and cap
dict<str:str> instruments: maps 'left' and 'right' to the pipette names
df labware_df
+ int deck_pos: the position of the labware on the deck
+ str name: the name of the labware
+ str first_usable: a location of the first usable tip/well on labware
+ list<str> empty_list: a list of locations on the labware that have empty tubes
df product_df: This information is used to figure out where to put chemicals
+ INDEX
+ str chemical_name: the name of the chemical
+ COLS
+ str labware: the requested labware you want to put it in
+ str container: the container you want to put it in
+ float max_vol: the maximum volume you will put in the container
bool simulate: whether a simulation is being run or not. False by default. changed true
temporarily when simulating
int buff_size: the size of the buffer between Armchair commands. Its size
corresponds to the number of commands you want to pile up in the socket buffer.
Really more for developers
PRIVATE ATTRS:
dict<str:ChemCacheEntry> _cached_reader_locs: chemical information from the robot
ChemCacheEntry is a named tuple with below attributes
The tuple has following structure:
str loc: the loc of the well on its labware (translated to human if on pr)
int deck_pos: the position of the labware it's on
float vol: the volume in the container
float aspirable_vol: the volume minus dead vol
CONSTANTS:
bidict<str:tuple<str,str>> PLATEREADER_INDEX_TRANSLATOR: used to translate from locs on
wellplate to locs on the opentrons object. Use a json viewer for more structural info
METHODS:
run_protocol(simulate, port) void: both args have good defaults. simulate can be used to
simulate on the plate reader and robot, but generally you want false to actually run
the protocol. port can be configured, but 50000 is default
run_simulation() int: runs a simulation on local machine. Tries plate reader, but
not necessary. returns an error code
close_connection() void: automatically called by run_protocol. used to terminate a
connection with eve
init_robot(simulate): used to initialize the robot. called automatically in run. simulate
is the same as used by the robot protocol
translate_wellmap() void: used to convert a wellmap.tsv from robot to wells locs
that correspond to platereader
'''
#this has two keys, 'deck_pos' and 'loc'. They map to the plate reader and the loc on that plate
#reader given a regular loc for a 96well plate.
#Please do not read this. paste it into a nice json viewer.
PLATEREADER_INDEX_TRANSLATOR = bidict({'A1': ('E1', 'platereader4'), 'A2': ('D1', 'platereader4'), 'A3': ('C1', 'platereader4'), 'A4': ('B1', 'platereader4'), 'A5': ('A1', 'platereader4'), 'A12': ('A1', 'platereader7'), 'A11': ('B1', 'platereader7'), 'A10': ('C1', 'platereader7'), 'A9': ('D1', 'platereader7'), 'A8': ('E1', 'platereader7'), 'A7': ('F1', 'platereader7'), 'A6': ('G1', 'platereader7'), 'B1': ('E2', 'platereader4'), 'B2': ('D2', 'platereader4'), 'B3': ('C2', 'platereader4'), 'B4': ('B2', 'platereader4'), 'B5': ('A2', 'platereader4'), 'B6': ('G2', 'platereader7'), 'B7': ('F2', 'platereader7'), 'B8': ('E2', 'platereader7'), 'B9': ('D2', 'platereader7'), 'B10': ('C2', 'platereader7'), 'B11': ('B2', 'platereader7'), 'B12': ('A2', 'platereader7'), 'C1': ('E3', 'platereader4'), 'C2': ('D3', 'platereader4'), 'C3': ('C3', 'platereader4'), 'C4': ('B3', 'platereader4'), 'C5': ('A3', 'platereader4'), 'C6': ('G3', 'platereader7'), 'C7': ('F3', 'platereader7'), 'C8': ('E3', 'platereader7'), 'C9': ('D3', 'platereader7'), 'C10': ('C3', 'platereader7'), 'C11': ('B3', 'platereader7'), 'C12': ('A3', 'platereader7'), 'D1': ('E4', 'platereader4'), 'D2': ('D4', 'platereader4'), 'D3': ('C4', 'platereader4'), 'D4': ('B4', 'platereader4'), 'D5': ('A4', 'platereader4'), 'D6': ('G4', 'platereader7'), 'D7': ('F4', 'platereader7'), 'D8': ('E4', 'platereader7'), 'D9': ('D4', 'platereader7'), 'D10': ('C4', 'platereader7'), 'D11': ('B4', 'platereader7'), 'D12': ('A4', 'platereader7'), 'E1': ('E5', 'platereader4'), 'E2': ('D5', 'platereader4'), 'E3': ('C5', 'platereader4'), 'E4': ('B5', 'platereader4'), 'E5': ('A5', 'platereader4'), 'E6': ('G5', 'platereader7'), 'E7': ('F5', 'platereader7'), 'E8': ('E5', 'platereader7'), 'E9': ('D5', 'platereader7'), 'E10': ('C5', 'platereader7'), 'E11': ('B5', 'platereader7'), 'E12': ('A5', 'platereader7'), 'F1': ('E6', 'platereader4'), 'F2': ('D6', 'platereader4'), 'F3': ('C6', 'platereader4'), 'F4': ('B6', 'platereader4'), 'F5': ('A6', 
'platereader4'), 'F6': ('G6', 'platereader7'), 'F7': ('F6', 'platereader7'), 'F8': ('E6', 'platereader7'), 'F9': ('D6', 'platereader7'), 'F10': ('C6', 'platereader7'), 'F11': ('B6', 'platereader7'), 'F12': ('A6', 'platereader7'), 'G1': ('E7', 'platereader4'), 'G2': ('D7', 'platereader4'), 'G3': ('C7', 'platereader4'), 'G4': ('B7', 'platereader4'), 'G5': ('A7', 'platereader4'), 'G6': ('G7', 'platereader7'), 'G7': ('F7', 'platereader7'), 'G8': ('E7', 'platereader7'), 'G9': ('D7', 'platereader7'), 'G10': ('C7', 'platereader7'), 'G11': ('B7', 'platereader7'), 'G12': ('A7', 'platereader7'), 'H1': ('E8', 'platereader4'), 'H2': ('D8', 'platereader4'), 'H3': ('C8', 'platereader4'), 'H4': ('B8', 'platereader4'), 'H5': ('A8', 'platereader4'), 'H6': ('G8', 'platereader7'), 'H7': ('F8', 'platereader7'), 'H8': ('E8', 'platereader7'), 'H9': ('D8', 'platereader7'), 'H10': ('C8', 'platereader7'), 'H11': ('B8', 'platereader7'), 'H12': ('A8', 'platereader7')})
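The real constant above is a bidict, whose `.inverse` attribute provides the reverse
mapping for free. A two-entry sketch with plain dicts (entries copied from the table
above) shows both lookup directions:

```python
# 96-well plate loc -> (platereader loc, platereader labware name)
forward = {'A1': ('E1', 'platereader4'),
           'A12': ('A1', 'platereader7')}
# bidict exposes this as PLATEREADER_INDEX_TRANSLATOR.inverse;
# built by hand here for illustration.
inverse = {v: k for k, v in forward.items()}

# Forward: where does well A1 of the 96-well plate live on the reader?
assert forward['A1'] == ('E1', 'platereader4')
# Inverse: which 96-well loc corresponds to A1 on platereader7?
assert inverse[('A1', 'platereader7')] == 'A12'
```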
ChemCacheEntry = namedtuple('ChemCacheEntry',['loc','deck_pos','vol','aspirable_vol'])
DilutionParams = namedtuple('DilutionParams', ['cont','vol'])
def __init__(self, rxn_sheet_name, my_ip, server_ip, buff_size=4, use_cache=False, cache_path='Cache'):
'''
Note that init does not initialize the portal. This must be done explicitly or by calling
a run function that creates a portal. The portal is not passed to init because although
the code must not use more than one portal at a time, the portal may change over the
lifetime of the class
Note that pr cannot be initialized until you know if you're simulating or not, so it
is instantiated in run
'''
#set according to input
self.cache_path=cache_path
self._make_cache()
self.use_cache = use_cache
self.my_ip = my_ip
self.server_ip = server_ip
self.buff_size = buff_size
self.rxn_sheet_name = rxn_sheet_name
self.simulate = False #by default will be changed if a simulation is run
self._cached_reader_locs = {} #maps wellname to loc on platereader
#this will be gradually filled
self.robo_params = {}
#necessary helper params
self._check_cache_metadata(rxn_sheet_name)
credentials = self._init_credentials(rxn_sheet_name)
self.wks_key_pairs = self._get_wks_key_pairs(credentials, rxn_sheet_name)
self.name_key_wks = self._get_key_wks(credentials)
wks_key = self._get_wks_key(credentials, rxn_sheet_name)
rxn_spreadsheet = self._open_sheet(rxn_sheet_name, credentials)
header_data = self._download_sheet(rxn_spreadsheet,0)
self.header_data = header_data
input_data = self._download_sheet(rxn_spreadsheet,1)
deck_data = self._download_sheet(rxn_spreadsheet, 2)
self._init_robo_header_params(header_data)
self._make_out_dirs(header_data)
self.rxn_df = self._load_rxn_df(input_data) #products init here
self.tot_vols = self._get_tot_vols(input_data) #NOTE we're moving more and more info
#to the controller. It may make sense to build a class at some point
self._query_reagents(wks_key, credentials)
raw_reagent_df = self._download_reagent_data(wks_key, credentials)#will be replaced soon
#with a parsed reagent_df. This is exactly as is pulled from gsheets
empty_containers = self._get_empty_containers(raw_reagent_df)
self.robo_params['dry_containers'] = self._get_dry_containers(raw_reagent_df)
products_to_labware = self._get_products_to_labware(input_data)
self.robo_params['reagent_df'] = self._parse_raw_reagent_df(raw_reagent_df)
self.robo_params['instruments'] = self._get_instrument_dict(deck_data)
self.robo_params['labware_df'] = self._get_labware_df(deck_data, empty_containers)
self.robo_params['product_df'] = self._get_product_df(products_to_labware)
def _insert_tot_vol_transfer(self):
'''
inserts a row into self.rxn_df that transfers volume from WaterC1.0 to fill
the necessary products
Postconditions:
has inserted a row into the rxn_df to transfer WaterC1.0
If the reaction has already overflowed the total volume, will add negative volume
(which is impossible. The caller of this function must account for this.)
If no total vols were specified, no transfer step will be inserted.
'''
#if there are no total vols, don't insert the row, just return
if self.tot_vols:
end_vols = pd.Series(self.tot_vols)
start_vols = pd.Series([self._vol_calc(name)
for name in end_vols.index], index=end_vols.index)
del_vols = end_vols - start_vols
#begin building a dictionary for the row to insert
transfer_row_dict = {col:del_vols[col] if col in del_vols else np.nan
for col in self.rxn_df.columns}
#now the dict maps every col to NaN except the chemicals to add, which map to the float volume to add
transfer_row_dict.update(
{'op':'transfer',
'reagent':'Water',
'conc':1.0,
'chemical_name':'WaterC1.0',
'callbacks':''}
)
for chem_name in self._products:
if pd.isna(transfer_row_dict[chem_name]):
transfer_row_dict[chem_name] = 0.0
#convert the row to a dataframe
transfer_row_df = pd.DataFrame(transfer_row_dict, index=[-1], columns=self.rxn_df.columns)
self.rxn_df = pd.concat((transfer_row_df, self.rxn_df)) #prepend the transfer row
self.rxn_df.index += 1 #update index to go 0-n instead of -1-n-1
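The prepend trick used above (build the new row at index -1, concat it in front, then
shift the index back to 0..n) can be sketched in isolation. Column names 'op' and
'ProductA' are hypothetical stand-ins for the real rxn_df columns.

```python
import numpy as np
import pandas as pd

rxn_df = pd.DataFrame({'op': ['transfer', 'scan'], 'ProductA': [10.0, np.nan]})

# New row at index -1 so it sorts in front conceptually.
new_row = pd.DataFrame({'op': 'transfer', 'ProductA': 5.0},
                       index=[-1], columns=rxn_df.columns)
rxn_df = pd.concat((new_row, rxn_df))
rxn_df.index += 1  # index now runs 0..n instead of -1..n-1

assert list(rxn_df.index) == [0, 1, 2]
assert rxn_df.loc[0, 'ProductA'] == 5.0
```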
def _get_tot_vols(self, input_data):
'''
params:
list<obj> input_data: as parsed from the google sheets
returns:
dict<str:float>: maps product names to their appropriate total volumes if specified
Preconditions:
self._products has been initialized
'''
product_start_i = input_data[0].index('reagent (must be uniquely named)')+1
product_tot_vols = input_data[3][product_start_i:]
return {product:float(tot_vol) for product, tot_vol in zip(self._products, product_tot_vols) if tot_vol}
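The comprehension above relies on the products aligning positionally with the tot_vol
row, and on blank cells being dropped because the empty string is falsy. A sketch with
hypothetical product names:

```python
products = ['ProductA', 'ProductB', 'ProductC']
product_tot_vols = ['200.0', '', '150']  # as pulled from the sheet: strings, blanks allowed

# Blank cells ('' is falsy) are filtered out by the trailing `if tot_vol`.
tot_vols = {product: float(tot_vol)
            for product, tot_vol in zip(products, product_tot_vols)
            if tot_vol}

assert tot_vols == {'ProductA': 200.0, 'ProductC': 150.0}
```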
def _check_cache_metadata(self, rxn_sheet_name):
'''
Checks a file, .metadata.json, within the cache path.
Postconditions:
If use_cache is true:
reads .metadata.json
asserts that the rxn_sheet_name matches the name in the cached metadata
prints the timestamp at which the cache was last written
If use_cache is false:
writes .metadata.json with the sheet name and a timestamp
'''
if self.use_cache:
assert (os.path.exists(os.path.join(self.cache_path, '.metadata.json'))), \
"tried to read metadata in cache, but file does not exist"
with open(os.path.join(self.cache_path, '.metadata.json'), 'r') as file:
metadata = json.load(file)
assert (metadata['name'] == rxn_sheet_name), "desired sheet was, '{}', but cached data is for '{}'".format(rxn_sheet_name, metadata['name'])
print("<<controller>> using cached data for '{}', last updated '{}'".format(
metadata['name'],metadata['timestamp']))
else:
metadata = {'timestamp':datetime.datetime.now().strftime('%d-%b-%Y %H:%M:%S:%f'),
'name':rxn_sheet_name}
with open(os.path.join(self.cache_path, '.metadata.json'), 'w') as file:
json.dump(metadata, file)
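The .metadata.json round trip above can be exercised on its own. A temp dir stands in
for the real cache path, and 'my_sheet' is a hypothetical sheet name.

```python
import datetime
import json
import os
import tempfile

cache_path = tempfile.mkdtemp()  # stand-in for self.cache_path

# Write side (use_cache == False): record the sheet name and a timestamp.
metadata = {'timestamp': datetime.datetime.now().strftime('%d-%b-%Y %H:%M:%S:%f'),
            'name': 'my_sheet'}
with open(os.path.join(cache_path, '.metadata.json'), 'w') as f:
    json.dump(metadata, f)

# Read side (use_cache == True): load and verify the name matches.
with open(os.path.join(cache_path, '.metadata.json'), 'r') as f:
    loaded = json.load(f)
assert loaded['name'] == 'my_sheet'
```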
def _get_key_wks(self, credentials):
gc = gspread.authorize(credentials)
name_key_wks = gc.open_by_url('https://docs.google.com/spreadsheets/d/1m2Uzk8z-qn2jJ2U1NHkeN7CJ8TQpK3R0Ai19zlAB1Ew/edit#gid=0').get_worksheet(0)
return name_key_wks
def _get_wks_key_pairs(self, credentials, rxn_sheet_name):
'''
open and search a sheet that tells you which sheet is associated with the reaction
Or read from cache if cache is enabled
params:
ServiceAccountCredentials credentials: to access the sheets
str rxn_sheet_name: the name of sheet
returns:
list<list<str>> name_key_pairs: the data in the wks_key spreadsheet
Postconditions:
If cached data could not be found, will dump spreadsheet data to name_key_pairs.pkl
in cache path
'''
if self.use_cache:
#load cache
with open(os.path.join(self.cache_path, 'name_key_pairs.pkl'), 'rb') as name_key_pairs_cache:
name_key_pairs = dill.load(name_key_pairs_cache)
else:
#pull down data
gc = gspread.authorize(credentials)
name_key_wks = gc.open_by_url('https://docs.google.com/spreadsheets/d/1m2Uzk8z-qn2jJ2U1NHkeN7CJ8TQpK3R0Ai19zlAB1Ew/edit#gid=0').get_worksheet(0)
name_key_pairs = name_key_wks.get_all_values() #list<list<str name, str key>>
#Note the key is a unique identifier that can be used to access the sheet
#d2g uses it to access the worksheet
#dump to cache
with open(os.path.join(self.cache_path, 'name_key_pairs.pkl'), 'wb') as name_key_pairs_cache:
dill.dump(name_key_pairs, name_key_pairs_cache)
return name_key_pairs
def _init_pr(self, simulate, no_pr):
'''
params:
bool simulate: True indicates that the platereader should be launched in simulation
mode
bool no_pr: True indicates that even if platereader can be run in simulation mode,
it should not be. This should be run only for the marginal speedup that can be
gained by not using the platereader for certain tests
Postconditions:
self.pr is initialized with a connection to the SPECTROstar if possible and
no_pr is False; otherwise a Dummy with no connection but the same interface
is supplied
'''
if no_pr:
self.pr = DummyReader(os.path.join(self.out_path, 'pr_data'))
else:
try:
self.pr = PlateReader(os.path.join(self.out_path, 'pr_data'), self.header_data, self.eve_files_path, simulate)
except Exception:
print('<<controller>> failed to initialize platereader, initializing dummy reader')
self.pr = DummyReader(os.path.join(self.out_path, 'pr_data'))
def _download_sheet(self, rxn_spreadsheet, index):
'''
pulls down the sheet at the index
params:
gspread.Spreadsheet rxn_spreadsheet: the sheet with all the reactions
int index: the index of the sheet to pull down
returns:
list<list<str>> data: the input template sheet pulled down into a list
'''
if self.use_cache:
with open(os.path.join(self.cache_path,'wks_data{}.pkl'.format(index)), 'rb') as rxn_wks_data_cache:
data = dill.load(rxn_wks_data_cache)
else:
rxn_wks = rxn_spreadsheet.get_worksheet(index)
data = rxn_wks.get_all_values()
with open(os.path.join(self.cache_path,'wks_data{}.pkl'.format(index)),'wb') as rxn_wks_data_cache:
dill.dump(data, rxn_wks_data_cache)
return data
def _make_out_dirs(self, header_data):
'''
params:
list<list<str>> header_data: data from the header
Postconditions:
All paths used by this class have been initialized if they were not before
They are not overwritten if they already exist. paths variables of this class
have also been initialized
'''
out_path = 'Ideally this would be a gdrive path, but for now everything is local'
if not os.path.exists(out_path):
#not on the laptop
out_path = '/mnt/c/Users/science_356_lab/Robot_Files/Protocol_Outputs'
#get the root folder
header_dict = {row[0]:row[1] for row in header_data[1:]}
data_dir = header_dict['data_dir']
self.out_path = os.path.join(out_path, data_dir)
#if the folder doesn't exist yet, make it
self.eve_files_path = os.path.join(self.out_path, 'Eve_Files')
self.debug_path = os.path.join(self.out_path, 'Debug')
self.plot_path = os.path.join(self.out_path, 'Plots')
paths = [self.out_path, self.eve_files_path, self.debug_path, self.plot_path]
for path in paths:
if not os.path.exists(path):
os.makedirs(path)
def _make_cache(self):
if not os.path.exists(self.cache_path):
os.makedirs(self.cache_path)
def _init_credentials(self, rxn_sheet_name):
'''
this function reads a local json keyfile to get the credentials needed to access the sheets
params:
str rxn_sheet_name: the name of the reaction sheet to run
returns:
ServiceAccountCredentials: the credentials to access that sheet
'''
scope = ['https://spreadsheets.google.com/feeds',
'https://www.googleapis.com/auth/drive']
#get login credentials from local file. Your json file here
path = 'Credentials/hendricks-lab-jupyter-sheets-5363dda1a7e0.json'
credentials = ServiceAccountCredentials.from_json_keyfile_name(path, scope)
return credentials
def _get_wks_key(self, credentials, rxn_sheet_name):
'''
open and search a sheet that tells you which sheet is associated with the reaction
params:
ServiceAccountCredentials credentials: to access the sheets
str rxn_sheet_name: the name of sheet
returns:
str wks_key: the key associated with the sheet. It functions similarly to a url
'''
name_key_pairs = self.wks_key_pairs
try:
i=0
wks_key = None
while not wks_key and i <= len(name_key_pairs):
row = name_key_pairs[i]
if row[0] == rxn_sheet_name:
wks_key = row[1]
i+=1
except IndexError:
raise Exception('Spreadsheet Name/Key pair was not found. Check the dict spreadsheet \
and make sure the spreadsheet name is spelled exactly the same as the reaction \
spreadsheet.')
return wks_key
def _open_sheet(self, rxn_sheet_name, credentials):
'''
open the google sheet
params:
str rxn_sheet_name: the title of the sheet to be opened
oauth2client.ServiceAccountCredentials credentials: credentials read from a local json
returns:
if self.use_cache:
None: this is fine because the spreadsheet is never used when reading from cache
else:
gspread.Spreadsheet: the spreadsheet (probably of all the reactions)
'''
gc = gspread.authorize(credentials)
try:
if self.use_cache:
wks = None
else:
wks = gc.open(rxn_sheet_name)
except Exception:
raise Exception('Spreadsheet Not Found: Make sure the spreadsheet name is spelled correctly and that it is shared with the robot ')
return wks
def _init_robo_header_params(self, header_data):
'''
loads the header data into self.robo_params
params:
list<list<str> header_data: as in gsheets
Postconditions:
simulate, using_temp_ctrl, and temp have been initialized according to values in
excel
'''
header_dict = {row[0]:row[1] for row in header_data[1:]}
self.robo_params['using_temp_ctrl'] = header_dict['using_temp_ctrl'] == 'yes'
self.robo_params['temp'] = float(header_dict['temp']) if self.robo_params['using_temp_ctrl'] else None
if self.robo_params['temp'] is not None:
assert( self.robo_params['temp'] >= 4 and self.robo_params['temp'] <= 95), "invalid temperature"
self.dilution_params = self.DilutionParams(header_dict['dilution_cont'],
float(header_dict['dilution_vol']))
def _plot_setup_overlay(self,title):
'''
Sets up a figure for an overlay plot
params:
str title: the title of the reaction
'''
#formats the figure nicely
plt.figure(num=None, figsize=(4, 4),dpi=300, facecolor='w', edgecolor='k')
plt.legend(loc="upper right",frameon = False, prop={"size":7},labelspacing = 0.5)
plt.rc('axes', linewidth = 2)
plt.xlabel('Wavelength (nm)',fontsize = 16)
plt.ylabel('Absorbance (a.u.)', fontsize = 16)
plt.tick_params(axis = "both", width = 2)
plt.xticks([300,400,500,600,700,800,900,1000])
plt.yticks([i/10 for i in range(0,11,1)])
plt.axis([300, 1000, 0.0 , 1.0])
plt.xticks(fontsize = 10)
plt.yticks(fontsize = 10)
plt.title(str(title), fontsize = 16, pad = 20)
def plot_LAM_overlay(self,df,wells,filename=None):
'''
plots overlayed spectra of wells in the order that they are specified
params:
df df: dataframe with columns = chem_names, and values of each column is a series
of scans in 701 intervals.
str filename: the title of the plot, and the file
list<str> wells: an ordered list of all of the chem_names you want to plot.
Postconditions:
plot has been written with name "overlay.png" to the plotting dir, or
{filename}.png if filename was supplied
'''
if not filename:
filename = "overlay"
x_vals = list(range(300,1001))
#overlays only things you specify
y = []
#df = df[df_reorder]
#headers = [well_key[k] for k in df.columns]
#legend_colors = []
for chem_name in wells:
y.append(df[chem_name].iloc[-701:].to_list())
self._plot_setup_overlay(filename)
colors = list(cm.rainbow(np.linspace(0, 1,len(y))))
for i in range(len(y)):
plt.plot(x_vals,y[i],color = tuple(colors[i]))
patches = [mpatches.Patch(color=color, label=label) for label, color in zip(wells, colors)]
plt.legend(patches, wells, loc='upper right', frameon=False,prop={'size':3})
legend = pd.DataFrame({'Color':patches,'Labels': wells})
plt.savefig(os.path.join(self.plot_path, '{}.png'.format(filename)))
plt.close()
# below until ~end is all not used yet needs to be worked up
def plot_kin_subplots(self,df,n_cycles,wells,filename=None):
'''
TODO this function doesn't save properly, but it does show. Don't know issue
plots kinetics for each well in the order given by wells.
params:
df df: the scan data
int n_cycles: the number of cycles for the scan data
list<str> wells: the wells you want to plot in order
Postconditions:
plot has been written as {filename}.png to the plotting dir.
If filename is not supplied, the name is kin_subplots
'''
if not filename:
filename = 'kin_subplots'
x_vals = list(range(300,1001))
colors = list(cm.rainbow(np.linspace(0, 1, n_cycles)))
fig, axes = plt.subplots(8, 12, dpi=300, figsize=(50, 50), sharex=True, sharey=True, subplot_kw=dict(box_aspect=1))
for idx, (chem_name, ax) in enumerate(zip(wells, axes.flatten())):
ax.set_title(chem_name)
self._plot_kin(ax, df, n_cycles, chem_name)
plt.subplots_adjust(wspace=0.3, hspace= -0.1)
ax.tick_params(
which='both',
bottom=False,
left=False,
right=False,
top=False
)
ax.set_xlim((300,1000))
ax.set_ylim((0,1.0))
ax.set_xlabel("Wavelength (nm)")
ax.set_ylabel("Absorbance (A.U.)")
ax.set_xticks(range(301, 1100, 100))
#ax.set_aspect(adjustable='box')
#ax.set_yticks(range(0,1))
else:
[ax.set_visible(False) for ax in axes.flatten()[idx+1:]]
plt.savefig(os.path.join(self.plot_path, '{}.png'.format(filename)))
plt.close()
def _plot_kin(self, ax, df, n_cycles, chem_name):
'''
helper method for kinetics plotting methods
params:
plt.axes ax: or anything with a plot func. the place you want to plot
df df: the scan data
int n_cycles: the number of scan cycles per well
str chem_name: the name of the chemical to be plotted
Postconditions:
a kinetics plot of the well has been plotted on ax
'''
x_vals = list(range(300,1001))
colors = list(cm.rainbow(np.linspace(0, 1, n_cycles)))
col = df[chem_name]
for kin in range(n_cycles):
ax.plot(x_vals, col.iloc[kin*701:(kin+1)*701], color=tuple(colors[kin]))
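The 701-row block slicing above relies on each scan cycle spanning wavelengths 300-1000 nm inclusive. A minimal standalone sketch of that indexing, using a synthetic column in place of real scan data:

```python
# Standalone sketch of the cycle slicing used by _plot_kin. Each scan cycle
# covers wavelengths 300-1000 nm inclusive (701 points), so cycle k of a
# well's column occupies rows [k*701, (k+1)*701). The data here is synthetic.
import numpy as np
import pandas as pd

n_cycles = 3
col = pd.Series(np.arange(n_cycles * 701))  # stand-in for df[chem_name]
for kin in range(n_cycles):
    cycle = col.iloc[kin * 701:(kin + 1) * 701]
    assert len(cycle) == 701       # one full wavelength sweep per cycle
assert int(col.iloc[701]) == 701   # cycle 1 starts right after cycle 0
```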
def plot_single_kin(self, df, n_cycles, chem_name, filename=None):
'''
plots one kinetics trace.
params:
df df: the scan data
int n_cycles: the number of scan cycles per well
str chem_name: the name of the chemical to be plotted
str filename: the name of the file to write
Postconditions:
A kinetics trace of the well has been written to the Plots directory.
under the name filename. If filename was None, the filename will be
{chem_name}_kinetics.png
'''
if not filename:
filename = '{}_kinetics'.format(chem_name)
self._plot_setup_overlay('Kinetics {}: '.format(chem_name))
self._plot_kin(plt,df, n_cycles, chem_name)
plt.savefig(os.path.join(self.plot_path, '{}.png'.format(filename)))
plt.close()
def _get_empty_containers(self, raw_reagent_df):
'''
only one line, but there's a lot going on. extracts the empty container rows from the raw_reagent_df
params:
df raw_reagent_df: as in reagent_info of excel
returns:
df empty_containers:
+ INDEX:
+ int deck_pos: the position on the deck
+ COLS:
+ str loc: location on the labware
'''
return raw_reagent_df.loc['empty' == raw_reagent_df.index].set_index('deck_pos').drop(columns=['conc', 'mass'])
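Since that one-liner packs several steps together, here is a toy illustration of what it produces. The frame below is hypothetical and only mimics the shape of the raw reagent sheet, not its full schema:

```python
# Toy illustration of _get_empty_containers' one-liner. The frame mimics the
# raw reagent sheet: index is the chemical name, 'empty' marks empty tubes.
import pandas as pd

raw = pd.DataFrame(
    {'conc': ['1.0', '', ''], 'mass': ['5.2', '', ''],
     'loc': ['A1', 'B2', 'C3'], 'deck_pos': [3, 5, 5]},
    index=['NaCl', 'empty', 'empty'])
empty = raw.loc['empty' == raw.index].set_index('deck_pos').drop(columns=['conc', 'mass'])
# empty is indexed by deck_pos and keeps only the 'loc' column:
#   deck_pos 5 -> B2, deck_pos 5 -> C3
```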
def _get_dry_containers(self, raw_reagent_df):
'''
params:
df raw_reagent_df: the reagent dataframe as received from excel
returns:
df dry_containers:
note: cannot be sent over pickle as is because the index has duplicates.
solution is to reset the index for shipping
+ str index: the chemical name
+ float conc: the concentration once built
+ str loc: the location on the labware
+ int deck_pos: position on the deck
+ float required_vol: the volume of water needed to turn this into a reagent
'''
#other rows will be empty str unless dry
dry_containers = raw_reagent_df.loc[raw_reagent_df['molar_mass'].astype(bool)].astype(
{'deck_pos':int,'mass':float,'molar_mass':float})
dry_containers.drop(columns='conc',inplace=True)
dry_containers.reset_index(inplace=True)
dry_containers['index'] = dry_containers['index'].apply(lambda x: x.replace(' ','_'))
return dry_containers
def _parse_raw_reagent_df(self, raw_reagent_df):
'''
parses the raw_reagent_df into final form for reagent_df
params:
df raw_reagent_df: as in excel
returns:
df reagent_df: empties ignored, columns with correct types
'''
# in case 'empty' is not in the index
reagent_df = raw_reagent_df.drop(['empty'], errors='ignore')
reagent_df = reagent_df.loc[~reagent_df['molar_mass'].astype(bool)] #drop dry
reagent_df.drop(columns='molar_mass',inplace=True)
try:
reagent_df = reagent_df.astype({'conc':float,'deck_pos':int,'mass':float})
except ValueError as e:
raise ValueError("Your reagent info could not be parsed. Likely you left out a required field or did not specify a concentration on the input sheet")
return reagent_df
def _get_instrument_dict(self, deck_data):
'''
uses data from deck sheet to return the instrument params
Preconditions:
The second sheet in the worksheet must specify where you've placed reagents
and the first unused tip/well
params:
list<list<str>>deck_data: the deck data as in excel
returns:
Dict<str:str>: key is 'left' or 'right' for the slots. val is the name of instrument
'''
#row 13 of the deck data holds the instrument (pipette) mounted on each side
instruments = {}
instruments['left'] = deck_data[13][0]
instruments['right'] = deck_data[13][1]
return instruments
def _get_labware_df(self, deck_data, empty_containers):
'''
uses data from deck sheet to get information about labware locations, first tip, etc.
Preconditions:
The second sheet in the worksheet must specify where you've placed reagents
and the first unused tip/well
params:
list<list<str>>deck_data: the deck data as in excel
df empty_containers: this is used for tubes. it holds the containers that can be used
+ int index: deck_pos
+ str position: the position of the empty container on the labware
returns:
df:
+ str name: the common name of the labware
+ str first_usable: the first tip/well to use
+ int deck_pos: the position on the deck of this labware
+ str empty_list: the available slots for empty tubes format 'A1,B2,...' No specific
order
'''
labware_dict = {'name':[], 'first_usable':[],'deck_pos':[]}
for row_i in range(0,10,3):
for col_i in range(3):
labware_dict['name'].append(deck_data[row_i+1][col_i])
labware_dict['first_usable'].append(deck_data[row_i+2][col_i])
labware_dict['deck_pos'].append(deck_data[row_i][col_i])
labware_df = pd.DataFrame(labware_dict)
#platereader positions need to be translated, and they shouldn't be put in both
#slots
platereader_rows = labware_df.loc[(labware_df['name'] == 'platereader7') | \
(labware_df['name'] == 'platereader4')]
usable_rows = platereader_rows.loc[platereader_rows['first_usable'].astype(bool), 'first_usable']
assert (not usable_rows.empty), "please specify a first tip/well for the platereader"
assert (usable_rows.shape[0] == 1), "too many first wells specified for platereader"
platereader_input_first_usable = usable_rows.iloc[0]
platereader_name = self.PLATEREADER_INDEX_TRANSLATOR[platereader_input_first_usable][1]
platereader_first_usable = self.PLATEREADER_INDEX_TRANSLATOR[platereader_input_first_usable][0]
if platereader_name == 'platereader7':
platereader4_first_usable = 'F8' #anything larger than what is on plate
platereader7_first_usable = platereader_first_usable
else:
platereader4_first_usable = platereader_first_usable
platereader7_first_usable = 'G1'
labware_df.loc[labware_df['name']=='platereader4','first_usable'] = platereader4_first_usable
labware_df.loc[labware_df['name']=='platereader7','first_usable'] = platereader7_first_usable
labware_df = labware_df.loc[labware_df['name'] != ''] #remove empty slots
labware_df.set_index('deck_pos', inplace=True)
#add empty containers in list form
#there's some fancy formatting here that gets you a series with deck_pos as the
#index and comma separated loc strings e.g. 'A1,A3,B2' as values
grouped = empty_containers['loc'].apply(lambda pos: pos+',').groupby('deck_pos')
labware_locs = grouped.sum().apply(lambda pos: pos[:-1])
labware_df = labware_df.join(labware_locs, how='left')
labware_df['loc'] = labware_df['loc'].fillna('')
labware_df.rename(columns={'loc':'empty_list'},inplace=True)
labware_df.reset_index(inplace=True)
labware_df['deck_pos'] = pd.to_numeric(labware_df['deck_pos'])
return labware_df
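The groupby trick in the middle of that method is dense; a self-contained toy version (made-up locations) shows what it computes:

```python
# Toy version of the comma-joining trick in _get_labware_df: collapse each
# deck position's empty-container locations into one 'A1,A3'-style string.
import pandas as pd

empty_containers = pd.DataFrame(
    {'loc': ['A1', 'A3', 'B2']},
    index=pd.Index([4, 4, 7], name='deck_pos'))
grouped = empty_containers['loc'].apply(lambda pos: pos + ',').groupby('deck_pos')
labware_locs = grouped.sum().apply(lambda pos: pos[:-1])
# labware_locs: deck_pos 4 -> 'A1,A3', deck_pos 7 -> 'B2'
```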
def save(self):
self.portal.send_pack('save')
#server will initiate file transfer
files = self.portal.recv_ftp()
for filename, file_bytes in files:
with open(os.path.join(self.eve_files_path,filename), 'wb') as write_file:
write_file.write(file_bytes)
self.translate_wellmap()
def delete_wks_key(self):
'''
deletes key from the reaction key pair google sheet to prevent accidental
runs in the future
Postconditions:
if the key pair still exists, the key is deleted
'''
wks = self.name_key_wks
cell_list = wks.findall(str(self.rxn_sheet_name))
for cell in cell_list :
if cell:
wks.batch_clear(['B'+str(cell.row)])
def close_connection(self):
'''
runs through closing procedure with robot
Postconditions:
Log files have been written to self.out_path
Connection has been closed
'''
print('<<controller>> initializing breakdown')
self.save()
#server should now send a close command
self.portal.send_pack('close')
print('<<controller>> shutting down')
self.portal.close()
self.delete_wks_key()
def translate_wellmap(self):
'''
Preconditions:
there exists a file wellmap.tsv in self.eve_files, and that file has eve level
machine labels
Postconditions:
translated_wellmap.tsv has been created. translated is a copy of wellmap with
its locations translated to human locs, but the labware pos remains the same
'''
df = pd.read_csv(os.path.join(self.eve_files_path,'wellmap.tsv'), sep='\t')
df['loc'] = df.apply(lambda r: r['loc'] if (r['deck_pos'] not in [4,7]) else self.PLATEREADER_INDEX_TRANSLATOR.inv[(r['loc'],'platereader'+str(r['deck_pos']))],axis=1)
df.to_csv(os.path.join(self.eve_files_path,'translated_wellmap.tsv'),sep='\t',index=False)
def init_robot(self, simulate):
'''
this does the dirty work of sending accumulated params over network to the robot
params:
bool simulate: whether the robot should run a simulation
Postconditions:
robot has been initialized with necessary params
'''
#send robot data to initialize itself
#note reagent_df can have index with same name so index is reset for transfer
cid = self.portal.send_pack('init', simulate,
self.robo_params['using_temp_ctrl'], self.robo_params['temp'],
self.robo_params['labware_df'].to_dict(), self.robo_params['instruments'],
self.robo_params['reagent_df'].reset_index().to_dict(), self.my_ip,
self.robo_params['dry_containers'].to_dict())
@abstractmethod
def run_simulation(self):
pass
@abstractmethod
def run_protocol(self,simulate):
pass
def _error_handler(self, e):
'''
When an error is thrown from a public method, it will be sent here and handled
'''
#handle the error
if self.portal.state == 1:
#Armchair received an error packet, so eve had a problem
try:
eve_error = self.portal.error_payload[0]
print('''<<controller>>----------------Eve Error----------------
Eve threw error '{}'
Attempting to save state on exit
'''.format(eve_error))
self.portal.reset_error()
self.close_connection()
self.pr.shutdown()
finally:
raise eve_error
else:
try:
print('''<<controller>> ----------------Controller Error----------------
<<controller>> Attempting to save state on exit''')
self.close_connection()
self.pr.shutdown()
finally:
time.sleep(.5) #this is just for printing format. Not critical
raise e
def _load_rxn_df(self, input_data):
'''
reaches out to google sheets and loads the reaction protocol into a df and formats the df
adds a chemical name (primary key for lots of things. e.g. robot dictionaries)
renames some columns to code friendly as opposed to human friendly names
params:
list<list<str>> input_data: as received in excel
returns:
pd.DataFrame: the information in the rxn_spreadsheet w range index. spreadsheet cols
Postconditions:
self._products has been initialized to hold the names of all the products
'''
cols = make_unique(pd.Series(input_data[0]))
rxn_df = pd.DataFrame(input_data[4:], columns=cols)
#rename some of the clunkier columns
rxn_df.rename({'operation':'op', 'dilution concentration':'dilution_conc','max number of scans':'max_num_scans','concentration (mM)':'conc', 'reagent (must be uniquely named)':'reagent', 'plot protocol':'plot_protocol', 'pause time (s)':'pause_time', 'comments (e.g. new bottle)':'comments','scan protocol':'scan_protocol', 'scan filename (no extension)':'scan_filename', 'plot filename (no extension)':'plot_filename'}, axis=1, inplace=True)
rxn_df.drop(columns=['comments'], inplace=True)#comments are for humans
rxn_df.replace('', np.nan,inplace=True)
rxn_df[['pause_time','dilution_conc','conc','max_num_scans']] = rxn_df[['pause_time','dilution_conc','conc','max_num_scans']].astype(float)
rxn_df['reagent'] = rxn_df['reagent'].apply(lambda s: s if pd.isna(s) else s.replace(' ', '_'))
rxn_df['chemical_name'] = rxn_df[['conc', 'reagent']].apply(self._get_chemical_name,axis=1)
self._rename_products(rxn_df)
#go back for some non numeric columns
rxn_df['callbacks'] = rxn_df['callbacks'].fillna('')
self._products = rxn_df.loc[:,'reagent':'chemical_name'].drop(columns=['chemical_name', 'reagent']).columns
#make the reagent columns floats
rxn_df.loc[:,self._products] = rxn_df[self._products].astype(float)
rxn_df.loc[:,self._products] = rxn_df[self._products].fillna(0)
return rxn_df
@abstractmethod
def _rename_products(self, rxn_df):
'''
Different for Protocol Executor vs auto
renames dilutions according to the reagent that created them
and renames rxns to have a concentration
Preconditions:
dilution cols are named dilution_1/2 etc
callback is the last column in the dataframe
rxn_df is not expected to be initialized yet. This is a helper for the initialization
params:
df rxn_df: the dataframe with all the reactions
Postconditions:
the df has had its dilution columns renamed to a chemical name
'''
pass
def _get_products_to_labware(self, input_data):
'''
create a dictionary mapping products to their requested labware/containers
Preconditions:
self.rxn_df must have been initialized already
params:
list<list<str>> input data: the data from the excel sheet
returns:
Dict<str,list<str,str>>: effectively the 2nd and 3rd rows in excel. Gives
labware and container preferences for products
'''
cols = self.rxn_df.columns.to_list()
product_start_i = cols.index('reagent')+1
requested_containers = input_data[2][product_start_i+1:]
requested_labware = input_data[1][product_start_i+1:]#add one to account for the first col (labware)
#in df this is an index, so size cols is one less
products_to_labware = {product:[labware,container] for product, labware, container in zip(self._products, requested_labware,requested_containers)}
return products_to_labware
def _query_reagents(self, spreadsheet_key, credentials):
'''
query the user with a reagent sheet asking for more details on locations of reagents, mass
etc
Preconditions:
self.rxn_df should be initialized
params:
str spreadsheet_key: the unique id for the google sheet used for i/o with sheets
ServiceAccount Credentials credentials: to access sheets
PostConditions:
reagent_sheet has been constructed
'''
#you might make a reaction you don't want to specify at the start
reagent_df = self.rxn_df.loc[self.rxn_df['op'] != 'make', ['reagent', 'conc']]
reagent_df = reagent_df.groupby(['reagent','conc'], dropna=False).first().reset_index()
reagent_df.dropna(how='all',inplace=True)
rows_to_drop = []
duplicates = reagent_df['reagent'].duplicated(keep=False)
for i, reagent, conc in reagent_df.itertuples():
if duplicates[i] and pd.isna(conc):
rows_to_drop.append(i)
reagent_df.drop(index=rows_to_drop, inplace=True)
reagent_df.set_index('reagent',inplace=True)
reagent_df.fillna('',inplace=True)
#add water if necessary
needs_water = self.rxn_df['op'].apply(lambda x: x in ['make', 'dilution']).any()
if needs_water:
if 'Water' not in reagent_df.index:
reagent_df = pd.concat([reagent_df, pd.DataFrame({'conc': [1.0]}, index=['Water'])])
else:
reagent_df.loc['Water','conc'] = 1.0
#start dropping products
rxn_names = self._products.copy() #going to drop template, hence copy
rxn_names = rxn_names.drop('Template', errors='ignore') #Template will throw error
#we now need to split the rxn_names into reagent names and concs.
#There may be duplicate reagents, so we will make a dictionary with list values of
#concs
rxn_name_dict = {}
for name in rxn_names:
reagent = self._get_reagent(name)
conc = self._get_conc(name)
if reagent in rxn_name_dict:
#already exists, append to list
rxn_name_dict[reagent].append(conc)
else:
#doesn't exist, create list
rxn_name_dict[reagent] = [conc]
rxn_names = pd.Series(rxn_name_dict, name='conc',dtype=object)
#rxn_names is now a series of concentrations with reagents as keys
reagent_df = reagent_df.join(rxn_names, how='left', rsuffix='2')