move execute_insert to top module nesting so pickle does not err by kenben · Pull Request #87 · dssg/collate

kenben · 2017-05-15T20:46:58Z

I got a `pickle.PicklingError: Can't pickle <function Aggregation.execute_insert at [...]>`. Moving `execute_insert` out of the class to module level makes it picklable.
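For context, a minimal sketch of why a function's location matters to pickle: functions are serialized by reference (module plus qualified name), so the unpickling side must be able to look the function up by that path. The function bodies below are hypothetical stand-ins, not the code from this PR. On recent Python versions a function defined in a class body may pickle fine, so this sketch uses a nested function to trigger the same kind of lookup failure.

```python
import pickle


def execute_insert(get_engine, insert):
    """Module-level stand-in (hypothetical body) for the function moved in this PR."""
    return get_engine(), insert


def make_nested_insert():
    """Return a function defined inside another function.

    Its qualified name contains '<locals>', so pickle cannot find it by name.
    """
    def nested_insert(get_engine, insert):
        return get_engine(), insert
    return nested_insert


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if serialization fails."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type differs between Python versions.
        return False
    return True


if __name__ == "__main__":
    print("module-level function picklable:", can_pickle(execute_insert))
    print("nested function picklable:", can_pickle(make_nested_insert()))
```

Worker-based parallelism that ships callables to other processes generally needs them to pass this kind of check, which is why a top-level function is the safer choice.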


This can drastically simplify the writing of categorical comparisons:

``categorical(col, op, choices, include_null=True, maxlen=32)``

Args:
    col: the column name (or equivalent SQL expression)
    op: the SQL operation (e.g., '=' or '~' or 'LIKE')
    choices: A list or dictionary of values. When a dictionary is passed, the keys are a short name for the value.
    include_null: Should an extra `{col} is NULL` be added? (default True)
    maxlen: The maximum length of aggregate quantity names (default 32). Names longer than this will be truncated.

Returns: a dictionary of aggregate quantities to be passed to Aggregate()

A simple helper method to easily create many categorical columns from one source column by comparing it against many values. It effectively creates many quantities of the form "({col} {op} '{elt}')::INT" for elt in choices. The type of the comparison is converted to an integer so it can easily be used with 'sum' (for total count) and 'avg' (for relative fraction) aggregate functions.

By default, the aggregates are simply named "{col}_{op}_{choice}", but that can easily get long and exceed the maximum column name length. If any name ends up longer than ``maxlen`` characters (32 by default), then each aggregate name gets truncated with a sequential number appended to ensure that they remain identifiable and unique (but note that ordering is not preserved).

Use it like:

```py
from collate import collate
from collate.helpers import categorical

collate.Aggregate(categorical('food', '=', ['hamburger','hotdog','sock']), ['sum','avg'])
```
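To make the description above concrete, here is a standalone sketch of the kind of dictionary such a helper might build. It does not import collate and is not the library's implementation: the function name `categorical_sketch`, the `{col}__NULL` key, and the exact truncation scheme are assumptions made for illustration, and names are not sanitized into valid SQL identifiers.

```python
def categorical_sketch(col, op, choices, include_null=True, maxlen=32):
    """Build {name: SQL expression} pairs as described in the docstring above.

    Hypothetical re-creation for illustration only.
    """
    # A list of values is treated as a dict whose short names are the values themselves.
    if not isinstance(choices, dict):
        choices = {str(choice): choice for choice in choices}

    quantities = {
        "{}_{}_{}".format(col, op, short): "({} {} '{}')::INT".format(col, op, value)
        for short, value in choices.items()
    }
    if include_null:
        # Assumed key name for the extra NULL check.
        quantities["{}__NULL".format(col)] = "({} is NULL)::INT".format(col)

    # If any name is too long, truncate every name and append a sequential number.
    if any(len(name) > maxlen for name in quantities):
        shortened = {}
        for i, (name, expr) in enumerate(quantities.items()):
            suffix = "_{}".format(i)
            shortened[name[: maxlen - len(suffix)] + suffix] = expr
        quantities = shortened
    return quantities


if __name__ == "__main__":
    for name, expr in categorical_sketch("food", "=", ["hamburger", "hotdog", "sock"]).items():
        print(name, "->", expr)
```

Casting each comparison to an integer is what lets the same expression feed both `sum` (a count) and `avg` (a fraction).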


* Add support for restricting the "beginning of time"

Adds a new keyword parameter for SpacetimeAggregations that enables restricting the rows included in their calculations based upon an absolute minimum date.

Adds actual behavior tests for SpacetimeAggregation with testing.postgresql.

This takes a SQL connection to allow validation against a SQL server. By default, `Aggregation.execute()` will call the validate method.

SpacetimeAggregations now raise an error in the case where a date/interval combination happens to cross before the beginning of time (so long as the interval is not all). If this proves to be annoying, we can perhaps change it to be a warning or even add an optional override to explicitly allow this to occur. But I think it behooves us to start conservatively.
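The validation rule described here can be sketched in plain Python. This is an illustration under stated assumptions, not collate's code: the commit validates against a SQL server, whereas this hypothetical `check_beginning_of_time` helper uses `datetime.timedelta` objects for intervals and the string `"all"` for the unbounded case.

```python
from datetime import date, timedelta


def check_beginning_of_time(dates, intervals, beginning_of_time):
    """Raise ValueError if any date/interval window starts before beginning_of_time.

    Hypothetical helper: intervals are timedelta objects or the string "all".
    """
    for as_of in dates:
        for interval in intervals:
            if interval == "all":
                # "all" is exempt from the check, as the commit message describes.
                continue
            window_start = as_of - interval
            if window_start < beginning_of_time:
                raise ValueError(
                    "window starting {} (date {}, interval {}) crosses the "
                    "beginning of time {}".format(
                        window_start, as_of, interval, beginning_of_time
                    )
                )


if __name__ == "__main__":
    # Passes: a 30-day window ending 2015-01-01 starts after 2014-01-01.
    check_beginning_of_time([date(2015, 1, 1)], [timedelta(days=30), "all"], date(2014, 1, 1))
    print("validation passed")
```

Raising instead of warning matches the conservative default the commit message argues for; a caller who prefers a warning could catch the exception and log it.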


mbauman · 2017-05-15T20:49:14Z

collate/collate.py

            insert_list = [insert for insert in inserts[group]]

-            out = Parallel(n_jobs=n_jobs, verbose=51)(delayed(Aggregation.execute_insert)(conn_func, insert)
+            import pdb;pdb.set_trace()


mbauman · 2017-05-15T20:49:31Z

collate/collate.py

 from .sql import make_sql_clause, to_sql_name, CreateTableAs, InsertFromSelect


+def execute_insert(get_engine, insert):


maybe place this into the sql.py file?

What's the destiny for ka/parallel? We need to refactor things if we want to merge it into master eventually anyway.

Moved this.

kenben · 2017-05-16T17:49:16Z

I've merged master into this branch, and changed police-eis/pbp_additions to work with this branch.

potash and others added 30 commits November 17, 2016 18:45

initial operator overloading

db2f8ae

syntax, when

3f35a15

real test

b6cbaaa

truediv

b25ab25

consistency fix: make sure quantites is a tuple-valued dict

a5463a4

Fix whitespace issues

837329d

Update coverage from 4.2 to 4.3.1 (#46)

f528951

Update pytest from 3.0.4 to 3.0.5 (#44)

aef832f

Update csvkit from 0.9.1 to 1.0.0 (#43)

4584d39

Update csvkit from 1.0.0 to 1.0.1 (#48)

3a61e1c

Ep/distinct (#41)

e21034a

* Support distinct clauses in aggregates
* flake8
* fix parentheses add tests
* split_distinct should return tuple
* fix tests and pep8
* single distinct

flake8

51d8d2f

Rename categorical to multicompare; create simple MultiCompare subclass

51f1134

support lists of tuples in quantities

e3f52a5

flake8

45029e0

Rename MultiCompare to Compare; introduce Categorical

75da2fa

Change default include_null behavior to False

f7b0342

Allow using None values to specify include_null within Categorical

Allow passing include_null shortname as a truthy value

b812dfa

Merge pull request #38 from dssg/mb/helpers

ded6fb6

Add multiple comparison Aggregate subclasses

Add non-dev requirements.txt (#32)

7ef0e61

Ep/refactor spacetime (#49)

3389d0a

moved SpacetimeAggregation into its own spacetime module and refactored the where filtering to a method in preparation for fixing join table and #42.

Ep/join table (#50)

ae5c8dd

* spacetime join table
* join_table arg to execute
* python3 dict compatible

Update to v0.2.0

4cc018e

get statements before exec

5a6033a

merge master

0ab0cb1

arithmetic

c843215

operator_str

e210332

docstring

5f9e2ac

make interval to accessible to spacetime aggregates

f6ae852

pyup-bot and others added 23 commits March 14, 2017 16:29

Update cryptography from 1.7.2 to 1.8.1 (#70)

2c60460

Update pytest from 3.0.6 to 3.0.7 (#73)

48def77

Bundle pyupdates, please

b2584fe

Update sqlalchemy to 1.1.7 (#76)

e18d8af

* Update sqlalchemy from 1.1.6 to 1.1.7
* Update sqlalchemy from 1.1.6 to 1.1.7

Scheduled biweekly dependency update for week 14 (#77)

fd6eb13

* Update sqlalchemy from 1.1.7 to 1.1.8
* Update sqlalchemy from 1.1.7 to 1.1.8
* Update sphinx from 1.5.3 to 1.5.4

use filter instead of case when

aee37a8

unused format kwarg

c006e22

add mode to tests

a59ac30

include order in aggregate name and test it

d841725

Whitespace

67f4db2

Merge pull request #79 from dssg/ep/order_name_bug

939bcde

include order in aggregate name and test it

Merge pull request #78 from dssg/ep/filter

c59866e

use filter instead of case when

Update sqlalchemy from 1.1.8 to 1.1.9

82bb17f

Update sqlalchemy from 1.1.8 to 1.1.9

f92e98b

Update tox from 2.6.0 to 2.7.0

bc3577e

Merge pull request #80 from dssg/pyup-scheduled-update-04-17-2017

4acdeaa

Scheduled biweekly dependency update for week 16

Allow overriding of choice quoting [Resolves #81]

3f9429f

Untangle quoting logic, rename quoting argument

a1d22ce

Merge pull request #83 from dssg/choice_quoting

36a93e2

Allow overriding of choice quoting [Resolves #81]

Don't modify dict during iteration when shortening keys

9682010

This is a simple workaround; make non-lazy. Should fix #82

Merge pull request #84 from dssg/mb/82

2d97d5a

Don't modify dict during iteration when shortening keys

move execute_insert to top module nesting so pickle does not err

c5d0791

mbauman suggested changes May 15, 2017


OpsWorks user bkuester added 4 commits May 15, 2017 20:51

remove that pdb

cf05dbd

move execute_insert to sql

470ae97

merge master, solve conflict

347da7e

re-add SpacetimeSubQueryAggregation

661d32c

kenben requested a review from k1aus May 16, 2017 17:50

kenben wants to merge 73 commits into ka/parallel from ka/parallel_pickle_error


5 participants
