move execute_insert to top module nesting so pickle does not err #87
Open
kenben wants to merge 73 commits into ka/parallel from
* Support distinct clauses in aggregates
* flake8
* fix parentheses add tests
* split_distinct should return tuple
* fix tests and pep8
* single distinct
This can drastically simplify the writing of categorical comparisons:
``categorical(col, op, choices, include_null=True, maxlen=32)``
Args:
col: the column name (or equivalent SQL expression)
op: the SQL operation (e.g., '=' or '~' or 'LIKE')
choices: A list or dictionary of values. When a dictionary is passed,
the keys are a short name for the value.
include_null: Should an extra `{col} is NULL` be added? (default True)
maxlen: The maximum length of aggregate quantity names (default 32).
Names longer than this will be truncated.
Returns: a dictionary of aggregate quantities to be passed to Aggregate()
A simple helper method to easily create many categorical columns from one
source column by comparing it against many values. It effectively creates
many quantities of the form "({col} {op} '{elt}')::INT" for elt in choices.
The type of the comparison is converted to an integer so it can easily be
used with 'sum' (for total count) and 'avg' (for relative fraction)
aggregate functions.
By default, the aggregates are simply named "{col}_{op}_{choice}", but
that can easily get long and exceed the maximum column name length. If any
name ends up longer than ``maxlen`` characters (32 by default), then each
aggregate name gets truncated with a sequential number appended to ensure
that they remain identifiable and unique (but note that ordering is not
preserved).
Use it like:
```py
from collate import collate
from collate.helpers import categorical
collate.Aggregate(categorical('food', '=', ['hamburger','hotdog','sock']), ['sum','avg'])
```
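To make the description above concrete, here is a minimal sketch of how such a helper could build its dictionary. It is not collate's actual implementation: the function name `categorical_sketch`, the `{col}__NULL` key used for the null check, and the exact truncation scheme are assumptions made for illustration.

```py
def categorical_sketch(col, op, choices, include_null=True, maxlen=32):
    """Illustrative only: build {name: SQL expression} for each choice."""
    # A plain list is treated as a dict whose short names are the values.
    if not isinstance(choices, dict):
        choices = {str(c): c for c in choices}

    quantities = {
        "{}_{}_{}".format(col, op, name):
            "({} {} '{}')::INT".format(col, op, value)
        for name, value in choices.items()
    }
    if include_null:
        # Assumed key name for the extra NULL comparison.
        quantities["{}__NULL".format(col)] = "({} is NULL)::INT".format(col)

    # If any name is too long, truncate every name and append a
    # sequential number so the names stay unique.
    if any(len(name) > maxlen for name in quantities):
        shortened = {}
        for i, (name, expr) in enumerate(quantities.items()):
            suffix = "_{}".format(i)
            shortened[name[:maxlen - len(suffix)] + suffix] = expr
        quantities = shortened
    return quantities
```

Casting the comparison to `INT` is what lets `sum` count the matching rows and `avg` give their fraction.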
Allow using None values to specify include_null within Categorical
Add multiple comparison Aggregate subclasses
moved SpacetimeAggregation into its own spacetime module and refactored the where filtering to a method in preparation for fixing join table and #42.
* spacetime join table
* join_table arg to execute
* python3 dict compatible
* Update sqlalchemy from 1.1.6 to 1.1.7
* Update sqlalchemy from 1.1.7 to 1.1.8
* Update sphinx from 1.5.3 to 1.5.4
include order in aggregate name and test it
use filter instead of case when
* Add support for restricting the "beginning of time"

Adds a new keyword parameter for SpacetimeAggregations that enables restricting the rows included in their calculations based upon an absolute minimum date.

Adds actual behavior tests for SpacetimeAggregation with testing.postgresql.

This takes a SQL connection to allow validation against a SQL server. By default, `Aggregation.execute()` will call the validate method.

SpacetimeAggregations now raise an error in the case where a date/interval combination happens to cross before the beginning of time (so long as the interval is not all). If this proves to be annoying, we can perhaps change it to be a warning or even add an optional override to explicitly allow this to occur. But I think it behooves us to start conservatively.
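The check described in this commit can be sketched in plain Python as follows. This is a hedged illustration, not the code from the commit: the function name `check_beginning_of_time`, its parameters, and the use of `timedelta` for intervals are assumptions, and the real validation runs against a SQL connection.

```py
from datetime import date, timedelta


def check_beginning_of_time(as_of_date, interval, beginning_of_time):
    """Raise if the window [as_of_date - interval, as_of_date] starts
    before beginning_of_time. `interval` is a timedelta or 'all'."""
    if interval == 'all':
        # An unbounded interval is simply clipped at the minimum date.
        return
    window_start = as_of_date - interval
    if window_start < beginning_of_time:
        raise ValueError(
            "interval {} before {} starts on {}, which is before the "
            "beginning of time {}".format(
                interval, as_of_date, window_start, beginning_of_time))


# A one-year window ending 2016-06-01 fits after 2015-01-01.
check_beginning_of_time(date(2016, 6, 1), timedelta(days=365), date(2015, 1, 1))
```

Raising instead of warning matches the conservative default the commit message argues for.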
Scheduled biweekly dependency update for week 16
Allow overriding of choice quoting [Resolves #81]
This is a simple workaround; make non-lazy. Should fix #82
Don't modify dict during iteration when shortening keys
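For readers unfamiliar with the issue behind the last commit: adding or removing keys while looping over a dict directly can raise a `RuntimeError` or skip entries, depending on the Python version. A minimal sketch of the usual remedy, iterating over a snapshot of the keys, is shown below. The function name `shorten_keys_in_place` and the suffix scheme are hypothetical and not taken from collate.

```py
def shorten_keys_in_place(d, maxlen):
    """Rename keys longer than maxlen, mutating d safely."""
    # list(d) copies the keys first, so the loop never observes
    # the insertions and deletions made below.
    for i, key in enumerate(list(d)):
        if len(key) > maxlen:
            suffix = "_{}".format(i)
            new_key = key[:maxlen - len(suffix)] + suffix
            d[new_key] = d.pop(key)


quantities = {
    "a": 1,
    "a_very_long_key_name": 2,
    "another_very_long_key": 3,
}
shorten_keys_in_place(quantities, 10)
```

This sketch does not guard against a shortened key colliding with an existing short key; a production version would need to.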
mbauman
suggested changes
May 15, 2017
collate/collate.py
Outdated
insert_list = [insert for insert in inserts[group]]
out = Parallel(n_jobs=n_jobs, verbose=51)(delayed(Aggregation.execute_insert)(conn_func, insert)
import pdb;pdb.set_trace()
collate/collate.py
Outdated
from .sql import make_sql_clause, to_sql_name, CreateTableAs, InsertFromSelect

def execute_insert(get_engine, insert):
Member
maybe place this into the sql.py file?
Author
What's the destiny for ka/parallel? We need to refactor things if we want to merge it into master eventually anyway.
Author
I've merged
I got a `pickle.PicklingError: Can't pickle <function Aggregation.execute_insert at [...]>`; moving `execute_insert` out of the class makes it OK to pickle.
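The underlying rule is that pickle serializes a function by reference, storing its module and name and looking it up again on load, so the function must be reachable by name from its module. The sketch below shows that rule with the standard-library `pickle`. Whether a function defined in a class body fails, as in this PR, depends on the Python version and the pickler in use, so the sketch uses a function defined inside another function, which the standard pickler rejects. The body of `execute_insert` and the helper names are assumptions; only its signature comes from the diff.

```py
import pickle


def execute_insert(get_engine, insert):
    """Module-level worker: pickle stores only its module and name."""
    # Assumed body: build an engine in the worker and run the statement.
    engine = get_engine()
    return engine.execute(insert)


def make_local_worker():
    """Return a function that has no importable module-level name."""
    def local_insert(get_engine, insert):
        return get_engine().execute(insert)
    return local_insert


def can_pickle(obj):
    """Report whether the standard pickler accepts obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False
```

Passing a callable such as `get_engine` instead of a live connection follows the same logic: each worker process builds its own engine rather than receiving one that cannot be pickled.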