Conversation

613942a to 778c09a
```python
# Create the spot fleet role
# https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html
spotIamFleetRoleName = "AmazonEC2SpotFleetRole"
```
It would be neat if we could think of a way to keep support for spot instance types for folks where Spot is cheaper than on-demand. On the other hand, it might be enough to point to this commit and say "undo this one".
We can certainly do it; it's just some plumbing.
```diff
@@ -205,7 +177,6 @@ def batch_setup(region_name, run_id, vpc_id, securityGroupIds, computeEnvironmen
     'cost_resource_group': run_id,
 },
 bidPercentage=60,
```
This can probably be removed, too.
```python
width = 1 << dz

size = 0
sizes = {}
```
This might be easier to read. {} always looks like an empty dict to me, not a set.
```diff
-sizes = {}
+sizes = set()
```
This is actually a dict! I need to track which tile has that size so I can combine them correctly at lower zooms.
Oh, well, never mind then. Maybe `dict()` then? 😄
```python
count_at_this_zoom = counts_at_zoom[z]
zoom_10_equiv_count = count_at_this_zoom * (4 ** (10 - z))
counts_at_zoom_sum += zoom_10_equiv_count
if counts_at_zoom_sum == 4**10:
```
Should this 10 be one of the zoom values that gets passed in? What happens when we want to switch away from zoom 10?
Doh, yes definitely.
I guess it really just needs to be a number we're comfortable we're not going to go over. It could be zoom 20 and everything would be fine (except overflow?). At hardcoded zoom 10, if we ended up grouping into zoom 11 jobs, then we'd be doing a 4^-1 calc, which would then compare an int to a float, and things might get bad.
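If the hardcoded 10 became a parameter, the check stays exact integer arithmetic as long as no per-tile zoom exceeds it. A minimal sketch of that idea (function and parameter names are hypothetical, not from the PR):

```python
def counts_cover_world(counts_at_zoom, equiv_zoom=10):
    """Return True when the per-zoom tile counts, re-expressed as
    equivalent counts at equiv_zoom, cover the whole pyramid level.

    Exact integer math as long as every zoom key is <= equiv_zoom;
    a zoom above it would make 4 ** (negative) silently go float.
    """
    total = 0
    for z, count in counts_at_zoom.items():
        if z > equiv_zoom:
            raise ValueError("zoom %d is above equiv_zoom %d" % (z, equiv_zoom))
        total += count * (4 ** (equiv_zoom - z))
    # the world at zoom N contains 4 ** N tiles
    return total == 4 ** equiv_zoom
```

Raising instead of comparing avoids the int-vs-float comparison described above if jobs ever group above the equivalence zoom.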
```python
def viable_container_overrides(mem_mb):
    """
    Turns a number into the next highest even multiple that AWS will
    accept, and the min number of CPUs you need for that amount.
    """
```
Might be worth mentioning that this is a workaround for what seems to be a bug in Batch that prevents arbitrary memory/CPU requests.
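The rounding could be sketched like this, assuming for illustration that memory must land on a 1024 MB step and that one vCPU can carry at most 8192 MB. Those two numbers are placeholders, not real Batch limits (which vary by instance type), and the helper signature here is hypothetical:

```python
def viable_container_overrides(mem_mb, mem_step_mb=1024, max_mem_per_cpu_mb=8192):
    # Round the request up to the next step the scheduler will accept...
    viable_mem_mb = -(-mem_mb // mem_step_mb) * mem_step_mb  # ceiling to step
    # ...then take the minimum vCPU count able to carry that much memory.
    min_cpus = max(1, -(-viable_mem_mb // max_mem_per_cpu_mb))  # ceiling divide
    return viable_mem_mb, min_cpus
```

The `-(-a // b)` idiom is ceiling division on ints, which keeps everything integral.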
```python
# now that we know what we want, pick something AWS actually supports
viable_mem_request, required_min_cpus = viable_container_overrides(adjusted_mem)
print("REMOVEME: [%s] enqueueing %s at %s mem mb and %s cpus" % (time.ctime(), coord_line, viable_mem_request, required_min_cpus))
```
Remove the REMOVEME? This looks like a useful thing to print, maybe.
There are A LOT of these: 25k with this configuration.
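One way to keep the useful information without 25k lines is to sample the log. A hedged sketch (helper name and message format hypothetical) that produces a line only every `every`-th enqueue:

```python
def sampled_enqueue_line(i, coord_line, mem_mb, cpus, every=1000):
    """Return a log line for every `every`-th enqueue, else None."""
    if i % every != 0:
        return None
    return "[make_meta_tiles] enqueued %d batches (latest: %s, %s MB, %s cpus)" % (
        i, coord_line, mem_mb, cpus)
```

At 25k enqueues with `every=1000` that is ~25 lines instead of 25k, while still showing progress and the memory/CPU choices being made.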
```diff
 self.read_metas_to_file(missing_meta_file, compress=True)

-print("Splitting into high and low zoom lists")
+print("[%s] Splitting into high and low zoom lists" % (time.ctime()))
```
nit: prefix the log with [make_meta_tiles] too
```
for line in fh:
    c = deserialize_coord(line)
    if c.zoom < split_zoom:
    this_coord = deserialize_coord(line)
```
Might need to rebase onto master.
```python
for coord in missing_high:
    fh.write(serialize_coord(coord) + "\n")

print("[%s] Done splitting into high and low zoom lists" % (time.ctime()))
```
nit: prefix log with [make_meta_tiles]
```python
parser.add_argument('--allowed-missing-tiles', default=2, type=int,
                    help='The maximum number of missing metatiles allowed '
                         'to continue the build process.')
parser.add_argument('--tile-specifier-file',
```

```python
    return 0


def get_mem_reqs_mb(self, coord_str):
```
nit: the name `mib` is probably more precise, since these are 1024-based units.
```diff
-or zoom_max (usually lower, e.g: 7) depending on whether big_jobs
-contains a truthy value for the RAWR tile. The big_jobs are looked up
-at zoom_max.
+High zoom jobs are output between split_zoom (RAWR tile granularity) and zoom_max
```
Suggestion: give a concrete example. As a new hire on the team, this doc is probably still not easy to follow.
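A concrete example might help here: with RAWR tiles at zoom 10 and a job group zoom of 7, each zoom-10 tile rolls up to the zoom-7 ancestor found by right-shifting x and y by the zoom difference. A sketch of that mapping (function name hypothetical):

```python
def job_coord(z, x, y, group_zoom):
    # Each zoom step halves x and y, so the ancestor tile at
    # group_zoom is a right shift by the zoom difference.
    dz = z - group_zoom
    return (group_zoom, x >> dz, y >> dz)
```

For instance, the zoom-10 tile (10, 523, 412) lands in job (7, 65, 51), together with the other 4^3 - 1 = 63 zoom-10 tiles under that zoom-7 tile.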
```python
reordered_lines = tile_specifier.reorder(coord_lines)

print("[%s] Starting to enqueue %d tile batches" % (time.ctime(), len(reordered_lines)))
```
nit: log prefix of the module name
```python
Look up the RAWR tiles in the rawr_bucket under the prefix and with the
given key format, group the RAWR tiles (usually at zoom 10) by the job
group zoom (usually 7) and sum their sizes. Return an ordered list of
job coordinates by descending raw size sum.
```

```python
Provides the ability to sort tiles based on an ordering and specify memory reqs
"""


def __init__(self, default_mem_gb=8, spec_dict={}):
```
I feel we probably should have a central place for all the default values such as 8 here.
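Agreed; a module-level constant would give one place to change it. While touching that line, `spec_dict={}` is also the classic Python mutable-default pitfall: the same dict object is shared by every call that omits the argument. A sketch of both fixes (class shape and constant name hypothetical):

```python
DEFAULT_MEM_GB = 8  # single central place to change the default


class TileSpecifier(object):
    def __init__(self, default_mem_gb=DEFAULT_MEM_GB, spec_dict=None):
        self.default_mem_gb = default_mem_gb
        # None as the sentinel means each instance gets its own dict,
        # instead of all instances sharing one mutable default.
        self.spec_dict = spec_dict if spec_dict is not None else {}
```

With the `{}` default, mutating one instance's `spec_dict` would silently show up in every later instance constructed without the argument.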
These changes are reasonably well grouped by commit.
/usr/bin/time