Conversation

613942a to 778c09a
```python
# Create the spot fleet role
# https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html
spotIamFleetRoleName = "AmazonEC2SpotFleetRole"
```
It would be neat if we could think of a way to keep support for spot instance types for folks where Spot is cheaper than on-demand. On the other hand, it might be enough to point to this commit and say "undo this one".
We can certainly do it; it's just some plumbing.
```diff
@@ -205,7 +177,6 @@ def batch_setup(region_name, run_id, vpc_id, securityGroupIds, computeEnvironmen
     'cost_resource_group': run_id,
 },
 bidPercentage=60,
```
This can probably be removed, too.
```python
width = 1 << dz

size = 0
sizes = {}
```
This might be easier to read. {} always looks like an empty dict to me, not a set.
```diff
-sizes = {}
+sizes = set()
```
This is actually a dict! I need to track which tile has that size so I can combine them correctly at lower zooms.
Oh, well, never mind then. Maybe `dict()` then? 😄
```python
count_at_this_zoom = counts_at_zoom[z]
zoom_10_equiv_count = count_at_this_zoom * (4 ** (10 - z))
counts_at_zoom_sum += zoom_10_equiv_count
if counts_at_zoom_sum == 4**10:
```
Should this 10 be one of the zoom values that gets passed in? What happens when we want to switch away from zoom 10?
Doh, yes definitely.
I guess it really just needs to be a number we're comfortable we're not going to go over. It could be zoom 20 and everything would be fine (except overflow?). At hardcoded zoom 10, if we ended up grouping into zoom 11 jobs, then we'd be doing a 4^-1 calc, which would then compare an int to a float, and things might get bad.
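If the hardcoded 10 became a parameter, the check stays exact integer arithmetic as long as no per-tile zoom exceeds it. A minimal sketch of that idea (function and parameter names are hypothetical, not from the PR):

```python
def counts_cover_world(counts_at_zoom, equiv_zoom=10):
    """Return True when the per-zoom tile counts, re-expressed as
    equivalent counts at equiv_zoom, cover the whole pyramid level.

    Exact integer math as long as every zoom key is <= equiv_zoom;
    a zoom above it would make 4 ** (negative) silently go float.
    """
    total = 0
    for z, count in counts_at_zoom.items():
        if z > equiv_zoom:
            raise ValueError("zoom %d is above equiv_zoom %d" % (z, equiv_zoom))
        total += count * (4 ** (equiv_zoom - z))
    # the world at zoom N contains 4 ** N tiles
    return total == 4 ** equiv_zoom
```

Raising instead of comparing avoids the int-vs-float comparison described above if jobs ever group above the equivalence zoom.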
```python
def viable_container_overrides(mem_mb):
    """
    Turns a number into the next highest even multiple that AWS will
    accept, and the min number of CPUs you need for that amount.
    """
```
Might be worth mentioning that this is a workaround for what seems to be a bug in Batch that prevents arbitrary memory/CPU requests.
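The rounding could be sketched like this, assuming for illustration that memory must land on a 1024 MB step and that one vCPU can carry at most 8192 MB. Those two numbers are placeholders, not real Batch limits (which vary by instance type), and the helper signature here is hypothetical:

```python
def viable_container_overrides(mem_mb, mem_step_mb=1024, max_mem_per_cpu_mb=8192):
    # Round the request up to the next step the scheduler will accept...
    viable_mem_mb = -(-mem_mb // mem_step_mb) * mem_step_mb  # ceiling to step
    # ...then take the minimum vCPU count able to carry that much memory.
    min_cpus = max(1, -(-viable_mem_mb // max_mem_per_cpu_mb))  # ceiling divide
    return viable_mem_mb, min_cpus
```

The `-(-a // b)` idiom is ceiling division on ints, which keeps everything integral.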
```python
# now that we know what we want, pick something AWS actually supports
viable_mem_request, required_min_cpus = viable_container_overrides(adjusted_mem)
print("REMOVEME: [%s] enqueueing %s at %s mem mb and %s cpus" % (time.ctime(), coord_line, viable_mem_request, required_min_cpus))
```
Remove the REMOVEME? This looks like a useful thing to print, maybe.
There are A LOT of these: 25k with this configuration.
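One way to keep the useful information without 25k lines is to sample the log. A hedged sketch (helper name and message format hypothetical) that produces a line only every `every`-th enqueue:

```python
def sampled_enqueue_line(i, coord_line, mem_mb, cpus, every=1000):
    """Return a log line for every `every`-th enqueue, else None."""
    if i % every != 0:
        return None
    return "[make_meta_tiles] enqueued %d batches (latest: %s, %s MB, %s cpus)" % (
        i, coord_line, mem_mb, cpus)
```

At 25k enqueues with `every=1000` that is ~25 lines instead of 25k, while still showing progress and the memory/CPU choices being made.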
```diff
 self.read_metas_to_file(missing_meta_file, compress=True)

-print("Splitting into high and low zoom lists")
+print("[%s] Splitting into high and low zoom lists" % (time.ctime()))
```
nit: prefix the log with [make_meta_tiles] too
```
for line in fh:
    c = deserialize_coord(line)
    if c.zoom < split_zoom:
    this_coord = deserialize_coord(line)
```
Might need to rebase onto master.
```python
for coord in missing_high:
    fh.write(serialize_coord(coord) + "\n")

print("[%s] Done splitting into high and low zoom lists" % (time.ctime()))
```
nit: prefix log with [make_meta_tiles]
```python
parser.add_argument('--allowed-missing-tiles', default=2, type=int,
                    help='The maximum number of missing metatiles allowed '
                         'to continue the build process.')
parser.add_argument('--tile-specifier-file',
```

```python
    return 0


def get_mem_reqs_mb(self, coord_str):
```
nit: the name `mib` is probably more precise, since these are 1024-based units.
```diff
-or zoom_max (usually lower, e.g: 7) depending on whether big_jobs
-contains a truthy value for the RAWR tile. The big_jobs are looked up
-at zoom_max.
+High zoom jobs are output between split_zoom (RAWR tile granularity) and zoom_max
```
Suggestion: give a concrete example. As a new hire on the team, this doc is probably still not easy to follow.
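A concrete example might help here: with RAWR tiles at zoom 10 and a job group zoom of 7, each zoom-10 tile rolls up to the zoom-7 ancestor found by right-shifting x and y by the zoom difference. A sketch of that mapping (function name hypothetical):

```python
def job_coord(z, x, y, group_zoom):
    # Each zoom step halves x and y, so the ancestor tile at
    # group_zoom is a right shift by the zoom difference.
    dz = z - group_zoom
    return (group_zoom, x >> dz, y >> dz)
```

For instance, the zoom-10 tile (10, 523, 412) lands in job (7, 65, 51), together with the other 4^3 - 1 = 63 zoom-10 tiles under that zoom-7 tile.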
```python
reordered_lines = tile_specifier.reorder(coord_lines)

print("[%s] Starting to enqueue %d tile batches" % (time.ctime(), len(reordered_lines)))
```
nit: log prefix of the module name
```python
Look up the RAWR tiles in the rawr_bucket under the prefix and with the
given key format, group the RAWR tiles (usually at zoom 10) by the job
group zoom (usually 7) and sum their sizes. Return an ordered list of
job coordinates by descending raw size sum.
```

```python
Provides the ability to sort tiles based on an ordering and specify memory reqs
"""


def __init__(self, default_mem_gb=8, spec_dict={}):
```
I feel we probably should have a central place for all the default values such as 8 here.
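Agreed; a module-level constant would give one place to change it. While touching that line, `spec_dict={}` is also the classic Python mutable-default pitfall: the same dict object is shared by every call that omits the argument. A sketch of both fixes (class shape and constant name hypothetical):

```python
DEFAULT_MEM_GB = 8  # single central place to change the default


class TileSpecifier(object):
    def __init__(self, default_mem_gb=DEFAULT_MEM_GB, spec_dict=None):
        self.default_mem_gb = default_mem_gb
        # None as the sentinel means each instance gets its own dict,
        # instead of all instances sharing one mutable default.
        self.spec_dict = spec_dict if spec_dict is not None else {}
```

With the `{}` default, mutating one instance's `spec_dict` would silently show up in every later instance constructed without the argument.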
These changes are reasonably well grouped by commit.
/usr/bin/time