Significant fraction of the code is single-threaded #129

@cphyc

Description

I am generating a relatively large set of ICs, which takes hours. I have profiled the run by monitoring the load on the node running genetIC (see image below) to estimate whether it is parallelised efficiently. The paramfile for this run is given in the details below.

[Image: CPU load on the node over the course of the run]

The code is run with 128 cores (out of 128 available). As you can see, several sections aren't parallelised at all (e.g., between 12:30 and 13:20).

Are there sections that fundamentally cannot be parallelised?
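For context on why those serial sections matter so much at this core count: by Amdahl's law, even a small serial fraction caps the achievable speedup. A minimal sketch (not tied to genetIC's actual code) of the arithmetic:

```python
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    """Maximum speedup on n_cores when serial_fraction of the
    runtime cannot be parallelised (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# Even if only 10% of the runtime is serial, 128 cores give
# less than a 10x speedup:
print(amdahl_speedup(0.10, 128))  # ~9.34
```

So if the single-threaded stretches visible in the load profile account for a tenth of the wall time, most of the 128 cores are effectively wasted.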

Details
# Cosmology, based on Planck 2018
Om           0.3158
Ol           0.6842
s8           0.8117
ns           0.9660
hubble       0.6732
z            50

# Import transfer function data.
# If you change the cosmological parameters above be sure to update the transfer
# function accordingly. Some help is given in the folder tools/transfer_function/
# within the genetIC distribution
camb    ../planck_2018_transfer_out.dat

# Seed the field using the default algorithm (parallel Fourier space):
random_seed     22122002

# Specify output directory:
outname ICs
outdir  .

# Pick output format:
outformat grafic

# Specify the base-level grid, 100 Mpc/h, 512 cells on a side:
base_grid 100   512

# Create zoom region
mapper_relative_to ../paramfile_DMonly_reference.txt
id_file region_part_ids.txt

# Create a nested zoomed region
zoom_grid 4 2048

# Also output a subsampled base grid (512/2 = 256 cells on a side)
subsample 2

# Lazy way to get a high-resolution DM grid (disabled):
# supersample 2

# Recenter the region
centre_output

# Create passive scalar in zoomed region
pvar 2e-5

done
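For anyone wanting to reproduce the load trace above, here is a stdlib-only sketch of how per-core utilisation can be sampled on Linux from `/proc/stat` (field layout per proc(5); the threshold and sampling loop are illustrative, not what was used for the plot):

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} from /proc/stat text."""
    out = {}
    for line in text.splitlines():
        # Per-core lines look like "cpu0 user nice system idle iowait ..."
        if line.startswith("cpu") and line[3:4].isdigit():
            name, *fields = line.split()
            ticks = [int(f) for f in fields]
            idle = ticks[3] + ticks[4]  # idle + iowait
            out[name] = (sum(ticks) - idle, sum(ticks))
    return out

def utilisation(before, after):
    """Fractional busy time per core between two snapshots."""
    util = {}
    for cpu, (busy1, total1) in before.items():
        busy2, total2 = after[cpu]
        util[cpu] = (busy2 - busy1) / max(total2 - total1, 1)
    return util
```

Sampling the file twice a few seconds apart and counting cores above, say, 50% busy shows how many threads genetIC actually keeps occupied in each phase of the run.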

The memory use looks like this:
[Image: memory use on the node over the course of the run]
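For the record, a stdlib sketch of how peak memory can be checked from inside a Python wrapper around a run (Unix only; `ru_maxrss` units differ by platform):

```python
import resource
import sys

def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only).
    ru_maxrss is reported in KiB on Linux but in bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss //= 1024
    return rss / 1024.0
```

An external sampler of the node's total memory (as in the plot above) would instead poll `/proc/meminfo`, but the idea is the same.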
