100% CPU utilisation at points (other calculations) while running simulations

Hi there,

I’m trying to debug an issue that looks like fdtd-engine isn’t obeying the resource limits put in place for a simulation. I’m not a Lumerical user so I currently don’t know if this is a bug or if it’s just a misconfiguration on our end.

What I’m seeing is that even though a simulation is run using a fixed number of threads, at various points in the sim all CPUs seem to go to 100% for short blips. This obviously isn’t great on a shared cluster node.

What is going on here? Are there points in a sim where it does garbage collection or combines the output from the individual sims? Is there a config setting that can limit the number of cores that are consumed in whatever job is being run here?

Shareef.

@shareef,

Welcome to Lumerical Knowledge Exchange.

Do you run or submit to a job scheduler on your cluster?
Can you please send us the exact command used to run your FDTD simulation on your cluster?
We also have more information about resource configurations in our KB:

Best,
Lito

Hi there,

We’re currently not using a job scheduler but are about to move to PBSPro.

I’ve spoken to the engineers using Lumerical and they say this high utilisation happens in multiple scenarios, one of which is “when the mode expansion monitors are calculating the mode expansion”.

The guys are currently launching jobs from the GUI with resources of 64 processes/1 thread.

@shareef,

To clarify:

  • You are accessing the same node/machine as other users and they are running simulations on the same machine while you are also running your simulations?
  • Or the issue happens even if you are the only one logged into this machine and you are the only one running simulations?

Also:

  1. Can you provide the command that you use to run your FDTD simulation?

  2. If possible, also the command that other users execute to run their simulations with mode expansion monitors?

    Example:

    /opt/lumerical/2020a/mpich2/nemesis/bin/mpiexec -n 32 /opt/lumerical/2020a/bin/fdtd-engine-mpich2nem -t 1 /path/your_simulation_file.fsp
    

Hi there,

On your clarification question, erm, they’re both true unless you’re expecting Lumerical to behave differently depending on who’s logged into the machine?

In answer to your other questions, this is all being run interactively through the GUI so the user has no visibility of the underlying commands. Does it dump the commands used to a log file that would be useful to upload?

Shareef.

@shareef,

Sorry, my question was: when the issue happens,

  • Are multiple users logged in to the same machine/node at the same time, all running simulation jobs on that one machine?
  • Or only 1 user at a time?

You are running simulations from the CAD and not from the command line.
What is the OS of this machine?
Which version of FDTD are you using?
Would it be possible to share your simulation file so we can see what in your project is affecting the CPU utilization?

Thanks,
Lito

I don’t think it matters who is logged in and what is running; the behaviour is the same, and that’s my point. We have not seen it depend on either; it happens all the time.

We’re using the GUI, not the command line.

OS is CentOS7.

Lumerical version is lumerical-2020a-r1.

No, we can’t share any simulation files from our own projects but I can check to see if any of your own examples show the same behaviour.

I’ll also see if I can check an identical run using the command line instead of the GUI.

Shareef.

@shareef,

Please try to run an example from our website: apps.lumerical.com. Let us know which one has the same issue.

Thanks,
Lito

Hi there,

I found the issue myself when looking at the Resources menu in the GUI. The default setting for ‘maximum number of threads’ on the Design Environment tab is ‘auto-detect’. This needs to be manually set to something sensible.

How is this controlled on the command line, or is it an environment-wide setting that just needs to be set once?

@shareef,

I just confirmed with our team: the Design Environment tab in the resource configuration sets the number of threads/cores used for calculations done within the CAD, i.e. analysis groups, functions and scripts. Any calculation that is part of the design environment/CAD will use this setting.

This is set in the GUI. If you do not want the CAD to use all available threads/cores on your machine for its calculations, set this to any number below your maximum number of cores. Otherwise, to let the CAD use all available threads, leave it set to auto-detect.
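To see the effect of this setting on a Linux node, one option is to compare per-process thread counts before and after changing it. The sketch below is not a Lumerical tool: `list_threads` is a hypothetical helper that reads `/proc` directly, and the process name you pass to it (for example `fdtd-engine`, or whatever the CAD process is called on your install) is an assumption you should check on your own machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Lumerical): print PID, thread count and
# command line for every process whose command line contains the pattern.
# Linux only, because it reads /proc.
list_threads() {
    local pattern="$1" dir pid cmd threads
    for dir in /proc/[0-9]*; do
        pid="${dir#/proc/}"
        # cmdline is NUL-separated; turn NULs into spaces for matching.
        # Processes that exit while we scan are skipped.
        cmd="$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline")" || continue
        cmd="${cmd% }"
        case "$cmd" in
            *"$pattern"*) ;;
            *) continue ;;
        esac
        # The "Threads:" field of /proc/<pid>/status is the thread count.
        threads="$(awk '/^Threads:/ {print $2}' "$dir/status" 2>/dev/null)"
        printf '%s\t%s\t%s\n' "$pid" "${threads:-?}" "$cmd"
    done
}
```

Running `list_threads fdtd-engine` during a simulation, and again during one of the short 100% blips, should show which process is spawning the extra threads.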

Best,
Lito

@lyap

Thanks. Is this different to the behaviour in batch mode or when using your PBS scripts? This seems like a bug rather than good design intent.

@shareef,

In batch mode, when you run simulations directly from the command line, the engine uses the number of processes given in your “command to execute”.

# For example - running your simulation/project file (.fsp)
/opt/lumerical/2020a/mpich2/nemesis/bin/mpiexec -n 8 /opt/lumerical/2020a/bin/fdtd-engine-mpich2nem -t 1 /simulation.fsp

# or using our batch script
/opt/lumerical/2020a/bin/fdtd-run-local.sh -n 8 *.fsp
  • The above will run the simulation file(s) using 8 MPI processes/cores.

  • PBS submissions are similar: when you submit the job to your cluster, you specify the number of processes, and the MPI command in your script runs with that number of processes.

  • If you run a script (.lsf) or a Python script (.py), it requires the CAD component, and any calculations in the script that do not call or run the solver/engine component will use the settings on the Design Environment tab in the resource configuration.

  • For more information about our product components, see this video: Lumerical Product Components
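The command-line and PBS cases above can be sketched as one small helper. This is a dry run that only prints the command rather than running it. `LUMERICAL_ROOT`, `resolve_nprocs` and `fdtd_cmd` are hypothetical names, the install paths are copied from the example commands in this thread, and taking the process count from `$PBS_NODEFILE` assumes PBS lists one line per requested MPI process in that file.

```shell
#!/usr/bin/env bash
# Install prefix as quoted in this thread; LUMERICAL_ROOT is a hypothetical
# variable name, adjust it for your site.
LUMERICAL_ROOT="${LUMERICAL_ROOT:-/opt/lumerical/2020a}"

# Number of MPI processes: the explicit argument if given, else the number
# of lines in $PBS_NODEFILE when running under PBS, else 1.
resolve_nprocs() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    elif [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

# Print (do not run) the engine command for one .fsp file.
# -n sets the MPI process count and -t 1 keeps each process single-threaded,
# as in the examples above.
fdtd_cmd() {
    local fsp="$1" nprocs
    nprocs="$(resolve_nprocs "${2:-}")"
    printf '%s -n %s %s -t 1 %s\n' \
        "$LUMERICAL_ROOT/mpich2/nemesis/bin/mpiexec" "$nprocs" \
        "$LUMERICAL_ROOT/bin/fdtd-engine-mpich2nem" "$fsp"
}
```

For example, `fdtd_cmd /simulation.fsp 8` prints a command with `-n 8`, while calling `fdtd_cmd /simulation.fsp` inside a PBS job would pick up the count from the job's node file. Note that this only bounds the engine; calculations done in the CAD still follow the Design Environment tab setting.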

Hope this helps.
Lito