Partitioning CPU cores to run parallel simulations?

I am using a server with 256 cores on which FDTD is installed. Under our licence I connect directly to the software, set up my simulation, and run it. Based on my tests, 8 cores is the ideal number for running my simulation fastest. Now I want to do a parametric sweep on this simulation, so I want to divide my 256 cores into groups of 8 (256/8 = 32) and run multiple simulations in parallel. How can I do this? Is it possible to separate the cores in this way? How can I be sure that the parallel programs don't overlap and run on the same cores? Do I need a new licence?

@masoud.kheyri,

Welcome to Lumerical Knowledge Exchange.

To answer your questions:

Based on my tests, 8 cores is the ideal number for running my simulation fastest. Now I want to do a parametric sweep on this simulation, so I want to divide my 256 cores into groups of 8 (256/8 = 32) and run multiple simulations in parallel. How can I do this? Is it possible to separate the cores in this way? How can I be sure that the parallel programs don't overlap and run on the same cores?

Yes, you can try to use all of your resources (256 cores) to run a parameter sweep. Add resources in the Resource Configuration Manager, with localhost as the hostname for every resource you add.

If you are running Windows, you can use the processor binding option to bind each process to a specific core. This is not supported with our default MPICH2 on Linux/macOS; on those systems the cores your simulation job runs on are selected automatically.

Once you have configured the default launching preset, simply set the processes to 8 and “Duplicate” your resource.

The best number depends on how many sweep points you will be generating; 20-24 resource lines might work. Run some tests to find how many resource lines are optimal for your sweep (32 or fewer), using the same method you used to find the optimum FDTD resource configuration for a single simulation. Each sweep's log file should show the FDTD speed (Mnodes/s).
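As a rough illustration of the grouping asked about (256 cores in groups of 8), the core sets and a per-job affinity mask can be computed as below. This is only a sketch: the function names are made up for illustration, it assumes cores are numbered 0 to 255, and it assumes Windows exposes them in processor groups of at most 64 logical processors, so each mask is written relative to its own group. How a mask is actually applied to a job is not shown here.

```python
def core_groups(total_cores, cores_per_job):
    """Split core indices 0..total_cores-1 into consecutive, non-overlapping groups."""
    if cores_per_job <= 0 or total_cores % cores_per_job != 0:
        raise ValueError("total_cores must be a positive multiple of cores_per_job")
    return [list(range(start, start + cores_per_job))
            for start in range(0, total_cores, cores_per_job)]


def affinity_mask(cores, group_width=64):
    """Hex bitmask for the given cores, relative to their processor group.

    Assumption: processor groups hold group_width logical processors each,
    and no job straddles a group boundary (true when cores_per_job divides 64).
    """
    mask = 0
    for core in cores:
        mask |= 1 << (core % group_width)
    return format(mask, "X")


groups = core_groups(256, 8)
print(len(groups))  # 32 jobs of 8 cores each

# Show the first few jobs: job index, processor group, hex mask within that group.
for job, cores in enumerate(groups[:3]):
    print(job, cores[0] // 64, affinity_mask(cores))
```

Because the groups are consecutive and disjoint, no two jobs share a core index; whether the operating system honours the binding depends on the launcher in use.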

Do I need a new licence?

If you are only running the sweep on 1 machine, i.e. all localhost on your resource configuration as shown above, you do not need additional FDTD solver licenses.

Hope this helps.

-Lito

Hello,
Thank you for your response. I followed it: I added 10 resources, allocated 8 processes on localhost to each one, and ran the sweep. But I saw that the simulation times are proportionately slower. Is it because all of them are using the same FDTD engine?

Which licence should I purchase?

It is also not clear to me, when I add multiple resources as you described, how to allocate each resource to a particular sweep value. For example, suppose I want to sweep a parameter over 20 values, and I know that 10 of them need less computational power and converge faster. I would like to create two groups of resources, group one with, for example, 4 processes each and group two with 8 processes each, then allocate group one to the sweep values that converge faster and group two to the values that are slower.

Thank you very much.

@masoud.kheyri,

What operating system are you using on this machine with 256 cores?

The slowdown is typical, since FDTD will try to use all available bandwidth between your CPU and RAM when several simulations run on the same machine.

When running sweeps/optimizations, we highly recommend doing so on multiple computers/nodes on your network. See our KX article for details.

The above scenario would require additional “HPC FDTD Accelerator” licenses, depending on how many machines/nodes you want to run simulations on at the same time.

When using a single machine, you cannot run different simulations at the same time with different resource settings. If these are all parameter sweeps, you can run them in sequence using a batch/script file from the command line, and specify the number of processes you want to use for each simulation file or batch of simulations…

# example for Linux shown below
cd /location_of_your_specific_sweep_files
/opt/lumerical/2020a/mpich2/nemesis/bin/mpiexec -n 8 /opt/lumerical/2020a/bin/fdtd-engine-mpich2nem -t 1 simulationfile1.fsp
/opt/lumerical/2020a/mpich2/nemesis/bin/mpiexec -n 16 /opt/lumerical/2020a/bin/fdtd-engine-mpich2nem -t 1 simulationfile02.fsp

# or use the fdtd-run-local.sh script to run all simulation files on the current directory
cd /location_of_your_specific_sweep_files
/opt/lumerical/2020a/bin/fdtd-run-local.sh -n 16 *.fsp
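
The sequential approach above can also be driven from a small script. The following is a minimal sketch, assuming the Linux install paths and the two example file names quoted above; the helper names `build_command` and `run_in_sequence` are hypothetical. It defaults to a dry run that only prints the command lines, so nothing is launched unless `dry_run=False` is passed.

```python
import subprocess

# Paths copied from the example commands above; adjust to your installation.
MPIEXEC = "/opt/lumerical/2020a/mpich2/nemesis/bin/mpiexec"
ENGINE = "/opt/lumerical/2020a/bin/fdtd-engine-mpich2nem"


def build_command(fsp_file, processes):
    """Argument list for one simulation: -n on mpiexec, -t 1 on the engine."""
    return [MPIEXEC, "-n", str(processes), ENGINE, "-t", "1", fsp_file]


def run_in_sequence(jobs, dry_run=True):
    """Run (file, processes) pairs one after another; return the command lists."""
    commands = [build_command(fsp_file, processes) for fsp_file, processes in jobs]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Blocks until this simulation finishes before starting the next one.
            subprocess.run(command, check=True)
    return commands


# Each simulation file gets its own process count.
jobs = [("simulationfile1.fsp", 8), ("simulationfile02.fsp", 16)]
commands = run_in_sequence(jobs)
```

Running the jobs one at a time means they never compete for the same cores or memory bandwidth, at the cost of leaving the rest of the machine idle.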

Hope this helps.

-Lito

Thank you for your answer. I learned several things, but I still can't do what I wanted to do.
My operating system is Windows Server 2008 R2 Enterprise.

I have many cores on a single machine and a simulation file on which I want to run a parametric sweep. For one parameter value, 8 cores is enough. As I said, I want to be able to run the simulations simultaneously and allocate 8 cores to each one. For example, in a sweep with 10 values, I would run 10 simulations simultaneously and allocate 8 cores to each one.

I followed the answers and learned a lot, but I still can't do this. Here are some more questions to make things clearer.

1. What is the difference between the different engines, and which engine should I use for my purpose?
For example, I wrote this batch file:
@echo off
start /affinity 2
call "C:\Program Files\Lumerical\FDTD\bin\fdtd-run-local.bat" -n 20 "\\FLATLIGHT\Public\MasoudKheyri\Lumerical\Calculating_density_states\High_Performance_computing_analyziz\Batchfile\Batch\Dipole_mirror_31.fsp"
echo finished

Here I have used fdtd-run-local.bat and it works. I can also use fdtd-engine, and that works too. I expected fdtd-engine-msmpi to work, but it didn't, and I received the message: fdtd-engine-msmpi is not a valid application in Windows. That is, I wrote the following line but it didn't work:

call "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -n 4 "C:\Program Files\Lumerical\2019b\bin\fdtd-engine-msmpi.exe" -t 1 "\\FLATLIGHT\Public\MasoudKheyri\Lumerical\Calculating_density_states\High_Performance_computing_analyziz\Batchfile\Batch\Dipole_mirror_3.fsp"

But I think the MPI setup is OK, because the following line works:
call "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -n 10 "C:\Program Files\Lumerical\FDTD\mpitest\cpi-msmpi.exe"

  2. My second question: what exactly are -t and -n in the batch file? I thought that -t relates to threads and -n (nodes) is the number of CPUs. But when I used fdtd-run-local.bat with n = 1 and t = 1, as you can see, all of my CPU cores are working:


    Should I use a different engine? What is going wrong?

  3. I created 5 similar batch files, set n = 4 in each one, and ran them from 5 different cmd windows. I saw that each one runs at approximately 1/5 of the speed, so the overall throughput does not change (5 * 1/5 = 1). I guess the problem comes back to the fact that I am still not able to allocate a number of cores to each program. How can I fix it?

@masoud.kheyri,

Note:

Windows Server 2008 is no longer supported by Lumerical.
See: Lumerical Supported Systems for details.

FDTD-run-local:

The batch file fdtd-run-local.bat uses Microsoft MPI to run your simulations. With it, -n is the number of cores you want to use. For example, if you have an 8-core machine and you want to use 4 cores (-n 4), the CPU utilization will be ~50%, but the job will not be restricted to a specific set of cores (Windows distributes it across the available cores). If you use -n 1, the CPU utilization should be ~12.5%. On a machine with more than 64 cores, MS MPI does not work well.
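
The utilization figures quoted above are just the ratio of processes to cores. A quick sketch, assuming each MPI process keeps exactly one core busy (engine run with -t 1); the function name is illustrative only:

```python
def expected_cpu_utilization(processes, total_cores):
    """Approximate overall CPU utilization in percent for a job run with -n processes.

    Assumption: every process keeps one core fully busy and nothing else is running.
    """
    if total_cores <= 0 or not 0 <= processes <= total_cores:
        raise ValueError("need 0 <= processes <= total_cores and total_cores > 0")
    return 100.0 * processes / total_cores


print(expected_cpu_utilization(4, 8))    # 50.0
print(expected_cpu_utilization(1, 8))    # 12.5
print(expected_cpu_utilization(8, 256))  # 3.125
```

On the 256-core server from the question, a single 8-process job should therefore show only about 3% overall utilization if the process count is being respected.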

Try using Intel MPI instead in your resource configuration (CAD). It supports systems with more than 64 cores.

The batch file is basically:

"C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -n 4 "C:\Program Files\Lumerical\2019b\bin\fdtd-engine-msmpi.exe" -t 1 "//FLATLIGHT/Public/MasoudKheyri/Lumerical/Calculating_density_states/High_Performance_computing_analyziz/Batchfile/Batch/Dipole_mirror_3.fsp"

Try to copy and paste the above command into your CLI and run it. (try the path with "/" instead of the "\")

Resource configuration settings:

  • When running with MPI, the “-n” option on the MPI launcher sets the number of cores to be used, and we keep “-t 1” in the engine options.

  • When running the engine directly, without the MPI component, there is no -n option. In the engine options we set -t to the number of cores.

  • This difference is reflected in the resource configuration in the CAD. When you select Local Computer as your job launching preset, the Processes option is disabled and only the Threads option is enabled.

More about resource configuration can be found in our KB.
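
The convention in the bullets above can be summarized as a small helper. This is only a sketch of the rule as stated here, not of the CAD's actual logic; `engine_arguments` is a hypothetical name.

```python
def engine_arguments(cores, use_mpi):
    """Return the options for the MPI launcher and for the engine.

    With MPI:    -n <cores> goes on mpiexec, and the engine keeps -t 1.
    Without MPI: there is no -n; the engine gets -t <cores>.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if use_mpi:
        return {"mpiexec": ["-n", str(cores)], "engine": ["-t", "1"]}
    return {"mpiexec": None, "engine": ["-t", str(cores)]}


print(engine_arguments(8, use_mpi=True))
print(engine_arguments(8, use_mpi=False))
```

Either way the requested core count appears exactly once, which is a useful check when a job seems to use far more cores than intended.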

-Lito

Hello,
Instead of using the Run button in the CAD for my simulation, I ran the following batch file:
@echo off
:: start /affinity 1
call "C:\Program Files\Lumerical\FDTD\bin\fdtd-run-local.bat" -n 8 "\\FLATLIGHT\Public\MasoudKheyri\Lumerical\Calculating_density_states\High_Performance_computing_analyziz\Batchfile\Dipole_mirror_3.fsp"
echo finished

This batch file works properly in cmd and I get the results. But now, when I go back to run my simulation in the CAD, I get an engine error.

I only use a single local computer to run my programs.

Thank you very much.

@masoud.kheyri

Try removing the Lumerical preference files from your machine, then check your resource settings in the CAD. It should be using Microsoft MPI as the job launching preset.

Then try changing the job launching preset to Local Computer, and see whether either setting works.

Best,
Lito

@masoud.kheyri,

I have merged your posts into this one. They seem to be the same request: running on several cores of a machine and running several simulations at the same time.

-Lito

I removed the files from both Lumerical preference folders, but it didn't work.

Have you tried using a different job launching preset? i.e. Local Computer or Intel MPI when running your simulations from the CAD?

By the way, which version of FDTD are you using? You can obtain this from the About page on FDTD.

-Lito

The Lumerical version is 8.1.1332, release 2017b.

Yes I have these options: Local Computer, Remote: MPICH2, Remote: Microsoft MPI and custom.

MPICH2 is now working. I remember that before I started running batch files I was using Microsoft MPI in the CAD, which now doesn't work anymore.

So I am a little confused: when I run my batch file using fdtd-run-local.bat, which uses Microsoft MPI, it runs well, but in the CAD, when I set the job launching preset to Microsoft MPI, it doesn't work.
Conversely, MPICH2 works in the CAD, but when I try to run the corresponding engine in my batch file with fdtd-engine-msmpi.exe, it doesn't work and error 193 appears, saying that fdtd-engine-msmpi.exe is not a valid Win32 application.

I mean this line works:
call "C:\Program Files\Lumerical\FDTD\bin\fdtd-run-local.bat" -n 10 -t 10 "\\FLATLIGHT\Public\MasoudKheyri\Lumerical\Calculating_density_states\High_Performance_computing_analyziz\Batchfile\Batch\Dipole_mirror_3.fsp"

but this line doesn't work:
call "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -n 4 "C:\Program Files\Lumerical\FDTD\bin\fdtd-engine-msmpi.exe" -t 1 "//FLATLIGHT/Public/MasoudKheyri/Lumerical/Calculating_density_states/High_Performance_computing_analyziz/Batchfile/Batch/Dipole_mirror_3.fsp"

echo finished

@masoud.kheyri,

If I am not mistaken, the 2017b version of FDTD used MPICH2 in the batch file.
You can verify this by opening fdtd-run-local.bat in a text editor such as Notepad and searching for "MPICH2".

There might be an issue with your Microsoft MPI installation. Please try re-installing it as shown in the KX post below. Your current installation might have been broken by a system update.

Note: The currently supported versions of Lumerical are available for download on our website: 2019b R6 and 2020a R3.

-Lito