Information on hardware specifications


We hope to provide some general tips and guidelines for users interested in understanding how certain computer hardware may affect the speed of FDTD simulations on a single workstation.

The total memory bandwidth, i.e. the overall data transfer rate between the RAM and CPU, is crucial. To run a simulation, large amounts of data must be continuously transferred between the RAM and CPU. When the memory bus is unable to transfer data fast enough, the processor is forced to wait, limiting the overall speed of your simulation. With modern multi-core processors, this transfer rate between CPU and RAM is often the bottleneck, even when using the fastest available RAM. This explains why simulations do not run 4x faster on a 4-core processor: the transfer rate between CPU and RAM is simply not high enough to keep all 4 cores running at full speed. A speedup of 2-3x is more typical, implying that the transfer rate is only high enough to keep 2-3 cores running at full speed. When purchasing a computer, a high memory bandwidth is very important.
Notes:

  • The data transfer rate is a function of both the CPU and the RAM.
  • It is important to ensure that the selected RAM and CPU are compatible. For example, if the CPU only supports RAM data rates up to DDR4-1333, there is no value in installing DDR4-1866 RAM.
  • CPU specifications:
    • The ‘Max memory bandwidth’ specification for a CPU states the maximum data transfer rate supported by the CPU, assuming the fastest possible RAM is used.
    • For computers with multiple CPUs, the ‘Max memory bandwidth’ should be multiplied by the number of processors to get the overall data transfer rate of the computer.
  • RAM specifications:
    • The ‘RAM type’ (e.g. DDR3, DDR4, …) is important. Higher generations of RAM typically have better performance.
    • The ‘Data rate’ (e.g. 2400 MT/s, often quoted as 2400 MHz) is important. Higher data rates give better performance.
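
The notes above can be turned into a rough calculation. The following is a minimal sketch, not a vendor formula: it assumes a standard 64-bit DDR channel (8 bytes per transfer), and the function names and the 4-channel example system are hypothetical. Real CPUs may deliver less than this theoretical peak, so the ‘Max memory bandwidth’ value on the CPU spec sheet should take precedence when it is available.

```python
BYTES_PER_TRANSFER = 8  # assumption: one 64-bit DDR channel moves 8 bytes per transfer


def effective_data_rate(cpu_max_mts: float, ram_mts: float) -> float:
    """The slower of the CPU limit and the RAM rating sets the usable data rate (MT/s)."""
    return min(cpu_max_mts, ram_mts)


def peak_bandwidth_gb_s(data_rate_mts: float, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s for the given number of populated channels."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# Hypothetical system: the CPU supports up to 1333 MT/s, the RAM is rated for 1866 MT/s.
rate = effective_data_rate(cpu_max_mts=1333, ram_mts=1866)
bandwidth = peak_bandwidth_gb_s(rate, channels=4)
print(f"usable data rate: {rate} MT/s, theoretical peak: {bandwidth:.1f} GB/s")
```

In this sketch the faster RAM gives no benefit, because the usable rate is capped by the CPU at 1333 MT/s.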

Example:

The Xeon E7-4830 has a ‘Max memory bandwidth’ of 85 GB/s. Up to 4 of these processors can be installed in a single workstation, which would give a total bandwidth of 4 x 85 = 340 GB/s. To achieve optimum performance, DDR4-1866 memory modules should be used.


From the Intel Xeon spec sheet: Intel Xeon Processor E7-4830 v4

RAM Size:
The amount of RAM determines the size of the simulation that can run on the computer. It does not affect the simulation speed, as long as there is enough RAM that the computer is not forced to swap memory to the hard drive. If swapping occurs, the simulation will be extremely slow.
Desktop computers typically have 6-12 GB of RAM, which is sufficient for running a large fraction of simulations. Workstations often have 48-96 GB of RAM, which is sufficient for running almost all simulations.
Populating all the available memory slots on your system is highly recommended, so that every memory channel between the CPU and RAM is used.
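
To judge whether a simulation is likely to fit in RAM, a back-of-the-envelope estimate can help. The sketch below is only a lower bound under stated assumptions: it counts six field components per grid cell stored as 4-byte values, and reserves 2 GB for the operating system. These numbers and function names are illustrative assumptions, not taken from FDTD Solutions. Actual memory use will be higher (material data, boundary conditions, monitors), so any memory estimate reported by the solver itself takes precedence.

```python
def estimate_field_memory_gb(nx: int, ny: int, nz: int,
                             components: int = 6,
                             bytes_per_value: int = 4) -> float:
    """Lower-bound storage for the field arrays of an nx x ny x nz grid, in GB."""
    return nx * ny * nz * components * bytes_per_value / 1e9


def fits_in_ram(required_gb: float, installed_gb: float,
                reserved_gb: float = 2.0) -> bool:
    """True if the estimate fits after reserving some RAM for the OS (assumed 2 GB)."""
    return required_gb <= installed_gb - reserved_gb


# Hypothetical grid of 1000 x 1000 x 1000 cells.
required = estimate_field_memory_gb(1000, 1000, 1000)
print(f"estimated field storage: {required:.1f} GB")
for installed in (8, 64):
    verdict = "fits" if fits_in_ram(required, installed) else "likely to swap"
    print(f"{installed} GB installed: {verdict}")
```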

Memory Bandwidth:
The memory bandwidth between the CPU and RAM is a crucial hardware specification. In most computers, the memory bandwidth is the factor that limits the overall simulation speed. Therefore, when purchasing a computer, a high memory bandwidth is very important.

Core Count:
FDTD Solutions will try to use all of your CPU cores to run as quickly as possible. However, having more cores does not always make the simulation go faster. Typically, the computer memory bus is not fast enough to keep all of the CPU cores busy simultaneously. For example, on a typical desktop computer with 8 cores, the simulation speed might increase by a factor of only 2-3x when using eight cores compared to one core. The explanation for this limited increase is that the memory bus is only fast enough to keep two or three cores running at full speed. Using additional cores does not increase the overall simulation speed because each core is forced to spend some time waiting for data to be transferred to/from the RAM. A faster memory bus allows more of the cores to be efficiently utilized.
Most CPUs support hyper-threading, which allows the operating system to treat each physical CPU core as two cores. This feature does not provide any speed increase for FDTD simulations because the overall performance bottleneck is the data transfer rate between the CPU and RAM, not the actual computing capability of the cores.
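
The saturation effect described above can be illustrated with a simplified model, in which the speedup grows with the number of physical cores until the memory bus can no longer feed them. This is a sketch under stated assumptions, not a measurement: the function name, the 25 GB/s bus, and the 10 GB/s per-core demand are hypothetical values chosen so that the result falls in the 2-3x range mentioned above.

```python
def estimated_speedup(physical_cores: int, bandwidth_gb_s: float,
                      per_core_demand_gb_s: float) -> float:
    """Speedup over one core, capped by how many cores the memory bus can keep fed."""
    cores_fed = bandwidth_gb_s / per_core_demand_gb_s
    return min(float(physical_cores), cores_fed)


# Hypothetical numbers: 25 GB/s memory bus, each core needs 10 GB/s to run at full speed.
for cores in (1, 2, 4, 8):
    speedup = estimated_speedup(cores, bandwidth_gb_s=25.0, per_core_demand_gb_s=10.0)
    print(f"{cores} cores: about {speedup:.1f}x")
```

Only physical cores are passed in: in this model, counting hyper-threaded logical cores would not change the result once the bus is saturated.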

CPU clock speed:
The CPU clock speed is typically not the most important factor for FDTD simulation speeds. While a faster clock speed does allow each core to run more quickly, the overall simulation speed is limited by the data transfer rate between CPU and RAM.

Workstations with multiple CPUs:
Workstations with multiple CPUs are a good way to increase the simulation speed. The most important factor is that each CPU has its own memory bus connection to the RAM. As explained above, the data transfer rate between CPU and RAM is the performance bottleneck, so having one memory bus per CPU allows the simulation speed to scale very well with the number of CPUs.
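
As a rough illustration of this scaling, the sketch below assumes that the simulation is entirely bandwidth-bound, so that speed is proportional to the total memory bandwidth. It reuses the 85 GB/s per-CPU figure from the earlier example; the function name is hypothetical, and the result is an idealized upper bound, since traffic between CPUs can reduce the scaling seen in practice.

```python
def relative_speed(cpus: int, bandwidth_per_cpu_gb_s: float,
                   baseline_bandwidth_gb_s: float) -> float:
    """Speed relative to a baseline, assuming speed is proportional to total bandwidth."""
    total_bandwidth = cpus * bandwidth_per_cpu_gb_s
    return total_bandwidth / baseline_bandwidth_gb_s


# One memory bus per CPU, 85 GB/s each; the baseline is a single-CPU system.
for n in (1, 2, 4):
    speed = relative_speed(n, bandwidth_per_cpu_gb_s=85.0, baseline_bandwidth_gb_s=85.0)
    print(f"{n} CPU(s): {n * 85} GB/s total, about {speed:.1f}x")
```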

Other computer hardware:
Other computer hardware such as hard drives and GPUs typically does not have a significant effect on the simulation speed. Similarly, the network speed does not have a significant effect on the simulation speed when the entire simulation is calculated on a single computer. However, for customers running individual simulations that are distributed between multiple computers, the network speed is extremely important. High-speed, low-latency interconnects such as InfiniBand are recommended in such cases.

See also:
Workstation hardware recommendations for running FDTD simulations
Is it possible to run FDTD simulations on a cluster?
Hardware Resources upgrade for better performance
https://www.lumerical.com/support/whitepaper/fdtd_nehalem_parallel_performance.html

The above examples are not intended to be endorsements of these models or brands. They are simply examples used to illustrate the points described in the page.


Hardware/CPU recommendations for FDTD and DEVICE