A lot of effort goes into creating spec sheets and publishing various performance numbers to help customers choose and size the right firewall model. This post tries to dig a little beyond these numbers and take a look at the hardware behind the figures.
Note: These notes are not ‘verified’. They are based on log output from various platforms and a bit of (qualified) guesswork. In other words, some facts may be slightly inaccurate.
General Architecture
Palo Alto Networks firewalls use an architecture with separate control (management) and data planes. The data plane is further divided roughly into three stages: network, security and signature processing, as depicted below.
The three data plane subsystems are implemented differently in the various models, as shown in the table later on. The Security Processor subsystem is responsible for App-ID, session handling and similar tasks; the other two are fairly self-explanatory. It is worth noting that the management plane also hosts some features which can sometimes become a bottleneck for operation.
To relate this architecture to the spec sheets: when the PA-3020 lists its specs as:
- 2 Gbps firewall throughput (App-ID enabled)
- 1 Gbps threat prevention throughput
In other words, the Security Processor will handle 2 Gbps as long as traffic does not need to go to the Signature Processor for threat inspection, since that stage only has 1 Gbps of capacity. In my experience, the spec sheet numbers tend to hold true (though some features hurt more than others, SSL in particular).
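One way to read the two spec sheet figures is as a simple bottleneck model: every packet passes the security stage, but only traffic that needs threat inspection also passes the signature stage, so the sustainable load is capped by whichever stage saturates first. The sketch below is a toy model under those assumptions, not Palo Alto Networks' actual scheduling; the `max_throughput` function and the `threat_fraction` parameter are hypothetical, the network stage is ignored because its capacity is not published, and the capacities are the PA-3020 figures quoted above.

```python
# Toy bottleneck model of a staged data plane (a simplifying assumption,
# not the vendor's real implementation). Capacities in Gbps are the
# PA-3020 spec sheet figures; the network stage is left out.
SECURITY_GBPS = 2.0   # firewall throughput, App-ID enabled
SIGNATURE_GBPS = 1.0  # threat prevention throughput


def max_throughput(threat_fraction: float) -> float:
    """Highest total load (Gbps) the pipeline can sustain when
    `threat_fraction` of the traffic also needs signature inspection.

    All traffic crosses the security stage:  total <= SECURITY_GBPS
    Only the inspected share crosses the signature stage:
        threat_fraction * total <= SIGNATURE_GBPS
    """
    if not 0.0 <= threat_fraction <= 1.0:
        raise ValueError("threat_fraction must be between 0 and 1")
    if threat_fraction == 0.0:
        return SECURITY_GBPS
    return min(SECURITY_GBPS, SIGNATURE_GBPS / threat_fraction)


for f in (0.0, 0.5, 1.0):
    print(f"{f:.0%} inspected -> {max_throughput(f):.1f} Gbps")
```

With no inspection the security stage is the limit (2 Gbps); with everything inspected the signature stage is (1 Gbps). Real traffic mixes, and expensive features such as SSL decryption, will shift these numbers.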
Cavium Octeon MIPS processors
All Palo Alto Networks appliances use Cavium processors at the heart of their data plane (as also stated in the Common Criteria documentation). This processor family is found in a lot of networking equipment these days, from F5 to Cisco and Palo Alto Networks. It has hardware assist for common security appliance functions such as SSL and decompression, and offers very high throughput on network traffic. It also comes in a wide variety of sizes (number of cores), which makes it easy to build both smaller and larger appliances on the same code base.
There is an extensive amount of information on the manufacturer's website.
Palo Alto Networks architecture
It is important to note that the lower-end models use their data plane (DP) cores for all DP subsystems, while the larger models offload some of them to FPGAs. This is of course done to save on chips and thus manufacturing cost. The same goes for the management plane, which in the higher-end models runs on an Intel processor but in the lower-end models is also left to the Cavium processor.
In the data plane I believe it is an acceptable solution to use the Cavium for all three subsystems. On the management plane, however, it can create problems, as the tasks there are often of a much more generic nature and are not well suited to a purpose-built processor such as the Cavium. The old PA-500 also used this approach, which rendered its management UI more or less useless (the UI runs on-box on the management plane). To make matters worse, more and more features are being pushed to the management plane, since some of them require user interaction with the firewall (splash pages, MITM features, etc.).
Model | MP architecture | MP cores | DP architecture | DP cores | DP Security | DP Signature | DP Network |
PA-200 | CN6320 | 1 | CN6320 | 1 | Cavium | Cavium | Cavium |
PA-220 | CN7130 | 2 | CN7130 | 2 | Cavium | Cavium | Cavium |
PA-820/850 | CN7240(?) | 3 | CN7240(?) | 4/5* | Cavium | Cavium | Cavium |
PA-3020 | Celeron (P4505) | 2 | CN6335 | 6 | Cavium | FPGA | Cavium |
PA-3050 | Celeron (P4505) | 2 | CN6335 | 6 | Cavium | FPGA | FPGA |
PA-5050 | Xeon (L5410) | 4 | CN5650 | 12 | Cavium | FPGA | FPGA |
PA-5220 | i7 | 8 | CN7885 | 40 | Cavium | FPGA | FPGA |
PA-7050 | i7 (2715QE) | 8 | CN6880 | 32** | Cavium | FPGA | FPGA |
* The only difference between the PA-820 and the PA-850 appears to be the activation of a single additional data plane core (one core out of five, i.e. 20% of the larger model's data plane capacity, which matches the difference in the Palo Alto Networks spec sheets). I am a little uncertain about the exact Cavium Octeon model, but the one referenced appears to be the logical choice.
** The PA-7050 is a chassis that accepts different Network Processing Cards (NPCs). The data plane architecture and number of cores would depend on which NPC is installed (though to my knowledge, at the time of writing, they all use the chip listed).
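The 20% figure in the first footnote is consistent with a naive linear model in which throughput scales with the number of data plane cores: the PA-820's 4 cores are 80% of the PA-850's 5. The sketch below encodes a few rows of the table as data and checks both that ratio and which models run every data plane subsystem on Cavium cores. The `Model` type and the helper functions are hypothetical, the 4/5 split between the PA-820 and PA-850 is my reading of the footnote, and linear per-core scaling is a simplifying assumption that ignores clock speed and FPGA offload. The PA-7050 is left out because its core count depends on the NPC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    dp_cores: int
    signature_on_fpga: bool  # "DP Signature" column is FPGA
    network_on_fpga: bool    # "DP Network" column is FPGA


# Selected rows from the table (PA-820/850 split per the footnote).
MODELS = [
    Model("PA-200", 1, False, False),
    Model("PA-220", 2, False, False),
    Model("PA-820", 4, False, False),
    Model("PA-850", 5, False, False),
    Model("PA-3020", 6, True, False),
    Model("PA-3050", 6, True, True),
    Model("PA-5050", 12, True, True),
    Model("PA-5220", 40, True, True),
]


def all_on_cavium(models):
    """Names of models that run every data plane subsystem on Cavium cores."""
    return [m.name for m in models
            if not (m.signature_on_fpga or m.network_on_fpga)]


def relative_capacity(small, large):
    """Naive estimate: throughput scales linearly with data plane cores."""
    return small.dp_cores / large.dp_cores


by_name = {m.name: m for m in MODELS}
print(all_on_cavium(MODELS))
print(relative_capacity(by_name["PA-820"], by_name["PA-850"]))
```

The first call lists the four smallest models, matching the observation that the lower-end models use their DP cores for everything; the second yields 0.8, i.e. the PA-820 at 80% of the PA-850.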
Conclusion (if any)
The key takeaway is probably that you get what you pay for: the higher-end models have more dedicated hardware (such as the FPGAs in the table above) for performing various tasks, while the lower-end models do not. One thing to note, however, is the generation of Cavium processor used in the platform, as this translates directly into performance (especially for SSL and similar features). The generation in turn corresponds more or less directly to the age of the firewall model.
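To make the generation point a little more concrete: as far as I know, Cavium's part numbers encode the generation in the first digit of the model number, with CN5xxx parts being OCTEON Plus, CN6xxx OCTEON II and CN7xxx OCTEON III (the PA-850 log in the appendix reports "Cavium Octeon III" for its cores, which fits a CN7xxx part). The hypothetical helper below applies that mapping to the data plane parts from the table; it is a sketch based on my understanding of the naming scheme, so check the manufacturer's documentation before relying on it.

```python
# Hypothetical helper: map a Cavium Octeon part number such as "CN6335"
# to its product generation using the first digit of the model number.
# Assumed mapping; only covers the generations seen in the table.
GENERATIONS = {
    "5": "OCTEON Plus",
    "6": "OCTEON II",
    "7": "OCTEON III",
}


def octeon_generation(part: str) -> str:
    """Return the generation name for a part like 'CN6335', or 'unknown'."""
    part = part.upper()
    if not part.startswith("CN") or len(part) < 3:
        return "unknown"
    return GENERATIONS.get(part[2], "unknown")


# Data plane parts as listed in the table (uncertain PA-8x0 part omitted).
DP_PARTS = {
    "PA-200": "CN6320",
    "PA-220": "CN7130",
    "PA-3020": "CN6335",
    "PA-5050": "CN5650",
    "PA-5220": "CN7885",
    "PA-7050": "CN6880",
}

for model, part in DP_PARTS.items():
    print(f"{model}: {part} -> {octeon_generation(part)}")
```

Under this mapping the PA-5050 is the only model in the list on an OCTEON Plus part, which lines up with it being the oldest platform.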
Various Log Outputs (aka Appendix)
Here are various log outputs from the different models.
PA-200
OCTEON CN6320-AAP pass 2.1, Core clock: 800 MHz, IO clock: 800 MHz, DDR clock: 666 MHz (1332 Mhz data rate)
RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=2.
PA-220
OCTEON CN7130-AAP pass 1.2, Core clock: 1000 MHz, IO clock: 500 MHz, DDR clock: 800 MHz (1600 Mhz DDR)
PA-3020 and PA-3050 (possibly PA-3060)
OCTEON CN6335-AAP pass 2.2, Core clock: 1000 MHz, IO clock: 800 MHz, DDR clock: 533 MHz (1066 Mhz DDR), DFM clock: 533 MHz
...
CPU0: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz stepping 05
CPU1: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz stepping 05
PA-5050
OCTEON CN5650-NSP pass 2.1, Core clock: 700 MHz, DDR clock: 400 MHz (800 Mhz DDR)
OCTEON CN5220-CP pass 2.0, Core clock: 500 MHz, DDR clock: 331 MHz (662 Mhz DDR)
...
CPU0: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU1: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU2: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU3: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
PA-5220
OCTEON CN7885-AAP pass 2.0, Core clock: 1600 MHz, IO clock: 1000 MHz, DDR clock: 1050 MHz (2100 Mhz DDR)
PA-7050
OCTEON CN5220-CP pass 2.0, Core clock: 500 MHz, DDR clock: 331 MHz (662 Mhz DDR)
OCTEON CN6645-AAP pass 1.2, Core clock: 1100 MHz, IO clock: 800 MHz, DDR clock: 533 MHz (1066 Mhz DDR), DFM clock: 533 MHz
OCTEON CN6880-AAP pass 2.2, Core clock: 1000 MHz, IO clock: 800 MHz, DDR clock: 667 MHz (1334 Mhz DDR)
...
CPU0: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU1: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU2: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU3: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU4: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU5: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU6: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU7: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
PA-850
mgmt kernel: Linux version 3.10.87-oct2-mp (build@2ce2b9b3bc97) (gcc version 4.7.0 (Cavium Inc. Version: SDK_BUILD build 49) ) #4 SMP Thu Aug 17 09:28:02 EDT 2017
RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=8.
SMP: Booting CPU01 (CoreId 1)...
CPU revision is: 000d9702 (Cavium Octeon III)
SMP: Booting CPU02 (CoreId 2)...
CPU revision is: 000d9702 (Cavium Octeon III)
....
SMP: Booting CPU07 (CoreId 7)...
CPU revision is: 000d9702 (Cavium Octeon III)
Brought up 8 CPUs
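Since the table is based on boot messages like the ones above, pulling out the relevant fields can be automated with a couple of regular expressions. The sketch below is a hypothetical helper (`summarize` and its output keys are my own naming); the patterns are written against the message formats shown in these logs only, and other firmware versions may print them differently.

```python
import re

# Matches e.g. "OCTEON CN6335-AAP pass 2.2, Core clock: 1000 MHz"
OCTEON_RE = re.compile(
    r"OCTEON (CN\d{4})-(\w+) pass [\d.]+, Core clock: (\d+) MHz")
# Matches e.g. "CPU0: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz"
INTEL_CPU_RE = re.compile(r"CPU(\d+): Intel\(R\) (.+?) @ ([\d.]+)GHz")


def summarize(log: str) -> dict:
    """Extract Octeon parts/clocks and the Intel management CPU from a boot log."""
    octeons = [
        {"part": part, "variant": variant, "core_mhz": int(mhz)}
        for part, variant, mhz in OCTEON_RE.findall(log)
    ]
    intel = INTEL_CPU_RE.findall(log)
    return {
        "octeon": octeons,
        # Count distinct CPU ids in case a line is repeated in the log.
        "intel_cpu_count": len({cpu for cpu, _, _ in intel}),
        "intel_model": intel[0][1] if intel else None,
    }


# The PA-3020/PA-3050 log from above, as a single string.
sample = (
    "OCTEON CN6335-AAP pass 2.2, Core clock: 1000 MHz, IO clock: 800 MHz, "
    "DDR clock: 533 MHz (1066 Mhz DDR), DFM clock: 533 MHz ... "
    "CPU0: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz stepping 05 "
    "CPU1: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz stepping 05"
)
print(summarize(sample))
```

Logs with several Octeon chips, such as the PA-5050 and PA-7050 outputs, simply produce several entries under `octeon`. The PA-850 output, which only prints per-core revision lines, yields no Octeon entries with these patterns.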