Palo Alto Networks Firewall Hardware Internals

A lot of effort goes into creating spec sheets and publishing performance figures to help customers choose and size the right firewall model. This post tries to dive a little beyond these numbers and look at the hardware behind the figures.

Note: These notes are not ‘verified’. They are based on log output from various platforms and a bit of (qualified) guesswork. In other words, some facts may be slightly inaccurate.

General Architecture

The Palo Alto firewall has an architecture with separate control and data planes. Furthermore, the data plane is roughly divided into three stages: network, security and signature processing, as depicted below.

Palo Alto Firewall Architecture (cited from here). Note that there are four core functions: the control plane and the three data plane engines (network, security and signature). Depending on the hardware model, each of these modules is implemented either in purpose-built hardware (high-end models) or in software (lower-end models). A packet requiring full inspection goes from the bottom to the top and all the way down again. Much more detail can be found here.

The three subsystems of the data plane are implemented differently across the various models, as shown in the table later on. The security processor subsystem is responsible for App-ID, session handling and similar tasks; the other two are pretty self-explanatory. It is worth noting that the management plane also hosts some features, which can sometimes become a bottleneck for operation.

A note on reading spec sheets in light of this architecture: when a PA-3020 lists its specs as:

  • 2 Gbps firewall throughput (App-ID enabled)
  • 1 Gbps threat prevention throughput

This means that the security processor can handle 2 Gbps as long as traffic does not need to go to the signature processor for threat inspection, since the latter only has 1 Gbps of capacity. In my experience the spec sheet numbers tend to hold true (though some features, SSL in particular, hurt performance more).
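The spec sheet reasoning above boils down to one rule: a flow is capped by the slowest stage it has to pass through. Here is a minimal Python sketch of that idea using the PA-3020 figures. The network stage capacity is an assumption (the spec sheet does not list it separately), the names are hypothetical, and real-world effects such as SSL decryption are ignored.

```python
# Stage capacities in Gbps, loosely based on the PA-3020 spec sheet figures above.
PA3020_STAGE_CAPACITY_GBPS = {
    "network": 2.0,    # assumption: not listed separately; taken as >= the App-ID figure
    "security": 2.0,   # App-ID / session handling ("firewall throughput")
    "signature": 1.0,  # threat inspection ("threat prevention throughput")
}


def effective_throughput(capacity: dict[str, float], threat_inspection: bool) -> float:
    """Return the capacity of the slowest stage a flow must traverse."""
    stages = ["network", "security"]
    if threat_inspection:
        # Only flows needing threat inspection are sent on to the signature stage.
        stages.append("signature")
    return min(capacity[stage] for stage in stages)


print(effective_throughput(PA3020_STAGE_CAPACITY_GBPS, threat_inspection=False))  # 2.0
print(effective_throughput(PA3020_STAGE_CAPACITY_GBPS, threat_inspection=True))   # 1.0
```

The same min() rule explains why enabling threat prevention halves the usable throughput on this model: the signature stage becomes the bottleneck.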

Cavium Octeon MIPS processors

All Palo Alto Networks appliances use Cavium processors at the heart of their data plane (as also stated here in the Common Criteria documentation). This processor family is found in a lot of networking equipment these days, from F5 to Cisco and Palo Alto Networks. It has hardware assist for common security appliance functions such as SSL and decompression, and offers very high throughput on network traffic. It also comes in a wide variety of sizes (core counts), which makes it easy to scale to smaller and larger appliances using the same code base.

The latest generation of the Octeon processor has a significant performance edge over the older generations (in particular for SSL).

There is an extensive amount of information on the manufacturer's website.

Palo Alto Networks architecture

It is important to note that the lower-end models use their DP cores for all DP subsystems, while the larger models have FPGAs. This is of course done to save some chips and thus manufacturing cost. The same goes for the management plane, which in higher-end models runs on an Intel processor but in the lower-end models is also left to the Cavium processor.

In the data plane I believe it is an OK solution to use the Cavium for all three subsystems. On the management plane, however, it can create problems, as the tasks there are often of a much more generic nature and as such are not well suited to a purpose-built processor like the Cavium. The old PA-500 also used this approach, which rendered its management UI more or less useless (as the UI runs on-box on the management plane). To make matters worse, more and more features are being pushed to the control plane, as user interaction with the firewall can be required (splash pages, MITM features, etc.).

Model    MP architecture   MP cores  DP architecture  DP cores  DP Security  DP Signature  DP Network
PA-200   CN6320            1         CN6320           1         Cavium       Cavium        Cavium
PA-220   CN7130            2         CN7130           2         Cavium       Cavium        Cavium
PA-8x0   CN7240(?)         3         CN7240(?)        4/5*      Cavium       Cavium        Cavium
PA-3020  Celeron (P4505)   2         CN6335           6         Cavium       FPGA          Cavium
PA-3050  Celeron (P4505)   2         CN6335           6         Cavium       FPGA          FPGA
PA-5050  Xeon (L5410)      4         CN5650           12        Cavium       FPGA          FPGA
PA-5220  i7                8         CN7885           40        Cavium       FPGA          FPGA
PA-7050  i7 (2715QE)       8         CN6880           32**      Cavium       FPGA          FPGA

* The only difference between the 820 and the 850 appears to be the activation of a single extra core (4 vs 5 DP cores, i.e. the 820 has 20% fewer cores than the 850, which matches the difference between the two models on the Palo Alto spec sheet). I am a little uncertain as to the exact version of the Cavium Octeon, but the referenced one appears to be the logical choice.

** The 7050 is a chassis that takes different Network Processing Cards (NPCs). The data plane architecture and number of cores depend on which NPC is installed (though to my knowledge, at the time of writing, they all use the processor listed).
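To make the table easier to query, here is a minimal Python sketch that encodes its DP columns, picks out the models that run all three DP subsystems on Cavium cores, and spells out the core arithmetic behind the 820/850 footnote. It assumes the "4/5*" entry means 4 DP cores on the PA-820 and 5 on the PA-850; the names are hypothetical and the values are only as accurate as the (unverified) table.

```python
# DP columns copied from the table above.
# Assumption: "4/5*" means 4 DP cores for the PA-820 and 5 for the PA-850.
DP_TABLE = {
    # model:     (dp_cores, security, signature, network)
    "PA-200":  (1,  "Cavium", "Cavium", "Cavium"),
    "PA-220":  (2,  "Cavium", "Cavium", "Cavium"),
    "PA-820":  (4,  "Cavium", "Cavium", "Cavium"),
    "PA-850":  (5,  "Cavium", "Cavium", "Cavium"),
    "PA-3020": (6,  "Cavium", "FPGA",   "Cavium"),
    "PA-3050": (6,  "Cavium", "FPGA",   "FPGA"),
    "PA-5050": (12, "Cavium", "FPGA",   "FPGA"),
    "PA-5220": (40, "Cavium", "FPGA",   "FPGA"),
    "PA-7050": (32, "Cavium", "FPGA",   "FPGA"),
}


def runs_all_dp_on_cavium(model: str) -> bool:
    """True if every DP subsystem is handled by the general-purpose Cavium cores."""
    _, *implementations = DP_TABLE[model]
    return all(impl == "Cavium" for impl in implementations)


software_only = [model for model in DP_TABLE if runs_all_dp_on_cavium(model)]
print(software_only)  # ['PA-200', 'PA-220', 'PA-820', 'PA-850']

# One extra core: the 820 has 4/5 = 80% of the 850's cores (20% fewer),
# while the 850 has 5/4 = 125% of the 820's cores (25% more).
cores_820 = DP_TABLE["PA-820"][0]
cores_850 = DP_TABLE["PA-850"][0]
print(cores_820 / cores_850, cores_850 / cores_820)  # 0.8 1.25
```

Note that the percentage depends on which model is the baseline, which is worth keeping in mind when comparing against spec sheet ratios.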

 

The three different generations of Cavium processors. A full table of Cavium's processor families can be found here.

Conclusion (if any)

The key takeaway is probably that you get what you pay for: the higher-end models have more dedicated hardware (FPGAs) for performing various tasks, while the lower-end models do not. One thing to take note of, however, is the generation of Cavium used in the platform, as this translates directly into performance (especially for SSL and similar features). The generation in turn corresponds more or less directly to the age of the firewall model.

 

Various Log Outputs (aka Appendix)

Here is a collection of log outputs from different models.

PA-200

OCTEON CN6320-AAP pass 2.1, Core clock: 800 MHz, IO clock: 800 MHz, DDR clock: 666 MHz (1332 Mhz data rate)
RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=2.

PA-220

OCTEON CN7130-AAP pass 1.2, Core clock: 1000 MHz, IO clock: 500 MHz, DDR clock: 800 MHz (1600 Mhz DDR)

PA-3020 and PA-3050 (possibly PA-3060)

OCTEON CN6335-AAP pass 2.2, Core clock: 1000 MHz, IO clock: 800 MHz, DDR clock: 533 MHz (1066 Mhz DDR), DFM clock: 533 MHz

...
CPU0: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz stepping 05
CPU1: Intel(R) Celeron(R) CPU P4505 @ 1.87GHz stepping 05

PA-5050

OCTEON CN5650-NSP pass 2.1, Core clock: 700 MHz, DDR clock: 400 MHz (800 Mhz DDR)
OCTEON CN5220-CP pass 2.0, Core clock: 500 MHz, DDR clock: 331 MHz (662 Mhz DDR)
...
CPU0: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU1: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU2: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU3: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a

PA-5220

OCTEON CN7885-AAP pass 2.0, Core clock: 1600 MHz, IO clock: 1000 MHz, DDR clock: 1050 MHz (2100 Mhz DDR)

PA-7050

OCTEON CN5220-CP pass 2.0, Core clock: 500 MHz, DDR clock: 331 MHz (662 Mhz DDR)
OCTEON CN6645-AAP pass 1.2, Core clock: 1100 MHz, IO clock: 800 MHz, DDR clock: 533 MHz (1066 Mhz DDR), DFM clock: 533 MHz
OCTEON CN6880-AAP pass 2.2, Core clock: 1000 MHz, IO clock: 800 MHz, DDR clock: 667 MHz (1334 Mhz DDR)
...
CPU0: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU1: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU2: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU3: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU4: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU5: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU6: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07
CPU7: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz stepping 07

PA-850

mgmt kernel: Linux version 3.10.87-oct2-mp (build@2ce2b9b3bc97) (gcc version 4.7.0 (Cavium Inc. Version: SDK_BUILD build 49) ) #4 SMP Thu Aug 17 09:28:02 EDT 2017
RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=8.

SMP: Booting CPU01 (CoreId 1)...
CPU revision is: 000d9702 (Cavium Octeon III)
SMP: Booting CPU02 (CoreId 2)...
CPU revision is: 000d9702 (Cavium Octeon III)
....
SMP: Booting CPU07 (CoreId 7)...
CPU revision is: 000d9702 (Cavium Octeon III)
Brought up 8 CPUs
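Since the table is based on log output like the above, here is a minimal Python sketch for pulling the relevant figures out of such a log: the Octeon model, core clock and DDR clock, plus a count of Intel management plane CPUs. The regular expressions are written for illustration against the line formats shown in this appendix only; they are not from any Palo Alto Networks tool, and the function name is hypothetical.

```python
import re

# Matches Octeon boot lines such as
# "OCTEON CN6320-AAP pass 2.1, Core clock: 800 MHz, IO clock: 800 MHz, DDR clock: 666 MHz ..."
OCTEON_RE = re.compile(
    r"OCTEON (?P<model>CN\d+)-(?P<variant>\w+) pass [\d.]+, "
    r"Core clock: (?P<core_mhz>\d+) MHz.*?DDR clock: (?P<ddr_mhz>\d+) MHz"
)
# Matches x86 management plane lines such as "CPU0: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz ..."
X86_CPU_RE = re.compile(r"^CPU\d+: Intel\(R\)")


def summarize_boot_log(log: str) -> dict:
    """Extract Octeon chips (model, variant, core MHz, DDR MHz) and count x86 MP CPUs."""
    octeons = [
        (m["model"], m["variant"], int(m["core_mhz"]), int(m["ddr_mhz"]))
        for m in OCTEON_RE.finditer(log)
    ]
    x86_cpus = sum(1 for line in log.splitlines() if X86_CPU_RE.match(line))
    return {"octeon": octeons, "x86_mp_cpus": x86_cpus}


# Excerpt of the PA-5050 output above.
PA5050_LOG = """\
OCTEON CN5650-NSP pass 2.1, Core clock: 700 MHz, DDR clock: 400 MHz (800 Mhz DDR)
OCTEON CN5220-CP pass 2.0, Core clock: 500 MHz, DDR clock: 331 MHz (662 Mhz DDR)
...
CPU0: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU1: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU2: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
CPU3: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz stepping 0a
"""

summary = summarize_boot_log(PA5050_LOG)
print(summary["octeon"])       # [('CN5650', 'NSP', 700, 400), ('CN5220', 'CP', 500, 331)]
print(summary["x86_mp_cpus"])  # 4
```

In every log above, the figure in parentheses is twice the DDR clock (e.g. 666 MHz gives a 1332 MHz data rate), as expected for double data rate memory. For the Cavium-only models (PA-200, PA-850) there are no Intel lines, so the CPU count would have to come from the "nr_cpu_ids" or "Brought up N CPUs" lines instead, which this sketch does not handle.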
