This article details key considerations for building a workstation for facial recognition. For specific workstation use cases and applications, read our top 7 use cases in 2022.
Workstations are high-performing computers designed to improve workflow and perform multiple functions at once. They do much more than your average PC.
Technically speaking, workstations are built from high-end hardware such as x86 CPUs from Intel (either Core or Xeon) or AMD, paired with NVIDIA GPUs. They typically run Linux or Windows. For facial recognition, any of these configurations can run and manage multiple live video streams on a single workstation.
There are two common workstation infrastructure setups: on-premises or cloud-based.
Workstations used for security and surveillance are best installed on site and directly connected to devices via the intranet (internal network). This setup allows tens to hundreds of cameras performing facial recognition throughout a single building, plant, or campus to communicate back to the workstation. The intranet keeps data secure, with all image and video feeds staying inside the organization's firewall.
In addition to security, intranet networks provide other benefits for workstation performance. Many surveillance cameras produce 1080p video streams, each using 2 to 8 Mbps of bandwidth, and high-speed intranets can comfortably handle the aggregate load.
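As a rough illustration of that bandwidth math, here is a minimal sketch in Python; the 2 to 8 Mbps per-camera figures come from this article, while the camera count and the 1 Gbps link capacity are assumptions chosen for the example.

```python
# Rough aggregate-bandwidth estimate for an on-premises camera network.
# Per-camera bitrates (2-8 Mbps for 1080p) are the figures cited above;
# the camera count and link capacity are illustrative assumptions.
CAMERAS = 100                  # assumed number of 1080p cameras on site
BITRATE_LOW_MBPS = 2           # efficiently compressed H.265 stream
BITRATE_HIGH_MBPS = 8          # higher-bitrate H.264 stream
LINK_CAPACITY_MBPS = 1000      # assumed gigabit intranet backbone

low = CAMERAS * BITRATE_LOW_MBPS
high = CAMERAS * BITRATE_HIGH_MBPS
print(f"Aggregate bandwidth: {low}-{high} Mbps "
      f"({low / LINK_CAPACITY_MBPS:.0%}-{high / LINK_CAPACITY_MBPS:.0%} of a 1 Gbps link)")
```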
When optimally designed, on-premises workstations can decode large amounts of data in real time, and this is one of the most important factors to consider. Surveillance cameras encode video streams with the H.264 (AVC) or H.265 (HEVC) codec for transmission, and the compressed streams must be decoded into raw video buffers before AI vision algorithms can analyze them. Workstations with GPUs can perform this decoding in real time; many GPUs can decode more than 40 video streams simultaneously. NVIDIA GPUs can also accelerate the AI algorithms that run on top of the decoded video buffers, offering unmatched runtime performance and computing capacity.
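As a minimal sketch of what GPU-side decoding looks like in practice, the snippet below reads a compressed stream with OpenCV's cudacodec module, which wraps NVIDIA's hardware decoder so that decoded frames stay in GPU memory. It assumes an OpenCV build with CUDA support and a reachable RTSP camera URL, and it is not part of any specific facial recognition SDK.

```python
# Sketch: hardware-accelerated decoding of a surveillance stream.
# Assumes opencv-contrib built with CUDA (cv2.cudacodec wraps NVDEC) and a
# reachable RTSP camera; the URL below is a hypothetical placeholder.
import cv2

RTSP_URL = "rtsp://192.168.1.10/stream1"    # hypothetical on-premises camera

reader = cv2.cudacodec.createVideoReader(RTSP_URL)

while True:
    ok, gpu_frame = reader.nextFrame()      # decoded frame stays in GPU memory
    if not ok:
        break
    # Hand the GpuMat to the recognition pipeline here; download to the CPU
    # (gpu_frame.download()) only if the algorithm requires host memory.
```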
As an alternative to on-premises setups, workstations can also be installed in the cloud or in data centers. While on-premises installations are hosted in-house, cloud installations are hosted by third-party providers in a public cloud, a private cloud, or a hybrid of the two. This externally hosted setup requires little effort from the end user and allows businesses to scale up or down easily, but security risks and exposure can be higher when a third party manages your data. You can limit that risk by renting space at a data center and setting up your own cloud-based servers or workstations.
Servers are commonly deployed in the cloud when the use case involves devices, such as mobile phones, transmitting images or videos for facial recognition.
The two most popular configurations are workstation-grade and server-grade.
A workstation-grade configuration is best for on-premises installations where you need to monitor tens to hundreds of surveillance cameras. We recommend an Intel Core or Xeon CPU, or an AMD server-class CPU, paired with an NVIDIA Quadro GPU.
These configurations are well suited to running facial recognition and other AI vision algorithms across multiple surveillance camera channels and feeds. Supermicro™ makes workstations that match these popular configurations. They typically include up to four GPU cards in a single chassis, enabling them to handle hundreds of video channels, especially when using a fully optimized facial recognition algorithm like FaceMe®.
For optimal performance we recommend the Quadro GPU series, such as the Quadro RTX 6000/8000, or the NVIDIA RTX A6000. If budget is a constraint, the Quadro RTX 5000 is a more affordable solution, but it cannot handle as much traffic or as many video channels as the higher-end models. The Quadro RTX 6000 and Quadro RTX 8000 deliver similar facial recognition performance: when we tested the FaceMe VH model, both provided 340fps, while the Quadro RTX 5000 supported 220fps. The highest-performing GPU is the RTX A6000, at 410fps.
Other viable options include GeForce series GPUs, such as the RTX 3090. While they offer high performance at a low cost, they are not designed for 24/7 operation and carry only a one-year warranty. For these reasons the Quadro series is the better option. Quadro series cards are well designed and actively cooled with fans, so they do not require strict temperature or humidity control. They are flexible and easy to deploy in nearly any facility, from smart offices to factories to commercial buildings.
A server-grade configuration is best for cloud-based installations where you need to handle facial recognition requests from hundreds to millions of devices. We recommend a combination of CPU and fanless (passive cooling) GPU.
Because server-grade configurations are housed in data centers and server rooms, temperature and humidity must be controlled. GPUs often include fans to help exhaust heat and maintain high performance under heavy workloads. However, passively cooled cards like the Tesla V100 or T4 are completely fanless. Instead, they are built with a heat sink that requires good airflow to keep the GPU within thermal limits. Fanless GPUs with passive cooling are best for server and data center environments.
We recommend either the iEi HTB-200-C236 or the Advantech MIC-770 system. Both are well designed, providing good airflow and accommodating the NVIDIA T4's heat sink, so the GPU stays well under its thermal limits even at full workload. For more information on qualified servers, check out NVIDIA's Qualified Server Catalog.
The number of servers and GPUs needed to run facial recognition depends on how many transactions per second are required during peak hours. Consider the user base, including the number of daily active users and how frequently they make recognition requests. We tested the FaceMe VH model and found a single T4 supported facial recognition at 192fps while its temperature remained well controlled.
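As a back-of-the-envelope sizing sketch, the calculation below estimates how many T4 GPUs a given peak load would need; the 192fps figure is the measurement quoted above, while the peak request rate and frames per request are assumptions made up for the example.

```python
# Back-of-the-envelope GPU sizing for a cloud deployment.
# The 192 fps T4 throughput is the measurement cited above; the peak request
# rate and frames-per-request values are illustrative assumptions.
import math

PEAK_REQUESTS_PER_SECOND = 500   # assumed peak load from client devices
FRAMES_PER_REQUEST = 3           # assumed frames analyzed per recognition request
T4_THROUGHPUT_FPS = 192          # measured FaceMe VH throughput on one T4

peak_fps = PEAK_REQUESTS_PER_SECOND * FRAMES_PER_REQUEST
gpus_needed = math.ceil(peak_fps / T4_THROUGHPUT_FPS)
print(f"Peak load: {peak_fps} fps -> {gpus_needed} x T4 GPUs")
```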
To save on server computation power, we suggest moving some workloads from the server to the edge. For example, you could run face detection on the edge and face recognition on the server. This split frees up server power, since the server only needs to process frames that actually contain a face. You could also run face detection, face template extraction, and anti-spoofing on the edge for high-end smartphones (such as Android phones using Snapdragon), leaving the server to handle face template extraction only for low-end to mid-range smartphones (like legacy iPhones).
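The split can be illustrated with a short sketch: a generic detector runs on the edge device and only face crops are posted to the server. The detector and the /recognize endpoint below are stand-ins for illustration, not the FaceMe SDK or its API.

```python
# Illustrative edge/server split: detect faces on the edge device and send
# only the face crops to the server for recognition. The Haar detector and
# the "/recognize" endpoint are placeholders, not the FaceMe SDK.
import cv2
import requests

SERVER_URL = "http://recognition-server.local/recognize"   # hypothetical endpoint

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                   # edge device camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        encoded_ok, jpeg = cv2.imencode(".jpg", frame[y:y + h, x:x + w])
        if encoded_ok:
            # Only frames that actually contain a face reach the server.
            requests.post(SERVER_URL, files={"face": jpeg.tobytes()})
```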
Overall, when thinking about server configurations and which is best for your needs, you will want to choose a facial recognition algorithm that supports both server and edge deployment. For more on edge-based deployments visit Facial Recognition at the Edge - The Ultimate Guide 2022.
System architecture should always minimize the data flow between CPU, GPU, and memory. As outlined in our in-depth article, Facial Recognition at the Edge - The Ultimate Guide 2022, designing a good facial recognition system can be challenging. With dozens of concurrent video streams moving between CPU, GPU, and memory, even the strongest facial recognition algorithms will be slow if not properly implemented.
FaceMe has an optimized system architecture that ensures the highest performance. For example, on a single workstation FaceMe with NVIDIA RTX A6000 can handle 340 to 410fps (the exact number may vary depending on which FaceMe facial recognition model is used). This is equivalent to handling 25 to 41 concurrent video channels (each with 10fps) per GPU, an outstanding cost-performance offering. The following table lists the facial recognition performance of several popular GPUs that we tested with our latest model:
* Tested with 1080p images, each containing one face.
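To translate raw throughput into camera channels, divide the GPU's frames per second by the per-channel frame rate. The sketch below uses the figures quoted in this article and the 10fps-per-channel assumption used above; it gives an idealized ceiling, and measured capacity (such as the 25-to-41-channel range above) can be lower once the rest of the pipeline is taken into account.

```python
# Convert facial recognition throughput (fps) into concurrent camera channels.
# Throughput figures are the ones quoted in this article; 10 fps per channel
# matches the assumption used above. This is an idealized upper bound.
GPU_THROUGHPUT_FPS = {
    "Quadro RTX 5000": 220,
    "Quadro RTX 6000/8000": 340,
    "RTX A6000": 410,
}
FPS_PER_CHANNEL = 10

for gpu, fps in GPU_THROUGHPUT_FPS.items():
    print(f"{gpu}: up to ~{fps // FPS_PER_CHANNEL} concurrent {FPS_PER_CHANNEL} fps channels")
```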
The two most common OSes for servers and workstations are Windows and Linux. Both have unique pros and cons to consider before deciding which is right for you. It is important to consider which applications, or SDKs, you plan to use. A good facial recognition algorithm, like FaceMe, supports both Linux and Windows — granting more flexibility to plan and manage your platforms. Let’s take a closer look at each.
Linux is a popular choice because there are no license fees. It is most often used for server-grade configurations. It’s open source, reliable, stable, and able to operate 24/7. Linux is also easy to control and manage — we highly recommend it.
Of the Linux variations, Ubuntu is the most popular (with Debian and Red Hat following close behind). Ubuntu is well liked because it’s easy to set up Docker, the most popular container deployment and management platform.
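As a small illustration of that workflow, the Docker SDK for Python can pull and start a containerized recognition service in a few lines; the image name, port, and GPU request below are hypothetical placeholders rather than an official FaceMe image.

```python
# Start a containerized recognition service with the Docker SDK for Python.
# The image name, port mapping, and GPU request are hypothetical placeholders.
import docker

client = docker.from_env()

container = client.containers.run(
    "example/face-recognition-service:latest",           # hypothetical image
    detach=True,
    ports={"8080/tcp": 8080},                             # expose the service port
    device_requests=[docker.types.DeviceRequest(          # pass all GPUs through
        count=-1, capabilities=[["gpu"]])],
)
print(f"Started container {container.short_id}")
```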
Windows is the dominant desktop OS, and many IT teams are more familiar with it than with Linux. If this is the case for your business, choosing a server that runs Windows is a good option. In addition, if you want your system to run other Microsoft applications, such as Exchange, Microsoft SQL Server, or Active Directory, it is smart to keep the OS consistent. This makes it much easier to write, develop, and maintain all application components on the server.
There are several factors to consider when building a workstation to suit your specific facial recognition requirements. When evaluating options, one of the most important things is to understand the specific use case and performance needs, as well as where you want the workstation to live: either on-premises or in the cloud.
Once you think you have the right build and design in mind, we recommend conducting a proof of concept (POC) so you can make improvements and adjustments before fully launching.