fastcompression · 6 years ago
Benchmark comparison for Jetson Nano, TX2, Xavier NX and AGX
Author: Fyodor Serzhenko
NVIDIA has released a series of Jetson hardware modules for embedded applications. NVIDIA® Jetson is the world's leading embedded platform for image processing and DL/AI tasks. Its high-performance, low-power computing for deep learning and computer vision makes it the ideal platform for mobile compute-intensive projects.
We've developed an Image & Video Processing SDK for NVIDIA Jetson hardware. Here we present performance benchmarks for the available Jetson modules. As the benchmark workload we use the image processing pipeline of a basic camera application.
Hardware features for Jetson Nano, TX2, Xavier NX and AGX Xavier
Here we present a brief comparison of Jetson hardware features to show the progress and variety of NVIDIA's mobile solutions. These units are aimed at different markets and tasks.
Table 1. Hardware comparison for Jetson modules
In camera applications, we can usually hide Host-to-Device transfers by implementing GPU Zero Copy or by overlapping GPU copy/compute. Device-to-Host transfers can be hidden via copy/compute overlap.
Hardware and software for benchmarking
CPU/GPU: NVIDIA Jetson Nano, TX2, Xavier NX and AGX Xavier
OS: L4T (Ubuntu 18.04)
CUDA Toolkit 10.2 for Jetson Nano, TX2, Xavier NX and AGX Xavier
Fastvideo SDK 0.16.4
NVIDIA Jetson Comparison: Nano vs TX2 vs Xavier NX vs AGX Xavier
For these NVIDIA Jetson modules, we've benchmarked the following standard image processing tasks, which are typical of camera applications: white balance, demosaic (debayer), color correction, resize, JPEG encoding, etc. That's not the full set of Fastvideo SDK features, just an example of the performance we could get from each Jetson. You can also choose a particular debayer algorithm and output compression (JPEG or JPEG2000) for your pipeline.
Table 2. GPU kernel times for 2K image processing (1920×1080, 16 bits per channel, milliseconds)
Total processing time is calculated for the values from the gray rows of the table. This is done to show the maximum performance benchmarks for a specified set of image processing modules which correspond to real-life camera applications.
Each Jetson module was run in its maximum performance mode:
MAX-N mode for Jetson AGX Xavier
15W for Jetson Xavier NX and Jetson TX2
10W for Jetson Nano
Here we've compared just the basic set of image processing modules from Fastvideo SDK to let Jetson developers evaluate the expected performance before building their imaging applications. Image processing from RAW to RGB or from RAW to JPEG is a standard task, and developers can now get detailed info about the expected performance of a chosen pipeline from the table above. We haven't tested the Jetson H.264 and H.265 encoders and decoders in that pipeline. Since the H.264 and H.265 encoders work at the hardware level, encoding can be done in parallel with CUDA code, so we should be able to get even better performance.
We've done the same kernel time measurements for NVIDIA GeForce and Quadro GPUs. Here you can get the document with the benchmarks.
Software for Jetson performance comparison
We've released the software for a GPU-based camera application on GitHub, where both binaries and source code for our gpu camera sample project are available to download. It's implemented for Windows 7/10, Linux Ubuntu 18.04 and L4T. Apart from a full image processing pipeline on GPU for still images from SSD and for live camera output, there are options for streaming and for glass-to-glass (G2G) measurements to evaluate real latency for camera systems on Jetson. The software currently works with machine vision cameras from XIMEA, Basler, JAI, Matrix Vision, Daheng Imaging, etc.
To check the performance of Fastvideo SDK on a laptop/desktop/server GPU without any programming, you can download the Fast CinemaDNG Processor software with GUI for Windows or Linux. That software has a Performance Benchmarks window, where you can see the timing for each stage of image processing. This is a more sophisticated method of performance testing, because the image processing pipeline in that software can be quite advanced, and you can test any module you need. You can also run tests on images with different resolutions to see how much the performance depends on image size, content and other parameters.
Other blog posts from Fastvideo about Jetson hardware and software
Jetson Image Processing
Jetson Zero Copy
Jetson Nano Benchmarks on Fastvideo SDK
Jetson AGX Xavier performance benchmarks
JPEG2000 performance benchmarks on Jetson TX2
Remotely operated walking excavator on Jetson
Low latency H.264 streaming on Jetson TX2
Performance speedup for Jetson TX2 vs AGX Xavier
Source codes for GPU-Camera-Sample software on GitHub to connect USB3 and other cameras to Jetson
Original article see at: https://www.fastcompression.com/blog/jetson-benchmark-comparison.htm
Subscribe to our mail list: https://mailchi.mp/fb5491a63dff/fastcompression
Jetson Nano Benchmarks on Fastvideo SDK
Embedded imaging applications can definitely benefit from the latest release of NVIDIA Jetson Nano hardware. NVIDIA Jetson Nano is a small, powerful computer with embedded GPU that lets you run multiple neural networks in parallel for applications like image classification, object detection, segmentation, and speech processing.
We've tested the Image & Video Processing SDK from Fastvideo on the NVIDIA Jetson Nano Developer Kit, and here we present benchmark results for the software modules that are typical of camera applications.
Fig.1. Jetson Nano Module
NVIDIA Jetson Nano hardware: Quad Core, 4GB RAM, GPU
128-core Maxwell GPU (for display and compute)
Quad-core ARM A57 @ 1.43 GHz (main CPU)
4 GB LPDDR4 (rated at 25.6 GB/s)
Gigabit Ethernet
4x USB 3.0, USB 2.0 Micro-B (the Micro USB port could be utilized both for 5V power input and for data)
HDMI 2.0 & eDP 1.4 (4K monitor support, HDMI or DisplayPort)
Support of MIPI CSI-2 and PCIe Gen2 high-speed I/O
DC Barrel jack for 5V power input
Storage: microSD
Dimensions: 100 mm × 80 mm × 29 mm (carrier board is included)
It's interesting to note that, according to the CUDA Device Query application, the name of the tested Jetson Nano module is "NVIDIA Tegra X1" with CUDA Capability 5.3. So it resembles the Jetson TX1, but with half the CUDA cores.
Video Encoding and Decoding Options (NVIDIA NVENC and NVDEC benchmarks)
Video Encode: one 4K @ 30 fps stream, or 4x 1080p @ 30 fps, or 9x 720p @ 30 fps (H.264/H.265)
Video Decode: one 4K @ 60 fps stream, or 2x 4K @ 30 fps, or 8x 1080p @ 30 fps, or 18x 720p @ 30 fps (H.264/H.265)
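The stream counts above follow from a fixed pixel-rate budget. Here is a short sketch of that arithmetic. It assumes that codec throughput scales only with pixels per second, which is a simplification, and the function and constant names are ours, for illustration only.

```python
# Pixel-rate arithmetic behind the NVENC/NVDEC stream counts.
# Assumption: throughput depends only on pixels per second.
RESOLUTIONS = {"4K": (3840, 2160), "1080p": (1920, 1080), "720p": (1280, 720)}


def pixel_rate(name, fps):
    """Pixels per second for one stream of the given resolution."""
    width, height = RESOLUTIONS[name]
    return width * height * fps


def max_streams(budget, name, fps):
    """Number of whole streams that fit into a pixel-rate budget."""
    return budget // pixel_rate(name, fps)


ENCODE_BUDGET = pixel_rate("4K", 30)  # one 4K @ 30 fps encode stream
DECODE_BUDGET = pixel_rate("4K", 60)  # one 4K @ 60 fps decode stream

for label, budget in (("encode", ENCODE_BUDGET), ("decode", DECODE_BUDGET)):
    for name in ("4K", "1080p", "720p"):
        print(label, name, "@ 30 fps:", max_streams(budget, name, 30))
```

Under that assumption the budget of one 4K @ 30 fps stream is exactly four 1080p or nine 720p streams at 30 fps, and the decode budget is twice that.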
Fig.2. Jetson Nano Developer Kit
Hardware and software for benchmarking
CPU/GPU: NVIDIA Jetson Nano Developer Kit
OS: L4T (Ubuntu 18.04)
JetPack 4.2 with CUDA Toolkit 10.0
Fastvideo SDK 0.14.1
Jetson Nano Power Consumption and Power Management
In Jetson Nano hardware, NVIDIA uses the Dynamic Voltage and Frequency Scaling (DVFS) approach. This power management technology is used in most modern computer hardware to maximize power savings: the voltage and clock frequency of a component are raised or lowered depending on operating conditions.
The Jetson Nano Developer Kit is configured to accept power via the Micro USB connector. Some Micro USB power supplies are designed to output slightly more than 5V to account for voltage loss across the cable. The critical point is that the Jetson Nano module requires a minimum of 4.75V to operate. It's recommended to use a power supply capable of delivering 5V at the J28 Micro-USB connector.
There are some other power supply options for Jetson Nano. If the total load is expected to exceed 2A, e.g., due to peripherals attached to the carrier board or due to high performance computational tasks, you have to lock the J48 Power Select pins with a jumper to disable power supply via Micro USB and enable 5V-4A via the J25 power jack. Another option is to supply 5V-6A via the J41 expansion header (two 5V pins can be used to power the developer kit at 3A each). The Jetson Nano Developer Kit is equipped with a passive heatsink, to which a fan can be mounted.
Fig.3. Top View of Jetson Nano Developer Kit
In general, total power usage consists of the carrier board, the Jetson Nano module and peripherals. It is determined by the particular use case. The carrier board consumes between 0.5W (at 2A) and 1.25W (at 4A) with no peripherals attached.
The Jetson Nano module is designed to optimize power efficiency and supports two software-defined power modes. The default mode provides a 10W power budget for the module, and the other a 5W budget. These power modes constrain the module to near its 10W or 5W budget by capping the GPU and CPU frequencies and the number of online CPU cores.
Individual parts of the CORE power domain, such as video encode (NVENC) and video decode (NVDEC), are not covered by these budgets. This is a reason why power modes constrain Jetson Nano module to near a power budget, but not to the exact power budget. Your particular use case determines the module’s actual power consumption.
According to our tests with Fastvideo SDK, normal operation of the Jetson Nano Developer Kit in 10W mode required more power than USB can offer (5V and 2A). A USB-powered Jetson Nano can't work continuously under heavy workload at default clocks (no jetson_clocks applied): it hung 30-60 seconds after the workload began. This seems to be due to the power consumed by the carrier board and other peripheral devices. A USB-powered Jetson Nano works perfectly in 5W mode, but with lower performance.
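A rough power budget makes the USB limitation easier to see. The sketch below uses only the figures quoted in this article: supply ratings, module power modes and the lower bound for carrier board consumption. The function names are ours. Since the power modes only keep the module near its budget, and NVENC/NVDEC are not covered by it, treat the result as an optimistic estimate and not a measurement.

```python
# Rough power headroom estimate from the numbers quoted in the text.
def supply_watts(volts, amps):
    """Maximum power a supply can deliver."""
    return volts * amps


SUPPLIES_W = {
    "Micro USB (J28), 5V-2A": supply_watts(5.0, 2.0),
    "Power jack (J25), 5V-4A": supply_watts(5.0, 4.0),
    "Expansion header (J41), 5V-6A": supply_watts(5.0, 6.0),
}
MODULE_BUDGET_W = {"10W mode": 10.0, "5W mode": 5.0}
CARRIER_MIN_W = 0.5  # carrier board, no peripherals, lower bound from the text


def headroom_watts(supply_w, module_w, carrier_w=CARRIER_MIN_W, peripherals_w=0.0):
    """Power left after module, carrier board and peripherals (negative = deficit)."""
    return supply_w - (module_w + carrier_w + peripherals_w)


for supply_name, supply_w in SUPPLIES_W.items():
    for mode, module_w in MODULE_BUDGET_W.items():
        print(f"{supply_name}, {mode}: {headroom_watts(supply_w, module_w):+.2f} W")
```

Even with no peripherals, the Micro USB supply shows a small deficit in 10W mode and several watts of headroom in 5W mode, which is consistent with the behavior described above.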
For the Jetson Nano benchmark measurements we used an external 5V, 4A power supply. This is more than we could get from a standard Micro USB power adapter (5V and 2A), but it's necessary to get high performance. As we understand it, one could get even better performance by supplying more power to Jetson Nano.
To manage the speed and the amount of power consumed on the NVIDIA Jetson Nano, we use nvpmodel -m0 and jetson_clocks to get maximum performance.
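As a minimal sketch, the two commands can be wrapped in a small script. The use of sudo and the dry-run default are our assumptions. The function only prints the commands unless dry_run is False and the tools are found on the system, so it can be tried safely on a machine that is not a Jetson.

```python
import shutil
import subprocess

# Commands named in the text; "sudo" is our assumption about the setup.
MAX_PERF_COMMANDS = [
    ["sudo", "nvpmodel", "-m", "0"],  # select power mode 0
    ["sudo", "jetson_clocks"],        # lock clocks at their maximum
]


def set_max_performance(dry_run=True):
    """Print the commands, or run them when dry_run is False and the tool exists."""
    for cmd in MAX_PERF_COMMANDS:
        tool = cmd[1]
        if dry_run or shutil.which(tool) is None:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return MAX_PERF_COMMANDS


set_max_performance(dry_run=True)
```

On a Jetson, calling set_max_performance(dry_run=False) before benchmarking would apply both settings.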
Jetson Nano Benchmark Performance for Camera Applications
For Jetson Nano we've benchmarked the following image processing kernels, which are conventional for camera applications: white balance, demosaic, color correction, LUT, resize, gamma, JPEG / JPEG2000 / H.264 encoding, etc. It's not the full set of Fastvideo SDK features, just an example of what we could get with Jetson Nano.
We've measured GPU kernel time for each image processing module to understand how fast it can run on Jetson Nano. This is the way to evaluate the total time for a chosen set of modules from Fastvideo SDK. Since the performance of some modules depends on image content, you can request Fastvideo SDK for NVIDIA Jetson Nano (or for any other NVIDIA GPU) for evaluation and carry out your own testing.
CUDA initialization and GPU memory buffers allocations are not included in the benchmarks. Usually we do that just once, before the measurements, so it doesn't affect GPU performance.
For testing we've used a 2K raw image (1920×1080, 8-bit) and a 4K raw image (3840×2160, 8-bit), though all computations were carried out with 16-bit precision. Before JPEG compression we've converted 16-bit data to 8-bit per channel to comply with the JPEG Standard. JPEG2000 compression benchmarks were measured for 24-bit images with 4:4:4 subsampling.
We've marked in gray those rows in the Tables which are included in the simplest image processing pipeline of a camera application for 2K and 4K resolutions. That pipeline consists of Host to Device Transfer, White Balance, HQLI Debayer, Color Correction, Gamma, JPEG compression, Device to Host Transfer. In the last row of each Table we show the total GPU kernel time in ms, performance in MB/s and achieved FPS for the pipeline.
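The totals in the last row can be derived from the per-stage times as sketched below. The stage times here are hypothetical placeholders, NOT our measured values, and we assume that MB/s refers to the 8-bit raw input frame with 1 MB = 10^6 bytes.

```python
# Derive total time, FPS and MB/s for a pipeline from per-stage times.
FRAME_2K_BYTES = 1920 * 1080 * 1  # 8-bit raw Bayer frame, 1 byte per pixel


def pipeline_stats(stage_ms, frame_bytes):
    """Return (total time in ms, frames per second, input MB per second)."""
    total_ms = sum(stage_ms.values())
    fps = 1000.0 / total_ms
    mb_per_s = frame_bytes * fps / 1e6
    return total_ms, fps, mb_per_s


# Hypothetical stage times in milliseconds, for illustration only.
example_stage_ms = {
    "host_to_device": 1.0,
    "white_balance": 0.5,
    "hqli_debayer": 2.0,
    "color_correction": 1.0,
    "gamma": 0.5,
    "jpeg_compression": 4.0,
    "device_to_host": 1.0,
}

total_ms, fps, mb_per_s = pipeline_stats(example_stage_ms, FRAME_2K_BYTES)
print(f"total {total_ms:.1f} ms, {fps:.0f} fps, {mb_per_s:.1f} MB/s")
```

With these placeholder values the pipeline takes 10 ms per frame, which corresponds to 100 fps and about 207 MB/s of raw input.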
Table 1. Jetson Nano performance benchmarks for 2K raw image processing (1920×1080, 8-bit)
In a real-life camera application, it is possible to eliminate the Host to Device copy by utilizing Jetson Zero-Copy. In that case, the image from a camera is written via DMA directly to a pinned buffer in system memory. The pinned buffer is accessible to both CPU and GPU. As another option, the Device to Host copy can be hidden by overlapping data transfer and computations in a multi-threaded application. Jetson Nano can do concurrent copy and kernel execution with 1 copy engine.
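The effect of Zero-Copy and copy/compute overlap on frame time can be estimated with a simple steady-state model. This is an idealized sketch: it assumes perfect overlap, it assumes that all copies are serialized on the single copy engine, and the timings in the example are hypothetical.

```python
# Idealized steady-state frame time with and without copy/compute overlap.
def serial_frame_ms(h2d_ms, kernels_ms, d2h_ms):
    """No overlap: copies and kernels run one after another."""
    return h2d_ms + kernels_ms + d2h_ms


def overlapped_frame_ms(kernels_ms, d2h_ms, h2d_ms=0.0, zero_copy=True):
    """Overlap: the slower of kernels and copy-engine work sets the frame time.

    With zero_copy=True the Host to Device copy is eliminated; otherwise it
    shares the single copy engine with the Device to Host copy.
    """
    copy_ms = d2h_ms if zero_copy else d2h_ms + h2d_ms
    return max(kernels_ms, copy_ms)


# Hypothetical timings in milliseconds, for illustration only.
h2d, kernels, d2h = 1.0, 8.0, 1.0
print("serial:    ", serial_frame_ms(h2d, kernels, d2h), "ms per frame")
print("overlapped:", overlapped_frame_ms(kernels, d2h), "ms per frame")
```

In this model the frame time drops from the sum of all stages to the kernel time alone, as long as the copies take less time than the kernels.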
We can see that for the simplest image processing pipeline for 2K image on NVIDIA Jetson Nano we can reach 100 fps performance. If we utilize H.264 encoding via hardware-based NVENC (instead of Fastvideo CUDA-based Motion JPEG encoding) for the same pipeline, we could get 120 fps performance, which is the limitation of H.264 encoder (NVENC) for 2K resolution.
Table 2. Jetson Nano performance benchmarks for 4K raw image processing (3840×2160, 8-bit)
The same image processing pipeline for a 4K RAW image on NVIDIA Jetson Nano gives us 30 fps. If we utilize H.264 encoding via hardware-based NVENC (instead of Fastvideo JPEG or MJPEG on GPU), we still get no more than 30 fps, which is the maximum for the H.264 encoder (NVENC) at 4K resolution, but GPU occupancy in that case would be lower.
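When the hardware encoder runs in parallel with the CUDA pipeline, the achievable frame rate is set by the slower of the two. A minimal sketch of that rule follows, using the NVENC limits quoted in this article. The GPU frame rates passed in the example are hypothetical.

```python
# Frame rate of a GPU pipeline followed by hardware H.264 encoding (NVENC).
# Encoder limits are the values stated in the text.
NVENC_LIMIT_FPS = {"2K": 120.0, "4K": 30.0}


def effective_fps(gpu_pipeline_fps, resolution):
    """The slower stage (GPU pipeline or NVENC) limits the whole system."""
    return min(gpu_pipeline_fps, NVENC_LIMIT_FPS[resolution])


# Hypothetical GPU pipeline rates (without the JPEG stage), for illustration.
print("2K:", effective_fps(150.0, "2K"), "fps")
print("4K:", effective_fps(40.0, "4K"), "fps")
```

If the GPU part of the pipeline is faster than the encoder, the encoder limit applies and the GPU stays partly idle, which is why GPU occupancy is lower in that case.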
We can see that Jetson Nano has sufficient performance for image processing in camera applications. For resolutions up to 4K we can get realtime performance to convert RAW to RGB with JPEG or H.264 compression.
Here we've published just a small part of the Jetson Nano benchmarks that we've obtained with Fastvideo SDK. We suggest testing the SDK with your own image processing pipeline. You can send us a request for an evaluation version of Fastvideo Image Processing SDK for Jetson Nano, TK1, TX1, TX2 or AGX Xavier to carry out testing on your images and your pipeline. Just fill in the Contact Form below to get the SDK for your Jetson.
Other blog posts from Fastvideo about Jetson hardware and software
Jetson Image processing
Jetson Nano vs TX1 vs TX2 vs Xavier Benchmark Comparison
Jetson Zero Copy
Low latency H.264 streaming from Jetson TX2 to PC
Performance speedup for Jetson TX2 vs AGX Xavier
Remotely operated walking excavator on Jetson
Jetson AGX Xavier performance benchmarks
Original article see at: https://www.fastcompression.com/blog/jetson-nano-benchmarks-image-processing.htm
Remotely operated walking excavator on Jetson at Bauma 2019 Trade Fair
We invite you to experience a demonstration taking place at the Menzi Muck stand at Bauma, the World’s Leading Trade Fair for construction machinery, mining machines, construction vehicles and equipment.
Exhibition location: Messegelände, 81823 Munich, Germany
Dates: April 8-14, 2019
The demonstration of the remotely controlled Menzi Muck walking excavator will be held outdoors in the Open-air Area North on stand FN.824/1.
The daily demo slots are: 11:00, 12:30, 14:00, 15:30, 17:00
Menzi Muck walking excavator
Imaging Hardware
Two 3.1 MPix XIMEA cameras with Sony IMX252 image sensor
Connected to NVIDIA Jetson TX2 via carrier board
Realtime image processing software
Provides up to 60 fps of synchronized image acquisition from each of the two XIMEA cameras
Realtime processing (including black level, white balance, demosaicing, autoexposure, etc.) with H.264/H.265 encoding and streaming
Glass-to-glass video latency over 4G/5G network ~50 ms
Video:
https://youtu.be/OExJWg2ZzQ8
Links
Menzi Muck AG
ETH Zürich, Mechanical and Process Engineering Dept.
MRTech SK
XIMEA Corp.
Fastvideo
Jetson Nano vs TX1 vs TX2 vs Xavier Benchmark Comparison
Original article see at: https://www.fastcompression.com/blog/jetson-video-system-walking-excavator.htm