EmbDev.net

Forum: ARM programming with GCC/GNU tools

Help understanding Neon intrinsics


by Chris (chrisz)


Hi everyone,

I am new to Arm Neon programming and am having a very difficult time
making sense of the conventions for mapping GCC intrinsics to actual
Armv8 Neon instructions. Can someone explain to me how to do the following:

I am writing a custom hashing function that uses AES rounds. All the data
is byte-wide and comes in fixed blocks of 16 MB, and I am just trying to
load it into the Neon registers as quickly as possible. From reading the
Arm developer documentation here:

https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/LD1--multiple-structures---Load-multiple-single-element-structures-to-one--two--three--or-four-registers-?lang=en

I want to use a sequence of these instructions:
LD1  { V0.16B, V1.16B, V2.16B, V3.16B }, [x0], #64
LD1  { V4.16B, V5.16B, V6.16B, V7.16B }, [x0], #64
...

However, after reading this page:

https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#load-1

I can't find any intrinsic which actually generates this instruction, 
let alone the GCC convention for using these intrinsics.

Can someone give me an example of a simple C program, compilable by gcc 
(preferably v9.4 as that is what is available on Ubuntu Bionic, the 
version running on the embedded platform), that can access these 
instructions? I'm fine with embedded assembly if that is what is 
necessary, but I can't figure out the correct syntax to make it work.

Thank you for any advice.

by Johann L. (gjlayde)


When built-in functions are too awkward to use (or don't exist to begin
with), it may be better to switch to inline assembly.
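
As a rough illustration of the inline-assembly route, here is a minimal sketch, not a tuned implementation. It pins four local register variables to v0..v3 so that a single post-indexed LD1 can fill them, then XOR-folds the four vectors into an accumulator. The XOR fold is only a placeholder for the real AES rounds, and the names fold64, xor_fold and the opt-in macro USE_VLD1Q_X4 are made up for this example. The ACLE reference also lists vld1q_u8_x4 as mapping to the four-register LD1 form, but whether a given GCC release ships it is an assumption you would need to check, so that variant is left behind the macro. The sketch assumes the length is a multiple of 64 bytes and falls back to a scalar loop on non-AArch64 targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Fold one 64-byte chunk into *acc and return the advanced pointer.
 * fold64 and USE_VLD1Q_X4 are hypothetical names for this sketch. */
static const uint8_t *fold64(const uint8_t *p, uint8x16_t *acc)
{
#if defined(USE_VLD1Q_X4)
    /* Intrinsic route: ACLE maps vld1q_u8_x4 to the 4-register LD1.
     * Assumption: your compiler's arm_neon.h provides it. */
    uint8x16x4_t v = vld1q_u8_x4(p);
    uint8x16_t r0 = v.val[0], r1 = v.val[1], r2 = v.val[2], r3 = v.val[3];
    p += 64;
#else
    /* Inline-asm route: pin the outputs to v0..v3 so the register
     * list in the LD1 template is consecutive, as the ISA requires. */
    register uint8x16_t r0 __asm__("v0");
    register uint8x16_t r1 __asm__("v1");
    register uint8x16_t r2 __asm__("v2");
    register uint8x16_t r3 __asm__("v3");
    __asm__ volatile(
        "ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%[p]], #64"
        : "=w"(r0), "=w"(r1), "=w"(r2), "=w"(r3), [p] "+r"(p)
        :
        : "memory"); /* the asm reads memory the compiler cannot see */
#endif
    /* Placeholder for the real work (e.g. AES rounds): XOR all four. */
    *acc = veorq_u8(*acc, veorq_u8(veorq_u8(r0, r1), veorq_u8(r2, r3)));
    return p;
}
#endif

/* XOR-fold len bytes (assumed to be a multiple of 64) into 16 bytes. */
void xor_fold(const uint8_t *data, size_t len, uint8_t out[16])
{
#if defined(__aarch64__)
    uint8x16_t acc = vdupq_n_u8(0);
    const uint8_t *end = data + len;
    while (data < end)
        data = fold64(data, &acc);
    vst1q_u8(out, acc);
#else
    /* Scalar fallback so the sketch also builds on other targets. */
    memset(out, 0, 16);
    for (size_t i = 0; i < len; i++)
        out[i % 16] ^= data[i];
#endif
}
```

Built with something like gcc -O2 on an AArch64 target, the asm path should emit exactly the LD1 form from the question. Pinning v0..v3 keeps the example simple at the cost of taking register allocation away from the compiler, so the intrinsic is usually the better choice where it is available.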
