#llvm
Explore tagged Tumblr posts
Text
LLVM should've been written in lisp.
11 notes
·
View notes
Text
LLVM Weekly - #591, April 28th 2025
AsiaLLVM program, GCC 15 released, instrumentation framework proposal, linker benchmarking, Flang supported standards, and more
0 notes
Text
Intel’s oneAPI 2024 Kernel_Compiler Feature Improves LLVM

Kernel_Compiler
The kernel_compiler, first released as an experimental feature in the fully SYCL 2020 compliant Intel oneAPI DPC++/C++ Compiler 2024.1, is one of the new features, and another illustration of how Intel advances the development of LLVM and the SYCL standard. This extension lets OpenCL C strings be compiled at runtime into kernels that can be executed on a device.
It is provided in addition to the more popular modes of offloading target hardware-specific SYCL kernels: Ahead-of-Time (AoT), SYCL runtime, and directed runtime compilation.
Generally speaking, the kernel_compiler extension ought to be saved for last!
Nonetheless, there might be some very intriguing justifications for leveraging this new extension to create SYCL kernels from OpenCL C or SPIR-V code stubs.
Before getting into the specifics, let's take a brief look at the early- and late-compile options SYCL offers, and why there are typically, though not always, better techniques than the kernel_compiler.
Three Different Types of Compilation
SYCL gives your application the ability to offload computational work to kernels running on another compute device installed in the machine, such as a GPU or an FPGA. Got thousands of numbers to crunch? Send them to the GPU!
Power and performance are made possible by this, but it also raises more questions:
Which device are you planning to target? In the future, will that change?
Do you know the complete domain of parameter values for that kernel execution, or could it be more efficient if it were customized to parameters that only the running program will know? SYCL offers a number of options to answer those questions:
Ahead-of-Time (AoT) Compile: This process involves compiling your kernels to machine code concurrently with the compilation of your application.
SYCL Runtime Compilation: This method compiles the kernel while your application is running, at the point where the kernel is used.
Directed Runtime Compilation: You set up your application to build a kernel whenever you want.
Let’s examine each one of these:
1. Ahead-of-Time (AoT) Compile
You can precompile the kernels at the same time as you compile your application. All you have to do is specify which devices the kernels should be compiled for by passing them to the compiler with the -fsycl-targets flag. Done! Now that the kernels have been compiled, your application will use those binaries.
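As a rough sketch (the file, kernel, and target below are illustrative; spir64_gen is one common Intel GPU AoT target in recent oneAPI releases, but check the documentation for your hardware):

// vector_double.cpp -- compiled ahead-of-time with something like:
//   icpx -fsycl -fsycl-targets=spir64_gen vector_double.cpp -o vector_double
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    int data[4] = {1, 2, 3, 4};
    {
        sycl::buffer<int, 1> buf(data, sycl::range<1>(4));
        q.submit([&](sycl::handler& h) {
            sycl::accessor a(buf, h, sycl::read_write);
            // The device binary for this kernel already sits in the
            // executable, so nothing is compiled at runtime here.
            h.parallel_for(sycl::range<1>(4), [=](sycl::id<1> i) { a[i] *= 2; });
        });
    } // the buffer's destructor waits for the kernel and copies data back
    return 0;
}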
AoT compilation has the advantage of being easy to grasp and familiar to C++ programmers. Furthermore, it is the only choice for certain devices such as FPGAs and some GPUs.
An additional benefit is that your kernel can be loaded, handed to the device, and executed without the runtime pausing to compile it.
Although they are not covered in this blog post, there are many more choices available to you for controlling AoT compilation. For additional information, see this section on compiler and runtime design or the -fsycl-targets article in Intel’s GitHub LLVM User Manual.
2. SYCL Runtime Compilation (via SPIR-V)
This is SYCL's default mode: it is used if no target devices are specified, or if an application with precompiled kernels is executed on a machine whose target devices differ from what was requested.
SYCL automatically compiles your kernel C++ code to SPIR-V (Standard Portable Intermediate Representation), an intermediate form. The SPIR-V kernel is stored within your program; when it is first needed, it is handed to the driver of whatever target device is encountered, and the device driver then converts it to machine code for that device.
The default runtime compilation has the following two main benefits:
First of all, you don't have to know the precise target device your kernel will run on beforehand. As long as one is present, the kernel will run.
Second, if a GPU driver has been updated to improve performance, your application will benefit from it when your kernel runs on that GPU using the new driver, saving you the trouble of recompiling it.
Keep in mind, however, that there can be a minor cost compared to AoT, because your application has to compile the kernel from SPIR-V to machine code when it first delivers it to the device. This usually takes place outside the critical performance path, before the parallel_for that launches the kernel.
In practice, this compilation time is minimal, and runtime compilation offers more flexibility than the alternative. SYCL may also cache compiled kernels between application invocations, which further reduces the cost. See kernel programming cache and environment variables for additional information on caching.
However, if you prefer the flexibility of runtime compilation but dislike the default SYCL behavior, continue reading!
3. Directed Runtime Compilation (via kernel_bundles)
SYCL's kernel_bundle class is a programmatic interface for accessing and managing the kernels bundled with your application.
The noteworthy kernel_bundle methods here are build(), compile(), and link(). These let you, the application author, decide precisely when and how a kernel is built, without having to wait until the kernel is needed.
Additional details regarding kernel_bundles are provided in the SYCL 2020 specification and in a controlling compilation example.
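As a minimal sketch of the standard SYCL 2020 API (error handling omitted; the function name here is made up), this is how an application might force all of its bundled kernels to be built up front rather than at first use:

#include <sycl/sycl.hpp>

void prebuild_all_kernels(sycl::queue& q) {
    sycl::context ctx = q.get_context();
    // Fetch every kernel shipped with the application in its "input" state...
    auto input = sycl::get_kernel_bundle<sycl::bundle_state::input>(ctx);
    // ...and build it now, at a moment of our choosing, instead of paying
    // the compilation cost when a kernel is first submitted.
    auto exe = sycl::build(input);  // compile() followed by link() also works
    // A command group can then opt in via handler::use_kernel_bundle(exe).
}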
Specialization Constants
Assume for the moment that you are creating a kernel that manipulates an input image's many pixels. Your kernel must replace every pixel that matches a certain key color with a replacement color. You know the kernel could run faster if the key color and replacement color were constants instead of parameter variables, but there is no way to know those color values while you are writing your program. Perhaps they depend on calculations or user input.
Specialization constants are relevant in this situation.
The name refers to constants in your kernel that you specialize at runtime, just before the kernel itself is compiled at runtime. Your application sets the key and replacement colors as specialization constants, which the device driver then compiles into the kernel's code as true constants. For kernels that can take advantage of this, the performance benefits are significant.
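Here is a minimal sketch of the chroma-key example using the standard SYCL 2020 specialization-constant API (the identifiers are made up, and the image is simplified to a 1-D buffer of packed pixels):

#include <sycl/sycl.hpp>
#include <cstdint>

// IDs for the two constants we will specialize at runtime.
constexpr sycl::specialization_id<uint32_t> key_color;
constexpr sycl::specialization_id<uint32_t> new_color;

void chroma_key(sycl::queue& q, sycl::buffer<uint32_t, 1>& img,
                uint32_t key, uint32_t replacement) {
    q.submit([&](sycl::handler& h) {
        sycl::accessor pix(img, h, sycl::read_write);
        // Set the values before the kernel is compiled at runtime...
        h.set_specialization_constant<key_color>(key);
        h.set_specialization_constant<new_color>(replacement);
        h.parallel_for(img.get_range(), [=](sycl::id<1> i, sycl::kernel_handler kh) {
            // ...so the device compiler can bake them in as true constants.
            if (pix[i] == kh.get_specialization_constant<key_color>())
                pix[i] = kh.get_specialization_constant<new_color>();
        });
    });
}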
The Last Resort – the kernel_compiler
All of the choices discussed thus far work well together, and you can pick from a very wide range of setups: directed compilation, caching, specialization constants, AoT compilation, and the usual SYCL compile-at-runtime behavior.
Using specialization constants to make your program performant, or having it choose a specific kernel at runtime, is straightforward. But that might not be sufficient: perhaps your software really does need to create a kernel from scratch at runtime.
Here is some source code to help illustrate this. Intel made an effort to compose it in a way that makes sense from top to bottom.
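A minimal sketch in the spirit of that code follows; the API names are taken from the experimental sycl_ext_oneapi_kernel_compiler extension and may change between compiler releases:

#include <sycl/sycl.hpp>
#include <string>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
    sycl::queue q;
    // An OpenCL C kernel held in an ordinary string -- it could just as
    // easily be loaded from a file or assembled at runtime.
    std::string src = R"(
        __kernel void add_one(__global int *p) {
            p[get_global_id(0)] += 1;
        }
    )";
    // Compile the string into a kernel bundle at runtime...
    auto kb_src = syclex::create_kernel_bundle_from_source(
        q.get_context(), syclex::source_language::opencl, src);
    auto kb_exe = syclex::build(kb_src);
    // ...and fish the kernel out by name, ready to be launched.
    sycl::kernel k = kb_exe.ext_oneapi_get_kernel("add_one");
    return 0;
}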
When is It Beneficial to Use kernel_compiler?
Some SYCL users already have extensive kernel libraries written in SPIR-V or OpenCL C. For them, the kernel_compiler is less a last-resort tool than a very helpful extension that lets them keep using those libraries.
Download the Compiler
If you haven't already, download the most recent version of the Intel oneAPI DPC++/C++ Compiler, which includes the experimental kernel_compiler functionality. Get it as a standalone compiler for Windows or Linux, via popular package managers (Linux only), or as a component of the Intel oneAPI Base Toolkit 2024.
Read more on Govindhtech.com
#oneAPI#Kernel_Compiler#LLVM#InteloneAPI#SYCL2020#SYCLkernels#FPGA#SYC#SPIR-Vkernel#OpenCL#News#Technews#Technology#Technologynews#Technologytrends#govindhtech
1 note
·
View note
Text
If I had a nickel for every C/C++ compiler with an abbreviated name that used to stand for something, but was later retconned to mean something else (or just nothing at all), I would have... at least 3 nickels? possibly more? why do they keep doing that? is there even a name for this phenomenon?
1 note
·
View note
Text
Embark on a journey with our new article that delves into the intricacies of MoAI, an innovative Mixture of Experts approach in an open-source Large Language and Vision Model (LLVM). Learn how MoAI leverages auxiliary visual information and multiple intelligences to revolutionize the field. Discover how this model aligns and condenses outputs from external CV models, efficiently using relevant information for vision language tasks. Understand the unique blend of visual features, auxiliary features from external CV models, and language features that MoAI brings together.
#artificial intelligence#ai#open source#machine learning#machinelearning#nlp#MoAI#VisionLanguageModels#AI#ArtificialIntelligence#MachineLearning#DeepLearning#NLP#ComputerVision#FutureOfAI#TechNews#AIResearch#LLVM#OCR#OpenSource#DataScience#NeuralNetworks#KAIST#MixtureOfExperts#VisionLanguageTasks
0 notes
Text
Small post today
Today is a small post because I have decided to take a break from building the language in order to learn a bit about LLVM. This way I can start learning it while the size of the project is manageable, and learn as I go rather than having to do all the complicated things at once.
1 note
·
View note
Text
Microsoft improves Windows on Arm support for Visual Studio with MAUI, LLVM, Node, Unity
At its Build 2023 event today, Microsoft shared some of the developments and progress it has made on Windows on Arm. The list of applications receiving updates includes Visual Studio, Low Level Virtual Machine (LLVM), Node.js, WiX Installer, Luminar Neo, and Unity Player. Visual Studio is now available with Arm support thanks to .NET…

0 notes
Text
toycalculator, an MLIR/LLVM compiler experiment.
Over the last 8 years, I’ve been intimately involved in building a pair of LLVM based compilers for the COBOL and PL/I languages. However, a lot of my work was on the runtime side of the story. This was non-trivial work, with lots of complex interactions to figure out, but it also meant that I didn’t get to play with the fun (codegen) part of the compiler. Over the last month or so, I’ve been…
0 notes
Text
my main motivating factor for learning new programming languages is how funny I think it would be
#like fortran seems pretty silly to me#and lisp with its parentheses#tbh i only know 2 things about lisp (its in emacs and it uses parens a lot)#also LLVM IR#cause why would you write code in an ir#theres probably actually a good reason to write llvm ir please tell me kif there is thabks
0 notes
Note
re: the programming post
hold up, THERE'S A C COMPILER FOR 6502 COMPATIBLE SYSTEMS?! You're telling me I don't have to faff about with assembly?!
There are multiple C compilers for 6502. Like, at least 8.
Many are kinda work-in-progress, but cc65 is the popular one. It also has a pretty solid assembler, with lots of hardware targets available.
Lots of folks are having good luck with the new hotness: llvm-mos, but it needs an x86 environment and I'm using a pi because of course I am.
I got overwhelmed trying to figure out how to make a new hardware target profile for the Cactus and I will try to circle back around to it later. That VIC-20 serial cartridge I've been building sorta took priority.
42 notes
·
View notes
Text
LLVM Weekly - #590, April 21st 2025. Recent Clang improvements, LLVM_LINK_LLVM_DYLIB by default?, static analysis roundtable notes, __ptrauth support, Bolt instrumentation support for RISC-V, how maintainers handle post-commit review, and more
0 notes
Text
Good news! The new AMD RDNA 4 GPUs are the GFX1200 and GFX1201!

GPUs GFX1200 and GFX1201 for Radeon RX 8000 series found
The AMD RDNA 4:
The two AMD RDNA 4 “Navi 4X” GPUs were listed in the most recent LLVM project notes. A developer pointed out that the new GFX12 targets, GFX1200 and GFX1201, need target names and ELF numbers assigned. For the time being, both GPUs are said to behave exactly like the GFX11 “RDNA 3” family, but because they are still a year away from commercialization, significant variations may appear in the future.
The GFX1200 and GFX1201 IDs were formerly attributed to Navi 41 SKUs; however, according to rumors, AMD shelved its Navi 4X flagships, which were intended to use the Navi 4C die for a chiplet-like design. The company is now reportedly concentrating “only” on the mainstream and high-end markets.
The key distinction between flagship and high-end products is that AMD classifies items costing more than $500 as enthusiast class, and those devices mostly use the best Navi-X1 dies. That is unlikely to be the case this time around; nevertheless, GFX1200 and GFX1201 it is.
The two RDNA-4 “Navi 4X” IDs may represent the Navi-44 and Navi-48 SKUs, based on a post by Olrak29_, who has previously helped create the die block diagrams and configurations for various RDNA GPUs. AMD has previously made it clear that the numbers do not reflect performance or market segment, despite the names perhaps suggesting that these are very inexpensive SKUs.
Initial development of AMD’s next RDNA-4 architecture, which will power the Radeon RX 8000 series, has started. In an attempt to outperform the RDNA-3 family and add new functionality to the Navi-4X chips, the red team has announced interesting developments for its next product line.
A fully redesigned graphics pipeline, enhanced ray tracing capabilities, increased efficiency, and the astute use of AI accelerators are a few of them. These are positive changes that should help AMD and the graphics card business in the future. AMD intends to release its next generation of Radeon graphics processors, based on the RDNA-4 architecture, around 2024, using a new production process.
Read more on Govindhtech.com
#AMD#RDNA4#GPU#GFX1200#GFX1201#RX8000series#Navi4X#LLVM#GPUs#RDNAGPUs#NaviX1#technews#technology#govindhtech
0 notes
Note
How DOES the C preprocessor create two generations of completely asinine programmers??
oh man hahah oh maaan. ok, this won't be very approachable.
i don't recall what point i was trying to make with the whole "two generations" part but ill take this opportunity to justifiably hate on the preprocessor, holy fuck the amount of damage it has caused on software is immeasurable, if you ever thought computer programmers were smart people on principle...
the cpp:
there are like forty preprocessor directives, and they all inject a truly mind-boggling amount of vicious design problems and have done so for longer than ive been alive. there really only ever needed to be one: #include, if only to save you the trouble of manually having to copy header files in full & paste them at the top of your code. and christ almighty, we couldn't even get that right. C (c89) has way, waaaay fewer keywords than any other language. theres like 30, and half of those aren't ever used, have no meaning or impact in the 21st century (shit like "register" and "auto"). and C programmers still fail to understand all of them properly, specifically "static" (used in a global context) which marks some symbol as ineligible to be touched externally (e.g. you can't use "extern" to access it). the whole fucking point of static is to make #include'd headers rational, to have a clear separation between external, intended-to-be-accessed API symbols, and internal, opaque shit. nobody bothers. it's all there, out in the open, if you #include something, you get all of it, and brother, this is only the beginning, you also get all of its preprocessor garbage.
this is where the hell begins:
#if #else
hey, do these look familiar? we already fucking have if/else. do you know what is hard to understand? perfectly minimally written if/else logic, in long functions. do you know what is nearly impossible to understand? poorly written if/else rat's nests (which is what you find 99% of the time). do you know what is completely impossible to understand? that same poorly-written procedural if/else rat's nest code that is itself subject to another higher-order if/else logic.
it's important to remember that the cpp is a glorified search/replace. in all its terrifying glory it fucking looks to be turing complete, hell, im sure the C++ preprocessor is turing complete, the irony of this shouldn't be lost on you. if you have some long if/else logic you're trying to understand, that is itself subject to cpp #if/#else, the logical step would be to run the cpp and get the output pure C and work from there. do you know how to do that? you open the gcc or llvm/clang man page, and your tty session's mem usage quadruples. great job idiot. try figuring out how to do that in the following eight thousand pages. and even if you do, you're going to be running the #includes, and your output "pure C" file (bereft of cpp logic) is going to be like 40k lines. lol.
the worst is yet to come:
#define #ifdef #ifndef (<- WTF) #undef
you can define shit. you can define "anything". you can pick a name, whatever, and you can "define it". full stop. "#define foo". or, you can give it a value: "#define foo 1". and of course, you can define it as a function: "#define foo(x) return x". wow. xzibit would be proud. yo dawg, we heard you wanted to kill yourself, so we put a programming language in your programming language.
the function-defines are pretty lol purely in concept. when you find them in the wild, they will always look something like this:
#define foo(x,y) \
    (((x << y)) * (x))
i've seen up to seven parens in a row. why? because since cpp is, again, just a fucking find&replace, you never think about operator precedence and that leads to hilarious antipatterns like the classic
#define min(a,b) a < b ? a : b
which will just stick the "a < b ? a : b" ternary statement wherever min(...) is used. just raw text replacement. it never works. you always get bitten by operator precedence.
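here's a tiny contrived demo of the bite (made-up code, any C compiler will show it):

#define min(a,b) a < b ? a : b

int clamp_demo(int x, int y) {
    return min(x, y) + 1;
    /* expands to: return x < y ? x : y + 1;                    */
    /* ?: binds looser than +, so that's x < y ? x : (y + 1)    */
    /* you get x when x < y, otherwise y+1. not what you meant. */
}

#define MIN(a,b) ((a) < (b) ? (a) : (b))  /* hence the wall of parens */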
the absolute worst is just the bare defines:
#define NO_ASN1
#define POSIX_SUPPORTED
#define NO_POSIX
etc. etc. how could this be worse? first of all, what the fuck are any of these things. did they exist before? they do now. what are they defined as? probably just "1" internally, but that isn't the point, the philosophy here is the problem. back in reality, in C, you can't just do something like "x = 0;" out of nowhere, because you've never declared x. you've never given it a type. similarly, you can't read its value, you'll get a similar compiler error. but cpp macros just suddenly exist, until they suddenly don't. ifdef? ifndef? (if not defined). no matter what, every permutation of these will have a "valid answer" and will run without problem. let me demonstrate how this fucks things up.
do you remember "heartbleed"? the "big" openssl vulnerability? probably about a decade ago now. i'm choosing this one specifically, since, for some reason, it was the first in an annoying trend for vulns to be given catchy nicknames, slick websites, logos, cable news coverage, etc. even though it was only a moderate vulnerability in the grand scheme of things...
(holy shit, libssl has had huge numbers of remote root vulns in the past, which is way fucking worse, heartbleed only gave you a random sampling of a tiny bit of internal memory, only after heavy ticking -- and nowadays, god, some of the chinese bluetooth shit would make your eyeballs explode if you saw it; a popular bt RF PHY chip can be hijacked and somehow made to rewrite some uefi ROMs and even, i think, the microcode on some intel chips)
anyways, heartbleed, yeah, so it's a great example since you could blame it two-fold on the cpp. it involved a generic bounds-checking failure, a buffer over-read, standard shit, but that wasn't due to carelessness (don't get me wrong, libssl is some of the worst code in existence) but because the flawed cpp logic resulted in code that:
A.) was de-facto worthless by definition
B.) was a combination of code supporting ancient crap.
i'm older than most of you, and heartbleed happened early in my undergrad. the related legacy support code in question hadn't been relevant since clinton was in office.
to summarize, it had to do with DTLS heartbeats. DTLS involves handling TLS (or SSLv3, as it was then, in the 90s) only over UDP. that is how old we're talking. and this code was compiled into libssl in the early 2010s -- when TLS had been the standard for a while. TLS (unlike SSLv3 & predecessors) runs over TCP only. having "DTLS heartbeat support" in TLS does not make sense by definition. it is like drawing a triangle on a piece of paper whose angles don't add up to 180.
how the fuck did that happen? the preprocessor.
why the fuck was code from last century ending up compiled in? who else but!! the fucking preprocessor. some shit like:
#ifndef TCP_SUPPORT
<some crap related to UDP heartbeats>
#endif
...
#ifndef NO_UDP_ONLY
<some TCP specific crap>
#endif
the header responsible for defining these macros wasn't included, so the answer to BOTH of these "if not defined" blocks is true! because they were never defined!! do you see?
you don't have to trust my worldview on this. have you ever tried to compile some code that uses autoconf/automake as a build system? do you know what every single person i've spoken to refers to these as? autohell, for automatic hell. autohell lives and dies on cpp macros, and you can see firsthand how well that works. almost all my C code has the following compile process:
"$ make". done. Makefile length: 20 lines.
the worst i've ever deviated was having a configure script (probably 40 lines) that had to be run before make. what about autohell? jesus, these days most autohell-cursed code does all their shit in a huge meta-wrapper bash script (autogen.sh), but short of that, if you decode the forty fucking page INSTALL doc, you end up with:
$ automake (fails, some shit like "AUTOMAKE_1.13 or higher is required")
$ autoconf (fails, some shit like "AUTOCONF_1.12 or lower is required")
$ aclocal (fails, ???)
$ libtoolize (doesn't fail, but screws up the tree in a way that not even a `make clean` fixes)
$ ???????? (pull hair out, google)
$ autoreconf -i (the magic word)
$ ./configure (takes eighty minutes and generates GBs of intermediaries)
$ make (runs in 2 seconds)
in conclusion: roflcopter
159 notes
·
View notes
Text
oh shit, whitequark is doing a non-toy port of LLVM to WASM. There have been some demos of this in the past, but a real working WASM clang would be huge. I considered trying to get a WASM clang up for the RGBLED thing but I'm an idiot who can't do that, whereas whitequark is not.
30 notes
·
View notes
Text
I desperately need someone to talk to about this
I've been working on a system to allow a genetic algorithm to create DNA code which can create self-organising organisms. Someone I know has created a very effective genetic algorithm which blows NEAT out of the water in my opinion. This algorithm is very good at using food values to determine which organisms to breed and how to breed them, and it has a multitude of different biologically inspired mutation mechanisms which allow for things like meta genes and meta-meta genes, and a whole slew of other things. I am building a translation system, basically a compiler on top of it, and designing an instruction set and genetic repair mechanisms to allow it to convert ANY hexadecimal string into a valid, operable program. I'm doing this by having an organism with, so far, 5 planned chromosomes.

The first and second chromosomes are the INITIAL STATE of a neural network: the number and configuration of input nodes, the number and configuration of output nodes, whatever code it needs for a fitness function, and the configuration and weights of the layers. This neural network is not used at all in the fitness evaluation of the organism, but is purely something the organism itself can manage, train, and utilize how it sees fit.
The third is the complete code of the program which runs the organism. It's basically a list of ASM opcodes and arguments written in hexadecimal, comprised of codons which represent the different hexadecimal characters, as well as a start and stop codon. This program will be compiled into executable machine code using LLVM IR and a custom instruction set I've designed for the organisms, giving them a Turing-complete programming language and some helper functions to make certain processes simpler to evolve. This includes messages between the organisms, reproduction methods, and all the methods necessary for the organisms to develop sight and hearing, receive various other inputs, and output audio, video, and various outputs like mouse, keyboard, or gamepad events.

The fourth is a blank slate, in which the organism can evolve whatever data it wants. The first half will be the complete contents of the organism's ROM after the important information, and the second half will be the initial state of the organism's memory. This will likely be stored as base64 of its hash and unfolded into binary on compilation.
The 5th chromosome is one I just came up with, and I am very excited about it: a translation dictionary. It will be exactly 512 individual codons, with each codon pair mapped to a value between 00 and FF hex. When evaluating the hex of the other chromosomes, this dictionary will be used to determine the equivalent instruction of any given hex pair. When evolving, each hex pair in the 5th chromosome will be guaranteed to be a valid opcode in the instruction set by using modulus to constrain each pair to the 55 instructions currently available. This will allow an organism to evolve its own instruction distribution, trying to prevent random instructions which might be harmful or inefficient from springing up as often, and instead selecting more often for efficient or safer instructions.
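If it helps to see it, here is a minimal sketch of that modulus constraint (illustrative names only, not the project's actual code):

#include <array>
#include <cstdint>

constexpr int kNumOpcodes = 55;  // instructions currently in the set

// The 5th chromosome: a 256-entry evolved dictionary over raw byte values.
using Dictionary = std::array<uint8_t, 256>;

// Translate one raw genome byte into a guaranteed-valid opcode index:
// whatever value the dictionary evolves to hold, the modulus keeps the
// result inside the instruction set.
uint8_t translate(const Dictionary& dict, uint8_t genome_byte) {
    return dict[genome_byte] % kNumOpcodes;
}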
#ai#technology#genetic algorithm#machine learning#programming#python#ideas#discussion#open source#FOSS#linux#linuxposting#musings#word vomit#random thoughts#rant
8 notes
·
View notes