PCONE status update
The PCONE library is making progress and becoming a full parallel C++ programming toolkit!
It is by far the project I'm most enthusiastic about. I try to keep track of where the C++ standard is moving to provide you a library that is fresh inside. Of course I have this bad habit of publishing information about future plans and code that may not exist yet, but I just like writing about this stuff too much. :)
I already have a stable QSBR (quiescent-state-based reclamation) memory reclaimer ready, and now I'm developing a few more things for the set of tools:
EBR (epoch-based reclamation), similar to QSBR.
EBR has the advantage that threads can sleep outside read-side locks.
With EBR, memory is also recovered sooner than with QSBR.
Hazard Pointers. The longer name for this technique is Safe Memory Reclamation, aka SMR.
An RCU deque class that has a similar interface to std::deque<>. It is actually a container adapter at its core, so technically I could make it support any STL container.
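Such a copy-update-publish adapter could look roughly like this. This is a minimal sketch of the idea, not the actual PCONE code: the names are invented, and it leans on the C++11 atomic shared_ptr free functions for the publish step, whereas a real version would reclaim old snapshots through QSBR/EBR instead.

```cpp
#include <atomic>
#include <deque>
#include <memory>

// Sketch of an RCU-style container adapter: readers grab an immutable
// snapshot, writers copy it, modify the copy and publish it with a CAS.
template <typename T, typename Container = std::deque<T>>
class rcu_adapter {
    std::shared_ptr<const Container> cur_{std::make_shared<Container>()};
public:
    // Readers: lock-free snapshot of the whole container.
    std::shared_ptr<const Container> read() const {
        return std::atomic_load(&cur_);
    }
    // Writers: copy, update the copy, publish via CAS, retry on races.
    void push_back(const T& v) {
        std::shared_ptr<const Container> old = std::atomic_load(&cur_);
        auto copy = std::make_shared<Container>(*old);
        copy->push_back(v);
        while (!std::atomic_compare_exchange_weak(
                   &cur_, &old,
                   std::shared_ptr<const Container>(copy))) {
            *copy = *old;  // lost the race: redo the copy from the fresh value
            copy->push_back(v);
        }
    }
};
```

Readers never block and never see a half-updated container; they just keep their snapshot alive until they drop it.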
The main point here is that, once ready, I will release these (QSBR, EBR, HPs, ...) under a license that allows you to statically link them into your program, even if it is commercial. This is in contrast to the PCONE main API, which is and will be licensed under the LGPL.
There are of course more mature libraries for these, e.g. libURCU, which is under GPL/LGPL and as such does not allow fully statically linking it into your program. libURCU is also a POSIX/Linux-only library. I don't mean to imply that there is something wrong with the (L)GPL, but I just feel that fully GPL'ing the PCONE library would be too restrictive. Especially because it is a fresh library, and developers adopting new libraries tend to be really picky about restrictive licenses ;)
I have the first prototype implementation of the PCONE API ready now, but I need to refactor quite a lot to implement some features I didn't foresee the last time I blogged about PCONE. One such feature is rescheduling contexts in such a way that each context's data remains cache-hot. There will certainly be optimizations for NUMA architectures if I ever get hardware to test my code on such a platform.
Another must-have feature is an option to set the maximum allowed parallelization “width” of a task-batch. E.g. allow a task-batch to be processed by just two threads simultaneously instead of the default maximum of detected CPU cores, or instantiate a large number of contexts per task-batch compared to the number of available physical cores.
I have also written a bit of code for the polished version of the PCONE user-space task context scheduler. This is where I discovered the need for the RCU deque, EBR and HPs. To keep each worker thread's task queue balanced and well fed, the threads needed to access each other's task queues. This was very bad because the task queues were protected by mutexes. It led to a classic resource contention problem: the test code would often just deadlock, and at best the scheduling would work but not scale, making it pointless.
So I wrote the RCU deque (container adapter) to get rid of the mutexes on the task context scheduler queues. :-)
Recently I googled more about user-space task switching and found out that only Go and Ada have robust user-space task scheduling at the language level. (There may be a few others I don't know about.) In C++ there is ongoing work to provide stackless coroutines, but PCONE needs stackful coroutines to work with. According to Wikipedia there are seven libraries for C++ providing user-space scheduling, and three of them are low-level or stackless. I would like to see PCONE on that list as a high-level general-purpose option. :)
In general PCONE is beginning to look a lot like OpenCL with only CPU devices supported. The difference is that PCONE will be much, much more flexible and C++11-friendly.
That's all, thank you for reading.
Spring 2018 KiloEngine development
I have recently worked on the KiloEngine code-base to overhaul and refactor it to modern C++11 standards. Mostly thanks to the increasing amount of sunlight here in Kuopio, Finland.
In the process of refactoring the code-base I checked out cross-compilation support. Cross-compiling the KiloEngine on Linux targeting Windows allowed me to eliminate large piles of code-rot and junk.
I counted it has been almost two years since the Windows code was last build-tested. As an Arch Linux user, using Windows for developing code is, so to speak... painful at least. I can now write code and build-test simultaneously for both Linux and Windows! But I shall note this work is still under heavy development!
Here is a story about the KiloEngine and my life:
For me it is remarkable that the KiloEngine code-base is still somewhat alive and kicking, even though it has been a one-man project for about seven years. I'm only seven years off from the guy who wrote TempleOS and only half as crazy. :D
From its birth as the CoolBasic Classic game engine, written in PureBasic, and the fade-away of the CoolBasic community forums in 2009-2012.
Following me as a self-taught software engineer and developer writing the next iteration in C++.
Shadowed by my personal life issues and few very hard years.
I have learned so much about writing software that is stable and workable, and how to use correct engineering approaches and patterns. With this fresh knowledge I will probably have to rewrite the oldest parts of the KiloEngine in an effort to keep the project rocking.
Blog post about curiosity, and minimally about computing.
I was asked by friends to figure out what some playing dice were made of. They were playing the popular Magic: The Gathering card game.
The dice were regular cube-shaped, with 1 to 6 dots on each side, made from some unknown metal.
The thing that puzzled me and my friends was that these metal dice would not stick to a magnet. I.e. they were non-magnetic, but also far too heavy to be made of aluminium or titanium.
I was very bored at the time, so I decided to analyze what the dice were really made of. I normally avoid doing any calculation, but this question puzzled me enough to do it.
I only had a ruler, a blob of Blu-Tack, two 2 dl measuring cups, and four of those metal dice. I decided the best approach would be to calculate the density of the dice.
I managed to craft a scale from these objects, using one die's edge as a pivot point under the ruler. The flimsy scale, with empty measuring cups and only Blu-Tack keeping it together, was very tricky to get balanced.
Eventually the scale would tilt just by gently blowing air over the cup on the raised end.
The assembled scale was now balanced enough for me. I placed the remaining three dice in the scale's cup. Next I dripped water into the second cup (which was now firmly tilted up from the weight of the dice on the other end) until the scale started to tilt over, and I stopped there.
The scale was now balanced using water in one cup and three dice in the other.
I measured the volume of the water in the cup. It was ~75 ml. I scrapped the scale and measured the side of a metal die using the ruler. It was about 1.5 cm.
So the summed volume of the three dice was 3 × 1.5 × 1.5 × 1.5 = 10.125 cm³.
Converting the volume of the water into mass: 75 ml of water ≈ 75 grams.
So the density of the metal is ~75 g / ~10.125 cm³ ≈ 7.407 g/cm³.
Here is a table of common metal densities (in kilograms per cubic metre):
Chromium – 6856 kg/m³
Cobalt – 8906 kg/m³
Copper, pure – 8940 kg/m³
Iron – 7870 kg/m³
Magnesium – 1746 kg/m³
Manganese – 7192 kg/m³
Molybdenum – 10300 kg/m³
Nickel – 8890 kg/m³
Steel, carbon – 7850 kg/m³
Titanium – 4540 kg/m³
So... it is not ordinary steel; it is an exotic alloy of these. Googling ”non-magnetic metal with 7.407 g/cm³ density” gave me a few hints. It happens that ”austenitic stainless steel” comes very close to the metal dice's physical properties: it is non-magnetic and its density is ~7.99 g/cm³. Also, there are only a few common varieties of this steel alloy: austenitic stainless steel 304 and 316.
Now I even know what ratios of certain chemical elements (like iron, chromium, nickel, molybdenum, manganese and nitrogen) the metal dice are probably made from.
Most importantly: why the dice don't stick to a magnet!
See the PDF of austenitic stainless steel 316.
That was fun for hours.
-Written by JATothrim. Thank you for reading!
PIMPLify your code now!
The KiloEngine Core is going through heart transplant surgery. The first KiloEngine SBCM (short term for a shared library in KiloEngine) library ABIs (Application Binary Interfaces) are finally (after 4 years) materializing. The APIs (Application Programming Interfaces) themselves have become so ubiquitous and stable that it feels stupid not to use them in other projects. But without a stable ABI they are quite useless outside KiloEngine.

I finally found a sane development method for C++ classes that are built into a shared library. It's basically a reduced PIMPL idiom with C++11. PIMPL completely separates the class implementation from the public interface. If you have a normal class with normal code, all that is needed is to write a public wrapper header and completely move the old code to the SBCM side. The moved class does not need to be modified much. In the wrapper header we write an identically named class (say ksMutex) that has only a single data member: void * impl; Note also that the wrapper class is not related to the original ksMutex class type in any way other than name.
Then we need to write the ABI call table struct for the class. It is a simple POD structure that has only function pointers as data members. Each member function in ksMutex gets its own function pointer:
namespace priv {
struct mutex_call_abi {
    int api_ver;
    void * (*ctor)(void);
    void (*dtor)(void *);
    bool (*try_lock)(void *);
    void (*lock)(void *);
    void (*unlock)(void *);
    bool (*lock_timed)(void *, unsigned long int);
    void (*relock)(void *);
    int * (*mutex_addr)(void *);
};
}
The api_ver member is optional, but it goes a good way toward ensuring the ABI remains stable. Next, we declare a C function that returns the above struct via pointer:
KILOAPI_CFUNC priv::mutex_call_abi * ksMutex_fntable();
In the wrapper class, ksMutex::lock() is implemented as an inline function:
void ksMutex::lock() { ksMutex_fntable()->lock(impl); }
The trick that keeps this simple was to reduce the too-many-C-functions-to-handle problem into a single function that returns the ABI call table. In the ksMutex wrapper class constructor we do:
ksMutex::ksMutex() { impl = ksMutex_fntable()->ctor(); }
The first invocation of ksMutex_fntable() initializes the ABI call table. Then the ctor() function initializes the SBCM-internal ksMutexImpl instance. We could optimize this slightly more by placing the pointer returned by ksMutex_fntable() in a static variable.
In the SBCM implementation we simply cast the first void* argument to our ksMutexImpl implementation class type and invoke the corresponding member function. This is safe since the same code created and allocated the class instance. This “cast and call” operation is wrapped in a C++11 lambda whose address is ultimately saved into the ABI call table. The lambdas here must have the extern “C” calling convention; they must be callable from pure C. It looks a bit messy but works fantastically, since we only need to write a single extra real function in the SBCM implementation.
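The SBCM side can be sketched roughly like this. It's a simplified illustration, not the real KiloEngine code: a std::mutex stands in for ksMutexImpl, only a subset of the table is filled in, and the extern “C” linkage detail is omitted for brevity since capture-less lambdas already convert to plain function pointers.

```cpp
#include <mutex>

// Hypothetical stand-in for the SBCM-internal implementation class.
struct ksMutexImpl {
    std::mutex m;
};

struct mutex_call_abi {
    int api_ver;
    void * (*ctor)();
    void (*dtor)(void *);
    bool (*try_lock)(void *);
    void (*lock)(void *);
    void (*unlock)(void *);
};

// The single real exported function: builds the call table once from
// capture-less "cast and call" lambdas.
mutex_call_abi * ksMutex_fntable() {
    static mutex_call_abi table = {
        1, // api_ver
        []() -> void * { return new ksMutexImpl(); },
        [](void * p) { delete static_cast<ksMutexImpl *>(p); },
        [](void * p) { return static_cast<ksMutexImpl *>(p)->m.try_lock(); },
        [](void * p) { static_cast<ksMutexImpl *>(p)->m.lock(); },
        [](void * p) { static_cast<ksMutexImpl *>(p)->m.unlock(); },
    };
    return &table;
}
```

The client wrapper only ever touches the table and the opaque impl pointer, so the implementation class layout can change freely without breaking the ABI.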
This also reveals a need for custom memory management, since the client wrapper class must never free() or delete the impl pointer. The ksMalloc() machinery works fine for this purpose; no need to write anything new here. For some stack-allocated classes, using ksMalloc() would be inefficient. The solution would be to reserve extra unformatted space in the wrapper class, just enough to fit the implementation class in this buffer. The SBCM code would then use placement new to construct the object in the client's memory. This effectively reverses the memory management roles but still keeps the PIMPL idiom intact.
(edited post 17.09.2017)
Midnightly KiloEngine
I have now had my summer holiday, so I can continue bleeding more life into the KiloEngine. Plus stop being sucked and stung by those damn bloodthirsty Finnish mosquitoes. Plus cute red panda pictures from Ähtäri Zoo. Welp, back to Kuopio.
I *tried* to do some initial Linux windowing and OpenGL RCTX coding before my holidays. However, I realised I'm really, really forced to fix the ThreadSubSystem cruft in KiloEngine as a first priority.
Btw. ThreadSubSystem is probably the oldest piece of code I have written in KiloEngine... And... it is now so old that I can see I was still learning how to program in C++. Seven years is a very long time for *any* programming project. Thankfully I don't need to fix the old code; I only need to replace the old ThreadSubSystem internals with PCONE.
I'm just brutally ripping out the old guts and putting in the new technology!
Doing this I learned that the PCONE way of running tasks actually differs from the ThreadSubSystem style quite a lot. PCONE has no concept of 'runnable instances', i.e. it is stateless. PCONE only really cares about the condition functor's result, and as long as it returns true the pool threads will simply spin in a loop. Also, if PCONE was initialized with e.g. 32 threads, it will simply invoke the condition functor 32 times.
ThreadSubSystem subroutine pools are more stateful and a bit smarter than that. The programmer can decide how many ThreadTasks are used to initialize the task-pool, and those instances are circulated through the system with two lock-free queues in the TaskPoolController class. TaskPoolController does what the class name suggests: it controls the execution of ThreadTask objects in the subroutine pool. (I'm thinking about renaming the subroutine pool to something else.)
ThreadTask objects are initialized with unique data and queued for execution using a virtual function callback in the TaskPoolController class. This is equivalent to the condition functor in PCONE. The PCONE work functor will then pull ThreadTasks from the ready-to-go queue, execute them, and push them back to a storage queue, where TaskPoolController will recycle and pick up the ThreadTask instances again. Overall, ThreadSubSystem provides a class-based API abstracting PCONE's simplistic lambda-function-based API. The best thing is that those queues of polymorphic ThreadTasks can be arbitrarily long! E.g. a single task-pool may consist of multiple types of derived ThreadTasks that do different things. This allows planning and programming ahead how to execute entire landscapes of work.
As soon as I refurbish the ThreadSubSystem, I'll publish the PCONE library within the KiloEngine project.
Yes - you will need to clone the KiloEngine repository in order to get the PCONE library. The reason for this is that the PCONE build needs a few parts from the KiloEngine project: precisely just the ksSemaphore code and the Boost binary libraries. Otherwise PCONE will be distributed as a completely separate project. The second thing is that I hope this will make fellow programmers try and view the KiloEngine, and not just the amazing PCONE tech. It is just a few ticks in cmake-gui after all. Thanks. -JATothrim
PCONE: heat up your cores.
Here is the latest child of the KiloEngine: the PCONE library. I had developed it for years as part of the KiloEngine ThreadSubSystem, but this magic parallelization API's implementation just didn't fully work as I meant it to, and it was always buggy or crashing.
Now I shall introduce you to perhaps the greatest thing I have ever programmed. If you are scared of C++ code, APIs, multithreading, RAII, mutexes, atomic CPU operations or fibers, I suggest you run away from this hardcore programming article of mine at escape velocity. ;-D
First things first. Are you ready? If yes, go and boil some strong green tea before continuing. You will need it. No, don't make black coffee, because it has only a small L-theanine content. Coffee only has caffeine. No focus. As I said, this article is about making parallelization of C++ code easier. The ”PCONE” jargon stands for ”Parallel Co-operatively Nested Execution”. What all those different words mean should become clear during this article. The PCONE library combines three threading technologies into a single library and API:
The first technology is using pools of OS threads. Creating a thread of execution in the operating system is massively costly. This is known by every programmer who has written even a single multithreaded program. Using a pool of a fixed number of OS threads reduces the overhead of creating and scheduling these threads to a minimum. Each ”pool-thread” is simply assigned a new task after it has finished the previous work, in a loop, until the library is torn down.
The second technology is co-operative multitasking. I really need to explain this concept well. Here a single OS thread is divided into two or more contexts of execution called fibers. Multiple fibers that execute on top of the same OS thread are thus never actually executing in parallel. Fibers must co-operatively context switch from one fiber to another to share the same OS thread. User-space programs have the capability to actually swap out the old CPU state and stack and install a new context-of-execution in its place. I call this ”co-operative user-space context-of-execution scheduling” (shortly: COUSCS), because the program has full control of when context switches happen. The OS does this kind of context switch automatically underneath the program all the time, but the program is not aware of it; the program just experiences its effects. OS context switches are transparent, and we have no control over them. The point here is that a context-of-execution consists of just two parts (the CPU register state and the stack storing the thread-local state), and swapping these is fairly cheap. Thankfully the Boost C++ library developers have provided a cross-CPU and cross-OS context switching API for me: boost::context. So I don't need to learn to write ARM or x64 CPU assembly language on Win10 myself. :-)
The third technology is the library architecture itself and how it uses these two primary technologies. Yes, I'm serious: it is a fresh new way of doing parallelism and writing parallel code with C++. The smarts and future plans I have poured into it are just remarkably interesting. In the mix are Read-Copy-Update (RCU) and, from an earlier article, the QSBR memory reclamation tech, turbocharged.
The PCONE library allows an application to parallelize its code easily by co-operatively executing multiple nested levels of parallel-task-batches simultaneously. That is one heck of a mouthful of parallelism in one sentence! To explain how PCONE is different from all the other lame thread-pool libraries and implementations, I'll just try to broadly explain how it works internally. To use the PCONE library you first make an instance of PCONE::Manager via the API initialization routine Manager::create(<num_of_threads>). It initializes the PCONE shared library and bootstraps the pool of OS threads. Later, Manager::start(<functor>, <cond>) is used to assign the <functor> to be executed in parallel by the pool-threads. Manager::start() returns a handle identifying the parallel-task-batch instance containing the given task functor and the <cond> functor. This handle is then given to Manager::join() to wait for the assigned condition functor to return false once for all pool-threads. Once the pool-threads have finished executing the functors and have received false from the condition functor, they stop executing the task functor from that parallel-task-batch instance. While a pool-thread is executing a task functor or the condition functor, that function code is allowed to start a new nested parallel-task-batch instance within the already executing one! The application programmer can continue this process of parallelizing the task or condition function code endlessly into further divided parallel tasks. This is a massively useful feature and/or technique, because the parallel-task-batch instances form an ordered flow of control that is naturally similar and compatible with the way a function invokes other functions. Here a parallel-task-batch invokes yet another parallel-task-batch. The latest and deepest-nesting parallel-task-batch instance must be finished first before the parallel-task-batch instances before it can be finalized, in LIFO order. That is, before Manager::join() can return.
So multiple parallel-task-batches are managed internally as a stack: start() pushes a new task-batch and join() pops one off.
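That LIFO rule can be illustrated with a trivial single-threaded model. This is purely illustrative bookkeeping, not the real PCONE internals:

```cpp
#include <vector>

// Toy model of the batch stack: start() pushes a new batch handle,
// and join() may only retire the topmost (deepest-nested) batch.
struct BatchStack {
    std::vector<int> stack;
    int next_id = 0;
    int start() {
        stack.push_back(next_id);
        return next_id++;
    }
    bool join(int handle) {
        // A batch can only be finalized once every batch nested
        // inside it has already been joined (LIFO order).
        if (stack.empty() || stack.back() != handle)
            return false;
        stack.pop_back();
        return true;
    }
};
```

An outer batch's join() cannot complete until every batch started inside it has been joined first.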
The PCONE library guarantees that no Big Deluge of thread-pools flooding the system will happen: no doubled number of pool-threads gets fork-bombed, nor does performance greatly suffer. The trick is that the PCONE library uses boost::context to neatly escape from the previous task functor's context-of-execution in Manager::join() and jump into the most nested parallel-task-batch instance available. This nesting of parallelized tasks costs only some amount of memory, to keep each pool-thread's context-of-execution per parallel-task-batch instance. The COUSCS operation, as explained earlier, is in itself fairly cheap. The handling of memory is effectively the hard and costly part.
The now obsolete KiloEngine versions of this functionality had cumbersome problems functioning correctly. The worst problem was the overhead and mess of locks and mutexes in start() and join(). This was an unsolved problem haunting me, until I happened to read a Reddit thread about badly implemented multi-threading in games. Continuing reading, I was introduced to the Read-Copy-Update technique (RCU) with Quiescent State-Based Reclamation (QSBR). The article had a very good code example that I kept studying.
For one week I furiously programmed, tested and learned RCU + QSBR, and finally PCONE version alpha v0.0.1 was born. RCU+QSBR allows start() and join() to operate entirely without operating system mutexes. Locks and mutexes are only used in cases where a pool-thread gets starved of useful work and has proved it has nothing else to execute.
Even here I have implemented a custom counting semaphore for Windows and Linux to reduce the cost of blocking the pool-threads. Sadly, C++ does not provide semaphores, not even in the near future. The Windows version especially proved to be more performant than standard-library-based implementations. Thanks to the RCU-based updating, PCONE will make some progress even if one pool-worker sleeps or gets stuck blocked in the middle of a task functor. Progress is possible as long as the top parallel-task-batch instance can be finished by any pool-thread.
The end result is nearly 100% utilization of the CPU cores, even when task-pools nest deeply.
I have another big block to implement, and that is fiber-mutexes. These are a special type of mutual exclusion providing fiber-level serialization of execution. Using them allows fibers to exclude other fibers from entering a critical section, providing much finer control than blocking entire pool-threads. If a fiber blocks on a fiber-mutex, it won't actually block the whole OS thread or even enter the OS kernel at all, but will COUSCS out of the mutex lock() into another eligible context-of-execution that is allowed to run! This would provide an absolutely massive boost to parallelism, since the OS thread would almost always be able to multiplex tasks even if a single fiber was blocked!
Hey, that's it. Superb thanks for reading, and please ask me anything if you have a question!
-Recursively, JATothrim
Going nukes with concurrency.
I always wondered how the read-copy-update (shortly RCU) programming technique worked in practice. Well, not any more. KiloEngine has had a very interesting concurrency API feature for quite a long time. The idea is good and very useful, but the implementation was a nightmare. The keyword and/or problem was scalability. You can't use std::mutexes. It's the devil impersonating unscalable code. The API is designed so that you can schedule millions of small tasks that are then executed asynchronously and concurrently in a thread-pool. Unfortunately, doing lots of small tasks multiplies the overhead by the number of worker-threads manyfold. A few times I had no scaling at all, and all workers just sat in std::mutex::lock() most of the time. The cure for this witchcrafted curse is RCU + CAS. Better explained: duplicating an immutable shared data structure, updating the local copy of it, and then publishing the local copy globally using std::atomic and a CAS loop. The old data structure is not touched or modified; it is replaced as a whole. Pseudo code for updating std::string data:
std::atomic<std::string*> data;

std::string * old = data.load();
std::string * newstr = new std::string(old->begin(), old->end());
updatestringpostfix(newstr);
while(!data.compare_exchange_weak(old, newstr)) {
    // CAS failed: 'old' now holds the current pointer, so redo the copy
    newstr->assign(old->begin(), old->end());
    updatestringpostfix(newstr);
}
All this without ever taking a mutex. If you followed, you might wonder what happens to the old data. The outdated data is not deleted immediately, because other threads may still be accessing it. Because we used no mutexes, we didn't exclude access to the old data structure. We actually intentionally allow threads to access the outdated data. The other threads have no idea yet that the data was modified and continue reading the old string object. They'll get the updated data when they next reload the pointer from the global data. The above pseudo code scales almost linearly with the number of CPU cores available. But heyy... doesn't that code leak memory? Yes. It does leak memory. If it were written in a garbage-collected language it would work as is, because the garbage collector would eventually detect that no one is using the abandoned string objects. But in C++ we must free the memory, or our string updater will eat the whole universe of free memory.
We'll need machinery that detects when the old string data can be deleted safely. This is done with “Quiescent State-Based Reclamation”, or shortly QSBR. Basically, after the CAS loop it stores the old std::string objects from the above code in a list. This is called logical deletion. Manipulation of this logical reclamation list must be thread-safe.
Threads will then need to check whether they have declared a “quiescent state” in onQuiescentState(). If not, they'll set the quiescent state and check whether they were the last thread enabling it. If the thread was the last one, it enters the magic code section where the magic happens. Other threads are forbidden from entering this magic code before the quiescent state is zeroed.
Or at least it works when you call onQuiescentState() in the correct place. onQuiescentState() can only be called when it is impossible for the thread to reference anything that it put into that logical reclamation list, e.g. when the thread is not holding any mutexes or locks. If all is good, the thread in onQuiescentState() swaps the logical reclamation list with null and swaps the physical reclamation list with the logical one. Now the thread proceeds to brutally kill anything that was on the previous physical reclamation list, with no consequences or SIGSEGVs. It then zeroes the global quiescent state and the killing cycle repeats. :D
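A stripped-down sketch of this scheme might look as follows. This is my own illustration of the idea above, not production code: the lists are mutex-protected for brevity (a real version would manipulate them lock-free), each thread is assumed to pass through onQuiescentState() once per cycle, and the retired objects are plain heap nodes with a liveness counter so the reclamation is visible.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Toy retired object; 'alive' makes deferred deletion observable.
struct Node {
    static int alive;
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

class Qsbr {
    unsigned nthreads_;
    std::atomic<unsigned> quiescent_;   // bitmask of threads seen quiescent
    std::mutex mtx_;                    // protects both lists (for brevity)
    std::vector<Node*> logical_;        // logically deleted, maybe still read
    std::vector<Node*> physical_;       // survived one full cycle already
public:
    explicit Qsbr(unsigned n) : nthreads_(n), quiescent_(0) {}

    // Logical deletion: the unlink happened elsewhere, defer the delete.
    void retire(Node* p) {
        std::lock_guard<std::mutex> g(mtx_);
        logical_.push_back(p);
    }

    // Call only when this thread can no longer reference retired nodes.
    void onQuiescentState(unsigned tid) {
        unsigned full = (1u << nthreads_) - 1;
        if ((quiescent_.fetch_or(1u << tid) | (1u << tid)) == full) {
            // Last thread in: everything on the physical list has now
            // survived a full grace period and can really be freed.
            std::lock_guard<std::mutex> g(mtx_);
            for (Node* p : physical_) delete p;
            physical_ = std::move(logical_);
            logical_.clear();
            quiescent_.store(0);        // start the next cycle
        }
    }
};
```

A retired node is only physically freed after every thread has declared a quiescent state twice, which is exactly the two-list dance described above.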
Hypertux
That's the name of my kernel. Or more exactly, of custom patches that enable Profile-Guided Optimization for the Linux kernel itself!
The original patch + research paper was already so cool it has its own website: http://coolypf.com/
What kernel PGO does is fairly simple in concept: it applies GCC's -fprofile-generate and -fprofile-use optimizations at the kernel level.
Unfortunately, the original coolypf.com PGO patch is almost impossible to get working today. I still need to build GCC from source, but now you can use GCC 6.3.1 with my patches instead of the ancient GCC 4.9. This is a requirement because 'regular' GCC tries to refer to its own gcov implementation, but there is no such thing as a 'kernel mode libgcov'. The kernel gcov subsystem implements the -fprofile-generate instrumentation functions that GCC smears calls to all over the kernel code. :D Thus the instrumentation phase is *very* invasive, and the produced instrumented 'GCOV kernel' may be unstable. The instrumented kernel is also 50% slower!!
The kernel already supports -fprofile-arcs, but is missing -fprofile-values, which the coolypf.com patches implement. However, the example implementation has one serious flaw: the kernel is inherently concurrent, and the original version didn't even try to make the kgcov code thread-safe. The result was often a very unstable kernel. Today, as of writing, I finished implementing atomic counting of the gcov data counters via x86 locked instructions.
I was surprised how easy it was in the end. The actual pain was figuring out why calcsum + kgcc failed to accept the profile data. The reason for this failure was simple but hard to solve: the kernel gcov code wrote an incorrect version magic into the generated gcda profile files. The kernel still thinks it only supports GCC 4.7. Since I had no idea how the gcda files are structured, I viewed one random specimen with hexedit and saw “adcg*704” as the first eight bytes of the file.
Hey... adcg is gcda backwards!! And *704 is exactly “407*” as in the kgcov-tool error messages! So I modified calcsum-4.9.cpp to replace those four bytes with the correct version magic.
I then did a test build to check whether the PGO-enabled vs. regular code was any different. KASAM!
Result: the asm code is finally different! Plus it looks somewhat more optimized than normal. I can't wait to extract the gcov profile data from this instrumented kernel and go hypertux mode. ;)
I’m getting the factorial fuzz out of this thing..
The RHZ data compressor project has been a fun project to hack on. As of writing, I'm processing about ~1500 MB of data to generate a special lookup table for the compressor part.
The compressor basically has three methods for processing the input: a Lempel-Ziv-style mode, my own invention the FUZZ mode, and storing the data as-is.
The FUZZ compression mode basically tries to use a byte-sequence mutation algorithm F(k, <byte-sequence>) to solve the equation F(k, <past-seen-data>) == <input-data>, 'brute-forcing' the index k so that the mutated byte-sequence matches the input data bytes. The generator function F(...) could theoretically be anything that mutates the past-seen data bytes, like SHA1(), MD5(), or permuting the past-seen bytes. I use a fairly speedy implementation of that last one.
The hard thing is discovering the 'generally fastest' mutator function F. I currently mutate 16 bytes at a time, so there are 15! possible combinations to test out. And fifteen factorial is, by the way, a massive number. 15! is in fact so big that a 32-bit k-index won't be able to hold it. Also, it is impossible for a classical computer to test out all values of k in a reasonable amount of time.
To solve this problem, I wrote a program that processes large amounts of data over time. It thus learns the most common successful k-index values for the mutator function F(k, <past-seen-data>).
The fuzztablegen program has now run for hours, and the results are interesting. Certain k-index values for F(k, <byte-sequence>) seem to be many times more frequent and/or successful than others. Only time will tell. I have to run massive amounts of data (multiple gigabytes) through the F() function to get the best possible lookup table.