khuey
khuey · 5 years ago
Taiwanese Pineapple Cakes (鳳梨酥)
If you need an introduction to what pineapple cakes are, I suggest this. If you would prefer to drool over a picture of one instead, here you go:
[photo of a pineapple cake]
Commercially prepared pineapple cakes (especially the cheaper ones) typically contain a mixture of pineapple and winter melon in the filling. Most recipes for making pineapple cakes at home will tell you to start with canned crushed pineapple and/or strain off the excess liquid (pineapples are 85%+ water by weight). Both are wrong. The “secret ingredient” to good pineapple cakes is just real fruit and having the patience to reduce the filling rather than pouring off the flavor.
The one piece of equipment you’re probably missing is a set of molds: pineapple cakes are molded pastries. You can find molds in a variety of shapes on Amazon from no-name Chinese brands. You want something no bigger than 2″x2″. Square vs rectangle/etc isn’t important.
You want to select a pineapple that is properly ripe. It should be mostly or entirely golden-yellow, have a nice sweet fruity smell at the base, and the leaves in the crown should look green and healthy. It should not be orange, have any externally visible mildew, or be either rock hard or excessively soft. Pineapples don’t ripen after they are picked from the fields so you need to get one that’s good at the store.
A single pineapple, depending on its size, how aggressively you trim it, and the size of your pastry molds, will produce enough filling for 15-30 pineapple cakes.
Filling (the night before)
1 pineapple, properly cut
1/2 cup of sugar (white vs brown doesn’t matter, I typically use half white and half brown)
(Optionally) 1 tsp or less of lemon/lime juice
If you’ve never cut a pineapple before, here’s a helpful video on how to do it. If you’re not concerned about waste, instead of cutting out each individual row of eyes you can simply remove more of the side flesh. Regardless, when you’re done, you should have pineapple with no eyes and no core. You can chop this up into small pieces with a knife or crush it with your hands and reserve the knife for any long fibrous bits. Either way it will produce a lot of juice, so be sure to do this last part in or over a pot.
You need to reduce and cook the pineapple until it becomes like a jam. If too much moisture remains in the filling it will either be too difficult to work with when filling the pastries or it will seep into the pastry after baking, making it soggy. Heat the pineapple until the juices are boiling, stirring and using a spoon to break up any remaining larger pieces of pineapple. The heat will slowly break down intact bits of pineapple and release the moisture from inside them. For a while it will not seem as if the pineapple is getting any drier, but it will shrink in volume as pieces break down. Keep going.
Once the pineapple has all broken down into a mush, add the sugar. Sugar helps draw more water out of the fruit and promotes caramelization (aka flavor). Continue cooking the filling until there is so little free water that you can pile the jam on one side of your pot with a spoon and both a) the jam will stay there and b) water will not seep out of the jam to cover the bare areas of the pot. That is when you are done.
Optionally, to balance the added sugar, you can put a bit of lemon or lime juice into the filling to add some tartness back in. Do this at the very end and mix it in thoroughly.
Once the right consistency is achieved, remove from heat and allow it to cool to room temperature before refrigerating. Alternatively, freeze it to keep it for longer periods of time.
Dough
1 stick of (room temperature) butter
1/4 cup of confectioners/icing sugar
(Optionally if using unsalted butter) a pinch of salt
1 egg
1/4 cup of milk powder
1 tsp of baking powder
(Optionally) a few drops of vanilla extract
Around 1 1/4 cups of pastry flour
The pastry dough used is similar to shortcake. The fat and the low-protein pastry or cake flour combine to produce a low amount of gluten and thus a “crumbly” texture. All purpose flour is an acceptable substitute though it will produce a different texture. What makes this dough particularly Taiwanese in flavor is the milk powder. That is not optional! Milk powder can be found at many Targets if your regular grocery does not carry it.
Leave the butter out overnight to soften it. Cream the butter and sugar together as if you are making icing. This is the most important step in making the pastry dough, so watch the video even if you think you know what you’re doing. You can keep going for more than 2 minutes; it’s essentially impossible to overbeat the mixture at this stage.
Once that is complete, add any salt, the egg, the milk powder, the baking powder, and any vanilla, in that order. The order doesn’t matter for flavor, but milk powder and baking powder are very light and have a tendency to splatter if you drop something heavy like an egg on them the wrong way. Mix until uniform, taking time to scrape down the sides of the bowl.
Finally, add the flour. The ideal amount will vary a little bit depending on your specific flour. You want the final product to be dry enough to work with but not too heavy on the flour. With the flour I use 1.25 cups is sufficient. Add the flour and mix until just combined but no more. Unlike making bread, you do not want to develop gluten. Turn the dough out onto a work surface (I use wax paper for this) and combine and shape it into a log. Chill for an hour or so to make the dough harder and easier to work with.
Pastries
Preheat your oven to 325F (if you have a convection oven) or a bit short of 350F (if you do not).
The exact amount of dough to use depends on the size of your molds, so some trial and error is involved here. You should aim for a dough to filling ratio of 10:6 if you have a “simple” mold shape like a rectangle, square, or circle or 10:5 if you have a “complex” shape with more surface area (like a pineapple shaped mold). Press the dough flat with your hand (or roll it out with a small pin). Place the filling in the center of the dough and mold the dough around it to completely enclose the filling. Then place the dough ball into your pastry mold and gently press it into the correct shape. If the filling breaks through the dough anywhere take some excess dough and mold it over the hole.
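To make that concrete: if the dough-plus-filling ball for your mold weighs roughly 40g (an illustrative number; weigh a test cake for your own molds), a 10:6 ratio works out to 25g of dough and 15g of filling per cake.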
The warmth of your hands will soften the dough and make it more pliable so use that to your advantage, but also work at a reasonable pace before the dough becomes too soft. You want the cakes to be a bit smaller than the mold because they will expand somewhat when baking.
Bake the pastries for 10-12 minutes on one side, then flip them and give them 8-10 minutes on the other side. You want to develop some color on the sides that are not facing the mold itself. Keep an eye on them the first time until you learn what works in your oven.
Although these pineapple cakes are traditionally eaten completely cool, they’re even better when they’re slightly warm!
khuey · 7 years ago
Rust in 2018
The Rust project is soliciting wishlists for 2018. Rather than list big things that everybody wants like NLL or a more stable tokio, I’m going to list some things that I think are small and could be knocked out in relatively short order.
Prefetching:
Rust has a `std::intrinsics::prefetch_read_data`, but it’s marked as experimental.  Because prefetching only changes the performance of a program and not its correctness, it should be easy to standardize a version of this with a “might do nothing if it’s not supported on this platform/compiler/etc” caveat.
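For illustration, here’s roughly what using the unstable intrinsic looks like on nightly today. This is just a sketch; `sum_with_prefetch` and the lookahead distance of 8 are arbitrary choices for the example.

#![feature(core_intrinsics)]

use std::intrinsics::prefetch_read_data;

fn sum_with_prefetch(values: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..values.len() {
        if i + 8 < values.len() {
            // Hint to the CPU that values[i + 8] will be read soon.
            // The second argument is locality (0-3); 3 = keep in all cache levels.
            unsafe { prefetch_read_data(&values[i + 8], 3) };
        }
        total += values[i];
    }
    total
}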
Debugging:
The DWARF generated by rustc is pretty bad, and I think there are a lot of easy wins here.
Thin references should become `DW_TAG_reference_type`. This has been stalled on figuring out what to do with fat references, but it shouldn’t be blocked by that.
const/mut should be noted in the generated DWARF (see issue 32921)
Tom Tromey is already working on enums so yay!
Stdlib:
Stabilize `TryFrom`
Stabilize `RangeArgument`
khuey · 7 years ago
Assorted Docker tips and tricks for local development and CI
I have been working on a lot of Docker-related stuff lately.  I have learned a few tricks that I thought were worth writing down in one place.
Use a local user inside a container:
For local development, if you want to have the container write to a local directory you can bind mount said directory into the container. But if the container creates files they’ll have the wrong owner. This can be fixed by running as the current user inside the container.  Just add
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u $( id -u $USER ):$( id -g $USER )
to the container’s run command, or the equivalent in your docker-compose file.
Run rr inside Docker:
You can run rr inside Docker. All that’s required is relaxing Docker’s default security policies. Add
--cap-add SYS_PTRACE \
--security-opt seccomp=unconfined
to the container’s run command. Like most debuggers, rr needs the ptrace capability. It also uses some esoteric syscalls, including ptrace and perf_event_open, which are forbidden by Docker’s default seccomp policy.
Mounting the Docker socket into a Docker container:
There are two ways to run Docker commands inside a Docker container.  One is to mount the Docker socket into the container to give the container access to the parent’s Docker daemon.  The other is to start a completely new Docker daemon inside the container.  If you choose to do the former, you might run into two problems that I did.
If you’re not running as root in the container, you may need to explicitly add the `docker` group on your system to the user in the container.
--group-add $( getent group docker | cut -d: -f 3 )
If you try to run containers from inside a container, any paths for bind mounts that you provide must be the paths visible to the daemon, outside all the containers. This is not a problem for things that are never mapped to a different path inside the container (e.g. /etc/passwd or /var/run/docker.sock), but for paths that are remapped you may need to smuggle knowledge of the outside paths into the container’s environment.
Run Docker in Gitlab CI’s Docker:
If you’re using Docker it’s nice to be able to test it in CI, but some CI systems themselves use Docker.  Here, rather than mounting the parent daemon’s socket into the container, you can set up “Docker in Docker” or dind.  Add
services:
  - docker:dind
variables:
  DOCKER_HOST: "tcp://docker:2375"
to your .gitlab-ci.yml. The Docker daemon runs in another container and is accessed over TCP, hence the DOCKER_HOST environment variable.
If the containers you run expose ports, what would normally be on localhost is actually exposed on the `docker` container, since that is localhost to the daemon. You may need to change the host your tests are looking for in your .gitlab-ci.yml.
khuey · 8 years ago
Lazy Initialization in Rust
Today I published lazy-init, a Rust crate that scratches an itch I’ve had for a while.  lazy-init is designed for when:
you want to do some work (a computation, disk I/O, etc) lazily,
the product of this work is immutable once it is created,
and you want to share this data across threads.
Rust has a good built-in solution if you only require #s 1 and 2: the Option type.  But requirement #3 makes things much harder.  Both of the built-in, thread-safe primitives for interior mutability have significant drawbacks, as we’ll see later.  But first, the API!
impl<T> Lazy<T> {
    /// Construct a new, uninitialized `Lazy<T>`.
    pub fn new() -> Lazy<T>;

    /// Get a reference to the contained value, invoking `f` to create it
    /// if the `Lazy<T>` is uninitialized.  It is guaranteed that if multiple
    /// calls to `get_or_create` race, only one will invoke its closure, and
    /// every call will receive a reference to the newly created value.
    ///
    /// The value stored in the `Lazy<T>` is immutable after the closure returns
    /// it, so think carefully about what you want to put inside!
    pub fn get_or_create<'a, F>(&'a self, f: F) -> &'a T
        where F: FnOnce() -> T;

    /// Get a reference to the contained value, returning `Some(ref)` if the
    /// `Lazy<T>` has been initialized or `None` if it has not.  It is
    /// guaranteed that if a reference is returned it is to the value inside
    /// the `Lazy<T>`.
    pub fn get<'a>(&'a self) -> Option<&'a T>;
}
There’s a constructor and two methods, one to get an existing value and another to get_or_create the value if it does not already exist.  get_or_create will ensure that the closure is invoked only once even if multiple threads race to call it on an uninitialized Lazy<T>.  Simple enough, right?
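For a feel of how this is used — and how it differs from the single-threaded Option pattern, whose get_or_insert_with needs &mut self — here’s a minimal sketch.  The `Config` type and its contents are invented for this example.

use lazy_init::Lazy;

struct Config {
    lines: Lazy<Vec<String>>,
}

impl Config {
    fn get_lines(&self) -> &Vec<String> {
        // &self, not &mut self: this works on shared references, so the
        // Config can be shared across threads.  If several threads race
        // to call get_lines() first, only one runs the closure.
        self.lines.get_or_create(|| {
            // Imagine expensive work here: disk I/O, parsing, etc.
            vec!["a".to_owned(), "b".to_owned()]
        })
    }
}

fn main() {
    let config = Config { lines: Lazy::new() };
    assert_eq!(config.get_lines().len(), 2);
}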
Lazy<T> is actually a degenerate version of a more generic LazyTransform<T, U> included in the crate which is initialized with a T that is later converted to a U.  Lazy<Foo> is essentially LazyTransform<(), Foo>.  For simplicity, I’ll refer to them interchangeably.
Rust provides two primitives for threadsafe interior mutability, std::sync::Mutex and std::sync::RwLock.  Lazy<T> is better than both of them because:
Unlike the locking types, Lazy<T> guarantees immutability after the value is created. This also means you can hold an immutable reference to the interior value without having to hold the lock.
Unlike std::sync::Mutex, Lazy<T> does not exclude multiple readers after the value is created, and a panic while reading the value will not poison the Lazy<T>.
Lazy<T> is at least no worse in performance compared to either locking type, and likely much better.
The first two are self-explanatory, so let’s dive into the third one.  On Unix systems, std::sync::Mutex and std::sync::RwLock boil down to pthread_mutex_t and pthread_rwlock_t respectively.  Lazy<T>, meanwhile, becomes a single std::sync::Mutex and a std::sync::atomic::AtomicBool.
The (slightly simplified, to elide details not relevant to synchronization) code inside get_or_create looks like
if !self.initialized.load(Ordering::Acquire) {
    // We *may* not be initialized. We have to block to be certain.
    let _lock = self.lock.lock().unwrap();
    if !self.initialized.load(Ordering::Relaxed) {
        // Ok, we're definitely uninitialized.
        // Safe to fiddle with the UnsafeCell now, because we're locked,
        // and there can't be any outstanding references.
        let value = unsafe { &mut *self.value.get() };
        *value = f();
        self.initialized.store(true, Ordering::Release);
    } else {
        // We raced, and someone else initialized us. We can fall
        // through now.
    }
}
// We're initialized, our value is immutable, no synchronization needed.
&*self.value.get()
Where self.value is an UnsafeCell.  This is a standard double-checked locking pattern.  Jeff Preshing has a great explanation of how this pattern actually works, and why the various Ordering values are what they are here.  The simple explanation is that the AtomicBool::store call with Ordering::Release after the closure synchronizes with the AtomicBool::load call with Ordering::Acquire at the top of the function.  So if a thread sees the write to self.initialized it must also see the write to self.value.  If a thread doesn’t see that write, it grabs the lock. Memory accesses cannot be reordered across a lock acquisition or release (because, internally, a mutex uses semantics that are at least as strong as the acquire/release semantics mentioned before) so self.initialized is now a definitive source of truth.  The lock also ensures that only one thread invokes the closure no matter how many threads are racing.
The code inside get is even simpler:
if self.initialized.load(Ordering::Acquire) {
    // We're initialized, our value is immutable, no synchronization needed.
    Some(&*self.value.get())
} else {
    None
}
We use the same acquire semantics as before to check if we are initialized.  You’ll notice that we don’t have to acquire a lock at all here.  Even in get_or_create, we only have to acquire the lock if self.initialized appears to be false.  Once that write propagates to all threads, Lazy<T> allows lock-free access to the underlying value at the cost of a single load-acquire check.
Best of all, with x86’s strong memory model, every load from memory has acquire semantics.  The atomic operations here really just tell the compiler not to do anything crazy with reordering.  This is not true on other architectures with weaker memory models.  On ARM, for instance, getting load-acquire semantics does require a DMB instruction.  Peter Sewell maintains a list of what the various atomic orderings map to on different architectures.
Depending on your pthreads implementation and architecture the performance of pthread_mutex_t and pthread_rwlock_t can vary wildly.  But as any sort of read-write lock needs to, at the bare minimum, both ensure there are no outstanding writers and increment the read count, a pthread_rwlock_t is never going to be any faster than the single load-acquire that Lazy<T> performs.
I hope others find this crate useful.  Bug reports and pull requests are always welcome!
EDIT: Thanks to Huon for pointing out that I needed to bound the contained type with Sync
khuey · 9 years ago
IndexedDB Logging Subsystems
IndexedDB in Gecko has a couple different but related logging capabilities. We can hook into the Gecko Profiler to show data related to the performance of the database threads. These are enabled unconditionally in builds that support profiling.  You can find the points marked with a PROFILER_LABEL annotation in the IndexedDB module.
We also have a logging system that is off by default.  By setting the preference dom.indexedDB.logging.enabled and NSPR_LOG_MODULES=IndexedDB:5 you can enable the IDB_LOG_MARK annotations that are scattered throughout the IndexedDB code.  This will output a bunch of data (using the standard logging subsystem) showing DOM calls, transactions, requests, and the other mechanics of IndexedDB.  It is intended for functional debugging (as opposed to performance problems, which the PROFILER_LABEL annotations are for).  You can even set dom.indexedDB.logging.detailed to get extra verbose information.
khuey · 9 years ago
The Threading Model of the DOM
I’m often asked to explain the threading model of the DOM. From the perspective of JavaScript (today), execution is essentially single threaded. But beneath the hood things are a bit more complicated. Some of this discussion is applicable across browsers, but a fair bit is Gecko specific. I made a picture but can’t seem to embed it usefully in my blog :P
The first thread of interest is the “main thread”. This is the thread that JS embedded in documents runs on. In Gecko, it’s also the thread that does most of the layout work, processes native UI events from the operating system, does bookkeeping for a lot of the networking code, and much more. Tying up this thread for long periods of time starves these other tasks and makes the browser “jank”, or not respond promptly to input. And compounding this problem, JS uses a “run to completion” scheduling mechanism where execution is not preempted to do other work.
For web pages that want to do long running background processing without yielding we created Web Workers, an abstraction for threads that communicate solely through asynchronous message passing. Workers have access to some APIs such as XMLHttpRequest and IndexedDB but not others, such as Nodes or DOMParser. This allows limited forms of background processing but little UI interaction (although that is changing a bit!).
The main thread and worker threads are the only threads within the browser that run JS. But there are many more threads involved in the implementation of the DOM and related APIs.
All of our network I/O (opening, closing, and polling sockets) has long happened on a pool of network threads. For similar reasons, database and storage operations such as those for localStorage or IndexedDB also happen on a pool of background threads. The UI (main) thread needs to remain responsive no matter how slow your network connection or disk is! We’ve also sought to move long running tasks with little interaction with the rest of the browser off the main thread. The HTML parser usually runs on its own thread. And for some types of loads (such as normal page loads), the networking threads talk directly to the parser thread, while for other types (such as XHR or Fetch) the networking threads talk to the main thread.
There’s also a single IPC thread dedicated to passing IPDL messages between threads and processes via pipes. This thread connects both the main thread and worker threads to PBackground-based protocols, and connects the main thread to PContent-based protocols. The simplified picture linked at the beginning of this article diagrams how all these threads link together.
There are also some other threads that I didn’t include in the diagram. Inside the JS engine Spidermonkey also does threaded script parsing and garbage collection. And HTML media elements connect to machinery for audio and video playback that involves off-main-thread decoding and rendering pipelines. These are generally outside the scope of the “DOM team” at Mozilla, because those subsystems are maintained by other teams (the JS team and the Media Playback team respectively) and have little effect on most of the DOM.
While the threading model visible to JS is pretty simple, inside Gecko there’s a lot more complexity. This is why we do things like document which thread code is expected to run on, assert that objects are refcounted on the correct thread even in opt builds, and generally structure threaded code to use message passing and event loops (via nsIRunnable) rather than using complex systems of shared memory and locks. Relatively modern threaded code, such as our Web Worker or IndexedDB implementations demonstrate many of these best practices. And of course, I and other experienced engineers are always happy to answer questions.
khuey · 9 years ago
What is PBackground?
All new code written in Gecko today is designed to be e10s ready.  PBackground exists to solve some common problems that arise when writing e10s aware code which involves multiple threads in the chrome process.  But before we dive into that, some background:
What is IPDL?
IPDL is a language used to define “protocols”, essentially formalized versions of how two things can communicate.  It is similar in some respects to IDL and WebIDL.  At build time the IPDL compiler automatically generates a large amount of C++ glue code from the IPDL files.  IPDL protocols are used to specify how the chrome and content processes talk to each other and verify that a content process is “following the rules” so to speak.  When used at run time, every protocol has two “actors”, a parent and a child.  These actors can live in different processes, and IPDL automatically serializes calls on one side and transmits them to the other side.  This is the foundation of how chrome and content processes talk to each other in e10s.
The actor tree
All live actors exist in a tree.  The root of the tree is known as the top-level actor, and it is an instance of one of a small number of top-level protocols.  All actors in a tree “live” on the same thread, and can only be used safely from that thread.  The top-level actor for most things is PContent, which connects the main thread of the chrome process to the main thread of a child process.  For most things this is great, because the DOM is already bound to the main thread.  But in some cases we don’t necessarily want to talk to the main thread.
What if I want to talk to a different thread?
Not every communication between the chrome and content processes necessarily wants to go through the main threads of both (or even either) processes.  When we are uploading textures from the content process we don’t need to go through the main thread of the parent process.  Instead we can go directly to the compositor thread in the parent process by creating a new top-level protocol that connects the compositor thread in the parent process to the main thread of a child process.  This protocol is called PCompositor.  Using PCompositor allows us to bypass the main thread of the parent process, which trims the latency of texture uploads since they will not get bogged down if that thread is busy.
Writing code that works in both chrome and content processes
A lot of code “just works” in the content process.  But some things, such as APIs that need to manipulate files or device settings (e.g. geolocation), cannot run in the content process because it has restricted privileges.  Instead these APIs must ask the chrome process to do something on their behalf via an IPDL protocol.  In some parts of Mozilla this has led to a rather awkward pattern like so:
if (XRE_GetProcessType() == GeckoProcessType_Default) {
        DoTheThing(arguments);
} else {
        mIPDLProtocol->SendDoTheThing(arguments);
}
This can get unwieldy.
PBackground
PBackground exists to solve both of these problems.  It connects a designated “background” thread in the chrome process to any other thread in any process.  PBackground
allows bypassing the chrome process’s main thread if used from the content process’s main thread.
allows bypassing both main threads if used from a non-main thread in a content process.
allows writing process-agnostic code, because it can be used even from another thread in the parent process.
The “parent” side of a PBackground actor pair is always on the designated background thread, while the child side is on the thread that chooses to use PBackground.  The background thread is designed to be responsive (nobody is allowed to do long running computation or file I/O on it) to guarantee better latency than going through the main threads (which can run arbitrary JS, GC, etc) can provide.
Examples
IndexedDB was rebuilt on top of PBackground last year.  This allows the DOM code to generally not worry about what thread (main vs. worker) or process (single vs. multiple) it is running in.  Instead it simply turns all of its operations into IPDL calls on a set of PBackground based actors and the processing, file I/O, etc, can be controlled from the PBackground thread regardless of where the DOM calls are being made.  The logic is all implemented in ActorsParent.cpp and ActorsChild.cpp in the dom/indexedDB folder.  These are not small files (ActorsParent.cpp is 27k lines of code as of this writing!) but the logic that needs to run in the parent is very clearly separated from the DOM code no matter what thread it’s running on.
khuey · 11 years ago
DOM Object Reflection: How does it work?
I started writing a bug comment and it turned out to be generally useful, so I turned it into this blog post.
Let's start by defining some vocabulary:
DOM object - any object (not just nodes!) exposed to JS code running in a web page.  This includes things that are actually part of the Document Object Model, such as the document, nodes, etc, and many other things such as XHR, IndexedDB, the CSSOM, etc.  When I use this term I mean all of the pieces required to make it work (the C++ implementation, the JS wrapper, etc).
wrapper - the JS representation of a DOM object
native object - the underlying C++ implementation of a DOM object
wrap - the process of taking a native object and retrieving or creating a wrapper for it to give to JS
IDL property - a property on a wrapper that is "built-in". e.g. 'nodeType' on nodes, 'responseXML' on XHR, etc.  These properties are automatically defined on a wrapper by the browser.
expando property - a property on a wrapper that is not part of the set of "built-in" properties that are automatically reflected.  e.g. if I say "document.khueyIsAwesome = true" 'khueyIsAwesome' is now an expando property on 'document'. (sadly khueyIsAwesome is not built into web browsers yet)
I'm going to ignore JS-implemented DOM objects here, but they work in much the same way: with an underlying C++ object that is automatically generated by the WebIDL code generator.
A DOM object consists of one or two pieces: the native object and potentially a wrapper that reflects it into JS.  Not all DOM objects have a wrapper.  Wrappers are created lazily in Gecko, so if a DOM object has not been accessed from JS it may not have a wrapper.  But the native object is always present: it is impossible to have a wrapper without a native object.
If the native object has a wrapper, the wrapper has a "strong" reference to the native.  That means that the wrapper exerts ownership over the native somehow.  If the native is reference counted then the wrapper holds a reference to it.  If the native is newed and deleted then the wrapper is responsible for deleting it.  This latter case corresponds to "nativeOwnership='owned'" in Bindings.conf.  In both cases this means that as long as the wrapper is alive, the native will remain alive too.
For some DOM objects, the lifetimes of the wrapper and of the native are inextricably linked.  This is certainly true for all "nativeOwnership='owned'" objects, where the destruction of the wrapper causes the deletion of the native.  It is also true for certain reference counted objects such as NodeIterator.  What these objects have in common is that they have to be created by JS (as opposed to, say, the HTML parser) and that there is no way to "get" an existing instance of the object from JS.  Things such as NodeIterator and TextDecoder fall into this category.
But many objects do not.  An HTMLImageElement can be created from JS, but can also be created by the HTML parser, and it can be retrieved at some point later via getElementById.  XMLHttpRequest is only created from JS, but you can get an existing XHR via event.target of events fired on it.
In these cases Gecko needs a way to create a wrapper for a native object.  We can't even rely on knowing the concrete type.  Unlike constructors, where the concrete type is obviously known, we can't require functions like getElementById or getters like event.target to know the concrete type of the thing they return.
Gecko also needs to return the same wrapper for the same native object consistently, so that JS cannot observe two different reflections of one native object.  Calling getElementById twice with the same id should return two things that === each other.
We solve these problems with nsWrapperCache.  This is an interface that we can get to via QueryInterface that exposes the ability to create and retrieve wrappers even if the caller doesn't know the concrete type of the DOM object.  Overriding the WrapObject function allows the derived class to create wrappers of the correct type.  Most implementations of WrapObject just call into a generated binding function that does all the real work.  The bindings layer calls WrapObject and/or GetWrapper when it receives a native object and needs to hand a wrapper back to a JS caller.
This solves the two problems mentioned above: the need to create wrappers for objects whose concrete type we don't know and the need to make object identity work for DOM objects.  Gecko actually takes the latter a step further though.  By default, nsWrapperCache merely caches the wrapper stored in it.  It still allows that wrapper to be GCd.  GCing wrappers can save large amounts of memory, so we want to do it when we can avoid breaking object identity.  If JS does not have a reference to the wrapper then recreating it later after a GC does not break a === comparison because there is nothing to compare it to.  The internal state of the object all lives in the C++ implementation, not in JS, so we don't need to worry about the values of any IDL properties changing.
But we do need to be concerned about expando properties.  If a web page adds properties to a wrapper then if we later GC it we won't be able to recreate the wrapper exactly as it was before and the difference will be visible for that page.  For that reason, setting expando properties on a wrapper triggers "wrapper preservation".  This establishes a strong edge from the native object to the wrapper, ensuring that the wrapper cannot be GCd until the native object is garbage.  Because there is always an edge from the wrapper to the native object the two now participate in a cycle that will ultimately be broken by the cycle collector.  Wrapper preservation is also handled in nsWrapperCache.
tl;dr
DOM objects consist of two pieces, native objects and JS wrappers.  JS wrappers are lazily created and potentially garbage collected in certain situations.  nsWrapperCache provides an interface to handle the three aspects of working with wrappers:
Creating (or recreating) wrappers for native objects whose concrete type may not be known.
Retrieving an existing wrapper to preserve object identity.
Preserving a wrapper to prevent it from being GCd when that would be observable by the web page.
And certain types of DOM objects, such as those with native objects that are not reference counted or those that can only be constructed, and never accessed through a getter or a function's return value, do not need to be wrapper cached because the wrapper cannot outlive the native object.
khuey · 12 years ago
Cycle Collector on Worker Threads
Yesterday I landed Bug 845545 on mozilla-inbound.  This is the completion of the second step of our ongoing effort to make it possible to write a single implementation of DOM APIs that can be shared between the main thread and worker threads.  The first step was developing the WebIDL code generator which eliminated the need to write manual JSAPI glue for every DOM object in workers.  The next step will be discarding the separate worker thread DOM event implementation in favor of the one used on the main thread.  Once complete these changes will allow us to make DOM APIs such as WebGL, WebSockets, and many others work in web workers without writing a separate C++ implementation.
Prior to the great worker rewrite of 2011, exposing DOM objects in workers was pretty easy.  The same XPIDL/XPConnect layer that was used for most of the DOM worked off the main thread, and as long as the underlying implementation of your DOM object was threadsafe and you added a way to create the object in a worker or transfer it in (via postMessage), things just worked.  In fact, you could even send arbitrary threadsafe XPCOM objects to ChromeWorkers.  But it turns out that XPConnect and the JS engine were not really threadsafe, and that we couldn't share XPConnect and the JS engine state between threads.  So bent rewrote workers to use a separate JSRuntime.
XPConnect didn't come along for the ride though.  Instead we took the opportunity to prototype the next iteration of quickstubs.  XPConnect does a lot of work in each function call that can be done at compile time, at the cost of larger codesize.  As performance has become more and more important and codesize has mattered less, we have been moving towards using more and more generated code in the "binding" layer and less dynamic conversion and dispatch via XPConnect.  So workers got the first iteration of what would become the WebIDL bindings: a handwritten JSAPI binding layer.
This binding layer ended up diverging from what we did for the main thread in a number of ways.  Most notably memory management is handled quite differently.  One of the hardest problems in web browsers is how to handle cross-language cycles.  References from JS to the language the browser is implemented in (C++ in existing browsers) are ubiquitous.  But references from browser code to JS can exist too (e.g. event listeners).  Once references can exist in both directions, there's the possibility for cycles.  There are 3 options for dealing with these cycles:
Use a separate memory management system for C++, and have special purpose code that glues these two systems together.  This is what Gecko does on the main thread with the cycle collector.  C++ side objects are usually reference counted, and the cycle collector breaks both pure C++ and C++-JS cycles.  It also handles tracing the JS objects owned by C++.
Use the JS garbage collector to manage the lifetimes of the C++ objects too.  This is what we chose to do on the worker threads.  In this setup there are never any C++->C++ edges; a C++ object owns the JS reflection of the C++ object it wants to keep alive.
Implement something hacky for event listeners and hope that nobody ever wants to implement an API that requires other sorts of C++->JS edges.  This is the WebKit approach.
Option 3 is obviously a non-starter.  Option 2 is potentially what we would do on both threads if we were writing a browser from scratch, but it is a pretty severe impedance mismatch with the rest of Gecko.  Code that wanted to work on both threads would have to be written to own C++ and cycle collect on one thread and own JS and manually trace it on another.  This, combined with the lack of automated binding generation, has made adding new APIs to workers essentially impossible for people who aren't already seasoned DOM hackers.
The WebIDL code generator solved the problem of writing the binding layer manually.  At the DOM meetup in London back in February we had a meeting where the rest of us overruled bent and decided to move workers to the cycle collector.  This involved some pretty serious refactoring work.  We had to separate a large amount of code from XPConnect so it could be used on both threads, while adding some customization hooks so that XPConnect could insert some of the strange behavior that only the main thread needs.  Then we were finally able to stand up a worker thread cycle collector implementation and port ImageData to use it.
Unfortunately this work was repeatedly delayed.  First by travel, then by the length of bent's review queue, later by some Firefox OS work, and then by the effort needed to unbitrot the patches from some other cycle collector changes.  All of this led to a Q2 goal slipping a third of the way through Q3.  But it is finally done, and future work should allow more opportunities for parallelism.  Some of the existing worker thread objects can be converted to use the cycle collector without waiting for events.  Other APIs can be ported to worker threads now if they don't use events.  And Olli can work on events now that I am no longer blocking him.
Best of all, we have some plans in the works to port some very interesting stuff to workers.  Keep your eyes peeled for more.
khuey · 12 years ago
http://people.mozilla.com/~cpearce/cycle-collector-khuey-Taipei-23May2013.mp4
I gave a talk about the cycle collector at the Rendering Meetup in Taipei last month.  Chris Pearce had the foresight to record it, and now it is available on people.m.o for anyone who wants to watch it.
khuey · 13 years ago
Refcounting thread-safety assertions are now fatal on mozilla-central
Gecko has long had assertions to verify that XPCOM objects are AddRefed/Released on the right thread.  Today I landed Bug 753659 which makes those assertions fatal (using MOZ_ASSERT).  This makes these assertions noticeable on test suites that do not check assertion counts (namely mochitest).  It also ensures that developers will notice these assertions when testing locally.  Remember that any time you see one of these assertions you are seeing a potential sg:crit (via a use-after-free on an object whose reference count is too low because AddRef raced with another operation) and should file and fix it immediately.
khuey · 13 years ago
Cycle Collection
We don't really have a comprehensive and current overview of the cycle collector and how to use it anywhere, so I wrote this.  This is probably part 1 of a multipart series, as I've only covered the simple cases here.
What?
The cycle collector is sort of like a garbage collector for C++.  It solves the fundamental problem of reference counting: cycles.  In a naive reference counting system, if A owns B and B owns A, neither A nor B will ever be freed.  Some structures in Gecko are inherently cyclic (e.g. a node tree) or can very easily be made cyclic by code beyond our control (e.g. most DOM objects can form cycles with expando properties added by content script).
The cycle collector operates on C++ objects that "opt-in" to cycle collection and all JS objects.  It runs a heavily modified version of Bacon and Rajan's synchronous cycle collection algorithm. C++ objects opt-in by notifying the cycle collector when they may be garbage.  When the cycle collector wakes up it inspects the C++ objects (with help from the objects themselves) and builds a graph of the heap that participates in cycle collection.  It then finds the garbage cycles in this graph and breaks them, allowing the memory to be reclaimed.
Why?
The cycle collector makes developing Gecko much simpler at the cost of some runtime overhead to collect cycles.  Without a cycle collector, we would have to either a) manually break cycles when appropriate or b) use weak pointers to avoid ownership cycles.  These add significant complexity to modifying code and make avoiding memory leaks and use-after-free errors much harder. 
When?
C++ objects need to participate in cycle collection whenever they can be part of a reference cycle that is not guaranteed to be broken through other means.  C++ objects also need to participate in cycle collection if they hold direct references to objects that are managed by the JavaScript garbage collector (a jsval, JS::Value, JSObject*, etc.).
In practice, this means most DOM objects need to be cycle collected.
Does the object inherit from nsWrapperCache (directly or indirectly)?  If so, it must be cycle collected.
Does the object have direct references to JavaScript values (jsval, JS::Value, JSObject*, etc)?  If so, it must be cycle collected.  Note that interface pointers to interfaces implemented by JavaScript (e.g. nsIDOMEventListener) do *not* count here.
Does the object hold no strong references (e.g. it has no member variables of type nsCOMPtr or nsRefPtr, it has no arrays of those (nsTArray<nsCOMPtr>, nsTArray<nsRefPtr>, or nsCOMArray), no hashtables of them (nsInterfaceHashtable, nsRefPtrHashtable), and does not directly own any object that has these (via new/delete or nsAutoPtr))?  If so, it does not need to be cycle collected.
Is the object threadsafe (e.g. an nsRunnable, or something that uses the threadsafe AddRef/Release macros)?  Threadsafe objects cannot participate in cycle collection and must break ownership cycles manually.
Is the object a service or other long lived object?  Long lived objects should break ownership cycles manually.  Adding cycle collection may prevent shutdown leaks, but it will just replace that with a leak until shutdown, which is just as bad but doesn't show up on our tools.
Does the object hold strong references to other things that are cycle collected?  If so, and the object does not have a well-defined lifetime (e.g. it can be accessed from Javascript) it must be cycle collected.
Does the object have strong references only to other things that are not cycle collected (e.g. interfaces from XPCOM, Necko, etc)?  If so, it probably does not need to be cycle collected.
Can the object be accessed from Javascript?  Then it probably needs to be cycle collected.
The last two are kind of vague on purpose.  Determining exactly when a class needs to participate in cycle collection is a bit tricky and involves some engineering judgement.  If you're not sure, ask your reviewer or relevant peers/module owners.
How?
C++ objects participate in cycle collection by:
Modifying their reference counting to use the cycle collector.
Implementing a "cycle collection participant", a set of functions that tell the cycle collector how to inspect the object.
Modifying their QueryInterface implementation to return the participant when asked.
Like many things in Gecko, this involves lots of macros.
The reference counting is modified by replacing existing macros:
NS_DECL_ISUPPORTS becomes NS_DECL_CYCLE_COLLECTING_ISUPPORTS.
NS_IMPL_ADDREF becomes NS_IMPL_CYCLE_COLLECTING_ADDREF.
NS_IMPL_RELEASE becomes NS_IMPL_CYCLE_COLLECTING_RELEASE.
The cycle collection participant is a helper class that provides up to three functions:
A 'Trace' function is provided by participants that represent objects that use direct JavaScript object references.  It reports those JavaScript references to the cycle collector.
A 'Traverse' function is provided by all participants.  It reports strong C++ references to the cycle collector.
An 'Unlink' function is provided by (virtually) all participants.  It clears out both JavaScript and C++ references, breaking the cycle.
The cycle collection participant is implemented by placing one of the following macros in the header:
NS_DECL_CYCLE_COLLECTION_CLASS is the normal choice.  It is used for classes that only have C++ references to report.  This participant has Traverse and Unlink functions.
NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS is a version of the previous macro for classes that multiply inherit from nsISupports.
NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS is used for classes that have JS references or a mix of JS and C++ references to report.  This participant has Trace, Traverse, and Unlink methods.
NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_AMBIGUOUS is the ambiguous version of the previous macro.
And by doing one of the following in the cpp file:
For very simple classes, that don't have JS references and only have nsCOMPtrs, you can use the NS_IMPL_CYCLE_COLLECTION_N macros, where N is the number of nsCOMPtrs the class has.
For classes that almost meet the above requirements, but inherit from nsWrapperCache, you can use the NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_N macros, where N is the number of nsCOMPtrs the class has.
Otherwise, use the NS_IMPL_CYCLE_COLLECTION_CLASS macro, then implement the Traverse, Unlink, and Trace (if appropriate) methods with the corresponding NS_IMPL_CYCLE_COLLECTION_[TRAVERSE|UNLINK|TRACE]_* macros.
khuey · 13 years ago
Fixing the Memory Leak
The MemShrink effort that has been underway at Mozilla for the last several months has substantially decreased the memory usage of Firefox for most users.  There are still some remaining issues that lead to pathological memory use.  One of those issues is leaky addons, which Nick has identified as the single most important MemShrink issue.
In Firefox, the JavaScript heap is split into compartments.  Firefox's UI code, which is written in JS, lives in the privileged "chrome" compartment.  Addon code also usually lives in the chrome compartment.  Websites live in different, unprivileged compartments.  Exactly how compartments are allocated to websites is beyond the scope of this article, but at the time of writing there is roughly one compartment per domain.  Code running in the chrome compartment can hold references to objects in the content compartments (much like how a page can hold references to objects in an iframe).
For an example of how this might look in practice, let's imagine we have Firefox open to three tabs: GMail, Twitter, and Facebook, and we have some sort of social media addon installed.  Our compartments might look something like this:
[diagram: the chrome compartment holding references into the GMail, Twitter, and Facebook compartments]
Where the blue lines are the references the Firefox UI is holding and the red lines are the references the addon is holding.
The problems start to arise if these references aren't cleaned up properly when the tab is navigated or closed.  If the Facebook tab is closed, but not all of those references are cleaned up, some or all of the memory the Facebook tab was using is not released.  The result is popularly known as a zombie compartment, and is a big source of leaks in Firefox.
Chrome (privileged UI or other JS) code that leaks is particularly problematic because the leak usually persists for the lifetime of the browser.  When chrome code leaks, say, facebook.com, it leads to dozens of megabytes of memory being lost.  It turns out that writing chrome code that doesn't leak can actually be quite difficult.  Even the Firefox front end code, which is worked on by a number of full time engineers and has extensive code review, has a number of leaks.  We can find and fix those, but addons are a much harder problem, and we can't expect addon authors to be as diligent as we try to be in finding and fixing leaks.  The only defense we have had is the AMO review team and our list of best practices.
That changed last night when I landed Bug 695480.  Firefox now attempts to clean up after leaky chrome code.  My approach takes advantage of the fact that chrome code lives in a separate compartment from web page code.  This means that every reference from chrome code to content code goes through a cross-compartment wrapper, which we maintain in a list.  When the page is navigated, or a tab is closed, we reach into the chrome compartment and grab this list.  We go through this list and "cut" all of the wrappers that point to objects in the page we're getting rid of.  The garbage collector can then reclaim the memory used by the page that is now gone.
The result: code that accidentally (or intentionally!) holds references to objects in pages that are gone will no longer leak.  If the code tries to touch the object after the wrapper has been "cut", it will get an exception.  This may break certain code patterns.  A few examples:
Creating a DOM node from a content document and storing it in a global variable for indefinite use.  Once the page you created the node from is closed your node will vanish.  Here's an example of code in Firefox that used to do that.
Creating a closure over DOM objects can break if those objects can go away before the closure is invoked.  Here's some code in Firefox that did that.  In one of the tests in our test suite the tab closed itself before the timeout ran, resulting in an exception being thrown.
Addon authors probably don't need to bother changing anything unless they see breakage.  Breakage should be pretty rare, and the huge upside of avoided leaks will be worth it.  It's a little early to be sure what effects this will have, but the amount of leaks we see on our test suite dropped by 80%.  I expect that this change will also fix a majority of the addon leaks we see, without any effort on the part of the addon authors.
khuey · 13 years ago
Address Space Layout Randomization now mandatory for binary components
This evening I landed Bug 728429 on mozilla-central.  Firefox will now refuse to load XPCOM component DLLs that do not implement ASLR.  ASLR is an important defense-in-depth mechanism that makes it more difficult to successfully exploit a security vulnerability.  Firefox has used ASLR on its core components for some time now, but many extensions that ship with binary components do not.
ASLR is on by default on modern versions of Visual Studio, so extension authors will only need to ensure that they haven't flipped the switch to turn it off.  MSDN documentation on ASLR options is available here.  Further reading about the benefits of ASLR is available here.
If no unexpected problems arise, this change will ship in Firefox 13.
khuey · 14 years ago
Pushing Compilers to the Limit (and Beyond)
At the end of the first week of December Firefox exceeded the memory limits of the Microsoft linker we use to produce our highly optimized Windows builds.  After the problem was identified we took some emergency steps to ensure that people could continue to land changes to parts of Firefox not affected by this issue by disabling some new and experimental features.  Once that was complete we were able to make some other changes that reduced the memory used by the linker back below the limits.  We were then able to undo those emergency steps and turn those features back on.
This will have no lasting impact on what is or is not shipped in Firefox 11.  The issues described here only affected Firefox developers, and have nothing to do with the memory usage or other performance characteristics of the Firefox binaries shipped to users.
Technical Details
Recently we began seeing sporadic compilation failures in our optimized builds on Windows.  After some debugging we determined that the problem was that the linker was running out of virtual address space.  In essence, the linker couldn't fit everything it needed into memory and crashed.
The build configuration that was failing is not our normal build configuration.  It uses Profile-Guided Optimization, fancy words meaning that it runs some benchmarks that we give it and then uses that information to determine what to optimize for speed and which optimizations to use.  It also uses Link-Time Code Generation, which means that instead of the traditional compilation model, where the compiler generates code and the linker glues it all together, the linker does all of the code generation.  These two optimization techniques are quite powerful (they generally win 10-20% on various benchmarks that we have) but they require loading source code and profiling data for most of Firefox into RAM at the same time.
Once we identified the problem we took emergency steps by disabling SPDY support and the Graphite font subsystem, both new features that had been landed recently and were turned off by default (in other words, users had to use an about:config preference to turn them on).  This allowed us to reopen the tree for checkins that did not touch code that ends up in xul.dll (this allowed work to proceed on the Firefox UI, the Javascript engine, and a few other things).
We then disabled Skia (which is being used as an experimental <canvas> backend) and separated video codecs and parts of WebGL support into a separate shared library.  This work decreased the linker's memory usage enough to resume normal development and turn SPDY back on.  The medium term solution is to start doing our 32 bit builds on 64 bit operating systems so that the linker can use 4 GB of memory instead of 3 GB of memory, and to separate pieces of code that aren't on the critical startup path into other shared libraries.
Frequently Asked Questions:
Why don't you just get machines with more RAM? - The problem is not that the linker was running out of physical memory, but that it was running out of virtual memory.  A 32 bit program can only address 2^32 bytes (4GB) of memory, regardless of how much memory is in the machine.  Additionally, on 32 bit Windows, the last 1 GB is reserved for the kernel, so a program is really limited to 3 GB of memory.
Ok, so why don't you just use a 64 bit linker? - Unfortunately there is no 64->32 bit cross compiler provided with the Microsoft toolchain so you can't generate binaries that run on 32 bit systems with a 64 bit compiler.
Sure you can, just use -MACHINE:X86 on the linker! - You can have the 64 bit linker link 32 bit binaries, but this is incompatible with Link-Time Code Generation.
Is Firefox bloated? - Firefox's size and linker memory usage compares favorably with other browsers. These problems are not a reflection on which browsers are or are not bloated, but rather on how resource intensive it is to do whole program optimization across a large C++ codebase.
khuey · 14 years ago
Using XHR.onload/etc in addons
I just landed https://bugzilla.mozilla.org/show_bug.cgi?id=687332 on mozilla-central which makes some changes to how .onfoo event listeners are handled on some DOM objects (including XHR).  These changes mean it is no longer possible to use .onfoo event listeners from JS scopes where the global object is not a Window, or from C++.  The correct way to listen for events from these scopes is to use .addEventListener.
This will likely affect a number of addons (particularly for XHR).  Addons that use XHR in XPCOM components should check to see if they are affected.  We may consider implementing some sort of a compatibility hack for XHR if that number is large.
khuey · 14 years ago
xpidlc is dead. Long live pyxpidl.
Today I landed Bug 458936, which moves XPCOM typelib generation from xpidlc to new Python code.  With that, and other work by people including Ted Mielczarek, Mike Hommey, and Benjamin Smedberg, Firefox is now built without ever invoking the binary xpidl.
The remaining pieces of work here are:
Migrate comm-central to the new python tools (interfaces in comm-central are still compiled with xpidlc)
Package the python xpidl into the Gecko SDK.
Stop building the binary xpidl entirely and remove it from the tree.
Remove our build time dependencies on libIDL, etc.