#planetpython
GridRoyale - A life simulation for exploring social dynamics
Another day, another project :)
This is a project that I wanted to do for years. I finally had the opportunity to do it. Check out the GridRoyale readme on GitHub for more details and a live demo.
GridRoyale is a life simulation. It's a tool for machine learning researchers to explore social dynamics.
It's similar to Game of Life or GridWorld, except I added game mechanics to encourage the players to behave socially. These game mechanics are similar to those in the battle royale genre of computer games, which is why it's called GridRoyale.
The game mechanics, Python framework and visualization are pretty good; the core algorithm sucks, and I'm waiting for someone better than me to come and write a new one. If that's you, please open a pull request.
From Twitter: Graham Dumpleton: Launching applications in Docker containers. http://bit.ly/1zmyC1i — Planet Python (@planetpython), December 16, 2014
Live-coding a music synthesizer
After so much work and waiting, the video of my EuroPython talk is finally released!
[Embedded YouTube video of the talk.]
This is a fun live-coding session using NumPy and SoundDevice. The goal of this talk is to make the computer produce realistic-sounding instrument sounds, using nothing but math.
The talk starts with creating a simple sound using a sine wave. We gradually make it sound more like a real instrument, learning a little bit about music theory on the way. We add features one-by-one until by the end of the talk, we hear our synthesizer play a piece of classical music.
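If you want to play along at home, here's a minimal sketch of the kind of sine-wave starting point the talk describes (my sketch, not the talk's actual code):

import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100  # Samples per second.
frequency = 440.0    # A4, the standard tuning pitch.
duration = 1.0       # Seconds.

# Evenly-spaced sample times, then a pure sine wave at the given frequency:
t = np.linspace(0, duration, int(SAMPLE_RATE * duration), endpoint=False)
samples = 0.3 * np.sin(2 * np.pi * frequency * t)  # Scaled down to avoid clipping.

sd.play(samples, SAMPLE_RATE)
sd.wait()  # Block until playback finishes.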
Improving Python exception chaining with raise-from
This is going to be a story about an esoteric feature of Python 3, and how I spent the last few months reviving it and bringing it into the limelight.
Back in the yee-haw days of 2003, Raymond Hettinger wrote an email to the python-dev mailing list, sharing an idea for improving the way that Python handles exceptions that are caught and replaced with other exceptions. The goal was to avoid losing information about the first exception while reporting the second one. Showing the full information to the user would make debugging easier, and if you've followed my work before, you know there's nothing I love better than that.
That idea was polished and refined by many discussions on python-dev. A year later, Python core developer Ka-Ping Yee wrote up a comprehensive PEP, then known as PEP 344 and later renamed to PEP 3134, detailing the idea with all the loose ends, potential problems and solutions. Guido accepted the PEP, and it was implemented for the infamous Python 3.0, to be used... by no one. For a long time.
If there's one thing I don't miss, it's waiting 10 years for the Python ecosystem to adopt Python 3. But finally, it happened. Almost all the packages on PyPI support Python 3 now, and getting a job writing Python 3 code is no longer a luxury. Only a few days ago, NumPy finally dropped Python 2 support. We live in good times.
When a modern Python developer catches an exception and raises a new one to replace it, they can enjoy seeing the complete information for both tracebacks. This is very helpful for debugging, and is a win for everybody.
Except... for one thing.
Two cases of exception chaining
There was one interesting detail of PEP 3134 that was forgotten. It has to do with the question: what does it mean when one exception is replaced with another? Why would someone make that switcheroo?
The reasons can be roughly divided into two cases, and PEP 3134 provided a solution for each case.
The first case is this:
"An exception was raised, we were handling it, and something went wrong in the process of handling it."
The second case is this:
"An exception was raised, and we decided to replace it with a different exception that will make more sense to whoever called this code. Maybe the new exception will make more sense because we're giving a more helpful error message. Or maybe we're using an exception class that's more relevant to the problem domain, and whoever's calling our code could wrap the call with an except clause that's tailored for this failure mode."
That second case is quite a mouthful, isn't it? It didn't help that the first case was defined as the default. The second case ended up falling by the wayside. Most Python developers haven't learned how to tell Python that the second case is what's happening in their code, or how to listen when Python tells them that it's happening in code they're currently debugging. This resulted in a Catch-22 situation, not that different from the one that slowed down Python 3 adoption in the first place.
Before I tell you what I did to break that Catch-22, I'll bring you into the fold and show you how to make this feature work in your project.
Exception causes, or `raise new from old`
I'm going to show you both sides of this feature: How to tell Python that you're catching an exception to replace it with a friendlier one, and how to understand when Python is telling you that this is what's happening in code that you're debugging.
For the first part, here's a good example from MyPy's codebase:
try:
    self.connection, _ = self.sock.accept()
except socket.timeout as e:
    raise IPCException('The socket timed out') from e
See the from e bit at the end? That's the bit that tells Python: The IPCException that we're raising is just a friendlier version of the socket.timeout that we just caught.
When we run that code and reach that exception, the traceback is going to look like this:
Traceback (most recent call last):
  File "foo.py", line 19, in <module>
    self.connection, _ = self.sock.accept()
  File "foo.py", line 7, in accept
    raise socket.timeout
socket.timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "foo.py", line 21, in <module>
    raise IPCException('The socket timed out') from e
IPCException: The socket timed out
See that message in the middle, about the exception above being the direct cause of the exception below? That's the important bit. That's how you know you have a case of a friendly wrapping of an exception.
If you were dealing with the first case, i.e. an exception handler that has an error in it, the message between the two tracebacks would be:
During handling of the above exception, another exception occurred:
That's it. Now you can tell the two cases apart.
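For completeness, here's a tiny snippet of my own that triggers the first case; running it shows the "During handling" message between the two tracebacks:

try:
    1 / 0
except ZeroDivisionError:
    # A bug inside the handler raises a second, unrelated exception,
    # so Python chains the two implicitly:
    print(undefined_variable)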
What I did to push this feature
I found that almost no one knows about this feature, which is sad, because I think it's a useful piece of information when debugging. I decided to do my part to push the Python community to use this syntax.
I wrote a little script that uses Python's ast module to analyze a codebase and find all instances where this syntax isn't used and should be. The heuristic was simple: if you're doing a raise inside an except, then in 99.9% of cases you're wrapping an exception.
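The script itself isn't reproduced in this post, but a hypothetical reconstruction along those lines, using only the standard library, could look like this:

import ast
import sys

class RaiseFromChecker(ast.NodeVisitor):
    def __init__(self, filename):
        self.filename = filename

    def visit_ExceptHandler(self, node):
        for child in ast.walk(node):
            # Flag any `raise <exception>` without a `from` clause; bare
            # re-raises (`raise` with no argument) are left alone.
            if isinstance(child, ast.Raise) and child.exc and not child.cause:
                print(f'{self.filename}:{child.lineno}: raise without "from"')
        # No generic_visit call: ast.walk above already covered any
        # handlers nested inside this one.

for path in sys.argv[1:]:
    with open(path) as file:
        tree = ast.parse(file.read(), filename=path)
    RaiseFromChecker(path).visit(tree)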
I took the output from that script and used it to open PRs to a slew of open-source Python packages. Some of the projects I fixed are: Setuptools, SciPy, Matplotlib, Pandas, PyTest, IPython, MyPy, Pygments and Sphinx. Check out my GitHub history for the full list.
I then added a rule to PyLint, now known as W0707: raise-missing-from. After the PyLint team makes the next release, and the thousands of projects around the world that use PyLint upgrade to it, they will all get a warning when they fail to use raise from in places where they should.
Hopefully, in a few years' time, this feature of Python will become more ingrained in the Python community.
What you can do to help
Do you maintain a Python project that already dropped Python 2 support? Install the latest version of PyLint from GitHub. You can do this in a virtualenv if you'd like to keep your system Python clean. Run this to install:
pip install git+https://github.com/PyCQA/pylint
Then, run this line on your repo:
pylint your_project_path | grep W0707
You'll get a list of lines showing where you should add raise from in your code. If you're not getting any output, your code is good!
Symlinks and hardlinks, move over, make room for reflinks!
If you've been around Linux for a while, you know about symlinks and hardlinks. You've used them and you know the differences between how each of them behaves. Besides being a useful filesystem tool, they're also a favorite interview question, used to gauge a candidate's familiarity with filesystems.
What you might not know is that there's also a thing called a reflink. Right now it's supported only on a handful of filesystems: Apple's APFS, used on all Apple devices; XFS, used on lots of Linux file-sharing servers; Btrfs; OCFS2; and Microsoft's ReFS.
If a symlink is a shortcut to another file, and a hardlink is a first-class pointer to the same inode as another file, what's a reflink, and when is it useful?
A reflink is a tool for doing copy-on-write on the filesystem.
If you've heard the term copy-on-write before, I'm willing to bet that it was in the context of the Linux fork call. Let's talk a bit about that.
Copy-on-write when forking a process
When you fork a process in Linux, the new process gets a new copy of the old process's memory. This is essential: if the two processes shared memory, either one could crash when the other made an unexpected change to it. Therefore, Linux needs to make a copy.
However, Linux is smart, and it knows better than to just make a naive copy. Making a naive copy could be a waste of memory, especially if your process has several gigabytes of memory allocated, and you're forking lots of processes for small tasks. If Linux were to make naive copies, you could find yourself with an out-of-memory crash very quickly.
When you fork a process, Linux uses copy-on-write to create the new process's memory. This means that it holds off on actually copying the existing memory pages until the last possible moment: the moment when the two processes start having different ideas about what the content of a page should be. In other words, as soon as one of the processes writes to a memory page, Linux makes a copy of that page, assigning the original to the original process and the new copy to the newly-forked process.
This is a huge boon, because most of the time, the new process will either only read the memory, or not even that. So many copy actions are avoided thanks to this technique. The beautiful part is that these shenanigans are completely transparent to the process, and to the developer who's writing the logic that the process performs. The new process behaves as if it has its own copy of the parent's memory pages, and the floor is being paved ahead of it as it walks forward, so to speak. It'll never even know that copy-on-write was performed.
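Here's a toy illustration of the mechanism (POSIX only; my example). The fork is instant despite the big allocation, because no page is duplicated until it's written:

import os

big_data = bytearray(10 ** 9)  # ~1GB, allocated once in the parent.

pid = os.fork()
if pid == 0:
    # Child: reading is free, the pages are still shared with the parent.
    first, last = big_data[0], big_data[-1]
    # Writing is what triggers the copy, and only for the touched page:
    big_data[0] = 1
    os._exit(0)
else:
    os.waitpid(pid, 0)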
Now we're ready to talk about reflinks.
Reflinks are copy-on-write for the filesystem
If you read the section above, you already know 90% of what you need to know to understand and use reflinks.
A reflink is a copy of a file, except the copy isn't really created on the hard drive until the last possible moment. Like the forking version, this logic is invisible. You could make a reflink of a 10-gigabyte file, and the new "copy" would be created immediately, because the 10 gigabytes wouldn't really be duplicated; they'd only be duplicated once you start modifying one of the copies.
All the while, you could treat the reflink as if it was a completely legitimate copy of your original file.
How do you create reflinks?
On Linux, run the following:
$ cp --reflink old_file new_file
On Mac, there's a different flag for some reason:
$ cp -c old_file new_file
If you're creating reflinks programmatically, you could also use dedicated libraries such as this one for Python.
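If you'd rather avoid a dependency, a minimal sketch that just shells out to cp also works (the function name is mine):

import platform
import subprocess

def reflink(source, dest):
    # Create a copy-on-write clone of `source` at `dest`.
    if platform.system() == 'Darwin':
        command = ['cp', '-c', source, dest]  # macOS clonefile, APFS only.
    else:
        # GNU cp; 'always' fails loudly instead of silently making a real
        # copy on filesystems without reflink support.
        command = ['cp', '--reflink=always', source, dest]
    subprocess.run(command, check=True)

reflink('old_file', 'new_file')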
When are reflinks useful?
Here's an example of how I used reflinks for a client of mine a few years back. They had a tool for developers that takes their entire codebase and copies it into a Docker container to run tests on it. (Don't ask.)
That recursive copying took a while, and the developers couldn't change their code in the meantime, or check out any other branches, because then an inconsistent version of their code would be copied into the container. That was pretty annoying for me personally, because I was twiddling my thumbs whenever I started the test process.
I figured, why not use reflinks?
I wrote some Python code that creates reflinks to the code in a temporary folder, and then does a real copy from that temporary folder to the Docker container. The big advantage here is that as soon as the reflinks were created, I could modify the original code as much as I wanted, without affecting the tests.
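That code isn't public, but the core idea fits in a few lines. Here's a rough sketch of mine (assuming macOS, as in that client's case):

import os
import subprocess
import tempfile

def snapshot(source_dir):
    # Instantly 'copy' a whole directory tree using reflinks.
    dest_dir = tempfile.mkdtemp()
    for dirpath, dirnames, filenames in os.walk(source_dir):
        relative = os.path.relpath(dirpath, source_dir)
        target_dir = os.path.join(dest_dir, relative)
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            subprocess.run(
                ['cp', '-c', os.path.join(dirpath, filename),
                 os.path.join(target_dir, filename)],  # Reflink, not a copy.
                check=True,
            )
    return dest_dir

Once the snapshot exists, the slow, real copy into the container can proceed from it while the original tree is edited freely.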
Fortunately, all the developers were using Macs in that company, so I knew I didn't have to worry about filesystem support.
How can reflinks go wrong?
You might be thinking, "What happens if I create a reflink of a huge file that's bigger than the amount of free space on the hard drive?"
I've never tried this, but here's what I've heard: the reflink will be created, but you'll get an error as soon as one of the copies is changed and an actual copy needs to be created. This is something to take into account if you're relying on reflinks in your business logic.
PySnooper: Never use print for debugging again
I just released a new open-source project!
https://github.com/cool-RR/PySnooper/.
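In a nutshell: you decorate a function with @pysnooper.snoop() and get a line-by-line log of its execution, including variable values, with no print calls needed. Here's the example adapted from the project's README:

import pysnooper

@pysnooper.snoop()
def number_to_bits(number):
    if number:
        bits = []
        while number:
            number, remainder = divmod(number, 2)
            bits.insert(0, remainder)
        return bits
    else:
        return [0]

number_to_bits(6)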
Mike Driscoll interviewed me on his blog
I recently had the pleasure of being interviewed by Mike Driscoll on his blog, The Mouse vs. The Python.
Mike is well-known in the Python world and especially in the wxPython user group. He often posts tutorials for beginners on his blog, and it's happened several times that I googled a technical question and found the answer in one of his tutorials.
Head over to Mike's blog to read the interview.
PythonTurtle makes it into Saudi Arabia's official state curriculum
I just heard some very exciting news.
Six years ago, when I was just starting out my development career, I made a little program called PythonTurtle. It's a program that helps children learn to program in Python, the programming language that I use in my day-to-day work as a web developer. I created PythonTurtle as a side project, because I saw there wasn't a viable solution for children to learn how to program in Python. I figured there should be one, so I spent roughly two months of hard work building and releasing PythonTurtle.
[Screenshot from the program]
What’s special about PythonTurtle is that it lets children learn programming in an exciting way that puts emphasis on fun and creativity rather than technical details.
When using the program, an illustrated turtle is displayed on the screen, and the children can program it to move around the screen and draw lines. The more programming concepts the children learn, the more impressive drawings they can create with the program. This gives them motivation to learn and improve their skills without feeling that it’s being forced on them by their schoolteacher.
PythonTurtle is based on an educational programming language called Logo, developed in the sixties; what I made is in fact a modern version, except that instead of teaching a didactic language, it teaches Python, a real language used in the industry today. The idea is to bring children closer to the techniques used in the real world, and possibly plant the seeds of a career in software development.
Because I was just starting out as a software developer back then, I didn't have the skills that I have today, and developing this software was hard for me. There were technical challenges (specifically, modifying the wxPython shell to be able to command an auxiliary process). These challenges were so hard that it looked like I wasn't going to solve them, and at a few points I considered giving up on the project entirely. I was asking myself, why am I even doing this? No one even knew I was working on this program, and no one seemed to care.
But I told myself that I’m creating something big here, and it’s important that I see this through to the end. So I did, and I overcame the technical problems.
I released the program as open-source under the MIT license, which means that every person on Earth could download it and use it free of charge. I decided to release it that way rather than as commercial software because I figured more children could use it if it was free, and that seemed more important to me than making a few bucks. I also liked the idea of contributing back to the open-source community, because so much of the software that I use every day is built on open-source software that was made by volunteers, so I was happy to contribute my share of open-source software.
I released the software for download and I submitted a link to the website to tech forums such as HN and Reddit, and over the next few days, the story blew up, and thousands of people visited the website. I was very happy and proud that people liked my project so much.
Over the six years since I released the program, I've gotten many happy emails from teachers and parents who used it to teach their children to program. It's always heartwarming to get these emails. They come from all over the world: from the States, from the UK, from Africa, Australia, South America... I would occasionally also get emails from children themselves, and one time even from an 80-year-old man who said he used my program to teach himself to program. I got more reports of adults enjoying the program, too. Looking at the analytics for the website, I saw that PythonTurtle was downloaded almost 100,000 times, which made me very proud.
But last year, I noticed something odd. I was checking how the site was doing on Google Analytics and saw that I was getting a disproportionately large number of hits from Saudi Arabia.
Specifically, there was a big peak of Saudi visitors around January 2014, and then that peak appeared again in January 2015. I also got more feedback emails from people with Arab-sounding names. I investigated why, and found a Saudi forum where PythonTurtle was mentioned. The text was in Arabic, and I tried translating it to English using Google Translate, but the result was too hard to understand, so I let it go and didn't investigate further.
That is, until a couple of days ago, when I got an email from a teacher in Saudi Arabia about PythonTurtle. He told me that PythonTurtle is being used in all high schools in Saudi Arabia! The Ministry of Education of Saudi Arabia has put PythonTurtle into the official state curriculum! This means that it's being used by more than 4,000 schools which teach more than 700,000 students!!!
I'm very excited to have made a program that has helped so many students, especially the students in Saudi Arabia. I'm Israeli, and there are no diplomatic relations between Israel and Saudi Arabia. I'm ignorant regarding the political affairs between the two countries, but I'm happy to see that open-source software has no borders; if a developer in one country makes a program that can help people, it can be used everywhere and help people all around the world, regardless of the political situation.
Christoph Gohlke's awesome collection of Windows binaries for Python packages
Today I needed to upgrade the psycopg2 package on a Django app belonging to one of my clients. Without giving it a second thought, I fired up my browser and started typing goh in the omnibar. I quickly got Christoph Gohlke's page, which is in my favorites:
http://www.lfd.uci.edu/~gohlke/pythonlibs/
What is this? It's a page where you can find Windows binaries of many popular Python packages. Whenever you need to install a Python package that requires compilation, and that package's maintainers haven't made Windows binaries available on PyPI, you can usually find it on Christoph's page, categorized by package, Python version and 32-bit/64-bit.
I've never met Christoph. Never even spoken to him online. But he's saved me, and thousands of other developers, countless hours of dreadful work compiling PyPI packages. His page is a godsend for anyone who does Python development on Windows.
Thank you Christoph!
Code comments that I find helpful
I'm a huge believer in code quality. When I write code, I put in a lot of effort to make it as easy to read and understand as possible. Because code is read much more often than it's written, writing your code in a way that's easy to understand saves lots of time for the developers who are going to read it in the future (one of them being future you), and makes it easy for them to build on your code and extend it.
As we all know, a big part of writing good code is comments that explain crucial points about the code. This is what this blog post is about.
Making good code comments is not a trivial thing. We’ve all seen comments like this:
x += 1 # Increment x by 1
Comments like that are not only unhelpful but they are outright harmful, because they add noise to the code; they grab our attention but then don’t give us any useful information. Attention is a scarce resource that shouldn’t be wasted.
Because adding comments to code adds noise, we need to make sure that our comments deliver the maximum amount of useful information in the minimum amount of noise. In this blog post I’ve listed a few kinds of comments that I put in my code to make it as clear as possible with as little noise as possible.
Comment braces
Here is a style of comments I picked up a few years ago by reading someone else’s code, and which proved really helpful since then:
def calculate_length_of_recurrent_perm_space(k, fbb):
    # ...

    ### Doing phase one, getting all sub-FBBs: ################################
    #                                                                         #
    levels = []
    current_fbbs = {fbb}
    while len(levels) < k and current_fbbs:
        k_ = k - len(levels)
        levels.append(
            {fbb_: fbb_.get_sub_fbbs_for_one_key_removed()
             for fbb_ in current_fbbs if (k_, fbb_) not in cache}
        )
        current_fbbs = set(itertools.chain(*levels[-1].values()))
    #                                                                         #
    ### Finished doing phase one, getting all sub-FBBs. #######################

    ### Doing phase two, solving FBBs from trivial to complex: ################
    #                                                                         #
    for k_, level in enumerate(reversed(levels), (k - len(levels) + 1)):
        if k_ == 1:
            for fbb_, sub_fbb_bag in level.items():
                cache[(k_, fbb_)] = fbb_.n_elements
        else:
            for fbb_, sub_fbb_bag in level.items():
                cache[(k_, fbb_)] = sum(
                    (cache[(k_ - 1, sub_fbb)] * factor
                     for sub_fbb, factor in sub_fbb_bag.items())
                )
    #                                                                         #
    ### Finished doing phase two, solving FBBs from trivial to complex. #######
I call these “comment braces” because they look like huge vertical braces that have code in them. Even though these comments are quite bulky, I still really love them because they divide the code into different segments, which is very helpful when you have a piece of code that can be logically separated into different segments. (Of course, if you can refactor these different segments into different functions, that would be ideal, but in many cases it’s not practical.)
This style of commenting makes it easier to read the code casually, because when you’re reading a line in the code you only need to think how it relates to the lines in its section, and not how it relates to the lines in the different sections.
Until challenged
“Until challenged” is a short two-word comment that communicates a common programming idiom:
def __lt__(self, other):
    found_strict_difference = False # Until challenged.
    all_elements = set(other) | set(self)
    for element in all_elements:
        if self[element] > other[element]:
            return False
        elif self[element] < other[element]:
            found_strict_difference = True
    return found_strict_difference
We've all written algorithms where we set a variable to a boolean, and then later we may or may not flip it, depending on some condition; eventually we check the value of that variable and possibly return it. Writing the comment "Until challenged" after the variable assignment communicates that we're using this idiom.
It might be unclear to people who aren’t familiar with it, so it’s a compromise between brevity and understandability. If you want to make it more universally understandable, you can add a few words like “Set to False until we possibly discover a difference and set it to True”.
Establishing current state
Also known as "the manual assert." Sometimes it's useful to make a comment somewhere in the middle of a function that describes what we've accomplished by this point, what the current state is, and what we're going to do next. Example:
# ...
if self.is_degreed and (perm.degree not in self.degrees):
    raise ValueError
# At this point we know the permutation contains the correct items, and
# has the correct degree. Now, to calculate its index number.
if perm.is_dapplied:
    return self.undapplied.index(perm.undapplied)
# ...
I love this kind of comment. What all good comments have in common is that they’re saying what your internal monologue would say if you tried to read the code and understand it without comments, and this comment is no different.
Clarifying else keyword
One thing that can be confusing when reading Python code is looking at the else part of a long if-else clause and not being sure which condition it corresponds to. This is where I like to add a comment reiterating the condition:
if actual_item_test is None:
    if isinstance(single_or_sequence, collections.Sequence):
        return tuple(single_or_sequence)
    elif single_or_sequence is None:
        return tuple()
    else:
        return (single_or_sequence,)
else: # actual_item_test is not None
    if actual_item_test(single_or_sequence):
        return (single_or_sequence,)
    elif single_or_sequence is None:
        return ()
    else:
        return tuple(single_or_sequence)
Note the comment after the middle else. It tells you which condition should be true for this else clause to be executed, so you don’t have to trace it back to the original if line.
That’s all I’ve got for now. I’ll be happy to hear your code commenting tips!
My new open-source project: Combi, the Pythonic package for combinatorics
I’m proud to announce the first release of my new open-source project: Combi!
Combi is a combinatorics package for Python.
Combi on GitHub.
Combi on PyPI.
Combi documentation.
Combi is awesome. It’s like a marshmallow that was slowly and carefully roasted at just the right temperature to make it melt inside, but not too hot as to burn it; except instead of being a marshmallow, it’s a Python package.
Installation:
$ pip install combi
What is Combi good for? Combi lets you explore spaces of permutations and combinations as if they were Python sequences, but without generating all the permutations/combinations in advance. It also lets you specify a lot of special conditions on these spaces. This is helpful both for scientific computing, and for general-purpose programming, as combinations and permutations are concepts that come up when solving many different kinds of programming problems.
(I developed Combi while doing research for a bigger project of mine that’s going to remain a secret for a while. I call it Project SK. If you want to get updates on it when it becomes public, sign up here.)
Let’s look at the simplest example of using Combi. Check out this $5 padlock in the picture. I use this padlock for my gym locker, so people won’t steal my stuff when I’m swimming in the pool. It has 8 buttons, and to open it you have to press down a secret combination of 4 buttons. I wonder though, how easy is it to crack?
>>> from combi import *
>>> padlock_space = CombSpace(range(1, 9), 4)
>>> padlock_space
<CombSpace: range(1, 9), n_elements=4>
padlock_space is the space of all possible combinations for our padlock. At this point, the combinations haven't really been generated; if we ask for a combination from the space, it'll be generated on demand:
>>> padlock_space[7]
<Comb, n_elements=4: (1, 2, 4, 7)>
As you can see, padlock_space behaves like a sequence. We can get a combination by index number. We can also do other sequence-y things, like getting the index number of a combination, slicing the space, or getting its length using len. This is a huge benefit, because we can explore these spaces in a declarative rather than imperative style of programming. (I.e., we don't have to think about generating the permutations; we simply assume that the permutation space exists and take items from it at leisure.) Let's try looking at the length of padlock_space:
>>> len(padlock_space)
70
Only 70 combinations. That’s pretty lame… At 3 seconds to try a combination, this means this padlock is crackable in under 4 minutes. Not very secure.
In the example above, I used CombSpace, which is a space of combinations. It’s a thin subclass over PermSpace, which is a space of permutations. A combination is like a permutation, except order doesn’t matter.
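To make "order doesn't matter" concrete, here's a comparison of my own (using the n_elements argument described further down): the space of ordered 4-permutations of the same 8 buttons is exactly 4! = 24 times bigger than padlock_space:

>>> perm_space = PermSpace(range(1, 9), n_elements=4)
>>> len(perm_space)  # Ordered: 8 * 7 * 6 * 5
1680
>>> len(padlock_space)  # Unordered: 1680 // factorial(4)
70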
Now, because the permutations/combinations are generated on-demand, I can do something like this:
>>> huge_perm_space = PermSpace(1000)
>>> huge_perm_space
<PermSpace: 0..999>
This is a perm space of all permutations of the numbers between 0 and 999. It is ginormous. The number of permutations is around 10**2500 (a number that far exceeds the number of particles in the universe). I'm not even going to show its length in the shell session, because the number alone would fill this entire blog post. And yet you can fetch any permutation from this space by index number in a fraction of a second:
>>> huge_perm_space[7]
<Perm: (0, 1, 2, 3, 4, ... 997, 996, 999, 998)>
Note that the permutation huge_perm_space[7] is a sequence by itself, where every item is a number in range(1000).
Combi lets you specify a myriad of options on the spaces that you create. For example, you can make some elements be fixed:
>>> fixed_perm_space = PermSpace(4, fixed_map={3: 3,})
>>> fixed_perm_space
<PermSpace: 0..3, fixed_map={3: 3}>
>>> tuple(fixed_perm_space)
(<Perm: (0, 1, 2, 3)>, <Perm: (0, 2, 1, 3)>, <Perm: (1, 0, 2, 3)>,
 <Perm: (1, 2, 0, 3)>, <Perm: (2, 0, 1, 3)>, <Perm: (2, 1, 0, 3)>)
This limits the space and makes it smaller. This is useful when you’re making explorations on a huge PermSpace and want to inspect only a smaller subset of it that would be easier to handle.
There are many more variations that you can apply to a PermSpace or a CombSpace. You can specify a custom domain and a custom range for a space. You can constrain it to permutations of a certain degree (e.g. degrees=1 to limit it to transpositions only). You can do k-permutations by specifying the length of the desired permutations as n_elements. You can have the permutation objects be of a custom subclass that you define, so you can provide extra methods that fit your use case. You can provide sequences in which some items appear multiple times, and Combi will be smart about it and consider multiple occurrences of the same item interchangeable. You can also toggle that behavior so it treats them as unique. It's very customizable :)
Combi has a bunch more useful features that are beyond the scope of this blog post (click for links to documentation):
MapSpace is like Python’s builtin map, except it’s a sequence that allows index access.
ProductSpace is like Python’s itertools.product, except it’s a sequence that allows index access.
ChainSpace is like Python’s itertools.chain, except it’s a sequence that allows index access.
SelectionSpace is a space of all selections from a sequence, of all possible lengths.
The Bag class is like Python's builtin collections.Counter, except it offers far more functionality, like more arithmetic operations between bags, comparison between bags, and more. (It can do that because, unlike Python's collections.Counter, it only allows natural numbers as counts.)
Classes FrozenBag, OrderedBag and FrozenOrderedBag are provided, which are variations on Bag.
I hope that the Combi package will be useful for you!
Another silly Python riddle
Do you think of yourself as an experienced Python developer? Do you think you know Python’s quirks inside and out? Here’s a silly riddle to test your skills.
Observe the following Python code:
def f(x):
    return x == not x

f(None)
The question is: What will the call f(None) return?
Think carefully and try to come up with an answer without running the code. Then check yourself :)
SmartGit: My favorite GUI interface for Git
I wanted to give a shoutout to one of my favorite tools that I’ve been using for the last few years: SmartGit.
SmartGit is a GUI interface to the Git version control system. (I assume that Git itself needs no introduction, but if you’re not familiar with it: It’s one of the best version control systems used in software development.) Actually, the full name of the software is SmartGit/HG, because it’s also able to handle Mercurial repositories.
In the last few years that I’ve used SmartGit, I found it to be a highly reliable, efficient and customizable piece of software. I use it to manage my work repositories, the repositories of my personal projects, and the repositories of open-source projects that I occasionally contribute to; all in all around 30 repositories. I do still use the git command outside of the GUI, for the few custom Git scripts that I have, and of course SmartGit has no problem with that: you can use git in the shell in parallel to SmartGit with no issue.
I can say in confidence that using the keyboard shortcuts provided in SmartGit, I’m able to execute Git commands faster than if typing them in the command line. (I hope not to get into a CLI vs. GUI discussion here… Here’s my take on the CLI vs. GUI matter.)
I’m not going to go over all the features of the program since you can find a good overview of those on the official website, but I’ll add my own two cents: The makers of SmartGit, a German company called Syntevo, have proven themselves to be a great vendor of software development tools. They keep pushing out new versions consistently with new meaningful features. (The most recent one is support for making GitHub comments within the GUI.) Whenever I send questions to their support email their response is quick and helpful, and when I report bugs they are usually fixed quickly.
In conclusion: If you’re a regular Git user, and you’re open to using a GUI interface for Git, I’d strongly recommend downloading the trial version of SmartGit and giving it a try.
Support Py2+3 in two separate codebases: How to do it and why it's great
Lately there’s been a lot of discussion about whether Python 3 is working out or not, with many projects reluctant to move to Python 3, especially big, mature projects that are in the “if it’s not broken don’t touch it” phase.
I still fully believe in Python 3, but this blog post is not about discussing 2-vs-3; I’d like to make my own modest contribution to the Python 3 cause by sharing with you my method of supporting both Python 2 and Python 3 which I use in my open-source project python_toolbox.
When I originally read about the different ways to support both Python 2 and 3, I was appalled. There seemed to be 3 ways, and all 3 had properties that made me not want to even consider them.
The 3 approaches seem to be:
Maintain 2 completely separate codebases. Pros: Complete control over each copy of the code. Cons: You have to maintain 2 codebases.
Maintain one codebase, targeting Python 2, and use 2to3 to automatically generate a second codebase that supports Python 3. (Or vice versa.) Pros: You need to maintain only one codebase. Cons: You’re now dealing with autogenerated code, which is hard to edit or debug.
Support both Python 2 and Python 3 in the same codebase (like Django does) by using compatibility libraries like six. Pros: Not having to maintain two different codebases, or autogenerate code. Cons: Your code is ugly as shit because it has to support a wide range of Python versions.
I spent quite some time thinking about which approach to take, and settled on the first one. I implemented it a few months ago, and it's been working really well.
Why is two codebases the best approach?
Autogenerated code sucks too much. I like my code to be an actual text file that I can always edit, especially when debugging, not an ephemeral file autogenerated from a different file by a set of algorithms.
A single codebase that supports both Python 2 and Python 3 forces you to use only features that exist in both versions, and makes your code ugly. I don't know about you, but for me one of the big perks of programming in Python has always been the elegance and clarity of the code. If you're using compatibility libraries, then instead of specifying metaclass=MyType you need to specify six.with_metaclass(MyType), and instead of using str you need to use six.text_type. That's not what Python is about. It's critical for me to have the code be as succinct as possible.
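To see the difference concretely, here's a side-by-side sketch of mine (the class names are made up; the six spelling is six's real API):

import six

class Meta(type):
    '''A do-nothing metaclass, just for illustration.'''

# The spelling you're forced into when one codebase serves Python 2 and 3:
class Compatible(six.with_metaclass(Meta)):
    pass

# The spelling a Python-3-only codebase can use:
class Idiomatic(metaclass=Meta):
    pass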
Having two separate codebases is the only solution that gives you full control of both codebases. You can tweak each codebase to fit the Python version it’s serving, and use its features in the most idiomatic way.
How to make a dual codebase approach painless?
Now the big question is, how do you deal with having two separate codebases? I gave this question some thought. The main problem seems to be this: if I add a feature in the Python 2 version of the library, I want to have that feature in the Python 3 version (or vice versa), but I don't want to type the code again, nor copy-paste it. That's the crux of the problem, and once it's solved, having 2 codebases becomes much less of an issue. (It's not like we're trying to save on disk space.)
So, when developing a feature in the Py2 version and wanting it in the Py3 version, I have to do something like a merge between the two codebases, because they're different. Normally I would use git merge, but I can't in this case, because both codebases live in the same repo. (I considered using git submodules and having each codebase in a different submodule, but the path leading up to submodules is littered with the corpses of desperate developers who regretted ever touching them.)
I came up with a solution that works great. All you need is a merge program that supports 3-way merging (I use the excellent but proprietary Araxis Merge; open-source alternatives are available), and to follow the instructions below. They're a bit lengthy, but after you get used to them, you can go through them quickly enough that it's not a big toll on the development cycle.
Create a folder structure similar to mine:
python_toolbox/            <--- Repo root
    source_py2/
        python_toolbox/
            __init__.py
            (All the source files, in their Python 2 version.)
    source_py3/
        python_toolbox/
            __init__.py
            (All the source files, in their Python 3 version.)
    setup.py
    README.markdown
    (All the usual files...)
My setup.py file contains this simple snippet:
if sys.version_info[0] == 3:
    source_folder = 'source_py3'
else:
    source_folder = 'source_py2'
Then, the rest of the code in setup.py refers to source_folder instead of a hardcoded folder. This way a Python 2 user gets the Python 2 version installed, while a Python 3 user gets the Python 3 version installed. So far so good.
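For reference, here's a hedged sketch of how the rest of setup.py can consume source_folder (the real file carries more metadata than this):

import sys
import setuptools

if sys.version_info[0] == 3:
    source_folder = 'source_py3'
else:
    source_folder = 'source_py2'

setuptools.setup(
    name='python_toolbox',
    # Map the root of the package namespace to the version-appropriate
    # folder, so `import python_toolbox` picks up the right code:
    package_dir={'': source_folder},
    packages=setuptools.find_packages(source_folder),
)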
Now you’re asking, how do you deal with the in-repo merge problem?
How to deal with merges
First, before making the split to support Python 3, ensure that you’re starting from a commit where all the code works great and the test suite passes. Then, use 2to3 just one time to create a copy of your code that supports Python 3. Put that in source_py3, and put the original code in source_py2. Debug the test suite on the Python 3 version and edit it until all the tests pass. Fix your setup.py files to take the correct source folder using the snippet I gave above, and confirm that it works by creating a source distribution and installing it on empty virtualenvs of both Python 2 and Python 3.
So far so good; you now have a working version of your code that works for both Python versions. What you do at this point is create a Git branch called parity pointing to this commit. You push it to your Git remote, of course. You make the following rule, either with yourself in case of a single developer or with your fellow developers: You merge code to parity only if the Python 2 codebase and the Python 3 codebase are equivalent. Equivalent means that if a feature has been implemented in one, it was merged (more about how later) to the other. If a bug was fixed in one codebase, it was merged to the other. Never let anyone push code to the parity branch if that code doesn’t have parity between Python versions.
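In plain Git, setting that branch up is just something like:

$ git branch parity
$ git push --set-upstream origin parity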
Now, how do you actually do the merge? Say that on your development branch you've developed a new feature in the Python 3 codebase, and you want to merge it into the Python 2 version. (If you want to go the other way, just flip 2 and 3 in the explanation below.) What you do is this: first, ensure that you've committed your change. Then, create a local clone of your Git repo with the parity branch checked out. (Do a git pull to be sure you have the latest version.) Fire up your merge program and do the following three-way folder merge:
Set the first column to the source_py3 folder in the clone, which has the parity branch checked out, without your new feature.
Set the second column to the source_py3 folder in the original git repo, which has the development branch checked out, and does include your new feature.
Set the third column to the source_py2 folder in the original git repo, which has the development branch checked out, but does not include your new feature because it’s the Python 2 folder.
The merge you're doing can be verbally described as: "Take the difference between the old Python 3 codebase and the new Python 3 codebase, and apply it to the Python 2 codebase." In practice, it's done like so: go over the list of files, looking for files that changed between column 1 and column 2. Open each such file for file merging. Your merge program will show the 3 different versions of the file, with the differences between each two columns clearly marked. Put the caret on the middle column and page through the differences. (Preferably using a keyboard shortcut like alt-down; consult your merge program's documentation.)
As you go down the file, you'll see three kinds of differences: between columns 1 and 2, between columns 2 and 3, and between all three columns.
When you see differences between column 1 and 2, merge that snippet from column 2 to column 3, probably by using a keyboard shortcut like ctrl-right. (This takes new code from the Python 3 codebase and copies it over to the Python 2 version.) Do take a brief look at the code you’re merging to ensure it’s Python 2 compatible.
When you see differences between column 2 and 3, ignore them and move on. These are existing differences between your two codebases which you’ve already approved before.
When you see differences between all three columns, it’s time to wake up from your merge-induced coma. You’ve hit upon a sensitive line, which is different in Python 2 and Python 3, and was modified. You’ll probably want to manually edit the Python 2 version to add the same functionality in a Python 2 compatible way.
Keep going over all files like that, until you’ve finished with all of them. Save all the files. Then run the test suite on both Python versions, and if there are any bugs, fix them until the suite passes.
Congratulations! You’ve achieved parity again. Commit your changes and push them to the parity branch. If you wish to make a PyPI release at this point, you’re good to go and your code will work on both Python versions.
You don’t have to do this process on every feature; you can do it once in a while, or every time before you merge changes to master.
Notes:
You can also create branch-specific parity branches; for example, if you have a fix-foo-bug branch, you can create a temporary fix-foo-bug-parity branch to use as its parity branch, so you won't have to use the same parity branch for all branches.
If you’re using an IDE, it’s recommended you create two separate IDE projects, one for each Python version, and in each one exclude the files belonging to the other Python version. That way you’ll be sure you’re never editing files of the wrong Python version.
---
That’s it. The process is a bit complex, but in my opinion the results are worth it; you have 2 completely separate codebases, you don’t depend on either code generation or compatibility libraries, and you can enjoy writing Python 3 idiomatic code on the Python 3 codebase.