Fooled by Ants
I was watching one of those philosophical talks about modern AI when I first heard of the ants that fooled scientists for over ten years. The story reminded me of something I couldn’t put my finger on for a while. Eventually, I saw the parallel: managing software projects by people who have little to no knowledge of computer science and programming. They are like researchers or explorers who watch ant trails and blind themselves with the idea of the intelligent ant. Oh! I forgot to tell you the story!
In the experiment, a territory was split into two parts linked together with a string. If you, as an ant, pull the string on one side, the other side briefly opens a honey depot. However, you, as an ant, get no feedback about this event: the depot is too far away from you. You can learn literally nothing from the act. In reality, though, the ants broke themselves up into units, and while one group carried the honey to a new home, another group twitched the switch like mad. These observations led scientists to conclude that ants are smart enough to communicate such a complex environmental setup to each other. This was wrong. For ten years*.
As an ant researcher, you are mainly interested in the ant’s mind. You are trying to explain how a creature with a handful of neurons is capable of demonstrating such complex, human-like interactions. You could even spend years with your false beliefs and base a theory on them. Later on, you might find out that your assumption was wrong, and that’s fine: this is how science works. However, one may decide to apply the same strategy to managing a software project. You may focus on the seemingly intelligent and purposeful behavior of your colleagues and rely upon this demonstration when making decisions.
If one, as a project manager or a software developer, does not look at the project itself (the computer program and its surroundings), does not measure its internal state and quality, but only trusts the demonstration, how could that help anyone manage and lead the project in the right direction? Aren’t they running a death mill rather than an anthill (or, better put, a death march)? It is still hard to find reliable metrics and means to measure the internal quality of software written in any language on any platform. Nevertheless, it is possible to shift your mind toward more objective and accurate thinking about what your project actually allows you to do.
Trust is important. We only need to remember that miscommunication can easily turn a friendly environment into an unhealthy one.
* The answer, by the way, is quite simple. There are several groups in an ant colony. One group just loves to twitch things, so its members pull strings everywhere while exploring an area. Another group does what it has to do whenever it meets any food. This video explains how their communication language allows them to solve complex tasks together.
Miscommunication
[Embedded YouTube video]
Sometimes miscommunication can be a very funny thing to watch. When you’re building software, however, it might cost you dearly.
Customer: We need to finish the refactoring for a new version and new features.
Me: How have you estimated the ROI of the refactoring?
C: We were not able to implement new features with the current database structure. We have to adapt it to meet the requirements.
M: Aha, I see the point!
Actually, what I saw was not the reality: there were two distinct applications, v1 and vNext, which had very little in common, except the stakeholders’ confidence that the apps were... one app, refactored. If you’re an engineer, you expect that someone who uses technical jargon ("refactoring") knows what he’s saying. Well, it ought to be like that.
The idea of refactoring requires one to understand that software has an external and an internal side. Imagine a geeky lady who built a robot but later found out that each time it falls down the stairs it smashes its head. The head, however, contains a very expensive part, which leads our smart lady to the idea of moving this part down to the robot’s bottom. From the external point of view nothing changed: the robot still walks, falls down, and asks you to bite his shiny metal part. The internal composition has changed, though.
When one does code refactoring (or database refactoring), she basically operates on internal things without changing the external behavior that the people using the software see. That means that in order to make any change robust, you have to have a way to test your change. By placing the part in the robot’s butt you may lose his ability to balance. In the software world, you use all kinds of tests (human-made or automated; functional or non-functional; database, UI, end-to-end, acceptance, unit, smoke, performance, etc.). You have to decide on and find your own way of making things work again in the manner the system’s users expect and in conformity with your SLAs.
From another point of view, refactoring is a way of paying down technical debt without rewriting (remaking) the entire application (the lady’s robot) from the ground up. As an engineer, you solve a problem by expressing it in a programming language, both to a computer and to your colleagues. Sometimes you don’t have a clear understanding of the domain you’re working in, but you understand that you need to push the software out of the door in order to get feedback and knowledge from the environment in which it has to operate. Such half-baked decisions are a source of debt that you have to pay later. You may postpone the switch-head-with-butt operation on your robot for some time, but you know that at some point you will pay for that delay in full.
Refactoring is also a tool that helps you fight cruft. This concept is related to technical debt, and the two terms are often used interchangeably. Static code analysis will help you measure this kind of debt. For now, you just need to get a feel for what cruft actually is and how refactoring helps (by the way, it can be too late to bring in any order or structure).

As you can imagine, working with such a code base might be a very, very hard (or even impossible) mental exercise for a human being. Every project has to watch out for a hypothetical tipping point after which no further movement is possible, whether to add a new feature or to fix an old bug. The only exit would be to rewrite the whole system from scratch, but the industry has already learned this lesson: such efforts often do not succeed. At the moment, tipping-point hunting is more an art than a science.
Time to first commit as a measure of technical debt
If you have a working application, then, in an ideal world, the plan for your first fix or feature might look like this:
Grab the code from source control system;
Spin up the application in your local development environment;
Understand the fix/feature requirements and their owner’s intention;
Get things done and verified;
Push your changes to the person responsible for code integration and deployment.
There is, however, one more bullet for the list above, one that spreads across all the others: understand your colleagues’ ideas behind the solution and infrastructure implementation. In order to get up to speed with the solution structure and find the right places for enhancements or updates, one has to understand the big picture, which includes hard-to-change architectural decisions, the design principles and methodologies used, the pattern language and its concrete implementation, success-critical algorithms and data structures, and many more things at different levels of a computer program.
Good design and proper abstractions allow you to meet change requests relatively fast, as in the canonical "change this button’s color to blue". A rigid design may leave you stuck with nothing. This rigidity, by the way, could serve someone’s purpose. A non-removable toothbrush battery, or a system known only to a small group of people who want to benefit from this knowledge, are examples of that intention.
Sometimes a domain (the sphere of knowledge in which the business operates) is itself so complex, huge, or still in flux that you need quite a sophisticated organization-wide framework to help you and your company continuously encode rules within software and keep them evolvable over time. If so, as a software engineer, you need to understand both the technical jargon (The-Pipe-Of-Abstract-Singleton-Factories-With-Composite-Filter-Builder-Strategies) and the language specific to your company’s profile (physics, banking, medicine, sport, software).
If this complexity (essential and accidental) was not handled properly by the software developers, the time to the first code commit might be unexpectedly long, especially if you are dealing with a Big Ball of Mud project. Your goal is to identify and address the sources of such complexity, be it feature creep, imperfect software design and development skills, or the company’s need to survive.
In conclusion: you can control things if you know what to measure and how to do it. Just be sure that miscommunication problems are solvable in your surrounding culture.
Exploring Git History
Here are some useful tips that let me read between the lines of a freshly cloned repository.
git branch --all --verbose
Lists all the available branches with their most recent commit messages. Your main goal is to identify the branching strategy. Meditate a bit on these names. Try to see the patterns.
You can go deeper and see the total number of commits on each branch.
git rev-list --count origin/master
Find out the number of merge commits (in our example, about 25% of the history turned out to be merge commits).
git rev-list --count --no-merges origin/master
A higher percentage probably points at some auto-merge policy in use (for example, a pull-request contribution style).
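If it helps, the two counts can be combined into a percentage with a couple of lines of shell. This is just a sketch; origin/master is an example ref, so substitute your own mainline:

```shell
# Sketch: share of merge commits on a branch; "origin/master" is an example ref.
ref=origin/master
total=$(git rev-list --count "$ref")
no_merges=$(git rev-list --count --no-merges "$ref")
merges=$((total - no_merges))
echo "$merges of $total commits are merges ($((100 * merges / total))%)"
```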
Using a simple shell script and the useful git for-each-ref command, it is possible to iterate through all the branches.
git for-each-ref --sort=-committerdate refs/remotes/ --shell --format='%(refname:short)' | xargs -I REF sh -c 'printf "%-30s %10s %10s\n" "REF" `git rev-list --count REF` `git rev-list --count --no-merges REF`'
Yet another goody shows the last commit dates and author names, so one can spot outdated branches.
git for-each-ref --sort=-committerdate refs/remotes --format='%(align:35)%(refname:short)%(end) %(committerdate:short) %(align:20)%(authorname)%(end) %(objectname:short)'
To get the amount of work done per repository contributor on a single branch, use the git shortlog -sn command, either with the --no-merges hint or without it.
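For instance (again, origin/master is only an example ref):

```shell
git shortlog -sn origin/master              # commit count per author, merges included
git shortlog -sn --no-merges origin/master  # the same, with merge commits excluded
```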
Let’s get more from history. Let’s find the most frequently updated files in the branch. Be sure to play with the --since= option of the git log command if you have a huge history and/or are interested only in recent changes.
git log --pretty=format: --name-only | sort | uniq -c | sort -r | head -30
This list sometimes gives you insights about what matters most for your company/project, because this is where the domain terminology first comes up. These files also help you see developers’ habits and possible design issues. Look at the first line with 154 changes. Maybe it’s the Fat Controller anti-pattern (a sort of God Object); that might suggest a short OOAD training for the team would be a good investment. This fatty file living in the repository could, of course, be fine: maybe you’re just looking at a throwaway prototype or auto-generated code. A simple review would not take much time, but you’ll become a more informed and prepared person.
Another way to reveal a team’s habits is reading a single file’s history. Take one from the list above and see how it evolved and why. A clean history will help you understand the direction and purpose of any code change. A dirty, useless history may signal that requirement change requests and code changes were orbiting different stars.
You can start by making an alias for:
git log --all --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative <filename>
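The alias itself could be registered like this (the name filelog is my invention, pick any you like):

```shell
# Register the log format above under a short alias; "filelog" is a made-up name.
git config --global alias.filelog "log --all --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative"
# Usage: git filelog <filename>
```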
However, there are a number of tools that you can fall in love with.
On a Windows machine I prefer TortoiseGit. It can be run from the command line for any path via TortoiseGitProc.exe, which can be wrapped in a shortcut like tgit. Do not bypass the statistics carefully collected by TortoiseGit.
Tig is another wonderful application that does all these things in pure text mode. An amazing git helper for shell lovers.
For macOS users, there is GitUp, a very simple yet powerful repository visualizer. If you want to understand your branching issues, this is the best choice in my opinion. As if from a bird’s-eye view, you’ll see your project’s highway interchange.
These small steps at the very beginning still serve me well whenever I meet a new project with its own history. It’s worth it.
What is a Branching Strategy?
We should start somewhere, so let me begin with an overview of the most popular branching strategies, or models. This will help us define a basic vocabulary in order to draw a context around the opening question. First, we need a working definition, and a very nice one was given by Derick Bailey:
A branching strategy is nothing more than an understanding of when you should branch your code and when you should merge that branch, where.
Branching for entertainment
Stable Trunk
You work on a release branch, and when your acceptance criteria have been met, you merge the updates to the mainline. Depending on the type of software you are working on (web application, desktop, embedded, or distributed systems) and your release plans, you may then publish/deploy a new version.

https://paulhammant.com/
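A minimal command-line sketch of this flow; the branch and tag names are examples of my own:

```shell
git checkout -b release/1.2 master   # cut a release branch from the mainline
# ...commits land on release/1.2 until the acceptance criteria are met...
git checkout master
git merge --no-ff release/1.2        # merge the accepted updates to the mainline
git tag v1.2.0                       # and publish/deploy the new version
```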
Integration Branch
As the name implies, this kind of strategy encourages all sorts of branching. But this model sits furthest from the ideas of Continuous Integration (and, as some of you noticed, Continuous Delivery/Deployment too). I don’t say that it is impossible to have a good CI solution with many branches. I’m saying that here you just postpone the real integration by working on a separate branch and delaying your feedback loop.
This model is almost inevitable if you have several teams with their own release schedules. Component-based development would probably be a better choice in this case.

https://www.slideshare.net/
Unstable Trunk or Trunk-based development
Day-to-day commits go to the trunk branch. When you’re ready to release, just cut a branch. Simple. Straightforward. CI-compatible. But it needs an experienced person who is able to “see” when commits need to be cherry-picked to or from a release branch.

https://paulhammant.com/
Promiscuous Integration
This name was proposed by Martin Fowler:
With this approach they only push to the mainline at the end, as before. But they merge frequently with each other, so this avoids the Big Scary Merge. (…) So is this more ad-hoc integration a form of CI or a different animal entirely? I think it is a different animal, again a key point of CI is everyone integrates to the mainline every day. Integrating across feature branches, which I shall call promiscuous integration (PI), doesn’t involve or even need a mainline.
https://martinfowler.com/
GitFlow
People often tend to believe that they are using something like GitFlow (a.k.a. “A successful Git branching model”). But in the real world of extreme situations, they might violate the rules implied by this strategy, and in the end they could have something ugly.
https://continuousdelivery.com/
GitFlow has many nuances and demands Spartan discipline from a team.
Moar!
Feature Branching
Branch by Team
Branch by Sprint
Branch by Abstraction
Let’s take one step back
Why do we need branching at all? Karl Bielefeldt says:
Unless you are all working out of the same working tree, you are using branches, whether you call them that or not.
So, see in branches the potentially releasable versions of your program. While a team works on a project, each member has been working on his own version. And the longer the work goes on, the more different versions they all will have. “It works on my machine!” Have you heard that before?
However, only one single version can run at a time, and that’s the essence of Continuous Integration: a single version integrating all the changes from all the sources, which can be verified for compliance with any requirements and be ready for release. One of the key characteristics of a good CI solution is a short feedback loop. You make a commit, the build server runs its routines, and voilà: you receive a report about what was done and where you screwed up.
Again, even if you try to make the world better with a super-refactoring in a feature branch, that brave new world does not stand still, and your chances of meeting the Big Scary Merge increase hugely.
There are several other reasons why teams may choose to branch their code.
Physical: branching of the system’s physical configuration — branches are created for files, components, and subsystems.
Functional: branching of the system’s functional configuration — branches are created for features, logical changes, both bugfixes and enhancements, and other significant units of deliverable functionality (e.g., patches, releases, and products).
Environmental: branching of the system’s operating environment — branches are created for various aspects of the build and runtime platforms (compilers, windowing systems, libraries, hardware, operating systems, etc.) and/or for the entire platform.
Organizational: branching of the team’s work efforts — branches are created for activities/tasks, subprojects, roles, and groups.
Procedural: branching of the team’s work behaviors — branches are created to support various policies, processes, and states.
But a bit earlier, the same authors of the Continuous Delivery book, Jez Humble and David Farley, wrote that only three of the above are worth it:
There are three good reasons to branch your code. First, a branch can be created for releasing a new version of your application. This allows developers to continue working on new features without affecting the stable public release. When bugs are found, they are first fixed in the relevant public release branch, and then the changes are applied to the mainline. Release branches are never merged back to mainline. Second, when you need to spike out a new feature or a refactoring; the spike branch gets thrown away and is never merged. Finally, it is acceptable to create a short-lived branch when you need to make a large change to the application that is not possible with any of the methods described in the last chapter — an extremely rare scenario if your codebase is well structured. The sole aim of this branch is to get the codebase to a state where further change can be made either incrementally or through branch by abstraction.
Avoid Continuous Disintegration
We have different flavors of continuous integration practices, starting from CI itself, going through Continuous Delivery, and ending with Continuous Deployment.
All of them are united by the idea of a multi-level feedback loop that gives a human, in a relatively short time, observations about the work they have just done. TDD (some call it Design by Testing) lets you see how well your programming language can express your design ideas. A unit test gives you the understanding that your basic algorithm or happy/sad paths are valid. Integration testing brings you the ability to see how groups of classes, modules, or components interact with each other, or how well third parties (a database or an email SaaS, for example) are integrated with your application.
On a higher level, you can check code quality using various static code analysis tools. Functional application quality can be verified by running packs of automated regression and acceptance tests. Another level is performance, load, and stress tests, which also give you good information about different sides and slices of your project.
To put the model simply: you make a modification to your software (no matter whether application code, database schema, or environment configuration) and publish your update to the real working directory, while the CI/CD mechanism automatically ensures that all the things are still stable.
Each time you create a branch, you step away from CI. Moreover, the longer the branch lives, the greater the likelihood of subsequent integration errors (unless, of course, you know what you’re doing). So ask yourself: Do you need that feature branch? How long is it going to live? What is the cost of merging? Do all the team members know what your branch was created for? Do you have all the tests that reassure you that all existing functionality still works fine and as expected? Do you and all of your colleagues really understand your branching strategy?
Branching is not the problem, merging is the problem. — Jez Humble
Life without branching
A reasonable question: if branching is a potential step back from continuous stability and integrity, how can one deal with releases when an epic-like task or a huge refactoring is on the road?
There are four strategies to employ in order to keep your application releasable in the face of change:
Hide new functionality until it is finished.
Make all changes incrementally as a series of small changes, each of which is releasable.
Use branch by abstraction to make large-scale changes to the codebase.
Use components to decouple parts of your application that change at different rates.
The first one is clear: just use application settings and condition variables in your code, or feature-toggling frameworks. The second option is about more careful release planning, backlog grooming, or a deep understanding of how to split the unsplittable. The third is a well-known pattern proposed by Paul Hammant. And the last bullet requires good knowledge of your domain and of the change rates of the individual modules of the designed system.
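As an illustration of the first strategy, a toggle can be as plain as a condition on a setting. A minimal sketch, assuming a made-up NEW_CHECKOUT flag read from the environment:

```shell
#!/bin/sh
# Minimal feature-toggle sketch: the unfinished flow stays hidden
# until the (hypothetical) NEW_CHECKOUT flag is switched on.
if [ "${NEW_CHECKOUT:-off}" = "on" ]; then
  echo "running the new, unfinished checkout flow"
else
  echo "running the stable checkout flow"
fi
```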
Further reading
Git for Teams by Emma Jane Hogbin Westby
Trunk Based Development by Paul Hammant
Merge Strategies in Atlassian’s “Code Reviews vs. Pull Requests” slides
Oleg Ignatenko’s amazing research
Visualizations of Continuous Delivery
Why should I use version control?
Branch-Per-Feature Source Control series by Derick Bailey