Ionrock Dot Org

by Eric Larson

My Weblog

Rewriting Code

We have a large project that is about to go through a pretty large change. There will be a new incarnation of the project that hopefully sets the bar for the future. Seeing as that is a big, lofty goal and the real world is rarely big and lofty, it seemed like a good idea to write down some specific reasons we need to make such major changes.

I’m not going to deny that the reason for this explanation is to help me feel better about effectively rewriting our application. After all, rewriting software is a really bad idea. I heartily agree that a big rewrite is rarely going to solve problems. But in this case the goal is to improve the system rather than to fix bugs. How is that different? When you write software you have bugs, and every bug fix works within the system’s existing assumptions. Take persistence for example. You might be getting errors every so often because the database layer is a bottleneck. Fixing the bug might mean adding some caching or simply throwing hardware at the issue. Improving the system, on the other hand, could mean a completely different design of the data such that writes can be made incrementally and later compiled into a complete object. The difference is that you’ve started changing the assumptions, and in doing so opened up a different set of opportunities that could be the difference between constant frustration and actually getting new features.
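To make the "incremental writes, compiled later" idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `IncrementalStore` class, its method names, and the sample fields are invented for this example and are not taken from the actual system.

```python
class IncrementalStore:
    """Append-only log of partial writes, compiled into full objects on demand.

    Hypothetical example class; not part of any real code base.
    """

    def __init__(self):
        # object id -> ordered list of partial updates (dicts of changed fields)
        self._log = {}

    def write(self, object_id, **fields):
        # A write only appends; it never reads or rewrites the whole object.
        self._log.setdefault(object_id, []).append(dict(fields))

    def compile(self, object_id):
        # Fold the partial updates in order; later writes win.
        compiled = {}
        for partial in self._log.get(object_id, []):
            compiled.update(partial)
        return compiled


store = IncrementalStore()
store.write("response-1", name="Ada")
store.write("response-1", answer="yes")
store.write("response-1", answer="no")  # a later write overrides an earlier one
print(store.compile("response-1"))  # {'name': 'Ada', 'answer': 'no'}
```

The point of the design is that each write is cheap and independent, and the cost of assembling the complete object is paid only when someone asks for it.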

Our code base, while very successful, has begun to show these sorts of systemic issues that will prevent us from expanding far beyond where we are currently. I say “far” beyond because that is the real goal. We want to handle 1000x the load with 1/10th the hardware, and that can’t happen given the current system assumptions. Likewise, if we continue to focus only on fixing our bugs, we’ll never be able to radically change the system.

The biggest systemic issue that we have is speed. We need to be faster. Our response times need to be much lower and our ability to develop exciting features needs to be faster as well. The current assumption is that there is one data store. The single data store implies that you write to one “place” and read from the same “place”. As we’ve grown, we have come to realize that we don’t have to read from and write to the same place. There can be an intermediary. Along similar lines, we don’t read everything in the same fashion, and the vast majority of data is in fact read only. Our persistence goals, then, are to make sure our updates are fast and don’t impact our read performance. The trade-off is that much of our data will be slightly more stale than before.
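One way to picture separate read and write "places" with slightly stale reads is the sketch below. It is a toy model under assumptions, not the actual design: `SplitStore`, `publish`, and the in-memory dictionaries are all hypothetical stand-ins for real storage.

```python
import copy


class SplitStore:
    """Writes go to a primary store; reads come from a published snapshot."""

    def __init__(self):
        self._primary = {}   # receives every write
        self._snapshot = {}  # serves every read; may lag behind the primary

    def write(self, key, value):
        # Writes never touch the snapshot, so they cannot slow readers down.
        self._primary[key] = value

    def read(self, key, default=None):
        # Reads never touch the primary, so they may return stale data.
        return self._snapshot.get(key, default)

    def publish(self):
        # The intermediary step: copy the primary state to the read side.
        self._snapshot = copy.deepcopy(self._primary)


store = SplitStore()
store.write("survey-1", {"title": "Politics"})
print(store.read("survey-1"))  # None: the write has not been published yet
store.publish()
print(store.read("survey-1"))  # {'title': 'Politics'}
```

How often `publish` runs is what sets how stale reads are allowed to be.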

Another speed issue deals with development speed. There are endless ways to ask a question, but right now it is not simple to build a new way of asking questions quickly. This problem becomes more complex as we enter different platforms (mobile). Here the solution is not as sweeping but simply involves producing some API to our storage that any client can utilize. While this seems simple (GET the question foo, POST the answer bar), in reality there is a huge set of assumptions the system makes throughout that have never been truly codified. By codified I mean they have not been defined in a publishable manner, in addition to lacking consistency throughout the code. This improvement will mean providing a true API that we publish along with tools to make things easier to work with. From there, our hope is that we can have a platform for more customized questions that help us move beyond check boxes and into rich interfaces.
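As a rough sketch of what "GET the question foo, POST the answer bar" might look like once an assumption is written down in one place, consider the toy handler below. Everything in it is hypothetical: the `QuestionApi` class, the `/questions/<name>` and `/answers/<name>` paths, the status codes, and the sample question are assumptions made for illustration, not the published API.

```python
class QuestionApi:
    """Toy in-memory API; resource names and status codes are illustrative."""

    def __init__(self, questions):
        self.questions = questions  # name -> {"text": ..., "choices": [...]}
        self.answers = {}           # name -> list of submitted answers

    def handle(self, method, path, body=None):
        parts = path.strip("/").split("/")
        if len(parts) != 2:
            return 404, {"error": "unknown resource"}
        resource, name = parts
        if resource not in ("questions", "answers") or name not in self.questions:
            return 404, {"error": "unknown resource"}
        if method == "GET" and resource == "questions":
            return 200, self.questions[name]
        if method == "POST" and resource == "answers":
            # A codified assumption: an answer must be one of the published choices.
            if body not in self.questions[name]["choices"]:
                return 400, {"error": "answer is not a valid choice"}
            self.answers.setdefault(name, []).append(body)
            return 201, {"question": name, "answer": body}
        return 405, {"error": "method not allowed"}


api = QuestionApi({"foo": {"text": "Do you like check boxes?", "choices": ["yes", "no"]}})
print(api.handle("GET", "/questions/foo"))
print(api.handle("POST", "/answers/foo", "yes"))
print(api.handle("POST", "/answers/foo", "maybe"))  # rejected with a 400
```

Any client, web or mobile, that speaks to `handle` gets the same validation, which is the consistency the paragraph above is asking for.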

Finally, we need to make our authors faster. I’ve mentioned before that we have a custom language we use for writing questionnaires. These “scripters” as we call them are in fact a mix of programmer, designer, statistician and project manager all rolled into one. As such there is a wide variety of skill levels that need to be supported. In this case our goal will be to extend our scripting environment to better support our more basic users while giving the advanced users tools that can improve everyone’s workflow.

The reality is that while we are effectively rewriting our application, there is a clear direction that we want to go. Our first iterations were naive about the problems that show up at scale. Now we have an opportunity to take on some systemic bottlenecks and hopefully improve things for the foreseeable future. We’re pretty confident that we’ll see a whole new set of problems, but hopefully by that time we will have gained enough understanding that we can make another shift to keep growing.


Posted Wed Jan 5 00:12:16 2011 by Eric Larson

Enjoying Sunny Oceanside

I’m at a meeting for work in Oceanside, CA. We are staying in a beautiful beach house right on the water, talking tech and writing code. Thanks YouGov!


Posted Thu Jan 13 01:07:22 2011 by Eric Larson

Mercurial Workflows

I have an idea for a workflow using mercurial where a simple tool hides things like merging and dealing with multiple heads. I felt pretty good about it until I talked to my manager about it and he mentioned the difficulties of communicating status based on the state of the source repo. Even though I disagree, it is clear that one must be careful not to lose the communicative nature of source control.

My idea for a workflow is based on this video. It is a good idea to go ahead and watch it first before moving on.

The biggest thing to draw from this is the idea that it is not the facilities of your VCS that make development work with complicated code trees, but rather the etiquette prescribed for the team. If you have a vision for the flow of code, it is best to create that system and use tools to support it instead of allowing the tool to decide your version control strategy. The reason is that your etiquette is a protocol with interop between each developer. In other words, it is a UI that you have to live with for a potentially long time.

With that in mind, my specific workflow does cater to ease of use over expertise. My goal is that you avoid running into trouble using this system. Where does trouble come from? In a word, merges. By merges I mean taking one code line and merging it into another. This distinction is important because it differs greatly depending on your tools. The point, though, is that almost all VC systems contain the idea of a branch, and merging means taking one of those branches, no matter what it contains, and merging it into another. The difference in my workflow is that you avoid taking branches and merging them in favor of working with changesets. Abstract thoughts aside, here is my proposal.

You have a “project”. This is the traditional repo for your code. Somewhere there is a canonical version of that code that you do releases from. In my system you have a canonical remote repo and you have a canonical local repo. We’ll call these the remote and local mainlines. The idea is that the remote mainline must always be at a reasonable level of stability. I’m defining “reasonable” here to mean that a developer should be able to clone it and “run” the package successfully on the main branches.

The reasonable level of stability is important because it prescribes a condition such that you shouldn’t be pushing code for others that hasn’t reached a certain level of stability. That doesn’t mean you will never break the build, but it means you understand it is a serious problem when you do. In order to keep this possible problem to a minimum, we utilize a local mainline to stage your pushes.

Traditionally when you have a bug or feature you need to work on you will branch. That will create a copy of your local mainline in a directory called “branches”. You then point your environment at that branch and start working on fixing the bug. The next day you “sync” the branch you are working on. A “sync” is a process that pulls in the stable changes found in the public repo and immediately adds them to your “unstable” feature branch. You are then forced to handle any issues at that time. Likewise, the “sync” process rebases your branch to the new stable mainline. In this way your changes are always going to be easy to apply to mainline.
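The "sync" step can be modeled with plain lists, where each item stands for a changeset. This is a simplified sketch under assumptions: the `sync` function and changeset names are made up, and the model ignores conflicts, which a real rebase would force you to resolve at this point.

```python
def sync(mainline, branch):
    """Return the branch rebased onto the current mainline.

    Both arguments are lists of changeset names, oldest first. The branch is
    assumed to have been cloned from an earlier state of the mainline.
    """
    # Changesets the branch already shares with mainline form a common prefix.
    common = 0
    while (common < min(len(mainline), len(branch))
           and mainline[common] == branch[common]):
        common += 1
    local_changes = branch[common:]
    # Replay the local changes on top of everything now in mainline.
    return list(mainline) + local_changes


mainline = ["a", "b"]
branch = mainline + ["fix-1"]        # branched yesterday, one local change
mainline = mainline + ["c"]          # someone else pushed overnight
branch = sync(mainline, branch)
print(branch)                        # ['a', 'b', 'c', 'fix-1']
```

After each sync the branch is always "mainline plus my changes", which is what keeps the later pull into mainline simple.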

When you finish your feature or bug fix, the next step is to get the changes into mainline. The first step is to apply those changes to the local mainline. This intermediate step is important because it gives you a chance to stage other code that might not be public yet. For example, if you have a bug that requires changes to other parts of the code, you could work on each part and add them to the local mainline as they are ready. When you are all done and everything seems to be working correctly, you can then push your changes to the remote mainline.

Doing this with mercurial, the process looks something like this.


% cd $project
% hg clone . branches/foo
% cd branches/foo
% python setup.py develop # point the virtualenv at the branch package

# work on the code
# ok we're done

% cd ../../ # back to the project dir
% hg pull -u branches/foo # -u updates the working copy so the tests run against the pulled changesets
% run-tests
% hg push ssh://mainline/repo/$project

The essential bit is that you “pull” into the mainline branch. Assuming that the branch and the mainline are in “sync”, that makes sure the branch changesets end up on the top of the mainline stack of changesets. The result is that you appear to have a sequence of changesets that can be viewed atomically. You also have not “merged” anything. It is as if you perfectly applied all your changes with the result being a very simple stack of changes ending up on the source tree.
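The "pull when in sync" property can be shown with the same kind of list model, where each item is a changeset. This is only a sketch: `pull_into_mainline` and the changeset names are hypothetical, and the check stands in for what you would otherwise notice as multiple heads after a real pull.

```python
def pull_into_mainline(mainline, branch):
    """Return the new mainline after pulling a synced branch.

    Both arguments are lists of changeset names, oldest first. If the branch
    does not start with the whole mainline, it is out of sync and pulling it
    would create a second head that needs a merge.
    """
    if branch[:len(mainline)] != mainline:
        raise ValueError("branch is not in sync with mainline; sync first")
    # In sync: the branch changesets simply land on top of the stack.
    return list(branch)


mainline = ["a", "b", "c"]
branch = ["a", "b", "c", "fix-1", "fix-2"]
print(pull_into_mainline(mainline, branch))  # ['a', 'b', 'c', 'fix-1', 'fix-2']
```

The result is one linear stack, with no merge changeset and no second parent to reason about.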

This simple list is advantageous because it removes the complexity of dealing with parents and ancestors. If something has caused a regression, the solution can be to simply pop off changesets until the regression is gone. There is also little confusion about whether a merge pulled in changes that were undesirable.

Going back to the start of this post, my manager’s concern was that by not having the branch in the remote code repo you lose track of what the developers are doing. I do see the benefit of remote branches. If you want to collaborate on something, you just switch to the branch that work is happening on. If you don’t want the changes, then don’t pull or push them to your working branch.

The problem comes when you have to follow that tree of changes. This happens when you have a break in production and you have to ask yourself what the correct baseline is to move back to. This is when trying to understand the parallel lines in your graphical log becomes hopeless. Once you do get a picture of what happened, how do you back it out? Where is the best place to apply the changes? How do they propagate between all those parallel branches? I’m not saying that my workflow is totally correct, but I believe that when it counts the most, simplicity will make life easier.

An answer to the collaboration question is also very possible. Most developers have a desire to pull at the beginning of the day and push at the end. It is trivial, then, to always push to a personal repo or branch. Here is an example of a potential filesystem layout for a suite of projects.


/sourc
├── main
│   ├── project1
│   ├── project2
│   └── project3
└── users
    ├── eric
    │   ├── project2
    │   └── project2-feature-branch
    └── mike
        └── project1

If you want to see what people are working on, then using the idea of the “local mainline” mentioned above, you could keep a copy of it on the server. The point is that the etiquette defined by the protocol is the most important function of the version control workflow, because it is that protocol that guarantees the release process is exactly the same every time and that developers can be confident in their actions using the version control system.

This workflow does make the assumption that it is trivial to point the environment at different branches. If that is difficult then I’d argue that there is something wrong. If you want to make sure deployments are simple and the same across N servers, you need to be sure you can create that environment from scratch at the push of a button. Therefore this workflow makes that assumption on the source code.

Lastly, I’m sure different tools would make the workflow different. My understanding is that git gets branching right such that some of the problems that this workflow solves might not be issues. In my mind it doesn’t matter what tool you are using. All that matters is that when you get a source tree you can easily see the obvious path the code took getting from point a to point b. The analogy is like publishing. You keep drafts private, sometimes you share them with select people and collaborate for different sections, but at some point you publish it and at that point you can’t simply change things. If you ask me, the same technique and process applies to source control.


Posted Sat Jan 29 00:44:02 2011 by Eric Larson


Created using Python, jQuery and Emacs