Lately I've been thinking about my experiences as a developer learning new systems. Having done a little job hopping recently, I've found that learning a new system raises many issues, especially when the system is written in a dynamic language. It started with learning an XSLT-based system (with some C#), then Ruby/Rails, and now Python. Before this, I worked for a company that had a relatively standard C# desktop application. Thinking back, I found it relatively easy to start coding on the C# application. I would find where the issue was happening and work my way outward until I found where a fix should go. I had an IDE to help me along the way and a rather specific stack trace to follow. The language itself was the biggest frustration: I felt I constantly had to read documentation, and simple operations took a great deal of work. Python, on the other hand, has been the least of my issues in learning this new system.
The hardest aspect is seeing how the different objects and data interact within the scope of the system. It is a difficult detail to verbalize because in so many ways the system seems clear. The problem is that with language features such as duck typing and, in our case, the use of exec, following the interactions becomes complicated. Also, at my current job, we have created specific languages for our users, which adds a further layer of abstraction: the user's language is translated into Python and then executed. While learning the system is not insurmountable, it could feasibly be too difficult without an original author present to discuss the details. This is not because the code has issues, but simply because following the flow of data and the execution stack is so difficult.
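To make the problem concrete, here is a minimal sketch of how duck typing and exec together hide the flow of a call. Every name in it (`CsvReport`, `HtmlReport`, `publish`, `user_script`) is made up for illustration and is not taken from the system I work on; the one-line script only stands in for whatever a user-facing language might be translated into.

```python
# Hypothetical example: none of these names come from a real code base.

class CsvReport:
    def render(self):
        return "csv output"


class HtmlReport:
    def render(self):
        return "html output"


def publish(report):
    # Duck typing: anything with a render() method is accepted, so the
    # signature alone does not say which classes can arrive here.
    return report.render()


# Stand-in for a user-facing language after it has been translated to Python.
user_script = "result = publish(HtmlReport())"

# The script runs in its own namespace, so a text search for callers of
# publish() in the .py files would never find this call site.
namespace = {"publish": publish, "CsvReport": CsvReport, "HtmlReport": HtmlReport}
exec(user_script, namespace)
result = namespace["result"]
```

Nothing here is wrong or even unusual, yet a newcomer reading `publish` has no static way to learn which objects reach it or where the call comes from.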
There is no obvious solution to this problem, but there are definitely ways to work around it. When I wrote C#, I was able to follow the stack rather easily using Visual Studio. While this seems like a huge win, that stack could be enormous, and wherever a branch could have occurred, things became complicated. In Python, the stack has often been very small and easy to manage; the difficulty is in finding where functions are defined and when they get called. For me, etags has been a very helpful way to find the code I'm looking for, and it also helps a great deal in understanding the code base. In fact, Emacs' rgrep and tag search have made the mechanical challenge of finding code very manageable.
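Tags and grep work on the source text. As a complement, Python can answer the same two questions at runtime. The sketch below is only one possible approach, not something from the system I described: `describe` and `log_caller` are hypothetical helpers, `save` and `import_batch` are placeholder functions, and the caller lookup assumes an interpreter that supports frame inspection, as CPython does.

```python
import functools
import inspect


def describe(func):
    """Report where a function was defined, using its own metadata."""
    target = inspect.unwrap(func)  # look through decorators that use functools.wraps
    code = target.__code__
    return {
        "module": target.__module__,
        "name": target.__qualname__,
        "file": code.co_filename,
        "line": code.co_firstlineno,
    }


def log_caller(func):
    """Record the name of every function that calls func."""
    callers = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # f_back is the frame of whoever called the wrapper.
        caller = inspect.currentframe().f_back
        callers.append(caller.f_code.co_name)
        return func(*args, **kwargs)

    wrapper.callers = callers
    return wrapper


@log_caller
def save(record):
    return f"saved {record}"


def import_batch():
    return save("row 1")


import_batch()
where = describe(save)
```

After this runs, `save.callers` holds the names of the functions that called `save`, and `where` says which file and line define it, which is handy when the call arrives through a layer that grep cannot see.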
Beyond finding functions and classes, the hardest thing is finding couplings. When the original authors look at the code, they see where aspects overlap and interact. Learning to find those relationships is a matter of communicating what variable names actually mean and what sort of patterns you might find within the design. What I mean is that certain objects are often expected to be used with specific development patterns. This is not always easy to spot because the patterns might rely on subtle requirements that developed over time, so that to the untrained eye the code appears to be potentially unoptimized. These kinds of issues are largely a communication problem, and as such there are some obvious options for solving them.
Communication is never easy, so there isn't an easy fix. Some obvious things to consider are comments and documentation. While these help, they can also make the code harder to maintain. As soon as a comment is out of date, it runs the risk of leading readers in the wrong direction. A better option is a simple glossary of variable names. This gives you a simple lookup file that can be queried for potential object types based on the variable name. It can also help in specifying a variable naming scheme that keeps readability while supporting discoverability. I'm sure there are many other ways to improve the learnability of large Python applications, so please leave a comment with ideas.
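For what it's worth, here is a rough sketch of what the variable glossary could look like. The entries and the `lookup` helper are hypothetical, and splitting names on underscores is just one assumed naming scheme; in practice the dictionary would probably be loaded from a small file kept next to the code.

```python
# Hypothetical glossary: the names and types below are placeholders.
GLOSSARY = {
    "acct": "Account",
    "txn": "Transaction",
    "rows": "list[dict]",
}


def lookup(variable_name, glossary=GLOSSARY):
    """Return the glossary types matching any part of a variable name."""
    # Assumes snake_case names, so "pending_txn" is checked as "pending" and "txn".
    parts = variable_name.lower().split("_")
    return [glossary[part] for part in parts if part in glossary]
```

With these entries, `lookup("pending_txn")` returns `["Transaction"]` and an unknown name such as `"count"` returns an empty list, which is itself a hint that the name does not follow the scheme.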