Thursday, August 06, 2009

least-invasive development improvement

LIDI - Least Invasive Development Improvement (team-oriented)

I'm attempting to coin a term for what I have been trying to do for the last 8 years: maturing a very small development team that supports many projects.

Definitions
A small development team is defined as under 10 people, including UI, server, DB, and internal dev QA.

Many projects is defined as 10-25 active, supported solutions, with more than 50% of them being unique solutions, while the rest may be re-tooled variations of existing solutions.

Small to medium projects are defined (from a LOC standpoint) as between 10k and 500k lines. Most are web-based, but some are thick client. Most are 3-tier/n-tier; some are 2-tier thick client-DB type solutions.

Key Words
*logging
*unit testing
*performance testing and review
*scalability testing and review
*security testing and review
*configuration management
*runtime management
*runtime dependency checking/management (i.e. notification of issues)
****Business problem solved
****Expectations met

Preface
I marked the last two, business problem solved and expectations met, with many asterisks because, as many developers have experienced, doing all the performance/scalability/security testing in the world won't help you if you have to recode/redesign the solution and then redo all that performance/scalability/security testing and review again.

Discussion
When working with a small development team that is already fighting with project priority conflicts, short deadlines, short requirements, and constant support and change-requests, the last thing on your or their mind is ADDING more work.

The term least-invasive is deliberate -- there is no free lunch, and there is no silver bullet. There will be compromises, but if you keep the goal of making each improvement as unobtrusive as possible, and can show reasons and results that are tangible and actually matter, you will mature and progress!

Experiences

Step 1: Baseline

I know your first thought - "I don't have time to come up with baseline metrics, we are already going nuts!" Guess what, I'm *not* talking about baseline metrics! I am talking about getting the development *process* repeatable and stable -- that is your base for everything you do.

*Baseline: Convention

Yes, I borrowed this term from the Maven team. Make sure all your projects follow a similar folder layout, for example all Java code is in /src/main/java, all HTML/JSP is in /src/main/webapp, etc. Get the team to the point where someone can check out a project they have never touched before and know where to go and what to do.
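
For reference, a project following the Maven conventions looks roughly like this (the project name is just a placeholder):

    my-webapp/
      pom.xml                 <- build and dependency descriptor
      src/main/java/          <- Java source
      src/main/resources/     <- classpath resources (log4j.properties, etc.)
      src/main/webapp/        <- HTML/JSP and other web content
      src/test/java/          <- unit tests
      src/test/resources/     <- test-only resources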

*Baseline: Independent builder

Either find a person who will always be an independent builder, or set up some type of continuous integration system. Having someone/something OTHER than the developers do the builds will greatly stabilize the process and document/flush out any outstanding issues you have in the build process. This is a lesson I learned the hard way in my VB6 days, and now that I'm in Java I've chosen http://hudson.net as an independent build tool, while Continuum/CruiseControl are other alternatives.

*Baseline: Build system/dependency management/versioning

I'm sure you just ran into a snag -- with an independent builder, you are learning there are different ways people are building, or worse, they are relying entirely on an IDE for the builds. Moving to an Ant or Maven2 build system in Java, in my case Maven2, helped to ensure that the builds are *consistent* and any gotchas actually get caught EARLIER rather than later. Let me say that again with an example - "This Maven2 project is not building on my desktop, what a piece of crap" actually translates to "the project doesn't build on JDK1.4/JDK5/Windows/Linux, or needs a library I forgot to add; let's fix it now while we're actively on the project instead of when we check it out 6 months later to fix a different issue."

Maven2 also helps with the dependency management problem and the versioning problem. If you are always renaming your jars to mylibrary.jar to include them in your application, and then aren't sure which version that library is after-the-fact while trying to identify an issue, you know the problem.
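
As a minimal sketch, a dependency declared in the pom.xml keeps its version attached to it everywhere (commons-lang 2.4 here is just a stand-in for whatever library you actually use):

    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
      <version>2.4</version>
    </dependency>

The jar lands in the local repository and in your packaged build as commons-lang-2.4.jar, so there is never a question later about which version you shipped.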

*Baseline: Promotion process

This will be the most difficult baseline to adjust - a promotion process. What I mean by this, based on my experience with what seems to be working, is that you develop and deploy to a DEV environment. Work out the kinks as you know them. If you are lucky enough to have an internal QA, have them review it on DEV. Then, when things look o.k., *promote* to a STAGING environment (including a different DB, server, everything). NEVER make custom tweaks on Staging; instead, always modify your promotion, or migration, scripts/process, since those migration scripts/process are exactly what you are also testing and what will be used when you promote to Live. On Staging, you do UAT/Customer Acceptance, have them push it back if needed, make fixes on DEV, then promote back up to Staging for another review. THEN promote to Live.

Step 2: Improve ability to identify and fix basic stuff
What I mean by this is: let the developers use the tools they are already comfortable with. Unit testing. Diagnostic logging (or normal logging if you aren't familiar with the different logging types).

Unit Tests: JUnit is a great tool. NUnit exists for the .NET side. Having a way to test that the code is doing what you want, AND being able to REPEATABLY AND AUTOMATICALLY run those tests, is the goal. This is not integration testing, just basic module/unit testing that the code is behaving as expected against whatever business expectations can be resolved in the code.
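
A minimal sketch of what such a test looks like with JUnit 4 (OrderCalculator and its discount rule are hypothetical, purely for illustration):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class OrderCalculatorTest {

        // Checks one small business expectation: a 100.00 order gets a 10% discount.
        @Test
        public void ordersOverThresholdGetTenPercentDiscount() {
            OrderCalculator calc = new OrderCalculator();   // hypothetical class under test
            assertEquals(90.00, calc.totalWithDiscount(100.00), 0.001);
        }
    }

Because the tests are just code, the build's test phase (and therefore the independent builder) can run the whole suite on every build and fail loudly when something regresses.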

Diagnostic Logging: Making sure your code can log somewhere you can retrieve, and provides useful information for making a correction, is the goal here. "It broke!" -- well, you need to know what caused it to break, and the *quicker* you can do that, the more time you'll have for other things. Rather than re-testing manually with system outs, get your logging taken care of. This will not fix all your issues, but if you can get the easy 80% out of the way, that's huge. In my experience we are still having some challenges, as there are some custom approaches already in place, and people have a hard time breaking out of the sysout habit. I think I'm satisfied with using SLF4J, then just letting the Log4j implementation handle the logging and control the log verbosity (and the formatting of the logs... nothing worse than custom logging with many different output formats; get it consistent!).
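
A minimal sketch against the SLF4J API (InvoiceService and its method are made up for the example; only the Logger calls are the point):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class InvoiceService {

        private static final Logger log = LoggerFactory.getLogger(InvoiceService.class);

        public void process(String invoiceId) {
            // Verbosity and output format come from the Log4j configuration, not from the code.
            log.debug("Processing invoice {}", invoiceId);
            try {
                // ... the actual business logic would live here ...
            } catch (RuntimeException e) {
                // The stack trace goes to whatever appender is configured,
                // so "It broke!" arrives with evidence instead of a shrug.
                log.error("Failed to process invoice " + invoiceId, e);
                throw e;
            }
        }
    }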

Step 3: You're walking, you're walking, let's try jogging.
By this point you should be doing o.k. and taking care of business. Now you should be able to look at some more technical things to improve the development process.

Codability: This is where the static code analysis tools come in, which are, again, not that invasive to use. Whether you run them from a report-generation standpoint or integrated into the IDE, tools like PMD, Checkstyle, and FindBugs have the potential to ferret out potentially poor code. This is no replacement for peer review in any fashion at all; it is just a convenient way to identify common issues (note: these tools are not stone-cold rules, there are times things have to be coded a certain way).
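
To give a feel for the kind of issue these tools typically flag (a contrived example; the class is made up):

    public class StatusCheck {

        public boolean isApproved(String status) {
            // A typical finding: this compares object references, not the text,
            // so it can return false even when status holds the string "APPROVED".
            return status == "APPROVED";
            // The intended, null-safe version would be: return "APPROVED".equals(status);
        }
    }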

Testability: Coverage tools like Clover, Emma, and JCoverage can help on the unit testing side to see whether you can increase the amount of testing, and to catch certain flow changes (if/else/case/etc.) in the code that aren't tested well right now.
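
For example, if the only unit test covers light packages, a coverage report would mark the heavier-package branch below as never executed (ShippingCalculator is hypothetical):

    public class ShippingCalculator {

        public double cost(double weightKg) {
            if (weightKg <= 1.0) {
                return 5.00;                      // exercised by the existing unit test
            } else {
                return 5.00 + weightKg * 2.00;    // a coverage tool highlights this branch as untested
            }
        }
    }

The percentage itself matters less than the concrete list of branches nobody has exercised yet.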

Step 4: The in-deep stuff
Once you have reached step 4, you can look at the items I listed back at the top of this page. Notice that I really didn't hit certain items --

Profiling is actually an invasive process most of the time. I haven't found a tool that can easily identify memory issues or performance bottlenecks *for* you; instead they, rightfully, require you to review the information and come to your own conclusions. Tools like TPTP, JProbe, JProfiler, etc. aren't quick-fix tools; you need to learn and understand them, and each is useful in different scenarios.

Multi-tier profiling/review: Tracking from the web tier, through the server tier (rules/workflow/business logic), to the database tier (SQL/DB, or sproc) to help identify which tier a particular slowdown or issue is coming from. Not something easy to do - some tools, like the deprecated InfraRED and Glassbox, attempted to make this easier for us, but they don't seem to be active anymore.

Integration testing: Actually being able to do business testing across an entire integrated system programmatically, performance testing SOA services, or testing the full cycle of pressing a button in the UI are all desirable goals, but not easy to set up and do.
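
A rough sketch of what the simplest form could look like, driven from JUnit against a deployed DEV environment (the URL and endpoint are invented for illustration):

    import static org.junit.Assert.assertEquals;

    import java.net.HttpURLConnection;
    import java.net.URL;

    import org.junit.Test;

    public class OrderSubmissionIT {

        // Hits the real, deployed stack (web tier through DB) rather than an isolated unit.
        @Test
        public void submittingAnOrderRespondsOk() throws Exception {
            URL url = new URL("http://dev.example.com/orders/submit?item=42");   // hypothetical DEV endpoint
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            assertEquals(200, conn.getResponseCode());
            conn.disconnect();
        }
    }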

Automated UAT/User Interface testing: Some neat tools, like Selenium, can help step through testing a website and ensure things continue to work as expected. It's a great tool, but if you are constantly making changes, keeping those Selenium tests up to date can get time-consuming. Also, be aware that they do NOT identify blemishes or a non-intuitive interface, only that the interface is continuing to work as expected.
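
A minimal sketch using the Selenium RC Java client (the host, page, locators, and expected text are all placeholders for your own application, and it assumes a Selenium RC server running locally):

    import com.thoughtworks.selenium.DefaultSelenium;

    public class LoginSmokeCheck {

        public static void main(String[] args) {
            // Selenium RC server assumed on localhost, default port 4444.
            DefaultSelenium selenium =
                new DefaultSelenium("localhost", 4444, "*firefox", "http://dev.example.com/");
            selenium.start();
            selenium.open("/login");                       // placeholder page
            selenium.type("id=username", "testuser");      // placeholder locators and data
            selenium.type("id=password", "secret");
            selenium.click("id=submit");
            selenium.waitForPageToLoad("30000");
            if (!selenium.isTextPresent("Welcome")) {      // checks behavior only, not look-and-feel
                throw new AssertionError("Login did not behave as expected");
            }
            selenium.stop();
        }
    }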

Scalability testing: testing against the equivalent of 10 years' worth of data, testing 5, 10, 50, 1000 concurrent users, evaluating estimated load capacity per setup (proxy/multiple app servers/single DB, DB clustering, etc.). You can also throw in disaster/recovery as part of the scalability testing as well. These are all very manual, very conscious development efforts, and definitely invasive and time-consuming.


Conclusion
Well, this looks more like a brain-dump than an organized blog post, but sometimes just dropping information can be helpful to other people, and could solicit useful feedback!

Post-Edits:
A good article on a related subject: http://www.ddj.com/architect/184415470