Yesterday, at dtrace.conf, Jarod Jenson gave a presentation on why he thinks dtrace has not seen greater adoption by system administrators across the spectrum of varied IT departments around the world. (Jarod starts his presentation at the 40-minute mark.) At the beginning of his talk, Jarod mentioned he didn’t think dtrace’s syntax was a problem, and I largely agree with that. Like any language syntax, dtrace’s becomes more familiar the more you use it. I believe Jarod hinted at the correct answer a little later in his presentation, but in my opinion he missed the mark a bit.
At my last job, I worked as the manager of a team of Solaris administrators for a large public university in Texas. This was my first exposure to a dtrace-enabled operating system. At the beginning of my tenure, the shop ran mostly Solaris 9 systems with one or two Solaris 10 boxes hanging around looking cool. By the time I left the university five years later, we had migrated nearly our entire catalog of supported applications into zones running on Solaris 10 with ZFS backing the whole enchilada. During this migration, my cohorts and I rarely practiced the dark arts of dtrace. Only when things went really wonky did we start writing D to figure out why.
Herein lies the mark I think we all missed while discussing Jarod’s points yesterday. First and foremost, yes, dtrace is difficult to learn, but so is system administration, and that hasn’t stopped a lot of smart people from doing it every day. Clearly, being difficult is not an insurmountable barrier. There is a second issue which I believe is the real deal breaker for dtrace adoption: dtrace isn’t truly needed that often. I mean, “hard to learn” combined with “don’t need it that often” is a hard sell to any resource-constrained system administrator. My team at the university and I fell back on dtrace only when all other tools failed to do the job. In our view, the vast majority of system administration problems can be solved with well-worn tools like iostat, prstat, mpstat, snoop, vmstat, mdb, etc.
I believe it was Bryan who mentioned that you become a dtrace convert once it pulls your ass out of a fire once or twice. That was certainly the case for me. I solved some rather wonky problems and got to be the hero a couple of times. Thus, I fell in love with all the power dtrace affords and became a convert. Behind the scenes, however, I had to work damn hard to earn a “pulls asses out of fires with dtrace” achievement. The majority of the time I spent earning the achievement was at home and not during work hours. Things like LDAP clusters that need attending and new Oracle DB instance builds constantly get in the way of learning.
My dtrace abilities are founded on spending a lot of time with my nose stuck in Richard McDougall and Jim Mauro’s Solaris Internals book. Serious dtracing requires serious understanding of the OS you are instrumenting. I don’t think many professional administrators in companies that are just trying to keep the lights on have either the time or the inclination to climb the wall of dtrace, since they can solve the vast majority of their system problems with well-worn tools (or hire Jarod). Quite frankly, I don’t believe I’ve ever met another administrator who was interested in learning dtrace and didn’t first have an obsessive devotion to computing that started at a young age. This describes me and probably most of the other folks at dtrace.conf yesterday.
(An entertaining thought entered my head shortly after I wrote this post. If the Oracle guys get their Linux dtrace port working well, we may see a significant uptick in dtrace adoption, because the well-worn Linux stat tools are not as complete as their Solaris counterparts.)
I’d like to hear what other folks in the community think. I’ll open up the comments and make an attempt to keep the spammers down.