My Linux-based Large-Scale Cloning Grid experiment, which I’ve implemented four times now (don’t ask, unless you are ready for a few looooooong stories), has three main components to it.  The first, which provides the magic by which the experiment can even exist, is the z Systems hypervisor z/VM.  The second is the grist on the mill, both the evidence that the thing is working and the workload that drives it, the cluster monitoring tool Ganglia.  The third, the component that keeps it alive, starts it, stops it, checks on it, and more, is a Perl script of my design and construction.  This script is truly multifunctional, and at just under 1000 lines could be the second-most complex piece of code [1] I’ve written since my university days [2].

I think I might have written about this before — it’s the Perl script that provides an IRC bot which manages the operation of the grid.  It schedules the starting and stopping of guests in the grid, along with providing stats on resource usage.  The bot sends grid status updates to Twitter (follow @BODSzBot!), and recently I added generation of files which are used by some HTML code that generates a web page with gauges that summarise the status of the grid.  The first version of the script I wrote in Perl simply because the first programming example of an IRC bot I found was a Perl script; I had no special affinity for Perl as a language and, despite me finding a lot of useful modules that have helped me develop the code, Perl actually doesn’t really lend itself to the task especially well.  So it probably comes as little surprise to some readers that I’m having some trouble with my choice.

Initially the script had no real concern over the status of the guests.  It would fire off commands to start guests or shut them down, or check the z/VM paging rate.  The most intensive thing the script did was to get the list of all users logged on to z/VM so it could determine how many clone guests are actually logged on, and even this takes a fraction of a second now.  It was when I decided that the bot should handle this a bit more effectively, keeping track of the status of each guest and taking some more ownership over maintaining them in the requested up or down state, that things have started to come unstuck.

In the various iterations of this IRC bot, I have used Perl job queueing modules to keep track of the commands the bot issues.  I deal with sets of guests 16 or 240 at a time, and I don’t want to issue 2000 commands in IRC to start 2000 guests.  That’s what the bot is for: I tell it “start 240 guests” and it says “okay, boss” and keeps track of the 240 individual commands that achieve that.  This time around I’m using POE::Component::JobQueue, the main reason being that the IRC module I started to use was the PoCo (the short name for POE::Component) one.  It made sense to use a job queueing mechanism that used the same multitasking infrastructure that my IRC component was using.  (I used a very key word in that last sentence; guess which one it was, and see if you’re right by the end.)

With PoCo::JobQueue, you define queues which process work depending on the type of queue and how it’s defined (number of workers, etc).  In my case my queues are passive queues, which wait for work to be enqueued to them (the alternative is active queues, which poll for work).  The how-to examples for PoCo::JobQueue show that the usual method of use is a function that is called when work is enqueued, and that function then starts a PoCo session (PoCo terminology for a separate task) to actually handle the processing.  For the separate tasks that have to be done, I have five queues that each at various times will create sessions that run for the duration of the item of work (expected to be quite short), one session running the IRC interaction, and one long-running “main thread” session.

The problems I have experienced recently include commands queued but never being actioned, and the updating of the statistics files (for the HTTP code) not being updated.  There also seems to be a problem with the code that updates the IRC topic (which happens every minute if changes to the grid have occurred, such as guests being started or stopped, or every hour if the grid is steady-state) whereby the topic gets updated even though no guest changes have occurred.  When this starts to happen, the bot drops off IRC because it fails to respond to a server PING.  While it seems like the PoCo engine stops, some functions are actually still operating — before the IRC timeout happens the bot will respond to commands.  So it’s more like a “graceful degradation” than a total stop.

I noticed these problems because I fired a set of commands to the bot that would have resulted in 64 commands being enqueued to the command processing queue.  I have a “governor” that delays the actual processing of commands depending on the load on the system, so the queue would have grown to 64 items and over time the items would be processed.  What I found is that only 33 of the commands that should have been queued actually were run, and soon after that the updating of stats started to go wrong.  As I started to look into how to debug this I was reminded about a very important thing about Perl and threads.

Basically, there aren’t any.

Well that’s not entirely true — there is an implementation of threads for Perl, but it is known to cause issues with many modules.  Basically threading and Perl have a very chequered history, and even though the original Perl::Thread module has been replaced by ithreads there are many lingering issues.  On Gentoo, the ithreads use flag is disabled by default and carries the warning “has some compatibility problems” (which is an understatement as I understand it).

With my code, I was expecting that PoCo sessions were separate threads, or processes, or something, that would isolate potentially long running tasks like the process I had implemented to update the hash that maintains guest status.  I thought I was getting full multitasking (there’s the word, did you get it?), the pre-emptive kind, but it turns out that PoCo implements “only” cooperative multitasking.  Whatever PoCo uses to relinquish control between sessions (the “cooperative” part) is probably not getting tickled by my long-running task, and that is interrupting the other sessions and queues.

So I find myself having to look at redesigning it.  Maybe cramming all this function into something that also has to manage real-time interaction with an IRC server is too much, and I need to have a separate program handling the status management and some kind of IPC between them [3].  Maybe the logic is sound, and I need to switch from Perl and PoCo to a different language and runtime that supports pre-emptive multitasking (or, maybe I recompile my Perl with ithreads enabled and try my luck).  I may even find that this diagnosis is wrong, and that some other issue is actually the cause!

I continue to be amazed that this experiment, now in its fourth edition, has been more of an education in the ancillary components than the core technology in z/VM that makes it possible.  I’ve probably spent only 5% of the total effort involved in this project actually doing things in z/VM — the rest has been Perl programming for the bot, and getting Ganglia to work at-scale (the most time-consuming part!).  If you extend the z/VM time to include directory management issues, 5% becomes probably 10% — still almost a rounding error compared to the effort spent on the other parts.  z Systems and z/VM are incredible — every day I value being able to work with systems and software that Just Work.  I wonder where the next twist in this journey will take me…  Wish me luck.

—-

[1] When I was working at Queensland Rail I worked with a guy who, when telling a story, always used to refer to the subject of the story as “the second-biggest”, or “the second-tallest”, or “the second-whatever”.  Seemed he wanted to make his story that little bit unique, rather than just echoing the usual stories about “biggest”, “tallest”, or “whatever”.  I quizzed him one day, when he said that I was wearing the second-ugliest tie he’d ever seen, what was the ugliest…  Turned out he had never anticipated anyone actually asking him, and he had no answer.  Shoutout to Peter (his Internet pseudonym) if you’re watching 😉

[2] Despite referencing [1], and being funny, it’s actually an honest and factual statement.  I did write one more complex piece of code since Uni, being the system I’d written to automatically generate and activate VTAM DRDS decks (also while I was at QR) for dynamically updating NCP definitions.  Between the ISPF panels (written in DTL) and the actual program logic (written in REXX) it was well over 1000 lines.  The other major coding efforts that I’ve done for z/VM cloning have probably been similarly complex to this project but had fewer actual lines of code.  Thinking about it, those other coding efforts drew on comparatively few external modules while this Perl script uses a ton of Perl modules; if you added the cumulative LOC of the modules along with my code, the effective LOC of this script would be even larger.

[3] The previous instanciation of this rig had the main IRC bot and another bot running in a CMS machine running REXX for doing stuff in DirMaint on z/VM.  The Perl bot and the REXX bot sent messages to each other over IRC as their IPC mechanism.  It was weird, but cute at the same time!  This time around I’m using SMAPI for the DirMaint interface, so no need for the REXX bot.

Tags: , , , , , , , , , , ,