Programming decisions

My Linux-based Large-Scale Cloning Grid experiment, which I’ve implemented four times now (don’t ask, unless you are ready for a few looooooong stories), has three main components to it.  The first, which provides the magic by which the experiment can even exist, is the z Systems hypervisor z/VM.  The second is the grist for the mill, both the evidence that the thing is working and the workload that drives it: the cluster monitoring tool Ganglia.  The third, the component that keeps it alive, starts it, stops it, checks on it, and more, is a Perl script of my design and construction.  This script is truly multifunctional, and at just under 1000 lines could be the second-most complex piece of code [1] I’ve written since my university days [2].

I think I might have written about this before: it’s the Perl script that provides an IRC bot which manages the operation of the grid.  It schedules the starting and stopping of guests in the grid, and provides stats on resource usage.  The bot sends grid status updates to Twitter (follow @BODSzBot!), and recently I added generation of files which are used by some HTML code that generates a web page with gauges that summarise the status of the grid.  I wrote the first version of the script in Perl simply because the first programming example of an IRC bot I found was a Perl script; I had no special affinity for Perl as a language and, despite my finding a lot of useful modules that have helped me develop the code, Perl doesn’t really lend itself to the task especially well.  So it probably comes as little surprise to some readers that I’m having some trouble with my choice.

Initially the script had no real concern over the status of the guests.  It would fire off commands to start guests or shut them down, or check the z/VM paging rate.  The most intensive thing the script did was to get the list of all users logged on to z/VM so it could determine how many clone guests were actually logged on, and even this takes a fraction of a second now.  It was when I decided that the bot should handle this a bit more effectively, keeping track of the status of each guest and taking some more ownership over maintaining them in the requested up or down state, that things started to come unstuck.

In the various iterations of this IRC bot, I have used Perl job queueing modules to keep track of the commands the bot issues.  I deal with sets of guests 16 or 240 at a time, and I don’t want to issue 2000 commands in IRC to start 2000 guests.  That’s what the bot is for: I tell it “start 240 guests” and it says “okay, boss” and keeps track of the 240 individual commands that achieve that.  This time around I’m using POE::Component::JobQueue, the main reason being that the IRC module I started to use was the PoCo (the short name for POE::Component) one.  It made sense to use a job queueing mechanism that used the same multitasking infrastructure that my IRC component was using.  (I used a very key word in that last sentence; guess which one it was, and see if you’re right by the end.)

With PoCo::JobQueue, you define queues which process work depending on the type of queue and how it’s defined (number of workers, etc).  In my case my queues are passive queues, which wait for work to be enqueued to them (the alternative is active queues, which poll for work).  The how-to examples for PoCo::JobQueue show that the usual method of use is a function that is called when work is enqueued, and that function then starts a PoCo session (PoCo terminology for a separate task) to actually handle the processing.  For the separate tasks that have to be done, I have five queues that each at various times will create sessions that run for the duration of the item of work (expected to be quite short), one session running the IRC interaction, and one long-running “main thread” session.
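For readers who haven’t used a job queue like this, here is a minimal sketch of the passive-queue idea.  It is written in Python’s asyncio rather than Perl and POE, so nothing in it is the bot’s actual code: the function names, the worker count, and the command text are all invented for illustration.

```python
import asyncio


async def worker(queue, done):
    """Passive worker: waits until something is enqueued, then handles it."""
    while True:
        command = await queue.get()
        await asyncio.sleep(0)      # placeholder for issuing the real command
        done.append(command)
        queue.task_done()


async def start_guests(count, workers=4):
    """Expand one request into `count` queued commands and wait for them all."""
    queue, done = asyncio.Queue(), []
    tasks = [asyncio.create_task(worker(queue, done)) for _ in range(workers)]
    for n in range(count):
        queue.put_nowait(f"start guest {n:04d}")    # hypothetical command text
    await queue.join()              # returns once every item is marked done
    for task in tasks:
        task.cancel()               # the workers would otherwise wait forever
    await asyncio.gather(*tasks, return_exceptions=True)
    return done


if __name__ == "__main__":
    handled = asyncio.run(start_guests(240))
    print(f"{len(handled)} commands processed")
```

The point of the shape is the same as in the bot: one request from the operator becomes many small items of work, and the number of workers caps how many are in flight at once.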

The problems I have experienced recently include commands being queued but never actioned, and the statistics files (for the HTML code) not being updated.  There also seems to be a problem with the code that updates the IRC topic (which happens every minute if changes to the grid have occurred, such as guests being started or stopped, or every hour if the grid is steady-state) whereby the topic gets updated even though no guest changes have occurred.  When this starts to happen, the bot drops off IRC because it fails to respond to a server PING.  While it seems like the PoCo engine stops, some functions are actually still operating: before the IRC timeout happens the bot will still respond to commands.  So it’s more like a “graceful degradation” than a total stop.

I noticed these problems because I fired a set of commands to the bot that would have resulted in 64 commands being enqueued to the command processing queue.  I have a “governor” that delays the actual processing of commands depending on the load on the system, so the queue would have grown to 64 items and over time the items would be processed.  What I found is that only 33 of the commands that should have been queued actually were run, and soon after that the updating of stats started to go wrong.  As I started to look into how to debug this I was reminded about a very important thing about Perl and threads.

Basically, there aren’t any.

Well, that’s not entirely true: there is an implementation of threads for Perl, but it is known to cause issues with many modules.  Threading and Perl have a very chequered history, and even though the original Thread module has been replaced by ithreads there are many lingering issues.  On Gentoo, the ithreads use flag is disabled by default and carries the warning “has some compatibility problems” (which is an understatement as I understand it).

With my code, I was expecting that PoCo sessions were separate threads, or processes, or something, that would isolate potentially long-running tasks like the process I had implemented to update the hash that maintains guest status.  I thought I was getting full multitasking (there’s the word, did you get it?), the pre-emptive kind, but it turns out that PoCo implements “only” cooperative multitasking.  Whatever PoCo uses to relinquish control between sessions (the “cooperative” part) is probably not getting tickled by my long-running task, and so that task is holding up the other sessions and queues.
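The effect is easy to reproduce in any cooperative scheduler.  The sketch below uses Python’s asyncio, not POE, and every name in it is made up for the illustration; it only assumes that asyncio, like PoCo, switches tasks solely when the running task gives up control.  A “heartbeat” task stands in for the IRC session, and the longest pause between its ticks is measured while a long-running update runs alongside it.

```python
import asyncio
import time


async def heartbeat(ticks, stop):
    """Stand-in for the IRC session: records a timestamp every 10 ms."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_update(duration):
    """Long-running task that never yields: nothing else can run meanwhile."""
    time.sleep(duration)            # blocks the whole event loop


async def cooperative_update(duration, step=0.001):
    """About the same elapsed time, but with a yield after each small step."""
    end = time.monotonic() + duration
    while time.monotonic() < end:
        time.sleep(step)            # a small slice of work
        await asyncio.sleep(0)      # hand control back to the scheduler


async def longest_gap(update):
    """Run a heartbeat alongside `update`; return the longest pause seen."""
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0.05)       # let the heartbeat get going
    await update(0.3)
    await asyncio.sleep(0.05)
    stop.set()
    await hb
    return max(b - a for a, b in zip(ticks, ticks[1:]))


def run_demo():
    blocked = asyncio.run(longest_gap(blocking_update))
    shared = asyncio.run(longest_gap(cooperative_update))
    return blocked, shared


if __name__ == "__main__":
    blocked, shared = run_demo()
    print(f"longest heartbeat gap, blocking task:    {blocked:.3f}s")
    print(f"longest heartbeat gap, cooperative task: {shared:.3f}s")
```

With the blocking version the heartbeat goes silent for the whole 0.3 seconds; stretch that to the length of an IRC PING timeout and you have a bot that drops off the server.  Whether this is exactly what is happening inside my script is still a guess, as the next section admits.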

So I find myself having to look at redesigning it.  Maybe cramming all this function into something that also has to manage real-time interaction with an IRC server is too much, and I need to have a separate program handling the status management and some kind of IPC between them [3].  Maybe the logic is sound, and I need to switch from Perl and PoCo to a different language and runtime that supports pre-emptive multitasking (or, maybe I recompile my Perl with ithreads enabled and try my luck).  I may even find that this diagnosis is wrong, and that some other issue is actually the cause!

I continue to be amazed that this experiment, now in its fourth edition, has been more of an education in the ancillary components than the core technology in z/VM that makes it possible.  I’ve probably spent only 5% of the total effort involved in this project actually doing things in z/VM — the rest has been Perl programming for the bot, and getting Ganglia to work at-scale (the most time-consuming part!).  If you extend the z/VM time to include directory management issues, 5% becomes probably 10% — still almost a rounding error compared to the effort spent on the other parts.  z Systems and z/VM are incredible — every day I value being able to work with systems and software that Just Work.  I wonder where the next twist in this journey will take me…  Wish me luck.


[1] When I was working at Queensland Rail I worked with a guy who, when telling a story, always used to refer to the subject of the story as “the second-biggest”, or “the second-tallest”, or “the second-whatever”.  Seemed he wanted to make his story that little bit unique, rather than just echoing the usual stories about “biggest”, “tallest”, or “whatever”.  I quizzed him one day, when he said that I was wearing the second-ugliest tie he’d ever seen, what was the ugliest…  Turned out he had never anticipated anyone actually asking him, and he had no answer.  Shoutout to Peter (his Internet pseudonym) if you’re watching 😉

[2] Despite referencing [1], and being funny, it’s actually an honest and factual statement.  I did write one more complex piece of code since Uni, being the system I’d written to automatically generate and activate VTAM DRDS decks (also while I was at QR) for dynamically updating NCP definitions.  Between the ISPF panels (written in DTL) and the actual program logic (written in REXX) it was well over 1000 lines.  The other major coding efforts that I’ve done for z/VM cloning have probably been similarly complex to this project but had fewer actual lines of code.  Thinking about it, those other coding efforts drew on comparatively few external modules while this Perl script uses a ton of Perl modules; if you added the cumulative LOC of the modules along with my code, the effective LOC of this script would be even larger.

[3] The previous instantiation of this rig had the main IRC bot and another bot running in a CMS machine running REXX for doing stuff in DirMaint on z/VM.  The Perl bot and the REXX bot sent messages to each other over IRC as their IPC mechanism.  It was weird, but cute at the same time!  This time around I’m using SMAPI for the DirMaint interface, so no need for the REXX bot.


Good Luck: a dream

This is a fictionalised retelling of a dream I had in the wee hours of this morning.  It flows basically the same as the action from the dream, but some of the thinking and interpretation occurred in the couple of hours after I woke up and still had the dream in my head.  I also left out of the retelling some action from the dream before and after this main part; if you dream like me, you’ll know that often there’s a bunch of crap that happens, then the good bit, then it kind of trails off with more crap after the good bit.

I’ve created a new category on this blog, entitled Fiction, because I’ve been doing a bit of writing recently and I figured I perhaps should start to put some of it out there.  This is the first entry in this new category.

Anyway, here it is.  Please enjoy.

“Do you want to start something?” he shouted at me.
A minute or so before we had passed each other walking, and had even exchanged an almost cheerful greeting.  In that moment I don’t think I fully understood who he was though.  But I understood now.
“Do I want to start something?  Yes, in fact…  Let’s START something,” I shouted back.
We were in her, now their, front yard, and he was holding the end of a garden hose that, for some reason, ran along the lawn in an almost straight line right to where I was standing.  He waved the end of the hose, like you do when you’re trying to loosen a kink or dislodge it from under a tyre when you’re washing the car, but the wave didn’t reach me.  He threw down the hose and looked around; there was a pile of garden rakes near where I stood, and we both made to arm ourselves.
I reached the pile first.  The rakes were old, their wooden handles worn and weathered. The rake ends were the plastic fan-shape kind and pretty useless for the purpose they were about to be applied to.  An old metal rake head like Mum used to have at the old house, the kind that cartoon characters stand on and smack themselves in the face with the handle, would have been much more intimidating.  As I picked up one of those sad garden implements I thought to myself that this fight, whatever happened, was going to look pretty bloody ridiculous.
He was far enough away from me that by the time I’d picked up one of the rakes he’d only just managed to complete his short sprint to the pile.  He was enraged, his face contorted by a paroxysm of hate and possibly other emotions.  He bent down, grabbed a rake, and started swinging it randomly at me while yelling wordlessly.  A thought came to me, unbidden: why are you so angry at me?
After all, you’re the one playing Happy Families with my ex-wife.
After fending off a few of his swings I actually had to block one that was almost well aimed.  The handle of his rake snapped, and he stumbled forward with the unexpected change in momentum.  He froze, stooped over the pieces of the rake.  “Oh god,” I heard, “oh god,” and as he slowly straightened it looked for a moment like one of the pieces of the broken rake had impaled his left forearm, but he was actually just kind-of cradling it.  He moved gingerly, as if he thought it might change its mind and impale him after all if he moved too suddenly, or maybe he thought he might get a splinter.
Through rage-clenched teeth he said “I’ll get another…  I’ll break one for you so you can have two pieces…”
“You don’t want to do that.”
The voice was mine, but I only knew that because I had felt my mouth moving along with the sound of the words.  The words were spoken so calmly that I doubted it was me who had spoken: I simply didn’t think I was capable of speaking so serenely.
My hand was on his shoulder.  In an instant all of his rage was gone, replaced with something almost like, I don’t know, grief.
“It’s just that she’s…”
To be honest, I didn’t really hear what he said.  Something about how his kids are everything and that she is so good with them and they are so good together and she completes him and whatever.  He was justifying himself to me, for what reason I could only imagine.  Maybe it was guilt: she’d protested when we separated that she wasn’t having an affair, but that doesn’t preclude them having planned to offload me to be together.  Maybe he realised that I was the one man who had been where he was now, and instead of a threat I was suddenly a confidant.  Whatever it was had left him drained and deflated, and after finishing his speech he hung his head, silently weeping.
In that moment I knew how he felt, for she had once had that hold, that power, over me.  He was in her thrall.  There are some women with enough of that power to hold many men that way, but not her — I mean she’s a great girl, but not that great.  So when I saw that he was spellbound by her, I knew what it meant for me.
It meant I was free.
I took a step back, and after a moment he raised his head to look at me.  I held his gaze for a few seconds, then smiled a wry grin that I tried to make look more “I’m glad for you, no hard feelings” than “sucked in, dickhead”.
“Good luck,” I said.  Finally putting down the rake, I turned and walked away.  I had a life to get back to.


Beginning Again

The last couple of weeks have seen a radically positive shift in my state of mind, ironically triggered by finding out that my ex-wife has her boyfriend moving in. I’ve realised that life is too short to live in the past (even though I wasn’t conscious of doing so) and that looking forward is the only way to go.  I’ve made the first few small steps to meeting people.  I actually had a mini-date last weekend, and I think I managed not to completely mess it up.

I watched a movie called “Begin Again” last weekend.  It stars Keira Knightley and Mark Ruffalo, and it tells a story of new beginnings in the face of what would seem to be the most trying circumstances.  For Knightley’s singer-songwriter character Gretta, it is moving on from being dumped by a rock star boyfriend; for Ruffalo’s record-producer character Dan it is trying to maintain relevance in an industry which he helped create but now seems to have changed beyond his recognition.  It might seem trite, but this film has touched me in ways I cannot count.  In one scene, after Gretta and Dan have convinced Dan’s daughter Violet to play on one of Gretta’s tracks, Violet finally overcomes her nerves and starts to play and…  well I don’t want to spoil, but it becomes a beautiful rendition of the pride of a father for his daughter.  Even though it’s not the kind of movie I would usually choose, I am profoundly grateful to myself that I did (thank you, iTunes $0.99 rental special) — the music alone is ringing in my mind like nothing I’ve heard in ages.

I’ve also been spending a lot of time in the last few years thinking that I was alone.  It’s taken a while, but I know now that nothing is further from the truth.  There are people around me who care.  Some more actively than others, but such is the nature of friends — all are important.  From those that actively seek you out and talk, to the barista at the coffee shop who still greets you by name even though you only come in once a month nowadays — all are important.

I feel like I should have felt after surviving my heart attack — I feel like I have won my life back.  I feel like I’m at a gala ceremony, like I’m in Grauman’s Chinese Theatre or the Crown Palladium, and they’ve just called my name as the recipient of “Most Unlikely Resurrection of 2015”, and everyone in the place is cheering and applauding.  Befitting such an august venue, it would be proper to make a speech…

I could not possibly have done this alone.  There are people around me who keep me sane, keep me grounded, yea verily who breathe life into me; I have let them go unrecognised for far too long.  I have to mention a few names, but if I don’t say yours don’t think that your place is so much less.

Peter T, you have a knack of drawing things out and getting to what really matters; you, sir, are truly a Man Among Men and I feel honoured to know you.  Leanne, you are a subtle dose of realism when the world seems devoid of reality; our workplace, and the lives of all around you, are richer because you are there.  Grant, Gav, Tex: your touch is more reserved but no less profound and I appreciate it no less.

There are people in far corners of the world who have also helped me tremendously.  Eduardo and Peter McC: I met the two of you in a New York summer in 2012 and despite me being old enough to be your father you let me tag along while you celebrated your youth in those Hudson Valley clubs; to this day, you include me still.  Gentlemen I thank you for the camaraderie you showed then and the fellowship you show now.  For all its negative aspects, I have words to say to anyone who doubts the power of social media to be a positive influence in people’s lives.

To my family, who I have scorned and made suffer while I let my circumstances overcome me, I don’t have words to describe my pain and regret.  I have not been there for you, not because I’d stopped loving or caring but because I didn’t think I was worthy of you.  I’m sorry.

My oldest and dearest friend (although it must not seem like it) is someone who has been in my life for longer than any other person with whom I don’t share blood.  You likely have no idea what a difference you make, Brad… but you’re like a line in my palm: unchanging, always there.

To my former wife: now I know you were right.  Now I understand you, now I forgive you… now I thank you.

Finally, the most precious people in my world are my two children.  They enrich me, they teach me, they drive me, they inspire me — all of which I wish I’d opened myself to sooner.  Their energy, their laughter, their pure spirit, all are infectious.

At the start of this I said that I could not have “done” this, like it was a job finished.  Far from it, I know that I’m still on a journey, and while I’m on a mountaintop right now I know there will likely be canyons and valleys ahead.  The thing about mountaintops though is that they are unforgettable — and I never thought I’d see this one.  So now that I know it’s possible to climb the mountain, I’m damned if I’m going to accept being at the bottom of a cave ever again.  Let’s go find another mountain.  Only taller.

Thank you.  As you have all been there for me, know that I am there for you also.

UPDATE — 12 Dec 2015, 1205 AEST:

Eduardo made a really nice comment on Facebook (thanks!) that reminded me about how easy it can be, when you’re fighting your own personal dragons, to disregard the positive effect you can still have on others.  Even when you’re at the bottom of your own metaphorical cave, there might be something in your struggle that provides inspiration, or motivation, or hope, to another.  For me, that I have been even in some small way able to help others is an honour; that it has happened even while I fought my own issues is incredibly enriching.

Oh, and I added to the mountain/cave line in the “speech”…  and added to the thank-you at the end.



Among the coffee mugs in my cupboard at home is one I’ve had for over 20 years.  It was a gift; if I remember right, a semi-joke gift in an office “Secret Santa”.

“Works and plays well with others”. O RLY?

The slogan on it reads “Works and plays well with others”, and it’s a reference to one of the standard phrases seen on children’s school report cards.  It’s one of the standard mugs in my hot beverage rotation, and every time I use it I can’t help but think back to when it was new, and of how much has changed since those days.

It’s easy to treat a silly slogan on a coffee mug as little more than just a few words designed to evoke a wry grin from a slightly antisocial co-worker.  Sometimes it can take on a deeper meaning, if you let it.

For the last 6 months or more I’ve been working on transferring the function of our former demonstration facility in Brisbane to a location in Melbourne.  This has been fraught with problems and delays, not the least of which was an intermittent network fault into the network our systems are connected to.  Steady-state things would be fine; I could have an IRC client connected to a server in our subnet for days at a time.  When I actually tried to do anything else (SSH, HTTP, etc), within about 5 minutes all traffic to the subnet would stop for a few minutes.  When traffic would pass again, it would stay up for five or so minutes then fail.  Wash, rinse, repeat.

It looked like the problem you get when Path MTU Discovery (PMTUD) doesn’t work and you have an MTU mismatch[1].  I realised that we had a 1000BaseT network connected to a 100BaseT switch port, so I went around all my systems and changed the MTU wherever I had been trying to use jumbo frames, but that made no difference to the network dropouts.  I found Cisco references to problems with ARP caches filling, but I couldn’t imagine that the network was so big that MAC address learning would be a problem (and if MAC learning in general was constrained, why was no-one else having a problem?).

Everything I could think of was drawing blanks.  I approached the folks who run the network we uplink through, and all they said was “our network is fine”.  I was putting up with the problem, thinking that it was just something I was doing and that in time we would change over to a different uplink and we wouldn’t have to worry any more.  My frustration at having to move everything out of the wonderful environment we had in Brisbane down to Melbourne, with its non-functional network, multiplied every time an SSH connection failed.  I actually started to rationalise that it was pointless to continue with setting up the facility in Melbourne; I’d never be able to re-create what I’d built in Brisbane, it would never be as accessible and useful, and besides no-one other than me had ever made good use of the z Systems gear in the Brisbane lab anyway.  Basically, I had lost confidence in myself and my ability to get the network fixed and the Melbourne lab set up.

Confidence is a mental strength, like our muscles which provide our physical strength.  Just like muscle, confidence grows from active use and wastes if underused.  Chemicals can boost it, and trauma can damage it.  Importantly though, confidence has a huge bearing on a person’s ability to “work and play well with others”: with too little confidence they lack conviction and decisiveness; with too much they appear overbearing and dictatorial.

Last week I was in Singapore for the z/VM, Linux on z, and KVM “T3” event.  Whenever I go to something like this I get fired up by all of the things that I’d like to work on and have running to demo.  The motivation to get new things going in the lab overcame my pessimism about the network connection (and lack of confidence), and I got in touch with the intern in charge of the network we connect through.  All I need, I said, is to look and see what the configuration of the port we connect into looks like.  We agreed to get together when I returned from Singapore, and try to work out the problem.

We got into the meeting, and I went over the problem in terms of how we experience it: a steady state that could last for days, then activity leading to three-minute lockouts.  I asked if I could see the configuration of the port we attached to… after a little bit of discussion about which switch and port we might be on, a few lines of Cisco IOS configuration statements appeared in our chat session.  Straight away I saw:

switchport port-security

W. T. F.

Within a few minutes I had Googled what this meant.  Sure enough, it told the switch to monitor the active MAC addresses on that port and disable the port if “unknown” MACs appear.  There were no configured MACs, so it just remembered the first one it saw.  It explained why I could have a session running to one system (the IRC server) for ages, and as soon as I connected to something else everything stopped — the default violation mode is “shutdown”.  It explained why the traffic would stay down for three minutes and then begin again — elsewhere in the switch configuration was this:

errdisable recovery cause psecure-violation 180

If the switch disabled a port due to port-security violation, it would be automatically recovered after 180 seconds.

The guys didn’t really understand what this all meant, but it made sense to me.  Encouraged by my confidence that this was indeed the problem, they gave me the passwords to log on to the switch and do what I thought was needed to remove the setting.  A couple of “no” commands later and it was gone… and our network link has functioned perfectly ever since.

The real mystery for the other network guys was: why has this suddenly become a problem?  None of them had changed the network port definition, so as far as anyone knew the port was always configured with Port Security.  The answer to this question is, in fact, on our side.  To z/VM and Linux on z Systems people, networks come in two modes: “Layer 3” or “IP” mode, where the system only deals with IP addresses, and “Layer 2” or “Ethernet” mode, where the system works with MAC addresses.  In Layer 3 mode, all the separate IP addresses that exist within Linux and z/VM systems actually exist behind the MAC address of the mainframe OSA card.  In Layer 2 mode however, each individual Linux guest or z/VM stack gets its own MAC address.  When we first set up this network link and the z/VM and Linux systems there the default operating mode was Layer 3, so the network switch only saw one or two MAC addresses.  Nowadays though the default mode is Layer 2.  When I built new systems for moving everything down from Brisbane, I built them in Layer 2 mode.  Suddenly the network switch port was seeing dozens of different MAC addresses where it used to only see one or two, and Port Security was being triggered constantly.
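To make the interaction concrete, here is a toy model of that switch port.  It is a sketch under assumptions, not a description of how the switch is implemented: I’m assuming a limit of one learned MAC address, the “shutdown” violation mode, the 180-second recovery mentioned above, and that the port forgets what it learned when it shuts down.  The class, the function, and the MAC addresses are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SecurePort:
    """Toy model of a switch port with port security (not real switch code)."""
    max_macs: int = 1               # assumption: one learned address allowed
    recovery_secs: int = 180        # recovery interval quoted in the post
    learned: set = field(default_factory=set)
    disabled_until: float = 0.0

    def frame(self, now, src_mac):
        """Offer a frame at time `now` (seconds); return True if forwarded."""
        if now < self.disabled_until:
            return False                    # port is err-disabled
        if src_mac in self.learned:
            return True
        if len(self.learned) < self.max_macs:
            self.learned.add(src_mac)       # learn the first address seen
            return True
        # Unknown MAC with the table full: violation, so shut the port down.
        self.disabled_until = now + self.recovery_secs
        self.learned.clear()                # assumption: learned MACs are forgotten
        return False


def delivered_fraction(macs, seconds=600):
    """Each MAC sends one frame per second; return the fraction forwarded."""
    port = SecurePort()
    sent = forwarded = 0
    for t in range(seconds):
        for mac in macs:
            sent += 1
            forwarded += port.frame(t, mac)
    return forwarded / sent


# Hypothetical addresses: one MAC for Layer 3, one per guest for Layer 2.
layer3 = ["02:00:00:00:00:01"]
layer2 = [f"02:00:00:00:01:{i:02x}" for i in range(24)]

if __name__ == "__main__":
    print(f"Layer 3: {delivered_fraction(layer3):.1%} of frames forwarded")
    print(f"Layer 2: {delivered_fraction(layer2):.1%} of frames forwarded")
```

In this model every guest talks every second, so the Layer 2 case trips the port immediately after each recovery.  On the real link the failure took a few minutes to show up, presumably because it depended on when a second MAC address happened to send something.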

This has been a learning experience for me.  Usually I don’t have any trouble pointing out where I think a problem exists and how it needs to be fixed.  Deep down I knew the issue was outside our immediate network and yet this time for some reason I lacked the ability, motivation, nerve, or whatever, to chase the folks responsible and get it fixed.  The prospect of trying to work with a group of guys who, based on their previous comments, really strongly thought that their gear was not the problem, was so daunting that it became easier to think of reasons not to bother.  Maybe it’s because I didn’t know for certain that it wasn’t something on our side — there is another problem in our network that definitely is in our gear — so I kept looking for a problem on our side that wasn’t there.

For the want of a 15-minute phone meeting, we had endured months of a flaky network connection.

On this occasion it took me too long to become sufficiently confident to turn my thoughts into actions.  Once I got into action though, it was the confidence I displayed to the network team that got the problem fixed.   For me, lesson learned: sometimes I need a prod, but I am one who “works and plays well with others”.


[1] I get that all the time when using the Linux OpenVPN client to connect to this lab, and got into the habit of changing the MTU manually.  Tunnelblick on the Mac doesn’t suffer the same problem, because it has a clever MTU monitoring feature that keeps things going.


Simultaneous Multi-Threading at McDonalds

Keeping on the analogy theme…  This time, it’s an explanation of Simultaneous Multi-Threading (SMT).  SMT was introduced to the z Systems architecture with the z13, and many technical specialists (myself included) have struggled with the standard explanations of how SMT is meant to improve overall system performance.  Here’s yet another attempt at an explanation!  Some folks might be a bit affronted at the “compare and contrast” of z Systems and a fast food drive-through, but it’s just an analogy…

So in Brisbane just about every McDonalds has a drive-through.  They used to have a single lane, with the menu boards and a speaker for the operator inside the restaurant to take your order.  As the customer, once you placed your order you then would drive forward to the “first window” where the cashier would take payment, then you’d drive to the “next window” to receive your order and proceed away.  Apologies to anyone offended by me feeling the need to explain how a drive-through works, but I don’t know how they work in your part of the world so I’m just covering how they work in mine.

Many of these drive-throughs have been redeveloped to add a second lane and order taking station — complete with a second set of menu boards.  They didn’t duplicate anything else in the process though: same payment and collection windows, even in most cases a single cashier taking orders alternately from both stations.

A dual-lane McDonalds drive thru, AKA CPUs 0x00 and 0x01

Why did McDonalds do this?  Without duplicating anything else in the whole chain, what benefit does adding a queue provide?  If two cars arrive at the stations at the same time there’s going to be contention for the cashier.  They then have contention to enter the single lane going past the windows.  Not only that, the restaurant had to give up physical space to install the second station — perhaps they lost a few parking spaces, or a garden.

I had passed through this kind of drive-through a few times, and never clearly saw the benefit.  Sometimes I’d drive up to the station with the shorter queue, only to be stuck behind someone who seemed to be ordering one of everything from the menu…  Other times I’d pull up to an empty station, next to the only other car in the system (at the other station), but because the car at the occupied station was already placing their order I still had to wait the same amount of time as I would have in a single-lane system.

Then I finally realised.  The multiple queues aren’t for me as a customer; they’re for the restaurant.  Specifically, they’re for the food production stations in back-of-house.  To understand why it makes sense to have two lanes, it’s critical to realise that the drive-through is not just the speakers and the lane and the windows, it’s the method by which instructions are given to the many and various individuals that make the food and package the orders.  Each of those individuals has their own role and contribution to the overall task of serving the customer; from the grillers to the fryers to the wrappers to the packers (sorry, I’ll bet there are formal McDonalds names for each of the team member roles but I don’t know them).

Having multiple order stations means that the orders get to the burger makers and packers faster, making them more efficient and improving their throughput.  The beverage orders go to the little automatic drink-pouring machine instantly, so that everyone’s Cokes and Fantas are ready and waiting sooner.  One car wants a Chicken McWrap, the next just wants a McFlurry?  No contention there, those orders can be getting made at the same time.

Maybe you’re asking “so what does this have to do with SMT?”  Well, the order stations are our threads.  The cashiers and the packers are the fetch-and-store units, the parts of the processor that fetch instructions from memory and store the results back.  The cashier’s terminal is the instruction decode unit.  The food preparers in the back-of-house, they are the processor execution units; the integer units, the DFP and BFP, the SIMD unit, the CPACF, and more — that’s where the real work is done.  To a large extent all of those execution units operate independently, just like our McD food preparers.  SMT, like our two drive-through lanes, makes sure that all those execution units are as busy as possible.  One thread issues an integer add instruction, the other thread is doing a crypto hash using the CPACF?  They can be happening simultaneously.

We’ve been saying all along that SMT will likely decrease the perceived speed of an individual unit of work, but overall more work will get done across all units of work.  When I’ve been in a two-lane drive-through and placed my order, and then had to wait while I merged with the cars coming from the other lane, I have to agree that it seemed like the merging delayed me.  However, if that had been a single-lane drive-through, chances are I would have been in a longer queue of cars before even reaching the order station, and that metric isn’t even measured by the queue management built into McDonalds’ terminals.  Likewise, on a busy system without SMT, it’s difficult to say how long instructions are getting queued in the operating system scheduler before even making it to the processor to be dispatched.  Basically, I’m saying that we may see OS scheduler queuing reduce, and therefore improved “performance” at the OS level over and above the actual benefit of improved processor throughput, even if our SMT ratio doesn’t get anywhere near the impossible 2:1.

If ten cars line up at the two order stations and they all want a Big Mac and a Hot Apple Pie then there’s probably not going to be much gain there.  Today’s McDonalds menu is quite diverse though, which means the chances of orders having “non-intersecting overlaps” are greatly improved.  On z Systems, ensuring a variety of workloads and transaction types would help to ensure a diversity in the instruction stream that would give SMT a good opportunity to yield benefit.  This means mixing applications as much as possible, and using Large Memory and CPU Pooling support in z/VM 6.3 to bring lots of different workloads into heterogeneous LPARs.
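To put rough numbers on the “non-intersecting overlaps” idea, here is a toy sketch in Python.  It is only an illustration under heavy assumptions of my own: one execution unit of each kind, every instruction taking exactly one cycle, and two threads that can each issue one instruction per cycle as long as they don’t want the same unit.  Real z Systems processors are far more sophisticated than this, and the function names and unit labels are made up for the example.

```python
from collections import deque


def cycles_without_smt(thread_a, thread_b):
    """Single-lane drive-through: one instruction issues per cycle."""
    return len(thread_a) + len(thread_b)


def cycles_with_smt(thread_a, thread_b):
    """Toy two-thread core: both threads may issue in the same cycle,
    but only if their next instructions need different execution units."""
    queues = [deque(thread_a), deque(thread_b)]
    cycles = 0
    priority = 0  # which thread gets first pick this cycle
    while queues[0] or queues[1]:
        cycles += 1
        first, second = queues[priority], queues[1 - priority]
        used_unit = None
        if first:
            used_unit = first.popleft()
        # The other thread issues only if its unit is not already busy.
        if second and second[0] != used_unit:
            second.popleft()
        priority = 1 - priority  # alternate so neither thread starves
    return cycles


# Ten cars that all want the same thing: no gain from the second lane.
same = ["int"] * 10
print(cycles_without_smt(same, same))  # 20
print(cycles_with_smt(same, same))     # 20

# One thread doing integer adds, the other doing crypto: perfect overlap.
crypto = ["crypto"] * 10
print(cycles_with_smt(same, crypto))   # 10
```

The perfect 2:1 in the last case only appears because the two streams never collide in this toy model; any real mix of work will land somewhere between the two extremes.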

I’ll bet that McDonalds worked out that simply adding an extra entry lane meant that they could move more food items in a given time, and McDonalds’ business is to sell food on a massive scale.  In the same way, the goal of z Systems has never been to make one single workload run as fast as possible, but to make the hundreds or thousands of workloads across an enterprise run at highest efficiency.

Analogy can be found anywhere

This post may come across as self-serving, semi-advertorial, promotional, or just plain crappy (or all of the above).  I don’t apologise, it’s my blog and I’ll write what I want to.  However, because it’s the Internet and it’s almost guaranteed that someone reading this will think I should have warned them… consider yourself warned, fair reader.

My recent post about experiencing things for the last time started me off on a somewhat interesting train of thought.  There I was, sitting on an aircraft that was being retired, which must happen fairly often around the world–after all we don’t see too many 707s or TriStars in the skies any more.  Qantas used to have a lot of 767s, and I picked up the inflight magazine to see the numbers today.

As at September 2014, Qantas had 6 Boeing 767s in their fleet (down from 13 at 30 June 2014, further down from 20 as at July 2013, according to the Qantas Data Book 2014).  Then I looked at the total fleet size: just over 200 aircraft in total (again, looking at the Data Book 2014, 203 as at 30 June 2014).  The numbers started wandering around in my head, and soon put me in mind of another piece of hardware requiring large investment, and just as close to my heart as jet aircraft — mainframe computers.

I started to do some research into the numbers I looked at in the flight magazine.  According to the registration data available from CASA, there is only one 767 in Australia (a 767-381F freighter) not registered to Qantas.  Therefore, during 2014, Qantas was the operator of the only dozen-odd Boeing 767 aircraft in Australia.  Thousands of people every day, travelling on an aircraft of which there were only a dozen working examples in the country–in fact, by the time I had my last 767 flight, I wonder how many of the September Six were left?  Maybe VH-OGO was the last in service by then…?

Okay, you might say, the B767 doesn’t count as it’s old and Qantas was retiring them.  Righto, point taken.  Let’s look at what is the mainstay of domestic inter-capital air travel in Australia then–the B737.  Qantas lists 70 as at June 30 (57 owned and 13 leased) while Virgin Australia shows 74.  CASA lists some freighters and a half-dozen registered to “Nauru Air Corporation”, but let’s stick to QF and VA (apart from a couple of B787s Jetstar’s fleet is all Airbus and much smaller than Qantas or Virgin).  The most widely-used commercial jet aircraft in the country, and there’s only 140-ish of them?  So what, you might say: they’re jet planes, of course there won’t be many.

The numbers continue: again as at 30 June 2014 the total number of Boeing 747s and Airbus A380s and A330s in the Qantas fleet was 36 aircraft, and by now some of the B747s have been retired.  Think about that for a moment: Qantas is able to service all of its international routes, including covering maintenance intervals, using fewer than forty aircraft?  It’s not like Qantas has a small network… yes they extend their reach through alliances and codeshare just like all airlines do, but Qantas services Los Angeles direct from Brisbane, Sydney, and Melbourne, daily (you’d have to think that’s at least six planes by itself) as well as daily flights into cities across Asia and the few routes into Europe that haven’t been taken over by Emirates.  Three-dozen planes seems light…

A popular criticism of mainframes (once you get past the “old, room-sized, punch card” nonsense) is that there aren’t many of them.  Apparently if it was such a good system everyone would use it, and the fact that not many companies do is proof that it isn’t.  Also, apparently it’s risky to use a system that comparatively few other businesses use.

Imagine for a moment if airlines around the world started subscribing to the same kind of thinking that seems to have taken hold in IT:

Operations Manager: It’s too risky for us to use these large, expensive aircraft.  We don’t have enough of them to justify training pilots to operate them, and it costs a fortune when we have to service one.  Plus, did you know each one costs $100 million?

C-suite: The last OM said these aircraft are the best fit for our operations, that we get value in return for the cost.  Are you saying there’s an alternative?

OM: You bet!  Did you know we can buy hundreds of light aircraft for what it costs to buy one jet?

C-suite: Really?  Sounds complicated…

OM: No way!  It’s simple, light aircraft are much less complicated to operate and maintain, and it’s much cheaper and easier to get pilots that know how to fly them.

C-suite: I’ve seen a light aircraft, they’re… small.  Won’t we need more of them to carry the load of our jets?

OM:  Maybe… ah but it won’t be that bad: how often are we running those big jets half-empty anyway?

C-suite: Hmm…  I assume you’ve done some projections?

OM: Yes, the acquisition cost of a fleet of light aircraft is a fraction of that of a fleet of jets!

C-suite: Acquisition cost…  I seem to recall that we should be worried about more than cost of acquisition…

OM:  Did I mention the acquisition cost of a fleet of light aircraft is a fraction of that of a fleet of jets?

C-suite: I guess that was all!  Okay, sounds like a great plan!

It seems ludicrous, and would never happen in real life.  Outside aviation, imagine a similar scenario with a transport company replacing B-doubles with postie bikes, or an energy company replacing wired electricity distribution with boxes of AA batteries sent to homes.  For some reason though it’s not far-fetched in IT: over the years conversations like that have happened in too many companies.

There aren’t many Boeing 737s in Australia, but that isn’t stopping Qantas and Virgin (and airlines around the world) from using equipment that is fit for purpose.  Why should mainframes be different?

The last time

I had an unexpectedly emotional departure from Brisbane last week. It was supposed to be a standard flight to Sydney, but became something a lot more.

When I first started travelling by air, flights to Melbourne were on 737s and to Sydney were 767s. I guess you knew you were going to the “big smoke” when you were on the really big plane (sorry Melbourne, you know I love you). As flight schedules changed Sydney started getting serviced by more 737 flights, but you could often still find yourself on a 767 depending on the time of day, etc. I would seek out the 767 flights, sometimes just for the sake of a change of scene from a 737.

Of course there was another, real, reason why there were fewer 767 flights on Qantas, but I was oblivious to that… until last Monday.

The original flight I was booked on got cancelled. I had a moment of disappointment that I wouldn’t be on the 767 flight I’d planned, but since Qantas had phoned me well in advance and sorted me onto the next flight I couldn’t fault the situation (it gave me some extra time before flying).

When I got my seat assignment I realised I was going to be on a 767 after all. I wondered if they simply pushed the plane from my original flight back to the time of the later flight. Anyway it didn’t matter, I was happy to get my ride on the Seven-Six.

While we were taxiing for takeoff, the captain made an announcement. He did the usual welcome, and then said something remarkable — I don’t remember the exact words, but I’ll paraphrase…

Thank you for your patience, I know some of you were booked on the flight that was due to go before this one, but that aircraft became unavailable. Luckily the airline had the option to use a bigger aircraft to carry the load of the two flights. It’s an option that we won’t have very soon, as this aircraft is due to be retired on the 27th of December. So unless you’re going to be with us again very soon, this could be your last flight on a Boeing 767.

The captain then went on to advise us our departure procedure, but honestly I wasn’t listening. I started looking around me, trying to soak up as much of the environment as I could. Then I thought to myself “it’s just a plane”, but no, it was more…

When do we ever get the chance to know that we’re doing something for the last time? I’m not talking about the extraordinary things, the once-in-a-lifetime things that you know right then you’re never likely to do again. I mean things that are a part of your life, things that… things that until they are gone you do not think you’d miss… or the things that you know damn well you’d miss if they weren’t there, but you just can’t imagine anything could possibly cause them to be gone…


Upon disembarking in Sydney

Even as I write this, days later, I’m choking up.

When we arrived in Sydney I took a photo of my last 767, VH-OGO, which I saw again a couple of days later while I was waiting for my return flight to Brisbane.

Thanks to a captain who knows that there are still people out there who think that flying is more than just the cheapest seat, I got to know in advance that a chapter in my own personal logbook is ending… and as if I needed one, I also got another reminder that nothing should ever be taken for granted.


A couple of days later… OGO probably about to make a return trip to Brisbane

I started writing this on 7 December, the day that VH-OJA, Qantas’ first ever Boeing 747-400, was scheduled to make its last commercial flight as QF107 from Sydney to Los Angeles. Not only was OJA the first Qantas 747-400, it was the aircraft that set the stage for the “Kangaroo Route” by making a promotional flight non-stop from London to Sydney (a record-breaking run, and the record still stands). I wonder how many times I’ve flown on that plane, never knowing its history. I hope the people on that flight got an announcement similar to the one I got on my 767 flight.

iOS 8 and OS X Yosemite

A week or so ago I succumbed to the hype (and the nagging from my devices) and installed iOS 8 on a second iPad.  As far as updates go it was smooth although the post-install setup wizard crashed before it could ask me about things like iCloud Drive, which made me wonder whether I might be due for later problems.  For the most part I was proving immune to the “this feature only works with Yosemite” bait but I knew it was probably just a matter of time…

Call it serendipity, call it fate, call it whatever you will… but yesterday I was looking at my OS X desktop and thought “y’know, I’m a bit tired of that Apple font”.  You can probably imagine my wry grin when I surfed to Apple’s OS X Yosemite preview pages to find that one of the key features of the “new design” is a very clean replacement for the old Finder font!  So that, along with the nagging of the devices… and in the spirit of “better late than never”, I decided to join the beta of OS X Yosemite.

Signing up was incredibly easy and well integrated into the App Store.  It only took a login and a couple of clicks and Yosemite was being poured into my MacBook.  I took the opportunity during the download to make sure that my Time Machine backup was up to date, and let it do its thing.  Around 20 minutes later it was finished.  One weird thing I found though was that during the installation — while the big grey X was on the screen, and the progress bar was still counting down — my other iOS devices started squawking that a MacBook had “logged on to FaceTime”.  I even heard VoiceOver alerts from the machine itself, complaining about things in my auto-start that weren’t set up correctly, despite the OS X Installation progress bar reporting 7 minutes to go!  I guess I’m used to the installer for an OS being a different environment entirely from the running system, not just a wizard running on top of a user logon.

While I was poking around things in Yosemite, the iOS 8.0.2 update was released… and was duly applied to the old iPhone 4S and the main iPad.  I am concerned about battery life on the phone — for example the Facebook app seems to take 1% out of the battery every minute it’s running — but in honesty I was having battery issues while still on iOS 7.  I think it’s to do with the age of the device, but at this stage the best I can say is that iOS 8 doesn’t seem to be that much worse than iOS 7 for me, plus of course I get the benefit now of being able to see battery usage by app.

It hasn’t even been 24 hours in Yosemite yet, but I’m impressed.  The update to the look and feel of the OS X desktop is well overdue (although we still can only choose Blue or Graphite for Appearance?).  I really like the iOS integration features of Yosemite, but haven’t had a chance yet to see them in action.  I have to say though, at least for this Little Black Duck™, Yosemite and iOS 8 have reinvigorated my interest in the Apple ecosystem.  I mean I like the iDevices, but the “wow” of some of the Apple tech had faded for me in recent times…  If features like Handoff and the call and message integration actually work as designed, this could put Apple back into the lead position when it comes to “devices designed to work together”.

I’m back

So I messed up.  I let the domain name registration for veejoe-dot-net expire, which was bad enough, but it turns out that even a low-visibility low-traffic domain like mine can still end up getting heisted by domain squatters.  Which is why it lives in Czechoslovakia now.

The other thing that I had been trying to find time for was a transition that my virtual server provider, Crucial Cloud Hosting, wanted me to complete.  They want to move off older Xen-based servers, one of which my VPS (virtual private server) was hosted on.  They carved me up a new VPS on different hypervisor technology, and have been extremely patient in allowing me the time to migrate from the old to the new.

The intersection of these two events seemed like the perfect opportunity to dust off the domain, which I’ve had for ages but never did anything with.  I had a somewhat current backup of the blog that was being automatically cut (by a plugin called BackWPup), and it was a fairly straightforward matter to restore that and manually copy the webserver config from the existing VPS.

In very quick time I had the new VPS listening at the new domain name and the blog back online.  So here I am!

I lost my Fitbit… and found it

I have settled into a somewhat sedentary lifestyle.  My partner tries valiantly to get me involved in her personal training sessions, but I have a lot of inertia.  I know that I need to do something about being more active and increasing my fitness level, but have struggled to find a motivator.

While in Europe I succumbed to a bit of techno-craziness and bought a Fitbit One.  (The craziness wasn’t buying a Fitbit, it was where I bought it—the Apple Store in the Odysseum in Montpellier—and the resulting price I paid compared to if I’d waited and bought it at home, even from an Apple Store.)  I was enjoying the novelty of tracking activity, counting steps and calories, entering water consumption, and monitoring sleep.  I wore it almost constantly through France, in Amsterdam, and on the way back to Australia, thinking I might have finally found a way to motivate myself to exercise—that’s right: the path to a healthier life through good-old 21st century gamification!

I drove up to Brisbane a week ago for lunch with some work colleagues before picking up my kids; of course, the Fitbit was with me all the way.  The only problem was, my leather belt is too thick for the Fitbit’s clip so I instead clipped it into the coin pocket of my jeans.  It’s not so secure, and the Fitbit slid back and forth along the rim of the pocket, but I figured the seam along the edge of the pocket was thick enough to prevent the Fitbit from coming loose.

Almost over the jet-lag from coming back from Europe, I prepared for bed that evening looking forward to wearing the Fitbit to monitor my sleep—only the Fitbit was nowhere to be found.  Not on the jeans, not anywhere visible.  I decided that my method of clipping the Fitbit into the coin pocket was not so secure after all, and it had come loose during the day.

The next day I did the usual “retrace your steps, check behind the couch, blah blah” routine but still came up blank.  During Sunday however, for some reason I decided to start up the Fitbit app on my phone… and was rewarded with a message telling me it was “Syncing”!  I looked around where I was sitting, but still couldn’t find it.  By this time I had convinced myself it really was gone, and the sync message was the app on the phone syncing with the web site.

It got the better of me again today however.  I started the app again, and again was told it was “Syncing”.  I went to the “Devices” list, and sure enough beside my One it said it had synced just then.  Knowing that it had been over a week since I had last seen it, and that the battery was good but it wouldn’t last forever, I decided to pull out all the stops to locate it.

The BTLExplorer screen as it detects my Fitbit One.

I figured there had to be an app similar to those I’d seen for scanning Wi-Fi and Bonjour but for Bluetooth, but searching for “bluetooth locator”, “bluetooth search”, and so on led to nothing helpful—there is a growing number of apps that help you search for headsets or objects to which you’ve attached a Bluetooth Low Energy (BLE) tag, but I couldn’t find anything that did a simple scan of Bluetooth devices in range.

I turned to Google at that point, and decided to search for “locate lost fitbit bluetooth”.  The second item in the results was this blog post, which turned up a free app called BTLExplorer.  I installed it, ran it, and straight away it detected my Fitbit!

What followed was an ultra-modern version of “Marco Polo” or “Hot or Cold”.  I wandered around the house watching the indicated signal strength rising and falling, trying to get closer to where it was hiding.  Eventually, I found the room where the strength was intermittently rising above -60 dBm, and sure enough, under a cushion, was my Fitbit One!
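For the curious, the “Hot or Cold” game can be sketched in a few lines of Python.  This is a rough illustration only, not anything BTLExplorer or the Fitbit actually does: it uses the common log-distance path loss rule of thumb, and the reference strength of -59 dBm at one metre, the path loss exponent of 2.0, and the 3 dB threshold are all assumed values that would need calibrating for a real device and a real house (walls and cushions make the numbers jump around a lot).

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Very rough distance estimate from signal strength.

    Log-distance path loss model; both default parameters are assumed
    calibration values, not figures published for any particular device.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def hot_or_cold(previous_dbm, current_dbm, threshold_db=3.0):
    """Compare two readings; small changes are treated as noise."""
    change = current_dbm - previous_dbm
    if change >= threshold_db:
        return "warmer"
    if change <= -threshold_db:
        return "colder"
    return "about the same"


print(hot_or_cold(-80.0, -70.0))           # warmer
print(round(estimate_distance_m(-60.0), 1))  # roughly a metre away
```

A stronger (less negative) reading means you are probably getting closer, which is all the wandering-around method really relies on.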

Now I can resume the monitoring of my activity levels.  In addition, my fruitless searching of the Apple App Store has made me realise that the App Store app on the iPhone is pretty useless for searching for apps: turns out there are a few other apps similar to BTLExplorer, but because I didn’t search for “bluetooth scanner” or “bluetooth explorer” I didn’t find them.

So far I’m pretty impressed with the Fitbit technology, even though it’s not that much more than a fancy pedometer.  While the device is pretty cool most of the intelligence of the system is in the app and the website, which analyse and interpret the data gathered by the device itself.  It is pretty nicely integrated: the device itself gets the movement data and syncs to the phone, which you can use to do basic display of the data while entering additional data like weight measurements and food and water consumption; the phone app syncs all that data to the website which does additional analysis and provides more of the social aspects of the system.

I’ll report back on how the Fitbit and its application environment help me with my health transformation!
