Posts Tagged network


Among the coffee mugs in my cupboard at home is one I’ve had for over 20 years.  It was a gift; if I remember right, a semi-joke gift in an office “Secret Santa”.

"Works and plays well with others"

“Works and plays well with others”. O RLY?

The slogan on it reads “Works and plays well with others”, and it’s a reference to one of the standard phrases seen on children’s school report cards.  It’s one of the standard mugs in my hot beverage rotation, and every time I use it I can’t help but think back to when it was new, and of how much has changed since those days.

It’s easy to treat a silly slogan on a coffee mug as little more than just a few words designed to evoke a wry grin from a slightly antisocial co-worker.  Sometimes it can take on a deeper meaning, if you let it.

For the last 6 months or more I’ve been working on transferring the function of our former demonstration facility in Brisbane to a location in Melbourne.  This has been fraught with problems and delays, not the least of which was an intermittent fault on the link into the network our systems are connected to.  In a steady state things would be fine; I could have an IRC client connected to a server in our subnet for days at a time.  When I actually tried to do anything else (SSH, HTTP, etc), within about 5 minutes all traffic to the subnet would stop for a few minutes.  When traffic would pass again, it would stay up for five or so minutes then fail.  Wash, rinse, repeat.

It looked like the problem you get when Path MTU Discovery (PMTUD) doesn’t work and you have an MTU mismatch[1].  I realised that we had a 1000BaseT network that was connected to a 100BaseT switch port, so I went around all my systems and changed the MTU wherever I was trying to use jumbo frames, but that made no difference to the network dropouts.  I found Cisco references to problems with ARP caches filling, but I couldn’t imagine that the network was so big that MAC address learning would be a problem (and if general MAC learning was constrained, why was no-one else having a problem?).

Everything I could think of was drawing blanks.  I approached the folks who run the network we uplink through, and all they said was “our network is fine”.  I was putting up with the problem, thinking that it was just something I was doing and that in time we would change over to a different uplink and we wouldn’t have to worry any more.  My frustration at having to move everything out of the wonderful environment we had in Brisbane down to Melbourne, with its non-functional network, multiplied every time an SSH connection failed.  I actually started to rationalise that it was pointless to continue with setting up the facility in Melbourne; I’d never be able to re-create what I’d built in Brisbane, it would never be as accessible and useful, and besides no-one other than me had ever made good use of the z Systems gear in the Brisbane lab anyway.  Basically, I had lost confidence in myself and my ability to get the network fixed and the Melbourne lab set up.

Confidence is a mental strength, like our muscles which provide our physical strength.  Just like muscle, confidence grows from active use and wastes if underused.  Chemicals can boost it, and trauma can damage it.  Importantly though, confidence can be a huge barrier to a person’s ability to “work and play well with others” — too little confidence and one lacks conviction and decision-making; too much confidence and they appear overbearing and dictatorial.

Last week I was in Singapore for the z/VM, Linux on z, and KVM “T3” event.  Whenever I go to something like this I get fired up by all of the things that I’d like to work on and have running to demo.  The motivation to get new things going in the lab overcame my pessimism about the network connection (and lack of confidence), and I got in touch with the intern in charge of the network we connect through.  All I need, I said, is to look and see what the configuration of the port we connect into looks like.  We agreed to get together when I returned from Singapore, and try to work out the problem.

We got into the meeting, and I went over the problem in terms of how we experience it: a steady state that could last for days, then activity leading to three-minute lockouts.  I asked if I could see the configuration of the port we attached to… after a little bit of discussion about which switch and port we might be on, a few lines of Cisco IOS configuration statements appeared in our chat session.  Straight away I saw:

switchport port-security

W. T. F.

Within a few minutes I had Googled what this meant.  Sure enough, it told the switch to monitor the active MAC addresses on that port and disable the port if “unknown” MACs appear.  There were no configured MACs, so it just remembered the first one it saw.  It explained why I could have a session running to one system (the IRC server) for ages, and as soon as I connected to something else everything stopped — the default violation mode is “shutdown”.  It explained why the traffic would stay down for three minutes and then begin again — elsewhere in the switch configuration was this:

errdisable recovery cause psecure-violation
errdisable recovery interval 180

If the switch disabled a port due to port-security violation, it would be automatically recovered after 180 seconds.
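The behaviour described above can be mimicked with a toy model.  This is only a sketch under stated assumptions, not how the switch is implemented: the `SecurePort` class, the one-MAC limit, the MAC addresses and the timestamps are all invented for illustration, and it assumes the learned address is forgotten once the port recovers.

```python
from dataclasses import dataclass, field


@dataclass
class SecurePort:
    """Toy model of a switch port with port security in 'shutdown' mode."""
    max_macs: int = 1             # assumed limit when no MACs are configured
    recovery_secs: int = 180      # automatic recovery interval, in seconds
    learned: set = field(default_factory=set)
    disabled_until: float = 0.0   # time at which a disabled port comes back

    def frame(self, src_mac: str, now: float) -> bool:
        """Return True if a frame from src_mac at time `now` is forwarded."""
        if now < self.disabled_until:
            return False                   # port is still disabled
        if self.disabled_until:
            # Port has just recovered: assume learning starts afresh.
            self.learned.clear()
            self.disabled_until = 0.0
        if src_mac in self.learned:
            return True
        if len(self.learned) < self.max_macs:
            self.learned.add(src_mac)      # first MAC seen is remembered
            return True
        # Unknown MAC beyond the limit: violation, so shut the port down.
        self.disabled_until = now + self.recovery_secs
        return False


port = SecurePort()
events = [
    (0, "02:00:00:00:00:01"),     # long-running IRC session: MAC is learned
    (3600, "02:00:00:00:00:01"),  # still fine an hour later
    (3610, "02:00:00:00:00:02"),  # SSH to a second system: violation
    (3700, "02:00:00:00:00:01"),  # everything is down, IRC included
    (3790, "02:00:00:00:00:01"),  # 180 seconds after the violation
]
results = [port.frame(mac, t) for t, mac in events]
print(results)
```

The pattern in `results` matches the symptom: one session runs happily for ages, a second source address takes the whole port down, and traffic resumes three minutes later.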

The guys didn’t really understand what this all meant, but it made sense to me.  Encouraged by my confidence that this was indeed the problem, they gave me the passwords to log on to the switch and do what I thought was needed to remove the setting.  A couple of “no” commands later and it was gone… and our network link has functioned perfectly ever since.

The real mystery for the other network guys was: why has this suddenly become a problem?  None of them had changed the network port definition, so as far as anyone knew the port was always configured with Port Security.  The answer to this question is, in fact, on our side.  To z/VM and Linux on z Systems people, networks come in two modes: “Layer 3” or “IP” mode, where the system only deals with IP addresses, and “Layer 2” or “Ethernet” mode, where the system works with MAC addresses.  In Layer 3 mode, all the separate IP addresses that exist within Linux and z/VM systems actually exist behind the MAC address of the mainframe OSA card.  In Layer 2 mode however, each individual Linux guest or z/VM stack gets its own MAC address.  When we first set up this network link and the z/VM and Linux systems there the default operating mode was Layer 3, so the network switch only saw one or two MAC addresses.  Nowadays though the default mode is Layer 2.  When I built new systems for moving everything down from Brisbane, I built them in Layer 2 mode.  Suddenly the network switch port was seeing dozens of different MAC addresses where it used to only see one or two, and Port Security was being triggered constantly.
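A rough way to picture the difference from the switch’s point of view is sketched below.  The function name, the MAC addresses and the guest count of 24 are all made up for the example; the only thing taken from the post is that Layer 3 hides every guest behind the OSA card’s MAC while Layer 2 gives each guest its own.

```python
def macs_seen_by_switch(guest_macs, osa_mac, layer2):
    """Return the set of source MACs a switch port would learn."""
    if layer2:
        # Layer 2: every guest or stack presents its own MAC address.
        return set(guest_macs)
    # Layer 3: all the guest IP addresses sit behind the OSA card's MAC.
    return {osa_mac}


OSA_MAC = "02:00:00:00:00:ff"                                   # hypothetical
GUEST_MACS = [f"02:00:00:00:00:{n:02x}" for n in range(1, 25)]  # 24 guests

layer3_view = macs_seen_by_switch(GUEST_MACS, OSA_MAC, layer2=False)
layer2_view = macs_seen_by_switch(GUEST_MACS, OSA_MAC, layer2=True)
print(len(layer3_view), len(layer2_view))
```

A port-security limit sized for the first case is exceeded as soon as a second guest talks in the second case.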

This has been a learning experience for me.  Usually I don’t have any trouble pointing out where I think a problem exists and how it needs to be fixed.  Deep down I knew the issue was outside our immediate network and yet this time for some reason I lacked the ability, motivation, nerve, or whatever, to chase the folks responsible and get it fixed.  The prospect of trying to work with a group of guys who, based on their previous comments, really strongly thought that their gear was not the problem, was so daunting that it became easier to think of reasons not to bother.  Maybe it’s because I didn’t know for certain that it wasn’t something on our side — there is another problem in our network that definitely is in our gear — so I kept looking for a problem on our side that wasn’t there.

For the want of a 15-minute phone meeting, we had endured months of a flaky network connection.

On this occasion it took me too long to become sufficiently confident to turn my thoughts into actions.  Once I got into action though, it was the confidence I displayed to the network team that got the problem fixed.   For me, lesson learned: sometimes I need a prod, but I am one who “works and plays well with others”.


[1] I get that all the time when using the Linux OpenVPN client to connect to this lab, and got into the habit of changing the MTU manually.  Tunnelblick on the Mac doesn’t suffer the same problem, because it has a clever MTU monitoring feature that keeps things going.


Another IPv6 instalment (subtitled: Watch Your Tech Library Currency!)

I made a somewhat cryptic tweet a little while ago about how I spent a crazy-long period of time researching what was, I believed, the next-big-thing in DNS resolution for IPv6 (or so my 2002 edition of “IPv6 Essentials” told me).  I could not work out why I saw nothing about A6 records in any of the excellent Hurricane Electric IPv6 material or in any other documentation I came across.

The answer should have been obvious: DNS A6 records (and the use of DNAME records that went with them for IPv6) never caught on.  RFC 3363 recommended that the RFC that defined A6 (RFC 2874) be moved to Experimental status, and that DNAME not be used in the IPv6 reverse tree.  If I hadn’t been using an old edition of the IPv6 book, I might never have even known of the existence of A6 and would not have wasted any time.

In my previous post on IPv6 I theorised that we are in the early-adoption phase of IPv6 where things aren’t quite baked, and yet now I’ve picked up a nine-year-old text on the topic and acted all surprised when it got something wrong.  It was a bit stupid of me; had I bought a book about IPv4 in 1976, might it have been similarly out of date in 1985?

As always though I’m richer for the experience!  Or so I thought…  Like many, I’m becoming increasingly time-poor.  When I bought a book on IPv6 some years ago I thought I was making an investment, but it turned out that my investment actually cost me in several ways:

  1. The book took up physical space in my bookshelf for all that time I wasn’t using it
  2. I didn’t actually use the information at the time I acquired it
  3. The time I could have got value from it was wasted by it idly sitting on the shelf
  4. Once I did try to use it, it actually cost me time rather than saved time

I came to think about the other books on my shelf.  It’s pretty easy to recognise that a book that proclaims to be up-to-date because it “Now covers Red Hat 5.2!” will be anything but.  Also, from the preface of a Perl programming book that says “this was written about Perl 5.8, but it should apply to 5.10 as well” I’ll be forewarned that things will be fairly applicable to 5.12 but maybe not to Perl 6 when it’s out.

Technology usually has a somewhat abbreviated lifespan, so the corresponding documentation will have a correspondingly short lifespan…  Here, however, is an example of a technology that will have a far greater lifespan (we hope) than much of the documentation that currently exists around it.  I emphasise “currently exists”, because it won’t always be that way: IPv4 was pretty well-baked by the time I had anything to do with it, so I could have bought a book on IPv4 with next to no concern that it was going to lead me astray (indeed, I bought W. Richard Stevens’ TCP/IP programming texts during the 1990s, and still use them to this day).  I keep forgetting that I’m at a completely different point on the IPv6 adoption curve, and the “experts” are learning along with me.

So, a new tech library plan then:

  • Reduce dependence on physical books (okay, this one is already a work-in-progress for me) — they don’t come with you on your travels as easily, and (more important in this context) they’re harder to keep up to date.
  • Before regarding the book on the shelf as authoritative, check its publication date.  If it’s more than three years old, depending on the subject matter it might be out of date.  Check if there’s a new edition available, and consider updating.  If there’s no new edition, check for recent reviews (Amazon, etc).  Someone who just bought it last month might have posted an opinion on its currency.
  • If you have to buy a paper book, don’t buy a book on any technology that is a moving target.  On the same shelf as my copy of “IPv6 Essentials” there is a book entitled “Practical VoIP Using VOCAL”.  I never even installed VOCAL, and I’m sure many current VoIP practitioners never heard of it.  (Side note: I think it’s strange that I bought that book, and a Cisco one, but still to this day have never owned a book on Asterisk.  Maybe I have some kind of inability to pick the right nascent-technology book to buy.)
  • Use bookmarking technology more! I have a Delicious account, and I went through a phase of bookmarking everything there.  I realise now that, if I was a bit more disciplined, I could actually use it (or a system like it, depending on what Yahoo! does to it) as my own personal index to the biggest tech library in existence: the Internet.

That first point is harder than it sounds (especially for someone like me who has a couple of books on his shelf with his name on the cover).  My Rich Stevens books are littered with sticky-note bookmarks for when I flick to-and-fro between different programming examples.  Electronic readers are still not there when it comes to the “handy-hints-I-keep-on-my-lap-while-coding” aspect of book ownership.

I have a Sony Reader which I purchased with the intent of making it my mobile tech library.  It’s just not that great for tech documents though, since it doesn’t render diagrams and illustrations well (it also isn’t ideal for PDFs, especially in A4 ratio).  This may change as publishers of tech docs start releasing more titles on e-reader formats like ePub.  The iPad is working much better for tech library tasks; I’m using an app called GoodReader which renders PDFs (especially RedBooks!) quite well and has good browsing and syncing capability as well.

More on these topics later, I’m sure!

Update: I omitted another option in my “tech library plan”: since IPv6 Essentials is an O’Reilly book, I could have registered with their site to get offers on updating to new editions.  Had I done so, the events of this post might not have happened!  Now that I’ve registered my books with O’Reilly, I’m getting offers of 40% off new paper editions and 50% off e-book editions.  Also, in line with my reduce-paper-book-dependence policy, I can “upgrade” any of the titles I own in paper to e-book for US$4.99.  If you haven’t already, I encourage anyone who has O’Reilly books that they rely on as part of their tech library to register them on the O’Reilly site.  (This is an unsolicited endorsement from a happy customer, nothing more!)



Two of the four keynotes at LCA 2011 referenced the depletion of the IPv4 address space (and I reckon if I looked back through the other two I could find some reference in them as well).  I think there’s a good chance Geoff Huston was lobbying his APNIC colleagues to lodge the “final request” (for the two /8s that triggered the final allocation of the remaining 5, officially exhausting IANA) a week earlier than they did, as it would have made the message of his LCA keynote a bit stronger.  Not that it was a soft message: we went from Vint Cerf the day before, who said “I’m the guy who said that a 32-bit address would be enough, so, sorry ’bout that”, to Geoff Huston saying “Vint Cerf is a professional optimist.  I’m not.”.  But I digress…

I did a bit of playing with IPv6 over the years, but it was too early and too broken when I did (by “too broken” I refer to the immaturity of dual-stack implementations and the lack of anything actually reachable on the IPv6 net).  However, with the bell of IPv4 exhaustion tolling, I had another go.

Freenet6, which now goes alternatively as gogonet or gogo6, was my first port of call.  I had looked at Gogo6 most recently, and still had an account.  It was just a matter of deciding whether or not I needed to make a new account (hint: I did) and reconfiguring the gw6c process on my router box.  Easy-as, I had a tunnel; better still, my IPv6-capable systems on the LAN also had connectivity thanks to radvd.  From Firefox (and Safari, and Chrome) on the Mac I could score both 10/10 scores!

My joy was short-lived, however.  gw6c was proving to be about as stable as a one-legged tripod, and not only that, Gogo6 had changed the address range they allocated me.  That wouldn’t be too bad, except that all my IPv6-capable systems still had the old address and were trying to use it (it looks like IPv6 auto-configuration doesn’t un-configure an address that’s no longer valid, at least by default).  I started to look for possible alternatives.

Like many who’ve looked at IPv6 I had come across Hurricane Electric — in the countdown to IPv4 exhaustion I used their iOS app “ByeBye v4”.  They offer free v6-over-v4 tunneling, and the configuration in Gentoo is very simple.  I also get a static allocation of an IPv6 address range that I can see in the web interface.  The only downside I can see is that I had to nominate which of their locations I wanted to terminate my tunnel; they have no presence in Australia, the geographically-nearest location being Singapore.  I went for Los Angeles, thinking that would probably be closest network-wise.  The performance has been quite good, and it has been quite reliable (although I do need to set up some kind of monitoring over the link, since everything that can talk IPv6 is now doing so).

In typical style, after I’d set up a stable tunnel and got everything working, I decided to learn more about what I’d done.  What is IPv6 anyways?  Is there substance to the anecdotes flying around that are saying that “every blade of grass on the planet can have an IPv6 address” and similar?  Well, a 128-bit address provides for an enormous range of addresses.  The ZFS guys are on the same track — ZFS uses 128-bit counters for blocks and inodes, and there have been ridiculous statements made about how much data could theoretically be stored in a filesystem that uses 128-bit block counters.  To quote the Hitchhiker’s Guide to the Galaxy:

Space is big. Really big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist’s, but that’s just peanuts to space.

The Guide, The Hitchhiker’s Guide To The Galaxy, Douglas Adams, Pan Books 1979

Substitute IPv6 (or ZFS) for space.  To try and put into context just how big the IPv6 address range is, let’s use an example: the smallest common subnetwork.

When IPv4 was first developed, there were three address classes, named, somewhat unimaginatively, A, B and C.  Class A was all the networks from 1.x.x.x to 126.x.x.x, and each had about 16 million addresses.  Class B was all the networks from 128.0.x.x to 191.255.x.x, each network with 65 534 usable addresses.  Class C went from 192.0.0.x to 223.255.255.x, and each had 254 usable addresses.  Other areas, such as 0.x.x.x, 127.x.x.x (loopback) and the networks from 224.x.x.x up, have been reserved.  So, in the early days, the smallest network of hosts you could have was a network of 254 hosts.  After a while IP introduced something called Classless Inter-Domain Routing (CIDR), which meant that the fixed boundaries of the classes were eliminated and it became possible to “subnet” or “supernet” networks: divide or combine the networks to make networks that were just the right size for the number of hosts in the network (and, with careful planning, could be grown or shrunk as plans changed).  With CIDR, since the size of the network was now variable, addresses had to be written with the subnet mask.  A format known as “CIDR notation” came into use, where an address would have the number of network bits written after the address, for example 192.168.1.0/24 for a network with a 24-bit mask.
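The class sizes and the subnet/supernet idea can be checked with Python’s standard `ipaddress` module.  This is a minimal sketch; the specific networks are private ranges picked for the example and are not taken from anything in this post.

```python
import ipaddress

# One illustrative network per historical class (arbitrary private ranges).
examples = {
    "A": "10.0.0.0/8",
    "B": "172.16.0.0/16",
    "C": "192.168.1.0/24",
}

usable = {}
for cls, cidr in examples.items():
    net = ipaddress.ip_network(cidr)
    # Subtract the network and broadcast addresses to get usable hosts.
    usable[cls] = net.num_addresses - 2
    print(f"Class {cls}: {cidr}  mask {net.netmask}  usable hosts {usable[cls]}")

# CIDR lets a network be divided ("subnetted") into smaller ones...
class_c = ipaddress.ip_network("192.168.1.0/24")
quarters = list(class_c.subnets(new_prefix=26))
print([str(n) for n in quarters])

# ...or combined with its neighbour ("supernetted") into a larger one.
combined = ipaddress.ip_network("192.168.0.0/24").supernet(new_prefix=23)
print(combined)
```

The number after the slash is simply the count of leading bits that identify the network, which is why a bigger number means a smaller network.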

Fast-forward to today, with IPv6…  IPv4’s CIDR notation is used in IPv6 (mostly because the masks are so huge).  In IPv6, the smallest network that can be allocated is what is called a “/64”.  This means that out of the total 128-bit address range, 64 bits represent what network the address belongs to.  Let’s think about that for a second.  There are 32 bits in an IPv4 address — that means that the entire IPv4 Internet would fit in an IPv6 network with a /96 mask (128-32=96).  But the default smallest IPv6 subnet is /64 — the size of the existing IPv4 Internet squared!
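The arithmetic is easy to verify.  A quick sketch, using the `2001:db8::` documentation prefix as a stand-in (it is not a range I was actually allocated):

```python
import ipaddress

ipv4_internet = 2 ** 32   # every possible 32-bit IPv4 address

# Stand-in networks from the IPv6 documentation prefix.
slash_96 = ipaddress.ip_network("2001:db8::/96")
slash_64 = ipaddress.ip_network("2001:db8::/64")

hosts_in_96 = slash_96.num_addresses
hosts_in_64 = slash_64.num_addresses

# A /96 leaves 128 - 96 = 32 host bits: exactly one IPv4 Internet.
print(hosts_in_96 == ipv4_internet)

# A /64 leaves 64 host bits: the IPv4 Internet, squared.
print(hosts_in_64 == ipv4_internet ** 2)
print(f"{hosts_in_64:,} addresses in a single /64")
```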

Wait a second though, it gets better…  When I got my account with Gogo6, they offered me up to a /56 mask: that’s a range that covers 256 /64s, or 256 Internet-squareds!  Better still, the Hurricane Electric tunnel-broker account gave me one /64 and one /48.  Sixty-five thousand networks, each the size of the IPv4 Internet squared!  And how much did I pay for any of these allocations?  Nothing!
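Counting the /64s inside each allocation is a matter of how many bits sit between the allocation’s prefix length and 64.  A small sketch follows; the helper name is invented and the prefixes are again from the documentation range rather than my real allocations.

```python
import ipaddress


def count_slash64s(prefix: str) -> int:
    """Number of /64 networks contained in the given IPv6 prefix."""
    net = ipaddress.ip_network(prefix)
    return 2 ** (64 - net.prefixlen)


allocations = {"/56": "2001:db8::/56", "/48": "2001:db8::/48"}
counts = {name: count_slash64s(prefix) for name, prefix in allocations.items()}
print(counts)

# Cross-check the /56 figure by actually enumerating its /64 subnets.
enumerated = sum(
    1 for _ in ipaddress.ip_network("2001:db8::/56").subnets(new_prefix=64)
)
print(enumerated)
```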

I can’t help but think that folks are repeating similar mistakes from the early days of IPv4.  A seemingly limitless address range (Vint said that 32 bits would be enough, right?) was given away in vast chunks.  In the early days of IPv4 we had networks with two or three hosts on them using up a Class C because of the limitations of addressing — in IPv6 we have LANs of maybe no more than a hundred or so machines taking up an entire /64 because of the way we designed auto-configuration.  IPv6 implementations now will be characterised not by how well their dual-stack implementations work, or how much more secure transactions have become thanks to the elimination of NAT, but by how much of the addressable range they are wasting.  So, is IPv6 just Same Sh*t, Different Millennium?

Like the early days of IPv4 though, things will surely change as IPv6 matures.  I guess I’m just hoping that the folks in charge are thinking about it, and not just high on the amount of space they have to play with now.  Because one day all those blades of grass will want their IP addresses, and the Internet had better be ready.

Update 16 May 2011: I just listened to Episode 297 of the Security Now program…  Steve Gibson relates some of his experience getting IPv6 allocation from his upstream providers (he says he got a /48).  In describing how much address space that is, he made the same point (about the “wasteful” allocation of IPv6).  At about 44:51, he starts talking about the current “sky is falling” attitude regarding IPv4, and states “you’d think, maybe they’d learn the lesson, and be a little more parsimonious with these IPs…”.  He goes on to give the impression that the 128-bit range of IPv6 is so big that there’s just no need to worry about it.  I hope you’re right, Steve!


Sharing an OSA port in Layer 2 mode

I posted on my developerWorks blog about an experience I had sharing an OSA port in Layer 2 mode.  Thrilling stuff.  What’s more thrilling is the context of where I had my OSA-port-sharing experience: my large-scale Linux on System z cloning experiment.  One of these days I’ll get around to writing that up.


Network virtualisation

I’ve been doing a lot of mucking around with KVM with libvirt (I keep promising an update here, don’t I).  In my desktop virtualisation requirements I had a need for presenting VLAN traffic to guests: simple enough, and I’ve done it before.  You can do what I usually do, and configure all your VLANs against the physical interface then create a bridge for each VLAN you want to present to a guest.  The guest then attaches to the bridge appropriate to the VLAN it wants access to, with no need to configure 8021q.
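As a rough illustration of the bridge-per-VLAN layout, the sketch below only builds and prints the commands; it runs nothing.  It assumes the iproute2 `ip link` syntax, which may not be the tooling used at the time, and the interface name `eth0`, the `brNNN` naming scheme and the VLAN IDs are all hypothetical.

```python
def vlan_bridge_commands(nic, vlan_ids):
    """Build iproute2 commands for one VLAN interface and bridge per VLAN."""
    cmds = []
    for vid in vlan_ids:
        vlan_if = f"{nic}.{vid}"   # tagged sub-interface, e.g. eth0.100
        bridge = f"br{vid}"        # bridge that guests attach to untagged
        cmds += [
            f"ip link add link {nic} name {vlan_if} type vlan id {vid}",
            f"ip link add name {bridge} type bridge",
            f"ip link set dev {vlan_if} master {bridge}",
            f"ip link set dev {vlan_if} up",
            f"ip link set dev {bridge} up",
        ]
    return cmds


commands = vlan_bridge_commands("eth0", [100, 200])
for line in commands:
    print(line)
```

Because the tag is stripped at the VLAN sub-interface, a guest attached to `br100` sees plain Ethernet frames and needs no VLAN configuration of its own.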

(The other method of combining VLAN-tagging and bridging is to bridge the physical interface first, then create VLANs on the bridge.  I couldn’t work out how to get VLAN-unaware guests attached to this kind of setup, and it didn’t work for me even to give IP access to the host using a br0.100 for example.  Still, it must work for someone as it’s written about a lot…)

I realised that from particular virtual machines I needed to get access to the VLAN tags — I needed VLAN-awareness.  Now I knew up-front that the way I could do this was to just throw another NIC into the machine and either dedicate it to the virtual guest or set up a bridge with VLAN tags intact.  I really wanted to exhaust all possible avenues to solve the problem without throwing hardware around (as I’ve been doing a bit of that recently, I have to admit).

First, I tried to use standard Linux bridges as a solution, but discovered that an interface can’t belong to more than one bridge at a time, which put paid to my plan to have one or more VLAN-untagging bridges and a VLAN-tagged bridge.  I figured it could be done with bridges, but I envisaged a stacked mess of bridge-to-tap-to-bridge-to-tap-to-guest connections and decided that wasn’t the way to go.

Next I checked out VDE, which I had first seen a couple of years ago — but something gave me the impression that VDE either wasn’t really going to give me anything more than bridging would, or was not flexible enough to do what I needed.  I like the distributed aspect of VDE (the D in the name) but I’d rarely use that capability so it wasn’t a big drawcard.  I widened my search, and found two interesting projects — one that eventually became my solution, and another that I think is quite incredible in its scope and capability.

First, the amazing one: ns-3, “a great network simulator for research and education”.  As the name suggests, it simulates networks.  It is completely programmable (in fact your network “scripts” are actually C++ code using the product’s libraries and functions) and can be used to accurately model the behaviour of a real network when faced with network traffic.  The project states that ns-3 models of real networks have produced libpcap traces that are almost indistinguishable from the traces of the real networks being modelled…  I’ll take their word for that, but when you get to configure the propagation delay between nodes in your simulated network it seems to me it’s pretty thorough.  Although the way that I found ns-3 was via a forum posting from someone who claimed to have used it to solve a situation similar to mine, and ns-3 does provide a way to “bridge” between the simulated network and real networks, the simulation aspect of ns-3 seems to be more complexity than I’m looking for in this instance.  It does look like a fascinating tool however, and one I’ll definitely be keeping at least half-an-eye on.

To my eventual solution, then: Open vSwitch.  Designed with exactly my scenario in mind–network connection for virtualisation–it has at least two functions that make it ideal for me:

  • a Linux-bridging compatibility mode, allowing the brctl command to still function
  • IEEE 802.1Q VLAN support (innovatively at that)

The Open vSwitch capability can be built as a kernel module (there’s a second module that supports the brctl compatibility mode), or, in very recent versions, run in user-space (with a corresponding performance drop).

On the surface, configuring an OvS bridge does seem to result in something that looks exactly like a brctl bridge (especially if you use brctl and the OvS bridging compatibility feature to configure it), but its native support for VLANs is where it really comes into its own for me.  In summary, for each “real” bridge you configure in OvS, you can configure a “fake” bridge that passes through packets for a single VLAN from the real bridge (the “parent” bridge).  This is exactly what I needed!
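A sketch of what the real-plus-fake bridge setup might look like is below.  It only prints `ovs-vsctl` commands rather than running them; the bridge name `br0`, the NIC `eth0`, the `vlanNNN` naming and the VLAN IDs are placeholders, not my actual configuration.

```python
def ovs_fake_bridge_commands(parent, nic, vlan_ids):
    """Build ovs-vsctl commands: one real bridge plus a fake bridge per VLAN."""
    cmds = [
        f"ovs-vsctl add-br {parent}",          # the "real" (parent) bridge
        f"ovs-vsctl add-port {parent} {nic}",  # physical NIC carrying the tags
    ]
    for vid in vlan_ids:
        # add-br with a parent and a VLAN ID creates a "fake" bridge that
        # carries only that VLAN's traffic from the parent bridge.
        cmds.append(f"ovs-vsctl add-br vlan{vid} {parent} {vid}")
    return cmds


ovs = ovs_fake_bridge_commands("br0", "eth0", [100, 200])
for line in ovs:
    print(line)
```

VLAN-aware guests would attach to `br0` and see the tags; VLAN-unaware guests would attach to `vlan100` or `vlan200`.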

For the guest interfaces that needed full VLAN-awareness, I simply provided the name of my OvS bridge as the name of the bridge for libvirt to connect the guest to–OvS bridge-compatibility mode took care of the brctl commands issued in the background by libvirt.  The VLAN-unaware guest interfaces presented a bit of a challenge–the OvS “fake” bridge does not present itself like a Linux bridge, so it doesn’t work with libvirt’s bridge interface support.  This ended up being moderately easy to overcome as well, thanks to libvirt’s ability to set up an interface configured by an arbitrary script–I hacked the supplied /etc/qemu-ifup script and made a version that adds the tap interface created by libvirt to the OvS fake bridge.
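The job of the hacked interface-up script can be sketched as follows.  This is not the script itself, just an outline under assumptions: that the script is handed the tap interface name, that bringing the tap up and adding it as a port on the fake bridge is all that is needed, and that the names `vnet0` and `vlan100` stand in for whatever the real ones are.

```python
def ifup_commands(tap, fake_bridge):
    """Commands an interface-up hook might run for a VLAN-unaware guest."""
    return [
        # Bring the tap device up with no address of its own.
        ["ip", "link", "set", "dev", tap, "up"],
        # Attach the tap to the OvS fake bridge for the guest's VLAN.
        ["ovs-vsctl", "add-port", fake_bridge, tap],
    ]


hook = ifup_commands("vnet0", "vlan100")
for cmd in hook:
    print(" ".join(cmd))
```

A real hook would take the tap name from its first argument and execute each command (for example with `subprocess.run(cmd, check=True)`) instead of printing it.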

The only thing I might want from this now is an ability for an OvS bridge to have visibility over a subset of the VLANs presented on the physical NIC.  The OvS website talks about extensive filtering capability though, so I’ve little doubt that the capability is there and I’m just yet to find it.  From a functionality aspect, OvS is packed to the gills with support for various open management protocols, including something called OpenFlow that I’d never heard of before (but I hope that some certain folks in upstate New York have!) but is apparently an open standard that enables secure centralised management of switches.

Detail of exactly how I pulled this all together will come in a page on this site; I’ll make a bunch of pages that describe all the mucky details of my KVM adventures and update this post with a link, so stay tuned!
