Y2K a Design Flaw - Thus a learning Issue LO22778

Tom Christoffel (tjcdsgns@shentel.net)
Sat, 2 Oct 1999 10:39:40 -0400 (EDT)

LO Folk:

I'm forwarding in its entirety this lucid description of where we sit
relative to the emergence of computer technology in our society and our
current dependence on it. It is written by Mark Kuharich of The Software
Review. I thought it might be of value as an LO case study. For future
installments you can go to the source. I've been a subscriber for several
months and find Mark's viewpoint useful.

Tom Christoffel <tjcdsgns@shentel.net>

*** THESOFTWAREVIEW post by: Mark Kuharich <mark@mrkint.com>

"the Software View": What, me worrY2K? or How I stopped worrying and
learned to love the Year-2000 problem. (Part I)


Now, dear readers, on with this week's episode of "the Software View"!
Your company is an aircraft carrier, and it could also become your
Titanic. The iceberg is January 1, 2000 - Black Saturday. The good news
is that you can see it coming. The bad news is that you're heading right
for it. The worse news is that you can't turn in time. Now what? The
countdown has begun and we've less than a year until we discover what
"really" happens when the year becomes "00". Trust the software industry
to shorten the "Year 2000" problem to "Y2K". It was this kind of thinking
that caused the problem in the first place. It is The End Of The World As
We Know It (and I feel fine).


As the entire world now knows, there's a problem with computer hardware,
software, and data. Norman Shakespeare writes, "The Year-2000 problem's
root cause is easily explained. The earliest computer programmers had so
little memory to work with that any trick for saving two digits was
worthwhile. In 1950, who was even worried about how computers would
handle data in 2050? The chances that a year entered into corporate
records would need to begin with anything other than "19" seemed quite
remote, so dropping the century digits was adopted as a memory-saving
method. As computers became more powerful, this abbreviated dating
convention continued to be the standard, mostly out of habit.

For many reasons, programmers have routinely used only two digits to
represent the year in dates. Thus, "25" meant 1925. This works fine
until 1999. After that, two-digit dates cause confusion because, if "25"
means 1925, then "00" means 1900. This is called the Year-2000 problem -
or Y2K for short, or sometimes the Millennium Bug (although it's not a bug
at all). Year-2000 is a crisis without precedent in human history. We
know exactly when it is going to occur. We also know that its effects
will be global. We even know what is causing it and what to do about it.
That is right: We can, if we all choose, solve it before it happens,
although we probably will not. But here we are at the end of the
twentieth century, a time when the inability of our machines to answer the
simple question "is it the twentieth century or the twenty-first?" could
result in the collapse of the communications, financial, filing,
monitoring, security, and manufacturing systems that our entire economy
relies upon.
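The arithmetic failure described above - "25" meaning 1925, "00" meaning
1900 - can be sketched in a few lines. (This is an illustration of my own
in Python, not code from any actual legacy system, which would more likely
have been COBOL.)

```python
# A minimal sketch of the legacy convention: years stored as two digits,
# with the century silently assumed.

def years_elapsed(start_yy: int, end_yy: int) -> int:
    """Subtract stored two-digit years directly, as a legacy program might."""
    return end_yy - start_yy

# Works fine within the twentieth century:
print(years_elapsed(25, 99))   # 1925 -> 1999: prints 74, correct

# Breaks the moment the year becomes "00":
print(years_elapsed(25, 0))    # 1925 -> 2000: prints -25, nonsense
```

A person born in 1925 suddenly computes as minus twenty-five years old;
every interest calculation, expiry check, and sort order built on the same
subtraction inherits the same nonsense.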

The timing will be fortunate, giving businesses the weekend to accommodate
the possible onslaught. New Year's Eve 1999 will fall on a Friday
evening. January 1 is a Saturday. So if the world comes to an end for a
couple of days, it'll be okay. We've all had weekends like that.


Ellen Ullman writes, "The real lesson of the Year-2000 problem is that
software operates just like any other natural system: out of control. Y2K
has uncovered a hidden side of computing. It has always been there, of
course, and always will be. It has simply been obscured by the pleasures
we get from our electronic tools and toys, and then lost in the zingy glow
of techno-boosterism. Y2K is showing everyone what technical people have
been dealing with for years: the complex, muddled, bug-bitten systems we
all depend upon, and their nasty tendency toward the occasional disaster.

It is almost a betrayal. After being told for years that technology is
the path to a highly evolved future, it has come as something of a shock
to discover that a computer system is not a shining city on a hill -
perfect and ever new - or a gleaming glass tower of academia, but
something more akin to an old farmhouse built bit by bit over decades by
nonunion carpenters.

The reaction has been anger, outrage even - how could all you programmers
be so stupid? Y2K has challenged a belief in digital technology that has
been almost religious. But it is not surprising. The public has had
little understanding of the context in which Y2K exists. Glitches,
patches, crashes - these are as inherent to the process of creating an
intelligent electronic system as is the beauty of an elegant algorithm,
the satisfaction of a finely tuned program, the gee-whiz pleasure of
messages sent around the world at light speed. Until you understand that
computers contain both of these aspects - elegance "and" error - you can
not really understand Y2K.

Technically speaking, the "millennium bug" is not a bug at all, but what
is called a "design flaw". Programmers are very sensitive to the
difference, since a bug means the code is at fault (the program is not
doing what it was designed to do), and a design flaw means it is the
designer's fault (the code is doing exactly what was specified in the
design, but the design was wrong and/or inadequate). In the case of the
millennium bug, of course, the code was designed to use two-digit years,
and that is precisely what it is doing. The problem comes if computers
misread the two-digit numbers - "00", "01", et cetera. Should these be
seen as 1900 and 1901, or as 2000 and 2001? Two-digit dates were used
originally to save space, since computer memory and disk storage were
prohibitively expensive. The designers who chose to specify these
two-digit "bugs" were not stupid, and perhaps they were not even wrong. By
some accounts and estimates, the savings accrued by using two-digit years
will have outweighed the entire cost of fixing the code for the year 2000.
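The interpretation question posed above - should "00" be read as 1900 or
2000? - had to be answered by some explicit rule. One widely used rule was
the pivot "window"; here is a minimal sketch (the pivot value of 50 is an
assumption for illustration, not a figure from the essay):

```python
# "Windowing": interpret a stored two-digit year against a pivot.
# Pivot of 50 chosen for illustration; real systems picked pivots
# to suit their own data.

PIVOT = 50  # two-digit years below the pivot are read as 20xx

def expand_year(yy: int) -> int:
    """Expand a stored two-digit year to a full four-digit year."""
    if yy < PIVOT:
        return 2000 + yy   # "00".."49" -> 2000..2049
    return 1900 + yy       # "50".."99" -> 1950..1999
```

Note the trade-off: with this pivot, a record genuinely dated 1925 now
reads as 2025. Windowing does not remove the ambiguity; it only moves it
somewhere the designers hope no real data lives - another engineering
trade-off, exactly of the kind the essay describes.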

But Y2K did not even begin its existence as a design flaw. Up until the
mid-1980s - almost thirty years after two-digit years were first put into
use - what we now call Y2K would have been called an "engineering
trade-off," and a good one. A trade-off: To get something you need, you
give up something else you need less urgently; to get more space on disk
and in memory, you give up the precision of the century indicators.
Perfectly reasonable. The correct decision. The surest sign of its
correctness is what happened next: Two-digit years went on to have a long,
successful life as a "standard." Computer systems could not work without
standards - an agreement among programs and systems about how they will
exchange information. Dates flowed from program to program, system to
system, from tape to memory to paper, and back to disk - it all worked
just fine for decades.

Though not for centuries, of course. The near immortality of computer
software has come as a shock to programmers. Ask anyone who was there:
We never expected this stuff to still be around.

Bug, design flaw, side effect, engineering trade-off - programmers have
many names for system defects, the way Eskimos have many words for snow.
And for the same reason: They are very familiar with the thing and can
detect its fine gradations. To be a programmer is to develop a carefully
managed relationship with error. There is no getting around it. You
either make your accommodations with failure, or the work will become
intolerable. Every program has a bug; every complex system has its blind
spots. Occasionally, given just the right set of circumstances, something
will fail spectacularly. There is a Silicon Valley company, formerly
called Failure Analysis, whose business consists of studying system
disasters. The company's sign used to face the freeway like a warning to
every technical person heading north out of Silicon Valley: "Failure Analysis."

No one simply accepts the inevitability of errors - no honest programmer
wants to write a bug that will bring down a system. Both engineers and
technical managers have continually looked for ways to normalize the
process, to make it more reliable, predictable - schedulable, at the very
least. They have talked perennially about certification programs, whereby
programmers would have to prove minimal proficiency in standard skills.
They have welcomed the advent of reusable software components, or
"objects," because components are supposed to make programming more
accessible, a process more like assembling hardware than proving a
mathematical theorem. They have tried elaborate development
methodologies. But the work of programming has remained maddeningly
undefinable, some mix of mathematics, sculpting, scrupulous accounting,
and wily, ingenious plumbing.

In the popular imagination, the programmer is a kind of traveler into the
unknown, venturing near the margin of mind and meatspace. Maybe. For
moments. On some extraordinary projects, sometimes - a new operating
system, a newly conceived class of software. For most of us, though,
programming is not a dramatic confrontation between human and machine; it
is a confused conversation with programmers we will never meet, a
frustrating wrangle with some other programmer's code called maintenance.

Most modern programming is done through what are called application
programming interfaces, or APIs. Your job is to write some code that
will talk to another piece of code in a narrowly defined way using the
specific methods offered by the interface, and only those methods. The
interface is rarely documented well. The code on the other side of the
interface is usually sealed in a proprietary black box. And below that
black box is another, and below that another - a receding tower of black
boxes, each with its own errors. You can not envision the whole tower,
you can not open the boxes, and what information you have been given about
any individual box could be wrong. The experience is a little like looking
at a madman's electronic bomb and trying to figure out which wire to cut.
You try to do it carefully but sometimes things blow up.

At its core, programming remains irrational - a time-consuming,
painstaking, error-stalked process, out of which comes a functional but
flawed piece of work. And it most likely will remain so as long as we are
using computers whose basic design descends from ENIAC, a machine
constructed to calculate the trajectory of artillery shells. A programmer
is presented with a task that a program must accomplish. But it is a task
as a human sees it: full of unexpressed knowledge, implicit associations,
allusions to allusions. Its coherence comes from knowledge structures
deep in the body, from experience, memory. Somehow all this must be
expressed in the constricted language of the API, and all of the
accumulated code must resolve into a set of instructions that can be
performed by a machine that is, in essence, a giant calculator. It should
not be surprising if mistakes are made.

There is irrationality at the core of programming, and there is
irrationality surrounding it from without. Factors external to the
programmer - the whole enterprise of computing, its history and business
practices - create an atmosphere in which flaws and oversights are that
much more likely to occur.

The most irrational of all external factors, the one that makes the
experience of programming feel most insane, is known as "aggressive
scheduling." Whether software companies will acknowledge it or not,
release schedules are normally driven by market demand, not the actual
time it would take to build a reasonably robust system. The parts of the
development process most often foreshortened are two crucial ones: design
documentation and testing. There is a senior consultant - a woman who has
been in the business for some thirty years, someone who founded and sold a
significant software company - who explains why she would no longer work
with a certain client. She had presented a software development schedule
to the client, who received it, read it, then turned it back to her,
asking if she'd remake the schedule so that it took exactly half the time.
There were many veteran programmers in the room; they nodded along in
weary recognition.

Even if programmers were given rational development schedules, the systems
they work on are increasingly complex, patched together - and incoherent.
Systems have become something like Russian nesting dolls or gifts within
gifts, with newer software wrapped around older software, which is wrapped
around software that is older yet. We have come to see that software
programming code does not evolve; it accumulates.

A young Web company founder - the very young Scott Hassan of eGroups.com -
suggests that all programs should be replaced every two years. He is
probably right. It would be a great relief to toss all our old code into
that trash container where we dumped the computer we bought a couple of
years ago. Maybe on the Web we can constantly replenish our code: The
developer never lets go of the software; it sits there on the server
available for constant change, and the users have no choice but to take it
as it comes.

But software does not follow Moore's Law, doubling its power every
eighteen months. It is still the product of a handworked craft, with too
much meticulous effort already put into it. Even eGroups.com, founded
only nine months ago, finds itself stuck with code programmers have no
time to redo. Said Carl Page, another of its founders, "We are living
with code we wish we had done better the first time."

The problem of old code is many times worse in a large corporation or a
government office, where whole subsystems may have been built twenty or
thirty years ago. Most of the original programmers are long gone, taking
their knowledge with them - along with the programmers who followed them,
and ones after that. The code, a sort of palimpsest by now, becomes
difficult to understand. Even if the company had the time to replace it,
it is no longer sure of everything the code does. So it is kept running
behind wrappers of newer code - so-called middleware, or quickly developed
user interfaces like the Web - which keeps the old code running, but as a
fragile, precious object. The program runs, but is not understood; it can
be used, but not modified. Eventually, a complex computer system becomes
a journey backward through time. Look into the center of the most
slick-looking Web banking site, built a few months ago, and you are bound
to see a creaky database running on an aged mainframe.

Adding yet more complexity are the electronic connections that have been
built between systems: customers, suppliers, financial clearinghouses,
whole supply chains interlinking their systems. One patched-together
wrapped-up system exchanges data with another patched-together wrapped-up
system - layer upon layer of software involved in a single transaction,
until the possibility of failure increases exponentially.
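The compounding in that last sentence can be put as simple arithmetic (my
own back-of-envelope figures, not the essay's): if each layer of software
in a transaction works correctly with probability p, then a transaction
crossing n layers succeeds with probability p to the power n.

```python
# Rough illustration: reliability of a chain of software layers.

def chain_reliability(p: float, n: int) -> float:
    """Probability that all n independent layers work, each with probability p."""
    return p ** n

# Even very reliable layers compound quickly:
print(round(chain_reliability(0.99, 1), 3))   # prints 0.99
print(round(chain_reliability(0.99, 50), 3))  # prints 0.605
```

Fifty layers that are each 99 percent reliable yield a transaction that
fails roughly four times in ten - assuming independence, which real
interlinked systems rarely enjoy.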

It is from deep in there - somewhere near the middle-most Russian doll or
gift in the innermost layer of software - that the millennium bug
originates. One system sends it on to the next, along with the many bugs
and problems we already know about, and the untold numbers that remain to
be discovered. One day - maybe when we switch to the new version of the
Internet Protocol, or when some router somewhere is replaced - one day the
undiscovered bugs will come to light and we will have to worry about each
of them in turn. The millennium bug is not unique; it is just the flaw we
see now, the most convincing evidence yet of the human fallibility that
lives inside every system.

It is hard to overstate just how common bugs are. Every week, the
computer trade paper "InfoWorld" prints a little box called "The Bug
Report," showing problems in commonly used software, some of them very
serious. And the box itself is just a sampling from "www.bugnet.com",
where one day's search for bugs relating to "security" yielded a list of
sixty-eight links, many to other lists and to lists of links, reflecting
what may be thousands of bugs related to this keyword alone. And that is
just the ones that are known about and have been reported.

If you think about all the things that can go wrong, it will drive you
crazy. So technical people, who can not help knowing about the fragility
of systems, have had to find some way to live with what they know. What
they have done is develop a normal sense of failure, an everyday
relationship with potential disaster.

Mark Kuharich


Tom Christoffel, AICP * e-mail: tjcdsgns@shentel.net
Futurist, Facilitator & Regional Planner - My mission: "Value Regions
because Regions Work!" Why?
The economy is global; production is local; all markets in-between are
regional. Check my region's website: http://www.lfpdc7.state.va.us
Regional alignment of public data sets enables creativity and supports
*TJCdesigns * Box 1444 * Front Royal, Virginia (VA) 22630-1444 * Ph:


Tom Christoffel <tjcdsgns@shentel.net>

Learning-org -- Hosted by Rick Karash <rkarash@karash.com> Public Dialog on Learning Organizations -- <http://www.learning-org.com>