splitbrain.org

electronic brain surgery since 2001

mystic network tales

This was a very strange day. I was at work today. There were a few small, boring things to do, like setting up some new machines and installing Windows XP on them for some new employees. I was busy fitting parts and fighting with a box that refused to boot from the CD-ROM drive when my boss called. We had to discuss a new security service we want to offer. Just a perfectly normal business day…


Well, when I came back to my desk I discovered that the network was somehow slow. Suddenly my home directory wasn't available anymore:

Stale NFS handle

“Oh no!” I thought and went straight to the fileserver's console at our server rack. I had a look but everything was fine: 98% idle, enough free RAM, low load, RAID mounted without errors - everything looked as it should. To be sure, I restarted the NFS kernel server daemon. I went back to my workstation to remount the homes with

mount /home -o remount

But I still got

Stale NFS handle

Meanwhile my colleagues came in reporting that something was wrong with Samba. I went back to the server rack again, restarted Samba and tried to access the files from a colleague's Windows box - everything okay.

But my NFS problem still wasn't solved. Then I discovered the next mystery: when becoming root by using

su

I could access the homes which weren't accessible as a normal user - very strange. I thought it might be a problem with the LDAP service, which also runs on the fileserver for group and user authentication. Tired of walking back and forth between my workstation and the server rack, I logged in remotely. I did a

ps aux

and noticed [i]a lot[/i] of

slapd

processes running. “Gotcha!” I thought and restarted the LDAP server. But this didn't solve anything. Setting slapd's debug level to 256 revealed constant requests to the LDAP server, but with [i]very strange[/i] filters. However, I couldn't see where they were originating from.

This was when I started to think that there might be a problem with one of the libnss_ldap modules on one of the clients. Maybe it was running amok, producing these requests? So I started to test this hypothesis by rebooting some machines, including my own. But nothing worked - the strange requests remained.

I logged in to the fileserver again and discovered that the RAID wasn't mounted anymore. This was strange. A

dmesg

showed me that the box had just gone through a boot process, and

uname -a

revealed that it was running the wrong kernel: 2.4.22, which doesn't work with the IDE controller. That explained the unmounted RAID (which is a software RAID). But who the fuck booted the fileserver? I certainly did not! I asked my colleagues, but nobody had done it. Hmm, so it must have rebooted itself, I thought.

Clueless, I walked over to the server rack again. But what was that? The console was still open and showed my last commands? How the hell? I thought it had just booted? I tried a

uname -a

- Kernel 2.4.20!

mount

- RAID mounted! Okay - I thought - that's it. I have finally gone mad. Strange feeling. I should get a sane person to help me - so I went to my boss and asked him whether he saw what I saw. At my desk: logged in to fileserver, IP 192.168.0.75, kernel 2.4.22. At the server rack: logged in to fileserver, IP 192.168.0.75, kernel 2.4.20. Yes, he saw the same. Phew… not mad… but the world had gone crazy…

The next thing we did: pull the network cable out of the fileserver, switch to another server at the KVM and log in to the fileserver from there. Yes, it still worked. That's an interesting case of a virtual server :-)

Okay, so there's a second fileserver, but where? I started to ping the thing and follow the periodically blinking lights on the switches to find the room. Well, my boss was faster: he found it by just running around (we only have a few rooms).

And there it was: the box I had been assembling in the morning, the one with the broken CD-ROM drive. It had simply booted from the newly installed hard drive. Yes - you've already guessed it. This hard drive was the old system disk of the fileserver! Still configured with the same fucking IP and connected to the network! AAARGHH!