Vallard's Tech Notes
Enterprise Datacenter Management Voodoo
Oct 23rd
Apple doesn’t ship support for mounting NTFS partitions read-write. You can mount drives, but you can only read from them. Well that’s not good enough. Go here:
No need to reboot, just close the installer when finished, then remount your drive and you’ll be writing like the wind.
I use the same drivers for CentOS/Red Hat to mount NTFS all the time.
Oct 23rd
In Windows I was able to do this from the VMware interface, but on Fusion I have to hard-code it. This link shows how it's done:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001875
Not that cool… But at least it's possible.
Oct 21st
I do a lot of provisioning on my VMware machines to test features of xCAT. VMware's built-in DHCP server conflicts with my own. I turn it off by issuing:
sudo kill -15 `sudo cat /var/run/vmnet-dhcpd-vmnet1.pid`
Then I take DHCP into my own hands.
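If you do this often, the one-liner can be wrapped so that a missing or garbage pid file doesn't turn into a stray kill. A minimal sketch: stop_by_pidfile is a helper name I made up, and it assumes you run it as root (the one-liner above uses sudo for the same reason).

```shell
# Stop a daemon via its pid file. stop_by_pidfile is a hypothetical helper;
# run it as root so the pid file is readable and the signal is permitted.
stop_by_pidfile() {
    pidfile=$1
    # Refuse to continue if the pid file is missing or unreadable
    [ -r "$pidfile" ] || { echo "no pid file: $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # Only accept a purely numeric pid
    case $pid in
        ''|*[!0-9]*) echo "bad pid in $pidfile" >&2; return 1 ;;
    esac
    # Same SIGTERM as the one-liner above
    kill -15 "$pid"
}

# Usage, with the pid file path from the command above:
# stop_by_pidfile /var/run/vmnet-dhcpd-vmnet1.pid
```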
Oct 19th
Here’s the situation:
You have a machine called skull that has access to the internet. However, no one can come into skull from the outside.
You also have a machine that is on a private network with skull called bones.
Finally, you have a third machine out on the internet named benincosa.org from which you want to access bones.
To make this happen, you use an SSH backdoor plus redir to set it up. Here's how it's done:
1. On skull: ssh -R 2222:localhost:2222 benincosa.org
2. On skull: redir --lport=2222 --cport=22 --caddr=bones
3. Now from anywhere: ssh -p 2222 benincosa.org, enter the password for bones, and you will magically find yourself on bones.
That is how it is done my friends.
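For what it's worth, redir isn't strictly needed on skull: the host part of an ssh -R forward can be any machine the SSH client can reach, so skull can aim the forward straight at bones. A minimal sketch that only prints the command rather than running it. reverse_tunnel_cmd is a made-up helper, and the -N flag (no remote command, just hold the forward open) is my addition, not part of the steps above.

```shell
# Build the ssh command for a reverse tunnel that targets an inside host directly.
# reverse_tunnel_cmd is a hypothetical helper; it prints the command, it does not run it.
reverse_tunnel_cmd() {
    public_host=$1          # machine on the internet, e.g. benincosa.org
    remote_port=$2          # port opened on the public host, e.g. 2222
    target_host=$3          # host reachable from where ssh runs, e.g. bones
    target_port=${4:-22}    # defaults to the SSH port
    printf 'ssh -N -R %s:%s:%s %s\n' \
        "$remote_port" "$target_host" "$target_port" "$public_host"
}

reverse_tunnel_cmd benincosa.org 2222 bones
# prints: ssh -N -R 2222:bones:22 benincosa.org
```

One caveat: by default sshd binds remote forwards to the loopback address only (GatewayPorts is off), so the forwarded port is reachable from the public host itself but not from other machines. That is presumably why the second case puts redir on benincosa.org in front of the tunnel.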
Another case:
From the internal firewall machine:
ssh -R 2222:localhost:22 vallard@benincosa.org
On benincosa.org run:
redir --lport=2223 --cport=2222 --caddr=127.0.0.1
Oct 19th
I had to install an old x440 with a SCSI-attached tape drive. I did it like this:
1. Go here:
ftp://ftp.software.ibm.com/storage/devdrvr/Linux/lin_tape_source-lin_taped/
Grab IBMtapeutil, lin_tape-1.*, and lin_taped*
2. Make sure you install dependencies: rpm-build, gcc
Easy way: yum -y install rpm-build gcc kernel-devel
3. rpm -ivh lin_tape-1.27.0-1.src.rpm.bin (why they left the bin extension is beyond me)
4. cd /usr/src/redhat/SPECS
5. rpmbuild -ba lin_tape.spec
6. cd ../RPMS/x86_64/
7. rpm -ivh lin_tape*rpm
8. Now install the lin_taped rpm you downloaded:
rpm -ivh lin_taped-1.27.0*rpm.bin
9. Install the IBMtapeutil tarball you downloaded:
tar xf IBMtapeutil.1.5.1.rhel5.x86_64.tar.bin
cd IBMtapeutil.1.5.1
make install
That’s it!
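Before starting at step 3 it's worth confirming the build tools from step 2 actually landed, since rpmbuild fails in confusing ways without them. A small sketch: missing_tools is a name I made up, and it only checks that the commands are on the PATH, not that kernel-devel matches the running kernel.

```shell
# Report which of the named commands are missing from PATH.
# missing_tools is a hypothetical helper; it prints a space-separated list.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Drop the leading space; prints an empty line if nothing is missing
    echo "${missing# }"
}

need=$(missing_tools rpmbuild gcc)
if [ -n "$need" ]; then
    echo "missing: $need (see step 2: yum -y install rpm-build gcc kernel-devel)"
fi
```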
Oct 17th
Last night I tried my hand at iPhone development. I watched the Stanford classes on iTunes U. The first class by Evan Doll was OK. Since I've been programming for a while, the only part that really helped was the last 15 minutes, where he gave a demo. The demo didn't work for me as shown, and there were also some differences between his Xcode and my setup (either that or it's the new 3.0 stuff). I was able to figure it out and get it working.
2 Gotchas:
1. Make sure you save the .xib file in Interface Builder before compiling and running in Xcode with apple-R.
2. The part I was stuck on was adding the Actions and Outlets. I couldn't do this in the Inspector window like he did. I created the class in the Inspector, but then had to add the Action and Outlets in Tools->Library. Select the class in the list of all the classes and then you can edit it below.
Once that was done, I could follow along with the rest. It was a good way to jump into iPhone development. For me, since I come from the world of no IDEs and just vi, this was a good intro. Hopefully I'll be able to get more into it in the future. It certainly looks like a cool platform to develop on.
Oct 14th
All of a sudden I stopped getting voicemail. I thought perhaps nobody wanted to leave me any messages. I was fine with that. Then I tried it myself. Lo and behold! It had just stopped working! This is yet another example of the lameness of AT&T and the iPhone. I can tell you that people like me will leave AT&T as soon as this exclusivity trash stops. I think Verizon is evil too, but I rarely had service issues with them like I do with AT&T. Now perhaps this would have happened on Verizon's network as well. After all, Apple seems to have had stability issues with the last few months of firmware upgrades… Quite frustrating. I wish I could just go back to June 2009. Things were perfect then. Lately, the network seems slow and the phone is more sluggish. I've reinstalled from scratch a few times without any improvement. More competition will be good…
Oct 13th
In xCAT we’ve been doing ram-root solutions since 2005. We call it ‘stateless’. Since 2005 there have been a lot of other ‘stateless’ solutions that don’t necessarily match our definition. Fair enough. You can call it what you like. Red Hat for example calls ‘stateless’ an NFS root solution.
We have ignored NFS root for a while because ram-root seemed to be the best for HPC applications. The tradeoffs between NFS root and RAM root are the following:
NFS root = more network traffic
RAM root = less RAM left over for your applications.
RAM root has been very good to us because these days with Nehalem processors we normally see servers with 24GB of RAM. So to use 300MB at the most (with a bloated InfiniBand stack in it) doesn’t seem that bad.
However, if we consider a hypervisor running a slew of virtual machines, having a bunch of copies of the same thing in memory doesn't make much sense, especially since with an NFS root the files will be read-only anyway. NFS root also allows us to lock down the machine in a way that RAM root doesn't. But that can create some complexity: with NFS root you need to specify which files are read-only and which ones are writable.
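To put rough numbers on that trade-off, here is some back-of-the-envelope shell arithmetic using the 300MB image and 24GB node quoted above. The guest count of 20 is a number I picked for illustration, and ram_for_guests is a made-up helper.

```shell
# Integer-only arithmetic; the image and RAM sizes are the figures quoted above.
image_mb=300
node_ram_mb=$((24 * 1024))

# Share of a 24GB node used by one RAM-root image, in hundredths of a percent
share=$((image_mb * 10000 / node_ram_mb))
printf 'one image: %d.%02d%% of node RAM\n' $((share / 100)) $((share % 100))
# prints: one image: 1.22% of node RAM

# RAM spent on duplicate images if every guest on a hypervisor carries its own copy
ram_for_guests() {
    echo $(( $1 * image_mb ))
}
echo "20 guests: $(ram_for_guests 20) MB"
# prints: 20 guests: 6000 MB
```

On a bare-metal HPC node the cost is about one percent; on a hypervisor it scales with the number of guests, which is where a shared read-only NFS root starts to look attractive.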
xCAT will soon have an NFS root solution. We are examining other implementations to see what we can ‘steal’.
What I have learned is that you can't just give any standard kernel the nfsroot=<path option>. The modules needed to do NFS mounts are not all built into the kernel, so the kernel dies. That means we have to give it a ramdisk. The secret sauce of xCAT will be in the ramdisk startup file. This is where we can scale, do random waits, and mount everything nicely. We also should put an xCAT client in the initial ramdisk so that we can tell the server where we are. Once you have it all up and going in the ramdisk, the last step is:
exec switch_root -c /dev/console /sysroot /sbin/init
The magic is in between that last step and the kernel loading the initrd.
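To give a feel for what goes in between, here is a rough sketch of two pieces of such a startup file. This is not xCAT's actual script: parse_nfsroot and random_delay are names I made up, the nfsroot value is assumed to be a plain server:/path with no mount options appended, and the od call assumes an od that supports -N and -t (a stripped-down ramdisk userland may not).

```shell
# Extract the value of nfsroot= from a kernel command line string.
# Assumes the simple form nfsroot=server:/path with no extra options.
parse_nfsroot() {
    for word in $1; do
        case $word in
            nfsroot=*) echo "${word#nfsroot=}"; return 0 ;;
        esac
    done
    return 1
}

# Pick a delay of 0 to max-1 seconds so that nodes booting together
# do not all hit the NFS server in the same instant.
random_delay() {
    max=$1
    # Two random bytes read as one unsigned integer
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo $(( n % max ))
}

# Outline of the flow inside the ramdisk (left commented out, since it
# only makes sense when run as the init of an initrd):
# modprobe nfs
# root=$(parse_nfsroot "$(cat /proc/cmdline)") || exec /bin/sh
# sleep "$(random_delay 10)"
# mount -t nfs -o ro,nolock "$root" /sysroot
# exec switch_root -c /dev/console /sysroot /sbin/init
```

The random wait is the scaling piece: a thousand nodes powering on together otherwise mount the same export within the same second.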
Oct 6th
Sometimes I’ll do a basic install without any window manager (like gnome) and then want to add it after I’m done. This is most easily done by doing:
yum -y install gnome-desktop gnome-session gnome-applets gnome-panel gnome-utils gnome-screensaver
Once done do:
startx
And you’re good!
Sep 29th
Last week I researched a few different configuration management tools. Configuration management is the art, or act, of managing lots of computers in some organized fashion. Managing a computer involves deciding what software is put on the machine and also figuring out permissions, environment settings, etc. The problem isn't complex when you deal with maybe 1 or 5 machines. However, when you have a cluster, or a cloud, then having a good way to manage them all becomes very important.
In the world I came from, High Performance Computing, the job was a bit easier because every machine was identical. Every 'node' did the same thing. The only differences were the IP address, MAC address, and hostname. We never did any management other than the initial install plus some postscripts to make sure the nodes were configured perfectly. We could spend a few good solid days making sure our postscripts were perfect. That way if a machine died, or a new one needed to be added, installing it was trivial. Because of this we never needed any post-install configuration management. In addition, the packages required were rather simple because a lot of the required files, libs, and programs were contained on the distributed file system (NFS, GPFS, or some other way).
Another point to all this is that we usually kept our nodes 'stateless', or in other words 'ram-root' as it is called. Ram-root just means that the entire operating system resides in memory. You may say "wow, that's a lot of memory" but keep in mind that the entire OS for HPC environments, including the memory-hogging InfiniBand modules, could be loaded in an image of less than 200MB. So when your modern Nehalem machines are usually equipped with 24GB of RAM, what is a measly 200MB? Plus your system runs better because it's only doing what you want. This is all made possible via xCAT.
But I digress. The world of cloud computing is different. There are different OSes, different applications, and we're dealing with a very heterogeneous environment. Thus configuring the software on all of these machines is not as trivial a problem. It's no longer just one image that you need to be concerned about; it's many!
Rather than creating my own, (which is never a good idea when there are so many great solutions available), I went to take a look at what was out there.
The most promising that I saw were cfengine, Puppet, and bcfg2.
Nevertheless, let me give some info on what I found:
cfengine was created by Mark Burgess. There is an interesting talk he gave to Google that is available on YouTube. cfengine seems to be the most venerable and developed, but from the mailing lists I've read it seems to have lost its luster in favor of Puppet.
Puppet seems to be what all the cool kids are using these days. The web site is very well developed, and the documentation seems to be well organized and far better than that of cfengine or anything else I looked at. This really impressed me: if you want to make a good open source tool that everyone uses, you need to do two things right:
1. You have to present it well on a web site with clear documentation, customer testimonials, and all kinds of good information.
2. You need to make it easy to get, install, and use. IT is too complicated these days. No one wants to spend hours learning something. The easier you make it to use, the more successful it will be.
Puppet may not be better than cfengine (though I think they think it is) and it may not be better than bcfg2. But the presentation is worlds better, and that makes people want to use it. It invites you to use it. xCAT can take a page from that and it’s made me want to double my efforts in revamping the web page.
This shouldn’t be a surprise either. After all, this is what Apple does. They’re a marketing company. Presentation is everything. A good presentation, a good feel, and ease of use will make a tool stand out, even if it isn’t that much better than the rest in the pack.
Part of the marketing is that the person who started Puppet used to code vigorously for cfengine, adding lots of modules before striking out on his own. This gives people the idea that Puppet is the next generation of cfengine. It's a good story. The ease of use is there, and so just on that alone, I can see why it's all the rage nowadays.
bcfg2 or 'bconfig' seems to be the lone wolf of the pack. Its web site even mentions that it doesn't get as much press as it probably should. Well, what do you expect? This is a national lab full of unsexy engineers (no offense, guys/gals). They're engineers developing tools. Having said that, Ti Leggett and I spoke and he showed me all the cool things bcfg2 could do. The modules in there seemed very cool, as did the client/server implementation.
So where does this leave me? Which one do you choose? Well, I hate to say it, but in my situation, I was looking for a solution that could handle an NFS root boot up. It was apparent that they could all handle this in a postscript bring up, but the solutions seemed to fall short when we got a little more specific:
Consider the case of an organization that wants its images locked down (meaning NFS root where nearly everything is read-only and can't be touched). This could be a large global organization, so /etc/resolv.conf in a lab in Spain isn't going to be the same as one in Montreal, even though they're all using the same installation source. Nevertheless you want /etc/resolv.conf to boot up as a non-writable file, preferably NFS mounted. Sure, the user could unmount the file and then change it as root, but no changes they make would stick.
It was a situation such as that where I couldn’t make use of these tools. Perhaps someone knows of a way to do it, but it seems to me that such a tool would need to be integrated into the creation of the ram disk. In addition this global traversing would have to go through a hierarchy of directories:
/foo/globalfiles/
/foo/usafiles/
/foo/newyorkcity/
/foo/datacenter3/
All of these directories may contain an /etc/resolv.conf or SSH known-hosts keys that have to be integrated and concatenated down the hierarchy. Perhaps we could look at it from an object perspective instead, which would let us ask whether a node belongs to a particular class. If so, how do you establish the hierarchy? It didn't seem to me that the above tools could handle that. Maybe I'm wrong.
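The "concatenated down" part is simple enough to sketch in shell. This is just my illustration of the idea, not a feature of any of the tools above: merge_down is a made-up name, and it assumes each level keeps its files under the same relative path and that plain concatenation, most general level first, is the merge rule you want.

```shell
# Concatenate one relative file from each level of a hierarchy,
# most general level first. Levels that lack the file are skipped.
# merge_down is a hypothetical helper.
merge_down() {
    relpath=$1
    shift
    for level in "$@"; do
        if [ -f "$level/$relpath" ]; then
            cat "$level/$relpath"
        fi
    done
}

# Usage with the directories listed above:
# merge_down etc/resolv.conf /foo/globalfiles /foo/usafiles \
#     /foo/newyorkcity /foo/datacenter3 > /sysroot/etc/resolv.conf
```

The hard part is not the concatenation but deciding which levels a given node belongs to, and doing so early enough in the ramdisk that the file is in place before the root is locked down.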
But I think, like a lot of other people, I would go with Puppet. Not because it's technically better but because the crowd mind would look like this:
1. If everyone's doing it, then it's going to stick around and I'm not wasting my time learning a dying tool.
2. It's so easy to learn, because of all this documentation, that it's not going to take me a long time.
Thus we see, my friends, my point: sexy wins.