Vallard's Tech Notes
Enterprise Datacenter Management Voodoo
Aug 31st
So the first two days of VMworld have exceeded my expectations. It wasn’t so much the sessions (though they were pretty good), and it wasn’t the partner super session. The super sessions basically verbalized what is on everybody’s mind right now. And that’s what makes it exciting: it’s the vibe in the air. Everyone knows there’s a big change happening in the industry, and with this big change, we all sense opportunities.
The cool thing about these opportunities is that the field is wide open. Certainly VMware has a giant lead over everyone, as the Gartner Magic Quadrant they’re hyping on their home page right now confirms. And the way VMware lets partners build their own solutions on top of its products gives it a sizable advantage. However, there are other great things afoot, as shown by the vendors on the exhibit floor below, where I’m writing this.
So here are the coolest things so far:
1. The labs: They took a risk and put the labs in the cloud. I remember trying to do something like this at IBM and thinking it opened up a whole new realm of possibilities. Now the labs don’t need to be given strictly at the conference; in fact, there’s no reason VMware couldn’t offer these labs outside of VMworld. A fantastic investment. They’ve had some snags (slow networks, machines not provisioning, etc.), but it is usable, and they announced that 3,800 labs were deployed.
2. Meeting people whose writing we’ve been reading, and hearing ideas from whoever I’m sitting next to. This has been great and the ideas are just flowing. There are lots of brilliant people here who freely share ideas. I really dig this.
3. The ability to sign up and talk to technical experts: a one-on-one, 15-minute session to get questions answered. This has been great.
4. Seeing all my clients, partners, and longtime friends. This is really what it’s all about: meeting people, getting contacts and leads, and laughing about the sleepless 24-hour nights and adventures we shared X years ago.
But there’s also a lot of FUD, marketing, and statements I don’t agree with.
One statement I disagree with came up in the partner super session and has been repeated many times since. They say:
“virtualization is stage one of any cloud”.
I strongly disagree. Coming from an HPC background, I am adamant that virtualization doesn’t equal cloud. For that matter, you don’t even need virtual machines to have a cloud. Virtualization comes after you’ve got a handle on your data center, including switches, physical machines, etc. So if virtualization is stage 1, then stage 0 is getting control of the hardware.
Stage 0 means a lights-out data center, dark because people only go in once in a while to replace failed components. Stage 0 also requires the ability to re-purpose hardware on demand, something we’ve been doing in xCAT for many years. Stage 0 means we can power physical machines on and off, and deploy hypervisors, or native OSes without hypervisors, to physical machines over the network. This requires a centralized deployment engine. All of this is the bottom layer that we at Sumavi and the open source xCAT community have been working on for many years.
This area of functionality cannot be trivialized, and VMware gets it. However, there is no product in its portfolio to hype other than some beta work, so the problem is largely ignored and relegated to “Look to your hardware vendor to provide this solution.” That ignores multivendor sites, which is pretty much everyone.
All of this has given me the feeling that what we are doing at Sumavi is becoming more and more important. Our partners and customers have stressed the need for it. You can expect to see more products in this space as time goes on, and you’ll certainly hear more about it at VMworld 2011.
Aug 27th
You have a user on your machine and you only want to let them run commands like rinv, rvitals, and nodels. You don’t want them to be able to provision nodes, power them on/off, or do all the other awesome things that xCAT can do.
So what do you do?
Suppose your user name is ‘foobar’.
You do this:
1. Set up the policy table so that it contains the following: (tabedit policy)
```
#priority,name,host,commands,noderange,parameters,time,rule,comments,disable
"1","root",,,,,,"allow",,
"1.1","foobar",,"rinv",,,,"allow",,
"1.11","foobar",,"rvitals",,,,"allow",,
"1.12","foobar",,"nodels",,,,"allow",,
```
2. Set up the local cert for the user:
```
/opt/xcat/share/xcat/scripts/setup-local-client.sh foobar
```
You can allow other commands by adding more rows with new numbers, like 1.13. The exact numbers are arbitrary; just make sure each one is unique. They set each rule’s priority: xCAT sorts the policy table by this field, lowest number first, and the first rule that matches a request decides whether it is allowed.
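Here is a simplified Python model of how such a table could be evaluated, to make the role of the priority column concrete. It assumes rules are checked in ascending priority order, that the first rule matching the user and command decides, and that no match means no access; it ignores the host, noderange, parameters, and time columns. `load_rules` and `is_allowed` are hypothetical helpers for illustration, not xCAT’s code.

```python
import csv
import io

# The policy table from step 1, in xCAT's CSV form.
POLICY = '''#priority,name,host,commands,noderange,parameters,time,rule,comments,disable
"1","root",,,,,,"allow",,
"1.1","foobar",,"rinv",,,,"allow",,
"1.11","foobar",,"rvitals",,,,"allow",,
"1.12","foobar",,"nodels",,,,"allow",,
'''

def load_rules(text):
    """Parse the table, drop disabled rows, and sort by priority (lowest number first)."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").split(",")
    rows = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=header)
    rules = [r for r in rows if not r["disable"]]
    return sorted(rules, key=lambda r: float(r["priority"]))

def is_allowed(rules, user, command):
    """Return True if the first matching rule says "allow"; deny when nothing matches.

    Simplification: an empty name or commands field matches anything, and the
    commands field is treated as a single command name.
    """
    for r in rules:
        if r["name"] not in ("", user):
            continue
        if r["commands"] not in ("", command):
            continue
        return r["rule"] == "allow"
    return False  # assumed default: no matching rule means no access

rules = load_rules(POLICY)
print(is_allowed(rules, "foobar", "rinv"))    # True
print(is_allowed(rules, "foobar", "rpower"))  # False
```

In this model, root’s catch-all rule at priority 1 is checked first, but it only matches requests from root, so foobar’s requests fall through to the command-specific rows.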
Aug 24th
After looking everywhere for a BitTorrent client for CentOS 5.5, I found that the old archives on bittorrent.com had a perfect match that didn’t require downloading any prerequisite RPMs: BitTorrent-4.1.3-1.noarch. I installed it with rpm, then ran it like so:
```
btdownloadgui.py
```
Aug 20th
Here’s a simple example of connecting to a hypervisor with the VMware vSphere SDK for Perl:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;    # handy for inspecting the objects the API returns
require VMware::VIRuntime;
VMware::VIRuntime->import();

# Hypervisor to connect to: first command-line argument, default 'vhost31'
my $hyp = shift || 'vhost31';
print "performing action on $hyp\n";

# try logging into the node
my $conn;
eval {
    $conn = Vim->new(service_url => "https://$hyp/sdk");
    $conn->login(user_name => 'root', password => 'cluster');
};
die "Could not log in to $hyp: $@" if $@;
```
Now that you’re connected, you probably want to do something. The best way to figure out what is to read the VMware API documentation; the Reference Guide seems to be the best. You have to do a lot of guessing, since it isn’t written for any particular language. Hopefully I’ll be able to post more on using this later. If you want a huge example, look at the ESX plugin in the xCAT source tree. We do pretty much everything you could think of with it, and since it’s open source, you can use it however you want.
Aug 19th
Wish I had better news: it can’t be done. I disabled all that ‘Smartware’ software as explained on Western Digital’s website, but the install off my WD USB hard drive still fails when it tries to do an mcopy to grab the kickstart file.
And after loading, all you get to see is:
rescanning in 10 second(s), press
I copied everything onto a different drive and it worked fine. Moral of the story: take back your WD USB hard drive and get one that’s less ‘smart’.
Aug 18th
I recently added the ESXi 4.1 base template kickstart file to xCAT. The code is checked in here. We’ve been able to do stateless ESXi 4.1 since it came out, and stateless ESXi 4.0 as well. But some of our customers need the ESXi 4.1 server installed on disk, which seems to be the most common way people want to install VMware ESX(i) these days. We hope more people will go stateless in the future, but for now, here is our xCAT ESXi 4.1 base kickstart file:
```
# Sample scripted installation file
# edited and updated by vallard@sumavi.com

# Accept the VMware End User License Agreement
vmaccepteula

# Set the root password for the DCUI and Tech Support Mode
rootpw --iscrypted #CRYPT:passwd:key=vmware,username=root:password#

# clear all partitions.
clearpart --alldrives --overwritevmfs

# Choose the first disk (in channel/target/lun order) to install onto
autopart --firstdisk --overwritevmfs

# The install media is on the network.
install url http://#TABLE:noderes:$NODE:nfsserver#/install/#TABLE:nodetype:$NODE:os#/#TABLE:nodetype:$NODE:arch#

# Set the network to DHCP on the first network adapter
#network --bootproto=dhcp --device=vmnic0
network --bootproto=dhcp

# reboot automatically when we're done.
reboot

# A sample post-install script
%post --interpreter=busybox --unsupported --ignorefailure=true
# tell xCAT management server we are done installing
# have to put in the IP address instead of the hostname because VMware
# ESXi 4.1 cannot resolve hostnames here...
echo "<xcatrequest>\n<command>nextdestiny</command>\n</xcatrequest>" | /bin/openssl s_client -quiet -connect #COMMAND: host #TABLE:noderes:$NODE:xcatmaster# | head -1 | sed 's/.*address//g' #:3001 2>&1 | tee /tmp/foo.log

# enable SSH on next boot:
%firstboot --interpreter=busybox --unsupported --level=47
sed -ie 's/#ssh/ssh/' /etc/inetd.conf #ssh is too nice not to have
```
Because this is an xCAT kickstart template, you’ll see #TABLE…# and #COMMAND…# tags in it. These are cues for xCAT to look up each node’s attributes, so that this one template can be used across the entire data center. The password, the main HTTP server, and the xCAT server are all stored in the xCAT database.
I have two scripts in here. The first is the %post. This script simply signals back to xCAT that it is done installing so that the next time it reboots, instead of reinstalling, xCAT will tell the node to boot to hard disk. This happens right after the install.
The second is the %firstboot script. Notice that I added --level=47 to it. This is important because it tells the system when to run the script. If you look at /etc/vmware/init.d/init you’ll see the levels; level 48 starts the networking. I want to enable SSH before the networking starts, so I just uncomment the SSH line in /etc/inetd.conf so that SSH comes up on boot. (Another option is to just run /etc/init.d/TSM-SSH start.)
This template is stored in xCAT in /opt/xcat/share/xcat/install/esx/. Provided the rest of xCAT is set up and copycds has been run, you can have a node boot to it by running:
nodeset <noderange> install=esxi4.1-x86_64-base
rpower <noderange> boot
or just:
rinstall <noderange>
The template is then copied into the /install/autoinst/ directory, renamed to match the node, and all variables are substituted in. The PXE and DHCP servers are then set to point the node at that file so it can install. This is in xCAT 2.5, which you can get now as the development release (make sure you grab the files at the bottom in the ‘Development Builds’ section).
Another fun thing to do with the ESXi kickstart file is to create a new VM as part of the install. Generally I recommend storing your VMs on an NFS server, but there are cases where you just want them on the local drive. The kickstart file above creates the datastore1 partition, which is a place where you could run vim-cmd during post-install to create machines. This is easy to do in the %firstboot section (you would probably do it at level 99) but not so easy in the %post section.
The problem with the %post section is that hostd isn’t running, so none of the vim-cmd commands will work. You have to start it, which can be done by running:
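The tag substitution described above can be sketched in a few lines of Python. This is not xCAT’s implementation (xCAT does this in its own Perl code and handles more tag types, such as #CRYPT and #COMMAND); it is a minimal model that assumes the database is a nested dict keyed by table, node, and attribute. `render_template`, `DB`, and the sample values are hypothetical.

```python
import re

# Hypothetical stand-in for the xCAT database: table -> node -> attribute -> value.
DB = {
    "noderes": {"esx01": {"nfsserver": "10.0.0.1", "xcatmaster": "10.0.0.1"}},
    "nodetype": {"esx01": {"os": "esxi4.1", "arch": "x86_64"}},
}

# Matches tags of the form #TABLE:<table>:$NODE:<attribute>#
TABLE_TAG = re.compile(r"#TABLE:(\w+):\$NODE:(\w+)#")

def render_template(template, node, db=DB):
    """Replace every #TABLE:...# tag with that node's value from db."""
    def lookup(match):
        table, attr = match.group(1), match.group(2)
        return db[table][node][attr]  # raises KeyError if the attribute is missing
    return TABLE_TAG.sub(lookup, template)

line = ("install url http://#TABLE:noderes:$NODE:nfsserver#/install/"
        "#TABLE:nodetype:$NODE:os#/#TABLE:nodetype:$NODE:arch#")
print(render_template(line, "esx01"))
# -> install url http://10.0.0.1/install/esxi4.1/x86_64
```

Failing loudly on a missing attribute is a deliberate choice in this sketch: a half-rendered kickstart file is harder to debug than an error at render time.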
/etc/init.d/hostd start
But wait, there’s another problem! The hostd start command never returns, so it hangs. You have to use some magic (like a wrapper script that forks it off and returns), otherwise your %post hangs forever. (This is a total bug.)
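One way to do that magic is a launcher that starts the command in its own session with its standard streams detached, then returns immediately without waiting for it. Below is that idea sketched in Python on a generic Linux system; `start_detached` is a hypothetical helper, and I haven’t confirmed that a Python interpreter is available in the ESXi 4.1 %post environment, so there you would likely express the same idea in busybox shell.

```python
import subprocess

def start_detached(cmd):
    """Start cmd in a new session with stdio detached; return its PID without waiting."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # the child must not hold the caller's output pipe open
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # setsid(): not tied to the caller's session or terminal
        close_fds=True,
    )
    return proc.pid

# Example (path from the post): start hostd and carry on with the rest of the script.
# pid = start_detached(["/etc/init.d/hostd", "start"])
```

Detaching stdout and stderr matters as much as forking: if the background process keeps the caller’s output pipe open, whatever reads that pipe can still wait forever.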
Once you work around that, running commands like these:
/bin/vim-cmd solo/registervm /vmfs/volumes/datastore1/vm01/vm01.vmx vm01
/bin/vim-cmd vmsvc/power.on 16
seems to work (16 is the VM ID that registervm printed). However, you’ll have to register the VMs again during %firstboot.
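Hard-coding the VM ID (16 above) is fragile. `vim-cmd vmsvc/getallvms` lists the registered VMs with the Vmid in the first column and the name in the second, so the ID can be looked up by name. Here is a minimal Python parser for that listing, assuming whitespace-separated columns and VM names without spaces; `find_vmid` is a hypothetical helper and the sample output is made up for illustration.

```python
def find_vmid(getallvms_output, name):
    """Return the Vmid for `name` from `vim-cmd vmsvc/getallvms` output, or None."""
    for line in getallvms_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == name:
            return int(fields[0])
    return None

# Made-up sample output for illustration.
SAMPLE = """Vmid   Name   File                           Guest OS     Version   Annotation
16     vm01   [datastore1] vm01/vm01.vmx     otherGuest   vmx-07
17     vm02   [datastore1] vm02/vm02.vmx     otherGuest   vmx-07
"""
print(find_vmid(SAMPLE, "vm01"))  # 16
```

On the host you would feed it the real getallvms output and pass the returned ID to `vim-cmd vmsvc/power.on`.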
I hope to post more on this as we go forward with it. I’m happy that VMware has made kickstart available for 4.1, and I can only see it improving over time. The more automation the better, and with kickstart we can really automate everything we need.
Aug 13th
While working at IBM, I wrote an article about how to install Windows Server 2008 using xCAT. The cool thing about this procedure is that you’re using Linux to provision a Windows machine with the native Windows installer. This isn’t like other solutions that just lay down an image with something like partimage. We think there’s a lot more cool stuff that can be done, and from the perspective of Sumavi (my company), it’s just the beginning of what we’re going to be doing with Windows provisioning.
There are some common pitfalls to doing Windows installations with xCAT. Here they are:
1. Is Samba enabled? This is the one I always forget. You’ll know it’s your problem if the node boots all the way to the command prompt and then reboots. My Samba configuration looks like this:
/etc/samba/smb.conf
[global]
workgroup = MYGROUP
server string = Samba Server Version %v
security = share
passdb backend = tdbsam
load printers = yes
cups options = raw

[install]
path = /install
public = yes
writable = no
Once that’s in place, restart Samba and make sure it comes back up on boot:
service smb restart
chkconfig smb on
2. Do you have the drivers in your base WinPE image?
This is the hardest part. If Samba is up and you don’t have the drivers, then you need to add them to your base WinPE image. I hope to write more on this later, but this is generally the big problem I run into.
3. Do you have drivers in your /install/drivers directory?
If the machine installs and reboots fine but then errors out, it’s because it can’t find the boot device. The drivers in /install/drivers are for that reboot, and the script adds them in.
Usually, once you get past these issues, you can install Windows pretty easily. I hope to write another article on how to do this with the latest updates. Since I left IBM, that document has been removed, so if you have trouble, either post to the xCAT mailing list or drop me an email and I’ll be glad to see if I can help. We’re trying to make this easier.
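For the first pitfall, a quick sanity check of smb.conf can save a wasted boot cycle. Below is a minimal Python sketch using the standard configparser module; it assumes the file is simple enough for configparser to read (Samba’s own parser is more permissive), and `check_install_share` is a hypothetical helper that only checks that an [install] share exists, points at /install, and is public.

```python
import configparser

# Trimmed copy of the smb.conf shown in pitfall 1.
SMB_CONF = """\
[global]
workgroup = MYGROUP
server string = Samba Server Version %v

[install]
path = /install
public = yes
writable = no
"""

def check_install_share(text):
    """Return a list of problems with the [install] share; an empty list means it looks OK."""
    # interpolation=None so Samba macros such as %v aren't treated as Python interpolation
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    if not cfg.has_section("install"):
        return ["no [install] share defined"]
    share = cfg["install"]
    problems = []
    if share.get("path") != "/install":
        problems.append("path is not /install")
    if not share.getboolean("public", fallback=False):
        problems.append("share is not public")
    return problems

print(check_install_share(SMB_CONF))  # []
```

On the management node you would pass it the contents of /etc/samba/smb.conf; Samba’s own `testparm` tool remains the authoritative check.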
Aug 13th
ESXi 4.1 kickstart is adequate for most things but I still have several issues with it that I consider ‘bugs’:
1. If you’re not connected to a network, it doesn’t work. This is fine, since most people will be on a network with VMware anyway, right? Fine, I’ll let this one slide. But if you just have a machine and a USB stick, why do you need the network? Sure, you’ll have it eventually, but I just want to test it on the server on my desk…
2. The kickstart file likes to stop and give you alerts even if everything is OK. For example, if I don’t specify the interpreter in the post-install script, it stops and gives me a note: “Interpreter not specified, using busybox” That’s fine, that’s the default. Why stop me? The docs clearly state that the default is busybox.
3. Name resolution doesn’t work in postscripts. If you’re trying to get information from other hosts, forget it: just put the IP address in your post-install script.
4. USB installations without kickstart don’t work; you need a CD/DVD image. This is lame. In an era when most servers I deal with don’t have DVD-ROM drives, why make me buy a USB DVD drive? A $10 USB stick should do this just fine.
5. Lack of mount support. This kills me. I want a USB drive to boot ESXi 4.1 in kickstart and then come up with a virtual machine. The problem is that my virtual machine is 60GB. After digging around, I see that ESXi 4.1 can get files from a FAT32 filesystem by using the mcopy command (it doesn’t do a mount). But what I really want is ext3 support so that I can copy 60GB files onto a hard disk. I’m thinking about hacking an ext3 driver into busybox, but I don’t know how difficult that will be. Right now, my only option seems to be breaking my disk image into 2GB chunks so they fit on the FAT32 partition… lame. Anyway, please don’t tell me to consider NFS and all that stuff; I know that’s the optimal solution. This project is a little different from what you may be thinking of.
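Splitting the image into pieces that fit on FAT32 (whose per-file limit is 4 GiB minus one byte) and joining them back together is straightforward. A minimal Python sketch, assuming you run it wherever the image is prepared and wherever the pieces get reassembled; `split_file`, `join_files`, and the `.partNNN` naming are hypothetical choices, and the 2 GiB default follows the chunk size in the post.

```python
import os

CHUNK = 2 * 1024**3  # 2 GiB per piece, comfortably under FAT32's 4 GiB file-size limit
BUF = 64 * 1024**2   # copy in 64 MiB blocks so a 60 GB image never sits in memory

def split_file(path, chunk_size=CHUNK):
    """Write path.part000, path.part001, ... each at most chunk_size bytes; return their names."""
    parts = []
    with open(path, "rb") as src:
        while True:
            part_name = f"{path}.part{len(parts):03d}"
            written = 0
            with open(part_name, "wb") as dst:
                while written < chunk_size:
                    block = src.read(min(BUF, chunk_size - written))
                    if not block:
                        break
                    dst.write(block)
                    written += len(block)
            if written == 0:
                os.remove(part_name)  # source exhausted: drop the empty trailing piece
                break
            parts.append(part_name)
    return parts

def join_files(parts, out_path):
    """Concatenate the pieces, in order, back into out_path."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                while block := src.read(BUF):
                    dst.write(block)
```

Joining is plain concatenation in order, so on the target side `cat` over the pieces would do the same job.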
Anyway, I don’t want to keep complaining, so here are some nice things:
We want SSH on our machines, even if it’s not supported, so we add this to our kickstart file:
%firstboot --interpreter=busybox --unsupported --level=47
sed -ie 's/#ssh/ssh/' /etc/inetd.conf
As I mentioned, you can’t mount USB drives on ESXi 4.1 (at least I haven’t figured out how yet). You can pass the USB drives through so that the VMs can mount them, but you can’t actually mount them on the hypervisor.
However you can copy files from the FAT32 partition. Here is an example of a command to use in a kickstart file:
mcopy -i /dev/disks/mpx.vmhba32:C0:T0:L0:1 \::IMAGEDD.BZ2 /install_cache/IMAGEDD.BZ2
(In fact, this is the exact command the installer uses to grab the bz2 image from the FAT32 partition.)
So if you had a file named foo on there, you could substitute it for the IMAGEDD.BZ2 file name and copy it onto your hypervisor. I would do this for copying *.vmx files and the like.
There’s one catch: the mcopy command is available during installation, but after reboot, there is no mcopy command! So if you want it, a good idea is to copy it during the kickstart to someplace where you can get at it after the install.
Anyway, happy VMware VMworld to all you who are going.
Aug 9th
I’ve been looking for a real whiteboard solution for a long time. The ones I usually run across that look good to me cost $100 or more. I’ll grant that those are high quality, but for what I’m looking for, that may be too much. So I was happily surprised 2 weeks ago while talking about this with the good people over at Rocky Mountain SuperComputing Center. They told me all I had to do was go to Home Depot and get one of their 8×4-foot melamine boards. I went with the kids to check it out, and what I found was indeed an 8×4-foot whiteboard that was perfect for my room. So, to my wife’s horror, I dragged this thing up the stairs, hacked a few inches off of it, and proudly screwed it to my wall with 6 screws.
Total cost? $11.43 (screws not included). So naturally I looked at this and thought: I need more whiteboard!
So next Saturday, when I get some time, I’ll be going back to Home Depot to get 3 more of them, which will cover my entire wall. Total cost for an entire wall: about $46 (4 × $11.43). You can’t even buy a good standard whiteboard for that much money.
The benefits are endless:
- Teach my kids that it’s OK to write on walls.
- Late night math equations
- No more lost lists of feature requests.
Oh, and here’s the best part: if it gets old and stained and I need to replace it? No problem. I’ll just take the old stuff down, turn it upside down, build a half-pipe out of it, and skate on it.
There really is nothing that makes a house a home like a wall of whiteboard to write on.
Aug 2nd
One of the problems with using jEditable with HAML is that HAML automatically puts whitespace in the generated markup. So my HAML script looks like this:
%td{ :class => 'edit_area', :id=>"#{k}_#{f}" }
  = format_table_value(f, v[f])
The problem with this in jEditable is that once the user clicks on the cell, there are gobs of whitespace around the word. To get rid of it, you use HAML’s whitespace-removal flag, ‘<’, which strips the whitespace inside the tag. This looks like:
%td{ :class => 'edit_area', :id=>"#{k}_#{f}" }<
  = format_table_value(f, v[f])
This is an example of how one simple character can really help improve the user experience.