Vallard's Tech Notes
Enterprise Datacenter Management Voodoo
Apr 2nd
I got some info from various sites, so I thought I would consolidate it here. I just started a new project on my Mac and I want to back it up on the net. Here is what I did on the server to build git from source:
yum -y install zlib-devel openssl-devel perl cpio expat-devel gettext-devel autoconf curl curl-devel gcc
ldconfig
wget http://www.codemonkey.org.uk/projects/git-snapshots/git/git-latest.tar.gz
tar zxvf git-latest.tar.gz
cd git-<date>
make configure
./configure --prefix=/usr
make
make install
When done you should be able to run:
git
and see a help message. Nice! This means git is installed. Now let’s set up a repo.
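mkdir -p /var/git/cookingwright
cd /var/git/cookingwright
git init
Cool, we now have a blank repository on our CentOS 5.4 server. (Newer versions of git refuse a push into a branch that is checked out on the server, so if the push below gets rejected, create the repo with git init --bare instead.)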
I’ve been developing this base project on the MBP, and now I want to make sure I start using SCM/git. So I go to my app:
cd ~/Sites/cookingwright
git init
git add .
git commit -am 'initial release'
git remote add origin ssh://benincosa.com/var/git/cookingwright
git push origin master
Now I’m git-afied. Hurray!
Mar 15th
I found a good tutorial for installing RMagick and ImageMagick here:
http://www.gra2.com/article.php/using-rmagick-imagemagick-rails
I’m installing it because I need it for (what else?) manipulating images.
Well when I first tried to run:
sudo port install imagemagick
I got the nice errors telling me that make couldn’t be found… Sheesh… so I had to reinstall Xcode and then I had to reinstall zlib:
sudo port upgrade --force zlib
Then it finally built!
Anyway, then it was just:
gem install rmagick
For Linux I thought it would be easier, but it wasn’t, because my RPMs were too old: ImageMagick 6.2 instead of 6.3 or greater. (I’m using CentOS 5.4, but apparently this isn’t new enough.)
I thought the answer was:
yum -y install ImageMagick ImageMagick-devel
With tons of dependencies!! But when running:
gem install rmagick
I got the error:
Can’t install RMagick 2.12.2. You must have ImageMagick 6.3.5 or later.
So after trying a few things, I found this blog entry to be the most helpful:
http://andrewduck.name/2009/01/imagemagick-64x-on-centos-5/
(Look at the notes below, that’s what I used) and was able to get it installed:
- yum remove ImageMagick
- use the configure options and make, make install
- Then tried to build again:
gem install rmagick
Mar 11th
On the machine behind the firewall, do this:
vncserver :99
ssh -R 5999:localhost:5999 user@box.on.the.internet.com
From my OSX machine:
vncviewer -via user@box.on.the.internet.com :99
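The port number in the tunnel is not arbitrary: a VNC display :N listens on TCP port 5900 + N, so display :99 is port 5999. Here is a small Ruby sketch of that arithmetic. The helper names are made up for illustration and the host is the same placeholder used above.

```ruby
# VNC display :N listens on TCP port 5900 + N.
VNC_BASE_PORT = 5900

# Hypothetical helper: map a display number to its TCP port.
def vnc_port(display)
  VNC_BASE_PORT + Integer(display)
end

# Hypothetical helper: build the reverse-tunnel command for a display.
# user_at_host is a placeholder for the box that is reachable from the internet.
def reverse_tunnel_command(display, user_at_host)
  port = vnc_port(display)
  "ssh -R #{port}:localhost:#{port} #{user_at_host}"
end

puts reverse_tunnel_command(99, "user@box.on.the.internet.com")
# prints: ssh -R 5999:localhost:5999 user@box.on.the.internet.com
```

If you start vncserver on a different display, the same function tells you which port to forward.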
Mar 11th
When installing Windows remotely, sometimes all I can see is the SAC> prompt on the remote console output, since I may not have a graphical viewer to do rdp or some other tricky way in.
Once SAC boots up I can access the command prompt by doing the following:
cmd
(It then gives you a name like cmd001)
Switch to it:
ch -sn cmd001
Now enter the user name, I usually don’t enter any domain, and then enter the password. Once done you can log in and run commands!!
C:\Windows\system32>diskpart

Microsoft DiskPart version 6.0.6001
Copyright (C) 1999-2007 Microsoft Corporation.
On computer: I02

DISKPART> list disk

  Disk ###  Status      Size     Free     Dyn  Gpt
  --------  ----------  -------  -------  ---  ---
  Disk 0    Online       233 GB      0 B

DISKPART> select disk 0

Disk 0 is now the selected disk.

DISKPART> list partition

  Partition ###  Type              Size     Offset
  -------------  ----------------  -------  -------
  Partition 1    Primary            233 GB  1024 KB

DISKPART>
Mar 11th
Everyone raves about Chicken of the VNC on OSX but the version I have doesn’t have ssh forwarding… something I really need. So here is how I got a good vnc.
First:
sudo port search vnc
That listed some vnc clients.
I first tried just standard vnc. But somewhere in the middle my files got corrupted on download because I’m on a hotel network, so I had to resync:
sudo port sync
Now I tried again:
sudo port install vnc
That failed too since I removed the corrupt file that was xorg-libXmu. So I had to clean it up:
sudo port clean --all xorg-libXmu
Rerunning:
sudo port install tightvnc
Now I have a real man’s VNC since I can get to it on the command line with the -via option!!
Mar 4th
I wrote a little last year on installing Windows iSCSI with xCAT. It’s a great trick, and Windows has come out with more since then to make their HPC product do similar things.
The only problem with doing this on xCAT is that it’s a huge minefield of problems. Since doing this on VMware also makes it a slow process, I thought I’d list all the things that can go wrong.
Here are the issues I ran into that took me quite a while to go through and debug:
1. Corrupt ISO image. A corrupt ISO image will actually copy with xCAT’s copycds and then you’ll actually see it expand just fine. It isn’t until you get to the setup.exe when you start seeing messages like:
“this application has failed to start because SPWIZENG.DLL was not found. Re-installing the application may fix this problem.”
“the file ‘autorun.dll’ could not be loaded or is corrupt. setup cannot continue. error code is [0x7E]”
These errors were both due to a bad ISO. I found the Windows ISO on Microsoft’s website, downloaded it, and the problem was solved.
2. VMware DHCP server
You have to disable this! Then you can let xCAT do all the DHCP work. Even if xCAT serves the first DHCP request and you get the iSCSI connection, there are still some DHCP requests that happen after the install. If xCAT doesn’t answer those, you have problems.
3. Wrong or bad WinPE file
I was using a WinPE file that I had made from a Windows 7 install to do Windows 2008. They are supposed to be backwards compatible, but this one didn’t work for me. It could be that I forgot to include the right drivers. It worked just fine until all of a sudden it dropped the iSCSI connection. (I saw this in my syslog: tgtd would get an unexpected disconnect.)
4. / file system full!
I didn’t realize that I had filled it up, but apparently I did. Total bummer. I found this out because the samba server wasn’t making any connections; when I trawled through my logs I saw that it was because there was no space left! Yikes. So I had to clear out some data. I should have done something better to take care of that.
5. xCAT tables…
This is where you can really be thrown off, especially if your noderes.netboot is set to pxe. It should be set to xnba for this to work properly!
6. gPXE or xNBA?
I couldn’t tell if gPXE was the problem so I tried to upgrade to 1.0. This only made matters worse because of certain things xCAT does with DHCP. (xNBA by the way means xCAT NetBoot Agent, which is gPXE with some patches)
So after a day or so of hacking around, we’re back to having xCAT deploy Windows Server 2008 over iSCSI without any special hardware. Still a pretty decent solution for anyone looking for Windows Stateless. It’s as close as it gets right now.
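For issue 4 above (the full / file system), a small check run from cron would probably have saved me some time. Here is a minimal Ruby sketch that parses the POSIX-format output of df -P and complains past a threshold. The function names, the 90% threshold, and the sample df output are all made up for illustration.

```ruby
# Return the use percentage from `df -P <path>` output, or nil if there is no data line.
# POSIX columns: filesystem, 1024-blocks, used, available, capacity, mount point.
# Assumes the filesystem name contains no spaces.
def usage_percent(df_output)
  data_line = df_output.lines[1] # line 0 is the header
  return nil if data_line.nil?
  data_line.split[4].to_i        # "97%".to_i gives 97
end

# Return a warning string when usage is at or above the threshold, else nil.
def disk_warning(df_output, threshold = 90)
  pct = usage_percent(df_output)
  return nil if pct.nil? || pct < threshold
  "/ is #{pct}% full"
end

# Invented sample output; on a real box you would pass in `df -P /` instead.
sample = <<~DF
  Filesystem 1024-blocks Used Available Capacity Mounted on
  /dev/sda1 10000000 9700000 300000 97% /
DF

puts disk_warning(sample)
# prints: / is 97% full
```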
Mar 3rd
wget http://download.microsoft.com/download/B/4/D/B4DC75A1-D7D2-4F31-87F9-E02C950E8D31/6001.18000.080118-1840_amd64fre_Server_en-us-KRMSXFRE_EN_DVD.iso
Feb 19th
Finding the intersection of 2 arrays in Ruby is quite easy. If a is an array and b is an array, then c can be the intersection as follows:
c = a & b
Similarly the union of the arrays can be found as follows:
c = a | b
The problem I had was finding the intersection of multiple arrays. So how to do it?
I did it as follows:
1. First find the union of all arrays.
2. Then take the intersection of the big union with each of the individual parts.
This sounds vaguely familiar to me as perhaps it was something I did in my computer science class at one point.
Anyway, here is the function:
def intersect(array_of_arrays)
  ar = []
  # first build the union of all the arrays
  array_of_arrays.each { |a|
    ar = a | ar
  }
  # now intersect that union with each individual array;
  # only the elements common to every array survive
  array_of_arrays.each { |a|
    ar = a & ar
  }
  ar
end
Anyway, if there’s some better way to do it, I’d love to know.
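One shorter way, offered as a sketch rather than the definitive answer: since & already takes two arrays, you can fold it across the whole list with reduce and skip the union step. The name intersect_all is mine, and the || [] is there because reduce returns nil for an empty list.

```ruby
# Fold Array#& across all the arrays: ((a1 & a2) & a3) & ...
# Returns [] when given no arrays at all.
def intersect_all(array_of_arrays)
  array_of_arrays.reduce(:&) || []
end

p intersect_all([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 6]])
# prints: [3, 4]
```

One difference from the function above: with a single input array, reduce hands that array back unchanged, duplicates included, while the union-then-intersect version removes duplicates.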
Feb 7th
Everyone is talking about the cloud. Today on HPC wire I read how NASA is turning to Parabon to create a cloud to enable all the scientists to use machines all over the different NASA sites. NASA will pay Parabon 600k to implement this. Here is where you can read the article.
Obviously, I’m skeptical about this because to me cloud means more than just virtual machines and more than just taking machines with fixed personalities and routing appropriate code to them. After a brief reading of Parabon’s technology it seems to offer the following:
- A web interface to launch a job
- A scheduler to schedule the job
- A client that runs on Operating Systems of different types: Mac OSX, Windows, and Linux.
This just sounds to me like it’s a scheduler like Platform’s LSF or Adaptive Computing’s Moab.
There are some pluses: It’s in its 4th generation, which should imply stability, and it’s got a sleek web interface.
But as I read this, to me it’s just Grid. Yes, ladies, there is a difference between Grids and Clouds that people just don’t seem to get. They also mention that their software is compatible with virtual machines. Big deal! Virtual machines are just operating systems too.
The article states how they will be able to save money and consolidate resources. I think that’s great. And I think they’re moving in the right direction, but this is not a cloud. This is a grid.
My definition of a cloud includes that all machines in your data center have no fixed purposes or personalities. They can be interchanged. All this Parabon solution does is put a client on everybody’s machine (which I don’t think HPC folks will particularly enjoy) and put it under management (which fewer people will enjoy).
These people are rightly focused on the end user, but the solution seems to put more burden on the IT staff. In case you haven’t spoken to your friendly IT administrator, I can assure you they more than likely already have their hands full. Cloud solutions also need to make life easier for the Admin and make it so that he doesn’t have to install machines.
What happens if NASA user Joe decides he wants to run his app, but his app only runs on SLES 10.2? Now what if all the SLES10.2 boxes are taken? But, there are 100 Red Hat 5.3 boxes sitting idle and 5 Windows XP boxes not doing anything? Tough luck Joe. With this solution, you can’t have those nodes, because you’re running on a Grid, not a Cloud.
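To make Joe’s problem concrete, here is a toy Ruby sketch of the two placement rules as I define them in this post. Everything in it is hypothetical: the Node struct, the function names, and the count of 10 busy SLES boxes (the idle counts come from the example above). It is not how Parabon or any real scheduler works.

```ruby
# A node has an installed OS and is either idle or busy.
Node = Struct.new(:os, :idle)

# Grid-style placement: the job can only use idle nodes that already run the required OS.
def grid_candidates(nodes, required_os)
  nodes.select { |n| n.idle && n.os == required_os }
end

# Cloud-style placement (by this post's definition): any idle node qualifies,
# because it can be re-provisioned with the required OS first.
def cloud_candidates(nodes, required_os)
  nodes.select(&:idle).map { Node.new(required_os, true) }
end

pool  = Array.new(10)  { Node.new("SLES 10.2", false) }  # all taken (count assumed)
pool += Array.new(100) { Node.new("Red Hat 5.3", true) } # sitting idle
pool += Array.new(5)   { Node.new("Windows XP", true) }  # sitting idle

puts grid_candidates(pool, "SLES 10.2").size
# prints: 0
puts cloud_candidates(pool, "SLES 10.2").size
# prints: 105
```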
Jan 29th
Today is my last day as an employee at IBM. It’s been fantastic. I’ve had a great opportunity to learn about this business of scale-out computing. I’ve met with some bright minds and gained experiences I could have nowhere else.
I’ve been able to visit data centers of nearly every big company in the industry and see how they operate: Boeing, Toyota, Honda, JP Morgan Chase, Visa, Bank of America, NYU, Ohio State, USC, Google, NERSC, Berkeley, UCLA, Blizzard, Acxiom, Shell, BEA, IPICyT, iStockPhoto, Latisys, Lego, Lockheed Martin, NASA Goddard, NAVO, UNF, NetApp, Omniture, Adaptive Computing, PayPal, eBay, PNNL, SafeCo, SciNet (University of Toronto), Sony, Warner Brothers, Threshold, Synopsys, TV Guide, HBO, University of Chicago, University of Pittsburgh, ARL, Voonami… and I’ve configured machines for dozens of places I never visited.
So I’ve gained a very good perspective as to what people are doing and how they manage things. This perspective has given me more confidence than ever as to what kind of solutions people are looking for and what solutions I can deliver.
So I’m striking out on my own to deliver these solutions. Fortunately, I’m not entirely alone. I’ve got a good group of people at ThinkAtomic that will be teaming up with me to make this happen. I look forward to sharing more as things develop!