Archive for the ‘Uncategorized’ Category

How To Create OOM Killer Exceptions in Linux

Wednesday, October 19th, 2011

When a Linux machine runs extremely low on memory, the kernel begins deciding which processes it thinks are least important and starts killing them off in order to keep the processes it thinks are more important running. Unfortunately, the kernel OOM (out of memory) killer rarely makes the right decision, and your system is usually hosed once the OOM killer rears its ugly head. Luckily, you can tell the kernel to never OOM kill a given PID. So, if you’re running on a low memory system and want to ensure that important processes (like sshd, for instance) are never killed, these options may be of use to you.

Disabling OOM is done on a process by process basis, so you’ll need to know the PID of the running process that you want to protect. This is far from ideal, as process IDs can change frequently, but we can script around it.

As documented by http://linux-mm.org/OOM_Killer: “Any particular process leader may be immunized against the oom killer if the value of its /proc/$pid/oom_adj is set to the constant OOM_DISABLE (currently defined as -17).”

This means we can disable OOM on an individual process, if we know its PID, using the command below:

# OOM_DISABLE on $PID
echo -17 > /proc/$PID/oom_adj

Using pgrep we can run this knowing only the name of the process. For example, let’s ensure that the ssh listener doesn’t get OOM killed:

pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

Here we used pgrep to search for the full command line (-f) matching “/usr/sbin/sshd” and then echo -17 into the procfs entry for each matching pid.

In order to automate this, you could run a cron job regularly to update the oom_adj entry. This is a simple way to ensure that sshd is excluded from the OOM killer after restarting the daemon or the server.

#/etc/cron.d/oom_disable
*/1 * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

The above job will run every minute, updating the oom_adj of every process currently matching /usr/sbin/sshd. Of course this could be extended to include any other processes you wish to exclude from the OOM killer.
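If you want to protect more than one daemon, the one-liner can be wrapped in a small function. This is only a sketch under a few assumptions: the name protect_from_oom and its optional second argument (an alternate proc root, handy for dry runs) are made up for illustration. It also assumes that kernels from 2.6.36 onward expose /proc/$PID/oom_score_adj, where -1000 has the same effect as OOM_DISABLE, and it falls back to oom_adj when that file is missing.

```shell
#!/bin/bash
# Sketch: exempt every process matching a pattern from the OOM killer.
# protect_from_oom PATTERN [PROC_ROOT]   (PROC_ROOT defaults to /proc)
protect_from_oom() {
    local pattern="$1" proc="${2:-/proc}" pid
    pgrep -f "$pattern" | while read -r pid; do
        if [ -w "$proc/$pid/oom_score_adj" ]; then
            # Newer interface: -1000 disables OOM killing for this process
            echo -1000 > "$proc/$pid/oom_score_adj"
        elif [ -w "$proc/$pid/oom_adj" ]; then
            # Older interface used in this post: -17 is OOM_DISABLE
            echo -17 > "$proc/$pid/oom_adj"
        fi
    done
}
```

Saved as a script that ends with calls such as protect_from_oom "/usr/sbin/sshd", it can be run from the same cron entry as root.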

wwt (Wiimms WBFS Tool) Usage Examples

Friday, October 14th, 2011

It took me a bit of trial and error to get this tool to do exactly what I wanted, so here are a few quick examples that hopefully will save you some time.

Updating the WBFS partition

Copy everything in the /source/path directory that doesn’t already exist on the WBFS partition (auto-detected) to that partition.

# sudo wwt UPDATE -t /source/path/ --auto


UPDATE: Copy only new files
-t: Test mode, don't make any changes (until this flag is removed)
--auto: Auto detect the wbfs drive destination

Time Management For Recovering Sysadmins

Tuesday, June 21st, 2011

Everybody has heard the usual time management lecture: plan your day in advance, work from a list, don’t multitask. All fine and good, but this advice doesn’t really address the constant interruptions that sysadmins deal with. Let’s be honest, it’s really hard to focus on getting shit done while you’re constantly being pinged for help and advice. I spent some time recently thinking about what events distract me the most, and then tried to figure out ways to tune out those distractions. It’s a work in progress, but here’s the advice I’ve got so far.

Choose Your Interrupts Wisely

Face it, systems administration is always going to be interrupt driven. On top of day to day operations, many people depend on you for projects, advice and help carrying heavy things. Not to mention that if a system or service falls over, you’re going to have to drop what you’re doing and fix it. You can’t get away from every interruption, but you can categorize interrupts into two types: things you need to know about right now, and things you don’t.

Does your mail client really need to bounce around on every new message? – Set your email client to check for mail less often, and turn off the alerts. You won’t miss out on anything critical (because people will literally run around when shit breaks) and you’ll gain back a significant amount of your concentration. Also, try setting up special notifications for specific important individuals, like your boss.

Do you really need to receive every Nagios alert on your phone? – Many sysadmins that I know like to stay “in touch” with things by subscribing to everything under the sun in the monitoring system. I think this is dumb. Instead, choose carefully which alerts you receive, and reserve SMS alerting for absolutely critical events. Stay “in touch” by planning time to proactively monitor systems and by implementing more advanced automatic health checks.

Should every event in your IM or IRC client make a noise and trigger a visual alert? – I doubt it. In fact, you probably only need to see messages that are either directed to you, or that are relevant to an area of your responsibility or interest. Try setting up alerts in your chat client that automatically notify you when your name and other specific keywords are mentioned.

Do you really need to sift through all that cron output email? – If you find yourself constantly discarding the same sets of cron output each night, stop doing it. Take a few minutes and edit the offending cron job to redirect the unnecessary output to /dev/null, or add a conditional to the cron job to alert you only if the job returns an error code.
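One way to get the "only alert on error" behaviour is a tiny wrapper around the job. This is a sketch, not a drop-in tool: quiet_unless_fail is a name I made up, and it relies on cron only sending mail when a job prints something.

```shell
#!/bin/bash
# Sketch: run a command, stay silent on success, print its output on failure.
# quiet_unless_fail COMMAND [ARGS...]
quiet_unless_fail() {
    local output status
    output=$("$@" 2>&1)          # capture stdout and stderr together
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
        echo "$output"
    fi
    return "$status"
}
```

A nightly job that calls quiet_unless_fail in front of its real command produces no output, and therefore no email, unless the command exits non-zero.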

Does Your Twitter/Facebook/Whatever desktop app seriously need to be running? – These things are the kings of all time wasters, and they usually aren’t even loosely related to the things you need to get done. Close them.

Don’t be a BOFH

Seriously. You can save a ton of time in the long run by simply teaching and encouraging your co-workers to carry out basic sysadmin tasks themselves. Don’t worry about root access, just give them sudo privs for a few commands at first. It will make their life easier because they can get things done without bugging the grumpy sysadmin, and you won’t have to deal with being interrupted to chown or chmod a file for someone.

Do One Small Thing at a Time

You can’t multi-task. Don’t even try to fool yourself. Seriously. If you want to see what I mean, try to have a phone conversation and write a regular expression at the same time. It’s just not going to happen. Concentrate fully on what you’re doing and keep a todo list. Focus, make notes, and get things done one at a time.

With that said, lists can be really overwhelming until you start crossing things off. So, try breaking down large projects into individual tasks. Things that take maybe 10-20 minutes to complete, max. For me, constantly finishing tasks makes me feel like things are getting done, and motivates me to do more. It also lends itself to focusing on one thing at a time, and provides a nearby stopping point if you need to switch tasks.

Read Email in Bulk

Now that your email client isn’t bouncing up and down every 30 seconds you can check your mail at whatever interval works best for you. Checking mail in bulk is basically just a sorting game. Stuff you need to do gets added to your list and stuff you need to reply to gets replied to. I also find that deleting or archiving messages from my inbox makes it much easier to identify messages that still need my attention. Besides, having an empty inbox just feels plain good.

Close Windows and Tabs When You’re Finished.

Seriously. This is really obvious but it’s easy enough to neglect. When you’re done with a tab, close it. When you’ve finished reading an email, close it. When you’re finished with a terminal, close it. Not only does it make your workspace less distracting and more focused on your current task, but it also frees memory and CPU cycles, which will make your machine run faster, too.

“Self-Healing” Scripts Aren’t Evil

I’ve heard so many sysadmins go on and on about how taking an automatic action on a monitoring alert is a bad idea. This is bullshit. Sure, there may be some oddball scenario where a complication could occur if it’s a full moon on the second Tuesday of the month. But actual problems are what you’re there to troubleshoot and fix, right? I mean, think about how many times you’ve had to drop what you’re doing only to find that Apache needed a restart. My point is, don’t go crazy. Let automatic self-healing scripts attempt the basic fixes, and have the system page you if things are still broken. Nine times out of ten, the script will fix the issue.
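The "try the basic fix, then page" idea fits in a few lines of shell. Treat this as a rough sketch: heal_or_page and the HEAL_WAIT variable are hypothetical names, and the check, fix and page commands are whatever makes sense in your environment.

```shell
#!/bin/bash
# Sketch: try one automatic fix, and only page a human if the check still fails.
# heal_or_page CHECK_CMD FIX_CMD PAGE_CMD   (each is a shell command string)
heal_or_page() {
    local check="$1" fix="$2" page="$3"
    sh -c "$check" && return 0      # healthy, nothing to do
    sh -c "$fix"                    # attempt the basic fix
    sleep "${HEAL_WAIT:-5}"         # give the service a moment to come back
    sh -c "$check" && return 0      # fixed without waking anyone up
    sh -c "$page"                   # still broken: escalate to a person
    return 1
}
```

For a web server the check might be something like curl -fsS -o /dev/null http://localhost/ and the fix service httpd restart; the page command depends entirely on how your alerting is wired up.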

Don’t Attend Every Meeting

Try to only attend meetings that you will pay 100% full attention to. If you’re just planning to bring your laptop and work on something else, why go at all? Also, if everyone in your meeting is on their laptop working on other things, take the hint.

Use A Config Management System

I use puppet, and it rocks. If you don’t already have a config management system in place, it’s time to bite the bullet and set one up. You don’t have to get your entire infrastructure under puppet control overnight. Just set up one thing at a time. For instance, if you need to update the SSH authorized_keys on all your hosts, take the time and write a puppet module for it instead of doing it by hand. It’s a bit more time up-front, but once it’s done, managing that resource becomes completely trivial. Before long you’ll be able to manage all of your machines from a central place, and build out new systems without any manual intervention.

Take Quality Breaks

As a sysadmin, it’s important to stay up to date on news and events. And as a human being, it’s important to stretch, get fresh air, keep in touch with friends, etc. So, instead of leaving reddit, slashdot, facebook and whatever else open all day long, get up, go for a walk, get a drink, browse the internets and get it out of your system. Then, when you’re done, log out of that shit. Seriously, it’s really distracting.

Ask Others For Advice

That’s all I’ve got for now. What other time management practices work well for you? Please, leave a comment below, I’d love to hear your ideas and experiences.

What Could Have Saved Dropbox?

Tuesday, June 21st, 2011

Well, the cat is out of the bag. Dropbox has announced that they accidentally introduced (and then quickly remediated) a bug that had disabled authentication. This left the front door wide open to all of their 25+ million accounts for a few hours. Certainly a big mistake, but this makes me wonder. Is this the sort of thing that could only happen to Dropbox? I mean development happens at breakneck speed these days. With the common mindset of “release early and release often” a bug making its way into production is not surprising to me at all.

This got me thinking. Could this whole thing have been avoided? If development moves at a pace which makes it difficult to QA every detail, perhaps redundancy is the answer. What if, instead of relying solely on a username and password, additional factors were verified when users attempt to authenticate? Had this been the case, things could have played out much differently for Dropbox.

Thinking Outside the Password:

IP addresses: If a user is trying to log in from an IP address that isn’t recognized, require them to verify their secret phrase, or to verify via e-mail, etc.

Cookies: Similar to the IP address, if a user hasn’t logged in before from their current computer, browser, app, whatever, have them jump through hoops to verify their identity.

External Authentication Services: There are many sites out there pushing their own auth services: OpenID, Google, and Facebook, to name a few. Why not link the local account to one of these services to implement multi-factor auth, requiring both valid local and external credentials before permitting entrance?

Text Message: When users register accounts, have them enter a mobile phone number. Then when they authenticate, text them a random number and require them to type it in.

Strange Behavior: If you see a user trying to log in simultaneously from many different places, make them verify their account. If you see a single IP trying to reach into many different user accounts, block the IP and require those accounts to be verified. Etc.
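To make the "single IP, many accounts" check concrete, here is a rough sketch. It assumes a hypothetical log format with one login attempt per line, written as "IP USERNAME"; the function name and the default threshold of 10 are also just illustrative.

```shell
#!/bin/bash
# Sketch: list IPs that attempted logins for at least THRESHOLD distinct accounts.
# Reads "IP USERNAME" pairs (one login attempt per line) on stdin.
suspicious_ips() {
    local threshold="${1:-10}"
    awk -v t="$threshold" '
        # count each IP/user pair only once
        NF >= 2 && !seen[$1 SUBSEP $2]++ { accounts[$1]++ }
        END { for (ip in accounts) if (accounts[ip] >= t) print ip }
    ' | sort
}
```

Repeated attempts against the same account are counted once, so a user fumbling their own password does not trip the check.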

All in all, I think this temporary security hole could have been avoided. Not by expecting developers to never make mistakes, but instead by embracing human error as a fact of life and implementing a layered authentication and security methodology.

Freeing Disk Space in Linux

Tuesday, June 14th, 2011

Did you know that most filesystems reserve a percentage of the available free space as an emergency reserve for when the disk becomes full? This is a great safety mechanism if you’re running critical applications or databases, but in many cases all this reserved space winds up going to waste. Especially so in the case of today’s 2 & 3 Terabyte disks!

On a Linux ext filesystem, 5% is reserved by default for use only by the root user. Assuming you have a 2T disk, this is approximately a 100G reservation, which is total overkill if you ask me! Luckily it’s easy enough to adjust on-the-fly with the tune2fs command.

For this example I’m going to change the number of reserved blocks from the default of 5% to 1% on the filesystem mounted as /misc.

A simple df -h will show you free space minus the reserve.

[root@foo ~]# df -h /misc
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             789G  110G  639G  15% /misc

As you can see, there is 639G of free space on this filesystem.

Now, we use tune2fs -m 1 to change the percentage reserved to 1%

[root@foo ~]# tune2fs -m 1 /dev/sda5
tune2fs 1.39 (29-May-2006)
Setting reserved blocks percentage to 1% (2133291 blocks)

Now we check the available space

[root@foo ~]# df -h /misc
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             789G  110G  672G  14% /misc

Now we have 672G free. That’s 33G of storage we just got back for free!
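As a sanity check, the block count that tune2fs printed can be converted to a size. This sketch assumes a 4096-byte block size, which is common but not shown anywhere in the output above (tune2fs -l reports the real value); blocks_to_gib is a made-up helper name.

```shell
#!/bin/bash
# Sketch: convert a filesystem block count to GiB.
# blocks_to_gib BLOCKS [BLOCK_SIZE_BYTES]   (4096 is an assumed default)
blocks_to_gib() {
    awk -v b="$1" -v s="${2:-4096}" \
        'BEGIN { printf "%.1f\n", b * s / (1024 * 1024 * 1024) }'
}

# The 1% reserve reported by tune2fs:
#   blocks_to_gib 2133291
# The space released by going from 5% to 1% (four times as many blocks):
#   blocks_to_gib $((2133291 * 4))
```

With the assumed block size these print 8.1 and 32.6, which lines up with the roughly 33G that df shows being handed back.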

Now, it may seem attractive to set this to 0% to squeeze the most possible free space from your drive, and under some circumstances this is perfectly safe to do. However, don’t set this to 0% if you have processes running as root that you want to continue running if the disk fills. Here’s a simple rule of thumb if you’re not sure: if you’ll lose irreplaceable data (or your job) if this filesystem goes casters up, don’t set it to 0%.

Converting Windows Guests From VMWare ESX to KVM With Virtio Drivers

Friday, June 10th, 2011

The steps below were tested while I was pulling my hair out trying to migrate a Windows 2k3 guest from VMware ESX to KVM managed by libvirt. Hopefully this will save you from much Windows-related pain and suffering.

Prep the VM while it’s still running in VMware

Download MergeIDE.zip and run MergeIDE.bat. This is a really important step that will prepare the system to boot using an IDE driver when we bring it up in KVM. Skipping this step will likely result in blue screen errors and much frustration.

MergeIDE can be downloaded at
http://www.virtualbox.org/attachment/wiki/Migrate_Windows/MergeIDE.zip

Also, ensure that you know the local administrator password. This seems rather obvious but networking may not come up, so best to be prepared.

Now you’re ready to shut down and copy the vm.

Shut down the VM and copy its vmdk

Gracefully shut down the VM. Once it’s down, ssh into your ESX server and navigate to the appropriate VMFS volume containing your VM.

[root@esx1 example-vm]# pwd
/vmfs/volumes/VMFS1/example-vm
[root@esx1 example-vm]# ls
example-vm-flat.vmdk    example-vm.vmdk  example-vm.vmx   
example-vm.nvram      example-vm.vmsd  example-vm.vmxf 
vmware-0.log  vmware-2.log  vmware-7.log  vmware.log
vmware-1.log  vmware-6.log  vmware-8.log

SCP the “flat” .vmdk file (or files if your machine has multiple virtual disks) from your ESX host to your KVM host.

[root@esx1 example-vm]# scp -pr example-vm-flat.vmdk kvm1:/etc/libvirt/images/
root@kvm1 password: 
example-vm-flat.vmdk           100%   20GB  14.6MB/s   23:20

Prepare the libvirt/KVM instance

Create an XML file to define the attributes of the virtual machine. You’ll likely want to copy things like the memory size, number of CPUs and MAC address from the VMware vmx file; in our case this is /vmfs/volumes/VMFS1/example-vm/example-vm.vmx.

In order to set up virtio we first set our primary disk as bus type ide and create a secondary disk with a bus type of virtio. This will allow us to boot from the existing OS without special drivers. From there Windows will detect the secondary virtio SCSI adapter and prompt for driver installation. We’ll also attach a virtio guest drivers floppy while we’re at it.
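The secondary disk needs a backing file before the guest can start. The post doesn’t say how that file was made, so the following is just one way to do it: a sparse raw file created with truncate. The function name and the 1G default size are my own assumptions; the path would be the virtio-temp.raw file named in the domain XML.

```shell
#!/bin/bash
# Sketch: create a small sparse raw image to act as the temporary virtio disk.
# create_temp_disk PATH [SIZE]   (SIZE uses truncate(1) syntax, e.g. 1G)
create_temp_disk() {
    local path="$1" size="${2:-1G}"
    if [ -e "$path" ]; then
        echo "refusing to overwrite existing file: $path" >&2
        return 1
    fi
    truncate -s "$size" "$path"
}
```

For example: create_temp_disk /etc/libvirt/images/example-vm/virtio-temp.raw. The disk only exists so that Windows sees a virtio device, so its size is not important.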

Virtio guest driver iso and floppy images can be downloaded at
http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/

<domain type='kvm'>
  <name>example-vm</name>
  <uuid>5de881b6-7738-4c30-8f14-da203c1b09f5</uuid>
  <memory>2097152</memory>
  <currentMemory>2097152</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel5.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' cache='none'/>
      <source file='/etc/libvirt/images/example-vm/example-vm-flat.vmdk'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' cache='none'/>
      <source file='/etc/libvirt/images/example-vm/virtio-temp.raw'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='floppy'>
      <source file='/root/virtio-win-1.1.16.vfd'/>
      <target dev='fda' bus='fdc'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <mac address='00:50:56:b9:4e:50'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
  </devices>
</domain>

Note: if you need to generate a new uuid simply run uuidgen

Now you’re ready to “Define” the new virtual machine in libvirt.

[root@kvm1 ~]# virsh define /etc/libvirt/qemu/example-vm.xml 
Domain example-vm defined from /etc/libvirt/qemu/example-vm.xml

Boot the KVM guest

Using virt-manager, start the virtual machine. It should boot up and allow you to log in as a local administrator or with cached credentials. If you see a BSOD (blue screen) with stop error 0x0000007B at boot, verify that the MergeIDE step was successful. This resolved my 0x7B errors.

Install the windows virtio guest drivers

Upon logging in, Windows should detect the new virtio SCSI and network hardware and prompt you for drivers. You’ll want to point it at the floppy drive we attached to the guest.

A very detailed description of what to do is available at
http://www.linux-kvm.org/page/WindowsGuestDrivers/viostor/installation#Non-System_disk_installation_procedure.

Change the primary disk from ide to virtio

Finally, once the virtio drivers have been successfully installed, you’re ready to change your boot disk to a bus type of virtio and remove the temporary second drive. This can be done by executing virsh edit example-vm while the guest is running and changing the disk target’s dev= and bus= attributes as follows:

    <disk type='file' device='disk'>
      <driver name='qemu' cache='none'/>
      <source file='/etc/libvirt/images/example-vm/example-vm-flat.vmdk'/>
      <target dev='vda' bus='virtio'/>
    </disk>

You can also remove the driver floppy at this time.

Perform a full reboot of the guest

To make the above change take effect, shut the machine down completely and then start it again with virt-manager. Don’t simply issue a reboot from within the VM; the new libvirt domain definition is only applied when the guest is started from a powered-off state.
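If you would rather do the cold restart from the command line than from virt-manager, something along these lines should work. It is a sketch only: cold_restart is a made-up name, the 300 second timeout is arbitrary, and it assumes the guest responds to the ACPI shutdown request that virsh shutdown sends.

```shell
#!/bin/bash
# Sketch: cold-restart a libvirt guest so an edited domain definition takes effect.
# cold_restart DOMAIN [TIMEOUT_SECONDS]
cold_restart() {
    local domain="$1" timeout="${2:-300}" waited=0
    virsh shutdown "$domain" || return 1
    # Wait until libvirt reports the guest as fully powered off.
    while [ "$(virsh domstate "$domain")" != "shut off" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $domain to shut down" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    virsh start "$domain"
}
```

Usage would be cold_restart example-vm, run as root on the KVM host.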

You’re done!

Your VM should now be up and running, free from vmware and benefiting from the performance gains of virtio.

Mounting a File System on a Partition Inside of an LVM Volume

Thursday, May 12th, 2011

In my Linux virtual environment I am using LVM volumes as the backing devices for virtual machines. Each of these LVM volumes contains a partition table splitting the LVM volume into at least one Linux partition and one swap partition. In order to access these partitions from the dom0 host itself we can use the kpartx command to create device mapper entries which correspond to each of the partitions.

In this example we want to access the ext3 filesystem contained on the first partition of the “vm_example” logical volume.

 
[root@vm ~]# lvs
  LV              VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  vm_example      vg0  -wi-ao  6.00G                                      
 
[root@vm ~]# fdisk -l /dev/vg0/vm_example 
 
Disk /dev/vg0/vm_example: 6442 MB, 6442450944 bytes
255 heads, 63 sectors/track, 783 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
              Device Boot      Start         End      Blocks   Id  System
/dev/vg0/vm_example1               1         653     5245191   83  Linux
/dev/vg0/vm_example2             654         783     1044225   82  Linux swap / Solaris

As you can see, partition 1 is type Linux and partition 2 is type Linux swap. Now we use kpartx with the -a flag to create the device mapper entries for the partitions displayed above.

[root@vm ~]# kpartx -a /dev/vg0/vm_example

And now we can interact with the /dev/mapper devices as we normally would to mount, fsck, etc.

 
[root@vm ~]# file -s /dev/mapper/vm_example1 
/dev/mapper/vm_example1: Linux rev 1.0 ext3 filesystem data (large files)
 
[root@vm ~]# file -s /dev/mapper/vm_example2
/dev/mapper/vm_example2: Linux/i386 swap file (new style) 1 (4K pages) size 261055 pages

Then, when you’re finished, clean up with the kpartx -d command. The logical volume will remain in-use until this is done.

[root@vm ~]# kpartx -d /dev/vg0/vm_example
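The whole map, mount, work, unmap cycle can be wrapped up so the cleanup step is never forgotten. This is a sketch with a hypothetical function name, and it assumes the guest that owns the volume is shut down, since mounting a filesystem that a running VM is also using is asking for trouble.

```shell
#!/bin/bash
# Sketch: map an LV's partitions, mount one read-only, run a command, clean up.
# with_lv_partition LV_PATH MAPPER_DEVICE MOUNTPOINT COMMAND [ARGS...]
with_lv_partition() {
    local lv="$1" part="$2" mnt="$3" status
    shift 3
    kpartx -a "$lv" || return 1
    if mount -o ro "$part" "$mnt"; then
        "$@"                 # do the actual work while the filesystem is mounted
        status=$?
        umount "$mnt"
    else
        status=1
    fi
    kpartx -d "$lv"          # always remove the mappings so the LV is released
    return "$status"
}
```

For example: with_lv_partition /dev/vg0/vm_example /dev/mapper/vm_example1 /mnt/vm ls /mnt/vm/etc, where /mnt/vm is an existing empty directory.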

Coping With Cloud Downtime

Thursday, April 21st, 2011

In light of the recent Amazon cloud service interruptions it seems like a good time to share some ideas about how to help keep cloud hosted services available during unexpected and potentially long lasting outages.

Each of these items can be implemented using free and open source software either hosted in your own datacenter or *gasp* in a different cloud.

Use Puppet instead of custom machine images.

Sure, custom machine images are useful for quickly lighting up lots of copies of the same system, but they fall down in many ways.

  • Applying/reverting changes to already running instances is a manual process.
  • No way to ensure that systems are in a consistent state.
  • Updating images is cumbersome, especially for minor config tweaks.
  • Machine instance versions are a pain to keep track of, and can sprawl out of control.
  • Dependency on specific cloud resources can make them difficult to access during an outage.
  • Small differences in system configs require a whole new machine image.

    A configuration management system like Puppet gives you the same ability to customize systems, avoids the above shortcomings, and provides a number of extra features.

  • Automatic propagation of configuration changes to all relevant systems.
  • Regular checks are performed to ensure system consistency. If a difference is detected the necessary updates are made automatically.
  • Provides a single location and interface for system configuration.
  • Small differences between systems are easy to incorporate while using the same template.
  • Provides the ability to push a configuration change out to all systems right away.
  • There are no dependencies on a particular cloud technology.
  • Revision control is easy to hook in and lets you quickly revert changes that didn’t go quite as expected.
  • System configuration details are inherently documented in detail by the configuration manifests.

    Synchronize your data elsewhere on a regular basis.

    Cloud storage offers some really compelling features and is very useful, don’t get me wrong. But I wouldn’t expect it to always be there. Especially if the backups and snapshots that you depend on for recovery are hosted with the same provider, running the same software, potentially inside the same physical facilities. After all, cloud storage is susceptible to the same problems as any other storage platform. It’s a good idea to keep backups outside the cloud (or at least inside a different cloud). The simpler the backup methodology, the better.

    Rsync, mylvmbackup, rsnapshot and rdiff-backup are great open source backup tools that are secure and are optimized for efficiency with bandwidth and on-disk size.
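As a simple illustration of the rsync approach, here is a minimal sketch. The function name and the example destination are hypothetical; note that --delete makes the destination an exact mirror, so files removed at the source disappear from this copy too (which is why tools like rsnapshot, which keep history, are worth a look).

```shell
#!/bin/bash
# Sketch: mirror a directory to an off-cloud host over SSH.
# offsite_sync SOURCE_DIR DESTINATION   (e.g. backup@host:/path, hypothetical)
offsite_sync() {
    local src="${1%/}/" dest="$2"   # trailing slash: copy the contents, not the dir itself
    # -a preserves permissions and times, -z compresses data in transit
    rsync -az --delete -e ssh "$src" "$dest"
}
```

Something like offsite_sync /var/www backup@backuphost.example.net:/backups/www could then be run nightly from cron on a machine outside the cloud.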

    Manage your systems from outside the cloud.

    If your infrastructure is a cloud and the running instances are aircraft, then configuration management and monitoring systems are ground control. If they were hosted in the same cloud as the systems they manage, game over.

    By separating your management infrastructure and your service delivery infrastructure you gain the ability to monitor and manage systems remotely. You can even quickly deploy replacement resources elsewhere using a config management system and then copy your most recent off-cloud backup up there to restore your database and web content.

    Optimize your DNS configuration.

    If things are looking really bad you can at least put up a maintenance page. That is, if you can update your DNS and get the changes to propagate quickly. Ensuring the following ahead of time will save you many headaches in the event of a service outage.

  • Set your DNS TTL low. This makes your DNS updates take effect more quickly, reducing the number of error messages your users see.
  • Use as many DNS servers as you can get your hands on. With lots of DNS servers, a problem with one or more of them is less likely to affect you. After all, they may be using the same cloud resources that you are!
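It’s worth checking what TTL your records are actually being served with before you need to change them in a hurry. The sketch below pulls the TTL out of dig’s answer section; the function names are made up for illustration.

```shell
#!/bin/bash
# Sketch: report the TTL (in seconds) returned for a DNS record.
# Answer-section lines look like: NAME TTL CLASS TYPE DATA
ttl_from_answer() {
    awk '$3 == "IN" { print $2; exit }'
}

# record_ttl NAME [TYPE]   queries with dig and extracts the TTL
record_ttl() {
    dig +noall +answer "$1" "${2:-A}" | ttl_from_answer
}
```

Keep in mind that a caching resolver reports the time remaining rather than the configured value, so point dig at one of your authoritative servers (dig @your-nameserver ...) to see the real TTL.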

    Verify that you have a backup MX.

    While we’re on the subject of DNS, it’s a good idea to make sure you have a backup mail exchanger configured and defined as an MX for your domain. It doesn’t have to be anything fancy, just something to receive and queue up mail until you’re able to get the primary mail system back online.

    Keep thinking and talking about it!

    The above ideas hopefully are a good starting point for protecting yourself against unexpected outages but it definitely doesn’t stop here.

    What practices have worked well for you or your company?

    What ideas do you have?

    Ubuntu Lucid 10.04 Cobbler Kickstart Setup How To

    Thursday, February 17th, 2011

    Importing the ISO

    At the time of this writing the version of cobbler available for CentOS-5 via the EPEL repo was 2.0.3.1. This version doesn’t seem to include proper support for “breeds” other than redhat, although it is alluded to in the documentation. So, in order to import the Ubuntu media, I had to perform a few extra manual steps.

    Copy the ISO contents to a directory that corresponds to the name of your distro

    In my case this is Ubuntu-10.04-x86_64. I created a directory with that name under /var/www/cobbler/ks_mirror/ and copied the entire contents of the mounted Ubuntu Lucid installer ISO to this location.

    To mount an iso file in linux the command is something like

    mount -o loop /path/to/lucid.iso /mnt/foo
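Put together, the mount and copy steps might look like the sketch below. The function name is hypothetical, and the mirror root and mount point defaults are simply the paths used in this post.

```shell
#!/bin/bash
# Sketch: loop-mount an installer ISO and copy its contents into the cobbler mirror.
# import_iso ISO_PATH DISTRO_NAME [MIRROR_ROOT] [MOUNTPOINT]
import_iso() {
    local iso="$1" name="$2" root="${3:-/var/www/cobbler/ks_mirror}" mnt="${4:-/mnt/foo}"
    mkdir -p "$mnt" "$root/$name" || return 1
    mount -o loop,ro "$iso" "$mnt" || return 1
    cp -a "$mnt/." "$root/$name/"      # the "/." form also copies hidden files
    local status=$?
    umount "$mnt"
    return "$status"
}
```

For example: import_iso /path/to/lucid.iso Ubuntu-10.04-x86_64.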

    Creating the .treeinfo file manually

    Once the files have been copied, you should put a .treeinfo file in place too. Koan freaked out when this file wasn’t present when I attempted to perform a --virt install. I think this is safe to omit if you aren’t planning on using Koan.

    [general]
    family = Ubuntu
    timestamp = 1272326522.13
    totaldiscs = 1
    version = 10.04
    discnum = 1
    packagedir = dists
    arch = x86_64
     
    [images-x86_64]
    kernel = install/netboot/ubuntu-installer/amd64/linux
    initrd = install/netboot/ubuntu-installer/amd64/initrd.gz

    Creating the Distro

    Now, it’s time to create the cobbler distro manually. This is straightforward enough, just double check all your paths and names to make sure everything matches up.

    Here’s what mine looks like:

    [root@cobbler cobbler]# cobbler distro report --name="Ubuntu-10.04-x86_64"
    Name                           : Ubuntu-10.04-x86_64
    Architecture                   : x86_64
    Breed                          : generic
    Comment                        : 
    Initrd                         : /var/www/cobbler/ks_mirror/Ubuntu-10.04-x86_64/install/netboot/ubuntu-installer/amd64/initrd.gz
    Kernel                         : /var/www/cobbler/ks_mirror/Ubuntu-10.04-x86_64/install/netboot/ubuntu-installer/amd64/linux
    Kernel Options                 : {}
    Kernel Options (Post Install)  : {}
    Kickstart Metadata             : {'tree': 'http://@@http_server@@/cblr/links/Ubuntu-10.04-x86_64'}
    Management Classes             : []
    OS Version                     : generic26
    Owners                         : ['admin']
    Red Hat Management Key         : <<inherit>>
    Red Hat Management Server      : <<inherit>>
    Template Files                 : {}

    In my case I specified a Breed of “generic” and an OS Version of “generic26”. Here’s hoping that this gets sorted out formally and makes its way through the EPEL repo soon.
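The post only shows the resulting report, not the command that created the distro. Working backwards from the report, a cobbler distro add invocation along these lines should produce it. Treat it as a reconstruction rather than the exact command that was run: the wrapper function is hypothetical and the option values are copied from the report above.

```shell
#!/bin/bash
# Sketch: register the copied tree as a cobbler distro.
# add_ubuntu_distro NAME [MIRROR_ROOT]
add_ubuntu_distro() {
    local name="$1" root="${2:-/var/www/cobbler/ks_mirror}"
    local netboot="$root/$name/install/netboot/ubuntu-installer/amd64"
    cobbler distro add \
        --name="$name" \
        --arch=x86_64 \
        --breed=generic \
        --os-version=generic26 \
        --kernel="$netboot/linux" \
        --initrd="$netboot/initrd.gz" \
        --ksmeta="tree=http://@@http_server@@/cblr/links/$name"
}
```

Running add_ubuntu_distro Ubuntu-10.04-x86_64 and then cobbler distro report lets you compare the result against the report above.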

    Creating the Ubuntu 10.04 Lucid Kickstart Template

    Here’s an example kickstart that is working for automated installs of Ubuntu 10.04 Lucid. It’s not 100% dynamic, but it works well.

    #/var/lib/cobbler/kickstarts/lucid.ks
    #
     
    #System language
    lang en_US
     
    #Language modules to install
    langsupport en_US
     
    #System keyboard
    keyboard us
     
    #System mouse
    mouse
     
    #System timezone
    timezone America/New_York
     
    #Root password
    rootpw --iscrypted $default_password_crypted
     
    #Initial user
    user --disabled
     
    #Reboot after installation
    reboot
     
    #Use text mode install
    text
     
    #Install OS instead of upgrade
    install
     
    # Use network installation
    url --url=$tree
     
    #System bootloader configuration
    bootloader --location=mbr 
     
    #Clear the Master Boot Record
    zerombr yes
     
    #Partition clearing information
    clearpart --all --initlabel 
     
    #Disk partitioning information
    part swap --size 512
    part / --fstype ext3 --size 1 --grow 
     
    #System authorization information
    auth  --useshadow  --enablemd5
     
    #Network information
    network --bootproto=dhcp --device=eth0
     
    #Firewall configuration
    firewall --enabled --trust=eth0 --ssh 
     
    #Do not configure the X Window System
    skipx
     
    %pre
     
    #services
    services --enabled=ntpd,nscd,puppet
     
    #Package install information
    %packages
    ubuntu-standard
    man-db
    wget
    postfix
    openssh-server
    sysstat
    nfs-common
    nscd
    quota
    ntp
    puppet
     
    %post
    #raw
    sed -i 's/^START=no/START=yes/' /etc/default/puppet
    #end raw

    Creating the Profile

    Creating the profile is much like creating the distro, matching up the variables. I’ve also specified some extra kernel options to output the installer to the serial console (ttyS0) which I find preferable for --virt installs.

    [root@cobbler cobbler]# cobbler profile report --name="Ubuntu-10.04-x86_64"
    Name                           : Ubuntu-10.04-x86_64
    Comment                        : 
    DHCP Tag                       : default
    Distribution                   : Ubuntu-10.04-x86_64
    Enable PXE Menu?               : False
    Kernel Options                 : {'text': '~', 'console': ['tty0', 'ttyS0,9600n8'], 'nofb': '~'}
    Kernel Options (Post Install)  : {'console': ['tty0', 'ttyS0,9600n8'], 'nofb': '~'}
    Kickstart                      : /var/lib/cobbler/kickstarts/lucid.ks
    Kickstart Metadata             : {}
    Management Classes             : []
    Name Servers                   : []
    Name Servers Search Path       : []
    Owners                         : ['admin']
    Parent Profile                 : 
    Red Hat Management Key         : <<inherit>>
    Red Hat Management Server      : <<inherit>>
    Repos                          : ['ubuntu-lucid-x86_64']
    Server Override                : <<inherit>>
    Template Files                 : {}
    Virt Auto Boot                 : 0
    Virt Bridge                    : br345
    Virt CPUs                      : 1
    Virt File Size(GB)             : 5
    Virt Path                      : 
    Virt RAM (MB)                  : 512
    Virt Type                      : qemu

    Note: I manually created an apt-mirror named ubuntu-lucid-x86_64. You could use any other Ubuntu repo you’d like, local or remote.
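As with the distro, the command behind this report isn’t shown, so the following is a reconstruction from the report values. The wrapper function is hypothetical, and I have not verified how this particular cobbler version parses the repeated console= keys in --kopts, so compare the result with cobbler profile report afterwards.

```shell
#!/bin/bash
# Sketch: create a cobbler profile matching the report above.
# add_ubuntu_profile NAME   (the distro is assumed to have the same name)
add_ubuntu_profile() {
    local name="$1"
    cobbler profile add \
        --name="$name" \
        --distro="$name" \
        --kickstart=/var/lib/cobbler/kickstarts/lucid.ks \
        --kopts="text console=tty0 console=ttyS0,9600n8 nofb" \
        --kopts-post="console=tty0 console=ttyS0,9600n8 nofb" \
        --repos=ubuntu-lucid-x86_64 \
        --virt-type=qemu \
        --virt-bridge=br345 \
        --virt-ram=512 \
        --virt-file-size=5
}
```

The bridge name, RAM and disk size are specific to my environment, so adjust them to suit yours.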

    Notes & Future Improvements

    This should get you going in the right direction towards automating your Ubuntu builds with a cobbler install sourced from EPEL. Hopefully soon a fix for this will make its way into the EPEL repos.

    Finding a MAC Address in VMware ESX

    Wednesday, February 16th, 2011

    Sometimes you just have to trace a system down by its MAC address. It could be a security incident, an abuse complaint or perhaps a long forgotten legacy system. Whatever it is, you don’t have much info to work with, but you do have a hardware address. Sadly, VMware doesn’t seem to have an easy way to search for a host by its MAC address. And filtering by corresponding IP address in the VI client only works if your VMware tools are installed and working. Which, after all, probably isn’t the case if all you know about a machine is its MAC address. Luckily with root shell access to the ESX hosts you can force a MAC address search easily enough.

    The following one-liner will search the VMFSes presented to the host for a given string; for our purposes it’s a MAC address. In many cases running this search from one ESX node will effectively search the whole cluster, because they all share the same VMFS datastores.

    [root@esx1 root]# find /vmfs/volumes -name '*.vmx' | while read -r i; do \
                      grep -i "00:50:56:b9:79:70" "$i" && echo "$i"; done
    ethernet0.generatedAddress = "00:50:56:b9:79:70"
    /vmfs/volumes/49358dcc-139b80f0-2d98-001ec9cf6a91/FOOVM/FOOVM.vmx

    What the above script does is grep for the MAC address 00:50:56:b9:79:70 in all files ending with .vmx in /vmfs/volumes. If a match is found, the full path to that vmx file is printed to the screen and from there you can glean the name and location of this formerly elusive virtual machine.
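If you do this often, the search is easy to turn into a reusable function. This is a sketch with a made-up name; unlike the one-liner it prints only the matching file paths, and it treats the MAC address as a fixed, case-insensitive string.

```shell
#!/bin/bash
# Sketch: print every .vmx file under a datastore root that mentions a MAC address.
# find_vm_by_mac MAC [ROOT]   (ROOT defaults to /vmfs/volumes)
find_vm_by_mac() {
    local mac="$1" root="${2:-/vmfs/volumes}"
    # -i ignore case, -l print file names only, -F match a fixed string
    find "$root" -name '*.vmx' -exec grep -ilF -- "$mac" {} \;
}
```

For example: find_vm_by_mac 00:50:56:b9:79:70.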