Monday, September 24, 2012

ulimits & confluence


I have a machine.  I actually have many machines.  This specific machine runs a daemon, let's call it Atlassian Confluence, just for fun.  The daemon is run by a user, let's call it senhorcrap. This user is in a little jail, no ssh, no nothing.

I get a note from an enduser saying something to the effect of:
what the fark is going on with your farking website it is farking down.

I respond:
really?

Actually he said:
hey, i've got a 500 error and then a few minutes ago i saw this:

Service Temporarily Unavailable. The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.


I responded with:
not again.

Not again. Before, I'd lazily restart the service and the world would be good. Not this time.

And a sick stack trace later...

Looking at the logs (we always look at the logs) I found it was an open file error.  Too many of them were open.  Interesting.  Well.  There are limits to these things, to prevent system resource exhaustion.
# tail -f -n 30 /home/senhorcrap/senhorcraps-home/logs/atlassian-confluence.log
Then I tried to gracefully stop the service. Then I just killed it by sweeping it away with a script I have on this blog.
# killsomething
# ps aux |grep confluence
Not there.  Nice.
# su - senhorcrap
# ulimit -Sn
1024

# lsof | wc -l
2044
Uh.
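(An aside: `lsof` with no filters lists every open file on the box, and can print more than one line per descriptor once threads get involved, so that count is fuzzy. A more exact tally comes straight from /proc. A minimal sketch, Linux-specific, counting for the current user since your box won't have a senhorcrap:)

```shell
# Tally open file descriptors across all of one user's processes.
# /proc/<pid>/fd holds one symlink per open descriptor (Linux-specific).
total=0
for pid in $(pgrep -u "$(id -un)"); do
    n=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    total=$((total + n))
done
echo "total open fds: $total"
```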

As root... I edited /etc/security/limits.conf, /etc/pam.d/login, and /etc/profile:

/etc/security/limits.conf
senhorcrap      soft    nofile          1024
senhorcrap      hard    nofile          4096
/etc/pam.d/login
session    required   pam_limits.so
/etc/profile
if [ "$USER" = "senhorcrap" ]; then
        if [ "$SHELL" = "/bin/bash" ]; then
              ulimit -n 4096
        fi
fi
Once I su'd as senhorcrap I checked my limits, and all was well.
I started my daemon and the system was fine. Doing the "Windows refresh" wasn't required.

...

What I did not write was that it took me a goodly long time to figure out I needed both the soft and hard limits in limits.conf for this to work.  And that those limits have to be divisible by 1024.  And that the new limits only take effect on processes (daemons) started after the fact; thus I had to kill Confluence.  But we don't talk about that.  A note before you start to sneeze bs all over me: YES, hard alone should work.  In this instance, it did not.  And I got mad.  Well, only as mad as a sysadmin can get, which is not really mad at all.
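(Since new limits only apply to freshly started processes, you can check what a running process actually got from /proc rather than guessing. A sketch, Linux-specific, using the current shell's PID as a stand-in for the daemon's:)

```shell
# Show the effective open-files limit of an already-running process.
# Edits to limits.conf won't change this until the process is restarted.
pid=$$   # stand-in; for the real thing, find the daemon's PID with ps or pgrep
grep "Max open files" "/proc/$pid/limits"
```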

Tuesday, September 18, 2012

vmware esx 5 excitement + ghettoVCB


I have to back up a vm, but I don't have the VMware extensions.  What to do?  Use ghettoVCB, of course.  That's fine, but the deal with VMware is that a lot of stuff is just plain ephemeral.

My environment is pretty simple.  I have an ESX 5 box with two NICs.  One is connected to the prod network, the other to a private storage network.  The priv net has a server with an NFS export where I can drop stuff from the ESX box.

I've got the NFS export mounted on my ESX box as /vm-repo .  Via shell, it is located here:
/vmfs/volumes/vm-repo/

I've decided to use NFS as opposed to iSCSI since I can get at the data without the partition being formatted as VMFS.  There are drawbacks to both, but for my purposes here, NFS works best.  On that directory I have placed ghettoVCB and a few more scripts.

Okay.

Luckily, /etc/rc.local survives between boots on an ESX 5 machine.  I've added the following:

# boot vm
for i in $(vim-cmd vmsvc/getallvms|cut -f1 -d" "| grep -v Vmid); do vim-cmd vmsvc/power.on $i; sleep 10; done

# allow smtp through firewall
cp /vmfs/volumes/vm-repo/smtp.xml /etc/vmware/firewall/
esxcli network firewall refresh

# fix root cron
echo "0 0,6,18 * * * /vmfs/volumes/vm-repo/tools/ghettoVCB/ghettoVCB.sh -a" >> /var/spool/cron/crontabs/root

boot vm
This iterates through the vms on my ESX box and starts them.  This only happens at boot time.  It's needed because free ESXi, as of 5.0 Update 1, no longer does an auto-start.

allow smtp
ESX does not have a nice clickable GUI where I can let SMTP go through.  I want SMTP traffic to be sent by the system since I want to know what...

fix root cron
does.  This calls the ghettoVCB script which creates a full backup of my VMs at midnight, 6am and 6pm.

Yay.  Now my systems auto-start, I have backups and I get a report.  Life is grand.
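For reference, the smtp.xml dropped into /etc/vmware/firewall/ looks something like this. This is a sketch of the ESXi 5 custom firewall rule format, not my exact file; the service id number is arbitrary, and the rule assumes outbound SMTP on tcp/25 (see the smtp hint link below):

```xml
<ConfigRoot>
  <service id="0100">
    <id>smtp</id>
    <rule id="0000">
      <direction>outbound</direction>
      <protocol>tcp</protocol>
      <porttype>dst</porttype>
      <port>25</port>
    </rule>
    <enabled>true</enabled>
    <required>false</required>
  </service>
</ConfigRoot>
```

The `esxcli network firewall refresh` in rc.local picks this up after the copy.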

links
ghettoVCB http://communities.vmware.com/docs/DOC-8760
ghettoVCB-restore http://communities.vmware.com/docs/DOC-10595
smtp hint http://www.vladan.fr/how-to-change-default-ssh-port-on-esxi-5-and-make-the-change-persistent-after-reboot/
rc.local hint http://communities.vmware.com/thread/217704
vm restart http://blogs.vmware.com/vsphere/2012/03/free-esxi-hypervisor-auto-start-breaks-with-50-update-1.html


Tuesday, September 4, 2012

solaris 11, i weep

solaris11!

why have you cast aside the simplicity of solaris 10? what did i ever do to you? were you taunted as a child for boasting your sysv lineage? don't you just want to get back to your bsd roots? embrace unics, solaris 11. look what happened to your friends aix and hpux. no one really likes them, not really. all the kids look to debian derivatives for cool awesomeness. you had hope solaris 11, you really did. and debuting on armistice day, that was cool. i was quiet for two minutes. i was. forget this mean oracle branding. please?