Wednesday, November 28, 2012

temptrak rrd create

just in case i forget...
rrdtool create temp.rrd --step 3600 \
DS:probe1:GAUGE:300:U:U \
DS:probe2:GAUGE:300:U:U \
RRA:AVERAGE:0.5:1:720
(rrdtool create refuses to run without at least one RRA, so tack one on: 720 hourly rows is 30 days.)
let's make it granular
rrdtool create temp.rrd \
--start N --step 300 \
DS:probe1:GAUGE:600:55:95 \
DS:probe2:GAUGE:600:55:95 \
RRA:MIN:0.5:12:1440 \
RRA:MAX:0.5:12:1440
let's do something really basic
rrdtool create temp.rrd \
--start N --step 60 \
DS:probe1:GAUGE:300:U:U \
DS:probe2:GAUGE:300:U:U \
DS:probe3:GAUGE:300:U:U \
DS:probe4:GAUGE:300:U:U \
RRA:AVERAGE:0.5:1:576 \
RRA:AVERAGE:0.5:6:576 \
RRA:AVERAGE:0.5:24:576 \
RRA:AVERAGE:0.5:144:576
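The retention each of those AVERAGE RRAs buys you is just step * steps-per-row * rows. A quick back-of-the-envelope check in plain shell (no rrdtool needed, numbers from the last create above):

```shell
# step is 60s; each RRA keeps 576 rows at 1, 6, 24, and 144 steps per row
step=60
rows=576
for spr in 1 6 24 144; do
  secs=$((step * spr * rows))
  echo "RRA ${spr}x${rows}: $((secs / 3600)) hours"
done
```

So roughly 9.6 hours of raw data, then about 2.4, 9.6, and 57.6 days at ever-coarser resolution.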

Tuesday, November 27, 2012

aix 6.1 odm fun

trying to ssh userwithlongname@aixhost fails. when i su - userwithlongname i get this on AIX 6.1:

3004-503 Cannot set process credentials

# pam.conf
sshd auth   required    /usr/lib/security/pam_aix use_new_state use_first_pass 
sshd account      required    /usr/lib/security/pam_aix 
sshd password     required    /usr/lib/security/pam_aix 
sshd session      required    /usr/lib/security/pam_aix 
# /etc/ssh/sshd_config
uncomment the UsePAM line and change UsePAM no to UsePAM yes.
# chsec -f /etc/nscontrol.conf -s authorizations -a secorder=files,LDAP
# lsattr -El sys0
shows the sys0 attributes held in the ODM database.
# chdev -l sys0 -a max_logname=30
did it work?*
# getconf LOGIN_NAME_MAX
# nfso -p -o nfs_use_reserved_ports=1
* Why?

because sometimes you have users with groups and names longer than 8 characters.
if so, if their primary GID is one of those groups, or if their usernames are longer than 8 characters, no logon.
first hint... tried to su as a user, only first 8 characters shown.
did an lsgroup and the group did not exist.
did an lsgroup ALL and saw that the LDAP group had no content.
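That first-8-characters truncation is easy to demo in plain shell (illustrative only; userwithlongname is a made-up name):

```shell
# what the old 8-character login-name world does to a long username
name=userwithlongname
short=$(printf '%.8s' "$name")   # keep only the first 8 chars, like su showed
echo "$short"
```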


Friday, November 16, 2012

Thursday, November 15, 2012

aix sshd install

after rpm (openssl installed, yes) hell, you decide to torture yourself more with sshd... quick & dirty:
# cd /tmp
# wget
# mkdir openssl. && cd openssl. && uncompress -c < ../openssl. |tar -xvf - && installp -acXYgd . openssl
gen your keys:
# cd /etc/ssh
# ssh-keygen -t rsa
then edit /etc/ssh/sshd_config to suit, and issue:
# stopsrc -g ssh ; startsrc -g ssh
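If you want to dry-run the key generation first, the same recipe works non-interactively against a scratch directory (so you don't clobber /etc/ssh while testing; the path is made up):

```shell
# generate an rsa host key non-interactively into a throwaway dir
dir=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$dir/ssh_host_rsa_key" >/dev/null
ls -l "$dir"
```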

Wednesday, November 14, 2012

solaris 10 statd death

statd problems galore in /var/adm/messages:
Nov 11 06:06:66 localhost statd[262]: [ID 766906 daemon.warning] statd: cannot talk to statd at nastynfsserver, RPC: Timed out(5)
# ps -eaf | fgrep statd 
  daemon 16000 17000   0 13:13:13 ?           0:00 /usr/lib/nfs/statd
    root 16002 17500   0 14:14:14 pts/13      0:00 fgrep statd

# svcs -a | grep "nfs/status"
online          13:13:13 svc:/network/nfs/status:default

# svcadm -v disable nfs/status
svc:/network/nfs/status:default disabled.

# ls /var/statmon/sm.bak

# rm /var/statmon/sm.bak/nastynfsserver

# svcadm -v enable nfs/status
svc:/network/nfs/status:default enabled.
if fgrep is not your friend, grep'll do:
ps -ef |grep -v grep |grep statd
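There's a third variant that skips the extra grep entirely: put a character class in the pattern, and the grep process's own command line never matches. Demonstrated against canned ps output (the PIDs are made up):

```shell
# the [s]tatd trick: the grep's own argv contains "[s]tatd",
# which the regex [s]tatd does not match, so no self-match
ps_output='daemon 16000 17000 0 13:13:13 ?      0:00 /usr/lib/nfs/statd
root   16002 17500 0 14:14:14 pts/13 0:00 grep [s]tatd'
matches=$(printf '%s\n' "$ps_output" | grep -c '[s]tatd')
echo "$matches"
```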

debugging solaris 10 ssh daemon

on solaris 10 i had a problem. it bugged me off and on for like a week.

it was like this:

ldap user on a solaris 10 box with a pubkey or without a pubkey was unable to ssh to other systems, be they solaris or otherwise. this was the case for all zillion solaris 10 sparc and x86 systems i have. not so for solaris 9. and nope for solaris 11.

first i thought there was something amiss with the user's ssh directory. maybe it was the perms on the mount. hell. maybe it was an issue then with the ldap record. the ssh daemons? time to debug...
localhost # /usr/lib/ssh/sshd -p 2222 -Dddd
localhost ~ ssh -vvv -l notme -p 2222 localhost
little did i know, it was not a problem with:
login auth sufficient
nor an issue with:
Host *
   StrictHostKeyChecking no
or even:
#ListenAddress ::
no no.

it was the existence of this wickedness in the user's home:
localhost # ls -d /notme/.sunw
i don't care what that directory holds, it makes my systems puke:
localhost # cp -r /notme/.sunw /notme/.sunw.crap
localhost # rm -rf /notme/.sunw ; mkdir /notme/.sunw
localhost # chmod ugo-rwx /notme/.sunw
localhost # ls -al /notme/ |grep '\.sunw'
drwxrwxr-x   5 notme    notme          4096 Nov 13 13:31 .sunw.crap
d---------   2 notme    notme          4096 Nov 13 13:31 .sunw
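The same dance is reproducible anywhere; here mktemp stands in for the real /notme home:

```shell
# recreate the workaround: stash a copy, then leave an empty zero-perm .sunw
home=$(mktemp -d)
mkdir "$home/.sunw" && echo junk > "$home/.sunw/cred"
cp -r "$home/.sunw" "$home/.sunw.crap"   # keep the contents, just in case
rm -rf "$home/.sunw"
mkdir "$home/.sunw"
chmod ugo-rwx "$home/.sunw"              # nothing can read, write, or enter it
ls -ld "$home/.sunw" | cut -c1-10
```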

Monday, November 12, 2012

solaris 11 ldap client kick start

There's nothing more depressing than when you've got a console going and you see this scroll by when you do a warm restart of your Solaris 11 box:
svc.startd[44]: libsldap: Status: 2  Mesg: Unable to load configuration '/var/ldap/ldap_client_file' ('').
Say it ain't so. But it is.

Sadly, I've given up on trying to figure out what's wrong, because really, nothing's wrong at all. What I've done is throw in a kludge, sort of like what I used to have to do on Solaris 8, 9 and 10, to get my ldap clients running. Here's what I did:
  • Place a script in /etc/init.d and...
  • Place a symlink to said script in /etc/rc3.d.

    First get those ldap services running:
    # set up ldap
    svcadm enable network/ldap/client:default
    svcadm enable network/nis/domain
    svcadm enable dns/client
    svcadm refresh name-service/switch
    svcadm enable -r nfs/client
    Symlink it:
    # ln -s /etc/init.d/ /etc/rc3.d/S99svc-start-ldapclient
    That was easy.
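For the record, the /etc/init.d script itself can be dead simple. A sketch only: the name svc-start-ldapclient is inferred from the S99 symlink, and the svcadm calls are the FMRIs from the list above.

```shell
#!/sbin/sh
# /etc/init.d/svc-start-ldapclient -- kick the ldap client stack on boot
# (sketch; adjust FMRIs to taste)
case "$1" in
start)
    svcadm enable network/ldap/client:default
    svcadm enable network/nis/domain
    svcadm enable dns/client
    svcadm refresh name-service/switch
    svcadm enable -r nfs/client
    ;;
*)
    echo "Usage: $0 start"
    ;;
esac
```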
solaris 11 client nfs gone missing

Solaris 11 is all new all the time. One thing that's sort of annoying or mystifying is why, after booting, my zones just decide to skip out on the whole mounting of nfs exports even though they are defined in /etc/vfstab. That's okay. I don't mind creating a cron job:
if [ $(mount | grep 'nfsserver' | wc -l | tr -s "\n") -eq 0 ]; then mount -a ; fi
Oh, and I'm okay with running it every five minutes in crontab.
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /root/scripts/ 
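The guts of that one-liner, run here against canned mount output so nothing actually gets mounted (server name and paths are made up):

```shell
# count nfs mounts from a captured `mount` listing; 0 would mean run mount -a
mounts='/ on /dev/dsk/c0t0d0s0 read/write/setuid
/data on nfsserver:/export/data remote/read/write'
n=$(printf '%s\n' "$mounts" | grep -c 'nfsserver')
if [ "$n" -eq 0 ]; then
  echo "would run: mount -a"
else
  echo "nfs mounts present: $n"
fi
```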

Tuesday, November 6, 2012

solaris 10 forcefully shutdown a zone

In my notes this is marked: "killzonekill".

That being said...

Sometimes my zones on Solaris 10 refuse to shut down. This could be for a variety of reasons. A tell-tale sign is, say after 1 day, you see this:
[root@bigsystem ~]# zoneadm -z soxvm218 shutdown
... 24 hours later ...
[root@bigsystem ~]# zoneadm list -civ 
  18 soxvm218       shutting_down /opt/zones/soxvm218          native   shared

Well hell. Maybe there be zombies.
[root@bigsystem ~]# ps -fz soxvm218
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root  1619     1   0 21:56:00 ?           0:00 zsched
 0003088  4486     1   0        - ?           0:00 <defunct>
Yeah. defunct, that's no fun.

You try the usual:

[root@bigsystem ~]# zoneadm -z zonename unmount -f
[root@bigsystem ~]# zoneadm -z zonename reboot -- -s 
[root@bigsystem ~]# pkill -9 -z zonename
If those don't do it, go for some kill -9 action. Programmatically:
for i in `ps -lLef | grep defunct | grep -v grep | awk '{print $4}'`
do
  echo "Killing process pid=$i" ; sleep 1
  kill -9 $i ; sleep 5
done
Yeah. That does it every time.