Wednesday, November 28, 2012

temptrak rrd create

just in case i forget...
rrdtool create temp.rrd --step 3600 \
DS:probe1:GAUGE:300:U:U \
DS:probe2:GAUGE:300:U:U \
RRA:AVERAGE:0.5:1:720
(rrdtool create refuses to run without at least one RRA, so tack one on: 720 hourly rows is 30 days.)
let's make it granular
rrdtool create temp.rrd \
--start N --step 300 \
DS:probe1:GAUGE:600:55:95 \
DS:probe2:GAUGE:600:55:95 \
RRA:MIN:0.5:12:1440 \
RRA:MAX:0.5:12:1440
let's do something really basic
rrdtool create temp.rrd \
--start N --step 60 \
DS:probe1:GAUGE:300:U:U \
DS:probe2:GAUGE:300:U:U \
DS:probe3:GAUGE:300:U:U \
DS:probe4:GAUGE:300:U:U \
RRA:AVERAGE:0.5:1:576 \
RRA:AVERAGE:0.5:6:576 \
RRA:AVERAGE:0.5:24:576 \
RRA:AVERAGE:0.5:144:576
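The retention each of those AVERAGE RRAs buys you is just step * steps-per-row * rows. A quick back-of-the-envelope check in plain shell (no rrdtool needed, numbers from the last create above):

```shell
# step is 60s; each RRA keeps 576 rows at 1, 6, 24, and 144 steps per row
step=60
rows=576
for spr in 1 6 24 144; do
  secs=$((step * spr * rows))
  echo "RRA ${spr}x${rows}: $((secs / 3600)) hours"
done
```

So roughly 9.6 hours of raw data, then about 2.4, 9.6, and 57.6 days at ever-coarser resolution.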

Tuesday, November 27, 2012

aix 6.1 odm fun

trying to ssh userwithlongname@aixhost fails. when i su - userwithlongname i get this on AIX 6.1:

3004-503 Cannot set process credentials

# pam.conf
sshd auth   required    /usr/lib/security/pam_aix use_new_state use_first_pass 
sshd account      required    /usr/lib/security/pam_aix 
sshd password     required    /usr/lib/security/pam_aix 
sshd session      required    /usr/lib/security/pam_aix 
# /etc/ssh/sshd_config
uncomment the UsePAM line and change UsePAM no to UsePAM yes.
# chsec -f /etc/nscontrol.conf -s authorizations -a secorder=files,LDAP
# lsattr -El sys0
shows the sys0 attributes held in the ODM database.
# chdev -l sys0 -a max_logname=30
did it work?*
# getconf LOGIN_NAME_MAX
# nfso -p -o nfs_use_reserved_ports=1
* Why?

because sometimes you have users with groups and names longer than 8 characters.
if so, if their primary GID is one of those groups, or if their usernames are longer than 8 characters, no logon.
first hint... tried to su as a user, only first 8 characters shown.
did an lsgroup and the group did not exist.
did an lsgroup ALL and saw that the LDAP group had no content.
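That first-8-characters truncation is easy to demo in plain shell (illustrative only; userwithlongname is a made-up name):

```shell
# what the old 8-character login-name world does to a long username
name=userwithlongname
short=$(printf '%.8s' "$name")   # keep only the first 8 chars, like su showed
echo "$short"
```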


Friday, November 16, 2012

Thursday, November 15, 2012

aix sshd install

after rpm (openssl installed, yes) hell, you decide to torture yourself more with sshd... quick & dirty:
# cd /tmp
# wget
# mkdir openssl. && cd openssl. && uncompress -c < ../openssl. |tar -xvf - && installp -acXYgd . openssl
gen your keys:
# cd /etc/ssh
# ssh-keygen -t rsa
then edit /etc/ssh/sshd_config to suit, and issue:
# stopsrc -g ssh ; startsrc -g ssh
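If you want to dry-run the key generation first, the same recipe works non-interactively against a scratch directory (so you don't clobber /etc/ssh while testing; the path is made up):

```shell
# generate an rsa host key non-interactively into a throwaway dir
dir=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$dir/ssh_host_rsa_key" >/dev/null
ls -l "$dir"
```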

Wednesday, November 14, 2012

solaris 10 statd death

statd problems galore in /var/adm/messages:
Nov 11 06:06:66 localhost statd[262]: [ID 766906 daemon.warning] statd: cannot talk to statd at nastynfsserver, RPC: Timed out(5)
# ps -eaf | fgrep statd 
  daemon 16000 17000   0 13:13:13 ?           0:00 /usr/lib/nfs/statd
    root 16002 17500   0 14:14:14 pts/13      0:00 fgrep statd

# svcs -a | grep "nfs/status"
online          13:13:13 svc:/network/nfs/status:default

# svcadm -v disable nfs/status
svc:/network/nfs/status:default disabled.

# ls /var/statmon/sm.bak

# rm /var/statmon/sm.bak/nastynfsserver

# svcadm -v enable nfs/status
svc:/network/nfs/status:default enabled.
if fgrep is not your friend, grep'll do:
ps -ef |grep -v grep |grep statd
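There's a third variant that skips the extra grep entirely: put a character class in the pattern, and the grep process's own command line never matches. Demonstrated against canned ps output (the PIDs are made up):

```shell
# the [s]tatd trick: the grep's own argv contains "[s]tatd",
# which the regex [s]tatd does not match, so no self-match
ps_output='daemon 16000 17000 0 13:13:13 ?      0:00 /usr/lib/nfs/statd
root   16002 17500 0 14:14:14 pts/13 0:00 grep [s]tatd'
matches=$(printf '%s\n' "$ps_output" | grep -c '[s]tatd')
echo "$matches"
```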

debugging solaris 10 ssh daemon

on solaris 10 i had a problem. it bugged me off and on for like a week.

it was like this:

ldap user on a solaris 10 box with a pubkey or without a pubkey was unable to ssh to other systems, be they solaris or otherwise. this was the case for all zillion solaris 10 sparc and x86 systems i have. not so for solaris 9. and nope for solaris 11.

first i thought there was something amiss with the user's ssh directory. maybe it was the perms on the mount. hell. maybe it was an issue then with the ldap record. the ssh daemons? time to debug...
localhost # /usr/lib/ssh/sshd -p 2222 -Dddd
localhost ~ ssh -vvv -l notme -p 2222 localhost
little did i know, it was not a problem with:
login auth sufficient
nor an issue with:
Host *
   StrictHostKeyChecking no
or even:
#ListenAddress ::
no no.

it was the existence of this wickedness in the user's home:
localhost # ls -d /notme/.sunw
i don't care what that directory holds, it makes my systems puke:
localhost # cp -r /notme/.sunw /notme/.sunw.crap
localhost # rm -rf /notme/.sunw ; mkdir /notme/.sunw
localhost # chmod ugo-rwx /notme/.sunw
localhost # ls -al /notme/ |grep '\.sunw'
drwxrwxr-x   5 notme    notme          4096 Nov 13 13:31 .sunw.crap
d---------   2 notme    notme          4096 Nov 13 13:31 .sunw
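The same dance is reproducible anywhere; here mktemp stands in for the real /notme home:

```shell
# recreate the workaround: stash a copy, then leave an empty zero-perm .sunw
home=$(mktemp -d)
mkdir "$home/.sunw" && echo junk > "$home/.sunw/cred"
cp -r "$home/.sunw" "$home/.sunw.crap"   # keep the contents, just in case
rm -rf "$home/.sunw"
mkdir "$home/.sunw"
chmod ugo-rwx "$home/.sunw"              # nothing can read, write, or enter it
ls -ld "$home/.sunw" | cut -c1-10
```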

Monday, November 12, 2012

solaris 11 ldap client kick start

There's nothing more depressing than when you've got a console going and you see this scroll by when you do a warm restart of your Solaris 11 box:
svc.startd[44]: libsldap: Status: 2  Mesg: Unable to load configuration '/var/ldap/ldap_client_file' ('').
Say it ain't so. But it is.

Sadly, I've given up on trying to figure out what's wrong, because really, nothing's wrong at all. What I've done is throw in a kludge, sort of like what I used to have to do on Solaris 8, 9 and 10, to get my ldap clients running. Here's what I did:
  • Place a script in /etc/init.d and...
  • Place a symlink to said script in /etc/rc3.d.

    First get those ldap services running:
    # set up ldap
    svcadm enable network/ldap/client:default
    svcadm enable network/nis/domain
    svcadm enable dns/client
    svcadm refresh name-service/switch
    svcadm enable -r nfs/client
    Symlink it:
    # ln -s /etc/init.d/ /etc/rc3.d/S99svc-start-ldapclient
    That was easy.
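For the record, the /etc/init.d script itself can be dead simple. A sketch only: the name svc-start-ldapclient is inferred from the S99 symlink, and the svcadm calls are the FMRIs from the list above.

```shell
#!/sbin/sh
# /etc/init.d/svc-start-ldapclient -- kick the ldap client stack on boot
# (sketch; adjust FMRIs to taste)
case "$1" in
start)
    svcadm enable network/ldap/client:default
    svcadm enable network/nis/domain
    svcadm enable dns/client
    svcadm refresh name-service/switch
    svcadm enable -r nfs/client
    ;;
*)
    echo "Usage: $0 start"
    ;;
esac
```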
solaris 11 client nfs gone missing

Solaris 11 is all new all the time. One thing that's sort of annoying or mystifying is why, after booting, my zones just decide to skip out on the whole mounting of nfs exports even though they are defined in /etc/vfstab. That's okay. I don't mind creating a cron job:
if [ $(mount | grep 'nfsserver' | wc -l | tr -s "\n") -eq 0 ]; then mount -a ; fi
Oh, and I'm okay with running it every five minutes in crontab.
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /root/scripts/ 
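The guts of that one-liner, run here against canned mount output so nothing actually gets mounted (server name and paths are made up):

```shell
# count nfs mounts from a captured `mount` listing; 0 would mean run mount -a
mounts='/ on /dev/dsk/c0t0d0s0 read/write/setuid
/data on nfsserver:/export/data remote/read/write'
n=$(printf '%s\n' "$mounts" | grep -c 'nfsserver')
if [ "$n" -eq 0 ]; then
  echo "would run: mount -a"
else
  echo "nfs mounts present: $n"
fi
```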

Tuesday, November 6, 2012

solaris 10 forcefully shutdown a zone

In my notes this is marked: "killzonekill".

That being said...

Sometimes my zones on Solaris 10 refuse to shut down. This could be for a variety of reasons. A tell-tale sign is, say after 1 day, you see this:
[root@bigsystem ~]# zoneadm -z soxvm218 shutdown
... 24 hours later ...
[root@bigsystem ~]# zoneadm list -civ 
  18 soxvm218       shutting_down /opt/zones/soxvm218          native   shared

Well hell. Maybe there be zombies.
[root@bigsystem ~]# ps -fz soxvm218
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root  1619     1   0 21:56:00 ?           0:00 zsched
 0003088  4486     1   0        - ?           0:00 <defunct>
Yeah. defunct, that's no fun.

You try the usual:

[root@bigsystem ~]# zoneadm -z zonename unmount -f
[root@bigsystem ~]# zoneadm -z zonename reboot -- -s 
[root@bigsystem ~]# pkill -9 -z zonename
If those don't do it, go for some kill -9 action. Programmatically:
for i in `ps -lLef | grep defunct | grep -v grep | awk '{print $4}'`
do
  echo "Killing process pid=$i" ; sleep 1
  kill -9 $i ; sleep 5
done
Yeah. That does it every time.