Friday, September 30, 2011

Solaris 10 u10 - lucreate failures

After several successful upgrades to the currently released Solaris 10 u10, I ran into an issue:
lucreate failed to prepare the alternate boot environment with an error like:
Mounting ABE .
ERROR: mount: /zones/myzone-dataset1/legacy: No such file or directory
ERROR: cannot mount mount point 

and several warnings like:
WARNING: Directory zone lies on a filesystem shared between BEs, remapping path to .

Hmm, OK. I don't need these filesystems mounted, but I can safely mount them (at least temporarily) if it solves the problem. A new mountpoint was set, and lucreate finished successfully with warnings only.
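Roughly like this, with the dataset name taken from the error above and an assumed BE name (s10u10):

```shell
# Give the dataset a real mountpoint instead of "legacy" (assumed names)
zfs set mountpoint=/zones/myzone-dataset1 zones/myzone-dataset1
zfs mount zones/myzone-dataset1

# Retry creating the alternate boot environment
lucreate -n s10u10
```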
The box is not in a critical environment, so the warnings were ignored - BIG MISTAKE -

!!! Do not ignore WARNINGS during lucreate !!!

But anyway, the upgrade finished successfully, the new BE was activated, init 6 ...

The server started, but two zones failed to start ... (other zones on the server booted without issues)
An attempt to boot an affected zone resulted in multiple complaints about filesystems that are not "legacy" mounted in the global zone ...
Hmm ... Looking at the zonecfg -z myzone export output, I see a bunch of
  add fs
  set dir=....
in addition to the correctly defined
 add dataset
 set name=... 
I fixed the zone config by removing all fs records that shouldn't be there; another boot attempt revealed that the system was trying to boot the zone from zonepath=/zones/myzone-sol10u10 (instead of /zones/myzone).
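For reference, removing a stray fs resource looks roughly like this (the dir value is a hypothetical example; use the paths from your own zonecfg export):

```shell
# Remove an fs resource that lucreate should not have added (assumed path)
zonecfg -z myzone 'remove fs dir=/zones/myzone-dataset1'

# Verify what is left in the config
zonecfg -z myzone export
```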

I checked the real status of the filesystems and tried fixing the zone config again,
but: "Zone myzone already installed; set zonepath not allowed."

Not allowed via zonecfg, but it can be done by editing /etc/zones/myzone.xml and /etc/zones/index (don't forget to back up the current files ...)
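A sketch of that edit, using the wrong/right zonepaths from above (sed expressions and backup names are my own; adapt to your zone):

```shell
# Back up the zone configuration before hand-editing
cp /etc/zones/myzone.xml /etc/zones/myzone.xml.bak
cp /etc/zones/index /etc/zones/index.bak

# Rewrite the bad zonepath (/zones/myzone-sol10u10 -> /zones/myzone) in both files
sed 's|/zones/myzone-sol10u10|/zones/myzone|' /etc/zones/myzone.xml.bak > /etc/zones/myzone.xml
sed 's|/zones/myzone-sol10u10|/zones/myzone|' /etc/zones/index.bak > /etc/zones/index

# Try booting the zone again
zoneadm -z myzone boot
```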

It looks much better now - all zones are up and running ...

But lucreate is still broken, failing on every attempt to create a new BE.
It looks like a bug in Live Upgrade. A search shows the same issue in this thread. Currently there are no updates for patches 121431 (x86) and 121430 (SPARC); double-checking and filing the bug.

After a long conversation with Oracle I was able to confirm that there is a bug in the current LU suite (patch 121431-67). The solution is simple - downgrade LU to 121431-58.
If the old version of LU is not backed up, just install the original one from the Solaris media.
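On x86 the downgrade is roughly the following (the patch location under /var/tmp is an assumption; on SPARC use the 121430-* revisions instead):

```shell
# Remove the broken LU patch revision and fall back to the known-good one
patchrm 121431-67
patchadd /var/tmp/121431-58
```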

Tuesday, September 20, 2011

Dell PERC controllers and Solaris

By default Solaris doesn't include tools for monitoring and management of Dell RAID adapters, but most of these cards (PERC H700, 6/i ...) are re-branded LSI controllers.
Even if the adapter is used in a minimalistic config (almost JBOD) and RAID functionality is delegated to ZFS, I'd prefer to have at least some visibility into the state of the card (battery, memory ...).

Solaris 10 uses the mega_sas (LSI) driver, so for configuration, monitoring, etc. you can safely use the MegaCli utility, which can be downloaded from the LSI support site.

Not sure if it is officially supported by Dell or Oracle, but it works - personally tested on the H700 and 6/i - just make sure you run it with root privileges.
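A few MegaCli invocations that cover the basics mentioned above (run as root; adapter 0 is assumed, -aALL queries all adapters):

```shell
# Adapter details: memory size, firmware, settings
MegaCli -AdpAllInfo -aALL

# Battery backup unit status
MegaCli -AdpBbuCmd -GetBbuStatus -aALL

# Logical and physical drive state
MegaCli -LDInfo -Lall -aALL
MegaCli -PDList -aALL
```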

As a monitoring tool, raid-monitor can be used with Xymon. It generates an alert if the current state differs from a previously generated "good" reference file.
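The reference-file idea is simple enough to sketch: capture the controller state once while everything is healthy, then alert whenever a fresh snapshot differs. A minimal illustration (file paths and the canned "Optimal" lines are placeholders standing in for real MegaCli output; this is not the actual raid-monitor script):

```shell
#!/bin/sh
# Minimal sketch of the reference-file comparison (placeholder data).
REF=/tmp/raid.ref     # captured once, when the array was known-good
CUR=/tmp/raid.cur     # fresh snapshot taken on every monitoring run

# Stand-ins for something like: MegaCli -LDInfo -Lall -aALL | grep State
printf 'State: Optimal\n' > "$REF"
printf 'State: Optimal\n' > "$CUR"

# Alert only when the current state differs from the reference
if diff "$REF" "$CUR" > /dev/null; then
    echo "raid: green"
else
    echo "raid: red"
fi
```

In real use only the snapshot step runs repeatedly; the reference file is regenerated by hand after deliberate changes (disk replacement, new array).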