Friday, September 30, 2011

Solaris10 u10 - lucreate failures

After several successful upgrades to the current Solaris10 u10 release, I ran into an issue:
lucreate failed to prepare the alternate boot environment (ABE) with an error like:
...
Mounting ABE .
ERROR: mount: /zones/myzone-dataset1/legacy: No such file or directory
ERROR: cannot mount mount point 
 ...

and several warnings like:
WARNING: Directory zone lies on a filesystem shared between BEs, remapping path to .

Hmm, OK. I don't need these filesystems mounted, but I can safely mount them (at least temporarily) if that solves the problem. I set a new mountpoint, and lucreate finished successfully, with warnings only.
The box is not in a critical environment, so I ignored the warnings. BIG MISTAKE.

!!! Do not ignore WARNINGS during lucreate !!!

But anyway, the upgrade finished successfully, the new BE was activated, init 6 ...

The server started, but two zones failed to start ... (other zones on the same server booted without issues).
An attempt to boot an affected zone resulted in multiple complaints about filesystems that are not "legacy" mounted in the global zone ...
Hmm ... Looking at the zonecfg -z myzone export output, I see a bunch of
  add fs
  set dir=....
in addition to the correctly defined
 add dataset
 set name=... 
I fixed the zone config by removing all the fs records that shouldn't be there. Another boot attempt revealed that the system was trying to boot the zone from zonepath=/zones/myzone-sol10u10 (instead of /zones/myzone).
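To see which fs records are in a zone config before removing them, a small filter over the export output can help. This is only a sketch: list_fs_dirs is a hypothetical helper name of mine, and it assumes the export output uses the add fs / set dir=... / end layout shown above.

```shell
# Hypothetical helper: print the dir= value of every "add fs" block
# found in zonecfg export output read from stdin.
list_fs_dirs() {
    awk '
        # entering an fs resource block
        $1 == "add" && $2 == "fs" { in_fs = 1; next }
        # "end" closes whatever resource block is open
        $1 == "end"               { in_fs = 0; next }
        # inside an fs block, strip the "set dir=" prefix and print the rest
        in_fs && /^[ \t]*set dir=/ { sub(/^[ \t]*set dir=/, ""); print }
    '
}

# Intended usage on the affected box (not run here):
#   zonecfg -z myzone export | list_fs_dirs
```

Each directory it prints can then be checked by hand and, if it really doesn't belong there, removed with zonecfg's remove fs dir=... subcommand.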

I checked the real state of the filesystems and tried to fix the zone config again,
but got "Zone myzone already installed; set zonepath not allowed."

Not allowed, but it can be done by editing /etc/zones/myzone.xml and /etc/zones/index (don't forget to back up the current files ...).
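For the index file, the edit can be sketched as a filter that rewrites one field. This assumes /etc/zones/index keeps one zone per line as colon-separated fields (name:state:zonepath:...); set_index_zonepath is a hypothetical helper name, and it only writes to stdout so the real file is never touched directly.

```shell
# Hypothetical helper: rewrite the zonepath (third field) of one zone
# in index-format text read from stdin; all other lines pass through unchanged.
# $1 = zone name, $2 = new zonepath
set_index_zonepath() {
    awk -F: -v OFS=: -v zone="$1" -v path="$2" '
        $1 == zone { $3 = path }
        { print }
    '
}

# Intended usage (not run here): back up first, review the result, then replace.
#   cp -p /etc/zones/index /etc/zones/index.bak
#   cp -p /etc/zones/myzone.xml /etc/zones/myzone.xml.bak
#   set_index_zonepath myzone /zones/myzone < /etc/zones/index.bak > /tmp/index.new
```

The XML file still needs the same zonepath change made by hand.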

It looks much better now: all zones are up and running ...

But lucreate is still broken and fails on any attempt to create a new BE.
It looks like a bug in Live Upgrade. A search shows the same issue in this thread. There are currently no updates for patches 121431 (x86) and 121430 (SPARC), so I'm double-checking and filing a bug.

Update:
After a long conversation with Oracle, I was able to confirm that there is a bug in the current LU suite (patch 121431-67). The solution is simple: downgrade LU to 121431-58.
If the old version of LU is not backed up, just install the original one from the Solaris media.
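
Before and after the downgrade it is worth checking which LU patch revision is actually installed. A rough sketch, assuming showrev -p prints one line per patch starting with "Patch: <id>-<rev>"; installed_revs is a hypothetical helper name:

```shell
# Hypothetical helper: print every installed revision of one patch ID
# found in "showrev -p" style output read from stdin.
# $1 = patch ID without the revision, e.g. 121431 (x86) or 121430 (SPARC)
installed_revs() {
    awk -v id="$1" '
        # match lines whose second field starts with "<id>-"
        $1 == "Patch:" && index($2, id "-") == 1 { print $2 }
    '
}

# Intended usage (not run here):
#   showrev -p | installed_revs 121431
```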


2 comments:

Zafar Nasir said...

We've run into the exact same issue. I was able to go from Solaris 10 update 8 to 10 without any issues using the 121431-74 patch.
Doing the same steps on Solaris 10 update 9, with or without the patch, gives us issues. In our case it's as if lucreate is reading the zfs history and trying to create filesystems that don't really exist.

Alex Levin said...

Zafar, thanks for the comment and the reminder that I should update the article with the solution. :)