In news related to my previous post:
http://storagefoo.blogspot.com/2007/09/vmware-over-nfs.html
Tuesday, May 20, 2008
ESX, OS X and NFS (or, ne'er the twain shall meet)
For the Googlers: after a week of troubleshooting an OS X Xserve (acting as a temporary NAS head to an Xsan while we wait for our NetApp to arrive) as NFS server to some ESX-3.5 clients, it looks like this is the culprit there as well (referring to this other post about similar errors with a Fedora Core 6 client and Tru64 server). The symptoms: when esxcfg-nas(8) is used to mount NFS shares, the shares mount, and files can be accessed if you know the full path name. Any attempt to ls(1) a directory or file, however, fails with "Function not implemented" - _almost_ as if the execute bit had been removed from the directories in question. If the same share is instead mounted from the service console via e.g. "mount -t nfs ...", we get the normal behavior we'd expect from a functioning NFS client.
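To pin down the symptom a bit more precisely than eyeballing ls(1) output, something like the following rough sketch could help. "Function not implemented" is how the C library renders ENOSYS, so the script simply tries a stat and a directory listing and reports the errno name of whatever fails. The function name probe_directory and the "." placeholder path are made up for illustration, and the code assumes a reasonably modern Python 3 rather than whatever interpreter ships on the ESX service console.

```python
import errno
import os


def probe_directory(path):
    """Try a stat and a directory listing; return the errno name of any failure."""
    results = {}
    for name, call in (("stat", os.stat), ("listdir", os.listdir)):
        try:
            call(path)
            results[name] = None  # the call succeeded
        except OSError as exc:
            # ENOSYS is the errno behind "Function not implemented"
            results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


if __name__ == "__main__":
    # "." is a placeholder; point this at a directory on the NFS mount.
    print(probe_directory("."))
```

On a share showing the symptoms described above, one would presumably see "ENOSYS" reported for at least the listing, while a healthy mount returns None for both calls.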
There appear to be two problems here:
1) tcpdump on the ESX service console will not monitor vmkernel traffic. NFS traffic goes over the vmkernel port if the mount was initiated by esxcfg-nas; if it was initiated by mount(8) on the service console, the traffic runs over the service console port and can be sniffed with e.g. 'tcpdump -nlei vswif0'. I'm pretty sure there's a way to capture the vmkernel traffic too, but I haven't sorted out the details yet. However, the real problem is:
2) The OS X NFS server (as of 10.5 Leopard) apparently does not grok readdirplus(). After comparing server-side tcpdump output for ls(1) attempts on the share mounted via mount(8) and via esxcfg-nas(8), I finally noticed that the former was issuing readdir(), while the latter was using readdirplus().
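For anyone who would rather not squint at tcpdump output, the readdir()/readdirplus() difference is visible in the RPC call header itself. Here is a minimal sketch, assuming you already have the raw RPC message in hand with the 4-byte TCP record marker stripped off. The procedure numbers are the ones RFC 1813 assigns for NFSv3 (16 is READDIR, 17 is READDIRPLUS); the function name nfs3_call_name and the hand-built sample header are purely illustrative and not taken from any real capture.

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number assigned to NFS

# NFSv3 procedure numbers from RFC 1813 (only the ones of interest here)
NFS3_PROCS = {
    1: "GETATTR",
    3: "LOOKUP",
    4: "ACCESS",
    16: "READDIR",
    17: "READDIRPLUS",
}


def nfs3_call_name(rpc_message):
    """Return the NFSv3 procedure name of an RPC call, or None if it is not one.

    rpc_message is the RPC message with the TCP record marker already removed.
    """
    if len(rpc_message) < 24:
        return None
    # A call header starts with six big-endian 32-bit words:
    # xid, message type, RPC version, program, program version, procedure
    xid, msg_type, rpc_vers, prog, vers, proc = struct.unpack(
        ">6I", rpc_message[:24]
    )
    if msg_type != 0 or rpc_vers != 2:  # 0 = CALL, ONC RPC version 2
        return None
    if prog != NFS_PROGRAM or vers != 3:
        return None
    return NFS3_PROCS.get(proc, "proc %d" % proc)


if __name__ == "__main__":
    # Hand-built header for illustration only, not real captured traffic.
    sample = struct.pack(">6I", 0x1234, 0, 2, NFS_PROGRAM, 3, 17)
    print(nfs3_call_name(sample))
```

Run over the calls from each client, this would show the service console mount asking for READDIR and the esxcfg-nas mount asking for READDIRPLUS, which is the same difference noted above.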
As of ESX-3.5 (a requirement introduced in ESX-3.0, I believe), NFSv3 over TCP is the only supported NFS configuration, so I can't just drop back to NFSv2 to get around the issue. I'm now looking into whether there's a secret knob in OS X that will make it handle readdirplus() properly, but I'm not optimistic. Perhaps the upcoming 10.5.3 update will fix the problem.
*sigh*