Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ioctl version of OpenSolaris open(path, O_XATTR, mode) and openat(fd, name, O_XATTR, mode) #4437

Open
ryao opened this issue Mar 21, 2016 · 12 comments
Labels
Type: Feature Feature request or new feature

Comments

@ryao
Copy link
Contributor

ryao commented Mar 21, 2016

It looks like we can implement the functionality offered by O_XATTR on OpenSolaris via an ioctl. That would allow us to offer the equivalent to open(path, O_XATTR, mode) for opening the extended attribute directory and openat(fd, name, O_XATTR, mode) for opening an extended attribute. We would need to implement support for doing these operations on SA-based extended attributes, but that should allow us to get that code merged into illumos.

We would need to reimplement the open/openat syscalls, but the symbols that we need to manipulate file descriptors for this are available for our use:

  • get_unused_fd_flags
  • fd_install
  • put_unused_fd

The ioctl would pass a packed struct containing the arguments for openat() that looks something like this:

struct open_xattr {
    int fildes;
    int oflag;
    mode_t mode;
    char path[MAXPATHLEN];
};

The ZPL write functions would need to be modified to check for writes to an extended attribute and call fsnotify_xattr(dentry) on the parent.

In the case of open(), filedes could be -1, which should be an illegal file descriptor.

With extended attribute support implemented via an ioctl, it should be possible to write a compatibility shim that would allow O_XATTR to work. The open flag values differ between Illumos and Linux, so such a compatibility shim would need to pick a value in the hope that it does not conflict with anything:

http://lxr.free-electrons.com/source/include/uapi/asm-generic/fcntl.h#L18
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/fcntl.h#L57

Once we pick a value, we could make our own fcntl.h header that defines O_XATTR and then includes the real fcntl.h header. Then we could use function interposition in userland to intercept calls to open() and redirect those using O_XATTR to our ioctl(). jemalloc uses this technique for things that link to it (or that use LD_PRELOAD) and it is documented at various places:

http://jayconrod.com/posts/23/tutorial-function-interposition-in-linux
https://stackoverflow.com/questions/998464/function-interposition-in-linux-without-dlsym

This sort of thing should matter for the NFSv4 driver, so we might want to get their input on it if we do it.

@RichardSharpe
Copy link
Contributor

You realize that XATTRs and alternate data streams have different semantics, I assume? Streams are files and you can seek on them, read at any location for any amount of data. XATTRs do not offer that semantic.

Why O_XATTR, why not O_STREAM?

This would be very attractive to Samba users who also use ZFS On Linux, although an alternative stream VFS module for Samba would have to be written, which is not hard.

@ryao
Copy link
Contributor Author

ryao commented Mar 21, 2016

@RichardSharpe Solaris-style extended attributes are alternative data streams, which are full fledged files with their own file handles (if open), owner, group, mode, ACLs, etcetera. Things like alternative datastreams of alternative datastreams, linkat() and mknodat() are not permitted though. The IRIX-style extended attributes that Linux adopted are a simple key-value store.

For some time, ZoL has been mapping its Solaris-style extended attributes to the Linux's IRIX-style extended attribute interface. There is a ZoL-specific extension enabled by setting the xattr=sa property on a dataset that implements IRIX style extended attributes, but it converts to the Solaris-style extended attributes when the amount of data exceeds the room in the internal data node (like an inode on UFS).

One consequence of this at the moment is that you cannot call fsync() on a file with large extended attributes and expect the extended attributes to be stored on persistent storage when it returns. That requires doing fsync() on a file descriptor for the extended attribute (which we cannot access) or syncfs() on a file on the dataset (which is over-syncs). Implementing support for userland to access the alternative data streams via file descriptors will require reconciling the two. That could be done by treating the IRIX-style extended attributes as files with the same permissions, mode, etcetera. If we do that, then we should be able to adapt the currently Linux-specific xattr=sa into something that the other OpenZFS platforms can adopt, which would allow them to access extended attributes stored with xattr=sa.

As for why I picked the name O_XATTR, that is what Solaris used. Calling it O_STREAM should not be a problem if people prefer that. The preprocessor identifier used on Linux should not matter.

@RichardSharpe
Copy link
Contributor

Hmmm, while we are at it, are there IOCTLs to retrieve and set CREATE TIME and also the ATTRIBUTES field?

I have specified them locally ...

@ryao
Copy link
Contributor Author

ryao commented Mar 21, 2016

@RichardSharpe Not presently. ZFS does not have an internal create time. As for file attributes, I implemented support for setting appendonly, immutable and nodump via the kernel's existing interface and thought about implementing an ioctl for changing the full set of file attributes. However, no one ever asked for it, so I did not bother.

@RichardSharpe
Copy link
Contributor

@ryao:

Hmmm, in zfs_mknode I see:

...
        ZFS_TIME_ENCODE(&now, crtime);
        ZFS_TIME_ENCODE(&now, ctime);
...
        if (obj_type == DMU_OT_ZNODE) {
                SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_ATIME(zsb),
                    NULL, &atime, 16);
                SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_MTIME(zsb),
                    NULL, &mtime, 16);
                SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_CTIME(zsb),
                    NULL, &ctime, 16);
                SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_CRTIME(zsb),
                    NULL, &crtime, 16);
                SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_GEN(zsb),
...

In any event, I want those features.

@ryao
Copy link
Contributor Author

ryao commented Mar 21, 2016

@RichardSharpe My recollection was wrong. Thanks for catching that. It should be possible to make an ioctl for that, although I would opt to fix the code to make the implementation consistent with OpenSolaris:

http://www.unix.com/man-page/opensolaris/1/ls/

@RichardSharpe
Copy link
Contributor

Unfortunately, Linux's struct stat does not contain an st_crtime field :-(

@ryao
Copy link
Contributor Author

ryao commented Mar 21, 2016

@RichardSharpe By fixing the code, I meant making it report mtime if crtime is not set instead of making it an alias for ctime. Anyway, I think ext4 is in the same boat with regard to crtime. You might want to email Ted T'so to let him know that the 5 year moratorium expired. Please put me on CC.

https://moiseevigor.github.io/software/2015/01/30/get-file-creation-time-on-linux-with-ext4/

By the way, we probably should make separate issues for these things.

@RichardSharpe
Copy link
Contributor

@ryao: Correct. I will create a separate issue.

With respect to alternate streams and XATTRs, the only use for XATTRs by Samba on a system that has ZFS (where CRTIME and ATTRIBUTES are available) is for storing Security Descriptors (SDs) (AKA ACLs).

Since you cannot convert Windows SDs to NFSv4 ACLs without losing functionality, many such users will likely continue using XATTRs only for that. They would be easy to filter out.

@behlendorf behlendorf added the Type: Feature Feature request or new feature label Mar 22, 2016
@tuxoko
Copy link
Contributor

tuxoko commented Mar 24, 2016

@ryao
I don't see how we are going to gain anything with this.
What's a point of introducing a syscall which no program on Linux is going to use to fix a generic thing like fsync?

If we need to fix fsync, just fix it internally. We can just explicitly call zil_commit to the XATTR znode belongs to the file.

By the way, switching between xattr=sa and xattr=dir is still not in one transaction, which I think also should be fixed.

@Freeaqingme
Copy link

What's a point of introducing a syscall which no program on Linux is going to use to fix a generic thing like fsync?

It's almost three years later. During these three years ZoL has gone more or less main stream (imho). As a result, I think many people would be fine using this functionality and simply require use of ZFS.

For me, I have a use case for a feature of alternate data streams on linux for building a FUSE fs, where I'd like to put some extra metadata/journalling inside something like ADS. I was already planning to base this on ZFS so ZoL-only functionality would be perfectly fine.

I realise my own use case is awfully specific, but there will probably be many more use cases once a feature like this lands.

@cschoenebeck
Copy link

I'm thinking about testing the waters and sending an RFC on LKML to see whether there would be any chance for a patch set being accepted for laying the ground for resource forks / alternate streams in the Linux kernel.

It's probably not realistic to get a dedicated API set for this in the Linux kernel right from the start. But for the beginning one could use the original minimalistic approach of Solaris by just adding a new O_whatever enum and just (mis)using the openat() function for both retrieving a list of resource forks and for opening a specific resource fork's data stream.

However wouldn't it make sense to use a different name for a enum in Linux, as the term "XATTR" is already quite confusingly used for multiple things? IMO an enum named O_FORK (instead of O_XATTR that Solaris uses) would be a better choice to make it clear that this has nothing to do with attributes.

What I am wondering on ZFS side: Did I get it right that ZFS already uses forks for xattr=sa, and if yes, wouldn't that cause potential conflicts in the name space?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

No branches or pull requests

6 participants