Re: [patch 07/10] unprivileged mounts: add sysctl tunable for "safe" property



Quoting Miklos Szeredi (miklos@xxxxxxxxxx):
From: Miklos Szeredi <mszeredi@xxxxxxx>

Add the following:

/proc/sys/fs/types/${FS_TYPE}/usermount_safe

Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxx>

Thanks, Miklos, good explanations in the docs.

Acked-by: Serge Hallyn <serue@xxxxxxxxxx>

One comment inline, but not imo your problem :)

---

Index: linux/fs/filesystems.c
===================================================================
--- linux.orig/fs/filesystems.c 2008-02-04 23:47:46.000000000 +0100
+++ linux/fs/filesystems.c 2008-02-04 23:48:04.000000000 +0100
@@ -12,6 +12,7 @@
#include <linux/kmod.h>
#include <linux/init.h>
#include <linux/module.h>
+#include <linux/sysctl.h>
#include <asm/uaccess.h>

/*
@@ -51,6 +52,57 @@ static struct file_system_type **find_fi
return p;
}

+#define MAX_FILESYSTEM_VARS 1
+
+struct filesystem_sysctl_table {
+ struct ctl_table_header *header;
+ struct ctl_table table[MAX_FILESYSTEM_VARS + 1];
+};
+
+/*
+ * Create /sys/fs/types/${FSNAME} directory with per fs-type tunables.
+ */
+static int filesystem_sysctl_register(struct file_system_type *fs)
+{
+ struct filesystem_sysctl_table *t;
+ struct ctl_path path[] = {
+ { .procname = "fs", .ctl_name = CTL_FS },
+ { .procname = "types", .ctl_name = CTL_UNNUMBERED },
+ { .procname = fs->name, .ctl_name = CTL_UNNUMBERED },
+ { }
+ };
+
+ t = kzalloc(sizeof(*t), GFP_KERNEL);
+ if (!t)
+ return -ENOMEM;
+
+
+ t->table[0].ctl_name = CTL_UNNUMBERED;
+ t->table[0].procname = "usermount_safe";
+ t->table[0].maxlen = sizeof(int);
+ t->table[0].data = &fs->fs_safe;
+ t->table[0].mode = 0644;

Yikes, this could be a problem for containers, as it's simply tied to
uid 0, whereas tying it to a capability would let us solve it with
capability bounds.

This might mean more urgency to get user namespaces working at least
with sysfs, else this is a quick way around having CAP_SYS_ADMIN taken
out of a container's capability bounding set.

+ t->table[0].proc_handler = &proc_dointvec;
+
+ t->header = register_sysctl_paths(path, t->table);
+ if (!t->header) {
+ kfree(t);
+ return -ENOMEM;
+ }
+
+ fs->sysctl_table = t;
+
+ return 0;
+}
+
+static void filesystem_sysctl_unregister(struct file_system_type *fs)
+{
+ struct filesystem_sysctl_table *t = fs->sysctl_table;
+
+ unregister_sysctl_table(t->header);
+ kfree(t);
+}
+
/**
* register_filesystem - register a new filesystem
* @fs: the file system structure
@@ -80,6 +132,13 @@ int register_filesystem(struct file_syst
else
*p = fs;
write_unlock(&file_systems_lock);
+
+ if (res == 0) {
+ res = filesystem_sysctl_register(fs);
+ if (res != 0)
+ unregister_filesystem(fs);
+ }
+
return res;
}

@@ -108,6 +167,7 @@ int unregister_filesystem(struct file_sy
*tmp = fs->next;
fs->next = NULL;
write_unlock(&file_systems_lock);
+ filesystem_sysctl_unregister(fs);
return 0;
}
tmp = &(*tmp)->next;
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h 2008-02-04 23:48:02.000000000 +0100
+++ linux/include/linux/fs.h 2008-02-04 23:48:04.000000000 +0100
@@ -1444,6 +1444,7 @@ struct file_system_type {
struct module *owner;
struct file_system_type * next;
struct list_head fs_supers;
+ struct filesystem_sysctl_table *sysctl_table;

struct lock_class_key s_lock_key;
struct lock_class_key s_umount_key;
Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt 2008-02-04 23:47:58.000000000 +0100
+++ linux/Documentation/filesystems/proc.txt 2008-02-04 23:48:04.000000000 +0100
@@ -44,6 +44,7 @@ Table of Contents
2.14 /proc/<pid>/io - Display the IO accounting fields
2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
2.16 /proc/<pid>/mountinfo - Information about mounts
+ 2.17 /proc/sys/fs/types - File system type specific parameters

------------------------------------------------------------------------------
Preface
@@ -2392,4 +2393,34 @@ For more information see:
Documentation/filesystems/sharedsubtree.txt


+2.17 /proc/sys/fs/types/ - File system type specific parameters
+----------------------------------------------------------------
+
+There's a separate directory /proc/sys/fs/types/<type>/ for each
+filesystem type, containing the following files:
+
+usermount_safe
+--------------
+
+Setting this to non-zero will allow filesystems of this type to be
+mounted by unprivileged users (note, that there are other
+prerequisites as well).
+
+Fuse has been designed to be as safe as possible, and some
+distributions already ship with unprivileged fuse mounts enabled by
+default. There are still some situations (multi-user systems with
+untrusted users in particular), where enabling this for fuse might not
+be appropriate. For more details, see Documentation/filesystems/fuse.txt
+
+Procfs is also safe, but unprivileged mounting of it is not usually
+necessary (bind mounting is equivalent).
+
+Most other filesystems are unsafe. Here are just some of the issues,
+that must be resolved before a filesystem can be declared safe:
+
+ - no strict input checking (buffer overruns, directory loops, etc)
+ - network filesystem deadlocks when mounting from localhost
+ - no permission checking when opening the device
+ - changing mount options when mounting a new instance of a filesystem
+
------------------------------------------------------------------------------

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • [patch 07/11] unprivileged mounts: add sysctl tunable for "safe" property
    ... +struct filesystem_sysctl_table { ... +Procfs is also safe, but unprivileged mounting of it is not usually ... +necessary (bind mounting is equivalent). ... +that must be resolved before a filesystem can be declared safe: ...
    (Linux-Kernel)
  • [patch 07/10] unprivileged mounts: add sysctl tunable for "safe" property
    ... +struct filesystem_sysctl_table { ... +Procfs is also safe, but unprivileged mounting of it is not usually ... +necessary (bind mounting is equivalent). ... +that must be resolved before a filesystem can be declared safe: ...
    (Linux-Kernel)
  • [PATCH] VFS: update documentation
    ... The Virtual File System (otherwise known as the Virtual Filesystem ... Switch) is the software layer in the kernel that provides the ... The VFS implements the open, stat, chmodand similar system ... struct file_system_type { ...
    (Linux-Kernel)
  • Re: [PATCH] VFS: update documentation
    ... The Virtual File System (otherwise known as the Virtual Filesystem ... Switch) is the software layer in the kernel that provides the ... The VFS implements the open, stat, chmodand similar system ... struct file_system_type { ...
    (Linux-Kernel)
  • Re: [patch] mm: tiny-shmem fix lor, mmap_sem vs i_mutex
    ... r/o bind mounts: filesystem helpers for custom 'struct file's ... This feature allows a read-only view into a read-write filesystem. ... Some filesystems forego the vfs and may_openand create their own 'struct ... * @dentry: the dentry representing the new file ...
    (Linux-Kernel)