CHAPTER 31
System Configuration Utilities (man8)
This chapter describes the system configuration utilities provided with Lustre.
The mkfs.lustre utility formats a disk for a Lustre service.
mkfs.lustre <target_type> [options] device
where <target_type> is one of the following:
--ost    Object Storage Target (OST)
--mdt    Metadata Storage Target (MDT)
--mgs    Configuration Management Service (MGS), one per site. This service can be combined with one --mdt service by specifying both types.
mkfs.lustre is used to format a disk device for use as part of a Lustre file system. After formatting, a disk can be mounted to start the Lustre service defined by this command.
When the file system is created, parameters can simply be added as a --param option to the mkfs.lustre command. See Setting Parameters with mkfs.lustre.
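For example, a parameter can be set at format time like this (a sketch only; the parameter, file system name and device are illustrative):
mkfs.lustre --fsname=testfs --mgs --mdt --param sys.timeout=40 /dev/sda1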
Creates a combined MGS and MDT for file system testfs on node cfs21:
mkfs.lustre --fsname=testfs --mdt --mgs /dev/sda1
Creates an OST for file system testfs on any node (using the above MGS):
mkfs.lustre --fsname=testfs --ost --mgsnode=cfs21@tcp0 /dev/sdb
Creates a standalone MGS on, e.g., node cfs22:
mkfs.lustre --mgs /dev/sda1
Creates an MDT for file system myfs1 on any node (using the above MGS):
mkfs.lustre --fsname=myfs1 --mdt --mgsnode=cfs22@tcp0 /dev/sda2
The tunefs.lustre utility modifies configuration information on a Lustre target disk.
tunefs.lustre [options] <device>
tunefs.lustre is used to modify configuration information on a Lustre target disk. This includes upgrading old (pre-Lustre 1.6) disks. This does not reformat the disk or erase the target information, but modifying the configuration information can result in an unusable file system.
Caution - Changes made here affect a file system when the target is mounted the next time.
With tunefs.lustre, parameters are "additive" -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run:
$ tunefs.lustre --erase-params --param=<new parameters>
The tunefs.lustre command can be used to set any parameter that is settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as <obd|fsname>.<obdtype>.<proc_file_name>=<value>. For example:
$ tunefs.lustre --param mdt.group_upcall=NONE /dev/sda1
The tunefs.lustre options are listed and explained below.
Changing the MGS’s NID address. (This should be done on each target disk, since they should all contact the same MGS.)
tunefs.lustre --erase-params --mgsnode=<new_nid> --writeconf /dev/sda
Adding a failover NID location for this target.
tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda
The lctl utility is used for root control and configuration. With lctl you can directly control Lustre via an ioctl interface, allowing various configuration, maintenance and debugging features to be accessed.
lctl
lctl --device <OST device number> <command [args]>
The lctl utility can be invoked in interactive mode by issuing the lctl command. After that, commands are issued as shown below. The most common lctl commands are:
dl, device, network <up/down>, list_nids, ping {nid}, help, quit
For a complete list of available commands, type help at the lctl prompt. To get basic help on the meaning and syntax of a command, type help <command>.
For non-interactive use, use the second invocation, which runs the command after connecting to the device.
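Individual lctl subcommands can also be run directly from the shell; for example (the NID shown is illustrative):
$ lctl list_nids
$ lctl ping 192.168.0.21@tcp
$ lctl dl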
Lustre parameters are not always accessible using the procfs interface, as it is platform-specific. As a solution, lctl {get,set}_param has been introduced as a platform-independent interface to the Lustre tunables. Avoid direct references to /proc/{fs,sys}/{lustre,lnet}; for future portability, use lctl {get,set}_param instead.
When the file system is running, temporary parameters can be set using the lctl set_param command. These parameters map to items in /proc/{fs,sys}/{lnet,lustre}. The lctl set_param command uses this syntax:
lctl set_param [-n] <obdtype>.<obdname>.<proc_file_name>=<value>
$ lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))
Many permanent parameters can be set with the lctl conf_param command. In general, the lctl conf_param command can be used to specify any parameter settable in a /proc/fs/lustre file, with its own OBD device. The lctl conf_param command uses this syntax:
<obd|fsname>.<obdtype>.<proc_file_name>=<value>
$ lctl conf_param testfs-MDT0000.mdt.group_upcall=NONE
$ lctl conf_param testfs.llite.max_read_ahead_mb=16
Caution - The lctl conf_param command permanently sets parameters in the file system configuration.
To get current Lustre parameter settings, use the lctl get_param command with this syntax:
lctl get_param [-n] <obdtype>.<obdname>.<proc_file_name>
$ lctl get_param -n ost.*.ost_io.timeouts
To list Lustre parameters that are available to set, use the lctl list_param command, with this syntax:
lctl list_param [-n] <obdtype>.<obdname>
$ lctl list_param obdfilter.lustre-OST0000
Virtual Block Device Operations
Lustre can emulate a virtual block device upon a regular file. This emulation is needed when you are trying to set up a swap space via the file.
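A rough sketch of how this might be used for swap space follows; the lctl blockdev_attach/blockdev_detach subcommands, the backing file path and the device node name are assumptions for illustration and may differ in your Lustre release:
$ dd if=/dev/zero of=/mnt/lustre/swapfile bs=1M count=1024   # create the backing file on Lustre (illustrative path)
$ lctl blockdev_attach /mnt/lustre/swapfile /dev/lloop0      # attach it as a virtual block device (assumed subcommand/device)
$ mkswap /dev/lloop0 && swapon /dev/lloop0                   # use the emulated device as swap space
$ swapoff /dev/lloop0                                        # when finished, release the swap space
$ lctl blockdev_detach /dev/lloop0                           # and detach the virtual block device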
lctl also provides debugging commands, including:
debug_daemon    Starts and stops the debug daemon, and controls the output filename and size.
debug_file      Converts the kernel-dumped debug log from binary to plain text format.
Use the following options to invoke lctl.
--device    The device to be used for the operation (specified by name or number). See device_list.
$ lctl
lctl > dl
  0 UP mgc MGC192.168.0.20@tcp bfbb24e3-7deb-2ffa-eab0-44dffe00f692 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter testfs-OST0000 testfs-OST0000_UUID 3
lctl > dk /tmp/log
Debug log: 87 lines, 87 kept, 0 dropped.
lctl > quit

$ lctl conf_param testfs-MDT0000 sys.timeout=40
$ lctl conf_param testfs-MDT0000.lov.stripesize=2M
$ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
$ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
$ lctl
lctl > get_param obdfilter.lustre-OST0000.kbytesavail
obdfilter.lustre-OST0000.kbytesavail=249364
lctl > get_param -n obdfilter.lustre-OST0000.kbytesavail
249364
lctl > get_param timeout
timeout=20
lctl > get_param -n timeout
20
lctl > get_param obdfilter.*.kbytesavail
obdfilter.lustre-OST0000.kbytesavail=249364
obdfilter.lustre-OST0001.kbytesavail=249364
lctl >
$ lctl
lctl > set_param obdfilter.*.kbytesavail=0
obdfilter.lustre-OST0000.kbytesavail=0
obdfilter.lustre-OST0001.kbytesavail=0
lctl > set_param -n obdfilter.*.kbytesavail=0
lctl > set_param fail_loc=0
fail_loc=0
The mount.lustre utility starts a Lustre client or target service.
mount -t lustre [-o options] <device> <directory>
The mount.lustre utility starts a Lustre client or target service. This program should not be called directly; rather, it is a helper program invoked through mount(8), as shown above. Use the umount(8) command to stop Lustre clients and targets.
There are two forms for the device option, depending on whether a client or a target service is started:
The MGS specification may be a colon-separated list of nodes, and each node may be specified by a comma-separated list of NIDs.
In addition to the standard mount options, Lustre understands the following client-specific options:
In addition to the standard mount options and backing disk type (e.g. ext3) options, Lustre understands the following server-specific options:
Starts a client for the Lustre file system testfs at mount point /mnt/myfilesystem. The Management Service is running on a node reachable from this client via the cfs21@tcp0 NID.
mount -t lustre cfs21@tcp0:/testfs /mnt/myfilesystem
Starts the Lustre target service on /dev/sda1.
mount -t lustre /dev/sda1 /mnt/test/mdt
Starts the testfs-MDT0000 service (using the disk label), but aborts the recovery process.
mount -t lustre -L testfs-MDT0000 -o abort_recov /mnt/test/mdt
Note - If the Service Tags tool (from the sun-servicetag package) can be found in /opt/sun/servicetag/bin/stclient, an inventory service tag is created reflecting the Lustre service being provided. If this tool cannot be found, mount.lustre silently ignores it and no service tag is created. The stclient(1) tool only creates the local service tag. No information is sent to the asset management system until you run the Registration Client to collect the tags and then upload them to the inventory system using your inventory system account. For more information, see Service Tags.
This section describes additional system configuration utilities that were added in Lustre 1.6.
The lustre_rmmod.sh utility removes all Lustre and LNET modules (assuming no Lustre services are running). It is located in /usr/bin.
Note - The lustre_rmmod.sh utility does not work if Lustre modules are being used or if you have manually run the lctl network up command.
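A minimal usage sketch (stop all Lustre services first, as noted above):
# umount -a -t lustre             # unmount all Lustre clients and targets
# lustre_rmmod.sh                 # unload the Lustre and LNET modules
# lsmod | grep -E 'lustre|lnet'   # verify that no modules remain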
The e2scan utility is an ext2 file system modified-inode scan program. The e2scan program uses libext2fs to find inodes with a ctime or mtime newer than a given time and prints out their pathnames. Use e2scan to efficiently generate lists of files that have been modified. The e2scan tool is included in e2fsprogs, located at:
http://downloads.clusterfs.com/public/tools/e2fsprogs/latest
e2scan [options] [-f file] block_device
When invoked, the e2scan utility iterates over all inodes on the block device, finds the modified inodes, and prints their inode numbers. A similar iterator, using libext2fs(5), builds a table (called the parent database) which lists the parent inode for each inode. With a lookup function, you can reconstruct modified pathnames from the root.
The following utilities are located in /usr/bin.
The lustre_config.sh utility helps automate the formatting and setup of disks on multiple nodes. An entire installation is described in a comma-separated values (CSV) file, which is passed to this script; the script then formats the drives, updates modprobe.conf and produces high-availability (HA) configuration files.
The lustre_createcsv.sh utility generates a CSV file describing the currently-running installation.
The lustre_up14.sh utility grabs client configuration files from old MDTs. When upgrading Lustre from 1.4.x to 1.6.x, if the MGS is not co-located with the MDT or the client name is non-standard, this utility is used to retrieve the old client log. For more information, see Upgrading and Downgrading Lustre.
The following utilities are located in /usr/bin.
The lustre_req_history.sh utility (run from a client) assembles as much Lustre RPC request history as possible from the local node and from the servers that were contacted, providing a better picture of the coordinated network activity.
The llstat.sh utility (improved in Lustre 1.6) handles a wider range of /proc files, and has command line switches to produce more graphable output.
The plot-llstat.sh utility plots the output from llstat.sh using gnuplot.
The following utilities provide additional statistics.
The client vfs_ops_stats utility tracks Linux VFS operation calls into Lustre for a single PID, PPID, GID or everything.
/proc/fs/lustre/llite/*/vfs_ops_stats /proc/fs/lustre/llite/*/vfs_track_[pid|ppid|gid]
The client extents_stats utility shows the size distribution of I/O calls from the client (cumulative and by process).
/proc/fs/lustre/llite/*/extents_stats, extents_stats_per_process
The client offset_stats utility shows the read/write seek activity of a client by offsets and ranges.
/proc/fs/lustre/llite/*/offset_stats
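Because these llite statistics are ordinary /proc tunables, they can also be read and set through lctl as described earlier; a rough sketch (the PID is illustrative):
$ lctl set_param llite.*.vfs_track_pid=1234    # restrict VFS operation tracking to a single process
$ lctl get_param llite.*.vfs_ops_stats         # read the VFS operation counters
$ lctl get_param llite.*.extents_stats         # read the I/O size distribution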
Lustre 1.6 included per-client and improved MDT statistics:
Each MDT and OST now tracks LDLM and operations statistics for every connected client, for comparisons and simpler collection of distributed job statistics.
/proc/fs/lustre/mds|obdfilter/*/exports/
More detailed MDT operations statistics are collected for better profiling.
/proc/fs/lustre/mds/*/stats
Lustre offers the following test and debugging utilities.
The Load Generator (loadgen) is a test program designed to simulate large numbers of Lustre clients connecting and writing to an OST. The loadgen utility is located at lustre/utils/loadgen (in a build directory) or at /usr/sbin/loadgen (from an RPM).
Loadgen offers the ability to run this test:
1. Start an arbitrary number of (echo) clients.
2. Start and connect to an echo server, instead of a real OST.
3. Create/bulk_write/delete objects on any number of echo clients simultaneously.
Currently, the maximum number of clients is limited by MAX_OBD_DEVICES and the amount of memory available.
The loadgen utility can be run locally on the OST server machine or remotely from any LNET host. The device command can take an optional NID as a parameter; if unspecified, the first local NID found is used.
The obdecho module must be loaded by hand before running loadgen.
# cd lustre/utils/
# insmod ../obdecho/obdecho.ko
# ./loadgen
loadgen> h
This is a test program used to simulate large numbers of clients.
The echo obds are used, so the obdecho module must be loaded.
Typical usage would be:
  loadgen> dev lustre-OST0000    set the target device
  loadgen> start 20              start 20 echo clients
  loadgen> wr 10 5               have 10 clients do simultaneous brw_write tests 5 times each
Available commands are:
  device
  dl
  echosrv
  start
  verbose
  wait
  write
  help
  exit
  quit
For more help type: help command-name
loadgen>
loadgen> device lustre-OST0000 192.168.0.21@tcp
Added uuid OSS_UUID: 192.168.0.21@tcp
Target OST name is 'lustre-OST0000'
loadgen>
loadgen> st 4
start 0 to 4
./loadgen: running thread #1
./loadgen: running thread #2
./loadgen: running thread #3
./loadgen: running thread #4
loadgen> wr 4 5
Estimate 76 clients before we run out of grant space (155872K / 2097152)
1: i0
2: i0
4: i0
3: i0
1: done (0)
2: done (0)
4: done (0)
3: done (0)
wrote 25MB in 1.419s (17.623 MB/s)
loadgen>
The loadgen utility prints periodic status messages; message output can be controlled with the verbose command.
To ensure that a file can be written to (a requirement of the write cache), OSTs reserve chunks of space ("grants") for each newly-created file. A grant may cause an OST to report that it is out of space, even though there is plenty of space on the disk, because the space is "reserved" by other files. The loadgen utility estimates the number of simultaneously open files as the disk size divided by the grant size, and reports that number when the write tests are first started.
The loadgen utility can start an echo server. On another node, loadgen can specify the echo server as the device, thus creating a network-only test environment.
loadgen> echosrv
loadgen> dl
  0 UP obdecho echosrv echosrv 3
  1 UP ost OSS OSS 3
loadgen> device echosrv cfs21@tcp
Added uuid OSS_UUID: 192.168.0.21@tcp
Target OST name is 'echosrv'
loadgen> st 1
start 0 to 1
./loadgen: running thread #1
loadgen> wr 1
start a test_brw write test on X clients for Y iterations
usage: write <num_clients> <num_iter> [<delay>]
loadgen> wr 1 1
loadgen>
1: i0
1: done (0)
wrote 1MB in 0.029s (34.023 MB/s)
The threads all perform their actions in non-blocking mode; use the wait command to block for the idle state. For example:
#!/bin/bash
./loadgen << EOF
device lustre-OST0000
st 1
wr 1 10
wait
quit
EOF
The loadgen utility is intended to grow into a more comprehensive test tool; feature requests are encouraged.
The llog_reader utility translates a Lustre configuration log into human-readable form.
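As a hedged sketch, a configuration log can be examined by mounting the target's backing file system locally and pointing llog_reader at a log file (the device, mount point and log name are illustrative):
# mount -t ldiskfs /dev/sda1 /mnt/mgs          # mount the MGS/MDT backing file system locally
# llog_reader /mnt/mgs/CONFIGS/testfs-client   # dump the named configuration log as text
# umount /mnt/mgs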
The lr_reader utility translates a last received (last_rcvd) file into human-readable form.
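Similarly, as a rough sketch (the device and mount point are illustrative, and the exact invocation may differ by release), the last_rcvd file can be read from a locally mounted target:
# mount -t ldiskfs /dev/sda1 /mnt/mdt   # mount the target's backing file system locally
# lr_reader /mnt/mdt/last_rcvd          # print the last_rcvd contents in readable form
# umount /mnt/mdt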
Lustre includes the flock feature, which provides file locking support. A flock is an advisory lock that can be applied to or removed from an open file as specified by the user. A single file may not, however, simultaneously have both shared and exclusive locks.
By default, flock support is disabled on Lustre. Two modes are available: local flock (the localflock mount option), which enforces locks only among processes on the same client, and cluster-wide flock (the flock mount option), which enforces locks coherently across all clients.
A call to flock may block if another process is holding an incompatible lock. Locks created using flock apply to an open file table entry; therefore, a single process may hold only one type of lock (shared or exclusive) on a given file. Subsequent flock calls on a file that is already locked convert the existing lock to the new lock mode.
$ mount -t lustre -o flock mds@tcp0:/lustre /mnt/client
You can verify that the flock option is in effect by checking /etc/mtab; the entry should look like this:
mds@tcp0:/lustre /mnt/client lustre rw,flock 0 0
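As a rough illustration (the file path is hypothetical and flock(1) from util-linux is assumed to be installed), a lock can then be exercised from the shell:
$ flock -x /mnt/client/shared.log -c "echo appending under lock >> /mnt/client/shared.log"   # exclusive lock held while the command runs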
The l_getgroups utility handles Lustre user / group cache upcall.
l_getgroups [-v] [-d | mdsname] uid
l_getgroups [-v] -s
The group upcall file contains the path to an executable file that, when properly installed, is invoked to resolve a numeric UID to a group membership list. This utility should complete the mds_grp_downcall_data structure and write it to the /proc/fs/lustre/mds/<mds-service>/group_info pseudo-file.
The l_getgroups utility is the reference implementation of the user or group cache upcall.
The l_getgroups files are located at:
/proc/fs/lustre/mds/mds-service/group_upcall
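The upcall is typically enabled by pointing the MDT's group_upcall parameter at this executable; a hedged example (the file system name and installed path are illustrative):
$ lctl conf_param testfs-MDT0000.mdt.group_upcall=/usr/sbin/l_getgroups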
The llobdstat utility displays OST statistics.
llobdstat ost_name [interval]
The llobdstat utility displays a line of OST statistics for a given OST at specified intervals (in seconds).
interval    Time interval (in seconds) after which statistics are refreshed.
# llobdstat liane-OST0002 1
/usr/bin/llobdstat on /proc/fs/lustre/obdfilter/liane-OST0002/stats
Processor counters run at 2800.189 MHz
Read: 1.21431e+07, Write: 9.93363e+08, create/destroy: 24/1499, stat: 34, punch: 18
[NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]
Timestamp   Read-delta  ReadRate   Write-delta  WriteRate
--------------------------------------------------------
1217026053  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026054  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026055  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026056  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026057  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026058  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026059  0.00MB      0.00MB/s   0.00MB       0.00MB/s st:1
The llobdstat files are located at:
/proc/fs/lustre/obdfilter/<ostname>/stats
The llstat utility displays Lustre statistics.
llstat [-c] [-g] [-i interval] stats_file
The llstat utility displays statistics from any of the Lustre statistics files that share a common format, updated at a specified interval (in seconds). To stop printing statistics, type CTRL-C.
stats_file    Specifies either the full path to a statistics file or the shorthand reference, mds or ost.
To monitor /proc/fs/lustre/ost/OSS/ost/stats at 1 second intervals, run:
llstat -i 1 ost
The llstat files are located at:
/proc/fs/lustre/mdt/MDS/*/stats
/proc/fs/lustre/mds/*/exports/*/stats
/proc/fs/lustre/mdc/*/stats
/proc/fs/lustre/ldlm/services/*/stats
/proc/fs/lustre/ldlm/namespaces/*/pool/stats
/proc/fs/lustre/mgs/MGS/exports/*/stats
/proc/fs/lustre/ost/OSS/*/stats
/proc/fs/lustre/osc/*/stats
/proc/fs/lustre/obdfilter/*/exports/*/stats
/proc/fs/lustre/obdfilter/*/stats
/proc/fs/lustre/llite/*/stats
The lst utility starts LNET self-test.
lst
LNET self-test helps site administrators confirm that Lustre Networking (LNET) has been correctly installed and configured. The self-test also confirms that LNET, the network software and the underlying hardware are performing as expected.
Each LNET self-test runs in the context of a session. A node can be associated with only one session at a time, to ensure that the session has exclusive use of the nodes on which it is running. A single node creates, controls and monitors a single session. This node is referred to as the self-test console.
Any node may act as the self-test console. Nodes are named and allocated to a self-test session in groups. This allows all nodes in a group to be referenced by a single name.
Test configurations are built by describing and running test batches. A test batch is a named collection of tests, with each test composed of a number of individual point-to-point tests running in parallel. These individual point-to-point tests are instantiated according to the test type, source group, target group and distribution specified when the test is added to the test batch.
To run LNET self-test, load the following modules: libcfs, lnet, lnet_selftest and any one of the klnds (ksocklnd, ko2iblnd, and so on). To load all necessary modules, run modprobe lnet_selftest, which recursively loads the modules on which lnet_selftest depends.
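For example:
# modprobe lnet_selftest          # pulls in libcfs, lnet and the LND in use
# lsmod | grep lnet_selftest      # confirm the module is loaded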
There are two types of nodes for LNET self-test: console and test. Both node types require all previously-specified modules to be loaded. (The userspace test node does not require these modules).
Test nodes can either be in kernel or in userspace. A console user can invite a kernel test node to join the test session by running lst add_group NID, but the user cannot actively add a userspace test node to the test-session. However, the console user can passively accept a test node to the test session while the test node runs lst client to connect to the console.
LNET self-test includes two user utilities, lst and lstclient.
lst is the user interface for the self-test console (run on console node). It provides a list of commands to control the entire test system, such as create session, create test groups, etc.
lstclient is the userspace self-test program which is linked with userspace LNDs and LNET. A user can invoke lstclient to join a self-test session:
lstclient --sesid CONSOLE_NID --group NAME
This is an example of an LNET self-test script which simulates the traffic pattern of a set of Lustre servers on a TCP network, accessed by Lustre clients on an IB network (connected via LNET routers), with half the clients reading and half the clients writing.
#!/bin/bash
export LST_SESSION=$$
lst new_session read/write
lst add_group servers 192.168.10.[8,10,12-16]@tcp
lst add_group readers 192.168.1.[1-253/2]@o2ib
lst add_group writers 192.168.1.[2-254/2]@o2ib
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from readers --to servers brw read check=simple size=1M
lst add_test --batch bulk_rw --from writers --to servers brw write check=full size=4K
# start running
lst run bulk_rw
# display server stats for 30 seconds
lst stat servers & sleep 30; kill $!
# tear down
lst end_session
The plot-llstat utility plots Lustre statistics.
plot-llstat results_filename [parameter_index]
Value of parameter_index can be:
1 - count per interval
2 - count per second (default setting)
3 - total count
The plot-llstat utility generates a CSV file and instruction files for gnuplot from llstat output. Since llstat is generic in nature, plot-llstat is also a generic script. The value of parameter_index can be 1 for count per interval, 2 for count per second (default setting) or 3 for total count.
The plot-llstat utility creates a .dat (CSV) file using the number of operations specified by the user. The number of operations equals the number of columns in the CSV file. The values in those columns are equal to the corresponding value of parameter_index in the output file.
The plot-llstat utility also creates a .scr file that contains instructions for gnuplot to plot the graph. After generating the .dat and .scr files, the plot-llstat tool invokes gnuplot to display the graph.
llstat -i2 -g -c lustre-OST0000 > log
plot-llstat log 3
The routerstat utility prints Lustre router statistics.
routerstat [interval]
The routerstat utility watches LNET router statistics. If no interval is specified, then statistics are sampled and printed only one time. Otherwise, statistics are sampled and printed at the specified interval (in seconds).
The routerstat output includes the following fields:
Routerstat extracts statistics data from:
/proc/sys/lnet/stats
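For example, to sample router statistics every two seconds, or to inspect the raw counters directly:
# routerstat 2
# cat /proc/sys/lnet/stats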
The ll_recover_lost_found_objs utility helps recover Lustre OST objects (file data) from a lost and found directory back to their correct locations.
Running the ll_recover_lost_found_objs tool is not strictly necessary to bring an OST back online; it simply avoids losing access to objects that were moved to the lost and found directory due to directory corruption.
$ ll_recover_lost_found_objs [-hv] -d directory
The first time Lustre writes to an object, it saves the MDS inode number and the objid as an extended attribute on the object, so in case of directory corruption of the OST, it is possible to recover the objects. Running e2fsck fixes the corrupted OST directory, but it puts all of the objects into a lost and found directory, where they are inaccessible to Lustre. Use the ll_recover_lost_found_objs utility to recover all (or at least most) objects from a lost and found directory back to their place in the O/0/d* directories.
To use ll_recover_lost_found_objs, mount the file system locally (using mount -t ldiskfs), run the utility, and then unmount the file system again. The OST must not be mounted by Lustre when ll_recover_lost_found_objs is run.
ll_recover_lost_found_objs -d /mnt/ost/lost+found
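A fuller sequence, as a sketch (the device name and mount point are illustrative):
# mount -t ldiskfs /dev/sda1 /mnt/ost             # mount the OST's backing file system locally
# ll_recover_lost_found_objs -d /mnt/ost/lost+found   # move recovered objects back into place
# umount /mnt/ost                                 # unmount before restarting the OST under Lustre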