CHAPTER 31
System Configuration Utilities (man8)
This chapter describes the system configuration utilities provided with Lustre.
The mkfs.lustre utility formats a disk for a Lustre service.
mkfs.lustre <target_type> [options] device
where <target_type> is one of the following:
--ost    Object Storage Target (OST)
--mdt    Metadata Storage Target (MDT)
--mgs    Configuration Management Service (MGS), one per site. This service can be combined with one --mdt service by specifying both types.
mkfs.lustre is used to format a disk device for use as part of a Lustre file system. After formatting, a disk can be mounted to start the Lustre service defined by this command.
When the file system is created, parameters can simply be added as a --param option to the mkfs.lustre command. See Setting Parameters with mkfs.lustre.
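For example, a parameter can be set at format time like this (a sketch only; the parameter, file system name and device are illustrative):
mkfs.lustre --fsname=testfs --mgs --mdt --param sys.timeout=40 /dev/sda1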
Creates a combined MGS and MDT for file system testfs on node cfs21:
mkfs.lustre --fsname=testfs --mdt --mgs /dev/sda1
Creates an OST for file system testfs on any node (using the above MGS):
mkfs.lustre --fsname=testfs --ost --mgsnode=cfs21@tcp0 /dev/sdb
Creates a standalone MGS on, e.g., node cfs22:
mkfs.lustre --mgs /dev/sda1
Creates an MDT for file system myfs1 on any node (using the above MGS):
mkfs.lustre --fsname=myfs1 --mdt --mgsnode=cfs22@tcp0 /dev/sda2
The tunefs.lustre utility modifies configuration information on a Lustre target disk.
tunefs.lustre [options] <device>
tunefs.lustre is used to modify configuration information on a Lustre target disk. This includes upgrading old (pre-Lustre 1.6) disks. This does not reformat the disk or erase the target information, but modifying the configuration information can result in an unusable file system.
Caution - Changes made here affect a file system when the target is mounted the next time.
With tunefs.lustre, parameters are "additive" -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run:
$ tunefs.lustre --erase-params --param=<new parameters>
The tunefs.lustre command can be used to set any parameter that is settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as <obd|fsname>.<obdtype>.<proc_file_name>=<value>. For example:
$ tunefs.lustre --param mdt.group_upcall=NONE /dev/sda1
The tunefs.lustre options are listed and explained below.
Changing the MGS’s NID address. (This should be done on each target disk, since they should all contact the same MGS.)
tunefs.lustre --erase-params --mgsnode=<new_nid> --writeconf /dev/sda
Adding a failover NID location for this target.
tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda
The lctl utility is used for root control and configuration. With lctl you can directly control Lustre via an ioctl interface, allowing various configuration, maintenance and debugging features to be accessed.
lctl
lctl --device <OST device number> <command [args]>
The lctl utility can be invoked in interactive mode by issuing the lctl command. After that, commands are issued as shown below. The most common lctl commands are:
dl, device, network <up/down>, list_nids, ping {nid}, help, quit
For a complete list of available commands, type help at the lctl prompt. To get basic help on the meaning and syntax of a command, type help <command>.
For non-interactive use, use the second invocation, which runs the command after connecting to the device.
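Individual lctl subcommands can also be run directly from the shell; for example (the NID shown is illustrative):
$ lctl list_nids
$ lctl ping 192.168.0.21@tcp
$ lctl dl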
Lustre parameters are not always accessible using the procfs interface, as it is platform-specific. As a solution, lctl {get,set}_param has been introduced as a platform-independent interface to the Lustre tunables. Avoid direct references to /proc/{fs,sys}/{lustre,lnet}; for future portability, use lctl {get,set}_param instead.
When the file system is running, temporary parameters can be set using the lctl set_param command. These parameters map to items in /proc/{fs,sys}/{lnet,lustre}. The lctl set_param command uses this syntax:
lctl set_param [-n] <obdtype>.<obdname>.<proc_file_name>=<value>
$ lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))
Many permanent parameters can be set with the lctl conf_param command. In general, the lctl conf_param command can be used to specify any parameter settable in a /proc/fs/lustre file, with its own OBD device. The lctl conf_param command uses this syntax:
<obd|fsname>.<obdtype>.<proc_file_name>=<value>
$ lctl conf_param testfs-MDT0000.mdt.group_upcall=NONE
$ lctl conf_param testfs.llite.max_read_ahead_mb=16
Caution - The lctl conf_param command permanently sets parameters in the file system configuration.
To get current Lustre parameter settings, use the lctl get_param command with this syntax:
lctl get_param [-n] <obdtype>.<obdname>.<proc_file_name>
$ lctl get_param -n ost.*.ost_io.timeouts
To list Lustre parameters that are available to set, use the lctl list_param command, with this syntax:
lctl list_param [-n] <obdtype>.<obdname>
$ lctl list_param obdfilter.lustre-OST0000
Virtual Block Device Operations
Lustre can emulate a virtual block device upon a regular file. This emulation is needed when you are trying to set up a swap space via the file.
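A rough sketch of how this might be used for swap space follows; the lctl blockdev_attach/blockdev_detach subcommands, the backing file path and the device node name are assumptions for illustration and may differ in your Lustre release:
$ dd if=/dev/zero of=/mnt/lustre/swapfile bs=1M count=1024   # create the backing file on Lustre (illustrative path)
$ lctl blockdev_attach /mnt/lustre/swapfile /dev/lloop0      # attach it as a virtual block device (assumed subcommand/device)
$ mkswap /dev/lloop0 && swapon /dev/lloop0                   # use the emulated device as swap space
$ swapoff /dev/lloop0                                        # when finished, release the swap space
$ lctl blockdev_detach /dev/lloop0                           # and detach the virtual block device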
lctl also provides debugging commands, including:
debug_daemon    Starts and stops the debug daemon, and controls the output filename and size.
debug_file      Converts the kernel-dumped debug log from binary to plain text format.
Use the following options to invoke lctl.
--device    The device to be used for the operation (specified by name or number). See device_list.
$ lctl
lctl > dl
  0 UP mgc MGC192.168.0.20@tcp bfbb24e3-7deb-2ffa-eab0-44dffe00f692 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter testfs-OST0000 testfs-OST0000_UUID 3
lctl > dk /tmp/log
Debug log: 87 lines, 87 kept, 0 dropped.
lctl > quit

$ lctl conf_param testfs-MDT0000 sys.timeout=40
$ lctl conf_param testfs-MDT0000.lov.stripesize=2M
$ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
$ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
$ lctl
lctl > get_param obdfilter.lustre-OST0000.kbytesavail
obdfilter.lustre-OST0000.kbytesavail=249364
lctl > get_param -n obdfilter.lustre-OST0000.kbytesavail
249364
lctl > get_param timeout
timeout=20
lctl > get_param -n timeout
20
lctl > get_param obdfilter.*.kbytesavail
obdfilter.lustre-OST0000.kbytesavail=249364
obdfilter.lustre-OST0001.kbytesavail=249364
lctl >
$ lctl
lctl > set_param obdfilter.*.kbytesavail=0
obdfilter.lustre-OST0000.kbytesavail=0
obdfilter.lustre-OST0001.kbytesavail=0
lctl > set_param -n obdfilter.*.kbytesavail=0
lctl > set_param fail_loc=0
fail_loc=0
The mount.lustre utility starts a Lustre client or target service.
mount -t lustre [-o options] <device> <directory>
The mount.lustre utility starts a Lustre client or target service. This program should not be called directly; rather, it is a helper program invoked through mount(8), as shown above. Use the umount(8) command to stop Lustre clients and targets.
There are two forms for the device option, depending on whether a client or a target service is started:
The MGS specification may be a colon-separated list of nodes, and each node may be specified by a comma-separated list of NIDs.
In addition to the standard mount options, Lustre understands the following client-specific options:
In addition to the standard mount options and backing disk type (e.g. ext3) options, Lustre understands the following server-specific options:
Starts a client for the Lustre file system testfs at mount point /mnt/myfilesystem. The Management Service is running on a node reachable from this client via the cfs21@tcp0 NID.
mount -t lustre cfs21@tcp0:/testfs /mnt/myfilesystem
Starts the Lustre target service on /dev/sda1.
mount -t lustre /dev/sda1 /mnt/test/mdt
Starts the testfs-MDT0000 service (using the disk label), but aborts the recovery process.
mount -t lustre -L testfs-MDT0000 -o abort_recov /mnt/test/mdt
Note - If the Service Tags tool (from the sun-servicetag package) can be found in /opt/sun/servicetag/bin/stclient, an inventory service tag is created reflecting the Lustre service being provided. If this tool cannot be found, mount.lustre silently ignores it and no service tag is created. The stclient(1) tool only creates the local service tag. No information is sent to the asset management system until you run the Registration Client to collect the tags and then upload them to the inventory system using your inventory system account. For more information, see Service Tags.
This section describes additional system configuration utilities that were added in Lustre 1.6.
The lustre_rmmod.sh utility removes all Lustre and LNET modules (assuming no Lustre services are running). It is located in /usr/bin.
Note - The lustre_rmmod.sh utility does not work if Lustre modules are being used or if you have manually run the lctl network up command.
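A minimal usage sketch (stop all Lustre services first, as noted above):
# umount -a -t lustre             # unmount all Lustre clients and targets
# lustre_rmmod.sh                 # unload the Lustre and LNET modules
# lsmod | grep -E 'lustre|lnet'   # verify that no modules remain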
The e2scan utility is an ext2 file system modified-inode scan program. The e2scan program uses libext2fs to find inodes with a ctime or mtime newer than a given time and prints out their pathnames. Use e2scan to efficiently generate lists of files that have been modified. The e2scan tool is included in e2fsprogs, located at:
http://downloads.clusterfs.com/public/tools/e2fsprogs/latest
e2scan [options] [-f file] block_device
When invoked, the e2scan utility iterates over all inodes on the block device, finds the modified inodes, and prints their inode numbers. A similar iterator, using libext2fs(5), builds a table (called the parent database) which lists the parent inode for each inode. With a lookup function, you can reconstruct modified pathnames from the root.
The following utilities are located in /usr/bin.
The lustre_config.sh utility helps automate the formatting and setup of disks on multiple nodes. An entire installation is described in a comma-separated values (CSV) file, which is passed to this script; the script then formats the drives, updates modprobe.conf and produces high-availability (HA) configuration files.
The lustre_createcsv.sh utility generates a CSV file describing the currently-running installation.
The lustre_up14.sh utility grabs client configuration files from old MDTs. When upgrading Lustre from 1.4.x to 1.6.x, if the MGS is not co-located with the MDT or the client name is non-standard, this utility is used to retrieve the old client log. For more information, see Upgrading and Downgrading Lustre.
The following utilities are located in /usr/bin.
The lustre_req_history.sh utility (run from a client) assembles as much Lustre RPC request history as possible from the local node and from the servers that were contacted, providing a better picture of the coordinated network activity.
The llstat.sh utility (improved in Lustre 1.6) handles a wider range of /proc files, and has command line switches to produce more graphable output.
The plot-llstat.sh utility plots the output from llstat.sh using gnuplot.
The following utilities provide additional statistics.
The client vfs_ops_stats utility tracks Linux VFS operation calls into Lustre for a single PID, PPID, GID or everything.
/proc/fs/lustre/llite/*/vfs_ops_stats /proc/fs/lustre/llite/*/vfs_track_[pid|ppid|gid]
The client extents_stats utility shows the size distribution of I/O calls from the client (cumulative and by process).
/proc/fs/lustre/llite/*/extents_stats, extents_stats_per_process
The client offset_stats utility shows the read/write seek activity of a client by offsets and ranges.
/proc/fs/lustre/llite/*/offset_stats
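Because these llite statistics are ordinary /proc tunables, they can also be read and set through lctl as described earlier; a rough sketch (the PID is illustrative):
$ lctl set_param llite.*.vfs_track_pid=1234    # restrict VFS operation tracking to a single process
$ lctl get_param llite.*.vfs_ops_stats         # read the VFS operation counters
$ lctl get_param llite.*.extents_stats         # read the I/O size distribution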
Lustre 1.6 included per-client and improved MDT statistics:
Each MDT and OST now tracks LDLM and operations statistics for every connected client, for comparisons and simpler collection of distributed job statistics.
/proc/fs/lustre/mds|obdfilter/*/exports/
More detailed MDT operations statistics are collected for better profiling.
/proc/fs/lustre/mds/*/stats
Lustre offers the following test and debugging utilities.
The Load Generator (loadgen) is a test program designed to simulate large numbers of Lustre clients connecting and writing to an OST. The loadgen utility is located at lustre/utils/loadgen (in a build directory) or at /usr/sbin/loadgen (from an RPM).
Loadgen offers the ability to run this test:
1. Start an arbitrary number of (echo) clients.
2. Start and connect to an echo server, instead of a real OST.
3. Create/bulk_write/delete objects on any number of echo clients simultaneously.
Currently, the maximum number of clients is limited by MAX_OBD_DEVICES and the amount of memory available.
The loadgen utility can be run locally on the OST server machine or remotely from any LNET host. The device command can take an optional NID as a parameter; if unspecified, the first local NID found is used.
The obdecho module must be loaded by hand before running loadgen.
# cd lustre/utils/
# insmod ../obdecho/obdecho.ko
# ./loadgen
loadgen> h
This is a test program used to simulate large numbers of clients.
The echo obds are used, so the obdecho module must be loaded.
Typical usage would be:
  loadgen> dev lustre-OST0000    set the target device
  loadgen> start 20              start 20 echo clients
  loadgen> wr 10 5               have 10 clients do simultaneous brw_write tests 5 times each
Available commands are:
  device
  dl
  echosrv
  start
  verbose
  wait
  write
  help
  exit
  quit
For more help type: help command-name
loadgen>
loadgen> device lustre-OST0000 192.168.0.21@tcp
Added uuid OSS_UUID: 192.168.0.21@tcp
Target OST name is 'lustre-OST0000'
loadgen>
loadgen> st 4
start 0 to 4
./loadgen: running thread #1
./loadgen: running thread #2
./loadgen: running thread #3
./loadgen: running thread #4
loadgen> wr 4 5
Estimate 76 clients before we run out of grant space (155872K / 2097152)
1: i0
2: i0
4: i0
3: i0
1: done (0)
2: done (0)
4: done (0)
3: done (0)
wrote 25MB in 1.419s (17.623 MB/s)
loadgen>
The loadgen utility prints periodic status messages; message output can be controlled with the verbose command.
To ensure that a file can be written to (a requirement of the write cache), OSTs reserve chunks of space ("grants") for each newly-created file. A grant may cause an OST to report that it is out of space, even though there is plenty of space on the disk, because the space is "reserved" by other files. The loadgen utility estimates the number of simultaneously open files as the disk size divided by the grant size, and reports that number when the write tests are first started.
The loadgen utility can start an echo server. On another node, loadgen can specify the echo server as the device, thus creating a network-only test environment.
loadgen> echosrv
loadgen> dl
  0 UP obdecho echosrv echosrv 3
  1 UP ost OSS OSS 3
loadgen> device echosrv cfs21@tcp
Added uuid OSS_UUID: 192.168.0.21@tcp
Target OST name is 'echosrv'
loadgen> st 1
start 0 to 1
./loadgen: running thread #1
loadgen> wr 1
start a test_brw write test on X clients for Y iterations
usage: write <num_clients> <num_iter> [<delay>]
loadgen> wr 1 1
loadgen>
1: i0
1: done (0)
wrote 1MB in 0.029s (34.023 MB/s)
The threads all perform their actions in non-blocking mode; use the wait command to block for the idle state. For example:
#!/bin/bash
./loadgen << EOF
device lustre-OST0000
st 1
wr 1 10
wait
quit
EOF
The loadgen utility is intended to grow into a more comprehensive test tool; feature requests are encouraged.
The llog_reader utility translates a Lustre configuration log into human-readable form.
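As a hedged sketch, a configuration log can be examined by mounting the target's backing file system locally and pointing llog_reader at a log file (the device, mount point and log name are illustrative):
# mount -t ldiskfs /dev/sda1 /mnt/mgs          # mount the MGS/MDT backing file system locally
# llog_reader /mnt/mgs/CONFIGS/testfs-client   # dump the named configuration log as text
# umount /mnt/mgs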
The lr_reader utility translates a last received (last_rcvd) file into human-readable form.
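Similarly, as a rough sketch (the device and mount point are illustrative, and the exact invocation may differ by release), the last_rcvd file can be read from a locally mounted target:
# mount -t ldiskfs /dev/sda1 /mnt/mdt   # mount the target's backing file system locally
# lr_reader /mnt/mdt/last_rcvd          # print the last_rcvd contents in readable form
# umount /mnt/mdt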
Lustre includes the flock feature, which provides file locking support. A flock is an advisory lock that can be applied to or removed from an open file as specified by the user. A single file may not, however, simultaneously have both shared and exclusive locks.
By default, flock support is disabled on Lustre. Two modes are available: local flock (the localflock mount option), which enforces locks only among processes on the same client, and cluster-wide flock (the flock mount option), which enforces locks coherently across all clients.
A call to flock may block if another process is holding an incompatible lock. Locks created using flock apply to an open file table entry; therefore, a single process may hold only one type of lock (shared or exclusive) on a given file. Subsequent flock calls on a file that is already locked convert the existing lock to the new lock mode.
$ mount -t lustre -o flock mds@tcp0:/lustre /mnt/client
You can verify that the flock option is in effect by checking /etc/mtab; the entry should look like this:
mds@tcp0:/lustre /mnt/client lustre rw,flock 0 0
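As a rough illustration (the file path is hypothetical and flock(1) from util-linux is assumed to be installed), a lock can then be exercised from the shell:
$ flock -x /mnt/client/shared.log -c "echo appending under lock >> /mnt/client/shared.log"   # exclusive lock held while the command runs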
The l_getgroups utility handles Lustre user / group cache upcall.
l_getgroups [-v] [-d | mdsname] uid
l_getgroups [-v] -s
The group upcall file contains the path to an executable file that, when properly installed, is invoked to resolve a numeric UID to a group membership list. This utility should complete the mds_grp_downcall_data structure and write it to the /proc/fs/lustre/mds/<mds-service>/group_info pseudo-file.
The l_getgroups utility is the reference implementation of the user or group cache upcall.
The l_getgroups files are located at:
/proc/fs/lustre/mds/mds-service/group_upcall
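The upcall is typically enabled by pointing the MDT's group_upcall parameter at this executable; a hedged example (the file system name and installed path are illustrative):
$ lctl conf_param testfs-MDT0000.mdt.group_upcall=/usr/sbin/l_getgroups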
The llobdstat utility displays OST statistics.
llobdstat ost_name [interval]
The llobdstat utility displays a line of OST statistics for a given OST at specified intervals (in seconds).
interval    Time interval (in seconds) after which statistics are refreshed.
# llobdstat liane-OST0002 1
/usr/bin/llobdstat on /proc/fs/lustre/obdfilter/liane-OST0002/stats
Processor counters run at 2800.189 MHz
Read: 1.21431e+07, Write: 9.93363e+08, create/destroy: 24/1499, stat: 34, punch: 18
[NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]
Timestamp   Read-delta  ReadRate   Write-delta  WriteRate
--------------------------------------------------------
1217026053  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026054  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026055  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026056  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026057  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026058  0.00MB      0.00MB/s   0.00MB       0.00MB/s
1217026059  0.00MB      0.00MB/s   0.00MB       0.00MB/s st:1
The llobdstat files are located at:
/proc/fs/lustre/obdfilter/<ostname>/stats
The llstat utility displays Lustre statistics.
llstat [-c] [-g] [-i interval] stats_file
The llstat utility displays statistics from any of the Lustre statistics files that share a common format, updated at a specified interval (in seconds). To stop printing statistics, type CTRL-C.
stats_file    Specifies either the full path to a statistics file or the shorthand reference, mds or ost.
To monitor /proc/fs/lustre/ost/OSS/ost/stats at 1 second intervals, run:
llstat -i 1 ost
The llstat files are located at:
/proc/fs/lustre/mdt/MDS/*/stats
/proc/fs/lustre/mds/*/exports/*/stats
/proc/fs/lustre/mdc/*/stats
/proc/fs/lustre/ldlm/services/*/stats
/proc/fs/lustre/ldlm/namespaces/*/pool/stats
/proc/fs/lustre/mgs/MGS/exports/*/stats
/proc/fs/lustre/ost/OSS/*/stats
/proc/fs/lustre/osc/*/stats
/proc/fs/lustre/obdfilter/*/exports/*/stats
/proc/fs/lustre/obdfilter/*/stats
/proc/fs/lustre/llite/*/stats
The lst utility starts LNET self-test.
lst
LNET self-test helps site administrators confirm that Lustre Networking (LNET) has been correctly installed and configured. The self-test also confirms that LNET, the network software and the underlying hardware are performing as expected.
Each LNET self-test runs in the context of a session. A node can be associated with only one session at a time, to ensure that the session has exclusive use of the nodes on which it is running. A single node creates, controls and monitors a single session. This node is referred to as the self-test console.
Any node may act as the self-test console. Nodes are named and allocated to a self-test session in groups. This allows all nodes in a group to be referenced by a single name.
Test configurations are built by describing and running test batches. A test batch is a named collection of tests, with each test composed of a number of individual point-to-point tests running in parallel. These individual point-to-point tests are instantiated according to the test type, source group, target group and distribution specified when the test is added to the test batch.
To run LNET self-test, load the following modules: libcfs, lnet, lnet_selftest and any one of the klnds (ksocklnd, ko2iblnd, and so on). To load all necessary modules, run modprobe lnet_selftest, which recursively loads the modules on which lnet_selftest depends.
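For example:
# modprobe lnet_selftest          # pulls in libcfs, lnet and the LND in use
# lsmod | grep lnet_selftest      # confirm the module is loaded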
There are two types of nodes for LNET self-test: console and test. Both node types require all previously-specified modules to be loaded. (The userspace test node does not require these modules).
Test nodes can either be in kernel or in userspace. A console user can invite a kernel test node to join the test session by running lst add_group NID, but the user cannot actively add a userspace test node to the test-session. However, the console user can passively accept a test node to the test session while the test node runs lst client to connect to the console.
LNET self-test includes two user utilities, lst and lstclient.
lst is the user interface for the self-test console (run on console node). It provides a list of commands to control the entire test system, such as create session, create test groups, etc.
lstclient is the userspace self-test program which is linked with userspace LNDs and LNET. A user can invoke lstclient to join a self-test session:
lstclient --sesid CONSOLE_NID --group NAME
This is an example of an LNET self-test script which simulates the traffic pattern of a set of Lustre servers on a TCP network, accessed by Lustre clients on an IB network (connected via LNET routers), with half the clients reading and half the clients writing.
#!/bin/bash
export LST_SESSION=$$
lst new_session read/write
lst add_group servers 192.168.10.[8,10,12-16]@tcp
lst add_group readers 192.168.1.[1-253/2]@o2ib
lst add_group writers 192.168.1.[2-254/2]@o2ib
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from readers --to servers brw read check=simple size=1M
lst add_test --batch bulk_rw --from writers --to servers brw write check=full size=4K
# start running
lst run bulk_rw
# display server stats for 30 seconds
lst stat servers & sleep 30; kill $!
# tear down
lst end_session
The plot-llstat utility plots Lustre statistics.
plot-llstat results_filename [parameter_index]
Value of parameter_index can be:
1 - count per interval
2 - count per second (default setting)
3 - total count
The plot-llstat utility generates a CSV file and instruction files for gnuplot from llstat output. Since llstat is generic in nature, plot-llstat is also a generic script. The value of parameter_index can be 1 for count per interval, 2 for count per second (default setting) or 3 for total count.
The plot-llstat utility creates a .dat (CSV) file using the number of operations specified by the user. The number of operations equals the number of columns in the CSV file. The values in those columns are equal to the corresponding value of parameter_index in the output file.
The plot-llstat utility also creates a .scr file that contains instructions for gnuplot to plot the graph. After generating the .dat and .scr files, the plot-llstat tool invokes gnuplot to display the graph.
llstat -i2 -g -c lustre-OST0000 > log
plot-llstat log 3
The routerstat utility prints Lustre router statistics.
routerstat [interval]
The routerstat utility watches LNET router statistics. If no interval is specified, then statistics are sampled and printed only one time. Otherwise, statistics are sampled and printed at the specified interval (in seconds).
The routerstat output includes the following fields:
Routerstat extracts statistics data from:
/proc/sys/lnet/stats
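For example, to sample router statistics every two seconds, or to inspect the raw counters directly:
# routerstat 2
# cat /proc/sys/lnet/stats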
The ll_recover_lost_found_objs utility helps recover Lustre OST objects (file data) from a lost and found directory back to their correct locations.
Running the ll_recover_lost_found_objs tool is not strictly necessary to bring an OST back online; it simply avoids losing access to objects that were moved to the lost and found directory due to directory corruption.
$ ll_recover_lost_found_objs [-hv] -d directory
The first time Lustre writes to an object, it saves the MDS inode number and the objid as an extended attribute on the object, so in case of directory corruption of the OST, it is possible to recover the objects. Running e2fsck fixes the corrupted OST directory, but it puts all of the objects into a lost and found directory, where they are inaccessible to Lustre. Use the ll_recover_lost_found_objs utility to recover all (or at least most) objects from a lost and found directory back to their place in the O/0/d* directories.
To use ll_recover_lost_found_objs, mount the file system locally (using mount -t ldiskfs), run the utility, and then unmount the file system again. The OST must not be mounted by Lustre when ll_recover_lost_found_objs is run.
ll_recover_lost_found_objs -d /mnt/ost/lost+found
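A fuller sequence, as a sketch (the device name and mount point are illustrative):
# mount -t ldiskfs /dev/sda1 /mnt/ost             # mount the OST's backing file system locally
# ll_recover_lost_found_objs -d /mnt/ost/lost+found   # move recovered objects back into place
# umount /mnt/ost                                 # unmount before restarting the OST under Lustre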