Hello,
My name is Dubravko Markic and I am working with the XFS filesystem as part of my project. I am using a Linux operating system, more precisely a SuSE CORE 9 system. I have installed an XFS filesystem on one of the partitions on my computer. My XFS filesystem has the default configuration, which means that no real-time subvolume is enabled.

My question to you is: what do I do in order to enable a real-time subvolume on my XFS filesystem? Also, if I understood correctly, under Linux one does not have the GRIO (guaranteed rate I/O) option; this is only available under the IRIX operating system, right?

Thanks a bunch.

Regards,
Dubo
Dubravko Markic wrote:
> Hello,
> My name is Dubravko Markic and I am working with the XFS filesystem as
> part of my project. I am using a Linux operating system, more precisely
> a SuSE CORE 9 system. I have installed an XFS filesystem on one of the
> partitions on my computer. My XFS filesystem has the default
> configuration, which means that no real-time subvolume is enabled.
>
> My question to you is: what do I do in order to enable a real-time
> subvolume on my XFS filesystem?

If it is truly not enabled on your SuSE, then you will need to rebuild xfs after changing CONFIG_XFS_RT. But:

penguin3:~ # zcat /proc/config.gz | grep XFS_RT
CONFIG_XFS_RT=y
penguin3:~ # uname -a
Linux penguin3 2.6.5-7.193-debug #1 SMP Wed Jul 20 14:39:18 UTC 2005 i686 i686 i386 GNU/Linux

This is a SLES9 SP2 box, but it appears that recent SuSE kernels are RT-enabled.

> Also, if I understood correctly, under Linux one does not have the GRIO
> (guaranteed rate I/O) option; this is only available under the IRIX
> operating system, right? Thanks a bunch.

That is basically true, yes. There is a non-free GRIOv2 product in use with CXFS, but for your purposes, I think it is safe to say that there is no standalone GRIO equivalent on Linux.

-Eric
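For reference, when the kernel does have CONFIG_XFS_RT, the real-time subvolume is not switched on in an existing filesystem; it is a separate device named at mkfs and mount time, and individual files are then flagged to allocate from it. A rough sketch follows, with placeholder device and file names; check the mkfs.xfs, mount and xfs_io man pages on your system for the exact options.

# Create the filesystem with a separate real-time device (placeholders):
mkfs.xfs -r rtdev=/dev/sdc1 /dev/sdb1

# Mount it, naming the real-time device again:
mount -t xfs -o rtdev=/dev/sdc1 /dev/sdb1 /mnt/xfs

# Flag a new, still-empty file so its data extents are allocated on the
# real-time subvolume (requires a reasonably recent xfsprogs with xfs_io):
xfs_io -f -c "chattr +r" /mnt/xfs/stream.dat

The real-time flag has to be set before any data is written, which is why it is applied to a freshly created, empty file here.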
Eric Sandeen <[hidden email]> writes:
>
> That is basically true, yes. There is a non-free GRIOv2 product in
> use with CXFS, but for your purposes, I think it is safe to say that
> there is no standalone GRIO equivalent on Linux.

It's not. In fact it's a standard feature now.

The CFQ2 IO scheduler has IO priorities settable with ionice, including a RT class with 8 priorities.

It's not available in SLES9 though, only in newer kernels (2.6.13+) and SUSE releases (like SL10.0).

-Andi
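On those newer kernels the priorities are set per process with the ionice tool Andi mentions: class 1 is the real-time class, class 2 best-effort, class 3 idle, and the level within a class runs from 0 (highest) to 7 (lowest). A few illustrative invocations; the commands and the PID are placeholders only:

# Run a bulk job in the best-effort class at the lowest priority:
ionice -c2 -n7 tar cf /backup/home.tar /home

# Run a streaming writer in the real-time I/O class at the highest priority:
ionice -c1 -n0 dd if=/dev/zero of=/data/stream.bin bs=1M count=1024

# Move an already running process (PID 1234 here) into the real-time class:
ionice -c1 -n0 -p 1234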
Andi Kleen wrote:
> Eric Sandeen <[hidden email]> writes:
>
>> That is basically true, yes. There is a non-free GRIOv2 product in
>> use with CXFS, but for your purposes, I think it is safe to say that
>> there is no standalone GRIO equivalent on Linux.
>
> It's not. In fact it's a standard feature now.

Well, I stand corrected then :)

> The CFQ2 IO scheduler has IO priorities settable with ionice, including
> a RT class with 8 priorities.

Well, that still sounds a bit different from the original IRIX GRIO implementation, FWIW.

    grio - guaranteed-rate I/O

    DESCRIPTION
        Guaranteed-rate I/O (GRIO) refers to a guarantee made by the system
        to a user process indicating that the given process will receive
        data from a system resource at a predefined rate regardless of any
        other activity on the system.

While 2.6 can set priorities on IO, it does not offer a hard guaranteed IO rate, does it?

Now, I'm not saying one scheme is necessarily better or worse than the other, but they are different, I think.

-Eric
Andi Kleen wrote:
> Eric Sandeen <[hidden email]> writes:
>
>> That is basically true, yes. There is a non-free GRIOv2 product in
>> use with CXFS, but for your purposes, I think it is safe to say that
>> there is no standalone GRIO equivalent on Linux.
>
> It's not. In fact it's a standard feature now.
>
> The CFQ2 IO scheduler has IO priorities settable with ionice, including
> a RT class with 8 priorities.
>
> It's not available in SLES9 though, only in newer kernels (2.6.13+)
> and SUSE releases (like SL10.0)
>
> -Andi

Ah, but is there a bandwidth reservation system to go with it? That is the missing link here, being able to say 'I need 10 Mbytes/sec until further notice'.

Steve
On Tuesday 04 October 2005 17:41, Steve Lord wrote:
> Andi Kleen wrote:
> > Eric Sandeen <[hidden email]> writes:
> >> That is basically true, yes. There is a non-free GRIOv2 product in
> >> use with CXFS, but for your purposes, I think it is safe to say that
> >> there is no standalone GRIO equivalent on Linux.
> >
> > It's not. In fact it's a standard feature now.
> >
> > The CFQ2 IO scheduler has IO priorities settable with ionice, including
> > a RT class with 8 priorities.
> >
> > It's not available in SLES9 though, only in newer kernels (2.6.13+)
> > and SUSE releases (like SL10.0)
> >
> > -Andi
>
> Ah, but is there a bandwidth reservation system to go with it? That is
> the missing link here, being able to say 'I need 10 Mbytes/sec until
> further notice'.

Indirectly there is. The RT priority defines how many time slots you get, and the length of the time slots is configurable using sysfs. If you know the bandwidth of the disk you can use that to define an approximation of the guaranteed bandwidth for a specific RT priority.

On the other hand, I don't really see how you can get real bandwidths. E.g. on most disks the bandwidth varies greatly depending on where the blocks are allocated and how much seeking it does. If you take all that into account you'll probably get a pretty slow worst case as the baseline to divide up.

There is nothing that actually gives out bandwidths, but you could do that in user space.

Jens may correct me if I'm wrong on anything.

-Andi
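The per-device tunables Andi is referring to live under sysfs; on a kernel using CFQ, inspecting and adjusting the synchronous slice length looks roughly like this (sda is a placeholder, and the exact set of parameters varies between kernel versions):

# Confirm which scheduler is active for the device:
cat /sys/block/sda/queue/scheduler

# List the CFQ tunables, then read the synchronous slice length (milliseconds):
ls /sys/block/sda/queue/iosched/
cat /sys/block/sda/queue/iosched/slice_sync

# Lengthen the synchronous slice so each slice holder keeps the disk longer:
echo 120 > /sys/block/sda/queue/iosched/slice_sync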
Andi Kleen wrote:
> On Tuesday 04 October 2005 17:41, Steve Lord wrote:
>
>> Andi Kleen wrote:
>>
>>> Eric Sandeen <[hidden email]> writes:
>>>
>>>> That is basically true, yes. There is a non-free GRIOv2 product in
>>>> use with CXFS, but for your purposes, I think it is safe to say that
>>>> there is no standalone GRIO equivalent on Linux.
>>>
>>> It's not. In fact it's a standard feature now.
>>>
>>> The CFQ2 IO scheduler has IO priorities settable with ionice, including
>>> a RT class with 8 priorities.
>>>
>>> It's not available in SLES9 though, only in newer kernels (2.6.13+)
>>> and SUSE releases (like SL10.0)
>>>
>>> -Andi
>>
>> Ah, but is there a bandwidth reservation system to go with it? That is
>> the missing link here, being able to say 'I need 10 Mbytes/sec until
>> further notice'.
>
> Indirectly there is. The RT priority defines how many time slots you
> get, and the length of the time slots is configurable using sysfs. If you
> know the bandwidth of the disk you can use that to define an approximation
> of the guaranteed bandwidth for a specific RT priority.
>
> On the other hand, I don't really see how you can get real bandwidths.
> E.g. on most disks the bandwidth varies greatly depending on where
> the blocks are allocated and how much seeking it does. If you take
> all that into account you'll probably get a pretty slow worst case
> as the baseline to divide up.

If you get into this stuff seriously you have to dedicate hardware all the way from the CPU to the disks and make worst-case estimates of how fast your disks will go. Multiply all this out and say that to record n streams of HD video without dropping a frame you need this much hardware. It is a bit of a black-magic art, and not something you just go out and buy a PC to do. It tends to waste a lot of the potential bandwidth of the hardware, but you try explaining to CNN why their satellite feed just went black ;-)

Steve

> There is nothing that actually gives out bandwidths, but you could
> do that in user space.
>
> Jens may correct me if I'm wrong on anything.
>
> -Andi
On Tue, Oct 04 2005, Steve Lord wrote:
> Andi Kleen wrote:
> > On Tuesday 04 October 2005 17:41, Steve Lord wrote:
> >
> >> Andi Kleen wrote:
> >>
> >>> Eric Sandeen <[hidden email]> writes:
> >>>
> >>>> That is basically true, yes. There is a non-free GRIOv2 product in
> >>>> use with CXFS, but for your purposes, I think it is safe to say that
> >>>> there is no standalone GRIO equivalent on Linux.
> >>>
> >>> It's not. In fact it's a standard feature now.
> >>>
> >>> The CFQ2 IO scheduler has IO priorities settable with ionice, including
> >>> a RT class with 8 priorities.
> >>>
> >>> It's not available in SLES9 though, only in newer kernels (2.6.13+)
> >>> and SUSE releases (like SL10.0)
> >>>
> >>> -Andi
> >>
> >> Ah, but is there a bandwidth reservation system to go with it? That is
> >> the missing link here, being able to say 'I need 10 Mbytes/sec until
> >> further notice'.
> >
> > Indirectly there is. The RT priority defines how many time slots you
> > get, and the length of the time slots is configurable using sysfs. If you
> > know the bandwidth of the disk you can use that to define an approximation
> > of the guaranteed bandwidth for a specific RT priority.
> >
> > On the other hand, I don't really see how you can get real bandwidths.
> > E.g. on most disks the bandwidth varies greatly depending on where
> > the blocks are allocated and how much seeking it does. If you take
> > all that into account you'll probably get a pretty slow worst case
> > as the baseline to divide up.
>
> If you get into this stuff seriously you have to dedicate hardware
> all the way from the CPU to the disks and make worst-case estimates of how
> fast your disks will go. Multiply all this out and say that to record n
> streams of HD video without dropping a frame you need this much hardware.
> It is a bit of a black-magic art, and not something you just go out and
> buy a PC to do. It tends to waste a lot of the potential bandwidth of
> the hardware, but you try explaining to CNN why their satellite feed just
> went black ;-)

That's exactly why the Linux ioprio stuff has been designed the way it is right now: it's not overengineered for something we cannot support anyway. The CFQ io priorities will work well enough for general use; if you are basing your business on GRIO it's a different game completely. I don't want to add kernel infrastructure for something that is very specialized, especially because the code to do so would be 10 times bigger and more complex than the current stuff.

--
Jens Axboe
Jens Axboe wrote:
> That's exactly why the Linux ioprio stuff has been designed the way it
> is right now: it's not overengineered for something we cannot support
> anyway. The CFQ io priorities will work well enough for general use; if
> you are basing your business on GRIO it's a different game completely. I
> don't want to add kernel infrastructure for something that is very
> specialized, especially because the code to do so would be 10 times
> bigger and more complex than the current stuff.

Jens, I didn't mean to imply that you -should- have done a GRIO-type design (and I doubt that Steve did, either). My only point was that GRIO and ioprio are two different IO control mechanisms.

-Eric
On Wed, Oct 05 2005, Eric Sandeen wrote:
> Jens Axboe wrote:
> > That's exactly why the Linux ioprio stuff has been designed the way it
> > is right now: it's not overengineered for something we cannot support
> > anyway. The CFQ io priorities will work well enough for general use; if
> > you are basing your business on GRIO it's a different game completely. I
> > don't want to add kernel infrastructure for something that is very
> > specialized, especially because the code to do so would be 10 times
> > bigger and more complex than the current stuff.
>
> Jens, I didn't mean to imply that you -should- have done a GRIO-type
> design (and I doubt that Steve did, either). My only point was that
> GRIO and ioprio are two different IO control mechanisms.

Oh, I agree. I was mainly trying to clarify that they pertain to two different market segments. Sorry if that wasn't clear; it's not a criticism of GRIO (I don't really know anything about SGI's GRIO).

--
Jens Axboe
On Wednesday 05 October 2005 16:11, Eric Sandeen wrote:
> Jens Axboe wrote:
> > That's exactly why the Linux ioprio stuff has been designed the way it
> > is right now: it's not overengineered for something we cannot support
> > anyway. The CFQ io priorities will work well enough for general use; if
> > you are basing your business on GRIO it's a different game completely. I
> > don't want to add kernel infrastructure for something that is very
> > specialized, especially because the code to do so would be 10 times
> > bigger and more complex than the current stuff.
>
> Jens, I didn't mean to imply that you -should- have done a GRIO-type
> design (and I doubt that Steve did, either). My only point was that
> GRIO and ioprio are two different IO control mechanisms.

I suspect for most people they will be pretty much equivalent.

-Andi
On Wed, Oct 05 2005, Andi Kleen wrote:
> On Wednesday 05 October 2005 16:11, Eric Sandeen wrote:
> > Jens Axboe wrote:
> > > That's exactly why the Linux ioprio stuff has been designed the way it
> > > is right now: it's not overengineered for something we cannot support
> > > anyway. The CFQ io priorities will work well enough for general use; if
> > > you are basing your business on GRIO it's a different game completely. I
> > > don't want to add kernel infrastructure for something that is very
> > > specialized, especially because the code to do so would be 10 times
> > > bigger and more complex than the current stuff.
> >
> > Jens, I didn't mean to imply that you -should- have done a GRIO-type
> > design (and I doubt that Steve did, either). My only point was that
> > GRIO and ioprio are two different IO control mechanisms.
>
> I suspect for most people they will be pretty much equivalent.

Indeed, only for the 'obscure' life-or-death type setups will there be a real difference.

--
Jens Axboe
On Wednesday 05 October 2005 16:20, Jens Axboe wrote:
> Indeed, only for the 'obscure' life-or-death type setups will there be a
> real difference.

Even for those you could in theory do bandwidth allocation on top of the RT classes, with some tweaking of the time slices and knowing the worst-case transfer rate of the HD, couldn't you? Basically it would be a user-space problem.

-Andi
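As a back-of-the-envelope illustration of that user-space arithmetic (all numbers made up for the example):

# Conservative worst-case disk throughput, measured over the slowest zone:
WORST_CASE_MBS=25
# Share of the CFQ time slices granted to the real-time stream, in percent:
RT_SHARE_PCT=40

echo "approximate floor: $(( WORST_CASE_MBS * RT_SHARE_PCT / 100 )) MB/s"
# prints: approximate floor: 10 MB/s

The point is only that the reservation is an approximation derived from a worst-case measurement, not a guarantee enforced by the kernel.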
On Wed, Oct 05 2005, Andi Kleen wrote:
> On Wednesday 05 October 2005 16:20, Jens Axboe wrote:
> > Indeed, only for the 'obscure' life-or-death type setups will there be a
> > real difference.
>
> Even for those you could in theory do bandwidth allocation on top of
> the RT classes, with some tweaking of the time slices and knowing the
> worst-case transfer rate of the HD, couldn't you? Basically it would be a
> user-space problem.

There are still unknowns, the HD still being the biggest one of course. The problem is that you don't know the worst-case HD performance; it might be doing all sorts of rewriting, calibration, error correction etc. that can still screw you. So I think that without definite knowledge of what the HD will do in case of errors (or a way to control that, which you definitely can on some drives), it's still pretty hazy. It gets better, but if you are looking for complete guarantees I don't think it's good enough.

--
Jens Axboe
On Wednesday 05 October 2005 16:58, Jens Axboe wrote:
> There are still unknowns, the HD still being the biggest one of course.
> The problem is that you don't know the worst-case HD performance; it
> might be doing all sorts of rewriting, calibration, error correction etc.
> that can still screw you. So I think that without definite knowledge of
> what the HD will do in case of errors (or a way to control that, which
> you definitely can on some drives), it's still pretty hazy. It gets
> better, but if you are looking for complete guarantees I don't think
> it's good enough.

Yes, but GRIO has exactly the same problem. I assume they need custom calibration for each IO subsystem.

-Andi
On Wed, Oct 05 2005, Andi Kleen wrote:
> On Wednesday 05 October 2005 16:58, Jens Axboe wrote:
> > There are still unknowns, the HD still being the biggest one of course.
> > The problem is that you don't know the worst-case HD performance; it
> > might be doing all sorts of rewriting, calibration, error correction etc.
> > that can still screw you. So I think that without definite knowledge of
> > what the HD will do in case of errors (or a way to control that, which
> > you definitely can on some drives), it's still pretty hazy. It gets
> > better, but if you are looking for complete guarantees I don't think
> > it's good enough.
>
> Yes, but GRIO has exactly the same problem. I assume they need custom
> calibration for each IO subsystem.

Indeed it does, and yes, if they really want to provide the type of guarantees that Steve listed, then that needs a custom box with either custom or known disk firmware options. If not, you cannot give absolute guarantees and expect to always honor them. That's in addition to anything you may need to change in software; if using Linux you would need to audit/fix lots of things in the io path.

--
Jens Axboe
Jens Axboe wrote:
> On Wed, Oct 05 2005, Andi Kleen wrote:
>
>> On Wednesday 05 October 2005 16:58, Jens Axboe wrote:
>>
>>> There are still unknowns, the HD still being the biggest one of course.
>>> The problem is that you don't know the worst-case HD performance; it
>>> might be doing all sorts of rewriting, calibration, error correction etc.
>>> that can still screw you. So I think that without definite knowledge of
>>> what the HD will do in case of errors (or a way to control that, which
>>> you definitely can on some drives), it's still pretty hazy. It gets
>>> better, but if you are looking for complete guarantees I don't think
>>> it's good enough.
>>
>> Yes, but GRIO has exactly the same problem. I assume they need custom
>> calibration for each IO subsystem.
>
> Indeed it does, and yes, if they really want to provide the type of
> guarantees that Steve listed, then that needs a custom box with either
> custom or known disk firmware options. If not, you cannot give absolute
> guarantees and expect to always honor them. That's in addition to
> anything you may need to change in software; if using Linux you would
> need to audit/fix lots of things in the io path.

Definitely; from my memory, grio had a tool for measuring a disk subsystem. I think reality is closer to overspeccing the hardware than to providing an absolute guarantee of bandwidth. Being able to prioritize individual I/O calls is just part of the picture.

Steve
On Wed, Oct 05 2005, Steve Lord wrote:
> Jens Axboe wrote:
> > On Wed, Oct 05 2005, Andi Kleen wrote:
> >
> >> On Wednesday 05 October 2005 16:58, Jens Axboe wrote:
> >>
> >>> There are still unknowns, the HD still being the biggest one of course.
> >>> The problem is that you don't know the worst-case HD performance; it
> >>> might be doing all sorts of rewriting, calibration, error correction etc.
> >>> that can still screw you. So I think that without definite knowledge of
> >>> what the HD will do in case of errors (or a way to control that, which
> >>> you definitely can on some drives), it's still pretty hazy. It gets
> >>> better, but if you are looking for complete guarantees I don't think
> >>> it's good enough.
> >>
> >> Yes, but GRIO has exactly the same problem. I assume they need custom
> >> calibration for each IO subsystem.
> >
> > Indeed it does, and yes, if they really want to provide the type of
> > guarantees that Steve listed, then that needs a custom box with either
> > custom or known disk firmware options. If not, you cannot give absolute
> > guarantees and expect to always honor them. That's in addition to
> > anything you may need to change in software; if using Linux you would
> > need to audit/fix lots of things in the io path.
>
> Definitely; from my memory, grio had a tool for measuring a disk subsystem.
> I think reality is closer to overspeccing the hardware than to providing an
> absolute guarantee of bandwidth. Being able to prioritize individual I/O
> calls is just part of the picture.

The spec'ing part is the last piece in the puzzle; that will only tell you what the io subsystem will typically do. It won't tell you how it behaves in boundary or error conditions. It's trivial to say "oh, the disk does 35 MiB/sec at the end zone, let's cap it at 20 MiB/sec" and have it work 99.9% of the time; the tricky part is the last 0.1%. Or whatever percentage you want to assign to it :-)

--
Jens Axboe