Forum Post: RE: GUID/sequence

Status
Not open for further replies.
A

agent_008_nl

Guest
According to what is written in the link I sent: www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database you can expect the comb guid creation to be a bit slower than the plain guid as we have it now. The clustered indexes would be needed for the real benefits: faster reads. I'll copy a discussion from the peg, you can find more info there. And maybe some who involved in the discussion at exchange want to explain more. Re: clustered indexesImportant mainly because of your interaction with messages in the conversation. Click to teach Gmail this conversation is not important. pegx Tom BascomOn 6/4/14, 2:39 AM, Stefan Houtzager wrote: Has the possibity to mark an ... Jun 4 On 6/4/14, 2:39 AM, Stefan Houtzager wrote: Has the possibity to mark an ... Tom Bascom Jun 4 to peg On 6/4/14, 2:39 AM, Stefan Houtzager wrote: Has the possibity to mark an index as clustered in a progress db been considered bij psc by the way? A clustered index is one where the data is arranged in index order on disk. (The name strikes me as misleading -- "clustered data" seems more appropriate...) There can only be one such index per table because the data can only be in one "physical" order. When reading data in the clustered index order you can mostly skip reading the index -- so it is much faster (sort of like the oe11 TABLE-SCAN option except that the data is ordered). But when writing it can be *very* expensive -- if you insert a record in the middle everything above it has to be shifted. Personally I don't see the feature as being very attractive. I suspect that it gets a lot of attention because in the Microsoft world they get created automatically and they are a way to do fast table scans. So it is "received wisdom" that clustered indexes are an important feature. Somewhat like the idea that fields should have a known width that is never exceeded ;) The ordering part is, maybe, occasionally useful but the write-performance trade-off seems far too high to me. I also suspect that a great many table scans either don't care about the ordering or will need a sort order different from such a key. My understanding is that with MS SQL these are often found in association with the "primary" key which in most Progress databases that I have seen is probably not the order that you are likely to want data in. Lastly -- if a GUID is used for the key to cluster around it would likely be an even less useful sort order. Even if the GUID were to be a monotonically increasing sort order the replication/merge use-case, which IMHO is the strongest reason to consider GUIDs as keys, makes the order attribute basically useless. One Progress-specific issue that immediately springs to mind when thinking about Progress is that RECIDs would become volatile even within a session (not just across dump & load etc). You could read a record, someone could insert another record in the table, and your RECID would no longer point to what it used to point to. -- Tom Bascom Reply Reply to all Forward Rob FitzpatrickFollowing that train of thought re volatile recids, if an insert of one recor... Jun 4 Following that train of thought re volatile recids, if an insert of one recor... Rob FitzpatrickJun 4 Loading... Rob Fitzpatrick Jun 4 to peg Following that train of thought re volatile recids, if an insert of one record caused a change in the recids of all the following records (in clustered index order) then wouldn't any index entries for those records in other indexes on that table also require the updated recids? If so I wonder how SQL Server deals with that. Or are their data structures and physical storage layout so different from Progress' that the question isn't meaningful? -----Original Message----- From: Tom Bascom One Progress-specific issue that immediately springs to mind when thinking about Progress is that RECIDs would become volatile even within a session (not just across dump & load etc). You could read a record, someone could insert another record in the table, and your RECID would no longer point to what it used to point to. Reply Reply to all Forward Gus Bjorklundwhat makes you guys think that everything else moves when you add a row to a ... Jun 4 what makes you guys think that everything else moves when you add a row to a ... Gus BjorklundJun 4 Loading... Gus Bjorklund Jun 4 to peg On 6/4/14 9:17 AM, "Rob Fitzpatrick" wrote: Following that train of thought re volatile recids, if an insert of one record caused a change in the recids of all the following records (in clustered index order) then wouldn't any index entries for those records in other indexes on that table also require the updated recids? If so I wonder how SQL Server deals with that. Or are their data structures and physical storage layout so different from Progress' that the question isn't meaningful? what makes you guys think that everything else moves when you add a row to a table that has a clustered index ? -gus Reply Reply to all Forward 12 older messagesTim KuehnFrom the definition - "data is in index order on disk" Jun 4 From the definition - "data is in index order on disk" Tim KuehnJun 4 Loading... Tim Kuehn Jun 4 to peg On Wed, Jun 4, 2014 at 9:45 AM, Gus Bjorklund wrote: what makes you guys think that everything else moves when you add a row to a table that has a clustered index ? From the definition - "data is in index order on disk" -- Tim Kuehn: Senior Consultant - TDK Consulting Services Ontario PUG President PUG Challenge Americas Executive, Program Committee Chair Reply Reply to all Forward Rob FitzpatrickGus, I don't know how clustered indexes work. That's why I wrote my question ... Jun 4 Gus, I don't know how clustered indexes work. That's why I wrote my question ... Rob FitzpatrickJun 4 Loading... Rob Fitzpatrick Jun 4 to peg Gus, I don't know how clustered indexes work. That's why I wrote my question in conditional form. If the condition is false then my question is moot and can be ignored. -----Original Message----- what makes you guys think that everything else moves when you add a row to a table that has a clustered index ? -gus Reply Reply to all Forward DavidMSSQL will leave some gaps, especially when using something like a guid. I gu... Jun 4 MSSQL will leave some gaps, especially when using something like a guid. I gu... DavidJun 4 Loading... DavidMSSQL will leave some gaps, especially when using something like a guid. I gu... Jun 4 MSSQL will leave some gaps, especially when using something like a guid. I gu... David Jun 4 to peg On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote: Gus, I don't know how clustered indexes work. That's why I wrote my question in conditional form. If the condition is false then my question is moot and can be ignored. -----Original Message----- what makes you guys think that everything else moves when you add a row to a table that has a clustered index ? -gus MSSQL will leave some gaps, especially when using something like a guid. I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random. Also, the clustered key becomes the implied last component of all other indexes. The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column) Reply Reply to all Forward Tom BascomBecause that's what the Wiki that I found for "clustered indexes" says happen... Jun 4 Because that's what the Wiki that I found for "clustered indexes" says happen... Tom BascomJun 4 Loading... Tom Bascom Jun 4 to peg On 6/4/14, 10:12 AM, David wrote: On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote: Gus, I don't know how clustered indexes work. That's why I wrote my question in conditional form. If the condition is false then my question is moot and can be ignored. -----Original Message----- what makes you guys think that everything else moves when you add a row to a table that has a clustered index ? Because that's what the Wiki that I found for "clustered indexes" says happens. Various other material says the same thing. It passes the sniff test because of the "stored in order on disk" characteristic. I suppose it would not be a problem if records are always appended but I did say "insert a record in the middle". -gus MSSQL will leave some gaps, especially when using something like a guid. I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random. Also, the clustered key becomes the implied last component of all other indexes. The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column) So maybe it doesn't *always* cause everything to move but eventually gaps will get filled and when they do it is going to be really painful. Sounds like a bad idea to me. -- Tom Bascom Reply Reply to all Forward DavidLogically, the rows are ordered, but not necessarily physically. That is why ... Jun 4 Logically, the rows are ordered, but not necessarily physically. That is why ... DavidJun 4 Loading... David Jun 4 to peg Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. On 6/4/2014 9:44 AM, Tom Bascom wrote: On 6/4/14, 10:12 AM, David wrote: On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote: Gus, I don't know how clustered indexes work. That's why I wrote my question in conditional form. If the condition is false then my question is moot and can be ignored. -----Original Message----- what makes you guys think that everything else moves when you add a row to a table that has a clustered index ? Because that's what the Wiki that I found for "clustered indexes" says happens. Various other material says the same thing. It passes the sniff test because of the "stored in order on disk" characteristic. I suppose it would not be a problem if records are always appended but I did say "insert a record in the middle". -gus MSSQL will leave some gaps, especially when using something like a guid. I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random. Also, the clustered key becomes the implied last component of all other indexes. The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column) So maybe it doesn't *always* cause everything to move but eventually gaps will get filled and when they do it is going to be really painful. Sounds like a bad idea to me. Reply Reply to all Forward Thomas Mercer-Hursh, Ph.D.Is it, perhaps, the case that clustered index is used only on table which are... Jun 4 Is it, perhaps, the case that clustered index is used only on table which are... Tom BascomSorry, you lost me. I would have thought that the index is the "logical" part... Jun 4 Sorry, you lost me. I would have thought that the index is the "logical" part... Tom BascomJun 4 Loading... Tom Bascom Jun 4 to peg On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? -- Tom Bascom Reply Reply to all Forward DavidI can't recall the specifics, but there is some kind of pointer left where th... Jun 4 I can't recall the specifics, but there is some kind of pointer left where th... DavidJun 4 Loading... David Jun 4 to peg On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". Reply Reply to all Forward Gus Bjorklundthere are a variety of ways to implement storage for clustered index tables. ... Jun 4 there are a variety of ways to implement storage for clustered index tables. ... Gus BjorklundJun 4 Loading... Gus Bjorklund Jun 4 to peg On 6/4/14 11:07 AM, "Tom Bascom" wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? there are a variety of ways to implement storage for clustered index tables. and for handling inserts and page splits. you don't need to reorganize the whole table for every insert. that would negate whatever advantage you might get from the clustering. note that "physical order" does not have much meaning with modern disk and disk arrays. nobody stores things in disk sector order anymore. clustered indexes serve to reduce the I/O overhead but don't have to be perfectly optimized. good enough is good enough. regards, gus "Culture eats strategy for breakfast." -- Peter Drucker Reply Reply to all Forward Tom BascomThat would seem to defeat the whole "in order on disk" assumption that the be... Jun 4 That would seem to defeat the whole "in order on disk" assumption that the be... Tom BascomJun 4 Loading... Tom Bascom Jun 4 to peg On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? -- Tom Bascom Reply Reply to all Forward DavidYou have some control over how rare. If you have an ascending numeric key, yo... Jun 4 You have some control over how rare. If you have an ascending numeric key, yo... DavidJun 4 Loading... David Jun 4 to peg On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. Reply Reply to all Forward Stefan Houtzager Jun 5 to David, peg Seems like important content in the article I sent ( www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database ) is forgotten by some in this thread. "This article outlines an approach for using GUID values as primary keys/clustered indexes that avoids most of the normal disadvantages, adapting the COMB model for sequential GUIDs". So it is not about the normal guid with a clustered index. On Wed, Jun 4, 2014 at 5:44 PM, David wrote: On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager Reply Reply to all Forward Tom Bascom Jun 5 to me, peg We didn't forget that. It has been mentioned several times. I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids. -- Tom Bascom On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote: Seems like important content in the article I sent ( www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database ) is forgotten by some in this thread. "This article outlines an approach for using GUID values as primary keys/clustered indexes that avoids most of the normal disadvantages, adapting the COMB model for sequential GUIDs". So it is not about the normal guid with a clustered index. On Wed, Jun 4, 2014 at 5:44 PM, David wrote: On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager Reply Reply to all Forward Stefan Houtzager Jun 5 to Tom, peg Ah, dummy me. David wrote "The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts." indeed. On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom wrote: We didn't forget that. It has been mentioned several times. I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids. -- Tom Bascom On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote: Seems like important content in the article I sent ( www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database ) is forgotten by some in this thread. "This article outlines an approach for using GUID values as primary keys/clustered indexes that avoids most of the normal disadvantages, adapting the COMB model for sequential GUIDs". So it is not about the normal guid with a clustered index. On Wed, Jun 4, 2014 at 5:44 PM, David wrote: On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager Reply Reply to all Forward Marian Edu Jun 5 to me, peg have found this video to be quite helpful on explaining the differences - www.youtube.com/watch main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :) didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases. and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one On 06/05/2014 02:09 PM, Stefan Houtzager wrote: Ah, dummy me. David wrote "The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts." indeed. On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom wrote: We didn't forget that. It has been mentioned several times. I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids. -- Tom Bascom On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote: Seems like important content in the article I sent ( www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database ) is forgotten by some in this thread. "This article outlines an approach for using GUID values as primary keys/clustered indexes that avoids most of the normal disadvantages, adapting the COMB model for sequential GUIDs". So it is not about the normal guid with a clustered index. On Wed, Jun 4, 2014 at 5:44 PM, David wrote: On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager -- m.edu keep it simple http://www.ganimede.ro ro.linkedin.com/.../marianedu Reply Reply to all Forward Stefan Houtzager Jun 5 to Marian, peg the performance impact of using guid (string) as pk is more on the read end than on insert/create one And that's where the 'comb guid' with the clustered index comes in (see the article). On Thu, Jun 5, 2014 at 2:38 PM, Marian Edu wrote: have found this video to be quite helpful on explaining the differences - www.youtube.com/watch main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :) didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases. and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one On 06/05/2014 02:09 PM, Stefan Houtzager wrote: Ah, dummy me. David wrote "The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts." indeed. On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom wrote: We didn't forget that. It has been mentioned several times. I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids. -- Tom Bascom On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote: Seems like important content in the article I sent ( www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database ) is forgotten by some in this thread. "This article outlines an approach for using GUID values as primary keys/clustered indexes that avoids most of the normal disadvantages, adapting the COMB model for sequential GUIDs". So it is not about the normal guid with a clustered index. On Wed, Jun 4, 2014 at 5:44 PM, David wrote: On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager -- m.edu keep it simple http://www.ganimede.ro ro.linkedin.com/.../marianedu -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager Reply Reply to all Forward Tom Bascom Jun 5 to peg Interesting. I don't think I followed everything. I especially don't understand the point of apparently duplicating the data in both the index and in the table. I'm going to need to go back and watch it again. On 6/5/14, 8:38 AM, Marian Edu wrote: have found this video to be quite helpful on explaining the differences - www.youtube.com/watch main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :) didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases. and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one On 06/05/2014 02:09 PM, Stefan Houtzager wrote: Ah, dummy me. David wrote "The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts." indeed. On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom wrote: We didn't forget that. It has been mentioned several times. I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids. -- Tom Bascom On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote: Seems like important content in the article I sent ( www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database ) is forgotten by some in this thread. "This article outlines an approach for using GUID values as primary keys/clustered indexes that avoids most of the normal disadvantages, adapting the COMB model for sequential GUIDs". So it is not about the normal guid with a clustered index. On Wed, Jun 4, 2014 at 5:44 PM, David wrote: On 6/4/2014 10:20 AM, Tom Bascom wrote: On 6/4/14, 11:13 AM, David wrote: On 6/4/2014 10:07 AM, Tom Bascom wrote: On 6/4/14, 10:59 AM, David wrote: Logically, the rows are ordered, but not necessarily physically. That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL. Sorry, you lost me. I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion. Can you elaborate? I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be. Like a "jump to this block". That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on. I guess that if it is a rare event that it might not be too horrible? You have some control over how rare. If you have an ascending numeric key, you can pack rows pretty tightly. If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index. This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives. The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts. -- Kind regards, Stefan Houtzager Houtzager ICT consultancy & development www.linkedin.com/in/stefanhoutzager -- Tom Bascom Reply Reply to all Forward Gus Bjorklundmaybe. there are tradeoffs, always these d**n tradeoffs. when multiple users ... Jun 5 maybe. there are tradeoffs, always these d**n tradeoffs. when multiple users ... Gus BjorklundJun 5 Loading... Gus Bjorklund Jun 5 to peg On 6/5/14 8:38 AM, "Marian Edu" wrote: and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one maybe. there are tradeoffs, always these d**n tradeoffs. when multiple users are making inserts to the same index, if the key values are monotonically increasing then all the inserts can end up in the same leaf block of a b-tree. then there will be contention and for locks on the index block. the transactions serialize and you lose concurrency. regards, gus "Culture eats strategy for breakfast." -- Peter Drucker Reply Reply to all Forward Marian Edutrue, but my point was in normal circumstances one will read that data more t... Jun 5 true, but my point was in normal circumstances one will read that data more t... Marian EduJun 5 Loading... Marian Edu Jun 5 to peg On 06/05/2014 05:29 PM, Gus Bjorklund wrote: On 6/5/14 8:38 AM, "Marian Edu" wrote: and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one maybe. there are tradeoffs, always these d**n tradeoffs. when multiple users are making inserts to the same index, if the key values are monotonically increasing then all the inserts can end up in the same leaf block of a b-tree. then there will be contention and for locks on the index block. the transactions serialize and you lose concurrency. true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event' and that assertion holds true for all indexes, back to those darn trade-offs I guess... how often one insert data in a table, how often data will be read and how will help a designer/dba to put those on balance and try to find the lesser of two evils :) -- m.edu keep it simple http://www.ganimede.ro ro.linkedin.com/.../marianedu mobile: +40 740 036 212 skype: marian.edu Reply Reply to all Forward Tom BascomSome data is write-only. (Most implementations of auditing spring to mind...) Jun 5 Some data is write-only. (Most implementations of auditing spring to mind...) Tom Bascom Jun 5 to peg true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event' Some data is write-only. (Most implementations of auditing spring to mind...) -- Tom Bascom Reply Reply to all Forward Marian Edu Jun 6 to peg On 06/05/2014 09:33 PM, Tom Bascom wrote: true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event' Some data is write-only. (Most implementations of auditing spring to mind...) yeah, but in those cases one could argue there is no need for a primary-key... the most probably index on date (time-stamp) will lead to theracing condition Gus was mentioned on the b-tree leaf anyway since time flow isn't too random unfortunately, always increasing :) but that makes me think... a good index must have a good distribution, does this mean a sequence should not be used as a primary key? An always incrementing number (like order number mentioned before) will lead to a skew index and not so great data distribution... well, luckily it's Friday :) as for clustered index, wouldn't aordinary b-tree index on all table's fields will be the exact equivalent(not considering limitations)... this will of course duplicate the whole table data (table structure and b-tree index) feel like those kind of thoughts would make a great camp fire subject at pug's/exchange events if kept at a level where developers can follow instead of flag it as `dba only` :) - what makes a good index - when enough is enough(don't over index) - composed key vs. single key index (when, why,will a composed key be enough for a query if all fields required are there or will fetch the whole record from table anyway)

Continue reading...
 
Status
Not open for further replies.
Top