sql server - Where is the official documentation for T-SQL's "ORDER BY RAND()" and "ORDER BY NEWID()"? -
i'm looking official t-sql documentation "order rand()" , "order newid()". there numerous articles describing them, must documented somewhere.
i'm looking link official sql server documentation page this: http://technet.microsoft.com/en-us/library/ms188385.aspx
clarification:
what i'm looking documentation "order_by_expression" explains difference in behavior between nonnegative integer constant, function returns nonnegative integer, , function returns other value (like rand() or newid()).
answer:
appologize lack of clarity in original question. programming-related problems, solution problem figuring out question you're trying answer.
thank everyone.
answer in document: from: http://www.wiscorp.com/sql200n.zip
information technology — database languages — sql — part 2: foundation (sql/foundation) 22.2 <direct select statement: multiple rows> includes <cursor specification>.
at point have first half of answer:
select statment type of cursor, means operations can performed iteratively on each row. although haven't found statement in docs explicity says it, i'm content assume expression in order_by_expression executed each row.
makes sense happening when use rand() or newid() or ceiling(rand() + .5) / 2 opposed numeric constant or column name.
expression never treated column number. value generated each row used basis determining order of rows.
however, thoroughness, let's continue full definition of expression can be.
14.3 <cursor specification> includes order <sort specification list>. 10.10 <sort specification list> defines: <sort specification> ::= <sort key> [ <ordering specification> ] [ <null ordering> ] <sort key> ::= <value expression> <ordering specification> ::= asc | desc <null ordering> ::= nulls first | nulls last
which takes to:
6.25 <value expression>
where find second half of answer:
<value expression> ::= <common value expression> | <boolean value expression> | <row value expression> <common value expression> ::= <numeric value expression> | <string value expression> | <datetime value expression> | <interval value expression> | <user-defined type value expression> | <reference value expression> | <collection value expression> <user-defined type value expression> ::= <value expression primary> <reference value expression> ::= <value expression primary> <collection value expression> ::= <array value expression> | <multiset value expression>
from here descend numerous possibile types of expressions can used.
newid() returns uniqueidentifier.
seems reasonable assume uniqueidentifiers compared numerically, if expression newid() our <common value expression> <numeric value expression>.
similarly, rand() returns numeric value, , evaluated <numeric value expression>.
so, although wasn't able find in microsoft's offical documentation explains order when called using order_by_expression expression, documented, knew must be.
if you're trying determine why these behave differently, reason simple: 1 evaluated once, , treated runtime constant (rand()
), while other evaluated every single row (newid()
). observe simple example:
select top (5) rand(), newid() sys.objects;
results:
0.240705716465209 8d5d2b55-e5de-4ff9-ba84-bc82f37b8f3a 0.240705716465209 c4cbf1ca-e6d0-4076-b6a6-5048ea612048 0.240705716465209 9bfae5bb-b5b9-47de-b8f9-77aaefa5f9db 0.240705716465209 89ffd8a1-ac73-4ceb-a5c0-00a76d040382 0.240705716465209 bcc89923-735e-43b3-9eca-622a8c98ad7d
now, if apply order left column, sql server says, ok, every single value same, i'm ignore request , move on next order column. if there isn't one, sql server default returning rows in whatever order deems efficient.
if apply order right column, sql server has sort of values. introduces sort
(or topn sort
if top
used) operator plan, , going take more cpu (though overall duration may not substantially affected, depending on size of set , other factors).
let's compare plans these 2 queries:
select rand() sys.all_columns order rand();
the plan:
there no sort operator going on, , both of scans ordered = false
- means sql server has not decided explicitly implement ordering, does not mean order different on each execution - means order non-deterministic (unless add secondary order by
- in case, rand()
ordering still ignored because, well, it's same value on every row).
and newid()
:
select newid() sys.all_columns order newid();
the plan:
there new sort
operator there, means sql server must reorder rows returned in order of generated guid values on each row. scans of course still unordered, sort
applies order.
i don't know specific implementation detail officially documented anywhere, though did find this article includes explicit order newid()
. doubt you'll find official documents order rand()
in way, because doesn't make sense do, officially supported or not.
re: comment sql server assigns a seed value @ random
- should not interpreted a seed value **per row** @ random
. demonstration:
select max(r), min(r) ( select rand() sys.all_columns s1 cross join sys.all_columns s2 ) x(r);
results:
0.4866202638872 0.4866202638872
on machine, took 15 seconds run, , results same both min
, max
. keep increasing number of rows returned , amount of time takes, , guarantee continue see exact same value rand()
on every row. calculated once, , not because sql server wise fact not returning of rows. yielded same result (and took under 2 minutes populate entire table 72 million rows):
select rand() r #x sys.all_columns s1 cross join sys.all_columns s2 cross join sys.all_columns s3; select max(r), min(r) #x;
(in fact select
took long initial population. not try on single-core laptop 4gb of ram.)
the result:
0.302690214345828 0.302690214345828
Comments
Post a Comment