sql - Merge identical databases into one -


we have 15 databases of 75 tables avarage of million rows. same schema different data. have been given requirements client bring 15 1 database. each set of data filtered user’s login.

the changes application have been completed filtering. left task of merging databases one.

the issue conflicting pk , fk pk’s , fk’s of type int have 15 pk ids of 1.

one idea use. net , dbml insert records new records new database letting linq deal pk , fk , using code deal duplicate data.

what other ways there this?

it's never trivial job integrate databases when records don't have unique primary keys in databases. few weeks ago built similar integration script decided use entity framework.

first good news. ef's dbcontext api it's ridiculously easy insert complete object graph , make ef take care of newly generated primary keys foreign keys. reason why easy when object's state changed added all of adhering objects become added , ef figures out right order of inserts. great! made me build core of copy routine in few hours, have been many days if should have done in t-sql example. latter much more error prone too.

of course life isn't that easy. bad news:

  1. this takes tons of machine resources. of course used new context instance each copy step, still had execute program on machine decent processor , fair amount of internal memory. exact specifications don't matter, message is: test largest databases , see kind of beast need. if memory consumption can't managed machine @ disposal, have split routine in smaller chunks, take more programming.

  2. the object graph that's changed added must divergent. mean there should 1-n associations starting root. reason is, ef mark objects added. if somewhere in graph few branches refer same object (because there n-1 association), these "new" objects multiplied, because ef doesn't know identity. example of company -< customer -< order >- ordertype: when there 2 order types, inserting 1 root company 10 customers 10 orders each create 100 order type records in stead of 2.

    so hard part find paths class structure divergent as possible. won't possible. if so, you'll have add leaves of converging paths first. in example: first insert order types. when new company inserted first load existing order types context , add company. link new orders existing order types. can done if can match objects natural keys (in example: order type names), possible.

  3. you must take care not insert multiple copies of master data. suppose order types in previous example same in databases (although primary keys may differ!). order types source database should not reinserted in target database. moreover, must fix references in source data correct records in target database (again matching natural key).

so although wasn't trivial doable , job done in relatively short time. i'm sure other alternatives (t-sql, integration services, bids, if doable @ all) have taken more time or have been more buggy. , problem bugs in area may become apparent later.

i later found out issues describe under 2) related fetching source objects asnotracking. see interesting post: entity framework 6 - use gethashcode(). used asnotracking because performs better , reduces memory consumption.


Comments

Popular posts from this blog

html - Sizing a high-res image (~8MB) to display entirely in a small div (circular, diameter 100px) -

java - IntelliJ - No such instance method -

identifier - Is it possible for an html5 document to have two ids? -