2024-04-09

Cloning and Replacing a HDD with a SSD

One summer, I got into a groove of replacing spinning-platter hard drives with solid-state drives in (almost) all the computers in my house. Doing so brought new life into a few aging laptops (one, my wife's, literally went from booting up from a cold start in several minutes to less than one), and the home server (that spends most of its life as a glorified NAS, but also runs a couple virtual machines) showed a marked increase in performace.

I tried a couple different "packaged" software tools, but they either cost money, failed to work on drives not from a specific manufacturer, had very specific requirements (like the old and new drives had to be precisely the same size), were unable to handle any disk errors, or just plain didn't work. After a lot of trial and error, I found a way to do it, for free, in a consistent, repeatable way. I got pretty good with the process, but since it's been a long time, I worry that I might not remember exactly how to do it when the need arises. So, here's my process, as much for my own memory as for helping anyone else out.

The software I used is a "live" installation of GNOME Partition Editor. I've found that a USB stick is the most convenient these days, since optical drives are getting pretty rare. But anything that can be booted from should work. GParted's site has instructions for four different ways to make a bootable environment.

The only piece of hardware (other than bootable media) is a USB hard drive enclosure. For laptops, it's almost certainly a necessity, because you need to have both the old hard drive and the new SSD connected at the same time, and not many laptops have connections for multiple drives. Desktops, you might be able to get away with plugging the SSD directly into the system, if there are enough available connectors.

With both drives connected to the system, and the GParted boot media connected, boot up the computer into the GParted environment. (Getting the system to boot from the USB drive or other bootable media might require holding down a key on startup — personally, I start mashing the entire top row of keys on the keyboard, from Esc across all the function keys, to try to trigger a boot menu. If you get into Windows, holding the shift key when you click Restart from the Start menu should get you into a boot menu with the option to boot from something else.)


GParted's main window, from gparted.org

Once the GParted environment loads up, the most critical part is to identify which hard drive is which device name. Ideally, you should be able to see the hard drive with its existing partitions pretty clearly and distinctly from the new drive with no partitions (or just one empty partition). Use the dropdown button in the upper right to switch disks, and double- and triple-check that you know which drive is the "source" and which is the "target". For the purposes of this walkthrough, I'm considering the original hard drive "sda" and its shiny new replacement "sdb".

The next step is to copy the partition table, including the identifiers, from one disk to the other. While this can be done in one step, I like to keep it in two, because the order of parameters is backwards compared to what I'd expect, and I always feel like I'm wrong if I try to combine it.

Open up an XTerminal window by right-clicking the desktop, and run these two commands:

  • sudo sgdisk --backup=table /dev/sda
  • sudo sgdisk --load-backup=table /dev/sdb

I will say that I've seen some guides suggest that you should run another command to randomize the partition GUIDs, so that both disks don't have the same IDs. In my experience, though, if you do this, and then swap out the drive and try to run the new SSD in place of the old HDD, the OS may not recognize that the drive is the "same", and it'll be unable to find installed programs or files, even if they're still on the same drive letter or mount point. So, don't do that at this step.

With the partition tables copied, go back to GParted's main window, and refresh the drives. You should now see the same partition setup on sdb as you see on sda. Go to the first partition on sda, click Copy, switch over to the first partition on sdb, and click Paste. Repeat for all partitions. (I have come across partitions that are "unrecognized" that I can't copy/paste. Fortunately for me, I haven't needed whatever was on them, so I just keep my head in the sand and hope for the best. It's worked so far, knock on wood.) Once all the copy/paste commands are queued up, click the Apply button, and settle in for a long wait.

After everything's copied, now comes the fun part. Shut the system down, physically uninstall the old hard drive, install the new one in its place, and boot up the system. Assuming everything went according to plan, the system should boot up as if nothing had changed.

Once you've verified that the system accepts the new drive as its own, only then should you attempt to tinker. For instance, if the new drive is larger, you may want to move or resize partitions around. GParted is a good tool for doing that, as well. Make whatever changes you feel like, and then boot into the normal operating system to verify everything still works. Worst case, you can always go back to the original drive and re-clone.

Once you're happy with the new drive, you can decommission the old drive. I would not recommend plugging it into your running OS, since the partition GUIDs are duplicated and it might get confused. GParted doesn't seem to mind this; or you can use another running computer to plug the old drive into to reformat it. Removing and recreating partitions should get rid of the duplicate GUIDs, but if you want to be sure, you can boot into GParted again and run this command to give the partitions new GUIDs. (Make sure you are doing this on the old drive that you're not using anymore.) Note that sdx here is whatever device designation GParted gave the old disk. sudo sgdisk -G /dev/sdx

And that's it.

2020-03-14

Can I get a twist of lime?

So, this little bug has been going around, and enough people believe it's serious enough that it's in the best interests of the people of the world that we take every precaution not to let it spread.

It's not the first time something like this has come up. Ebola, Zika, Swine flu; they were all dangerous, even deadly. But this one, the coronavirus COVID-19, has provoked a response stronger than I've ever personally seen. Probably because this time, I can really see it. I mean, travel restrictions are nothing new, but I've never traveled anywhere I'd need a passport. Hearing about it on the news is pretty typical, too. But with churches, schools, and sporting venues closing down, and companies encouraging and being encouraged to work from home if at all possible, it just feels like this is so much bigger. And instead of my life continuing on mostly normally with perhaps a minor inconvenience or two, things are now very different, with no one really knowing when things might return to normal. 

What's weird is, the things that make me feel the most uneasy are the smallest things. Rush hour streets with barely any traffic? Eh, it's been known to happen from time to time. Empty grocery store shelves? Saw that all the time working in a North Carolina grocery store whenever a few white flakes fell from the sky. No, oddly enough, it's all the hygiene posters on just about every billboard in the office. 

I think part of it is how important it seems for being out of place. You might expect to see reminders of thorough hand washing in the bathrooms or above the kitchen sinks, but as the only things posted on the main notice board in the hallway next to the Employee of the Month awards? It's so odd that it can't help but call attention to itself. 

And, I'll admit, part of that comes from certain video games. When the game artists put together the world, they have to consciously decorate with posters and signs that one might expect to see; and in a world that is ravaged by a deadly virus, the signage they tend to come up with are focused on that, reminding people to practice good hygiene and get proper medical attention long after most of the world's inhabitants have fallen (leaving you mostly alone to face off against hordes of bad guys).

These simple reminders to wash your hands have become a sort of visual symbol of a world that tried to fight against a sickness, and eventually lost. 

I don't know if this virus will get out of control and leave us in the devastated society portrayed in so many works of fiction. But for now, all I think I can do is hold on to hope for the best, while trying not to panic and prepare for the worst. 

2019-11-26

What's wrong, Delta? Afraid of a little snow?

Last week, I got the news that my grandfather had died. The news was very bittersweet. I had talked to him a few weeks before, and I could tell he was just so tired of life. He had always been a very active person, but the last few years, his body just failed to keep up. Especially after my grandma died a little over a couple years ago, there just wasn't much for him to do or anyone to do it with. (Not that he was totally alone or that no one visited — he had 24/7 care, and family in the area that would check up on him often.) So, while I'm very sad to have him go, I'm happy he can be reunited with Grandma and other friends that have gone before him.

I'm also very fortunate that I was able to make it out there — and even more fortunate that I was able to make it home.

I have a cousin in the area who was very nice to allow me to come in and give me a place to stay. In talking with him as I made my travel plans, the most convenient flight out from Denver to San Francisco and back was with Delta. Despite having a layover in Salt Lake both ways, the times worked out decently well for the price.

The flight out was pretty uneventful, which in the context of travelling is a good thing. The flight back, however, is where things got interesting. My flight out was scheduled late in the day of the funeral, and we planned to go from the funeral to the airport for me to catch my flight.

Early in the morning, as we were getting ready to leave my cousin's house for the funeral, I got a text message from Delta indicating that there was a change to one of my flights. I checked using the Delta app (once again entering my confirmation number, since the app forgot it — by this time, I had it memorized, which would become very important). The flight from SFO to SLC was unchanged, but the connection from SLC to DEN had been delayed from that day to 6am the next morning, giving me an overnight layover in Salt Lake. Now, the convenient thing about the Delta app is, when you have a delay like this, it gives you a list of alternatives that you can re-book for no extra cost. Scrolling through the list, though, only showed me a list of flights that started the next day. I didn't want to impose on my cousin for another day (plus have to take another day off of work), so I kept scrolling until I finally saw an option that let me leave the same day. It started with the same flight from SFO to SLC, but the next flight was from SLC to Los Angeles, with about a 5-hour layover there until a 2:30am flight back to Denver. It wasn't great, but at least it got me home.

I clicked the button to book the flight. Then, I looked over my itinerary. SFO to SLC, SLC to LAX, and then LAX to DEN, departing at... 7pm the next day? My 5-hour layover in LAX had somehow turned into a 24-hour layover. And the option to reschedule had disappeared from the app.

With the app failing me, I decided to call the customer service number and talk to a human. They put me into a queue with a 20-minute wait, and the recording gave me the option to have them call me back instead of waiting on hold the whole time. I selected that, and we set off on the road for the funeral.

At this point, it's important to know that I have T-Mobile cell service. It's also important to know that, in the area where my cousin lives (and, as it happens, over most of the route between his place and where the funeral was scheduled), T-Mobile's coverage can charitably be described as "not great". Sure enough, about 25 minutes later, my phone alerted me to a voice mail message, which I played back and heard the automated attendant trying to connect me to an agent. They called back as promised, but I had no signal at the time.

So I called again, this time remaining on hold for the now-25-minute wait time, listening to really bad electronic hold music broken only by a recorded message telling me how I could make changes to my reservation using the Delta app. (Yeah, it was the app that got me into this mess in the first place, thanks.) After about 25 minutes, as promised, an agent picked up. I was able to give him my name and confirmation number, and I heard him comment on how strange my reservation looked now. "Let me see what I can do," I heard — right before we entered another T-Mobile dead zone and the call dropped.

This time, I borrowed my cousin's phone (connected to AT&T service, which had better coverage in that area). After another 25-minute wait (which included several more recorded messages telling me how great the Delta app was), I was able to talk to an agent, who noted that Denver was due for some bad weather, causing some flight complications. To his credit, he was able to find me a flight. It was on a New Zealand flight operated by United (shrug), with direct service from SFO to DEN. It was scheduled to leave San Francisco a little later (giving us more time to get there from the funeral), but arriving home at nearly the same time. Sounded like a win-win. After we got off the phone, I checked the Delta app, and it showed my new flight (though, since it was a United flight, without the ability to check in or make a seat selection or any of the great features provided by the "great" Delta app).

After the funeral, as we're heading to the airport, I decide once more to check the app. It still showed me booked on the New Zealand/United flight, with no indication that anything was amiss. For some reason, I decided to do a quick Bing search (hey, I get Rewards points for using Bing, don't judge) on the flight, and it displayed the status: Cancelled. My cousin asked if I wanted to just return to his house, where I could start making calls, but at that point, I figured we could just continue to the airport, and I could talk to the agents there in person to figure things out.

Once there, I went to the Delta desk. I kind of expected the answer that they wouldn't be able to help me after passing me over to United, and in that, I wasn't disappointed. Of course the Delta and United desks were on opposite ends of the terminal, so I had a bit of a walk over to United.

I got a little worried at the United desk, when they couldn't find my reservation in their system, on any flight. Delta's reservation number wasn't helpful, but the ticket number was — once I found it buried under several clicks in that "really great" Delta app. (There was an additional snag in that the computer wouldn't let the agent do anything with my ticket, but he got that straightened out talking to his tech support — something about the ticket using my full middle name where he tried searching by just my initial.) After a few minutes, though, he was able to put me on a United flight straight from SFO to DEN. He handed me the boarding pass, which showed a departure time of about 1:30pm, about three hours prior. But, not to worry; that was just this flight's original time; it had been delayed until 5:30, which gave me a fair amount of time to get through security and to the flight.

It turns out, this flight's plane had some mechanical issues, so they had replaced it with another plane from Chicago that unfortunately wasn't going to get into San Francisco until after 4. Well, unfortunate for the people who were originally booked on that flight, but fortunate for me, I suppose. I kind of felt bad as I sat in the waiting area, listening to people complain about being there for four hours, while I show up less than an hour prior.

The plane from Chicago ended up being a little delayed; and, after we boarded, they had an issue with the door seal that required a maintenance crew to give it a once-over and fill out some paperwork, further delaying take-off. Then, while I didn't have a window seat and couldn't confirm this, it sure felt like we taxied around to every runway before finally taking off at a little after 6:30pm.

But finally, 2 hours later, despite a snowstorm that was just kicking into high gear, we touched down in Denver, and I was as good as home — at just about the same time I was originally scheduled on my first Delta flight from Salt Lake. As if nothing happened.

And good thing, too; the storm dropped snow all through the night, with nearly 500 flights cancelled at Denver the next day. If I hadn't made it home that night, it could've easily taken me a couple days to get home, right in the middle of the Thanksgiving travel rush.

2019-10-04

Entity Framework 6, SQL, and nullable strings

I ran into an issue that appears to be caused by Microsoft attempting to protect me from myself. Although, truth be told, it wouldn't have been an issue if things were a little better designed.

Imagine, if you will, a SQL Server database with a table of transactions. One of the fields on this table is a CorrelationId. It's a text field that is populated by a different system to tie transactions together (for example, two sides of a transfer from one customer to another). This field always gets populated on new transactions; the uncorrelated ones will just be the only one with a given CorrelationId. However, this system is not new; it was converted to replace an older system that did not have a defined CorrelationId. So, although the five million or so transactions created by this system have a CorrelationId, there are 12 million "legacy" records that have a CorrelationId of NULL.

So, say, for a given transaction, you want to find all correlated transactions. In SQL Server, you might use a simple query like this:

SELECT *
FROM dbo.TransactionTable
WHERE CorrelationId =
(SELECT CorrelationId FROM dbo.TransactionTable WHERE Id = @TransactionId)

And this would work, for the most part (except for legacy records, since SQL will fail to match on the NULL value — but we can ignore this for now). If you took this query into SQL Management Studio and looked at the execution plan, you'd see a nice thin line from the index seek on the CorrelationId, showing that it found and processed a tiny number of matching records, resulting in a very quick response.

However, if you were trying to do this programmatically from a C# application using Entity Framework 6, you might write some code like:

var query = from txn in transactionTable.Entities
where txn.Id == transactionId
join txn2 in transactionTable.Entities on txn.CorrelationId equals txn2.CorrelationId
select txn2;

The problem is, in C# code, null values are equal to another; while in SQL, "null" is considered "unknown", and doesn't equal itself. (The theory is, you can't know if one "null", or unknown value, equals another "null"; so equality tests between "null" and "null" are false.) Instead of leaving it up to the programmer to explicitly code for this condition, Entity Framework "helpfully" writes the join clause that it gives to SQL server in this manner:

JOIN [dbo].[TransactionTable] AS [Extent2] ON (
([Extent1].[CorrelationId] = [Extent2].[CorrelationId])
OR
(([Extent1].[CorrelationId] IS NULL) AND ([Extent2].[CorrelationId] IS NULL))
)

The extra check for IS NULL on both sides has two unfortunate side effects in this case:

  1. If the transaction is one of the legacy records, it will return a positive match on all 12 million other legacy records with a null CorrelationId.
  2. If the transaction has a CorrelationId, because of the IS NULL, SQL Server will investigate the 12 million null values in the CorrelationId index, resulting in a big fat line from the index seek in the execution plan, and a return time of a couple seconds or more.

The really annoying part is that there doesn't appear to be a way to stop this. Even if you explicitly add a check for a not-equal-to-null on your target table, Entity Framework still wraps the equality test with checks for IS NULL. The result is almost comical. For instance, adding txn2.CorrelationId != null either in the join statement or as a where clause, results in this (with contradictory statements highlighted):

[Extent2].[CorrelationId] IS NOT NULL
AND (
([Extent1].[CorrelationId] = [Extent2].[CorrelationId])
OR
(([Extent1].[CorrelationId] IS NULL) AND ([Extent2].[CorrelationId] IS NULL))
)

Even trying to break up the work into two statements didn't help. This code:

var corrId = from txn in transactionTable.Entities where txn.Id == transactionId select txn.CorrelationId;
var txns = from txn in transactionTable.Entities where txn.CorrelationId == corrId select txn;

Resulted in this SQL:

WHERE ([Extent1].[CorrelationId] = @p__linq__0)
OR
(([Extent1].[CorrelationId] IS NULL) AND (@p__linq__0 IS NULL))

Granted, this is a really bad situation to be in to begin with. Indexes on text fields tend to perform poorly, and having such a huge number of null values in the index is likewise unhelpful. A better design would be to rip the text field off into another table, or somehow otherwise convert it into an integer that would be easier to index (something we've had to do in other tables on this very same project, where we've had more control of the data).

I'm willing to bet that Microsoft's translation goes completely unnoticed in over 99% of the cases where it occurs. And, if I had the time to make a design change (with all of the necessary changes to all points that hit this table, some of which I don't have direct control over), it could have been resolved without fighting Entity Framework. Even just populating all of the legacy transactions' CorrelationId with random, unique garbage would've solved the problem (though with a lot of wasted storage space that would've made the infrastructure team cry).

In the end, it was solved by creating stored procedures in the datbase to do correlated transaction lookups (where the behavior could be controlled and expected), and having C# code exectue those directly (bypassing EF6) to get transaction IDs. Standard Linq queries would then use those IDs, instead of trying to search the CorrelationId.

This whole exercise was prompted by a script that I had to run to get a bunch of data from a decent number of transactions. It took nearly eleven hours to complete, finishing close to 1am after I started it. If I had time to go through this debugging and implement the fix, it turns out I could've gotten it done in about a third of the time.

2018-05-12

The Church and the BSA

The news broke that the Church of Jesus Christ of Latter-day Saints has officially announced that it will no longer sponsor Boy Scouts of America troops. As a member of that church who currently serves in a calling as a Cub Scout den leader, and as a father of four boys — one of which just earned his Eagle Scout, and one who likely will within the next year — and being married to someone who currently volunteers as a Roundtable Commissioner for the BSA district, I have some thoughts.

First off, this has been a long time coming. The Church has members all across the world, and the Boy Scouts is an American institution. While Church leaders have often noted that the BSA is, essentially, the Young Men's organization within the Church, this has been completely unavailable outside of this country. We've heard rumors for a long time of the Church looking to institute something that would be available to members everywhere.

There are certain benefits to dumping the BSA from the Young Men's program. I've heard from numerous sources how the youth budget is often seen as unfair, with the Young Men getting far more allocation than the Young Women. However, the main reason for this is that the Church pays much of the registration fees, awards, and so forth required by the BSA — which is something that the Young Women do not have to deal with. The BSA also has many rules, regulations, and so forth that can be difficult for a ward full of unpaid volunteers to navigate. (One of the responsibilities my wife has is to help new scout troops and packs get through some of this — and the ones requiring help are not limited to LDS Church-sponsored troops.)

The move could also allow more freedom in Young Men's groups in their activities. Without having to follow strict rules and a proscribed list of activities and merit badges that boys are supposed to earn, it could be more easily tailored to the needs and desires of the people in the program (kids and leaders alike).

On the other hand, there are some things that would be a great loss. The structure and required activities are sometimes a benefit. When I asked my wife about her experience growing up in Activity Days (the program for girls the age of Cub Scout boys), she expressed her disappointment that it was little more than a cooking class. Her leaders were skilled in the kitchen and enjoyed doing that, so, without any real incentive to do anything else, that's what they did, week after week. Not being a very outdoorsy person myself, if I didn't have the Cub Scout requirements to drive me, I would likewise have very little incentive or direction to do some of the things that the boys in my care would enjoy (or should learn).

With the lack of BSA sponsorship, access to some of the campgrounds and resources that the Church currently enjoys will be lost. So although there will be more money to go around, some activities will cost more.

The Boy Scouts of America is a national institution, recognized by people regardless of their religious affiliation. Anecdotally, having an Eagle Scout rank is something that employers see as an asset in potential employees. Certainly, a Church program would be less likely to carry the same weight, even if its requirements were at all similar to those of the Eagle Scout.

There is very little commonly known about the new youth program-to-be. The official separation — and, I presume, the implementation of the new program — is scheduled for the beginning of 2020. Time will tell how many of the benefits and losses will be offset by the new benefits (and drawbacks) of the new program. My belief in an inspired leadership gives me great hope that many of these things will be addressed, and that the new program (which, incidentally, is to replace the current programs for boys and girls) will be successful.

One common refrain in the news is that the entire LDS membership of the Boy Scouts is going to go away. While that's most certainly an exaggeration, there is some truth to that. Without the encouragement inside of the Church to join Scouting, it's very likely that fewer boys will feel inclined to join. Also, something I've heard from many parents and have experienced myself, is that there seems to be very little time to fit in all of the extracurricular activities that all of your kids are involved in. With the Church bringing in its own program, boys that want to do Scouts will now have two activities to balance (Scouting and whatever the new program is) — and there's no guarantee that meeting times won't conflict. Also, although religion has been an integral part of the Boy Scouts (belief in God is a requirement, even if the manner of worship is completely open), I have heard various reports of non-LDS-sponsored packs and troops using Sunday as an activity day, which is something that most members of the Church try to avoid. Additionally, an LDS-sponsored pack and troop typically takes care of paying the dues for its members. Joining another troop will mean paying those membership costs out of the families' own pockets. (I have not had much personal experience in this, so I do not know how willing and how much other troops can assist with dues and fees for families that may need the help.) So even though the option to join Scouting is not out of the question, it will be more difficult for LDS youth to make that commitment. The BSA membership is probably going to take a significant hit. While it might not be 100% of LDS youth, it wouldn't surprise me to see it be very high.

The timing of the announcements has certainly been interesting. Several publications have noted that the announcement of the LDS/BSA split came within a week of the BSA announcing that they would change their program name from "Boy Scouts" to "Scouts" (as part of their move to include girls in the program), leading many to speculate that the name change was part of the cause. I won't deny that it certainly looks that way, but I suspect the reality is much more complicated.

The announcement of the split consisted of simultaneous press releases from both the Church and the BSA. While it may not be outside of the realm of possibility, I have a hard time believing that a coordinated PR move happens that quickly, which leads me to believe that both sides knew about this for a long time. Now, it's certainly possible also that the Church knew of the BSA's plans ahead of time as well, so there might yet be some merit to the idea that the Church made plans to leave because of the direction the BSA was moving. It's also possible that the BSA knew the Church was planning on striking out on their own with their own youth program, and the decision to include girls was made to try to mitigate the potential loss of LDS boys from their ranks. In other words, the cause-and-effect might be reversed.

Still, from the perspective of public perception (or at least the narrative that some press are driving), it does look like the Church is abandoning the Scouts because of the changes in the Scouts. While it may sound conspiratorial, I have to wonder if it wasn't timed to make the Church look bad, so everyone can point and laugh at the church, "Look at how those backwards Mormons run from simply having to include girls in their program!" On the other hand, if the timing were reversed, it may appear that the BSA would be the ones to appear reactionary, and the story would be, "Look at how the BSA is so desperate for members now, that they're changing their name to include girls!" The timing from the BSA could have been more defensive than offensive.

All told, I'm kind of looking forward to the change. While Scouting has had some great benefits for my boys and the other boys in the Church, I won't deny the administrative side has been a stress to deal with; and I'm hoping the new program will bring some positive change. I don't know if my own younger boys will continue in Scouting (my oldest is already an Eagle and moving into his adult life; my second oldest will be aging-out at nearly the same time the split becomes official), but it will be a choice we will prayerfully consider in the months to come.

2017-06-24

Forget this "Free Healthcare" business!

I had one of those shower moments where I started replaying conversations and debates I've ever had or witnessed. For some reason, my mind had settled on the idea of universal healthcare (not something I've argued much on either side, but definitely witnessed a lot). Proponents often describe this as "free healthcare", which leads to opponents arguing that "it's not free" since it's paid for by taxes. I've even seen one argument that you'd have a hard time convincing a doctor to use his skill and many years of medical school learning and training for no cost.

"Yeah," said the voice in my head, "just like police and firemen should expect to be paid for their service."

And that's when it occurred to me. The proposal shouldn't be "free healthcare"; it should be "make healthcare a public service". Because that's really the truth. No one's really suggesting that anything be "free". They're suggesting that the costs be covered by society as a whole (i.e., government, paid through taxes), rather than by the individual using the service at that point in time.

While I can understand the appeal of calling it "free", I think proponents do the discussion a great disservice by using that word. It implies, at best, a fundamental misunderstanding of economics, and, at worst, a lie covering it up (since both sides know that health care costs actual money, that it's not really "free" at all).

Do I think a mere change in word choice will clear up the whole discussion? Absolutely not. There are still plenty of points to argue — quality of care, the ability of government to manage, and the actual cost for the public, just to name a few — I do think it would at least let us get past the part where we argue about "free" being "free" or "not free".

2017-06-19

Pass a recordset to C# by way of XML

UPDATE: I have to discourage using this trick. For reasons I do not yet know, it doesn't seem to work with a large dataset. I do not know the exact point at which it fails, I just know that it does. I noticed that a significant number of values that should have been updated with text, actually got updated with nulls. As much as I would love to investigate this and try to see what is wrong and whether it's a failure in C#, SQL, or some combination, unfortunately, it's more important that my work actually get done; so I've had to abandon the XML route entirely.

Original post follows.


It's been a while since I've posted, well, anything. But I learned of a neat trick that I thought I'd post.

I'm currently working on a program that is converting data from two different sources into a single database. A lot of it is just done with carefully crafted SQL statements, but there are a few steps where I have to take data from one source and use some C# code to do some kind of processing before storing it in the target database. Since the data set is on the order of millions of rows, processing these records one at a time can be prohibitively time-consuming. And, since I have limited access to the SQL Server itself, using SQL CLR isn't a great option. (I probably could get the access if I needed to, but it will be an additional step to have to remember and configure when this goes to production, and the fewer moving parts I create for myself, the better.)

One of the tricks I've implemented is to use multi-threading to let the different steps run simultaneously — one thread extracts the records and puts them into a ConcurrentQueue<>, another thread processes that and puts the results into another queue, and a third thread updates the records in the database.

I've been trying to come up with ways to do the update in batches. There are ways to create a stored procedure that will take a table parameter, and ways to call that stored procedure by binding the parameter to an equivalent DataSet, but I didn't like the idea of creating a DataSet object just to pass the records in. It just seemed too "heavy" to me. (Though it might've been faster than calling a command object in a loop for records one-by-one.)

Another option was to create a VALUES table and build the command text dynamically. But, since I was working with strings, I didn't like the idea of building dynamic SQL and having to escape quotes or any other special characters that might cause SQL to choke. (Not to mention it's just bad practice, even if my code is unlikely to be used as a SQL injection vector.)

So, I came up with the idea of passing in values as an XML document. By building the XML with Linq-to-XML C# code, all necessary character escapes would be performed automatically. I could pass in as many values at once as I felt comfortable with, and let SQL do the work in a batch instead of one at a time.

To give some context to this code, I am taking email addresses that were encrypted in the source database, and converting them to their decrypted values in the target database. At this point, my queue consists of objects that have two properties: EncryptedEmail and DecryptedEmail. Earlier in my conversion work, I've simply copied the encrypted strings over into the Email field of the table, so all this method has to do is update the table and changing the Email field to its decrypted value.

var recordsToUpdate = GetRecordsToUpdateBatch(250); //Retrieves up to 250 records off of the queue at a time
if (recordsToUpdate.Any()) {

var xdoc = new XDocument(
new XElement("emails",
recordsToUpdate.Select(r => new XElement("email", new XAttribute("encrypted", r.EncryptedEmail), new XAttribute("decrypted", r.DecryptedEmail)))
)
);

using (var connection = new SqlConnection(GlobalSettings.DatabaseConnectionString)) {
await connection.OpenAsync();
using (var command = connection.CreateCommand()) {
command.CommandText = @"
WITH emails AS (
--Convert the XML document into a table that SQL can use normally SELECT Tbl.email.value('@encrypted','varchar(256)') AS Encrypted, Tbl.email.value('@decrypted','varchar(256)') AS Decrypted
FROM @emails.nodes('/emails/email') AS Tbl(email)
) UPDATE cust SET cust.Email = emails.Decrypted
FROM dbo.Customer cust
INNER JOIN emails ON cust.Email = emails.Encrypted;
"
;
command.CommandType = System.Data.CommandType.Text;
var param = command.CreateParameter();
param.ParameterName = "@emails";
param.SqlDbType = System.Data.SqlDbType.Xml;
param.Value = new SqlXml(xdoc.CreateReader());
command.Parameters.Add(param);

await command.ExecuteNonQueryAsync();
}
}

await Task.Run(() => Thread.Sleep(1));
}