CU2 and Maintenance Mode

Well, I had an interesting Saturday. Our SCCM setup is a CAS with two primary child sites. Our test site, however, is just a single primary site. We updated our test site to CU2 with no real incident (just a little goof, as seen in my previous post). So on Friday, we updated our CAS and primary child sites. Since Friday is date night for me, I gave the system a glance over and everything looked good, so off the wife and I went. Saturday came around, and I woke up, gathered up the wife and kids, and we went out for a day of fun in the sun. While we were wandering around in Cabela’s, I noticed on my phone that I’d missed a call and a text from my boss, as well as a voicemail from him. Come to find out, the sites were not replicating.

So, we rushed home, and I started looking into the site. I got in contact with the vendor we have doing our Windows 7 OSD ZTI deployment (one of those genius types), and he told me he had looked through the logs and seen that one of the sites was in maintenance mode. After doing some research and trying a few things, I realized I was in over my head. Since this project is pretty hot, we had to get this working ASAP, so I called Microsoft and opened a ticket with them. Soon, I was on the phone and learning some new stuff about Config Manager.

We started off by turning on some logging in SCCM by setting the registry value HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS\Tracing\SqlEnabled to 1. Then, on one of the child sites, we created an empty file named Configuration Data.pub in the inboxes\rcm.box directory. This forced SQL replication to try to replicate that publication. Watching the rcmctrl.log file on the sites, we waited for the error “The publisher reported 1 tables missing. The publisher will send messages when tables are available.” Once we saw this error, we ran a query on the SQL server:

This is where we saw that the PullDPResponse table was missing on our CAS site. So, we verified the table was not there, then went to the child site, right-clicked the table, and scripted it to a query window. We copied the script and ran it on our CAS SQL server. After that, we again created an empty file named Configuration Data.pub in the inboxes\rcm.box directory. After about 10 minutes, things started working and our sites were replicating!
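For anyone who ends up repeating the trigger-file and log-watching steps above more than once, here is a rough Python sketch of how they could be scripted. This is not what Microsoft support had us do (we did it all by hand), and the function names are my own invention for illustration. The rcm.box path you pass in depends on where Config Manager is installed on your site server.

```python
import re
from pathlib import Path

# Matches the rcmctrl.log message quoted above and captures the table count.
MISSING_TABLES = re.compile(r"The publisher reported (\d+) tables missing")


def drop_trigger_file(rcm_box_dir, publication="Configuration Data"):
    """Create an empty '<publication>.pub' file in the given rcm.box directory.

    rcm_box_dir is an assumption: pass the inboxes\\rcm.box folder of your
    own Config Manager install.
    """
    trigger = Path(rcm_box_dir) / f"{publication}.pub"
    trigger.touch(exist_ok=True)  # an empty file is all that is needed
    return trigger


def missing_table_counts(log_lines):
    """Return the table count from every 'tables missing' line, in order."""
    counts = []
    for line in log_lines:
        match = MISSING_TABLES.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts
```

You would call drop_trigger_file with your own rcm.box path, then feed the lines of rcmctrl.log to missing_table_counts to see whether the publisher is still reporting missing tables.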

It only took about 6 hours or so to get through this, during which time I learned a couple of other things. Our servers have a GPO applied to them that disables the Windows Firewall service, since we are not using it. We also have a GPO being applied that turns the firewall off. This is actually an issue for Config Manager: it needs the Windows Firewall service running, even if the firewall itself is turned off. This is why we had errors saying that ports 1433 and 4022 weren’t open. The support guy also told me that this can cause “strange things to happen.” He didn’t elaborate, but I thought that was interesting just the same. So, my next task is going to be requesting that our servers be put under a GPO where the Windows Firewall service is running.
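If you want a quick sanity check that those two ports actually accept connections from another site server, something like the following Python sketch would do it. Keep in mind this only tests plain TCP reachability; it says nothing about whether the Windows Firewall service itself is running, and the function names are hypothetical ones I made up for the example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False


def check_replication_ports(host, ports=(1433, 4022)):
    """Check the two ports mentioned in the errors; returns {port: bool}."""
    return {port: port_is_open(host, port) for port in ports}
```

Run check_replication_ports against the SQL server of each site from the other sites; a False for either port is worth chasing down before blaming replication itself.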

After this experience, I can see it’s probably a good idea for a Config Manager Admin to have a good understanding of SQL replication. I guess I’ve got something else I’m going to have to start studying…