Policy Churn issues

We’ve had this issue where OSD starts to fail because the clients can’t download the policy from the MPs. We’d seen this in the past, and it was caused by some kind of corruption where a package referenced by the task sequence was getting its policy updated in a table at one of our primary sites a few times a second. Because of this, we would get messages that the policy was locked and could not be read.

We started by looking at the log on the MP to see what it was saying. This is what we saw:

You can see on line 7 that the MP has no policy to give. We saw this in several places for most machines. In the past, we had fixed this by deleting the referenced package and recreating it. However, the packages we were referencing this time were critical, so we couldn’t use the same fix. Instead, we made a small change to the package (added a comment) to see what would happen, and when we next checked PolicyPV.log, here’s what we saw:

At this time we had a ticket open with Microsoft, and they saw this (specifically the constraint issue in line 1), so I ran a query:

On the CAS, this showed 0 rows, but when I ran it on our primary sites, I got back over 6,000 rows. Now, there were a few rows where the Days Tombstoned was under 30, but the majority were 584. This was interesting, because around 584 days ago, our site server (which is a VM) had a “Purple Screen” and we started having some issues around then. Not many, just little things that popped up once in a while. We also saw a lot of familiar package info in that list, ones that we had to delete and recreate.

Now, during this time, we had a ticket open with Microsoft and the technician (who was awesome and dug right into this issue) had us look at the SoftwarePolicy table. We ran the following query on all our databases:

Wherever we got rows back from that query, we deleted those items from SoftwarePolicy:

This did solve our problem, and policy started to download on our workstations. However, the error we had seen referenced the ResPolicyMap table, so we went back and addressed that as well. We found out that there was an old bug that could cause this (I believe ours was caused by the server crash, since all the problem entries had the same number of days), and the fix was simple: we set isTombstoned=1 on the ResPolicyMap rows that matched up to PolicyAssignment rows where isTombstoned=1. Below are all the queries I used; I’ve commented out the updates so you don’t accidentally run them.
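Separately from those queries, the logic of the tombstone fix can be illustrated on a toy database. This is only a minimal sketch in Python with an in-memory SQLite database, not the production SQL: the join column `PADBID`, the `MachineID` column, and the function names are assumptions made for illustration, and the real ConfigMgr schema has many more columns. It counts the affected rows first (a dry run) and only then applies the update, in the same spirit as keeping the updates commented out.

```python
import sqlite3


def build_demo_db():
    """Create a toy stand-in for the two tables (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PolicyAssignment (PADBID INTEGER PRIMARY KEY, IsTombstoned INTEGER);
        CREATE TABLE ResPolicyMap (MachineID INTEGER, PADBID INTEGER, IsTombstoned INTEGER);
        -- Assignment 1 is tombstoned, assignment 2 is still live.
        INSERT INTO PolicyAssignment VALUES (1, 1), (2, 0);
        -- Two stale mappings (100, 101), one healthy (102), one already tombstoned (103).
        INSERT INTO ResPolicyMap VALUES (100, 1, 0), (101, 1, 0), (102, 2, 0), (103, 1, 1);
        """
    )
    return conn


# Rows in ResPolicyMap that are still live although their assignment is tombstoned.
STALE_FILTER = """
    IsTombstoned = 0
    AND PADBID IN (SELECT PADBID FROM PolicyAssignment WHERE IsTombstoned = 1)
"""


def count_stale_mappings(conn):
    """Dry run: report how many rows the update would touch, without changing anything."""
    row = conn.execute("SELECT COUNT(*) FROM ResPolicyMap WHERE " + STALE_FILTER).fetchone()
    return row[0]


def tombstone_stale_mappings(conn):
    """Set IsTombstoned = 1 on the stale rows and return the number of rows updated."""
    cur = conn.execute("UPDATE ResPolicyMap SET IsTombstoned = 1 WHERE " + STALE_FILTER)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = build_demo_db()
    print("rows that would be updated:", count_stale_mappings(demo))
    print("rows updated:", tombstone_stale_mappings(demo))
    print("rows still stale:", count_stale_mappings(demo))
```

Running the count before and after the update gives a quick check that only the mismatched rows were touched; against a real site database, take a backup and involve Microsoft support before changing anything.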