MICE is a method that imputes missing data using simulation from an inferred posterior distribution. See the Appendix of http://www.statcan.gc.ca/ads-annonces/12-001-x/5857-eng.pdf to see the exact way we simulated data.
With this in mind, MICE is a random method that will yield differing results every time it is run. Therefore, a typical testing procedure where we try to get exact agreement between our implementation and an existing, mature package is not possible. However, we should get roughly the same results for any given instance, and more importantly, the distribution of results should be very similar. So what we will aim to do is really a comparison of simulations between our implementation and those of R and Stata. Right now, predictive mean matching is directly comparable between all implementations, but there are some details in our other asymptotically Gaussian approach that are different in the other packages. I will write a follow up post shortly describing the different approaches taken by different packages. For now, the takeaway for testing is that we will perform simulation studies to assess the similarity between several MICE implementations.