Last Friday, I broke two of my own rules, and I paid dearly for it.
First, it was a Friday, and I never tackle major work on Fridays, or anything that could end the day with work undone and me unsettled. It’s a surefire way to let your work ruin some, if not all, of your weekend.
Needless to say, whatever I had planned for “do the easy stuff” and “tie up loose ends” Friday didn’t happen.
I had plans to go to a Halloween dog parade on Saturday morning. I needed some time to get a costume together and prep. When Saturday came, I was in a hurry, still had lingering stress and ran late.
So what was the cause?
A recent Civi upgrade. I did the normal double-step upgrade dance: staging first, then production.
But then an error on prod slipped through and caught me off guard.
I had a solution, and a plan, yay. Replace the venerable Reportplus extension with the newer Chart Kit extension. Be diligent and test fully on staging first => Get Involved signups line chart, check. Contribution chart by day, check.
And it seemed to be working perfectly fine on staging. All systems go.
Cool, and we all like to save time, so I imported the search display on production (a great feature Search Kit has to offer, btw). Except in my eagerness and excitement about the new Chart Kit data visualization extension, I failed to also activate the extension on production. Yeah, I realize that was a silly first mistake.
OK, activate the extension and all is well, right? Nope, it utterly fails on production.
And the error persisted. Uninstall, reinstall. Clear cache. All the usual stuff.
Did some data linger in the db? Well, time to take the safe route and do a restore to before all of this happened, which was just an hour before and minimal activity occurred during that time. Transferring the data into the new database was no issue.
But… the restore, which had previously worked just fine, glitched this time and burned up even more time. It finally succeeded on the third attempt. On Friday.
Chasing a symptom and not the root cause is a problem.
You see, staging is only effective if it is EXACTLY the same as production, or darn close to it. And in my case, it really was… with a couple of exceptions.
Debug mode was on in staging. Obviously, never in production.
Turns out it was an extension conflict with caching, which doesn’t occur in “debug mode”.
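If I had diffed the two environments up front, the debug mode difference would have jumped out right away. Here is a minimal Python sketch of that idea. It assumes you can export each environment’s settings into a plain dictionary somehow; the keys `debug_enabled` and `extensions` and the sample values are made up for illustration and are not real CiviCRM setting names.

```python
def diff_environments(staging, production):
    """Return {key: (staging_value, production_value)} for each setting that differs."""
    differences = {}
    # Walk the union of keys so a setting missing on one side is also reported.
    for key in sorted(set(staging) | set(production)):
        staging_value = staging.get(key, "<missing>")
        production_value = production.get(key, "<missing>")
        if staging_value != production_value:
            differences[key] = (staging_value, production_value)
    return differences


# Hypothetical snapshots; in practice these would come from each site's exported settings.
staging = {"debug_enabled": True, "extensions": ["chart_kit", "search_kit"]}
production = {"debug_enabled": False, "extensions": ["search_kit"]}

for key, (staging_value, production_value) in diff_environments(staging, production).items():
    print(f"{key}: staging={staging_value!r} production={production_value!r}")
```

Anything this prints is a place where “works on staging” does not yet mean “works on production.”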
Here’s the deal:
- Staging is only valuable if you exactly replicate steps from staging to production.
- Checking if it’s the cache is a golden rule of debugging.
- Mistaking your own slip-up for the real, still-undiagnosed cause, and chasing it down first because you think it’s the root issue, can waste a lot of time.
Doing it on Friday makes it all 10x worse. I now see in retrospect that there wasn’t any real urgency, but a degraded CRM and lost features feel bad. And I wanted to make it “all better.” And that makes it easy for us humans to lose our calm, cold logic.
But I’m human and perfection isn’t real; it’s just that there is always room to improve.
