I work for a company that does offshore work for other companies in the same group. There are two types of activities, done worldwide (100+ countries): internal support and support to external business partners (mostly clients, some suppliers, etc.).
As I work with networks and data transmission, and I lead a team that interfaces with external business partners almost 100% of the time, it is common to get support requests for problems that really aren't ours.
It is annoying because when there is a problem, especially in a production environment, we need to drop everything else and solve it. And 99.999% of the time the problem isn't on our end: we work with thousands of business partners and have the infrastructure in place to support them: servers, network, firewalls, etc. Everything is working fine, debugged and stable. There are destabilizing changes from time to time, of course, but those are rare and very well tested. Then comes the question from the external business partner: "We can't connect to your server. What did you change? We need to submit that data today!"
The last time this happened, I had a meeting with an Indian company from 1:30 AM to 3 AM my time... And what was the issue? They could submit data manually, but their automation couldn't.
Anyone who can solve a hard problem, such as adding 2+2, can see that something is wrong there, and that the something is probably at their end... But no, that didn't happen. So we scheduled a meeting, and after an hour of asking questions about what had changed, they said they had changed the server: the manual process (working) ran from one host, the automated process from another. When asked to test manually from the new host, it didn't work either. The problem? They moved from one server to another and didn't update their own firewall policies... And somehow it is our fault... 🙂
The second time, a page was even sent to our team: data corrupted. Oh well, the customer had to fix it, not us, as we don't change the data in any way. Even within our own company, the business side first assumes it is our fault, and then we have to insist that they go to the customer.
Now, back to the subject:
- Have a change procedure: schedule times, analyze impact, etc.
- Keep a backup of all of your changes. Never decommission the old server right after the change; you might need something from it, or need to revert to it if something goes wrong
- Check everything that has to be changed (servers, configurations, user profiles, firewalls, IP addresses, etc.)
- Keep a “recent changes” log and don’t assume that a change is irrelevant, especially when talking to another company while trying to solve a problem
- Check your own company first: go to your IT team or contractor, ask what has been changed, why it worked before, and what exactly the problem is
- If it is a connectivity problem, check that you can connect to other sites that provide a similar service/interface, and that you can reach the Internet from the same host that will be used and is showing the failure
- Supply the support team with traceroute/tracert output from the failing host to their server
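The connectivity check in the list above can be sketched as a small script to run from the failing host. This is a minimal illustration, and the host names in the usage note are placeholders, not real endpoints:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, timeout, connection refused, unreachable network.
        return False
```

Run it from the same host that is showing the failure, against both the partner's server (e.g. a hypothetical `partner.example.com:443`) and a known-good reference site. If the reference also fails, the problem is on your side of the network, and that is worth knowing before opening a ticket with the other company.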
Give the support teams from both companies enough information to solve your problem. Guessing is time-consuming and error-prone, and it isn't something specialists like doing without enough information to rule out the most obvious problems first.
“Help us to help you”, I always say (no, I didn’t create this expression, but I really enjoyed it!).