TravelClick reservations are not being processed
Incident Report for Mews
Postmortem

Problem

On Friday at 13:30 UTC, TravelClick switched all connected properties at once to a new API endpoint that we had started using. We checked the transition only to verify that requests were still reaching us. We did not check that reservations were being delivered to the properties, because this was one of many similar transitions we had already completed successfully. In this case, however, reservations were not being processed by our system and delivered to the properties.

At 19:00 UTC, properties notified us that they were not receiving reservations, and we started investigating the issue.

Action

After investigating, we found that there were two issues: one in the new endpoint definition and one bug in our system. Around 21:00 UTC we released a hotfix for both issues, and reservation processing was restored. Once the fix was confirmed to be working, TravelClick resent all reservations made since 13:30 UTC, so in the end no information was lost.

Cause

We are actively working on making Mews PCI Level 1 certified. To achieve that, we must not receive raw credit card data from channel managers, so we implemented a connection to a PCI proxy system through which we receive reservations from channel managers. The channel manager sends reservations containing raw credit card data to the PCI proxy system, which stores the raw data internally, replaces it with its own token, and forwards the reservation to us. As a result, the data we receive no longer contains credit card data, only a token.

To make this work, we have to set up an endpoint in the PCI proxy system for each channel manager, define a pattern describing where the raw credit card data is located in the request so that the proxy can pick it up and replace it with a token, and specify where the updated call should be forwarded to Mews.
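The per-endpoint flow described above can be illustrated with a minimal Python sketch of a single proxy endpoint: a regex stands in for the configured card-data pattern, each matched card number is swapped for a random token kept in an in-memory vault, and the sanitized body is paired with the forwarding target. The class name, token format, in-memory vault, and example URL are assumptions for illustration only; they are not the PCI proxy provider's actual configuration or API.

```python
import re
import secrets


class TokenizingProxyEndpoint:
    """Hypothetical sketch of one per-channel-manager endpoint in a PCI proxy.

    card_pattern: regex whose first capture group is the raw card number.
    forward_url:  where the sanitized request is forwarded (hypothetical URL).
    """

    def __init__(self, card_pattern: str, forward_url: str):
        self.card_pattern = re.compile(card_pattern)
        self.forward_url = forward_url
        # token -> raw card number; in this sketch the vault never leaves the proxy
        self.vault: dict[str, str] = {}

    def _replace_with_token(self, match: re.Match) -> str:
        raw_card = match.group(1)
        token = "tok_" + secrets.token_hex(8)
        self.vault[token] = raw_card
        # Swap only the captured card number; keep the surrounding markup as-is.
        offset = match.start(0)
        start, end = match.span(1)
        whole = match.group(0)
        return whole[: start - offset] + token + whole[end - offset :]

    def handle(self, body: str) -> tuple[str, str]:
        """Return (forward_url, sanitized_body): what would be sent on to Mews."""
        sanitized = self.card_pattern.sub(self._replace_with_token, body)
        return self.forward_url, sanitized
```

For example, an endpoint built with the hypothetical pattern `r"<CardNumber>(\d{13,19})</CardNumber>"` would forward `<CardNumber>tok_…</CardNumber>` instead of the card number. Note that if the pattern or the forwarding target is wrong, nothing in this sketch raises an error: the request is still forwarded, which is why a misconfigured endpoint can look healthy until end-to-end delivery is checked.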

Unfortunately, there was a bug both in the endpoint definition and in the part of our system that processes reservations forwarded from TravelClick, so from the moment of the switch, none of the reservations we received from TravelClick were delivered to the properties. The bug did not cause any visible error in our system, so we were not aware of the issue and considered the transition successful.

During the investigation we discovered that our access to the PCI proxy system had expired. In the end we did not need it, as it held no relevant information, but it was the first place we needed to look, and the expired access slowed down the investigation.

Solution

  • This week, we defined a new endpoint with the correct behavior in the PCI proxy system and coordinated the transition to it with TravelClick without any issues. Everything has been working correctly since then.

  • We will set up monitoring of the number of reservations delivered from channel managers that detects anomalies (such as this one, with nearly 10 hours without a delivered reservation) and alerts us.

  • We will improve monitoring for cases where we receive a reservation we cannot process.

  • We will schedule similar endpoint transitions and configuration changes so that they do not happen on Fridays (if another party is involved, we'll try to persuade them to avoid Fridays as well).

  • We will fix our access to the PCI proxy system.
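The delivery-volume monitoring item above could take many forms. As one illustration only, here is a minimal Python sketch that compares each channel manager's count of delivered reservations in a recent window with the same window one week earlier and flags a channel manager whose volume drops sharply. The class name, the one-week baseline, the window length, and the 20% threshold are hypothetical choices, not a description of our actual monitoring.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class DeliveryMonitor:
    """Hypothetical sketch: track delivered reservations per channel manager
    and flag channel managers whose delivery volume drops unexpectedly."""

    def __init__(self, window: timedelta, min_ratio: float = 0.2):
        self.window = window        # length of the window being checked
        self.min_ratio = min_ratio  # flag if current volume < min_ratio * baseline
        self.deliveries: dict[str, list[datetime]] = defaultdict(list)

    def record(self, channel_manager: str, delivered_at: datetime) -> None:
        """Call once for every reservation delivered to a property."""
        self.deliveries[channel_manager].append(delivered_at)

    def _count(self, channel_manager: str, start: datetime, end: datetime) -> int:
        # Number of deliveries in the half-open interval [start, end).
        return sum(start <= t < end for t in self.deliveries[channel_manager])

    def anomalies(self, now: datetime) -> list[str]:
        """Compare the latest window with the same window one week earlier
        (assumed here to be a reasonable baseline for weekly booking patterns)."""
        flagged = []
        week_ago = now - timedelta(days=7)
        for cm in sorted(self.deliveries):
            current = self._count(cm, now - self.window, now)
            baseline = self._count(cm, week_ago - self.window, week_ago)
            if baseline > 0 and current < self.min_ratio * baseline:
                flagged.append(cm)
        return flagged
```

With a six-hour window, a channel manager that delivered ten reservations in the same window a week earlier but none today would be flagged, while one with normal volume would not. Comparing against a weekly baseline rather than a fixed count is meant to avoid false alarms during naturally quiet periods.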

Posted May 31, 2019 - 15:41 CEST

Resolved
This incident has been resolved.
Posted May 18, 2019 - 00:51 CEST
Monitoring
We have identified the cause and deployed a hotfix on our side to resolve the issue. We can see reservations reaching the hotels properly.

In the upcoming days, we will also provide a full postmortem for this incident. We will investigate the root causes, make sure the whole problem is properly solved, and ensure that a similar issue cannot recur in the future.
Posted May 17, 2019 - 23:29 CEST
Investigating
Since 13:33 UTC today, we have been failing to process TravelClick reservations. We are changing the process of receiving reservations from TravelClick so that they pass through a PCI proxy. During this transition there appears to be a configuration issue, and reservations from all hotels are being delivered to a single hotel, which our system correctly rejects.

We are investigating where the configuration issue is.
Posted May 17, 2019 - 21:49 CEST
This incident affected: Channel Managers.