DE{CODE}: Site Monitoring: The Intersection of Product, UX Design & Research

Don’t you hate it when you find out your site or your client’s site is down…from your client? Don’t get blindsided ever again! Join WP Engine Senior Product Manager Bryan Smith, UX Associate Researcher Kate Meyer, and Senior Product Designer Kameron Fehrmann as they walk through WP Engine’s Site Monitoring solution, which makes this problem a relic of the past. In this session, you will get a detailed look at how Site Monitoring works and how UX Design, UX Research, and Development came together to ensure product-market fit.

Full Text Transcript

BRYAN SMITH: Hello, everyone. My name is Bryan Smith. I’m a product manager here at WP Engine. Thanks so much for joining us today. We’re here to talk to you about site monitoring, the intersection of product, UX design, and research. Joining me today are Kate Meyer, one of our UX researchers, and Kameron Fehrmann, one of our product designers. On the next slide, I’m going to talk to you about what site monitoring is. So we’ve just released this new product. It’s called “site monitoring.” It’s available as an add-on for WP Engine customers. And with it, you’ll be able to monitor any of your site environments that are associated with your account. And we will tell you if there’s any kind of outages that we see on the site or on our platform. 

And next, I’m going to go through our agenda real quick. So I will go over the product and give you an overview as well as a technical deep dive. But before we do that, I’m going to pass it to Kate Meyer to tell us a bit about how we got to this product. She’ll go through some user research techniques that we’ve used. Then she’ll turn it over to Kameron, who will go through our product design and iteration. And then I’ll finish this off with the product overview and a deep technical dive. Over to you, Kate. 

KATE MEYER: All right. Thanks, Bryan. I’m Kate. I am a UX researcher here at WP Engine, currently focused on improving our site building offerings. So Bryan showed us this great new product we have now. But how did we end up here? I want to go back to the beginning of our timeline and demonstrate how we used design thinking to get us from knowing base information about our users to product release in just a few months. I’m going to focus on the research aspect of our process and share with you how you can implement these practices no matter what your role is, even if you don’t have a UX researcher on your team. 

As I mentioned, we used these three phases of design thinking for this project. Design thinking is an industry-standard framework that’s essentially a user-centric approach for solving a problem. I’m going to break down our work into these three phases here. This past summer, we wanted to learn how to not just fix what was wrong with WP Engine’s user portal but learn how we could take it to the next level. 

So to address this, we used generative research. This was in the form of interviews with a variety of our users. We asked them all a set of pre-written questions, all the same questions for all users, just to generate an understanding of things like their jobs, goals, challenges, et cetera. If an interview process sounds overwhelming to you, you can also utilize a survey tool for the same purpose. We actually used SurveyMonkey along with interviews just as another way to get feedback. 

During this phase of the research, we found it really interesting that, despite differences in users’ roles, they actually shared a lot of common goals and pain points. And this is a really great pattern to see. We actually do want to see these similarities across different types of users, because it helps us scope our work. And while we were conducting these interviews and the surveys, we started to notice some common themes within the context of hosting websites. 

Two of the goals we kept hearing about were wanting just one single tool to monitor and maintain websites and also catch issues before clients and site visitors noticed anything is wrong. However, a common pain point was that our user portal isn’t currently enabling users to meet these goals. And actually, this agency owner summed it up pretty well by saying, “If I can resolve an issue before the client sees it, that’s fantastic. I don’t want to get those client calls of, ‘Oh, hey, guess what? I went to my website. It’s not there. What are you doing today?'” 

Once you’re seeing these common themes coming up again and again when you’re hearing from users, then you know it’s the time to ideate. We knew that we needed to improve the user portal to help users proactively take care of their sites. And we also knew that our engineering team could leverage partner technology to address one aspect of users’ pain point, and that was through uptime monitoring. 

So at this point, it would have been really easy for us to just dive right in and start building something immediately. But if you want to build the right thing the first time, it is important to still get user feedback and input during this phase. It’s also really important at this phase to have the full team be involved. 

You do want to be coming up with some great ideas, but you also need to make sure that your ideas are feasible and that they’re still aligning with what your users really need. So during this phase, our designer worked with the engineering team to come up with an idea for site monitoring and ensure its feasibility. And then with the product manager and designer, I planned out some concept testing so we could put our idea in front of users. 

Concept testing is a type of research that helps you learn if the idea you have matches the expectations and needs of your users. So we show them our idea and ask them questions about it. In this particular case, we used mid-fidelity mockups like you see on the left here that the designer created with the help of the engineering team. But the great thing about this type of research is that you can show something as simple as just pen on paper. It doesn’t even have to look good. 

This technique is really great, because it lets your users focus on the ideas instead of the visual presentation. And again, at this phase, we really want to learn if your idea is headed in the right direction. And another aspect of this is you don’t have to show it to tens or hundreds of people. You can use as few as five participants for this type of research because, by that point, you should start seeing some common themes in their reactions to your idea. 

So during our concept testing, we learned that our users’ expectations were aligning with our plans for this product. So that put us in a good position to start building it. We still wanted to gather feedback, though, so we decided to go for a closed beta. And what this looked like was having some users opt in and add the feature to their account, and then asking them for feedback throughout their process of using the new feature. And so having this small group of users with access to the product is a really great way to test out the usability of it, work out bugs, and just understand how you might be able to meet their expectations better before the product is released to all of the users. 

So Kameron’s going to continue the story from here. But before we leave the research piece behind, I do want to wrap up what I hope that you take away from my story. So again, this framework can help you get from understanding user needs to building a new product. And anyone on your team can learn from your users through various methods and at any and all phases of a project. When you keep your users at the center of building, this is how you’re going to make sure that your product is as effortless as possible for your users and to give yourself a competitive edge. Thank you. Kameron. 

KAMERON FEHRMANN: Thanks so much, Kate. Hey, y’all. I’m Kameron. I’m a senior product designer here at WP Engine. I also work with builder tools and our e-commerce products, and I’m super excited to talk to you all about site monitoring today. So here’s kind of where we’re at in our timeline. We’ve gone through, done our generative research, done some concept testing, and now we’ve released to beta. We do have a survey out that we’re kind of listening to people, and this was actually the point that I came into the project. 

I kind of quickly got caught up on the previous research. Kate and Bryan were super instrumental in this. Honestly, if we hadn’t already had some of the collaboration cadences set up between design and research, product and engineering, things would not nearly have gone as smoothly. So they were great partners in getting me caught up to speed in the middle. I know some of you probably understand how that is, working in the agency life. We knew that this foundation was kind of great for our beta but that there was more we wanted to do with it. 

So we kind of did a fast-follow after we released to beta to improve the design a little bit more. First and foremost, we started out with our WP Engine status. We heard from users that they weren’t quite sure if the outages they were experiencing were as a result of something they had done internally or if it was a WP Engine problem that was, frankly, out of their control. So we added in this status for people so that they could actually see, hey, something’s going on with WP Engine. It’s us, not you or vice versa. 

We also added in the Add, Remove, or Pause feature for monitoring. This was basically a way for people to add or remove monitors and then also pause monitoring when needed, and it was just a way for people to customize their experience a little bit more. And lastly, as you can see here, we surfaced outages pretty heavily. We wanted to make sure people could clearly see what was happening with their sites and definitely communicate with people. And this is something we heard as well, that they wanted to be able to see their outages and take care of the problems as soon as possible and as quickly as possible. 

And here’s a kind of before and after of where we started with the beta and where we ended up before going to release. As you can see, some pretty big differences. We specifically focused on the columns. We heard from people that they weren’t quite understanding what the columns were or what they were for or what any of the things within them meant. 

So we made the outage status a lot more clear as to whether or not something was in an outage and what that meant. And then we also added in some more actionable links. We added in the definition of what an outage is and then a link to a support article about site monitoring so that people could go and find more information if they wanted to. 

The other thing we did was tie this more closely to our internal design system. It was so great being able to pull from a kind of library of components for myself as the designer and as the developers, so we could all kind of make our workflows go faster. If you don’t already have a design system that you’re working with, I highly recommend one. They just make your workflows so much easier, and they make everything go a lot faster. So we were able to go from what you see on the left to the right pretty quickly because of this design system. 

And here’s what that workflow, that iteration kind of looked like as we were working through the beta. So we started. We released. I was getting feedback from our users with our survey and also from the developers working on the product. I made some design changes. I would talk with the triad. We might have some feedback just amongst us, and then I’d hand it off to the developers. They might have some feedback. We might have some discussion and then we’d release to beta and the cycle would start over. 

So just to check-in here, we’ve gone through, released to beta. We’ve listened to people in our beta survey. And now, we’re ready to start in on alerts and get that experience kind of going. So alerts, we knew that people wanted alerts, needed alerts. This was something that we heard from users was super important and would make monitoring even more valuable to them. 

We also knew that users wanted to be notified of a problem before it’s a problem for their clients, kind of like you heard in our quote. They don’t want to receive a call from a client that there’s an issue or an outage with their site and they didn’t actually know about it themselves. That’s not good. 

The other thing about this is we actually included two more development teams to this work, because we wanted to be able to meet our release timeline. That cycle that you kind of saw became super important because there were more teams. More hands make lighter work but also can make things more complicated. But luckily, we were able to take care of that for their cadences. The thing we kind of had to figure out with alerts was the channels that we wanted to use. 

What we heard from users primarily was that email was their preferred channel of choice over Slack or SMS, so we decided to stick with emails first. And then we kind of had to go from there and think about all the different email scenarios. We wanted to make sure that our messaging was super clear and actionable for people, that they were able to understand and take action as soon as possible when they received an alert. 

The other thing we had to think about was, when somebody’s signing up for an alert, we want to make sure that we’re confirming they’re subscribed. This is just kind of best practice with user experience. And then on the opposite end, making sure that the unsubscribe function is actually pretty seamless for people and it’s a pretty easy and good experience, all things considered. So, yeah, we went through and did some more user testing and some more research for this. And we really wanted to make sure, like I said, the messaging was understandable and actionable. 

So here’s, again, just a side-by-side of the before and after testing. Not a lot of crazy differences here. Primarily, we heard from users that they wanted to know what the specific errors were and they wanted more information, so that’s what we tried to give them. We tried to give them the error codes and any more information we could and just kind of clarify that content a little bit. And after this, honestly it was just a matter of working towards the release. So honestly, I just want to highlight some of these key takeaways that I talked about and these key collaboration points that we have throughout this project. 

First and foremost, the triad operating model was super important to us. Once again, that was design and research, product, and engineering all kind of working together as a team to get this product launched. We would frequently have syncs and touch-bases on design, research, engineering. And we would ask questions, collaborate. 

We even set up our own Slack channel. I do recognize that not everybody can or is able to do this, but creating those collaborative relationships between design and product is really important. And they’re really key for making sure that you have that alignment and accountability at that enterprise or agency level when creating products. 

The other thing that I’ll mention is design and research having such a close partnership. I recognize that not everybody works with a designer or a researcher, but you are able to still be a user experience advocate if you want to. There are plenty of UX groups out there that provide great resources and best practices, so you can still be a usability advocate even if that’s something that isn’t your primary role or isn’t something you often do. 

The other thing I’ll mention is actually the partnership with development. I worked super closely with all of the development teams on this project. I often found myself coming to them, asking if I was crazy for creating a design or something, and they were always so open to working with me and kind of providing all sorts of insights, asking questions. 

It was great. We had a really great collaborative kind of relationship. So I will say, if you’re working with a designer, don’t hesitate to kind of get your hands dirty and collaborate with them. We love working with developers that are willing to sit there and understand the problems we’re trying to solve and kind of work towards that common goal together. 

Another thing on that, I actually did involve myself in a lot of the Agile ceremonies and cadences that these teams have. So being able to sit in on backlog refinements or sprint plannings and ask questions, and have them ask questions of me in the context of the development work, was super valuable. And last but not least, async collaboration. This was really key. We’re a global company. We have teams spanning the globe, and we’re all really busy. 

So being able to create Slack channels specifically across teams for all of us to collaborate was really key. Kate and I could post about research and design. We could get feedback, ask questions without having to wait for a review or a meeting. And I think I just want to call that out, that the situation doesn’t have to be perfect in order for us to collaborate. You can do it asynchronously. You don’t need to wait for a meeting. Everything doesn’t have to be exactly right in order to get things done. So that’s my time. Thank you all so much. Bryan, I’m going to let you take it away and talk about our product overview. 

BRYAN SMITH: Thanks so much, Kameron. All right. As promised, I’m going to jump into a product overview and then we’ll do a technical deep dive before we close out. So, site monitoring in portal. Those who purchase the add-on will have access to a new portal page. It’s called “site monitoring.” And from this page, you can add, pause, and delete monitors. Kameron alluded to this a bit, but this is the page that you do that from. 

Also, from this page, you’ll be able to view outages, uptime, average response time for a selected date range. You’ll also be able to link to site-specific error logs when we detect outages, so all of that is possible from this page. There will also be links to the Alert Preferences page, which we’ll go into here in just a second. 
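To make the uptime and average response time figures concrete, here is a minimal Python sketch of how such numbers could be derived from a list of ping results for a selected date range. The `Check` record, the `summarize` function, and the choice to average only successful checks are all assumptions made for illustration; the talk does not describe how the portal actually calculates these values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Check:
    """One hypothetical ping result for a monitored environment."""
    at: datetime         # when the ping was sent
    ok: bool             # True if the site responded successfully
    response_ms: float   # response time in milliseconds


def summarize(checks, start, end):
    """Return (uptime_percent, average_response_ms) for checks in [start, end].

    Assumption: uptime is the share of successful checks, and the average
    response time only counts successful checks.
    """
    in_range = [c for c in checks if start <= c.at <= end]
    if not in_range:
        return None, None
    successful = [c for c in in_range if c.ok]
    uptime = 100.0 * len(successful) / len(in_range)
    if successful:
        average = sum(c.response_ms for c in successful) / len(successful)
    else:
        average = None
    return uptime, average
```

Calling `summarize` with a wider or narrower `start` and `end` mirrors changing the selected date range on the page.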

OK. So I want to jump into a video real quick and then we’ll jump back to the slides, but this will be an actual demo walkthrough of what that page looks like in portal. Just one thing to call out, it was recorded before some of those images that you saw from Kameron. So we are updating this. Don’t take this as exactly what it looks like, but it’s a good approximation of what you’ll see in portal. 

Menu. You’ll see a site monitoring link. We’ll pull up this page here, and you’ll see that I have a list of all the site environments that I’m monitoring. I can see those response times and the list of all of them that are currently monitored. I clicked that WP Engine status link up at the top, and it took me to this WP Engine Status page. Kameron mentioned that earlier as well, but that is available there. 

When I click the Add Monitor button, I can easily do that just from a single click. I would say that’s a huge piece of this product and integration, is just the ease with which you can pause, delete, or provision monitors. Here, I’m pausing a monitor. You’ll see a little Resume button pop up there. Yeah. If I hit Resume, that unpauses it. 

And keep in mind, what a pause actually does is it just stops the ping monitor from actually pinging the site. So whenever that’s paused, it’s not actually sending that ping. Here, we’re going to remove a monitor. You’ll see a confirmation screen. Because when you do delete one of these monitors, it actually removes all of the associated outage history associated with that. So just bear that in mind. 
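The difference between pausing and removing a monitor can be sketched with a small in-memory model. This is only an illustration of the behavior described above (a pause stops the pings but keeps the history, a removal needs confirmation and discards the outage history); the `MonitorRegistry` class and its method names are hypothetical and are not part of the product.

```python
class MonitorRegistry:
    """Hypothetical in-memory model of monitors and their outage history."""

    def __init__(self):
        # environment name -> {"paused": bool, "outages": list of str}
        self._monitors = {}

    def add(self, env):
        self._monitors.setdefault(env, {"paused": False, "outages": []})

    def pause(self, env):
        # Pausing only stops the pings; the outage history is kept.
        self._monitors[env]["paused"] = True

    def resume(self, env):
        self._monitors[env]["paused"] = False

    def should_ping(self, env):
        monitor = self._monitors.get(env)
        return monitor is not None and not monitor["paused"]

    def record_outage(self, env, description):
        self._monitors[env]["outages"].append(description)

    def outage_history(self, env):
        monitor = self._monitors.get(env)
        return list(monitor["outages"]) if monitor else []

    def remove(self, env, confirmed=False):
        # Removing a monitor also discards its outage history,
        # so the caller has to confirm explicitly.
        if not confirmed:
            raise ValueError("removal deletes outage history; confirm first")
        self._monitors.pop(env, None)
```

The `confirmed` flag stands in for the confirmation screen shown in the demo.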

And that’s the page in portal. All right. I’m going to jump back to the slides now and talk to you a bit about the email alerts. There are a few different templates. Kameron alluded to this a bit earlier, but I’ll take a little bit deeper dive here. So once you opt in for email alerting, you’ll receive an email that looks something like this, showing that you’re now subscribed to monitoring alerts for your sites. It will give you a link to our Support Center article, which will give you more information just on how this product works. And down at the bottom, there is a link to the Site Monitoring page that I just showed you. 

OK. So when we detect an outage on your site, you’ll get an email that looks like this. It’ll have the site name, when we detected the outage. It’ll also show that WP Engine status. Now, this status is important, because it will show you the current status of the platform, of the hosting platform. So if this looks good but you’re still getting this email, that’s indicative that there is actually a site-specific issue. 

It’s not specific to the hosting infrastructure, but actually there’s something on your site or your domain. And in the content of the email here, it’ll show you that response code that we’re seeing. And then down at the bottom, there will be a link to that Site Monitoring page. There is also a link to the access logs, because this will be your next best step to try to diagnose what is happening and why you’re seeing this outage email. 
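The reasoning behind the outage email can be written down as a small decision function. This is a hedged sketch: `triage_outage` and its return fields are hypothetical names, and the branch for a degraded platform status is an assumption, since the talk only spells out the case where the platform looks healthy.

```python
def triage_outage(platform_operational: bool, response_code: int) -> dict:
    """Suggest a likely cause and next step for an outage alert.

    If the hosting platform status looks good, the outage is most likely
    specific to the site or its domain, and the access logs are the next
    place to look. The other branch is an assumption for illustration.
    """
    if platform_operational:
        cause = "site-specific"
        next_step = "check the site's access logs"
    else:
        cause = "possibly platform-wide"
        next_step = "check the WP Engine Status page"
    return {
        "cause": cause,
        "response_code": response_code,
        "next_step": next_step,
    }
```

For example, `triage_outage(True, 503)` points at the site itself, which matches the "it's you, not us" case the status indicator is meant to make visible.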

All right. And then when that resolves, you’re going to see another email that shows you that your site is back up. The outage is no longer occurring. We’re no longer detecting it. This will also tell you which site is back up. It’ll tell you how long that site was down and, again, links at the bottom. Same links down at the bottom. 

So I mentioned this is a page that you can get to from the Site Monitoring page in portal. This is where you actually set up your alert preferences. So from here, you can enable or disable alert channels. You can input email contacts. Email contacts come from your portal users list, so you’ll see that down there at the bottom on the left. It’s just a checkbox. 

We already have the name and email address. You don’t have to enter that. Again, it’s pulling that from your portal contacts. But I do want to mention here that this will be a page from which you can enable Slack integration. We don’t have that just yet, but it is on our roadmap. It’s something that we’re just about to start work on. So currently, just email alerts, but Slack is on the roadmap. 

All right. I mentioned we would be getting into some technical details here to give you just an idea of how this all works behind the scenes. So it’s all possible through what we call our “Site Monitoring Agent,” and this is an intermediate layer between our user portal and what the user is doing there and our partner New Relic, whose monitoring and alerting APIs we’re consuming. So the Site Monitoring Agent essentially centralizes the New Relic resources. 

This is the layer that creates, updates, and deletes monitors as well as alerts. And it’s also the place where we can just reconcile and catch any sort of errors, make sure that nothing gets removed accidentally. So this is that interstitial layer. Let’s walk through a typical user flow to see some of the things that are happening. So a user will sign up. What happens there is an entitlement check is made to portal to see if they have access to site monitoring. 

And in the case that they do, it checks the WP Engine entitlement service to get that OK. Once it passes that check, the user can then create a monitor from that Site Monitoring page that I showed you earlier in portal. So they manually provision that monitor by clicking that Add Monitor button. And behind the scenes, it’s sending a request to the New Relic Synthetics API to actually provision that monitor. 
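The sign-up and provisioning flow can be sketched as an agent that checks entitlement before it asks the monitoring backend for a new monitor. Everything named here is hypothetical: `SiteMonitoringAgent`, `has_site_monitoring`, and `create_ping_monitor` are stand-ins for illustration, and the fake classes replace the real entitlement service and the New Relic Synthetics API, whose actual calls are not shown in the talk.

```python
class NotEntitled(Exception):
    """Raised when an account does not have the site monitoring add-on."""


class FakeEntitlements:
    """Stand-in for an entitlement service (hypothetical interface)."""

    def __init__(self, entitled_accounts):
        self._entitled = set(entitled_accounts)

    def has_site_monitoring(self, account_id):
        return account_id in self._entitled


class FakeSynthetics:
    """Stand-in for the monitoring backend (hypothetical interface)."""

    def __init__(self):
        self.monitors = {}

    def create_ping_monitor(self, url):
        monitor_id = f"monitor-{len(self.monitors) + 1}"
        self.monitors[monitor_id] = url
        return monitor_id


class SiteMonitoringAgent:
    """Sketch of the intermediate layer between portal and the backend."""

    def __init__(self, entitlements, synthetics):
        self._entitlements = entitlements
        self._synthetics = synthetics

    def add_monitor(self, account_id, url):
        # Step 1: the entitlement check has to pass first.
        if not self._entitlements.has_site_monitoring(account_id):
            raise NotEntitled(f"account {account_id} lacks site monitoring")
        # Step 2: only then is the monitor provisioned in the backend.
        return self._synthetics.create_ping_monitor(url)
```

Keeping both dependencies behind small interfaces is one way such a layer could centralize the monitoring resources while staying testable without network access.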

Now, while you’re also on that page in portal, you can view data. You can view historical data. From what we’ve seen from pinging the site that you have set up, you can also see the average response time, link out to access logs. So here, a customer can view that data on that page. What’s happening behind the scenes is we’re actually hitting a different New Relic API. It’s their NerdGraph API. So the Site Monitoring Agent sends a request to retrieve that data and display it. And all that’s happening, again, through the NerdGraph API through New Relic. 
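As a rough illustration of the read path, the sketch below builds the JSON body of a GraphQL request that asks for an average response time through an NRQL query. The talk does not show the real query, so treat all of it as an assumption: the GraphQL shape follows the commonly documented NerdGraph pattern for running NRQL under an account, the `SyntheticCheck` event with `duration` and `monitorId` attributes is assumed, and `build_request_body` is a hypothetical helper. Verify each of these against New Relic's documentation before relying on them. No request is sent.

```python
import json
import re

# Assumed NerdGraph query shape for running NRQL under an account.
GRAPHQL_QUERY = """
query($accountId: Int!, $nrql: Nrql!) {
  actor {
    account(id: $accountId) {
      nrql(query: $nrql) { results }
    }
  }
}
"""

# Assumed event and attribute names for synthetic check results.
NRQL_TEMPLATE = (
    "SELECT average(duration) FROM SyntheticCheck "
    "WHERE monitorId = '{monitor_id}' SINCE {days} days ago"
)


def build_request_body(account_id, monitor_id, days):
    """Return the JSON request body as a string (nothing is sent)."""
    # Reject anything that is not a plain identifier, because the
    # monitor id is interpolated into the NRQL string.
    if not re.fullmatch(r"[A-Za-z0-9-]+", monitor_id):
        raise ValueError("unexpected characters in monitor id")
    nrql = NRQL_TEMPLATE.format(monitor_id=monitor_id, days=int(days))
    return json.dumps({
        "query": GRAPHQL_QUERY,
        "variables": {"accountId": int(account_id), "nrql": nrql},
    })
```

Separating the read query from the provisioning calls reflects the split described above, where one API creates and changes monitors and a different one serves the data shown on the page.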

A couple of other use cases that would be common would be the Edit Monitor scenario. So this could be pausing an existing monitor, in which case the agent will send a patch request to the New Relic Synthetics API. You could also de-provision a monitor. This would be deleting a monitor from that portal page that I showed you earlier. This is sending a delete request to that Synthetics API. A customer can also change configuration. Maybe you want to change the URL for the domain that we’re sending that ping check to. 

In this case, the agent sends a patch request to update that monitor. Also, a user can cancel the site that has site monitoring. And in this case, what we would be doing is just sending a delete request automatically to that Synthetics API to de-provision that monitor. Or in the case that a customer might cancel the entire account that has a bunch of different site monitors on it, a request to de-provision all those monitors is sent automatically when that’s detected. So all of these things are important to the user flow, and the Site Monitoring Agent is what makes that possible. 
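The scenarios above boil down to a mapping from a portal event to the kind of request the agent sends. The sketch below captures only that mapping as described in the talk (PATCH for pausing or changing configuration, DELETE for de-provisioning, including the automatic cases); the `Action` enum and `plan_requests` function are hypothetical names, and no real API paths or payloads are shown.

```python
from enum import Enum


class Action(Enum):
    """Portal events that the agent reacts to (names are illustrative)."""
    PAUSE = "pause"
    CHANGE_CONFIG = "change_config"
    DELETE = "delete"
    SITE_CANCELLED = "site_cancelled"
    ACCOUNT_CANCELLED = "account_cancelled"


# Edits update an existing monitor; everything else de-provisions it.
_PATCH_ACTIONS = {Action.PAUSE, Action.CHANGE_CONFIG}


def plan_requests(action, monitor_ids):
    """Return a list of (http_method, monitor_id) pairs to send.

    For a single-monitor action, pass a one-element list. For a cancelled
    account, pass every monitor on the account so all are de-provisioned.
    """
    method = "PATCH" if action in _PATCH_ACTIONS else "DELETE"
    return [(method, monitor_id) for monitor_id in monitor_ids]
```

For example, `plan_requests(Action.ACCOUNT_CANCELLED, ["m1", "m2"])` yields one DELETE per monitor, which corresponds to the automatic clean-up when an account is cancelled.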

All right. So I mentioned it earlier but, as we look ahead, we are certainly planning on integrating Slack as an additional alert channel. We’re also exploring SMS as well, so stay tuned for more additions in the future. This is our V1. We’re excited about it, and we’re really happy to be able to launch it here at DE{CODE}. But this really is V1. We have so many plans in store. These are just a couple of them. But also stay tuned for more configuration options with the monitoring, more improvements to the user portal, and we’ll continue to follow that research and design iterative process that we have been that’s led us to this point. 

So with that, thanks to the other presenters, Kate and Kameron. And thank you all for joining us today. Have a great day, and go check out site monitoring. Thanks, everyone. 
