Thursday, July 28, 2011
This is a question I've come across many times. When should you write a bug report, and when should you just send an email?
Generally speaking, it's best to be safe and enter a bug, since the bug tracking system will keep a history of any back-and-forth conversation that anybody can view. You just don't get that with email. I've seen many justifications for sending an email instead of entering a bug report, but in most cases, it still makes sense to enter a bug report.
Some common reasons for sending an email are:
1) "I don't have time to enter a bug." I'm sorry, but this is a pretty weak reason. If you have time to send an email, you have time to enter a bug. I've gotten long, drawn-out emails with detailed bug descriptions. These emails must have taken 10 to 15 minutes to write. It would have been just as easy to enter the same information in the bug tracker.
Granted, you really may be in a hurry and don't have time to do a "proper" bug report. You may not have time to confirm it can be reproduced. You may not have time to figure out who should be in the notification list. However, entering a bug with as much detail as you have and saying, "I'll clean this up tomorrow" is better than not entering a bug at all (as long as you really clean it up tomorrow). At least it will be tracked, and all stakeholders will be able to see it if they look.
Entering a quick bug will take no more than one extra minute when compared to a quick email. If you see a bug, enter it. Period.
2) "I'm not sure if it's a duplicate." For the most part, you should use the search function of the bug tracker. If you're short on time, see #1. If you wind up entering a duplicate, who cares? As long as you made an effort to find the duplicate, it's no big deal to have your bug closed in the next bug triage.
The only time this might be legitimate is if you're almost positive you've seen the bug before but can't find it in the bug tracker. Then, a quick, "I think this is a duplicate, do you know the bug number?" might be in order. However, if nobody knows the number, or you don't get a response, enter the bug. Again, a duplicate bug every once in a while is no big deal.
3) "I don't like the bug tracker and email is just easier." There are three options: learn to live with it, find an alternative bug tracker, or find an alternative job. I've worked with many different bug trackers. Some were great. Some were horrible. I lived with the horrible trackers until they were replaced. Don't let the tool you use prevent you from doing your job.
4) "This is a serious bug, and we're going live tomorrow." It's still no excuse. Enter a quick bug (see #1). If you want to be absolutely sure the right people are aware of it, send an email saying, "I just found a show stopper. See bug #X!"
5) "I'm not sure if it's a bug." This is common. You don't want to enter a bunch of "as designed" bugs. On the other hand, you don't want bugs falling into an email void. First, find the specs and see if it is a bug. If so, enter it. If the specs aren't clear, you might email the project manager or some other engineers with a brief description of the problem. If you don't get an answer right away, enter the bug anyway.
However, this really shouldn't be a problem for your own projects. A test engineer should know the specs as well as the developers do. If you don't know how the product is supposed to work, how can you test it? Of course, if you are looking at another project and stumble across what might be a bug, that's when you'll wind up having to send an email (or just hunt down the project manager and ask).
Thursday, March 10, 2011
Insecurity through Perspicuity
I was having a discussion with a colleague about security. In particular, we were discussing whether or not we should use the standard HTTP ports for an Internet-facing application. I thought we shouldn't use them because it would increase the risk unnecessarily.
When I mentioned this, his response was, "you can't rely on security through obscurity."
I agreed. However, I still insisted that the ports should be changed. My logic was that using standard ports exposes your service to more people, which is insecurity through perspicuity and an unnecessary increase in the risk of being attacked.
One example that is sometimes given as an argument against security through obscurity is that you can't just hide your front door with bushes, leave the door unlocked, and expect nobody to break in. This is true, but it doesn't mean that just because you think your house is secure, you should put a note outside that says, "Be back in a week, I put my $50,000 cash in the safe."
By exposing standard ports, you're guaranteeing that every port scanner, even one configured for a minimal scan, will find your server. You may think it's secure, but when a 0-day exploit is discovered, there is a window between its disclosure and the moment the patch can be applied to your server during which you are vulnerable. No matter how closely you track security vulnerabilities, 0-days are always a risk.
If you're using standard ports, the number of people who are aware that you're running a vulnerable service is going to be many times higher than if you were running on non-standard ports. That means your risk of being attacked before you can patch is also many times higher.
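To make that concrete, here's a rough sketch (in Python, with a made-up host name) of the kind of minimal connect-scan every drive-by scanner performs. Real tools like nmap check about a thousand common ports by default, and 80 and 443 are always on the list.

```python
import socket

# Hypothetical target -- substitute any Internet-facing host.
TARGET = "server.example.com"

# The well-known ports that even a "minimum" scan always tries.
COMMON_PORTS = [21, 22, 23, 25, 80, 110, 143, 443, 3389]

def scan(host, ports, timeout=1.0):
    """Return the ports that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # closed, filtered, or unreachable
    return open_ports

print(scan(TARGET, COMMON_PORTS))
```

Move your service to a high, non-standard port, and this class of scan walks right past it. A determined attacker can still sweep all 65,535 ports, but the casual, automated ones won't bother.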
Security in Consumer Network Products
Security is important for all software and hardware network products. However, when a security vulnerability is found in a consumer-grade product, all too often, I hear the argument that since the only attack vector is from the consumer's LAN, the priority to fix the vulnerability is low. After all, if users have intruders on their LAN, the vulnerability is the least of their worries.
Although this may be true to a certain extent, I would still argue against it. There are good reasons to investigate and address even vulnerabilities that may not pose a real-world threat to consumers' data.
First, somebody could have a misconfigured wireless router. An attacker could get onto their LAN without their knowledge and wind up accessing the data stored on their NAS because of a bug that "nobody would realistically exploit." To be honest, this falls under the "they're on the LAN, so they have bigger problems" umbrella, but it is one more attack vector to consider.
Second, when the public finds out about an obvious, exploitable security hole, especially one that is easily fixed, it makes the product and the company look bad. People start to wonder what other problems are hidden in your product if you shipped something that easy to find and fix.
Granted, not every security bug can or should be fixed; otherwise, you'd never release the product. For example, a potential Denial of Service (DoS) attack on a network device may not be a problem if that device is going to sit behind a firewall. Even if an attacker gets on the LAN, are they really going to try to crash your media player?
However, regardless of the likelihood of a vulnerability being exploited, each known vulnerability should still be investigated to see how it might affect your product. I've seen a case where a DoS vulnerability was being triggered by a third-party device that was unintentionally sending malformed packets. We got reports from users that their device was crashing for no apparent reason. The logs didn't help, and we were unable to reproduce it. It was only after running a security scanner against the product that we found the vulnerability and were able to tie it to the crashes users were reporting.
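In our case a security scanner found it, but you can approximate the same kind of test yourself. Below is a minimal sketch of the idea: throw malformed (here, random and often truncated) payloads at a service, then check whether the device is still alive. The address, port, and packet sizes are all made up for illustration.

```python
import os
import socket

# Hypothetical device under test and service port.
DEVICE = ("192.168.1.50", 9000)

def fuzz_udp(target, rounds=200):
    """Send random, variable-length payloads to a UDP service."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(rounds):
            # Lengths cycle from empty to near the Ethernet MTU;
            # the content is garbage on purpose.
            sock.sendto(os.urandom(i % 1472), target)
    finally:
        sock.close()

fuzz_udp(DEVICE)
print("Done. Now check whether the device is still responding.")
```

A run like this, followed by a simple liveness check, is essentially what the third-party device was doing to our product by accident.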
Of course, the stakes are raised when dealing with SMB or enterprise products. Even with consumer products, once you expose a single port to the WAN, security becomes critical, not just important.
Wednesday, November 24, 2010
SQA and the Scientific Method
My son has been learning about the scientific method in his science class. As I've been helping him with his homework, I've realized that I use the scientific method every time I track down a bug.
For example, suppose you're testing remote access software installed on a Windows client, and you notice that one system keeps losing its connection to the server. This is something I ran into once. Now, if you're a test monkey, you'll write up a bug saying, "Brokey brokey, no worky" and let development figure it out.
However, if you're reading this blog, you like making it easy for developers. So, you'll wind up asking yourself, "Why does this one system have a problem with disconnecting from the server?" At this point, you've just started approaching this from a scientific point of view.
Next, you'll do some research and eliminate variables. What's unique about the one system with the problem? What could cause the connection to drop? Is it the network it's connected to? Is it a bad cable? Does it just not like me?
Once you've listed what could be causing the problem, you'll start with the first hypothesis. You'll want the one that's simplest and easiest to test, so maybe it's the network. You'll test the hypothesis by moving the "bad" computer to the same network as the "good" computer, even using the same network cable the "good" computer used. If it still fails, you've eliminated three variables (the network, the cable, and the port on the switch). If it works, you've narrowed the cause down to those three.
If it still fails, it's back to the hypothesis and experiment loop. You'll want to keep eliminating variables until you find the cause of the problem. Maybe it's faulty hardware. Maybe it's another app. Maybe it's a feature unique to the computer.
In my case, the failing system was a laptop. After some experimentation, I traced the problem to the SpeedStep feature. If I turned that off, everything worked fine. I entered the bug. When a developer got it, the root cause was found in minutes. It turned out that the API used to time the 60-second keep-alive packet failed if the processor speed changed. When the app launched, CPU usage was high, so the processor ran at full speed. Once the app went idle, the processor slowed down, which slowed the timer down. The client then missed its keep-alive, and the server assumed the client had disconnected and closed the pipe.
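I never saw the fix, but the lesson generalizes: time keep-alives against a clock that doesn't depend on processor speed. Here's a hedged Python sketch of the safe pattern, using a monotonic wall clock instead of anything derived from CPU ticks; the wire format and interval are illustrative.

```python
import socket
import time

KEEPALIVE_INTERVAL = 60.0  # seconds, as in the bug above

def keepalive_loop(sock: socket.socket):
    """Send keep-alives on a fixed schedule using a monotonic clock.

    time.monotonic() is unaffected by CPU frequency scaling or
    system clock adjustments -- unlike a raw processor tick count,
    which is what SpeedStep broke in the story above.
    """
    next_send = time.monotonic() + KEEPALIVE_INTERVAL
    while True:
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        sock.sendall(b"KEEPALIVE\n")  # hypothetical wire format
        next_send += KEEPALIVE_INTERVAL
```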
A good bug report starts with a question, then some research. After that, it's a cycle of coming up with a hypothesis and testing it, repeated until you can prove a hypothesis and pin down the cause. Finally, you report the findings to a developer through a bug report and, hopefully, get the bug fixed.
Friday, May 21, 2010
Visual Studio 2010 Launch Event
Earlier this week, I went to the Microsoft Visual Studio 2010 launch event. It was informative and gave me a few ideas about how my testing is going to have to change in the upcoming months.
Microsoft seems to be pushing multi-touch screens. Testing a multi-touch screen is going to be different. At first, it's going to have to be mainly manual testing since I don't see any tools that would allow automating it. In fact, I just don't see how you can automate testing gestures. Sure, you could have some sort of simulator or even wire directly into the input stream, but a fully-automated system won't be able to account for human movement. The automated gestures would be too perfect. Touch screen interface testing is all about how it "feels," and you can't automate human perception (at least not yet).
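If you did try to drive gestures through a simulator, the "too perfect" problem is at least partly addressable by adding noise to the synthetic path. A toy sketch, assuming you already have some platform hook for injecting touch points (the hard part I'm waving away):

```python
import random

def humanize_swipe(start, end, steps=30, jitter=2.0):
    """Interpolate a straight swipe, then perturb each point
    so the gesture isn't machine-perfect."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = x0 + (x1 - x0) * t + random.gauss(0, jitter)
        y = y0 + (y1 - y0) * t + random.gauss(0, jitter)
        points.append((round(x), round(y)))
    return points

# Feed these into whatever touch-injection hook the platform offers.
print(humanize_swipe((100, 500), (100, 100)))
```

Even then, you've only varied the geometry. Timing, pressure, and intent are still synthetic, which is why I think hands-on testing will remain the core of it.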
Two more hot topics are SharePoint and Windows Phone 7. SharePoint doesn't directly affect my testing, but the company I work for is moving toward it. It's going to change how I organize my test plans and testing tools, as well as how I find information such as PRDs and MRDs. It's definitely much better than throwing everything on a network share, but it's going to take some getting used to.
Windows Phone 7 will likely affect my testing. I would be surprised if I don't wind up having to test some apps on that platform in the near future. It looks like it has some cool features, but we'll have to see how it does against the big boys in the mobile market. Sure, the market is expanding, but there are already major players with strong market share. Even MS will have a hard time grabbing a large chunk of it, no matter how good they make Windows Phone 7.
I did notice a couple of things during the event. Microsoft supplied lunch, and in with my sandwich was some fresh fruit. Of all the fruit they could have chosen, I thought it was kind of ironic that they wound up giving out apples at a Microsoft event.
The other thing I noticed was that just about every person giving a demo wound up lost in their own application. They couldn't find menu items. They didn't see a missing brace in the code. They didn't know where the icon to launch their demo was on the desktop. It pretty much confirms "Troy's Theory of Technical Throngs": the amount of time it takes to find something on a computer is directly proportional to the number of people watching you.
Wednesday, December 16, 2009
ISPs and Tech Support
I won't name my ISP, but this story unfortunately can apply to most of them.
Last Saturday, my Internet connection was working just fine. I went out for a few hours, came back, and went to look something up. Pages were taking a minute or two to load. I ran a speed test. It took a while to load, but once it did, the results were fine. Speeds were good.
My first thought was DNS, so I switched DNS servers. The problem was still there. I restarted my router. Still there. So, I thought of latency and started running some pings. That's when the problem became clear. Ping times were low, but I was losing about 1 out of every 10 packets. I tried a few servers, including my ISP's page. They all showed packet loss. A quick test at http://pingtest.net confirmed a packet loss of 8%.
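If you want to measure loss the same way without a web tool, a small script will do it. Here's a sketch that shells out to the system ping command; the flags below are the Linux ones (Windows would use -n and -w instead):

```python
import subprocess

def packet_loss(host, count=50):
    """Ping host one packet at a time; return the fraction lost."""
    lost = 0
    for _ in range(count):
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],  # Linux flags
            capture_output=True,
        )
        if result.returncode != 0:
            lost += 1
    return lost / count

print(f"Loss: {packet_loss('8.8.8.8'):.0%}")
```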
I started thinking my router might be having problems. So, I booted off a Linux live CD, cloned my MAC address to that of the router, and plugged my laptop directly into the ONT in my garage. I got an IP address and repeated the ping test. It showed 10% loss. I pinged my gateway, and it showed loss as well.
So, at this point, I figured I had done my homework. The problem was either my ONT, the ISP's gateway, or something in between. I tried resetting my ONT, but it didn't help. I decided it was time to call tech support. The problem was, I didn't have the number. Luckily, my connection was still somewhat working, and I managed to find it, after about 30 minutes of waiting for pages to load.
As soon as the support agent answered, I had a feeling I was doomed. It was Saturday night, and the call was obviously outsourced. Still, I had some faith in the system. The call went something like this:
Agent: How can I help you?
Me: I'm experiencing 10% packet loss between the ONT and the gateway. This is causing web pages to take several minutes to load, if they even load at all. I've already ruled out my router by connecting my laptop directly up to the ONT. I reset the ONT, but it didn't help. I'm going to need somebody to come out here and fix it.
Agent: I'm sorry you are having problems. I have a few things you can try.
At this point, I knew I was going to have to play along for a few minutes. After all, he didn't know that I actually knew what I was talking about.
Agent: Could you try rebooting your router? Go to the router, disconnect power for 30 seconds, then reconnect it.
I knew this wouldn't work, but I played along anyway. "Ok. I'm doing that." My router was upstairs, and I was downstairs, so I figured about 90 seconds was enough time to wait before telling him it was done "rebooting."
Agent: Can you try going to [link for the ISP's speed test]?
Me: Sure.
I waited for the page to load. While I waited, I ran a few pings. Packet loss was up to 30%.
Me: It's still not loading. It's a blank page. This isn't a speed problem, though. I'm seeing packet loss, and it's getting worse.
Agent: How long has it been like this?
Me: It was fine this afternoon. Three hours later, I saw some problems with 10% loss. I called you, and in the last 15 minutes, it went up to 30%.
Agent: Okay. Let me check with a network technician.
After 5 minutes on hold:
Agent: He wants you to connect your router back to the ONT, not your laptop.
Me: I already did that.
Agent: Okay. Try resetting your router. Power it off, hold down the reset button, and bring it back up.
I had plenty of port forwarding rules, firewall rules, custom DNS entries, and static DHCP entries, and I wasn't about to wipe them all out for something that wasn't even the router's problem. So:
Me: Okay, but I need to back up my settings first.
Agent: Oh, that would be a good idea.
Again, I figured a couple of minutes was enough time to have "reset" the router.
Me: Okay, it's reset. I'm trying the speed test page, and it's not loading at all now.
Agent: Let me check with the technician.
While on hold, I connected the laptop back up to see how much worse it had gotten. I didn't get an IP address. I rebooted my router. It couldn't get an IP address either. I had reached 100% packet loss.
Agent: He's going to reprogram your ONT.
This sounded promising. I was wrong, but I did get my hopes up for a little bit.
Me: Okay. By the way, it's gotten worse. My router won't even get an IP address now. I'm seeing 100% loss and have a completely dead connection.
Agent: Okay. Please hold.
15 minutes later, he came back.
Agent: Okay. He wants you to reset the ONT. Unplug the power to the ONT for 1 minute.
Me: Sure. By the way, can you just connect me to the network technician so I don't have to keep waiting on hold?
Agent: Sorry, we don't have a way to do that.
Me (rolling my eyes at this point and getting frustrated): I'm resetting it. Do I have to disconnect the battery?
I knew the answer, but just had to ask.
Agent: No. Just the AC.
BUZZ! Sorry, wrong answer. Unplugging the AC from a device with a battery backup inside it isn't going to do much.
Me: I unplugged it. The lights are still on.
Agent: One second.
On hold again.
Agent: Okay. Sorry. Unplug the battery too.
Me: Okay.
I unplugged the battery, and the ONT went dead. I waited 60 seconds, and plugged everything back in.
Me: Okay. It's coming up. The fail light is flashing, just like it should while it tries to connect....okay....it's still flashing. It should've connected by now...still nothing.
Agent: Okay. Please hold.
......
Agent: Hmm...can you try going to the speed test site?
Me: The ONT's not connecting. I won't be able to.
Agent: Can you try restarting the router?
Me: The ONT's not connecting. The router has nothing to do with it. My router's not even getting a link now. My connection is dead.
Agent: Please wait....
About 15 minutes later:
Agent: The technician can't even reach your ONT. We're going to have to send somebody out.
FINALLY!
Me: Thank you.
Agent: Unfortunately, our ticketing system is down, so I can't create the ticket. Somebody will call you.
Me: When will that be?
Agent: I don't know. Our system is down.
Me: Well, it's Saturday night. If your system comes back up as soon as you hang up, will somebody be calling me tonight, or will it have to wait until Monday at the earliest?
Agent: Our system is down.
Me: Yes, I know, but will I get a call, say, tomorrow, if the system comes back up, or do you only call back during regular business hours?
Agent: During regular business hours.
Me: Okay. Thank you.
Agent: Is there anything else I can do for you?
I looked at my phone. It had been an hour and 43 minutes since I called.
Me: No. It's been almost two hours as it is, so I don't think there's much else you can do.
Agent: Thank you for choosing [ISP].
So, at that point, after spending over an hour and a half on the phone, it looked like my connection was going to be down for a while. They were going to send somebody out, which was what I had asked for two minutes into the call.
Sunday morning, I turned on my cell phone. I had a voice mail from the ISP. They were sending somebody out at 2:00 and I needed to call back if that wouldn't work. Had I known they might call, I would have left the phone on. But, at least it was going to get fixed.
I went to the store. At 1:30, I got a call from the dispatcher saying that the technician was on his way. I started heading home. At 1:35, I got a call from the technician saying he was at my house. I told him I was 5 minutes away.
I rushed home, let him into the garage, and explained what I had been seeing. At this point, the ONT was dead. The first thing he did was replace it. It took him five minutes. I came back out in the garage and saw him staring at the ONT. The fail light was flashing. Apparently, the ONT wasn't the problem.
He stood there for a good five minutes, just staring at the pretty blinking light. Then, he started talking to somebody on his phone. After he hung up, he said he was going to try re-authorizing the ONT and went back to his truck. He came back and said his connection from the truck was down, so he would have to call it in for somebody to do manually.
Irony.
A few minutes later, he rang the bell. He said the problem wasn't the ONT or in the box in the front yard. It was possibly in the hub, so he was leaving, but wanted me to know he was still working on it.
He finally came back and said it was fixed. It took an hour, but in this case, it was understandable. It was a problem in the central office. Normally, a CO problem would impact dozens of people, and they'd know the problem was there because of the high volume of calls. In this case, it was only me, so it didn't look like a CO problem. I guess I was just lucky.
Wednesday, April 22, 2009
Data-Driven Testing
A few years ago, I undertook a project to make automating printer drivers easier for QA. The idea was to use INI files to define the tests so that people who don't know how to write code could automate new test cases. I wound up using a set of INI files to define applications, driver UIs, and test cases.
I didn't realize it at the time, but I was essentially building a data-driven test framework. Test cases could be added simply by editing a text file. It even went beyond the traditional data-driven setup by supporting black box monkey testing, default values for most functions (such as falling back to CTRL + O to bring up an application's open dialog), and remote monitoring and control of the test over the network.
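That framework is long gone, but the core idea is easy to sketch. Suppose each section of an INI file is one test case; a small runner reads the file and dispatches to named actions. Everything below (the file name, the keys, the action table) is illustrative, not the actual format I used:

```python
import configparser

# tests.ini might look like:
#
# [print_letter_portrait]
# app = notepad
# document = letter.txt
# orientation = portrait

ACTIONS = {
    "notepad": lambda case: print(
        f"Would print {case['document']} in {case['orientation']} mode"
    ),
}

def run_suite(path):
    """Run every test case (section) defined in an INI file."""
    config = configparser.ConfigParser()
    config.read(path)
    for name in config.sections():
        case = config[name]
        print(f"Running test case: {name}")
        ACTIONS[case["app"]](case)

run_suite("tests.ini")
```

Adding a test case means adding a section to the file, with no code changes. That's the property that made it approachable for the rest of the team.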
Data-driven testing makes life easier for QA as well as for the automation developer. Instead of having to add lines of code, anybody can edit a file to add test cases. Building data-driven automation takes more work up front, and it isn't appropriate for every test plan, but when it fits, it works well.