While a project faces several issues during its lifecycle, poor reliability is among the most critical, because it can cause the project to fail. Reliability problems are also difficult to resolve. If a function consistently does not work, it is possible to find the root cause and implement corrective action. If the problem is intermittent, even diagnosing it is a big challenge.
I would like to highlight two instances of poor reliability and the corrective actions that helped.
In the first case, a Personal Computer (PC) running Microsoft Windows 95 was used along with a custom-built add-on card to provide interactive audio-video services over a cable television system. The services were sometimes disrupted because the PC crashed, and they could be restored only by rebooting the computer. There were several software components, and since a careful check of the application software did not reveal a problem, the fault was assumed to lie with the operating system software. The short-term fix was to detect the PC crash and provide a hardware trigger to reset the PC. The long-term fix was to move to embedded hardware with a reliable real-time operating system.
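The short-term fix described above is essentially a watchdog. The original setup used a hardware trigger, and its details are not given here, so the following is only a minimal software sketch of the idea. The class name `HeartbeatWatchdog`, the `reset_action` callback (standing in for the hardware reset line), and the timeout value are all assumptions made for illustration.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Software model of a watchdog: triggers a reset if heartbeats stop.

    Hypothetical illustration only; `reset_action` stands in for whatever
    mechanism actually pulses the hardware reset line.
    """

    def __init__(
        self,
        timeout_s: float,
        reset_action: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self.reset_action = reset_action
        self.clock = clock
        self.last_kick = clock()
        self.resets = 0

    def kick(self) -> None:
        """Called by the monitored application while it is healthy."""
        self.last_kick = self.clock()

    def poll(self) -> bool:
        """Called periodically by the supervisor.

        Returns True if the heartbeat was overdue and a reset was triggered.
        """
        if self.clock() - self.last_kick > self.timeout_s:
            self.reset_action()
            self.resets += 1
            # Give the rebooted host a fresh window before checking again.
            self.last_kick = self.clock()
            return True
        return False
```

In such a design the application would call `kick()` from its main loop, while an independent supervisor calls `poll()` on a timer. A crash or hang stops the heartbeats, and the next overdue `poll()` fires the reset. The clock is injectable so the timeout logic can be exercised without waiting in real time.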
In the second instance, a PCMCIA modem that worked with laptops for wireless Internet connectivity was used in an embedded environment to transfer equipment health data. During testing, the modem was found to operate only intermittently. We tried to reproduce the error in the laptop environment and also contacted the vendor for advice. The vendor suggested using a new version of the modem cards. After extensive debugging with alternate wire-line modems, which had high reliability, we traced the problem to bugs in the TCP/IP stack supplied by the real-time OS vendor. Because these problems surfaced late in the project, they led to a crisis that required firefighting, which is costly and detrimental.
In both of the above cases, the issues resulted from trying to use Commercial Off The Shelf (COTS) hardware and software to meet aggressive time-to-market and low-cost product needs while ignoring reliability. Projects can handle such issues effectively by focusing on reliability during the requirements phase, making appropriate design choices, and prototyping early to uncover any reliability problems.
"Blue Screen of Death" (Credit: Masem, via Commons)