Sunday, October 27, 2013

Dealing with black/gray swans

A black swan in a complex system, as popularized by Nassim Taleb, is a metaphor for a rare, high-impact event that comes as a complete surprise to all stakeholders. A gray swan is a metaphor for a high-impact event whose probability is so low that most stakeholders ignore it. Usual risk management practices deal with known knowns, where both the occurrence and the impact of the adverse event are predictable.
Picture of swans
Credit: Arjuna, based on Marek Szczepanek (Wikimedia)

As the world becomes more digitized and interconnected and grows dependent on large, complex information systems, stakeholders increasingly face black/gray swans. The impact increases because most of these are unique, connected and closed systems, such as mobile phone networks, power grids and Internet-based applications. The glitches and shutdowns are regularly chronicled in the IEEE Spectrum Risk Factor blog.

In an excellent paper, "Management of Hidden Risks" (IEEE Computer, January 2013, paywall), the author Kjell Jorgen Hole offers a few suggestions for dealing with gray swans, based on experience from outages in Norwegian mobile phone networks, electronic voting systems, and bank payment authorization systems based on public key infrastructure. The suggestions include identifying the dependencies between systems and ensuring that a system can continue to run for a minimum period on a backup system (for mobile networks), providing an alternative mechanism (such as a paper ballot alongside the e-ballot), and providing alternative authentication mechanisms and confirmation messages (for banks). It is useful for project and engineering managers to learn from these and plan for dealing with gray swans.
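The first suggestion, identifying the dependencies between systems, can be sketched as a small graph walk. This is only a minimal illustration: the system names and the `DEPENDS_ON` map are hypothetical and are not taken from the paper.

```python
from collections import deque

# Hypothetical dependency map: each system lists the systems it relies on.
DEPENDS_ON = {
    "bank_payments": ["pki", "mobile_network"],
    "mobile_network": ["power_grid"],
    "pki": ["power_grid"],
    "power_grid": [],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a failed system to its dependents.
    dependents = {name: [] for name in depends_on}
    for system, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(system)

    # Breadth-first walk over the inverted map.
    seen, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for system in dependents.get(current, []):
            if system not in seen:
                seen.add(system)
                queue.append(system)
    return sorted(seen)


print(impacted_by("power_grid", DEPENDS_ON))
# ['bank_payments', 'mobile_network', 'pki']
```

A list like this shows which systems need a backup or an alternative mechanism if a shared dependency goes down.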

Monday, October 21, 2013

Supply and demand for Project Management skills in India

The recently released KPMG-PMI report on schedule and cost overruns in infrastructure projects in India is interesting to read. In summary, a lack of qualified project managers and other engineering resources, external issues such as regulatory delays and site handover, and poor scope management are the major causes of schedule overruns. Poor resource, procurement and scope management are identified as some of the major causes of cost overruns. According to the report, these can be readily addressed by improving the PM capability of the organization.
Credit: KPMG in India & PMI study

The schedule impact of the lack of skilled manpower is felt most deeply in the Coal and Steel sectors, followed by Power and Roads & Highways, and then by Railways and Telecommunication. Scope creep, design changes and environmental safety are the major causes of cost overruns in the Steel and Civil Aviation sectors. The Telecom sector did not seem to have cost overruns.

86% of survey participants expressed the need for PMOs to address schedule and cost overruns, and some have already established an internal PMO or engaged an external one. 56% reported using risk management practices, and 76% stated that their risk management is effective. The conclusion that poor risk practice could be the reason, given that 53% are behind schedule and 34% overspend, therefore seemed a bit surprising, although the preceding chapters of the report cover various other issues that affected the projects.

Taking the case of the construction sector, the demand for project managers is estimated at 70,000 in 2010, rising to 2,27,500 by 2022. Supply was projected to be 1,20,000 as per the Twelfth Five Year Plan. As the number of civil engineering seats has not grown well in the past (possibly due to the lure of CS/IT), the report recommends introducing PM as a subject in Civil Engineering.

The report also provides a set of recommendations to expedite infrastructure projects. A three-tier PMO (national, state and implementation level) and an internal project academy in each organization, as was done successfully by the IT industry, are some of the recommendations. The report also has a few case studies on how different organizations are dealing with the challenges. While the report looks good overall, it could have been improved by using visuals from the Indian construction scene rather than stock photography of infrastructure projects in Western countries.

When I joined engineering 29 years ago, I opted for Electronics & Communication Engineering, which was the least popular branch. I was glad to land a government job through campus placements when I finished the course. My civil engineering friends had a hard time getting jobs. Some of them moved to Computer Science (CS)/Information Technology (IT) for their postgraduate course and subsequently made good careers. It looks like the reverse is required now, as there is a glut of CS/IT engineers, many of whom could easily be trained for project management through short-term bridge courses, an option the report authors seem to have missed.

Monday, October 14, 2013

Dealing with poor reliability issues

While a project faces several issues during its lifecycle, poor reliability issues are critical because they can lead to failed projects. They are also difficult to resolve. If a piece of functionality consistently does not work, it is possible to find the root cause and implement corrective action. If the problem is intermittent, even diagnosing it is a big challenge.
Blue Screen of death in a presentation (via commons)
"Blue Screen of Death"  (Credit:Masem Via Commons)
I would like to highlight two instances of poor reliability and the corrective actions that helped.

In the first case, a Personal Computer (PC) running Microsoft Windows 95 was used along with a custom-built add-on card to provide interactive audio-video services over a cable television system. The services were sometimes disrupted because the PC crashed, and they could be restored only by rebooting the computer. There were several software components, and as a careful check of the application software did not reveal a problem, the fault was assumed to lie with the operating system software. The short-term fix was to detect the PC crash and provide a hardware trigger to reset the PC. The long-term fix was to move to embedded hardware with a reliable real-time operating system.
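The short-term fix described above is essentially a watchdog. The sketch below models the idea in software only; the `Watchdog` class, the timeout value and the `reset_action` callback (a stand-in for the hardware reset trigger) are hypothetical and do not describe the circuit that was actually built.

```python
import time


class Watchdog:
    """Software model of a watchdog: reset the host if heartbeats stop."""

    def __init__(self, timeout_s, reset_action, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.reset_action = reset_action  # stand-in for the hardware reset line
        self.clock = clock
        self.last_beat = clock()

    def heartbeat(self):
        """Called periodically by the monitored PC while it is healthy."""
        self.last_beat = self.clock()

    def check(self):
        """Called periodically by the monitor; resets on a missed deadline."""
        if self.clock() - self.last_beat > self.timeout_s:
            self.reset_action()
            # Restart the timer so the PC gets time to reboot.
            self.last_beat = self.clock()
            return True
        return False
```

In use, the application would call `heartbeat()` from its main loop, while an independent monitor calls `check()` on a timer. A crashed PC stops sending heartbeats, so the next `check()` after the timeout fires the reset.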

In the second instance, a PCMCIA modem that worked with laptops for wireless Internet connectivity was used in an embedded environment for transferring equipment health data. During the tests, we found that the modem operated only intermittently. We tried to reproduce the error in the laptop environment and also contacted the vendor for advice. The vendor suggested using a new version of the modem card. After extensive debugging with alternate wire-line modems, which had high reliability, we traced the problem to bugs in the TCP/IP stack supplied by the real-time OS vendor. As these problems surfaced during the later part of the project, they led to a crisis that required firefighting actions, which are costly and detrimental.

In both cases, the issues resulted from trying to use Commercial Off The Shelf (COTS) hardware and software to meet aggressive time-to-market and low-cost product needs while ignoring reliability. Projects can handle such issues effectively by focusing on reliability during the requirements phase, making appropriate design choices, and prototyping early to uncover reliability problems.

Monday, October 07, 2013

Making sense of state of Project Management

Several professional and consulting organizations publish surveys of project management every year. I used to be a big believer in them but have become skeptical in recent years, as there seem to be weaknesses and bias in the survey design, administration and analysis. I give a couple of examples to support my change in belief and suggest the need for organization-relevant surveys.

The Standish Group's CHAOS study is famous for painting a bleak picture of software projects because of the high failure rates reported in its survey findings. In 1994, it reported that a shockingly low 16 percent of projects were successful, another 53 percent were challenged, and 31 percent failed outright. While the numbers improved in subsequent years, the message remained the same: software projects are out of control. In 2010, J. Laurenz Eveleens and Chris Verhoef of Vrije Universiteit Amsterdam published "The Rise and Fall of the Chaos Report Figures" (PDF), which highlighted the major flaws in the study and their impact, using an independent database of projects while following the Standish Group's methodology. The research concluded that the Standish definitions of successful and challenged projects are misleading, one-sided, pervert the estimation practice, and result in meaningless figures.

Recently I came across the PMI 2013 Pulse of the Profession report (PDF) and read with interest the claim that organizations risk, on average, $135 million for every billion dollars spent. Low-performing organizations, however, risk 14 times more money than their high-performing counterparts. Talent management, standardization of practices and tools, and strategic alignment were identified as key focus areas for becoming a high-maturity organization; such organizations reported 90% project success. To investigate the survey in more detail, I accessed the question-wise responses in the Pulse Interactive Report (accessible to members).

Based on my preliminary analysis, I found that the definition of success used for the survey is delivering the project's initial scope within the initial time and budget estimates. When I looked at the reasons for failure, I found "Overall Change in organization's priorities" and "Inaccurate requirements gathering" ranked at the top. This is not surprising if the triple constraint is driving the survey design. Before applying the findings, ask whether your organization still follows the triple constraint. The PMBOK 4th Edition leaves out a definition of project success. The fifth edition defines it in relation to the last baselines approved by authorized stakeholders. As the survey participants may not share a consistent idea of project success, their responses may not be consistent either. The survey findings need to be taken with a pinch of salt if your project is exploratory and software intensive.
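The difference between the two definitions of success can be illustrated with a small sketch. The `Plan` and `Project` types and all the numbers below are hypothetical, chosen only to show how the same project can fail under one definition and succeed under the other.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    scope_items: int
    months: float
    budget: float


@dataclass
class Project:
    initial: Plan        # estimates made at project start
    last_approved: Plan  # latest baseline approved by authorized stakeholders
    actual: Plan         # what was actually delivered


def meets(actual, target):
    """True if the delivery covers the target scope within time and budget."""
    return (actual.scope_items >= target.scope_items
            and actual.months <= target.months
            and actual.budget <= target.budget)


def success_by_initial_estimates(project):
    return meets(project.actual, project.initial)


def success_by_last_baseline(project):
    return meets(project.actual, project.last_approved)


# Hypothetical project whose scope grew through approved change requests.
p = Project(
    initial=Plan(scope_items=40, months=12, budget=1.0e6),
    last_approved=Plan(scope_items=55, months=15, budget=1.3e6),
    actual=Plan(scope_items=55, months=14.5, budget=1.25e6),
)
print(success_by_initial_estimates(p))  # False
print(success_by_last_baseline(p))      # True
```

A respondent using the first definition would report this project as a failure, while one using the second would report it as a success, which is why mixed definitions make survey totals hard to interpret.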

As we enter the last quarter of the year, this is the right time to assess project performance within your organisation over the preceding twelve months, using a custom survey designed to identify the strengths and improvement areas that matter for planning the next year. Even if the number of projects is small, the results will be much more relevant and useful.

What has been your experience with such published surveys and with in-house surveys? Please share it.