Monday, April 10, 2023

Selecting technology for a software system


Selecting technology for a new software system is challenging. The available technologies are very diverse, and opinions on each are equally diverse. You may find some people who think the selection isn't hard because there is only one good choice (or close to it), at least given their own requirements and assumptions.

Why do I feel the selection is difficult? There are a number of reasons. 

  • Is there a perfect programming language for all needs?
    • No; each language has trade-offs.
  • In a number of cases, a specific language may be a good choice but you also need to consider different runtime environments for a language and the inherent trade-offs.
  • Different business domains / uses may match up better to specific languages or technology stacks.
  • A language / runtime isn't a solution on its own; you also need to consider the overall environment in which the software will operate.
  • Some software systems require higher security and some technology stacks are more mature in that area.
  • Some systems are especially sensitive to the speed of implementation.
  • The expected lifetime of a system and its maintenance needs affect choices.
  • The maturity of the 3rd party ecosystem for a language / tech stack affects choices.
  • Another item, which tends to be considered without much conscious thought: what languages are your staff skilled with?
  • And then you have the choices affecting performance, scalability, reliability, cost, etc.
The above list is a bit abstract in nature.  What are some very real and important examples?

A language that often comes up as the "golden choice" is Python. Is Python a good language? Yes, for appropriate uses/domains.  I won't dive into what the good uses for it are; I tend to prefer knowing how to rule out items so I can get to a short list for final selection. In the case of Python, what aspects are worth considering?
  • Memory handling
  • Threading / Giant lock
  • Build time / dependencies
  • Backers of the language
One aspect of memory handling that is problematic (at a minimum for CPython) is that the runtime gives you no control over memory limits. In standalone / local applications, this isn't a huge burden, but if you are using a containerization / orchestration system such as Docker or Kubernetes (K8S) then you run into a dilemma.  Docker / K8S enable you to set memory limits, and when those limits are exceeded, the container is generally killed and restarted.  The CPython runtime isn't aware of memory pressure and isn't tuned to try to prevent over-allocation.  To keep containers from over-allocating memory and being killed, each container instance needs more headroom memory.  This reduces the efficient use of memory, which increases cost.  If you try to run more CPython processes in a container to improve overall efficiency, a single over-allocation can cause more workloads to restart. The result is a poor user experience. Increasing the number of container instances instead requires adequate memory for each, which again translates into higher costs.
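Since the runtime won't watch the container limit for you, the application has to do it. Below is a minimal sketch of that idea, assuming a Linux container using cgroup v2 mounted at /sys/fs/cgroup (where `memory.max` and `memory.current` are exposed). The function names and the 15% threshold are my own illustrative choices, not something provided by CPython, Docker, or Kubernetes.

```python
from pathlib import Path
from typing import Optional

# Assumed default cgroup v2 mount point as seen from inside a container.
CGROUP_DIR = Path("/sys/fs/cgroup")


def read_cgroup_value(path: Path) -> Optional[int]:
    """Return the integer stored in a cgroup file.

    Returns None if the file is missing, unreadable, or contains "max"
    (which cgroup v2 uses to mean "no limit").
    """
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    if text == "max":
        return None
    try:
        return int(text)
    except ValueError:
        return None


def memory_headroom(cgroup_dir: Path = CGROUP_DIR) -> Optional[float]:
    """Fraction of the container memory limit still unused, or None if unknown."""
    limit = read_cgroup_value(cgroup_dir / "memory.max")
    used = read_cgroup_value(cgroup_dir / "memory.current")
    if limit is None or used is None or limit <= 0:
        return None
    return max(0.0, 1.0 - used / limit)


def should_shed_load(headroom: Optional[float], min_headroom: float = 0.15) -> bool:
    """Hypothetical policy: refuse new work when less than min_headroom remains."""
    return headroom is not None and headroom < min_headroom
```

A web worker could call `should_shed_load(memory_headroom())` before accepting a large request and return an error or defer the work instead of risking a kill. This only softens the problem; it doesn't remove the need for headroom.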

CPython also has a giant lock, the Global Interpreter Lock (GIL), which allows only one thread at a time to execute Python bytecode and so impedes its multi-threading capabilities (Python GIL / Threading). Managing high request rates in a web application may therefore present some challenges. You may think this is easily resolved by using some type of messaging system (maybe Kafka, MQ, etc.), but if the client libraries don't work around the inherent issue then you end up with problems that may be hard to diagnose and even harder to fix.
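To see the effect of the lock for yourself, here is a small sketch that runs the same CPU-bound work on a thread pool and on a process pool. It uses only the standard library; `count_primes`, `run_batch`, and `timed` are names I made up for the illustration, and the actual timings will depend on your machine and Python build.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Tuple, Type


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_batch(executor_cls: Type[Executor], limits: List[int], workers: int = 4) -> List[int]:
    """Run count_primes for each limit on the given kind of pool."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def timed(executor_cls: Type[Executor], limits: List[int]) -> Tuple[List[int], float]:
    """Return the results plus the wall-clock seconds the batch took."""
    start = time.perf_counter()
    results = run_batch(executor_cls, limits)
    return results, time.perf_counter() - start
```

Comparing `timed(ThreadPoolExecutor, [200_000] * 4)` with `timed(ProcessPoolExecutor, [200_000] * 4)` on a multi-core machine should show the difference: on a standard CPython build I would expect the threaded run to take about as long as running the work serially, while the process pool can use multiple cores. When running the process pool from a script, put the call under an `if __name__ == "__main__":` guard. Processes come with their own memory cost, which ties back to the container headroom issue above.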

If you are using a standard interpreted version of Python then you don't really need to worry about build time for your own code, but you may need to deal with the time required to build any native dependencies. Many libraries that work with Python are implemented using native libraries which the Python runtime loads and uses. This often means you need C/C++ and / or other compilers to build the native libs. This can be slow and error-prone when you first start working with containers. Given time and effort, you can create a solid build process, but it brings the overall complexity of creating, containerizing and deploying a software system to a level similar to that of compiled languages.
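One way to gauge how exposed a project is to this: list which installed packages ship compiled extension modules. The sketch below uses the standard library's `importlib.metadata`; the helper names and the suffix list are my own assumptions. Note that a package showing up here only means it contains native code. Whether you actually need a compiler depends on whether a prebuilt wheel exists for your target platform.

```python
from importlib import metadata
from typing import Iterable, List

# File suffixes treated as "native code" for this illustration (an assumption).
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib", ".dll")


def has_native_files(paths: Iterable[str]) -> bool:
    """True if any of the given file paths looks like a compiled library."""
    return any(str(p).lower().endswith(NATIVE_SUFFIXES) for p in paths)


def native_distributions() -> List[str]:
    """Names of installed distributions that include compiled files."""
    names = set()
    for dist in metadata.distributions():
        # dist.files is None when the install record is not available.
        files = dist.files or []
        name = dist.metadata.get("Name")
        if name and has_native_files(str(f) for f in files):
            names.add(name)
    return sorted(names)
```

Running `print(native_distributions())` in the same environment you plan to containerize gives a rough list of the dependencies most likely to slow down or break an image build.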

Every popular language remains popular by changing enough to meet new needs and challenges in the software development ecosystem. In some cases, languages are backed by committees (C++) and others have a community/large organization as the primary backing (Java, .NET). For Python, Guido van Rossum held primary control for decades and did a wonderful job. He stepped down as the language's "benevolent dictator for life" in 2018, and an elected steering council now governs Python. The question is: now that Guido no longer has the final say, who or what controls the direction of the language over the long term, and will it diverge much from the direction he set?  One link regarding the "no future Python 4" path is: No Python 4.
If you invest heavily in Python and something changes significantly (and quickly) then it could be very costly to change. At the same time, if a language loses popularity after its creator moves on then you may also be in a bad state.

The above coverage of Python is mainly targeted at CPython. The analysis for a different runtime, such as Jython, would need an independent review.  Performance, backing/maintenance and implementation details can be very different.

Another common language stack is NodeJS / JavaScript. It is often touted as high performance and has a large ecosystem of 3rd party libraries.  What aspects of this might affect the consideration for some software system?

Two items that come up regularly for me are:
  • The maturity of the 3rd party ecosystem. 
  • The expected lifetime of a system.
The problem I tend to run into is that there are many 3rd party libraries but the quality differs drastically, and the long-term support/maintenance of some frameworks is lacking. If you have a system which you expect to maintain for 5-10 years then you probably want to base it on technology which gets continued incremental maintenance over time.  Relying on unmaintained libraries and frameworks when security is important seems like a poor bet. You also don't want to do massive rewrites of a system every year because some new "great idea" arrived.  I've run into situations where core libraries used in NodeJS applications were abandoned by their authors in favor of completely different solutions.  I'd even agree with the authors' decisions to abandon something which was no longer a good fit, but it isn't a good place to be in if you heavily rely on that software in a large or complex system.

Another popular language is Java - what are some considerations for it?
  • Speed of system implementation
  • Business domain for new system
Java is a good general language but it isn't normally considered the first choice when "speed to market" is the most critical aspect. If you have a very limited time frame and no other requirements push you towards Java then another language may be more appropriate.  That is also true if your business domain is in a scientific area where Java has fewer 3rd party libraries available than a language such as Python.

And if you have requirements which cross a number of these items resulting in no "perfect for this use" selection then you have hard decisions to make. 

This post could go on for many more examples and languages but hopefully provides a useful analysis for some of the challenges involved.

Wishing you the best!
Scott