CS 530 - Advanced Software Engineering

DevOps and Code Management

Reference: Sommerville, Engineering Software Products, Chapter 9

 

DevOps

Traditionally, separate teams were responsible for software development, software release and software support. The development team passed a 'final' version of the software over to a release team, which then built a release version, tested it and prepared release documentation before releasing the software to customers. A third team was responsible for providing customer support. The original development team was sometimes also responsible for implementing software changes; alternatively, the software may have been maintained by a separate 'maintenance team'.

There are inevitable delays and overheads in the traditional support model. To speed up the release and support processes, an alternative approach called DevOps (Development+Operations) has been developed. Three factors led to the development and widespread adoption of DevOps:

  1. Agile software engineering reduced the development time for software, but the traditional release process introduced a bottleneck between development and deployment.
  2. Amazon re-engineered their software around services and introduced an approach in which a service was developed and supported by the same team. Amazon's claim that this led to significant improvements in reliability was widely publicized.
  3. It became possible to release software as a service, running on a public or private cloud. Software products did not have to be released to users on physical media or downloads.

DevOps principles

Benefits of DevOps

Code management

During the development of a software product, the development team will probably create tens of thousands of lines of code and automated tests, organized into hundreds of files. Dozens of libraries may be used, and several different programs may be involved in creating and running the code. Code management is a set of software-supported practices used to manage an evolving codebase. You need code management to ensure that changes made by different developers do not interfere with each other, and to create different product versions. Code management tools make it easy to create an executable product from its source code files and to run automated tests on that product.

Source code management, combined with automated system building, is essential for professional software engineering. In companies that use DevOps, a modern code management system is a fundamental requirement for 'automating everything'. Not only does it store the project code that is ultimately deployed, it also stores all other information that is used in DevOps processes. DevOps automation and measurement tools all interact with the code management system.

Code management systems provide a set of features that support four general areas:

  1. Code transfer. Developers take code into their personal file store to work on it, then return it to the shared code management system.
  2. Version storage and retrieval. Files may be stored in several different versions, and specific versions of these files can be retrieved.
  3. Merging and branching. Parallel development branches may be created for concurrent working, and the changes made in different branches may be merged.
  4. Version information. Information about the different versions maintained in the system may be stored and retrieved.

All source code management systems have a shared repository and a set of features to manage the files in that repository. All source code files and file versions are stored in the repository, as are other artefacts such as configuration files, build scripts, shared libraries and versions of tools used. The repository includes a database of information about the stored files such as version information, information about who has changed the files, what changes were made at what times, and so on. Files can be transferred to and from the repository and information about the different versions of files and their relationships may be updated. Specific versions of files and information about these versions can always be retrieved from the repository.

Features of code management systems

In 2005, Linus Torvalds, the developer of Linux, revolutionized source code management by developing a distributed version control system (DVCS) called Git to manage the code of the Linux kernel. This was geared to supporting large-scale open source development. It took advantage of the fact that storage costs had fallen to such an extent that most users did not have to be concerned with local storage management. Instead of only keeping the copies of the files that users are working on, Git maintains a clone of the repository on every user's computer.
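As a minimal sketch of what this means in practice, the following Python script drives the git command-line tool, which it assumes is installed; the repository URL is hypothetical. Because a clone contains the complete repository, version operations afterwards run entirely locally.

    import subprocess

    # Cloning copies the complete repository, including its full
    # history (the repository URL here is hypothetical).
    subprocess.run(["git", "clone", "https://example.com/project.git"], check=True)

    # Version operations such as viewing the log now run entirely on
    # the local clone - no contact with the server is needed.
    subprocess.run(["git", "log", "--oneline"], cwd="project", check=True)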

Benefits of distributed code management

Branching and merging are fundamental ideas that are supported by all code management systems. A branch is an independent, stand-alone version that is created when a developer wishes to change a file. The changes made by developers in their own branches may be merged to create a new shared branch. The repository ensures that branch files that have been changed cannot overwrite repository files without a merge operation. For example, if two developers, Alice and Bob, make mistakes on the branches they are working on, they can easily revert to the master file. If they commit changes while working, they can revert to earlier versions of their work. When they have finished and tested their code, they can then update the master branch by merging in the work they have done.
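The following Python sketch automates this basic branch-commit-merge workflow by driving the git command-line tool. It assumes git is installed and that the default branch is called master; the branch name and commit message are invented for illustration.

    import subprocess

    def git(*args):
        """Run a git command, stopping on failure."""
        subprocess.run(["git", *args], check=True)

    # Create an independent branch so that changes cannot reach the
    # master branch without an explicit merge.
    git("checkout", "-b", "fix-login-bug")

    # ... edit files on the branch, then commit the changes ...
    git("add", "-A")
    git("commit", "-m", "Fix validation error in login form")

    # After the code has been tested, merge the branch into master.
    git("checkout", "master")
    git("merge", "fix-login-bug")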

DevOps automation

By using DevOps with automated support, you can dramatically reduce the time and costs of integration, deployment and delivery. 'Everything that can be automated should be automated' is a fundamental principle of DevOps. As well as reducing the costs and time required for integration, deployment and delivery, process automation makes these processes more reliable and reproducible. Automation information is encoded in scripts and system models that can be checked, reviewed, versioned and stored in the project repository.

Aspects of DevOps automation

System integration (system building) is the process of gathering all of the elements required in a working system, moving them into the right directories, and putting them together to create an operational system. Typical activities that are part of the system integration process include:

  1. Installing database software and setting up the database with the appropriate schema.
  2. Loading test data into the database.
  3. Compiling the files that make up the product.
  4. Linking the compiled code with the libraries and other components used.
  5. Checking that external services used by the product are operational.
  6. Deleting old configuration files and moving new configuration files to the right locations.
  7. Running a set of system tests to check that the integration has been successful.

Continuous integration simply means that an integrated version of the system is created and tested every time a change is pushed to the system's shared repository. On completion of the push operation, the repository sends a message to an integration server to build a new version of the product. The advantage of continuous integration over less frequent integration is that it is faster to find and fix bugs in the system. If you make a small change and some system tests then fail, the problem almost certainly lies in the new code that you have pushed to the project repository, so you can focus on this code to find the bug that's causing the problem.

In a continuous integration environment, developers have to make sure that they don't 'break the build'. Breaking the build means pushing code to the project repository which, when integrated, causes some of the system tests to fail. If this happens to you, your priority should be to discover and fix the problem so that normal development can continue. To avoid breaking the build, you should always adopt an 'integrate twice' approach to system integration. You should integrate and test on your own computer before pushing code to the project repository to trigger the integration server.
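As a sketch of this 'integrate twice' discipline, the following Python script runs the tests locally and only pushes if they pass. It assumes a project whose tests run under pytest and a shared branch named master; both are illustrative choices.

    import subprocess
    import sys

    def run(command):
        """Run a command, echoing it first; return True on success."""
        print("+", " ".join(command))
        return subprocess.run(command).returncode == 0

    # First integration: build and test on your own computer.
    if not run([sys.executable, "-m", "pytest"]):
        sys.exit("Tests failed locally - fix them before pushing.")

    # Second integration: push, which triggers the integration server
    # to rebuild and retest the fully integrated system.
    if not run(["git", "push", "origin", "master"]):
        sys.exit("Push failed.")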

Continuous integration is only effective if the integration process is fast and developers do not have to wait for the results of their tests of the integrated system. However, some activities in the build process, such as populating a database or compiling hundreds of system files, are inherently slow. It is therefore essential to have an automated build process that minimizes the time spent on these activities. Fast system building is achieved using a process of incremental building, where only those parts of the system that have been changed are rebuilt.
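The following Python sketch shows the core idea of incremental building, which is the same dependency check that tools such as make perform: a target is rebuilt only if it is missing or older than one of its source files. The file names and compile command are hypothetical.

    import os
    import subprocess

    def needs_rebuild(target, sources):
        """True if the target is missing or older than any source."""
        if not os.path.exists(target):
            return True
        target_time = os.path.getmtime(target)
        return any(os.path.getmtime(src) > target_time for src in sources)

    def build(target, sources, command):
        """Rebuild the target only if one of its sources has changed."""
        if needs_rebuild(target, sources):
            subprocess.run(command, check=True)
        else:
            print(f"{target} is up to date - skipping rebuild.")

    # Example: recompile one file of a system only when it has changed
    # (file names and compiler are hypothetical).
    build("app.o", ["app.c", "app.h"], ["cc", "-c", "app.c", "-o", "app.o"])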

Continuous integration means creating an executable version of a software system whenever a change is made to the repository. The CI tool builds the system and runs tests on your development computer or on the project's integration server. However, the real environment in which software runs will inevitably differ from your development system. When your software runs in its real, operational environment, bugs may be revealed that did not show up in the test environment. Continuous delivery means that, after making changes to a system, you ensure that the changed system is ready for delivery to customers. This means that you have to test it in a production environment to make sure that environmental factors do not cause system failures or slow down its performance.
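One possible sketch of such a delivery pipeline is shown below. It assumes the Docker command-line tool is available; the image name and the 'run-tests' command inside the image are invented for illustration. The change is tested on the build machine, packaged into an image that replicates the production environment, and then tested again inside that image.

    import subprocess

    STAGES = [
        ["python", "-m", "pytest"],                                  # test on the build machine
        ["docker", "build", "-t", "product:candidate", "."],         # build a production-like image
        ["docker", "run", "--rm", "product:candidate", "run-tests"], # re-test inside the image
    ]

    for stage in STAGES:
        if subprocess.run(stage).returncode != 0:
            raise SystemExit(f"Pipeline failed at: {' '.join(stage)}")
    print("Change is ready for delivery.")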

Benefits of continuous deployment

In an enterprise environment, there are usually many different physical or virtual servers (web servers, database servers, file servers, etc.) that do different things. These have different configurations and run different software packages, so it is difficult to keep track of the software installed on each machine. The idea of infrastructure as code was proposed as a way to address this problem. Rather than system administrators manually updating the software on a company's servers, the process can be automated using a model of the infrastructure written in a machine-processable language. Configuration management (CM) tools such as Puppet and Chef can automatically install software and services on servers according to the infrastructure definition.

Defining your infrastructure as code and using a configuration management system solves two key problems of continuous deployment:

  1. Your testing environment must be exactly the same as your deployment environment, so if you change the deployment environment, you have to mirror those changes in your testing environment.
  2. When you change a service, you have to be able to roll that change out to all of your servers quickly and reliably, and if a bug in the changed code affects the system's reliability, you have to be able to seamlessly roll back to the older system.

The business benefits of defining your infrastructure as code are lower costs of system management and lower risks of unexpected problems arising when infrastructure changes are implemented.
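The following Python sketch illustrates the idea, not any particular tool: the desired state of each server is declared as data, and an apply step installs only what is missing. All server and package names are invented, and the two functions are stubs standing in for what tools such as Puppet or Chef actually do.

    # Desired state of each server, declared as data.
    INFRASTRUCTURE = {
        "web-server-1": {"packages": ["nginx", "certbot"]},
        "db-server-1": {"packages": ["postgresql"]},
    }

    def installed_packages(server):
        """Stub: a real tool would query the server for its packages."""
        return set()

    def apply(server, desired):
        """Install whatever the declaration requires but is missing."""
        missing = set(desired["packages"]) - installed_packages(server)
        for package in sorted(missing):
            print(f"{server}: installing {package}")
            # a real tool would invoke the package manager here

    for server, desired in INFRASTRUCTURE.items():
        apply(server, desired)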

Characteristics of infrastructure as code

A container provides a stand-alone execution environment running on top of an operating system such as Linux. The software installed in a Docker container is specified using a Dockerfile, which is, essentially, a definition of your software infrastructure as code. You build an executable container image by processing the Dockerfile. Using containers makes it very simple to provide identical execution environments. For each type of server that you use, you define the environment that you need and build an image for execution. You can run an application container as a test system or as an operational system; there is no distinction between them. When you update your software, you rerun the image creation process to create a new image that includes the modified software. You can then start these images alongside the existing system and divert service requests to them.
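This update cycle might look like the following Python sketch, which drives the Docker command-line tool and assumes it is installed; the image name, ports and Dockerfile are hypothetical.

    import subprocess

    def docker(*args):
        subprocess.run(["docker", *args], check=True)

    # Rebuild the image after modifying the software; the Dockerfile in
    # the current directory is the infrastructure definition.
    docker("build", "-t", "product:2.1", ".")

    # Start the new image alongside the existing system on another port,
    # so that service requests can be diverted to it.
    docker("run", "-d", "--rm", "-p", "8081:8080", "--name", "product-2-1", "product:2.1")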

DevOps measurement

After you have adopted DevOps, you should try to continuously improve your DevOps process to achieve faster deployment of better-quality software. There are four types of software development measurement:

  1. Process measurement, where you collect and analyse data about your development, testing and deployment processes.
  2. Service measurement, where you collect data about the software's performance, reliability and acceptability to customers.
  3. Usage measurement, where you collect data about how customers use your product.
  4. Business success measurement, where you assess how the software contributes to the overall success of the business.

As far as possible, the DevOps principle of automating everything should be applied to software measurement. You should instrument your software to collect data about itself, and you should use a monitoring system to collect data about your software's performance and availability. Some process measurements can also be automated. However, process measurement is problematic because people are involved: they work in different ways, may record information differently, and are affected by outside influences.
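As an illustration of software that collects data about itself, here is a minimal Python sketch of instrumentation: a decorator that records the elapsed time of each call. In a real product these records would be shipped to a monitoring system rather than kept in a local list, and the request handler here is a hypothetical stand-in for product code.

    import functools
    import time

    metrics = []  # (function name, elapsed seconds) records

    def measured(func):
        """Record the elapsed time of every call to the function."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                metrics.append((func.__name__, time.perf_counter() - start))
        return wrapper

    @measured
    def handle_request(payload):
        """Hypothetical request handler standing in for product code."""
        return payload.upper()

    handle_request("order #42")
    print(metrics)  # e.g. [('handle_request', 1.2e-05)]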

Useful links