Effect of General Security Checklists on Secure Development Practices and Code Quality in Novice Developers

Abstract: This paper examines how security checklists impact the secure development practices and code quality in novice developers, especially within web application development using the Flask framework. In a controlled experiment, four university students were asked to implement a sleep tracking system using the Flask web microframework, then later asked to improve it by either using a short or a comprehensive security checklist that guided them in their development. This research studies how such checklists, informed by the current state of industry and academic standards, drive the identification and mitigation of common security vulnerabilities, such as XSS, SQL injection, and poor key management. Using automated and manual code reviews, this study assesses the efficiency of such checklists in improving both security and general code quality, and hence their potential value in academic and professional environments.

Authors: Arija A. <ari@ari.lt> & Simona R. <simona.ramanauskaite@vilniustech.lt> (co.)

Credits: https://researchschool.tech/ for making the ResearchSchool project happen.

Date: 2025-01-24 (January 24th, 2025)

Last modified: 2025-01-26 (January 26th, 2025)

Table of Contents

  • § License
  • § Introduction
    • § Structure of the Project
  • § Background
    • § Web Application Security Challenges
    • § The Role of Security Checklists
    • § The Flask Framework and Its Security Considerations
    • § Research Motivation
  • § Objectives
    • § Literature Review
    • § Experiment Design
    • § Results Analysis
    • § Documentation
  • § Methodology
    • § Research Stages
    • § Manual Analysis Criteria
    • § Study Group Composition
      • § Short Checklist
      • § Extended Checklist
    • § Task
    • § Methodological Rigor
  • § Results
    • § Baseline
    • § Quantitative Results
    • § Automatic Code Analyzers' Results
    • § Manual Code Review
  • § Analysis of the Results
    • § Key Findings
  • § Recommendations
  • § Conclusion
  • § Future Research
  • § Literature and Citations

License

Copyright (C) 2025 Arija A. <ari@ari.lt>

This work is licensed under the GNU Affero General Public License (AGPL) version 3.0 or later. You may obtain a copy of the license at:

https://www.gnu.org/licenses/agpl-3.0.html

This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

If you modify this work and distribute it, you must include this notice along with the modified work and provide a copy of the AGPL license.

Introduction

The integration of security best practices into software and library development is becoming increasingly important as we head towards a more digitised world. As cyber threats continue to evolve and affect ever more important sectors of our society, ensuring that applications are secure from the outset is no longer optional but a necessity. This paper explores how security can be integrated into the development process of software and libraries, focusing on web applications, on which much of the modern world and e-commerce runs. It examines whether a set of guidelines and checklists, developed from current industry standards and academic literature, substantially improves the security and quality of web applications written by novice developers.

This research is based on a controlled experiment in which university students were asked to develop a sleep tracking system using Flask. By applying two different security checklists, a short, focused one and a more comprehensive one, the study evaluates how well such tools address critical security vulnerabilities such as XSS, SQL injection, and weak key management. We review in detail the manual and automated code reviews carried out on the students' projects, using tools such as Pyright, Pylint, Bandit, and OWASP ZAP together with a manual security audit. This paper reflects on the findings and discusses their implications for the security and quality of code produced in academic and professional environments alike.

In the next sections, we present the methodology of the research, describe the security checklist design, and then analyze the results of the experiment to see how these findings can be used to drive best practices in (web) application development both in academia and industry.

Structure of the Project

A flow chart demonstrating the timeline of this research: Literature analysis, checklist creation, system development, testing, improvements, more testing, results discussion, and ending with a results analysis.

Background

In the modern world, where everything is going digital and major parts of our society depend on technology and critical software, secure development is essential to avoiding critical data exposure or the failure of important systems. Moreover, high code quality is important to enable these systems to evolve and adapt over time, rather than becoming unmanageable legacy code that requires much effort to change. We therefore joined the ResearchSchool project to investigate how security checklists affect secure programming practices and code quality in general. By systematically adapting the development process and following proper security practices, developers can write more secure and maintainable applications.

Web Application Security Challenges

Web applications, due to their openness and networked nature, are vulnerable to a wide range of attacks. Among the most common are XSS, SQL injection, CSRF, and authentication flaws. These vulnerabilities, if left unaddressed, can lead to undesirable consequences including data breaches or data loss, system compromise (client- or server-side), and loss of trust. The risks are heightened in applications developed by novice programmers, who may not have sufficient awareness of or training in secure coding practices. It is therefore imperative that all developers, regardless of experience, stay up to date with security standards and continuously integrate security into their development processes.

In response to these challenges, various organizations and standards bodies have developed security frameworks and guidelines that help mitigate these risks. Among the best known is the OWASP Top 10, a list of the most critical and common web application security risks that developers should know about and avoid during development to prevent future compromises. Nevertheless, even with such recommendations, research has shown that many developers still struggle to implement adequate security in their code, highlighting the difficulty of writing secure code and implementing application features in a secure manner.

The Role of Security Checklists

Security checklists are significant in that they bridge the theoretical and practical aspects of secure software development. Typically, a security checklist comprises a set of guidelines for minimizing the threats identified in an application's threat model before, during, and after the implementation of new features. This proactive approach should target not only code security but also code quality, so that developers write code that is both secure and maintainable. Such maintainability is important for enabling future changes and enhancements to the code.

A good security checklist should be complete and detailed, and it should evolve over time as features are added or removed. It should cover possible threats, secure coding, configuration management, encryption strategies, and other measures that minimize the application's attack surface. By systematically covering these threats, security checklists can greatly help in enhancing the general security posture of software, libraries, and web applications.

In both academic and industrial settings, security checklists have been used to give developers a well-structured approach to addressing the most common vulnerabilities. However, it is still not clear whether such checklists really improve the security of the code without affecting other aspects such as quality, performance, and maintainability. Checklists can also be hard for inexperienced developers to apply, especially when secure practices complicate the codebase. This calls for a balance between stringent security and the practicality of keeping code maintainable and efficient.

The Flask Framework and Its Security Considerations

Flask, a lightweight Python web framework in the microframework category, is often used in academic settings to teach web development due to its simplicity and flexibility. While Flask provides many built-in features to aid development, it also presents security challenges, particularly for novice developers. Flask does not automatically enforce secure coding practices, leaving developers responsible for implementing features like input validation, secure password hashing, and protection against common attack vectors. This makes Flask an ideal candidate for studying the effectiveness of security checklists, as it requires developers to be mindful of security concerns from the very start.
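
To illustrate one of the responsibilities mentioned above, the following is a minimal sketch of salted password hashing using only the Python standard library. It is not taken from the students' submissions; the function names and the scrypt cost parameters are assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

# Cost parameters are illustrative assumptions, not values from the study.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plain password is never stored."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P
    )
    return hmac.compare_digest(candidate, expected)
```

In a registration handler, the salt and digest would be stored in place of the password, and the login handler would call `verify_password` with the stored values.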

In the context of this research, Flask was chosen as the framework for the students' web applications due to its popularity in such settings and its potential for exposing security vulnerabilities when not used properly. By evaluating the impact of security checklists on Flask-based applications, this research aims to determine the effectiveness of these checklists in improving both the security and the quality of students' code.

Research Motivation

This research is motivated by the increasing relevance of security in the development of web applications and the recognition that security vulnerabilities, if not addressed early in the development process, can have long-lasting and severe consequences. The application of security checklists in this research is aimed at providing insight into how novice developers can be guided to write more secure code while paying attention to code quality. Additionally, this research covers security versus code complexity trade-offs, which have important lessons for educators and industry practitioners who are seeking to incorporate security best practices into their software development processes.

Objectives

The primary aim of this study is to investigate, as the title suggests, the impact of general security checklists on secure development practices and code quality in novice developers. To achieve this objective, the following steps were undertaken:

Literature Review

A literature review on secure development practices and code review was conducted. This laid out the foundational work that was used to inform the subsequent phases of this research.

Our literature review started with an analysis of Simona R.'s presentation, "Kodėl būtinas atsargumas naršant internete?" ("Why is caution necessary when surfing the internet?"). This source gave an overview of common security threats such as XSS and SQL injection, and provided a conceptual framework for analyzing these kinds of vulnerabilities in the context of attacks. Building on this basic knowledge, we went on to study other secure coding practices and methodologies.

In composing a comprehensive checklist for secure software development, several references were consulted. General principles of secure coding were drawn from the OWASP Developer Guide, which covers standards and best practices for software security across the development cycle, and were supported by NIST Special Publication 800-218 (Secure Software Development Framework version 1.1). Further, NIST FIPS 140-2 set out key security requirements for cryptographic modules, while the SANS Institute's SANS-389 document provided practical approaches to secure coding. Together, these provided a broad understanding of how to identify, review, and mitigate security risks in software systems.

At this point, we also surveyed vulnerabilities in widely used software applications to understand their root causes and implications. While this provided helpful background on how vulnerabilities develop, it was less directly applicable to our research questions. Similarly, we drew on knowledge of system administration to better inform our study of managing systems securely. Again, this knowledge was generally useful rather than directly relevant to our research focus, but it still helped guide us through the landscape of vulnerability research.

Experiment Design

An experimental approach was employed, wherein students were tasked to create a web application using Flask. The experiment involved several key components:

  1. Formation of the student groups: Students were assigned to test groups but worked individually, which allowed a more varied set of implementations to be studied.
  2. Development of the security checklists: Two checklists, a short and a more comprehensive one, were created to improve security and quality of code if followed correctly.
  3. Research task creation: Students received a set of 10 bullet points outlining the desired functionalities of the web application, while implementation specifics were left to their discretion.
  4. Code testing: The students' code underwent static, dynamic, and experimental testing to evaluate its security, quality, and functionality.

Results Analysis

Following the completion of the experiment, the results were analyzed through the following steps:

  1. Comparison of pre- and post-checklist code: A thorough review of the students' code before and after they were provided with the security checklist was conducted.
  2. Statistical compilation: The findings were compiled into statistical data to quantify the effects of the checklist on secure development practices and code quality.
  3. Conclusions: Based on the analysis, conclusions were drawn regarding the effectiveness of security checklists in enhancing secure development and code quality among novice developers.

Documentation

Finally, all statistical findings and conclusions were compiled in detail into this research paper.

Methodology

This research employs a mixed-methods approach to analyzing students' programming practices, combining manual and automated analysis. The automated tools used were Pyright and Pylint (for software quality assurance), Bandit (for static application security testing), and OWASP ZAP (for dynamic application security testing), which helped identify probable issues within the code. Greater emphasis, however, was placed on manual analysis in order to derive more nuanced, context-aware insights from the submitted code, because manual review can capture subtle "code smells" and security vulnerabilities that automated tools may miss.

Research Stages

The research was broadly conducted in two stages:

  1. In the initial phase, students were tasked with creating a web application using the Flask web microframework. Importantly, they received no guidance regarding implementation details or best practices, allowing for an authentic assessment of their coding approaches and decision-making processes in development.
  2. After the students had completed their web applications (sleep tracking systems), they were given one of two security checklists (depending on their assigned research group) to review and refine their code. This stage assessed how structured security guidance affected their coding, their awareness of vulnerabilities, and their ability to identify them.

Manual Analysis Criteria

The criteria for manual analysis focused on identifying various types of vulnerabilities, including but not limited to:

  • Static secret keys and passwords.
  • Lack of input validation.
  • XSS vulnerabilities.
  • Lack of CSP (Content Security Policy) and Anti-CSRF (Cross-Site Request Forgery) tokens.
  • Flaws in password storage and handling.
  • SQL injection.
  • Inadequate error handling.
  • Bad cryptographic practices.
  • Misconfiguration problems.

These criteria were systematically applied to ensure a comprehensive evaluation of the code quality and security practices of the students' submissions.
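
As an illustration of the first criterion, the sketch below contrasts a hard-coded secret with one read from the environment at start-up. The variable name `APP_SECRET_KEY` and both helper functions are hypothetical; in a Flask application the returned value would typically be assigned to `app.config["SECRET_KEY"]`.

```python
import os
import secrets

# What the manual review looked for (a static key committed to the repository):
#     SECRET_KEY = "changeme"


def load_secret_key(env_var: str = "APP_SECRET_KEY") -> str:
    """Read the signing key from the environment instead of the source code."""
    key = os.environ.get(env_var)
    if not key or len(key) < 32:
        # Refuse to start rather than fall back to a guessable default.
        raise RuntimeError(f"{env_var} must be set to a random value of 32+ characters")
    return key


def generate_secret_key() -> str:
    """Produce a fresh key that an operator could place in the environment."""
    return secrets.token_hex(32)  # 32 random bytes, 64 hex characters
```

Failing loudly when the key is missing is a design choice in this sketch: a silent fallback to a default value would reintroduce the static-key problem.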

Study Group Composition

The study involved four university students, divided into two independent research groups. Each student was assigned one of two distinct security checklists designed to guide their programming practices:

  1. Short checklist: This is a 32-item list, totaling 2.4 KB in size, and includes quick, concise directions to help prevent vulnerabilities. Each item lists a brief directive along with the corresponding vulnerability it helps prevent. Example: "Validate and sanitize all user inputs. (XSS, SQL injection)"
  2. Extended checklist: This is a 56-item checklist that stretches over 10 KB. It asks specific questions about the code the user has implemented and gives suggestions for trying to avoid problems. Each of the items also lists the potential vulnerabilities. Example: "Are all user inputs validated and sanitized both on the back-end and front-end? For instance, verify the usage of regex patterns for validation of email formats. (SQL injection, XSS)"

The short checklist was derived from the extended one, keeping the content minimal while retaining all the essential security considerations. The extended checklist is a detailed framework covering a wide spectrum of vulnerabilities, from out-of-bounds access to XSS attacks, applicable to front-end and back-end development as well as general administration practices.

Short Checklist

  • Validate and sanitize all user inputs. (XSS, SQL injection)
  • Use parameterized queries and prepared statements. (SQL injection)
  • Perform output encoding for dynamic content. (XSS)
  • Make use of Content Security Policy (CSP). (XSS, data injection)
  • Implement proper authorizations for user actions. (Authorization)
  • Enforce strong password policies. (Brute force attacks, account takeover)
  • Implement anti-CSRF tokens. (CSRF)
  • Set SameSite attribute for cookies. (CSRF)
  • Encrypt or hash sensitive data in transit and at rest. (Data breach)
  • Minimize storage of sensitive data. (Data exposure)
  • Audit and harden security configurations. (Misconfiguration, exploitation of default credentials)
  • Keep libraries and dependencies up-to-date. (Outdated library vulnerabilities)
  • Implement comprehensive logging. (Undetected attacks/vulnerabilities)
  • Configure alerts for suspicious activities. (Brute force attacks, unauthorized access)
  • Continuously monitor for vulnerabilities. (Undetected vulnerabilities)
  • Regularly back critical data up. (Data loss)
  • Test data backup processes. (Data loss)
  • Follow secure coding guidelines. (Various vulnerabilities)
  • Set permissions to least privilege wherever possible. (Privilege escalation)
  • Remove unused functionality before deployment. (Attack surface exposure)
  • Use peer-reviewed open source cryptographic modules. (Weak cryptography)
  • Protect cryptographic keys properly. (Key exposure)
  • Use strong cryptographic functions for handling of data from strong crypto systems. (Weak cryptography)
  • Restrict uploads/data to required file MIME types. (Malicious file upload)
  • Scan uploaded/sent files/data for malware. (Malicious file upload)
  • Authenticate users before allowing uploads. (Malicious file upload)
  • Disable execution privileges for upload directories. (Remote code execution)
  • Isolate development environments from production. (Data exposure, unauthorized access)
  • Apply security patches when possible. (Exploitation of known vulnerabilities)
  • Automate vulnerability scans. (New vulnerabilities)
  • Ensure that the reverse proxy and the firewall are correctly configured. (Information disclosure)
  • Ensure application security even if the source code is public. (Data exposure)
  • Pass security and linting checks with common tools. (Various vulnerabilities and style)
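
Two of the items above, parameterized queries and output encoding, can be sketched with the Python standard library alone. The table layout and function names below are assumptions made for illustration and do not come from the students' code.

```python
import html
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
conn.execute(
    "INSERT INTO users (email, bio) VALUES (?, ?)",
    ("a@example.com", "<script>alert(1)</script>"),
)


def find_user(email: str):
    """Look up a user; the ? placeholder keeps the input out of the SQL text."""
    return conn.execute(
        "SELECT email, bio FROM users WHERE email = ?", (email,)
    ).fetchone()


def render_bio(bio: str) -> str:
    """Encode user-supplied text before placing it in HTML."""
    return "<p>" + html.escape(bio) + "</p>"
```

With this approach, a classic injection string such as `' OR '1'='1` is treated as a literal email address and matches no row, and the stored script tag is rendered as inert text.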

Extended Checklist

  • Are all user inputs validated and sanitized both on the back-end and front-end? For instance, verify the usage of regex patterns for validation of email formats. (SQL injection, XSS)
  • Are database interactions done through parameterized queries? In SQL queries use prepared statements to avoid injections. (SQL injection)
  • Does output encoding take place at every instance of dynamic content? Make use of HTML encoding functions to escape the user-generated content so that it cannot cause XSS. (XSS)
  • Is CSP applied to limit script execution sources? Set a CSP header only allowing scripts from trusted domains. (XSS, data injection)
  • Are proper authorizations in place for every action taken by users? Always check user roles and session cookies before allowing any user or admin functionality. (Authorization flaws)
  • Are indirect object references applied wherever applicable? In URLs, use GUIDs or other unique identifiers instead of incremental IDs. (IDOR)
  • Does the application have anti-CSRF token? Create a unique token for each and every form submission which must be validated on server-side. (CSRF)
  • Does an application set the SameSite attribute for cookies? Set SameSite=Strict or SameSite=Lax for session cookies. (CSRF)
  • Is multi-factor authentication implemented? Require temporary TOTP code and/or verification email in order to avoid issuance of session tokens to unauthorized people. (Account takeover)
  • Strong password policies are in place and require passwords to be at least 18 characters in length, mixing letters, numbers, and symbols. (Brute force attack)
  • Sensitive data is encrypted in transit/rest. Use TLS for transport of data, use an extra layer of encryption for more sensitive data, never store passwords in plain text. (Data breach)
  • Sensitive data is minimized. Where possible, store only the last four digits of the credit card number and NOT the full number. (Data exposure)
  • Are the security configurations audited? Perform quarterly server configuration audits against security benchmarks. By default block all ports, check for rootkits, ensure remote access configuration is secure. (Misconfiguration, basic security)
  • Are default settings reviewed and hardened? Change default passwords and disable unnecessary services on servers. (Misconfiguration, Exploitation of default credentials)
  • Are the libraries and dependencies up-to-date? Automate dependency updates using Dependabot, etc. (various outdated library vulnerabilities)
  • Is there a process to continuously monitor vulnerabilities? Send vulnerability alerts by subscription or with services such as Snyk. (Undetected vulnerabilities)
  • Does it provide comprehensive logging, which includes capturing critical actions? Log all login attempts, including those failed, with timestamp and IP. Utilize software such as fail2ban to avoid mass brute-force attempts and block IP addresses. (Brute force attacks, Undetected attacks/vulnerabilities)
  • Does your application implement measures to prevent botting? Implement mechanisms like CAPTCHA to avoid spam and botting attempts. (Brute force attacks, data control)
  • Are there configured alerts for suspicious activities detected within logs? Configure alerts on multiple failed login attempts from the same IP address. (Brute force attack, unauthorized access)
  • Is user-supplied URL validated against a whitelist? Only allow URLs matching specific patterns or domains. (Open Redirect, SSRF)
  • Are outgoing requests monitored for abnormalities? Rate limiting on outgoing requests will help in identifying SSRF attacks. (SSRF)
  • Are third-party services checked for security compliance before integration? Review of the security policy and implication of third-party APIs before their use. (Third-party risks)
  • Are backups of critical data done regularly and tested for restoration? Schedule daily backups and conduct quarterly restoration tests. (Data loss)
  • Are secure coding guidelines followed? Ensure that your code uses generally accepted security practices and error checking. (Various vulnerabilities)
  • Is your application well-defined, with roles and responsibilities for each part? Ensure your application does not handle everything in one place which can lead to error-prone code. (Design flaws, code smells)
  • Are permissions set to least privilege? Restrict applications, processes, and service accounts to only what they need to function. (Privilege escalation)
  • Are elevated privileges managed appropriately? If the application needs to operate with increased privileges, increase them as late as possible and decrease them as soon as possible. (Privilege escalation)
  • Have you removed functionality not required before deploying? Make sure all features and files unused are removed in order to minimize attack surfaces. (Attack surface exposure)
  • Is your security configuration store human-readable? This allows for auditing and helps maintain the security configurations transparently. (Misconfiguration)
  • Are the development environments isolated from the production? Provide access only to the authorized development and test groups to minimize the risks. (Data exposure, unauthorized access)
  • Do you have a software change control system? This is going to manage and track the changes of the code within development and production. (Change management issues)
  • Is your cryptographic module peer-reviewed open source? Using trusted cryptography increases the level of security. (Weak cryptography)
  • Are crypto-functions done on a trusted system? Ensure all cryptography operations are performed in a trusted environment. (Weak cryptography)
  • Do your cryptographic modules fail securely? They shall be designed to prevent access by an unauthorized party during failure. (Data exposure)
  • Are random elements created using approved methods? Use a cryptographic random number generator for all random data generation. (Predictable random values, cryptographic insecurity)
  • Are your cryptographic modules in compliance with current standards, such as FIPS 140-2? Compliance means your cryptography meets recognized security benchmarks. (Weak Cryptography)
  • Is there a policy in place for key management? Clearly detail how keys will be generated, stored, rotated, and destroyed. (Key Management Vulnerabilities)
  • Are secret keys properly protected from unauthorized access? Store keys in a secure vault or hardware security module. (Key Exposure)
  • Are uploads restricted to required file types? Only allow specific types of files and perform a file header check rather than relying on extensions. (Malicious file upload)
  • Does your application authenticate a user before uploading? Prevents an attacker from uploading a malicious file that they might subsequently access or execute. (Malicious file upload)
  • Are execution privileges disabled for upload directories? This minimizes the risk of uploaded scripts being executed. (Remote code execution)
  • Do you allow-list file names and types? Use an allow-list to reference existing files to avoid unauthorized access or execution of files. (Malicious file upload)
  • Does the client not see the absolute file path? Never disclose internal paths; application files should be read-only by the client. (Information disclosure)
  • Are the uploaded user files scanned for malware? Scanning can be put in place for the detection and mitigation of potential threats from uploaded content. (Malicious file upload)
  • Have you classified all the data sources? Classification of data as trusted and untrusted helps a great deal in carrying out efficient input validation. (Input validation issues)
  • Is each input data from an untrusted source validated? Make sure the input string corresponds to expected formats and constraints to avoid injection attacks. (Injection attacks)
  • Is canonicalization used while validating inputs? This will make sure that the format of input provided is standardized to help defeat obfuscation attacks. (Injection attacks)
  • Are threat models made as part of design? It involves identification of potential threats and vulnerabilities at an early phase in a project. (Understanding your threat model)
  • Are security patches to software and systems applied in a timely manner? Devise a regular schedule to apply updates and patches within a week of the release. (Exploitation of known vulnerabilities)
  • Is the security software installed on every device? Ensure each has antivirus and anti-malware solutions installed and current, such as ClamAV. Firewalls and log monitoring should be installed. (Malware infection)
  • Are scans for vulnerabilities and abnormalities automated on a regular schedule? Perform monthly scanning of applications and systems for the opening of new vulnerabilities. Access to log files and usage of unusual resources. (New vulnerabilities)
  • Is reverse proxy correctly configured, is the reverse proxied application correctly configured to run in a Production environment? Ensure application is in production mode over debug mode so it will stop leaking sensitive information along with debug logs. (Information disclosure)
  • Would your application be secure even if everything, but the secret keys protecting the data and application, were public? Make sure your application, even if it was open source, is secure, no matter how open the source is. (Data exposure)
  • Does your application pass security and linting checks of common sanity checkers and linters such as pylint, pyright, pyflakes, bandit, and safety? Such tools are useful to ensure your code complies to utmost security standards. (Various vulnerabilities and style)
  • Does your testing framework implement static and dynamic code analysis? Ensure better security by using various types of code analysis. (Testing)
  • Does your application ensure that buffers are protected, boundaries are checked, and buffer overflows are catchable? If not, implement ways to secure allocated buffers and add boundary and limit checks. (Data handling, exposure, and overflow)
  • Are cryptographically secure keys used for access of systems over passwords? Ensure security and convenience by using keys over passwords. (Secure access)
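
The anti-CSRF item above (a unique token per form submission, validated on the server side) could be sketched as follows. The plain `dict` stands in for a server-side session object, and both function names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Create a random token, remember it server-side, and return it for the form."""
    token = secrets.token_urlsafe(32)  # from a cryptographic random number generator
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session: dict, submitted: str) -> bool:
    """Accept a submission only once, and only with the token issued for this session."""
    expected = session.pop("csrf_token", None)  # single use: removed on first check
    if expected is None or not submitted:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

The token would be embedded in a hidden form field when the page is rendered and checked before any state-changing request is processed.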

Task

Students were asked to create a web system for sleep tracking using Python and the Flask framework that performs the following functionality:

  1. Registration: Register users through a form containing email, password, user description, and a checkbox that confirms the acceptance of system policies. The data filled in this form should be stored in a database while granting regular user rights upon registration.
  2. Login: Create user login with a landing page which then shall ask for email and password. If the credentials match with the database, then login and redirect the user to the dashboard.
  3. Dashboard: Create a dashboard with menu items: logout, add record, view statistics, search my results, and system users. This last menu will show up only for admin users.
  4. Logout: Allow the logging out of the user. It should take the user to the landing page upon selection.
  5. Record addition: Design a form for users to input their sleep record, including date, hours slept, dreaming or not, and the quality level of sleep. Store such information in the system.
  6. Statistics: Create the functionality for viewing user-specific and total average sleep statistics.
  7. Search: Provide a search facility that returns a user's sleep results for a given date and the nine preceding days.
  8. Administrators: Admin users should be able to view other users and to delete or promote them through the delete and promote user options.
  9. Delete user: System shall remove users from the database after clicking delete.
  10. Promotion: Provide functionality for promoting users to admin which should update the rights in the database for this user.

Each functionality may have vulnerabilities due to implementation details. When the students were first tasked with creating this web application, we left it to them to assess the security of their implementations. Later, in Stage II, they were supplied with one of the two general security checklists to guide their refinement.

Methodological Rigor

To minimize methodological variability in the study's results, each participant received only one type of checklist, without additional information other than that contained in the checklists. This was done by design to minimize exogenous variables.

Results

This section presents the findings from both stages of the research, which involved contributions from four students.

Baseline

  • 4 independent students.
  • 2 security checklists - 2 study groups.
  • Groups:
    • A, B - short checklist.
    • C, D - extensive checklist.

Quantitative Results

Together, the students produced a total of 1,765 lines of Jinja2 markup, 1,682 lines of Python code (Flask back-end), and 316 lines of CSS. The combined output from both stages amounted to approximately 404KB of content, as measured using the du command in Linux.

In the first stage, the students generated 877 lines of HTML, 711 lines of Python, and 158 lines of CSS, resulting in approximately 184KB of content. The second stage slightly exceeded these figures, with 888 lines of HTML, 971 lines of Python, and 158 lines of CSS, culminating in approximately 220KB of content.

The growth ratios across the two stages reveal a modest increase of 0.6% for HTML and a more significant rise of 15.5% for Python, while CSS remained unchanged with a growth ratio of 0%. Overall, the projects' size increased by 22.2%.

All growth ratios were calculated using the following formula, where a is the Stage I value and b is the Stage II value:

(b - a) / (b + a)
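
As a quick check, the formula can be applied to the line counts reported above. This is a minimal sketch for illustration only; the function name `growth_ratio` is ours and does not come from the study's tooling.

```python
def growth_ratio(a: float, b: float) -> float:
    """Return (b - a) / (b + a), the growth measure used in this paper."""
    if a + b == 0:
        raise ValueError("a + b must be non-zero")
    return (b - a) / (b + a)


# Line counts per language as reported above: (Stage I, Stage II).
line_counts = {
    "HTML": (877, 888),
    "Python": (711, 971),
    "CSS": (158, 158),
}

for language, (stage_1, stage_2) in line_counts.items():
    print(f"{language}: {growth_ratio(stage_1, stage_2):.1%}")
```

Note that this measure divides by the sum of both stages rather than by the Stage I value alone, so it yields smaller percentages than a conventional relative increase, (b - a) / a, would.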

Automatic Code Analyzers' Results

Tools used:

  1. Pyright in strict type checking mode: Pyright's strict mode activates the most comprehensive type-checking rules, enforcing rigorous type inference and diagnostics to ensure code quality and reduce potential runtime errors in Python applications.
  2. Pylint: a static code analysis tool for Python that evaluates code for errors, enforces coding standards, and identifies code smells, providing suggestions for refactoring without executing the code.
  3. Bandit: a security-focused tool that scans Python code to identify common vulnerabilities and security issues, helping developers improve the security posture of their applications.
  4. OWASP ZAP's automated scan: an open-source web application security scanner designed to dynamically find security vulnerabilities in web applications during development and testing phases.

Versions:

  1. Python: 3.13.1
  2. Pyright: 1.1.391
  3. Pylint: 3.3.3
  4. Bandit: 1.8.2
  5. OWASP ZAP: 2.15.0

Results:

  • Stage I
    1. Pyright (typeCheckingMode=strict)
      1. A: 65 errors, 0 warnings, 0 informations
      2. B: 33 errors, 0 warnings, 0 informations
      3. C: 72 errors, 0 warnings, 0 informations
      4. D: 65 errors, 0 warnings, 0 informations
      • Average: 58.75 errors per project
    2. Pylint
      1. A: 8.17/10
      2. B: 7.61/10
      3. C: 6.59/10
      4. D: 5.87/10
      • Average: 7.06/10
    3. Bandit
      1. A: 0 undefined, 2 low, 0 medium, 1 high
      2. B: 0 undefined, 1 low, 0 medium, 1 high
      3. C: 0 undefined, 1 low, 7 medium, 1 high
      4. D: 0 undefined, 2 low, 0 medium, 1 high
      • Average: 0 undefined, 2 low, 2 medium, 1 high
    4. OWASP ZAP (automated scan)
      1. A: 3 medium, 4 low, 4 informational
      2. B: 2 medium, 2 low
      3. C: 3 medium, 3 low, 3 informational
      4. D: 4 medium, 3 low, 4 informational
      • Average: 3 medium, 3 low, 3 informational
  • Stage II
    1. Pyright (typeCheckingMode=strict)
      1. A: 69 errors, 0 warnings, 0 informations
      2. B: 59 errors, 0 warnings, 0 informations
      3. C: 96 errors, 0 warnings, 0 informations
      4. D: 107 errors, 0 warnings, 0 informations
      • Average: 82.75 errors per project
    2. Pylint
      1. A: 8.00/10
      2. B: 6.47/10
      3. C: 7.70/10
      4. D: 4.98/10
      • Average: 6.79/10
    3. Bandit
      1. A: 2 low, 0 medium, 0 high
      2. B: 1 low, 0 medium, 0 high
      3. C: 1 low, 0 medium, 0 high
      4. D: 1 low, 0 medium, 1 high
      • Average: 1 low, 0 medium, 0 high
    4. OWASP ZAP (automated scan)
      1. A: 4 medium, 4 low, 5 informational
      2. B: 5 medium, 4 low, 7 informational
      3. C: 5 medium, 7 low, 7 informational
      4. D: 5 medium, 7 low, 7 informational
      • Average: 5 medium, 5 low, 7 informational
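
The per-stage averages for Pyright and Pylint can be recomputed from the per-project figures listed above. The sketch below only re-derives these numbers with the standard library; the variable names are ours.

```python
from statistics import mean

# Per-project results for A, B, C, and D, copied from the lists above.
pyright_errors = {
    "Stage I": [65, 33, 72, 65],
    "Stage II": [69, 59, 96, 107],
}
pylint_scores = {
    "Stage I": [8.17, 7.61, 6.59, 5.87],
    "Stage II": [8.00, 6.47, 7.70, 4.98],
}

for stage in pyright_errors:
    print(
        f"{stage}: {mean(pyright_errors[stage]):.2f} Pyright errors, "
        f"{mean(pylint_scores[stage]):.2f}/10 Pylint"
    )
```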

Manual Code Review

  • Stage I
    1. A
      • Weaknesses
        • Static and exposed credentials as well as keys. Violates the open security model.
        • Lack of input validation.
        • Non-reuse of connection leads to worsened performance.
        • Lack of type hints and documentation leads to worse maintainability and clarity.
        • Lack of type hints also leads to more error prone code.
        • Lack of early-exit, trap conditions.
        • Does not escape template literals in templates, making it vulnerable to XSS.
        • Storing of the user role (admin status) in the session rather than fetching from the database/hybrid approach.
        • Does not make use of Anti-CSRF tokens.
        • Depends on the client for management of the login. Once the session is leaked, there is no way to revoke it.
        • Improperly indicates user errors by not returning the proper return code.
        • A minimal Flask configuration which can be hardened.
        • Minimal, inadequate error handling.
        • debug=True in production.
      • Strengths
        • Modularity by using blueprints.
        • Separated out database connection logic.
        • Makes use of secure password hashing and storage.
        • Uses dynamic redirects (url_for) over static in-place URLs.
        • Uses built-in SQL functions like AVG() over manually calculating the average, allowing it to be executed in C.
        • Uses parameterised SQL queries, mitigating SQL injection.
    2. B
      • Weaknesses
        • Static and exposed credentials as well as keys. Violates the open security model.
        • Lack of input validation.
        • Lack of type hints and documentation leads to worse maintainability and clarity.
        • Lack of type hints also leads to more error prone code.
        • Lack of early-exit, trap conditions.
        • Does not escape template literals in templates, making it vulnerable to XSS.
        • Storing of the user role (admin status) in the session rather than fetching from the database/hybrid approach.
        • Does not make use of Anti-CSRF tokens.
        • Stores the passwords in plain-text in the database over using a secure hashing function.
        • Depends on the client for management of the login. Once the session is leaked, there is no way to revoke it.
        • A minimal Flask configuration which can be hardened.
        • Minimal, inadequate error handling.
        • debug=True in production.
      • Strengths
        • The database connection is being reused.
        • The user errors are properly accompanied by the correct HTTP code.
        • Uses dynamic redirects (url_for) over static in-place URLs.
        • More appropriately used short-circuit statements.
        • Uses parameterised SQL queries, mitigating SQL injection.
    3. C
      • Weaknesses
        • Static secret key of the Flask application.
        • Lack of input validation.
        • Non-reuse of connection leads to worsened performance.
        • Lack of type hints and documentation leads to worse maintainability and clarity.
        • Lack of type hints also leads to more error prone code.
        • Does not escape template literals in templates, making it vulnerable to XSS.
        • Storing of the user role (admin status) in the session rather than fetching from the database/hybrid approach.
        • Does not make use of Anti-CSRF tokens.
        • A minimal Flask configuration which can be hardened.
        • Stores the passwords as plain-text in the database.
        • Depends on the client for management of the login. Once the session is leaked, there is no way to revoke it.
        • The SQL queries executed are templated using format strings, leaving it vulnerable to SQL injection.
        • The database connection is not being reused.
        • No error handling.
        • debug=True in production.
      • Strengths
        • The credentials are separated into config.py, which we can deem secret, although they remain static and exposed, which somewhat violates the open security model. Environment variables would be better.
        • Uses dynamic redirects (url_for) over static in-place URLs.
        • More appropriately used short-circuit statements.
    4. D
      • Weaknesses
        • Static and exposed credentials as well as keys. Violates the open security model.
        • Lack of input validation.
        • Lack of type hints and documentation leads to worse maintainability and clarity.
        • Lack of type hints also leads to more error prone code.
        • Storing of the user role (admin status) in the session rather than fetching from the database/hybrid approach.
        • Does not make use of Anti-CSRF tokens.
        • Depends on the client for management of the login. Once the session is leaked, there is no way to revoke it.
        • Calls to 3rd-party libraries without mirroring them, which can lead to a compromise.
        • A minimal Flask configuration which can be hardened.
        • Uses MD5, a deprecated and considered insecure hashing function.
        • debug=True in production.
      • Strengths
        • The database connection is being reused.
        • The user errors are properly accompanied by the correct HTTP code.
        • Uses dynamic redirects (url_for) over static in-place URLs.
        • More appropriately used short-circuit statements.
        • Uses parameterised SQL queries, mitigating SQL injection.
  • Stage II (improvements only)
    1. A
      • Added user input validation.
      • Began making better use of early-exit conditions.
      • Templates now correctly encode the dynamic parts of the templates, alleviating the risk of XSS.
      • Added better, more comprehensive error handling.
      1. Overall improvement: Minimal. The checklist was reviewed inadequately, although there are some improvements.
      2. Checklist elements applied: 7/32
    2. B
      • Moved the database credentials into a separate hashmap that can be filled in programmatically rather than manually. The application secret key is still static.
      • Added comprehensive user input validation.
      • Began making better use of early-exit conditions.
      • Templates now correctly encode the dynamic parts of the templates, alleviating the risk of XSS.
      • Added CSP and Anti-CSRF tokens to the application.
      • Added support for secure hashing algorithms; only the password hash is now stored in the database.
      • Added much more comprehensive error handling.
      • Removed debug mode in production.
      1. Overall improvement: Great. Checked every point and clearly put in effort to improve the application, resulting in better security.
      2. Checklist elements applied: 26/32
    3. C
      • The credentials are separated into config.py, which we can deem secret, although they remain static and exposed, which somewhat violates the open security model. Environment variables would be better.
      • Added comprehensive user input validation.
      • Began making better use of early-exit conditions.
      • Data is now encoded before being stored in the database, making it safe to embed into the template without a risk of XSS.
      • Added support for secure hashing algorithms; only the password hash is now stored in the database.
      • Hardened the default Flask (session) configuration.
      • Added error handling.
      • Removed debug mode in production.
      • Added SSL.
      1. Overall improvement: Moderate. Part of it is laziness as stated in the checklist review, which is understandable considering the checklist length.
      2. Checklist elements applied: *17/56
        • Theoretical applied: 31/56
    4. D
      • Separated out the relevant credentials and generates the application secret key at runtime using secrets - a cryptographically secure randomness generator.
      • Added comprehensive user input validation.
      • Added CSP and Anti-CSRF tokens.
      • Hardened the default Flask (session) configuration.
      • Added more secure password hashing, removing MD5.
      • Separated logic, making the application more modular and easier to manage and improving code quality significantly.
      • Added explicit logging, saved to app.log, using the logging module.
      • Strong password policies are now enforced.
      • The database connection is being reused, leading to better performance.
      • Explicitly added the 3rd-party sources to the CSP.
      1. Overall improvement: Near perfect. Most of the flaws were alleviated apart from some code quality caveats; security-wise, the project is greatly improved.
      2. Checklist elements applied: 38/56

Analysis of the Results

One of the most pervasive issues in the Stage I projects was the poor handling of sensitive data: hard-coded credentials, static secret keys, and improperly handled passwords. These issues were present across all four projects and directly violated the open security model by exposing sensitive data. Most of the projects adopted more secure practices after applying the security checklists in Stage II. Although the storage of the credentials itself did not improve much, the credentials were separated out of most of the code. Proper password hashing was also implemented, so that only the hash digest of each password is stored. Introducing a secure password hashing function significantly mitigated the risks of plaintext password storage and of weak hashing algorithms like MD5.
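
The password-storage improvement described above can be sketched with the standard library alone. This is an illustrative example, not the code the students wrote: the paper does not state which hashing function each project adopted, and the choice of PBKDF2-HMAC-SHA256, the iteration count, and the helper names `hash_password` and `verify_password` are our assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune it for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext itself is never stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

A per-password random salt means that two users with the same password get different digests, and the constant-time comparison avoids leaking how many bytes matched.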

Input validation was another critical area: initially, most of the projects did not validate user input, which could enable injection attacks such as SQL injection and Cross-Site Scripting, as well as other input validation attacks. After following the security checklists, the projects improved their input validation practices significantly, which in turn improved their error handling.
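
A minimal sketch of the two practices discussed here, input validation and parameterised queries, is shown below. It uses an in-memory SQLite database purely for illustration. The paper does not say which database the projects used, and the table layout, the deliberately simple email pattern, and the helpers `validate_hours` and `find_user` are hypothetical.

```python
import re
import sqlite3
from typing import Optional

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately simple


def validate_hours(raw: str) -> float:
    """Accept only a number of hours between 0 and 24."""
    hours = float(raw)  # raises ValueError for non-numeric input
    if not 0 <= hours <= 24:
        raise ValueError("hours slept must be between 0 and 24")
    return hours


def find_user(connection: sqlite3.Connection, email: str) -> Optional[tuple]:
    """Validate the email, then look the user up with a parameterised query."""
    if not EMAIL_PATTERN.fullmatch(email):
        raise ValueError("invalid email address")
    # The "?" placeholder keeps the value out of the SQL text entirely.
    cursor = connection.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cursor.fetchone()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
connection.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))

print(find_user(connection, "a@example.com"))
# This value passes the simple pattern but is still treated as data, not as SQL.
print(find_user(connection, "x'OR'1'='1@example.com"))
```

Validation and parameter binding complement each other: the first rejects malformed input early, while the second ensures that whatever does get through cannot change the structure of the query.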

Another common weakness was the lack of CSRF protection in the original projects. This made the applications vulnerable to CSRF attacks, in which an attacker tricks authenticated users into performing unintended actions. After the checklists were applied, two of the four projects (50%) added Anti-CSRF tokens to their web applications in Stage II using Flask-WTF, mitigating this threat. Some projects also implemented a content security policy to hinder XSS attacks and to ensure that only trusted resources could run in the front-end of the web application.
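
The idea behind Anti-CSRF tokens can be sketched without any framework: the server issues a random token, remembers it in the user's session, embeds it in each form, and rejects submissions whose token does not match. The example below is a simplified illustration of that pattern, not Flask-WTF's implementation. The plain `dict` stands in for a real session, and both helper names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Create a random token, remember it in the session, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session: dict, submitted: str) -> bool:
    """Compare the submitted form token with the session copy in constant time."""
    expected = session.get("csrf_token", "")
    if not expected:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


session: dict = {}  # stand-in for a real user session
token = issue_csrf_token(session)
print(is_valid_csrf_token(session, token))
print(is_valid_csrf_token(session, "forged-value"))
```

A page on another origin cannot read the token embedded in the legitimate form, so a forged request arrives without a matching value and is rejected.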

The manual reviews also showed that the students hardened Flask's default settings in Stage II. Moreover, secret keys and credentials were stored in a more secure way, and performance was optimised by reusing database connections. The extent and quality of the improvements also correlated directly with how thoroughly the checklists were followed. Although far from perfect, project D was substantially more secure than in its initial state and showed a very good understanding of both security principles and best coding practices in Stage II.

Baseline data shows that all the projects had serious security problems before the security checklists were applied. According to Pyright, the original projects averaged 58.75 errors per project. The average Pylint rating of the original projects was 7.06/10, with the lowest score, 5.87/10, belonging to D. The Bandit scan revealed an average of 0 undefined, 2 low, 2 medium, and 1 high severity issues across the four projects. The OWASP ZAP automated scan reported an average of 3 medium, 3 low, and 3 informational alerts per project.

Once the security fixes had been applied, several of the statistics shifted significantly. In many cases, the automated scan results were worse than or no better than the original findings. While counter-intuitive, this is not unusual: as systems become more complex, a true review requires a sophisticated understanding of context to make educated assessments of a project's source code. However, Bandit's results improved for most of the projects, which correlates well with the mitigated security flaws. The security enhancements recommended by the checklists added complexity to the applications, and complexity often brings an increase in lint-time errors. The added complexity is reflected not only in the lint-time error counts related to code quality, but also in the roughly 20% increase in code size reported above. It is worth pointing out, though, that even though these are not directly security-related issues, better code quality - in the sense of clean, organised, and optimized code - usually means fewer security bugs and more efficient mitigations; the results of this project proved no exception.

Furthermore, manual reviews revealed a very strong correlation between the number of checklist items applied and the security improvements in the projects. The most significant improvements directly addressed major vulnerabilities in sensitive data management, input validation, and password security. The projects that followed fewer checklist items demonstrated very modest improvements: while basic security measures such as password hashing and input validation were implemented, the lack of comprehensive changes resulted in only minimal improvements. In contrast, project D applied 38 out of 56 checklist items and showed the most robust improvement: more secure session management, improved error handling, and enhanced cryptographic as well as general security and programming practices.

Lastly, in Stage I we originally found the following categories of common security weaknesses prevalent in the code:

  1. Project configuration and security.
  2. Input validation.
  3. Code structure and quality.
  4. Session and authentication.
  5. Cryptographic practices.
  6. Template security.
  7. Coding standards and practices.

In the new iteration of the code, these were significantly less detectable and much better mitigated than before. This leads us to conclude that our checklists had positive outcomes for security; however, this does not mean that they improved code quality overall.

Key Findings

Key findings included:

  1. The critical issues in Stage I were related to sensitive data management, such as hard-coded credentials, static secret keys, and poor password handling. All projects suffered from this problem. However, by Stage II, improvements were noticeable after the security checklists had been applied.
  2. Missing input validation is hazardous and leaves a project vulnerable to attacks such as SQL injection and Cross-Site Scripting. The security checklists helped the students implement better error handling as well as better user input handling and validation.
  3. None of the original projects used protection against CSRF, which means an attacker might force authenticated users to perform unwanted actions. In Stage II, Anti-CSRF tokens were added to forms in 50% of the projects, which substantially reduced this vulnerability. Content security policies were also implemented by some to defend against XSS attacks.
  4. The checklists improved the default settings, management of secret keys, and performance optimization such as reusing database connections. Although code quality or security did not improve uniformly for all projects, those projects that followed the checklist more closely, especially Project D, which applied 38 out of 56 items, showed the most robust improvement in secure coding practices and security principles.
  5. The introduction of the security checklists was correlated with improvements in sensitive data management, input validation, password security, session management, error handling, and cryptographic practices. At the same time, applying the checklists in these areas increased code complexity, which raised the number of lint-time errors and increased the code size by roughly 20%.
  6. Pre- and post-checklist scans showed mixed results. While Bandit's findings improved for most projects, the results of other automated scans, such as OWASP ZAP, did not improve on average; they even became worse in many cases, which reflects the growing complexity of the systems. This highlights the need to balance security and code quality in the future.
  7. The projects were affected by several common vulnerabilities at the beginning, including poor project configuration, lack of input validation, inadequate session management, and weak cryptography. By Stage II, these had been significantly reduced, which shows the positive effect of the security checklists on overall security, though, sadly, not necessarily on code quality.
  8. The security checklist resulted in quantifiable improvements related to the security posture of the projects, and an adverse indirect effect on the code quality and complexity.

Recommendations

Based on the findings presented in this study, we make a number of recommendations to improve security and reduce vulnerabilities in application development. These are most useful for novice developers who wish to ensure a sound level of security for their applications from the very start of the development process. When creating a checklist, consider your security threat model and ensure the most common vulnerabilities are mitigated first before getting into specifics.

Based on the findings, the most notable ways to enhance your security are to follow these guidelines:

  1. Adhere to established security models
    • Identify potential threats to your system and build your security threat model.
    • Enhance your security posture by adopting zero-trust and open security models. By treating everything as public and untrusted, you can better safeguard your sensitive information and systems, and limit the damage if part of your system is compromised.
  2. Improve sensitive data management
    • Developers need to be trained never to hard-code passwords, cryptographic keys, API keys, or any other sensitive information directly into the codebase.
    • Passwords need to be stored using salted, deliberately slow password hashing functions such as Argon2, scrypt, bcrypt, or PBKDF2, all of which are widely accepted and peer-reviewed. Fast general-purpose hash functions such as SHA-2, SHA-3, or BLAKE2 are not sufficient on their own for password storage. Periodic review and updating of password hashing is required to keep in line with modern security standards.
    • Train developers to avoid storing sensitive data unless it is strictly necessary.
  3. Improve input validation
    • Implement a zero-trust security model where the client is an untrusted party.
    • Use whitelisting for all input validation, accepting only known good formats where possible.
  4. Implement CSRF and CSP protections
    • Web applications should have Anti-CSRF tokens for every form to prevent Cross-Site Request Forgery attacks.
    • Set up Content Security Policies to provide a defense in depth approach against XSS by whitelisting sources allowed to load content in a web page.
  5. Foster a security-oriented culture
    • Everything that developers do before, during, and after development should be with a thought toward security.
  6. Perform full security checklists
    • All projects should aim to use comprehensive security checklists appropriate to their context.
    • Review the security checklists from time to time in light of emerging threats and vulnerabilities to make updates so that they remain relevant and effective.
  7. Monitoring and continuous improvement
    • Integrate automated tools that continuously scan and monitor code for vulnerabilities throughout the development lifecycle. This includes static analysis tools like Bandit, and dynamic testing tools such as OWASP ZAP.
    • Foster a continuous feedback loop where progress is welcome to improve your application's security.
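
As one concrete illustration of the sensitive data recommendation, a secret key can be read from the environment instead of being hard-coded, with a randomly generated fallback similar in spirit to the runtime key generation described for project D. This is a minimal sketch under our own assumptions: the environment variable name `APP_SECRET_KEY` and the helper `load_secret_key` are hypothetical.

```python
import os
import secrets


def load_secret_key(variable: str = "APP_SECRET_KEY") -> str:
    """Read the key from the environment, or generate a random one for this process."""
    key = os.environ.get(variable)
    if key:
        return key
    # Fallback: a fresh key on every start, so existing sessions stop validating.
    return secrets.token_hex(32)


secret_key = load_secret_key()
print(f"loaded a secret key of {len(secret_key)} characters")
```

The trade-off of the fallback is that sessions signed with the previous key are invalidated on every restart, which is why a key supplied through the environment is usually preferable in production.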

Conclusion

This research highlights how security checklists in a structured format can significantly help to improve the security posture of novice developers. We evaluated four projects, each across two stages, and noted a considerable decrease in security pitfalls.

In Stage I, there were many security issues, most of which were due to fundamental mistakes such as hardcoding credentials and not validating user input, exposing the applications to serious risks. However, when a security checklist was introduced in Stage II, many of these issues were mitigated, and notable improvements were achieved in sensitive data handling, password handling security, and input validation practices.

The study also highlighted how the extent of security improvements made was directly related to the thoroughness with which the checklist was applied. In general, projects that implemented more of the checklist items tended to show significantly higher security improvements along with great improvements in the understanding of best security practices. Projects that had fewer applied checklist items also showed notably more modest improvements.

On the other hand, the results also reveal that, alongside the noticeable improvements in security, applying security mechanisms increased code complexity, which in turn resulted in higher lint-time error counts and code size growth of ~20%. Such increased complexity might compromise code maintainability and overall quality while gaining security. Therefore, a good balance between security and code simplicity needs to be kept, especially for novice developers, who are still learning how to apply the principles of clean code.

In conclusion, while security checklists greatly improved the security practices of novice developers, there was a trade-off between increased security and increased code complexity. Although the introduction of security mechanisms largely improved security, the growth in lint-time errors and code size means that security and code simplicity need to be balanced for the best possible outcome: the most secure, best-quality code.

Future Research

Future research in this area should consider expanding the scope of the study to include a larger and more diverse sample size. The sample size in this research was relatively small, and the findings may not be fully representative of the broader population of novice developers.

Moreover, future research should be conducted on more complex applications of security checklists. The projects involved in this research were relatively simple. Most real-life applications introduce multiple layers of complexity, such as integrations with third-party services and different technologies. Studies on how security checklists affect larger systems, such as enterprise-level applications, mobile applications, or cloud-based services, would offer more insight into the applicability and scalability of such checklists. It would be important to understand how such checklists behave across a wide variety of application types, which would indicate whether the checklists truly need to be adapted or fine-tuned for specific use cases, or whether that is just a hypothesis.

In addition, research is needed that compares generic and specific security checklists. This study applied general, broad checklists, but projects may see better results with checklists that focus on specific threat models. This may be an area of future research: whether checklists tailored to specific security needs offer greater benefits to developers in some domains. A comparison of the results from using a generic and a specialized checklist would yield valuable information about the best way to secure various types of applications in practical scenarios.

Another interesting area for future research is how code quality interacts with security. This study has pointed out how security practices, once implemented, made the code more complex, which affected its maintainability and readability. Further research should show how developers can make an application truly secure without sacrificing maintainable and scalable code.

Finally, follow-up research is needed to determine whether the improvements in security achieved from the application of such checklists are long-lasting. While the current study has examined the short-term effects of using a security checklist, it is unknown whether the improvements are maintained in the long term as an application evolves or as developers work on further applications.

In closing, future studies should use a larger sample size, consider more complex applications, and compare generic and tailored checklists. Moreover, investigating the balance between security and code quality, as well as the long-term effects of security improvements, would be worthwhile. Such research would help to fine-tune security practices and make them more adaptable to different development contexts.

Literature and Citations