Papers

Having spent a long time at universities, I have read quite a few papers, especially during my PhD studies. But I also enjoy good papers before and after that time. Here I review them systematically and summarize them. Below you will find my notes (the notes are in English!).

Table of contents:

Papers and notes

A Core Calculus for Documents: Or, Lambda: The Ultimat… §

Title: “A Core Calculus for Documents: Or, Lambda: The Ultimate Document” by Will Crichton, Shriram Krishnamurthi [url] [dblp]
Published around 2025-01-05 in Proceedings of the ACM on Programming Languages, Vol 8, POPL; I read it around 2024-12
Abstract: Passive documents and active programs now widely comingle. Document languages include Turing-complete programming elements, and programming languages include sophisticated document notations. However, there are no formal foundations that model these languages. This matters because the interaction between document and program can be subtle and error-prone. In this paper we describe several such problems, then taxonomize and formalize document languages as levels of a document calculus. We employ the calculus as a foundation for implementing complex features such as reactivity, as well as for proving theorems about the boundary of content and computation. We intend for the document calculus to provide a theoretical basis for new document languages, and to assist designers in cleaning up the unsavory corners of existing languages. CCS Concepts: • Applied computing → Document scripting languages; • Theory of computation → Type theory.

quotes

  • “This proliferation of document languages raises foundational questions. What are the common characteristics of document languages? How do they relate? Are existing languages well-designed, or can we identify what appear to be flaws in their design? Given that documents are programs, can we reason about them?” (Crichton and Krishnamurthi, 2024, p. 231)
  • PHP: “The key takeaway: an impure semantics for templates can restrict the ability of authors to design easily-composable document abstractions.” (Crichton and Krishnamurthi, 2024, p. 3)
    React: “The key takeaway: computations over documents can have subtle dependency structures, which easily leads to bugs when combined with reactivity.” (Crichton and Krishnamurthi, 2024, p. 4)
    Scribble: “The key takeaway: the desugaring of templates to terms requires careful scrutiny to understand how it composes with other language features.” (Crichton and Krishnamurthi, 2024, p. 4)
  • “First, we will establish a scope by asking: what is a document, and what is a document language? Within this paper, we consider a document to be “structured prose,” that is, plain text optionally augmented with styles (e.g., bold or italics) or hierarchy (e.g., paragraphs or sections) and interspersed with figures (e.g., images or tables). This definition includes objects like academic papers and news articles, and it excludes objects like source code, spreadsheets, and computational notebooks. It is useful to restrict the scope of documents because (a) many languages are often used to generate objects in the former set, and (b) those languages have commonalities which have not yet been carefully scrutinized via the lens of PL formalism, unlike e.g. spreadsheets [Bock et al. 2020]. We will give a formal definition of structured prose in the ensuing sections” (Crichton and Krishnamurthi, 2024, p. 5)
  • “String template programs are the highest level of the document calculus in the domain of strings. Therefore, we can now proceed by enriching the domain with additional structure. The most common form of structured document is an attributed tagged tree like this:
    Node n ::= text s | node (s, [(s,s)*], n*)” (Crichton and Krishnamurthi, 2024, p. 9)
  • “For example, one form of validity is well-typedness of the input: a document expression e is valid if • ⊢ e: NodeTy list. Another form of validity is well-formedness of the output: a document expression e is valid if e ↦ v and v ∈ Article. In the wild, validity sometimes means parseability: the CommonMark specification for Markdown states that ‘any sequence of characters is a valid CommonMark document’.” (Crichton and Krishnamurthi, 2024, p. 15)
  • “In LATEX, for example, a reference to the next section in this document like \ref{sec:reforesting} will be replaced by the text “4.2”. This operation is non-local, because the document language cannot know the section number of a forward reference at the point of reference.” (Crichton and Krishnamurthi, 2024, p. 16)
  • “‘Markup languages’ have a long history as programming languages for marking up documents that are presented on paper (literally via printing, or metaphorically in a PDF), or presented in the browser. Coombs et al. [1987] developed an early markup theory that distinguished “procedural markup”, or low-level graphical commands, from “descriptive markup”, or high-level structuring of a document. Descriptive markup, later called a “ordered hierarchy of content objects” [DeRose et al. 1997], formed the basis of systems such as SGML [Goldfarb 1990] and HyTime [Newcomb et al. 1991] that would go on to inspire HTML and XML. The article domain of the document calculus defined in Section 3.2 is a model for descriptive markup, the predominant model for document languages used today. (The notable exception to this is TeX, which evaluates into procedural markup but uses “environments” to attempt to simulate the experience of descriptive markup.)” (Crichton and Krishnamurthi, 2024, p. 23)
  • “In that sense, this paper’s subtitle is deliberately inaccurate: lambda is not the ultimate document. As Olin Shivers wrote, “lambda is not a universally sufficient value constructor,” and that holds true for constructing documents as well. To that end, future work on document languages should investigate the design of syntaxes that trade-off intuitiveness, error-tolerance, and systematicity” (Crichton and Krishnamurthi, 2024, p. 26)
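
The attributed tagged tree quoted above (`Node n ::= text s | node (s, [(s,s)*], n*)`) can be sketched in a few lines of Python (my own illustration of the grammar, not an artifact of the paper):

```python
from dataclasses import dataclass, field

@dataclass
class Text:                # text s
    content: str

@dataclass
class Node:                # node (s, [(s,s)*], n*)
    tag: str
    attrs: list = field(default_factory=list)      # [(s, s)*]
    children: list = field(default_factory=list)   # n*

def render(n) -> str:
    """Flatten a tree back into an HTML-like string."""
    if isinstance(n, Text):
        return n.content
    attrs = "".join(f' {k}="{v}"' for k, v in n.attrs)
    body = "".join(render(c) for c in n.children)
    return f"<{n.tag}{attrs}>{body}</{n.tag}>"

doc = Node("p", [("class", "intro")], [Text("Hello, "), Node("b", [], [Text("world")])])
```

For example, `render(doc)` yields `<p class="intro">Hello, <b>world</b></p>`.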

summary

String {Literal, Program, Template Literal, Template Program}.
Article {Literal, Program, Template Literal, Template Program}.

… corresponds to …

{quoted string, programming languages with string APIs, C printf or python f-strings, C preprocessor}
{CommonMark Markdown, Programming languages with document APIs, JSX Javascript or Scribble Racket, Typst}
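
Loosely mapped onto Python (my own illustration of the four String levels; the concrete languages above are the paper's examples):

```python
name = "world"

# String Literal: a plain quoted string
lit = "hello world"

# String Program: a string computed through a string API
prog = "hello " + name

# String Template Literal: a template with interpolation holes (cf. f-strings)
tmpl_lit = f"hello {name}"

# String Template Program: program constructs (here a loop) inside the template
tmpl_prog = "".join(f"hello {n}! " for n in ["alice", "bob"])
```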

reforestation means turning implicit elements like paragraphs into tree nodes
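
A minimal sketch of that idea (hypothetical helper, not the paper's formal definition): wrap runs of non-blank lines into explicit paragraph nodes.

```python
def reforest(lines):
    """Turn implicit paragraphs (runs of non-blank lines separated by blank
    lines) into explicit tree nodes."""
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:                  # a blank line closes the open paragraph
            paragraphs.append({"tag": "p", "children": current})
            current = []
    if current:                        # close a trailing paragraph
        paragraphs.append({"tag": "p", "children": current})
    return paragraphs

tree = reforest(["first line", "still first", "", "second paragraph"])
```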

typo

  • “for this purpose by supporting the langauge of regular trees as types” (Crichton and Krishnamurthi, 2024, p. 25)

A Masked Ring-LWE Implementation §

Title: “A Masked Ring-LWE Implementation” by Oscar Reparaz, Sujoy Sinha Roy, Frederik Vercauteren, Ingrid Verbauwhede [url] [dblp]
Published around 2015 in CHES 2015; I read it around 2020/07
Abstract: Lattice-based cryptography has been proposed as a postquantum public-key cryptosystem. In this paper, we present a masked ring-LWE decryption implementation resistant to first-order side-channel attacks. Our solution has the peculiarity that the entire computation is performed in the masked domain. This is achieved thanks to a new, bespoke masked decoder implementation. The output of the ring-LWE decryption are Boolean shares suitable for derivation of a symmetric key. We have implemented a hardware architecture of the masked ring-LWE processor on a Virtex-II FPGA, and have performed side channel analysis to confirm the soundness of our approach. The area of the protected architecture is around 2000 LUTs, a 20 % increase with respect to the unprotected architecture. The protected implementation takes 7478 cycles to compute, which is only a factor ×2.6 larger than the unprotected implementation.

ambiguity

\[ \operatorname{th}(x) = \begin{cases} 0 & \text{if } x \in (0, q/4) \cup (3q/4, q) \\ 1 & \text{if } x \in (q/4, 3q/4) \end{cases} \]

The intervals apparently should denote inclusive-exclusive boundaries.
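
Read with inclusive-exclusive boundaries, the threshold function can be written as follows (my interpretation of the ambiguity, not the paper's code):

```python
def th(x, q):
    """Threshold decoder for x in Z_q: [0, q/4) and [3q/4, q) map to 0,
    [q/4, 3q/4) maps to 1."""
    return 1 if q // 4 <= x < 3 * q // 4 else 0
```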

error

On page 686, c1 and c2 are swapped erroneously throughout. On page 685, c2 is defined such that it contains the message. Consequently, the factor r2 must be applied to c1, not c2. However, page 686 repeatedly multiplies c2 with the different r values.

Masked Decoder table

Sorry, I don't have table support in Zotero.

The first line is meant to be read as: “If a' is in quadrant (1) and a'' is in quadrant (1), then a is inconclusive (∅) rather than bit 0 or 1.”

a'  a''  a
1   1    ∅
1   2    1
1   3    ∅
1   4    0
2   1    1
2   2    ∅
2   3    0
2   4    ∅
3   1    ∅
3   2    0
3   3    ∅
3   4    1
4   1    0
4   2    ∅
4   3    1
4   4    ∅
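
The same table as a lookup from quadrant pairs to decoded bits (just my transcription of the rows above):

```python
# (quadrant of a', quadrant of a'') -> decoded bit, or None for inconclusive (∅)
DECODE = {
    (1, 1): None, (1, 2): 1,    (1, 3): None, (1, 4): 0,
    (2, 1): 1,    (2, 2): None, (2, 3): 0,    (2, 4): None,
    (3, 1): None, (3, 2): 0,    (3, 3): None, (3, 4): 1,
    (4, 1): 0,    (4, 2): None, (4, 3): 1,    (4, 4): None,
}
```

Exactly half of the 16 quadrant pairs are conclusive, which is presumably why the decoder has to iterate with fresh offsets until a conclusive pair occurs.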

quotes

summary

Nice paper. It is sad that c1 and c2 got mixed up on page 686. The masking idea is indeed natural: r2 = r2' + r2'' (mod q), and for the decoder a' := a' + Δ1 and a'' := a'' − Δ1. Isn't sampling also a problem when masking ring-LWE? If so, the title is inappropriate and should be “A Masked Ring-LWE Decoder Implementation”. The described implementation covers CPA security only. It costs a factor of 2.679 in CPU cycles and 3.2 in runtime (microseconds) in the decryption step. With N=16 (the maximum number of iterations in the decoder), you get 1 error in about 400,000 bits. For NewHope's 256-bit encapsulated keys, this means 1 error in about 1420 messages. For a long time I did not understand why the masked decoder requires the value of the previous iteration (loop in Figure 3); then I finally recognized it. I don't like that the source code was not published with the paper (→ reproducible research).

typo

A Practical Analysis of Rust’s Concurrency Story §

Title: “A Practical Analysis of Rust’s Concurrency Story” by Aditya Saligrama, Andrew Shen, Jon Gjengset [url] [dblp]
Published around 2019; I read it around 2021-11
Abstract: Correct concurrent programs are difficult to write; when multiple threads mutate shared data, they may lose writes, corrupt data, or produce erratic program behavior. While many of the data-race issues with concurrency can be avoided by the placing of locks throughout the code, these often serialize program execution, and can significantly slow down performance-critical applications. Programmers also make mistakes, and often forget locks in less-executed code paths, which leads to programs that misbehave only in rare situations.

quotes

summary

A nice report about the implementation of a concurrent hashmap in Rust. The basic primitives of Rust are well explained, but the lock-free design of the hash map is not discussed at all. The relative performance characteristics are presented in a figure without an appropriate caption, and nice examples are provided for struggles with Rust. Overall a nice bachelor/master-level project outcome.

A Provably Secure True Random Number Generator with Bu… §

Title: “A Provably Secure True Random Number Generator with Built-In Tolerance to Active Attacks” by Berk Sunar, William Martin, Douglas Stinson [url] [dblp]
Published around 2007-01; I read it around 2021-11-08
Abstract: This paper is a contribution to the theory of true random number generators based on sampling phase jitter in oscillator rings. After discussing several misconceptions and apparently insurmountable obstacles, we propose a general model which, under mild assumptions, will generate provably random bits with some tolerance to adversarial manipulation and running in the megabit-per-second range. A key idea throughout the paper is the fill rate, which measures the fraction of the time domain in which the analog output signal is arguably random. Our study shows that an exponential increase in the number of oscillators is required to obtain a constant factor improvement in the fill rate. Yet, we overcome this problem by introducing a postprocessing step which consists of an application of an appropriate resilient function. These allow the designer to extract random samples only from a signal with only moderate fill rate and, therefore, many fewer oscillators than in other designs. Last, we develop fault-attack models and we employ the properties of resilient functions to withstand such attacks. All of our analysis is based on rigorous methods, enabling us to develop a framework in which we accurately quantify the performance and the degree of resilience of the design.

clarifications and questions

quotes

summary

A very good paper. It specifies the requirements of a True Random Number Generator, builds up the theory with mathematical background and finally presents one particular instance with desired properties.

I struggled to see the connection to linear codes given by Theorem 1, but I assume one just has to follow reference [15], which proves Theorem 1.

A good TRNG relies on {entropy source, harvesting mechanism, postprocessing}. The design uses ring oscillators (quantified by the urn model with prime ring lengths and the Coupon Collector's Problem), an XOR tree, and a linear code, namely the simplex code (justified by the (n, m, t)-resilient function model).
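
The resilient-function postprocessing can be sketched as follows (illustrative parameters, not the paper's concrete instance): a resilient function obtained from the [7, 3, 4] simplex code, whose generator matrix is the parity-check matrix of the Hamming(7,4) code.

```python
# Generator matrix of the [7, 3, 4] simplex code: its columns are all nonzero
# 3-bit vectors. Every nonzero codeword has weight 4, so f(x) = G·x over GF(2)
# is a (7, 3, 3)-resilient function.
G = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def resilient(bits):
    """Compress 7 partially predictable bits into 3 higher-quality bits."""
    assert len(bits) == 7
    return [sum(g * b for g, b in zip(row, bits)) % 2 for row in G]
```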

Prior work:

Test suites:

4 gigasamples/sec = 4·10⁹ samples/s corresponds to 1/(4·10⁹) s = 0.25 ns between two samples

Figure 5 shows a neat network to make the XOR tree more robust by redundancy.

A Public-Key Cryptosystem Based On Algebraic Coding Th… §

Title: “A Public-Key Cryptosystem Based On Algebraic Coding Theory” by RJ McEliece [url] [dblp]
Published around 1978; I read it around 2021-03
Abstract:

summary

Certainly a tremendously important result. But from a paper perspective, I think it is sad that it is not self-contained. Goppa codes are left out and a reference to [5] is made. Also the semantics of the Patterson algorithm on the second page are left out. It should at least specify that the algorithm creates u' for a given x.

Sardinas-Patterson algorithm determines whether a given variable-length code is uniquely decodable.
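
A compact implementation of that test (my own sketch of the standard algorithm): compute dangling suffixes iteratively; the code is uniquely decodable iff no dangling suffix is itself a codeword.

```python
def is_uniquely_decodable(code):
    """Sardinas-Patterson test for a finite set of non-empty codewords."""
    def dangling(a, b):
        # suffixes w with u = v + w for u, v drawn from a and b (w non-empty)
        out = set()
        for u in a:
            for v in b:
                if u != v and u.startswith(v):
                    out.add(u[len(v):])
                if u != v and v.startswith(u):
                    out.add(v[len(u):])
        return out

    current = dangling(code, code)      # C_1
    seen = set()
    while current - seen:
        if current & code:              # a dangling suffix equals a codeword
            return False
        seen |= current
        current = dangling(seen, code)  # next dangling-suffix set
    return True
```

For example, the prefix code `{"0", "10", "11"}` passes, while `{"a", "ab", "ba"}` fails because "aba" parses both as a·ba and as ab·a.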

A Replication Study on Measuring the Growth of Open So… §

Title: “A Replication Study on Measuring the Growth of Open Source” by Michael Dorner, Maximilian Capraro, Ann Barcomb, Krzysztof Wnuk [url] [dblp]
Published around 2022-01; I read it around 2022-04
Abstract: Context: Over the last decades, open-source software has pervaded the software industry and has become one of the key pillars in software engineering. The incomparable growth of open source reflected that pervasion: Prior work described open source as a whole to be growing linearly, polynomially, or even exponentially. Objective: In this study, we explore the long-term growth of open source and corroborating previous findings by replicating previous studies on measuring the growth of open source projects. Method: We replicate four existing measurements on the growth of open source on a sample of 172,833 open-source projects using Open Hub as the measurement system: We analyzed lines of code, commits, new projects, and the number of open-source contributors over the last 30 years in the known open-source universe. Results: We found growth of open source to be exhausted: After an initial exponential growth, all measurements show a monotonic downwards trend since its peak in 2013. None of the existing growth models could stand the test of time. Conclusion: Our results raise more questions on the growth of open source and the representativeness of Open Hub as a proxy for describing open source. We discuss multiple interpretations for our observations and encourage further research using alternative data sets.

quotes

  • “Prior work described open source as a whole to be growing linearly, polynomially, or even exponentially.” (Dorner et al., 2022)
  • “We replicate four existing measurements on the growth of open source on a sample of 172,833 open-source projects using Open Hub as the measurement system: We analyzed lines of code, commits, new projects, and the number of open-source contributors over the last 30 years in the known open-source universe.” (Dorner et al., 2022, p. 1)
  • “Open source has evolved from small communities of volunteers driven by non-monetary incentives to foundations that host large projects and support decentralized innovation among many global industries [13]” (Dorner et al., 2022, p. 2)
  • “Three horizontal, longitudinal studies investigated the growth of open source as a whole: [9] from 2003, [24] from 2007, and in 2008 [11].” (Dorner et al., 2022, p. 2)
  • “In detail, the contributions of this paper are:

    • a detailed discussion of the three prior studies,
    • the multi-dimensional measurements using Open Hub as a measuring system to quantify the growth of open source with respect to lines of code, commits, contributors, and projects, and, thereby,
    • the dependent and independent replication of the measurements by three prior studies.” (Dorner et al., 2022, p. 2)
  • “Study A [9] from 2003 at FreshMeat.net with 406 projects accounted until 2002-07-01
    Study B [24] from 2007 at SourceForge.net with 4,047 projects
    Study C [11] from 2008 at Ohloh.net with 5,122 projects from 1995-01-01 until 2006-12-31
    from Table 1: Three prior [9, 11, 24] studies on the evolution of open source, showing data source, project sample size, and the considered data time frame.”
    (Dorner et al., 2022, p. 3)
  • “Their construction is unclear and, in consequence, we cannot replicate the measurements.” (Dorner et al., 2022, p. 4)
  • “Vitality V is defined by V = R·A/L (1) where—according to description—R is the number of releases in a given period (t), A is the age of the project in days, and L is the number of releases in the period t.” (Dorner et al., 2022, p. 4)
  • “However, a comment is a valuable contribution to a software project. In modern programming languages like Go or Python, comments are directly embedded in the source code for documentation purposes.” (Dorner et al., 2022, p. 7)
  • “Therefore, our measurement is consistent with Hypothesis 1, which states that open source grows (in bytes).” (Dorner et al., 2022, p. 7)
  • “When commit size is calculated as the total number of commits in a period, it follows a power-law distribution [27]” (Dorner et al., 2022, p. 8)
  • “At the time of the data collection (2021-06-04 to 2021-06-07), Open Hub lists 355,111 open-source projects as of 2021-06-06.” (Dorner et al., 2022, p. 10)
  • “After filtering our data set for the time frame from 1991-01-01 to 2020-12-31, 173,265 projects were available for further analysis.” (Dorner et al., 2022, p. 10)
  • “Observation 2: Although initially growing exponentially until 2009, the growth in lines of code has continuously slowed down since 2013.” (Dorner et al., 2022, p. 13)
  • “Observation 4: Although growing exponentially until 2009 and reaching its peak in March 2013 with 107,915 contributors, the number of open-source contributors has, as of 2018, decreased to the level of 2008.” (Dorner et al., 2022, p. 14)
  • “We also encountered similar problems while crawling Open Hub: 181,846 of 355,111 projects do not contain information on the development activity. We also found that the accumulated number of lines added does not fit the measured lines of code. Additionally, we found a large drop in added projects in 2011 we are not able to explain (Figure 5). We speculate that Open Hub could have decreased the number of projects it adds, so that newer projects are under-represented.” (Dorner et al., 2022, p. 16)
  • “In this study, we conducted a large-scale sample study on open-source projects and their cumulative growth. We analyzed the number of developers and their contributions with respect to lines of code, commits, and new projects to the open-source universe. We leveraged Open Hub as a measuring system to measure development activities of 172,833 open-source projects over the last 30 years concerning those four quantities.” (Dorner et al., 2022, p. 19)

replications

  • exact replications

    • dependent: keep all conditions identical
    • independent: vary one or more major aspects

summary

In this study (I read version #5), the authors replicate results from 2003 (Capiluppi, A., Lago, P., Morisio, M., “Characteristics of open source projects”), 2007 (Koch, S., “Software evolution in open source projects—a large-scale investigation”), and 2008 (Deshpande, A., Riehle, D., “The Total Growth of Open Source”) on the Open Hub project index. Open Hub lists 355,111 open-source projects as of 2021-06-06, and development activity is available for 173,265 of them. The previous studies claimed that open source grows w.r.t. byte size (2003), grows quadratically (2007) or exponentially (2008) w.r.t. lines of code, and grows exponentially w.r.t. projects (2008).

For the 2003 study, the authors remark: “their construction is unclear and, in consequence, we cannot replicate the measurements”. The project status (e.g. “beta”, “mature”) is also determined in an unclear way. The 2003 study is also weak in that it never specifies the magnitude of growth (“grows”?!).

In essence, the data sources from 2003 and 2007 are not publicly available anymore. As such the statements need to be replicated with new data for an extended timeline. The result is that the authors recognize a peak around 2010 and 2013 and a steady decline afterwards in various parameters (size in bytes, LoCs, # of projects, new projects).

The paper is well-written and the intention is clear. The peaks around 2010 and 2013 could be investigated a little more thoroughly, since they occur in multiple parameters (cf. Fig. 2 and Fig. 3) and thus seem significant. I suggest validating hypotheses that activities around GitHub or the Linux Foundation influenced the way projects are maintained. On page 16, it is mentioned that 181,846 projects do not contain information on the development activity. It should be pointed out explicitly that those are not included in the respective figures (at least this is what I assume).

The paper shows how bad the situation regarding empirical replication of data in software development is. On the other hand, it shows how it improved because version control and public availability of data improved. My personal summary is just that Open Source changed over time and since 2013 in particular.

A Side-channel Resistant Implementation of SABER §

Title: “A Side-channel Resistant Implementation of SABER” by Michiel Van Beirendonck, Jan-Pieter D'Anvers, Angshuman Karmakar, Josep Balasch, Ingrid Verbauwhede [url] [dblp]
Published around 2020 in IACR eprint 2020; I read it around 2020-11
Abstract: The candidates for the NIST Post-Quantum Cryptography standardization have undergone extensive studies on efficiency and theoretical security, but research on their side-channel security is largely lacking. This remains a considerable obstacle for their real-world deployment, where side-channel security can be a critical requirement. This work describes a side-channel resistant instance of Saber, one of the lattice-based candidates, using masking as a countermeasure. Saber proves to be very efficient to mask due to two specific design choices: power-of-two moduli, and limited noise sampling of learning with rounding. A major challenge in masking lattice-based cryptosystems is the integration of bit-wise operations with arithmetic masking, requiring algorithms to securely convert between masked representations. The described design includes a novel primitive for masked logical shifting on arithmetic shares, as well as adapts an existing masked binomial sampler for Saber. An implementation is provided for an ARM Cortex-M4 microcontroller, and its side-channel resistance is experimentally demonstrated. The masked implementation features a 2.5x overhead factor, significantly lower than the 5.7x previously reported for a masked variant of NewHope. Masked key decapsulation requires less than 3,000,000 cycles on the Cortex-M4 and consumes less than 12kB of dynamic memory, making it suitable for deployment in embedded platforms.

open questions

quotes

summary

 

typo

A Sound Method for Switching between Boolean and Arith… §

Title: “A Sound Method for Switching between Boolean and Arithmetic Masking” by Louis Goubin [url] [dblp]
Published around 2001 in CHES 2001; I read it around 2021-10
Abstract: Since the announcement of the Differential Power Analysis (DPA) by Paul Kocher and al., several countermeasures were proposed in order to protect software implementations of cryptographic algorithms. In an attempt to reduce the resulting memory and execution time overhead, a general method was recently proposed, consisting in “masking” all the intermediate data.

quotes

summary

One of the most beautiful papers I have read.
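
To make the setting concrete (my own illustration, not Goubin's algorithm): the same secret x can carry a Boolean mask, x = x' ⊕ r, or an arithmetic mask, x = (A + r) mod 2^K. The naive conversion below is functionally correct but insecure, because it recombines the unmasked secret; Goubin's contribution is a conversion that never does.

```python
K = 32
MOD_MASK = (1 << K) - 1  # arithmetic masking works in Z_{2^K}

def boolean_to_arithmetic_naive(x_prime, r):
    """Convert Boolean shares (x', r) with x = x' ^ r into an arithmetic share A
    with x = (A + r) mod 2^K. INSECURE: the intermediate value x' ^ r is the
    secret itself, which is exactly the first-order leak the paper avoids."""
    x = x_prime ^ r               # <-- leaks x to a side-channel observer
    return (x - r) & MOD_MASK

x, r = 0xDEADBEEF, 0x12345678
x_prime = x ^ r                   # Boolean masking of x
A = boolean_to_arithmetic_naive(x_prime, r)
assert (A + r) & MOD_MASK == x    # arithmetic masking of the same x
```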

Minor issues:

A generalized approach to document markup §

Title: “A generalized approach to document markup” by Dr C F Goldfarb [url] [dblp]
Published around 1981 in SIGPLAN 1981; I read it around 2024-11
Abstract:

examples

SCRIPT language

.sk 1
Text processing and word processing systems typically require users to intersperse additional information in the natural text of the document being processed.
This added information, called 'markup,' serves two purposes:
.tb 4
.of 4
.sk 1
1.  it separates the logical elements of the document; and
.of 4
.sk 1
2.  it specifies the processing functions to be
performed on those elements.
.of 0
.sk 1

GML example

:p.
Text processing and word processing systems typically require users to intersperse additional information in the natural text of the document being processed.
This added information, called :q.markup::q., serves two purposes:
:ol.
:li.it separates the logical elements of the document; and
:li.it specifies the processing functions to be performed on those elements.
::ol.

GML example:

:fig id-angelfig
:figbody :artwork depth-24p
::artwork
::figbody
:figcap. Three Angels Dancing
::figcap
::fig

SGML/XML equivalent:

<fig id="angelfig">
  <figbody>
    <artwork depth="24p"/>
  </figbody>
  <figcap>
    Three Angels Dancing
  </figcap>
</fig>

quotes

  • “Text processing and word processing systems typically require users to intersperse additional information in the natural text of the document being processed.” (Goldfarb, p. 68)
  • “Procedural markup is also inflexible.” (Goldfarb, p. 68)
  • “These disadvantages of procedural markup are avoided by a markup schema due to C. F. Goldfarb, E. J. Mosher, and R. A. Lorie (3, 4). It is called the "Generalized Markup Language" (GML) because it does not restrict documents to a single application, formatting style, or processing system.” (Goldfarb, p. 69)
  • “GML is based on two novel postulates:

    1. Markup should describe a document's structure and other attributes, rather than specify processing to be performed on it, as descriptive markup need be done only once and will suffice for all future processing.
    2. Markup should be rigorous, so the techniques available for processing rigorously-defined objects like programs and data bases can be used for processing documents as well.” (Goldfarb, p. 69)
  • “The comma that follows the quotation element is not actually part of it, but is brought inside the quotation marks during formatting.” (Goldfarb, p. 69)
  • “Different formats can then be obtained from the same markup by invoking a different set of homonymous procedures.” (Goldfarb, p. 70)
  • “From this we can construct a 3-step model of document processing:

    1. Recognition
    2. Mapping
    3. Processing” (Goldfarb, p. 69)
  • “In the case of low-level elements such as words and sentences, the user is normally given little control over the processing, and almost none over the recognition.” (Goldfarb, p. 70)
  • “In terms of the document processing model, the advantage of descriptive markup is that it permits the user to define attributes--and therefore element types--not known to the formatter, and to specify the processing for them.” (Goldfarb, p. 70)
  • “So far the discussion has addressed only a single attribute, the generic identifier, whose value characterizes an element's semantic role or purpose. Some descriptive markup schemes refer to markup as "generic coding," because the GI is the only attribute they recognize (5). In generic coding schemes, recognition, mapping, and processing can be accomplished all at once by the simple device of using GIs as control procedure names.” (Goldfarb, p. 70)
  • “Different formats can then be obtained from the same markup by invoking a different set of homonymous procedures.” (Goldfarb, p. 70)
  • “One way in which GML differs from generic coding schemes is in the conceptual and notational tools it provides for dealing with this hierarchical structure.” (Goldfarb, p. 70)
  • “There has been a 40% reduction in markup, since three ending generic identifiers are no longer needed.” (Goldfarb, p. 71)
  • “To this the author added some element types required for textbooks (such as "exercise"), and mnemonic names for "constant" elements like mathematical and logical symbols.” (Goldfarb, p. 72)

A plea for lean software §

Title: “A plea for lean software” by N. Wirth [url] [dblp]
Published around 1995; I read it around 2024-03
Abstract:

quotes

  • “Memory requirements of today’s workstations typically jump substantially—from several to many megabytes—whenever there’s a new software release.”
  • “About 25 years ago, an interactive text editor could be designed with as little as 8,000 bytes of storage. (Modern program editors request 100 times that much!)”
  • “With a touch of humor, the following two laws reflect the state of the art admirably well:

    • Software expands to fill the available memory. (Parkinson)
    • Software is getting slower more rapidly than hardware becomes faster. (Reiser)”
  • “those arbitrarily overlapping windows suggested by the uncritically but widely adopted desktop metaphor; and fancy icons decorating the screen display, such as antique mailboxes and garbage cans that are further enhanced by the visible movement of selected items toward their ultimate destination.”
  • “Increasingly, people seem to misinterpret complexity as sophistication, which is baffling—the incomprehensible should cause suspicion rather than admiration.”
  • “Good engineering is characterized by a gradual, step-wise refinement of products that yields increased performance under given constraints and with given resources.”
  • “the most demanding aspect of system design is its decomposition into modules. Each module is a part with a precisely defined interface that specifies imports and exports.”
  • “Precisely defining the right decomposition is difficult and can rarely be achieved without iterations. Iterative (tuning) improvements are of course only possible up to the time of system release.”
  • “To gain experience, there is no substitute for one’s own programming effort.”

A system for typesetting mathematics §

Title: “A system for typesetting mathematics” by Brian W. Kernighan, Lorinda L. Cherry [url] [dblp]
Published around 1975; I read it around 2022-01
Abstract: The syntax of the language is specified by a small context-free grammar; a compiler-compiler is used to make a compiler that translates this language into typesetting commands. Output may be produced on either a phototypesetter or on a terminal with forward and reverse half-line motions. The system interfaces directly with text formatting programs, so mixtures of text and mathematics may be handled simply. This paper was typeset by the authors using the system described.

error

quotes

summary

A design paper introducing a fundamental tool of the UNIX space. It is neat to see how the fundamentals of mathematical typesetting developed, and it is truly an achievement that the paper was typeset with eqn & troff itself. However, the formal grammar behind eqn is not rigorously formalized and seems pretty complex (custom disambiguation rules).

Additively Homomorphic Ring-LWE Masking §

Title: “Additively Homomorphic Ring-LWE Masking” by Oscar Reparaz, Ruan de Clercq, Sujoy Sinha Roy, Frederik Vercauteren, Ingrid Verbauwhede [url] [dblp]
Published around 2016 in PQCrypto 2016; I read it around 2020-07
Abstract: In this paper, we present a new masking scheme for ringLWE decryption. Our scheme exploits the additively-homomorphic property of the existing ring-LWE encryption schemes and computes an additive-mask as an encryption of a random message. Our solution differs in several aspects from the recent masked ring-LWE implementation by Reparaz et al. presented at CHES 2015; most notably we do not require a masked decoder but work with a conventional, unmasked decoder. As such, we can secure a ring-LWE implementation using additive masking with minimal changes. Our masking scheme is also very generic in the sense that it can be applied to other additively-homomorphic encryption schemes.

quotes

The most important statement is equation 3:

decryption(c1, c2) ⊕ decryption(c1’, c2’) = decryption(c1 + c1’, c2 + c2’)

summary

A paper of rather low quality. The core essence is Equation 3: decryption(c1, c2) ⊕ decryption(c1’, c2’) = decryption(c1 + c1’, c2 + c2’). Besides that, there are many confusing statements. The workings of Equation 3 are barely mentioned (i.e., its correctness is IMHO insufficiently discussed), but should be understandable for everyone in the field. Correctness is non-obvious, because we have an addition on the RHS and an XOR on the LHS. But it is trivial, because the RHS operates in ℤq whereas the LHS operates in ℤ2, and XOR is addition in ℤ2. Unlike the CHES 2015 paper, no additional circuitry is needed, which makes it a super interesting approach. The failure rate increases by a factor of 92 (3.6 × 10⁻⁵ → 3.3 × 10⁻³ per bit), which is a difficult problem of its own, but one that the CHES 2015 approach shares as well.

In summary, the approach computes the encryption of the original message (⇒ (c1, c2)) and also the encryption of some random value (⇒ (c1’, c2’)). Then, we don't decrypt dec(c1, c2), but dec(c1 + c1’, c2 + c2’).
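
The identity can be checked mechanically. Below is a minimal sketch in a toy *scalar* LWE-style scheme (my own simplification with made-up parameters; not actual ring-LWE and not the paper's scheme), showing that decrypting the sum of two ciphertexts yields the XOR of the two plaintext bits:

```python
import random

# Toy scalar LWE-style scheme, only to illustrate the masking identity.
# Parameters (q, noise range) are made up for the example.
random.seed(1)
q = 4096                    # ciphertext modulus
s = random.randrange(q)     # secret key

def encrypt(bit):
    """Encrypt one bit as (c1, c2) with a small noise term."""
    a = random.randrange(q)
    e = random.randrange(-4, 5)
    c2 = (-a * s + bit * (q // 2) + e) % q
    return (a, c2)

def decrypt(c1, c2):
    """Decode: a value near q/2 means 1, a value near 0 means 0."""
    v = (c1 * s + c2) % q
    return 1 if q // 4 <= v < 3 * q // 4 else 0

for m in (0, 1):
    for r in (0, 1):
        c = encrypt(m)      # encryption of the message
        cr = encrypt(r)     # encryption of a random mask value
        masked = decrypt((c[0] + cr[0]) % q, (c[1] + cr[1]) % q)
        assert masked == m ^ r   # XOR on the LHS, '+' on the RHS
```

The assertion holds because the message bits are encoded as multiples of q/2: adding two such encodings wraps around mod q exactly when both bits are 1, which is addition in ℤ2, i.e. XOR.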

typo

Aggregated Private Information Retrieval §

Title: “Aggregated Private Information Retrieval” by Lukas Helminger, Daniel Kales, Christian Rechberger [url] [dblp]
Published around 2020-05; I read it around 2020-07
Abstract: With the outbreak of the coronavirus, governments rely more and more on location data shared by European mobile network operators to monitor the advancements of the disease. In order to comply with often strict privacy requirements, this location data, however, has to be anonymized, limiting its usefulness for making statements about a filtered part of the population, like already infected people.

quotes

summary

The paper implements a nice idea. The usefulness of the use case needs to be discussed with public health experts (what does it help me to know that many infected people live in this block of houses?). However, I have been told the entire paper was written in one month, which is quite impressive considering the technical depth in the field of Homomorphic Encryption.

There are many typos and, to me, the main purpose of the protocol in Figure 1 was not comprehensible before talking to the authors. I also didn't understand in which ways ε-Differential Privacy is desirable or how it can be ensured, nor which definition they used for “vector c is binary” before going into the details of section 3.2. Apparently, a binary vector is desirable to prevent leakage. For Figure 2, they used the Teχ package cryptocode to illustrate the protocol; to the best of my understanding, it is just a reiteration of Figure 1. On page 13, the paragraph “Note that, as we have already mentioned in Section 3.6” should be moved to the concluding remarks. On page 14, it is unclear what “isolated profiles” are. I didn't go through the details of section 5.

BasicBlocker: Redesigning ISAs to Eliminate Speculativ… §

Title: “BasicBlocker: Redesigning ISAs to Eliminate Speculative-Execution Attacks” by Jan Philipp Thoma, Jakob Feldtkeller, Markus Krausz, Tim Güneysu, Daniel J. Bernstein [url] [dblp]
Published around 2020-07; I read it around 2020-08
Abstract: Recent research has revealed an ever-growing class of microarchitectural attacks that exploit speculative execution, a standard feature in modern processors. Proposed and deployed countermeasures involve a variety of compiler updates, firmware updates, and hardware updates. None of the deployed countermeasures have convincing security arguments, and many of them have already been broken.

Comment: Preprint

quotes

summary

Quite good paper.

“Preventing speculative execution exploits” conjured up more sophisticated expectations on my part, but in general the idea is legitimate and the research was done properly. Essentially, the additional bb instruction annotates how many of the following instructions contain no control-flow instructions, which allows a processor to prefetch them without any speculative considerations. An additional lcnt instruction for RISC-V handles loops.

In the reading group it was criticized that the formal definitions were cumbersome and inappropriate, but I consider it a strong suit that the idea was not only presented but formalized. I think the negative aspects are that some statements are not properly attributed, like Observation 1, which is not a contribution of this paper but prior work. Furthermore, statements like “One could easily extend our design to larger, more complex CPUs” seem unjustified. Intel Skylake CPUs are built with 19-stage pipelines, which certainly show different performance characteristics, so more research into the relation between the number of pipeline stages and performance is required. A new instruction type for RISC-V is also a non-trivial modification in practice. The corresponding code is not yet published, but the paper was released just one month prior to reading. Another weak point is the selling, which focuses on the good performance of BasicBlocker in the nettle-aes and nettle-sha benchmarks. As cryptographic applications, these obviously do not rely on branching except for “for” loops, which is completely non-representative, yet statements like “In some cases, BasicBlocker even outperforms sophisticated branch predictors” can be found. Finally, Claudio pointed out very well that JIT compilation cannot be implemented with BasicBlocker, since the number of instructions in a block is most likely not known.

Overall, a quite good paper, but less aggressive selling would have increased its credibility from my perspective.

Benchmarking Post-quantum Cryptography in TLS §

Title: “Benchmarking Post-quantum Cryptography in TLS” by Christian Paquin, Douglas Stebila, Goutam Tamvada [url] [dblp]
Published around 2020-04 in PQCRYPTO 2020; I read it around 2021-06
Abstract: Post-quantum cryptographic primitives have a range of tradeoffs compared to traditional public key algorithms, either having slower computation or larger public keys and ciphertexts/signatures, or both. While the performance of these algorithms in isolation is easy to measure and has been a focus of optimization techniques, performance in realistic network conditions has been less studied. Google and Cloudflare have reported results from running experiments with post-quantum key exchange algorithms in the Transport Layer Security (TLS) protocol with real users’ network traffic. Such experiments are highly realistic, but cannot be replicated without access to Internet-scale infrastructure, and do not allow for isolating the effect of individual network characteristics.

quotes

summary

Neat experiment. The Open Quantum Safe project's implementation is used to run a network experiment on Linux virtual network devices, with packet loss and round-trip times as the considered variables. The key finding is obvious, but the collected data is provided publicly, enabling future research. It was very interesting to read about Firefox telemetry's reference packet-loss data.

Bitcoin: A Peer-to-Peer Electronic Cash System §

Title: “Bitcoin: A Peer-to-Peer Electronic Cash System” by Satoshi Nakamoto [url] [dblp]
Published around 2009-03; I read it around 2024-12
Abstract: A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone.

summary

Very interesting. Even though I knew Bitcoin’s idea before reading the paper, I enjoyed reading it because it perfectly describes proof-of-work as the central idea and takes the threat of overtaking the network seriously. The writing style is easy to follow and well-structured. I was only surprised by the description of multiple in/out values (i.e. the phrasing); furthermore, the limit on minable coins is not explicitly discussed. Interesting to revisit hardware considerations from 2008 in 2024.
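
The proof-of-work idea can be sketched in a few lines. This is a toy illustration, not Bitcoin's actual block-header hashing; `proof_of_work` and measuring difficulty in leading zero hex digits are my own simplification:

```python
import hashlib

def proof_of_work(data: bytes, difficulty: int = 4) -> int:
    """Find a nonce so that SHA-256(data || nonce) starts with
    `difficulty` zero hex digits. Work is exponential in difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

nonce = proof_of_work(b"block payload", difficulty=4)
# Verification is asymmetric: a single hash suffices to check the work.
digest = hashlib.sha256(b"block payload" + nonce.to_bytes(8, "big")).hexdigest()
assert digest.startswith("0000")
```

This asymmetry (expensive search, one-hash verification) is what lets the longest chain serve as proof of the largest pool of CPU power.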

Borrow checking Hylo §

Title: “Borrow checking Hylo” by Dimi Racordon, Dave Abrahams [url] [dblp]
Published around 2023 in SPLASH 2023; I read it around 2024-11
Abstract: Hylo is a language for high-level systems programming that promises safety without loss of efficiency. It is based on mutable value semantics, a discipline that emphasizes the independence of values to support local reasoning. The result—in contrast with approaches based on sophisticated aliasing restrictions—is an efficient, expressive language with a simple type system and no need for lifetime annotations. Safety guarantees in Hylo programs are verified by an abstract interpreter processing an intermediate representation, Hylo IR, that models lifetime properties with ghost instructions. Further, lifetime constraints are used to eliminate unnecessary memory allocations predictably.

quotes

  • “Hylo is a language for high-level systems programming that promises safety without loss of efficiency. It is based on mutable value semantics, a discipline that emphasizes the independence of values to support local reasoning. The result—in contrast with approaches based on sophisticated aliasing restrictions—is an efficient, expressive language with a simple type system and no need for lifetime annotations.” (Racordon and Abrahams, 2023, p. 1)
  • “Enter Hylo [1], a project that promises memory safety with zero-cost abstraction. Unlike languages with similar goals such as Rust [13], Hylo does not attempt to police references with a sophisticated type system; instead, it simply removes them from the user model. Variables cannot share mutable state, eliminating data races, and object lifetimes can be tracked trivially, eliminating temporal safety violations. As a result, Hylo is immune to most commonly reported software vulnerabilities [4].” (Racordon and Abrahams, 2023, p. 1)
  • “Hylo enforces MVS using a form of linear type system [20]. Such systems require that all values be used exactly once, thus guaranteeing freedom from aliasing and resource leakage. Most are, however, extremely restrictive and have been widely criticized for being impractical. Fortunately, Hylo makes two observations to address this issue.” (Racordon and Abrahams, 2023, p. 2)
  • “The first is that destruction need not be explicit, as the compiler can detect unused values and destroy them automatically.1 The second is that reading or modifying a value is safe as long as such value cannot be modified concurrently. As a result, Hylo only requires that all values be consumed (returned from a function, moved to new storage, or deinitialized) exactly once.” (Racordon and Abrahams, 2023, p. 2)

summary

Hylo uses the concepts of Mutable Value Semantics (MVS) and linear types to build a type system that provides memory safety. Unlike Rust, where the behavior is unambiguously encoded into the types, Hylo allows providing multiple implementations and decides at the function level which implementation is most appropriate based on the context (e.g. if you don’t use a struct afterwards anymore, it does not make sense to create a copy upon modification). In this example (the program implements bit access/assignment into a uint32), implementations with the semantics sink/let/inout are provided (“set” is left out):

extension UInt32 {
 public subscript(_ i: Int): Bool {
   sink { return (self & (1 << i)) != 0 }
   let { yield (self & (1 << i)) != 0 }
   inout {
     var b = self[i]
     yield &b
     if b { &self |= 1 << i }
     else { &self &= ~(1 << i) }
   }
 }
}

public fun main() {
 var s: UInt32 = 0
 &s[4] = true
 print(s) // "16"
}

Maybe I overlooked the details, but what do sink/let/inout relate to? They seem to implicitly relate to the return value only. I could not figure out what happens with multiple return values. I also could not figure out how interior mutability is represented.

But the resulting model of Hylo is simpler than Rust’s borrow checking and provides a simpler, easier-to-communicate framework for memory safety.

C-rusted: The Advantages of Rust, in C, without the Di… §

Title: “C-rusted: The Advantages of Rust, in C, without the Disadvantages” by Roberto Bagnara, Abramo Bagnara, Federico Serafini [url] [dblp]
Published around 2023-02; I read it around 2023-05-07
Abstract: C-rusted is an innovative technology whereby C programs can be (partly) annotated so as to express: ownership, exclusivity and shareability of language, system and user-defined resources; dynamic properties of objects and the way they evolve during program execution; nominal typing and subtyping. The (partially) annotated C programs can be translated with unmodified versions of any compilation toolchain capable of processing ISO C code. The annotated C program parts can be validated by static analysis: if the static analyzer flags no error, then the annotations are provably coherent among themselves and with respect to annotated C code, in which case said annotated parts are provably exempt from a large class of logic, security, and run-time errors.
Comment: 7 pages, 4 figures

quotes

  • “There are very strong economical reasons behind the use of the C programming language, namely: 1) C compilers exist for almost any processor; 2) C compiled code is very efficient and without hidden costs; 3) C allows writing compact code thanks to the many builtin operators and the limited verbosity of its constructs; 4) C is defined by an ISO standard [1]; 5) C, possibly with extensions, allows easy access to the hardware; 6) C has a long history of usage, including in critical systems; 7) C is widely supported by all sorts of tools.” (Bagnara et al., 2023, p. 1)
  • “On the other hand, several of C’s strong points have negative counterparts, e.g.: 1) The fact that C code can efficiently be compiled to machine code for almost any architecture is due to the fact that, whenever this is possible and convenient, highlevel constructs are mapped directly to a few machine instructions: given that instructions sets differ from one architecture to the other, this is why the behavior of C programs is not fully defined. 2) The reason why the maximum execution time of C programs can be estimated with good precision by expert programmers is because there is nothing happening under the hood and, in particular, there is no built-in run-time error detection.” (Bagnara et al., 2023, p. 1)
  • “These negative sides of C compound when memory handling is concerned, as memory handling is fully under the programmers responsibility: 1) memory references in C are (unless special care is taken) raw pointers that bring with themselves no information about the associated memory block or its intended use; 2) no run-time checks are made to ensure the safety of pointer arithmetic, memory accesses, and memory deallocation; 3) code involving memory addressing with pointers can be particularly opaque to peer review. Some of the most common C memory issues are: • dereferencing invalid pointers, including null pointers, dangling pointers (pointers to deallocated memory blocks), and misaligned pointers; • use of uninitialized memory; • memory leaks; • invalid deallocation (including double free and free with invalid argument); • buffer overflow.” (Bagnara et al., 2023, p. 1)
  • “Even though various coding standards (with MISRA C being the most authoritative one) and lots of “bug finders” exist, there is no verification tool that can guarantee, in a strong sense, the absence of a large class of software defects in a consistent, effective and repeatable way.” (Bagnara et al., 2023, p. 1)
  • “MISRA C provides guidelines for writing software that is on average much safer;” (Bagnara et al., 2023, p. 1)
  • “systems based on deductive methods, like Frama-C [2], require programmers that are highly skilled in mathematical logic and, even when such programmers are available” (Bagnara et al., 2023, p. 1)

    “development time is multiplied by a factor from 2 to 4” (Bagnara et al., 2023, p. 2)

  • “types in C do not offer programmers a way of expressing non-trivial data properties that are bound to the program logic. For instance: • an open file has the same type as a closed file; • a resource or a transaction has the same type independently from its state; • an exclusive reference and a shared reference to a resource are indistinguishable; • an integer with special values that represent error conditions is indistinguishable from an ordinary integer.” (Bagnara et al., 2023, p. 2)
  • “Most importantly, this enables the C-rusted Analyzer, which is based on the ECLAIR Software Verification Platform, to verify correctness on any platform, with any architecture, and for each compiler.” (Bagnara et al., 2023, p. 2)
  • “That is, keep using C, exactly as before, using the same compilers and the same tools, the same personnel . . . but incrementally adding to the program the information required to demonstrate correctness, using a system of annotations that is not based on mathematical logic and can be taught to programmers in a week of training.” (Bagnara et al., 2023, p. 5)
  • “This technique is not new: it is called gradual typing, and consists in the addition of information that does not alter the behavior of the code, yet it is instrumental in the verification of its correctness. Gradual typing has been applied with spectacular success in the past: Typescript has been created 10 years ago, and in the last 6 years its diffusion in the community of JavaScript developers has increased from 21% to 69%. And it will continue to increase: simply put, there is no reason to write more code in the significantly less secure and verifiable JavaScript language” (Bagnara et al., 2023, p. 5)
  • “For instance, C-rusted is 100% compatible with MISRA C: a C program that is MISRA compliant can be rusted without negatively impacting MISRA compliance.” (Bagnara et al., 2023, p. 6)
  • “Contrast this with Rust and Zig: they are not standardized and, as a matter of fact, they frequently change in a way that does not follow a rigorous process. This is the main reason why qualifying a Rust or Zig compilation toolchain is impossible today. In contrast, any qualified C compiler is, as is, a qualified C-rusted compiler.” (Bagnara et al., 2023, p. 6) Footnote 3: “C-rusted is compatible with any version of the ISO C Standard and can be used with any C toolchain.” Footnote 4: “We note on passing that, in the authors’ opinion, C-to-Rust transpilation [16], [17], [18] is not a real solution: transpiling well-written C code to unreadable and unmaintainable Rust code could possibly solve only a small fraction of the problems at the cost of creating several new problems. This, however, goes beyond the scope of this paper.”
  • “The analysis is rigorously intraprocedural, i.e., it is done one function at a time, using only the information available for that function in the translation unit defining it, which includes the annotations possibly provided in function declarations.” (Bagnara et al., 2023, p. 7)

summary

In this paper, the authors extend the MISRA C checker with additional type checks and type annotations which allow applying the common Rust semantics to C (“owner handles” map to ownership, “exclusive handles” to &mut, “shared handles” to &). I like the reference to gradual typing since it really follows that paradigm. But once more, the system only captures a tiny fraction of issues, and semantics are not enforced but only provide hints. And as with all gradual typing systems, the user loses the overview of whether their code strictly follows all recommendations or whether required properties simply go unchecked because insufficient type information is provided.

As a result, I consider this a “nice approach” in a line of many others, but it likely won’t gain popularity.

Cheat Sheets for Data Visualization Techniques §

Title: “Cheat Sheets for Data Visualization Techniques” by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach [url] [dblp]
Published around 2020-04-25 in CHI 2020; I read it around 2020-10-18
Abstract: This paper introduces the concept of ‘cheat sheets’ for data visualization techniques, a set of concise graphical explanations and textual annotations inspired by infographics, data comics, and cheat sheets in other domains. Cheat sheets aim to address the increasing need for accessible material that supports a wide audience in understanding data visualization techniques, their use, their fallacies and so forth. We have carried out an iterative design process with practitioners, teachers and students of data science and visualization, resulting six types of cheat sheet (anatomy, construction, visual patterns, pitfalls, false-friends and well-known relatives) for six types of visualization, and formats for presentation. We assess these with a qualitative user study using 11 participants that demonstrates the readability and usefulness of our cheat sheets.

definitions

quotes

summary

Nice paper. At first, it looks like there is no need to write a paper about cheat sheets, but the authors provided appropriate contextualization and also presented usability-study results. In the end, it is still all about the cheat sheets, and they themselves are well-designed and nice (though I am in favor of exhaustiveness; thus, e.g., I would like to see alternative terminology included).

The website is beautiful, btw: https://visualizationcheatsheets.github.io/

typos

Crouching error, hidden markup [Microsoft Word] §

Title: “Crouching error, hidden markup [Microsoft Word]” by N. Holmes [url] [dblp]
Published around 2001-09; I read it around 2021-08
Abstract:

quotes

summary

In this article, Holmes derives from his personal experiences the requirements for a modern text-formatting system. He favors a markup language over the intransparent data model of WYSIWYG editors. He concludes that a dual system is the preferred user-centric way to go, whereas Teχ is praised for its visual result.

I like his fundamental discussion of standards and former text-formatting tools. His preference for Lilac seems understandable, but lacks depth. Whereas the concept was also applied to the HTML markup language by Macromedia's Dreamweaver, the situation becomes increasingly difficult with a larger data model and a more powerful layouting scheme. CSS allows users to place objects at defined pixels, which makes it difficult to select and edit elements in a visual editor if elements get close. One of the issues the author completely ignores is Teχ's nature as a representational notation (in contrast to HTML). As such, Teχ is less a markup language than a sequence of instructions to generate a document.
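
This instruction-sequence character is easy to illustrate. The following plain-Teχ fragment (my own sketch, not from the article) is a small imperative program rather than declarative markup:

```tex
\newcount\n          % allocate a counter register
\n=5                 % assign to it
\advance\n by 3      % mutate state imperatively
The answer is \the\n % typesets the register's current value, 8
\bye
```

A visual editor would have to execute such code to know what the document looks like, which is exactly what makes a dual (source + preview) system hard for Teχ.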


Cryptanalysis of ring-LWE based key exchange with key … §

Title: “Cryptanalysis of ring-LWE based key exchange with key share reuse” by Scott Fluhrer [url] [dblp]
Published around 2016; I read it around 2020-06
Abstract:

quotes

summary


typos

Cryptographic competitions §

Title: “Cryptographic competitions” by Daniel J Bernstein [url] [dblp]
Published around 2020-12; I read it around 2021-06-20
Abstract: Competitions are widely viewed as the safest way to select cryptographic algorithms. This paper surveys procedures that have been used in cryptographic competitions, and analyzes the extent to which those procedures reduce security risks.

quotes

summary

Very interesting meta-level read with many historical remarks. As an organizer of cryptographic competitions, djb also has the required background information to share. Performance as a criterion is broadly discussed in the first half. The contextualization of running competitions is done nicely. Personal remarks can be found more towards the end. But the paper cannot offer solutions to the posed (difficult) problems.


Cyclone: A safe dialect of C §

Title: “Cyclone: A safe dialect of C” by Greg Morrisett, James Cheney, Dan Grossman, Michael Hicks, Yanling Wang [url] [dblp]
Published around 2002 in USENIX 2002; I read it around 2020-02
Abstract: Cyclone is a safe dialect of C. It has been designed from the ground up to prevent the buffer overflows, format string attacks, and memory management errors that are common in C programs, while retaining C’s syntax and semantics. This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.

Ambiguity

“Arrays and strings are converted to ?-pointers as necessary (automatically by the compiler).”

→ When exactly?

Example

“We don’t consider it an error if non-pointers are uninitialized. For example, if you declare a local array of non-pointers, you can use it without initializing the elements:

char buf[64];       // contains garbage ..
sprintf(buf,"a");   // .. but no err here
char c = buf[20]; // .. or even here

This is common in C code; since these array accesses are in-bounds, we allow them.”

Example

“However, this technique will not catch even the following simple variation:

char *itoa(int i) {
  char buf[20];
  char *z;
  sprintf(buf,"%d",i);
  z = buf;
  return z;
}

Here, the address of buf is stored in the variable z, and then z is returned. This passes gcc -Wall without complaint.”

Quotes

Region blocks

“Therefore, Cyclone provides a feature called growable regions. The following code declares a growable region, does some allocation into the region, and deallocates the region:

region h {
  int *x = rmalloc(h,sizeof(int));
  int ?y = rnew(h) { 1, 2, 3 };
  char ?z = rprintf(h,"hello");
}

The code uses a region block to start a new, growable region that lives on the heap. The region is deallocated on exit from the block (without an explicit free).”

Summary

Great paper with a pragmatic approach derived from the Typed Assembly Language project. Definitely worth a read. Essential for everyone interested in the Rust programming language, as this project inspired many ideas related to proper memory management and lifetimes. The implementation needs to be compiled on your own, but it is not maintained anymore anyway. Furthermore, there are more papers following the progress of the project; they also introduced more drastic changes, which discard the label “pragmatic”.

Tagged unions

“We solve this in Cyclone in two steps. First, we add tagged unions to the language:”

tunion t {
  Int(int);
  Str(char ?);
};

[…]

void pr(tunion t x) {
  switch (x) {
    case &Int(i): printf("%d",i); break;
    case &Str(s): printf("%s",s); break;
  }
}

“The printf function itself accesses the tagged arguments through a fat pointer (Cyclone’s varargs are bounds checked)”

Design Guidelines for Domain Specific Languages §

Title: “Design Guidelines for Domain Specific Languages” by Gabor Karsai, Holger Krahn, Claas Pinkernell, Bernhard Rumpe, Martin Schindler, Steven Völkel [url] [dblp]
Published around 2009 in Proceedings of the 9th OOPSLA Workshop on Domain-Specific Modeling; I read it around 2023-12
Abstract: Designing a new domain specific language is as any other complex task sometimes error-prone and usually time consuming, especially if the language shall be of high-quality and comfortably usable. Existing tool support focuses on the simplification of technical aspects but lacks support for an enforcement of principles for a good language design. In this paper we investigate guidelines that are useful for designing domain specific languages, largely based on our experience in developing languages as well as relying on existing guidelines on general purpose (GPLs) and modeling languages. We defined guidelines to support a DSL developer to achieve better quality of the language design and a better acceptance among its users.

summary

A quite okay paper presenting 26 design guidelines for domain-specific languages along 5 categories, for example “Compose existing languages where possible”, “Limit number of language elements”, and “Use descriptive notations”. I wonder whether they could have clarified terminology like “consistency”, “redundancy”, etc. Specifically, they could have shown examples of syntaxes violating these guidelines.

Detecting Unsafe Raw Pointer Dereferencing Behavior in… §

Title: “Detecting Unsafe Raw Pointer Dereferencing Behavior in Rust” by Zhijian Huang, Yong Jun Wang, Jing Liu [url] [dblp]
Published around 2018 in IEICE 2018; I read it around 2021-11
Abstract: The rising systems programming language Rust is fast, efficient and memory safe. However, improperly dereferencing raw pointers in Rust causes new safety problems. In this paper, we present a detailed analysis into these problems and propose a practical hybrid approach to detecting unsafe raw pointer dereferencing behaviors. Our approach employs pattern matching to identify functions that can be used to generate illegal multiple mutable references (We define them as thief function) and instruments the dereferencing operation in order to perform dynamic checking at runtime. We implement a tool named UnsafeFencer and has successfully identified 52 thief functions in 28 real-world crates∗, of which 13 public functions are verified to generate multiple mutable references.

quotes

summary

A nice, specific LLVM compiler plugin to detect a special case where multiple mutable references are generated in the context of unsafe. Its practicality is limited, but the authors provided pragmatic tools to make it as accessible as possible.

typos/problems

EWD1300: The Notational Conventions I Adopted, and Why §

Title: “EWD1300: The Notational Conventions I Adopted, and Why” by Edsger W. Dijkstra [url] [dblp]
Published around 2002; I read it around 2021-09
Abstract:

quotes

summary

A nice read summing up opinions related to mathematical notation. Prefix/infix notation, precedence rules, and the layout of proofs are discussed. Details are often lacking, but I expected it to be only an opinion paper anyway.

Engineering a sort function §

Title: “Engineering a sort function” by Jon L. Bentley, M. Douglas McIlroy [url] [dblp]
Published around 1993 in Software - Practice and Experience; I read it around 2022-12-29
Summary: We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra’s Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design techniques apply in domains beyond sorting.

quotes

  • “C libraries have long included a qsort function to sort an array, usually implemented by Hoare’s Quicksort.1 Because existing qsorts are flawed, we built a new one. This paper summarizes its evolution.” (Bentley and McIlroy, 1993, p. 1249)
  • “Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983 would consume quadratic time on arrays that contain a few elements repeated many times—in particular arrays of random zeros and ones.” (Bentley and McIlroy, 1993, p. 1249)
  • “The sort need not be stable; its specification does not promise to preserve the order of equal elements.” (Bentley and McIlroy, 1993, p. 1250)
  • “Sedgewick studied Quicksort in his Ph.D. thesis” (Bentley and McIlroy, 1993, p. 1251)
  • “A more efficient (and more familiar) partitioning method uses two indexes i and j. Index i scans up from the bottom of the array until it reaches a large element (greater than or equal to the partition value), and j scans down until it reaches a small element. The two array elements are then swapped, and the scans continue until the pointers cross. This algorithm is easy to describe, and also easy to get wrong—Knuth tells horror stories about inefficient partitioning algorithms.” (Bentley and McIlroy, 1993, p. 1252)
  • “As a benchmark, swapping two integers in inline code takes just under a microsecond.” (Bentley and McIlroy, 1993, p. 1253)
  • “using inline swaps for integer-sized objects and a function call otherwise.” (Bentley and McIlroy, 1993, p. 1254)
  • “When a colleague and I modified sort to improve reliability and efficiency, we found that techniques that improved performance for other sorting applications sometimes degraded the performance of sort.’” (Bentley and McIlroy, 1993, p. 1254)
  • “Partitioning about a random element takes Cn ∼∼ 1.386n lg n comparisons. We now whittle away at the constant in the formula. If we were lucky enough to choose the median of every subarray as the partitioning element, we could reduce the number of comparisons to about n lg n.” (Bentley and McIlroy, 1993, p. 1254)
  • “We adopted Tukey’s ‘ninther’, the median of the medians of three samples, each of three elements.” (Bentley and McIlroy, 1993, p. 1255)
  • “Tripartite partitioning is equivalent to Dijkstra’s ‘Dutch National Flag’ problem.” (Bentley and McIlroy, 1993, p. 1257)
  • “Quicksort with split-end partitioning (Program 7) is about twice as fast as the Seventh Edition qsort.” (Bentley and McIlroy, 1993, p. 1258)
  • “Since the expected stack size is logarithmic in n, the stack is likely to be negligible compared to the data—only about 2,000 bytes when n = 1,000,000.” (Bentley and McIlroy, 1993, p. 1259)
  • “We therefore emulated Knuth’s approach to testing TeX: ‘I get into the meanest, nastiest frame of mind that I can manage, and I write the nastiest code I can think of; then I turn around and embed that in even nastier constructions that are almost obscene.’” (Bentley and McIlroy, 1993, p. 1260)
  • “P. McIlroy’s merge sort has guaranteed O (n log n) worst-case performance and is almost optimally adaptive to data with residual order (it runs the highly nonrandom certification suite of Figure 1 almost twice as fast as Program 7), but requires O (n) additional memory.” (Bentley and McIlroy, 1993, p. 1262)
  • “The key to performance is elegance, not battalions of special cases.” (Bentley and McIlroy, 1993, p. 1263)

summary

In this paper, the authors improve the performance of qsort, a Quicksort implementation by Lee McMahon based on Scowen’s ‘Quickersort’ that shipped with Seventh Edition Unix. Predictably, that Quicksort exhibits quadratic behavior on an easy-to-discover set of inputs. The authors therefore review other proposals and use techniques like split-end partitioning to improve performance on exactly those inputs.

An old but down-to-earth paper. It is interesting to see that asymptotic behavior was the main design consideration at the time (as opposed to cache behavior today). But I want to emphasize that the paper also provides actual benchmarks.

typo

“to sift the” (Bentley and McIlroy, 1993, p. 1250)

Everything Old is New Again: Binary Security of WebAss… §

Title: “Everything Old is New Again: Binary Security of WebAssembly” by Daniel Lehmann, Johannes Kinder, Michael Pradel [url] [dblp]
Published around 2020 in USENIX Security 2020 and I read it around 2020-07
Summary: WebAssembly is an increasingly popular compilation target designed to run code in browsers and on other platforms safely and securely, by strictly separating code and data, enforcing types, and limiting indirect control flow. Still, vulnerabilities in memory-unsafe source languages can translate to vulnerabilities in WebAssembly binaries. In this paper, we analyze to what extent vulnerabilities are exploitable in WebAssembly binaries, and how this compares to native code. We find that many classic vulnerabilities which, due to common mitigations, are no longer exploitable in native binaries, are completely exposed in WebAssembly. Moreover, WebAssembly enables unique attacks, such as overwriting supposedly constant data or manipulating the heap using a stack overflow. We present a set of attack primitives that enable an attacker (i) to write arbitrary memory, (ii) to overwrite sensitive data, and (iii) to trigger unexpected behavior by diverting control flow or manipulating the host environment. We provide a set of vulnerable proof-of-concept applications along with complete end-to-end exploits, which cover three WebAssembly platforms. An empirical risk assessment on real-world binaries and SPEC CPU programs compiled to WebAssembly shows that our attack primitives are likely to be feasible in practice. Overall, our findings show a perhaps surprising lack of binary security in WebAssembly. We discuss potential protection mechanisms to mitigate the resulting risks.

quotes

summary

Wonderful paper. Firstly, a security analysis of WebAssembly was necessary, and I think they covered an appropriate range of attack vectors. Secondly, it is, of course, sad to see that many old vulnerabilities still work on a new platform.

I am unconditionally in favor of true read-only memory in WebAssembly. As David points out, this could also be enforced by a compiler through appropriate runtime checks. However, David also proposed a criterion: iff a memory attack can lead to an escape from the WASM sandbox, then it is an issue of the WebAssembly sandbox and should be prevented at that level.

I keep wondering about stack canaries and guard pages. Maybe omitting them was a trade-off between security and performance, but I am not 100% convinced. Thirdly, the paper is well-structured and provides sufficient data to support its arguments. The attacks were okay, the introduction to WebAssembly was superb, and backing the claims about indirect calls with quantitative data in section 6 was outstanding. I think everyone in IT security can easily follow it. I love it!
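The “manipulating the heap using a stack overflow” primitive is easy to picture with a toy model of linear memory. All region addresses and sizes below are made up; real layouts depend on the compiler and allocator:

```c
#include <assert.h>
#include <string.h>

/* Toy model of WebAssembly linear memory: a single flat, fully writable
 * byte array. There are no page protections and no guard pages between
 * the unmanaged stack and the heap, so an out-of-bounds write simply
 * keeps going into the neighboring region. */
#define MEM_SIZE  256
#define STACK_BUF  64   /* start of a 16-byte stack buffer (made-up address) */
#define HEAP_DATA  80   /* a "heap" allocation right above it (made-up)      */

static unsigned char mem[MEM_SIZE];

static void vulnerable_copy(const char *input) {
    /* Attacker-controlled length: nothing in this flat model (and nothing
     * in Wasm linear memory itself) stops the write at the buffer's end. */
    memcpy(&mem[STACK_BUF], input, strlen(input));
}
```

A 20-byte input overflows the 16-byte buffer and silently rewrites the adjacent “heap” object; natively, the same bug would likely hit a guard page or canary first.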

WebAssembly architecture:

toc

• Introduction
• Background on WebAssembly
• Security Analysis of Linear Memory
• Managed vs. Unmanaged Data
• Memory Layout
• Memory Protections
• Attack Primitives
• Obtaining a Write Primitive
• Stack-based Buffer Overflow
• Stack Overflow
• Heap Metadata Corruption
• Overwriting Data
• Overwriting Stack Data
• Overwriting Heap Data
• Overwriting “Constant” Data
• Triggering Unexpected Behavior
• Redirecting Indirect Calls
• Code Injection into Host Environment
• Application-specific Data Overwrite
• End-to-End Attacks
• Cross-Site Scripting in Browsers
• Remote Code Execution in Node.js
• Arbitrary File Write in Stand-alone VM
• Quantitative Evaluation
• Experimental Setup and Analysis Process
• Measuring Unmanaged Stack Usage
• Measuring Indirect Calls and Targets
• Comparing with Existing CFI Policies
• Discussion of Mitigations
• WebAssembly Language
• Compilers and Tooling
• Application and Library Developers
• Related Work
• Conclusion

Exploring the Vulnerability of R-LWE Encryption to Fau… §

Title: “Exploring the Vulnerability of R-LWE Encryption to Fault Attacks” by Felipe Valencia, Tobias Oder, Tim Güneysu, Francesco Regazzoni [url] [dblp]
Published around 2018 in CS2@HiPEAC 2018 and I read it around 2020-06
Summary: The future advent of quantum computer pushes for the design and implementation of public-key cryptosystems capable of resisting quantum attacks. Lattice-based cryptography, especially when implemented over ideal lattices, is one of the most promising candidates to replace current public-key systems. Area and performance of different designs have been explored in a wide number of platforms, including embedded ones. However, the resistance of these constructions against physical attacks, although fundamental to guarantee security, still remains largely unexplored.

ambiguity

Description of “Randomization of the key” (page 5):

“Decryption can also be attacked by altering the key injecting a random fault in one coefficient. The drawback of this attack is that it is necessary to know the difference δ between the original key value and the new one. If this is known, the attacker starts to send faulty ciphertexts to the decryption oracle. For each δ we note for each bit in the decrypted message whether it was decrypted correctly or not. We repeat this procedure until we know the decryption error status for each δ and each bit of the message. Then, for each message bit, we count the number of the times the error status has changed between δi and δi+1 , for each δ ∈ [0, q-1]. The number of error status changes divided by two is the absolute value of the secret key at this position.”

Best effort interpretation:
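My toy reading of the counting rule, with a made-up error oracle (a decryption error occurs iff s·δ mod q lands in the upper half of [0, q)); counting the status changes cyclically over δ and halving then recovers |s|:

```c
#include <assert.h>

#define Q 16  /* toy modulus; the real schemes use far larger q */

/* Hypothetical decryption-error oracle for one message bit when the
 * attacked key coefficient is s and the injected fault offset is delta. */
static void fill_status(int status[Q], int s) {
    for (int d = 0; d < Q; d++)
        status[d] = ((s * d) % Q) >= Q / 2;  /* 1 = decryption error */
}

/* The quoted rule: count how often the error status changes between
 * consecutive deltas (counted cyclically here), then divide by two. */
static int recovered_magnitude(const int status[Q]) {
    int flips = 0;
    for (int d = 0; d < Q; d++)
        if (status[d] != status[(d + 1) % Q]) flips++;
    return flips / 2;
}
```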

Robert pointed out another attack that is somewhat similar:

inconsistency

quotes

summary

I like the overview-style nature of the paper. It is properly structured. The overview section is very short, dense, and fine. The fault attacks mentioned are not sophisticated, but the overview makes it a nice paper. However, the last attack, “Randomization of the key”, is incomprehensible. Furthermore, the paper is only available via ACM, and all references are broken there. In conclusion: a nice read if you are interested in ring-LWE and fault attacks.

typo

FPGA Vendor Agnostic True Random Number Generator §

Title: “FPGA Vendor Agnostic True Random Number Generator” by Dries Schellekens, Bart Preneel, Ingrid Verbauwhede [url] [dblp]
Published around 2006 in FPL 2006 and I read it around 2021-12
Summary: This paper describes a solution for the generation of true random numbers in a purely digital fashion; making it suitable for any FPGA type, because no FPGA vendor specific features (e.g., like phase-locked loop) or external analog components are required. Our solution is based on a framework for a provable secure true random number generator recently proposed by Sunar, Martin and Stinson. It uses a large amount of ring oscillators with identical ring lengths as a fast noise source – but with some deterministic bits – and eliminates the non-random samples by appropriate post-processing based on resilient functions. This results in a slower bit stream with high entropy. Our FPGA implementation achieves a random bit throughput of more than 2 Mbps, remains fairly compact (needing minimally 110 ring oscillators of 3 inverters) and is highly portable.

quotes

summary

A rather uninteresting paper. It takes the ideas of Sunar, Martin and Stinson [12] and applies them to an FPGA architecture. After a brief discussion of related work, the authors justify their parameter choices.

This paper was suggested in the course “Cryptographic Engineering” I attended.

FastSpec: Scalable Generation and Detection of Spectre… §

Title: “FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings” by M. Caner Tol, Koray Yurtseven, Berk Gulmezoglu, Berk Sunar [url] [dblp]
Published around 2020; I read it around 2020-07
Summary: Several techniques have been proposed to detect vulnerable Spectre gadgets in widely deployed commercial software. Unfortunately, detection techniques proposed so far rely on hand-written rules which fall short in covering subtle variations of known Spectre gadgets as well as demand a huge amount of time to analyze each conditional branch in software. Since it requires arduous effort to craft new gadgets manually, the evaluations of detection mechanisms are based only on a handful of these gadgets.

ambiguity

quotes

summary

A very advanced paper: a perfect combination of machine learning technology with microarchitectural attack work. It shows a huge effort and nice considerations regarding Generative Adversarial Networks. However, I could not understand the technical aspects of the machine learning part.

The masking rate (page 6) seems to be the percentage of hidden tokens during the training phase. Figure 4 is a little bit awkward and somewhat arbitrary. FastSpec/scan.sh seems to show how FastSpec was invoked to evaluate OpenSSL, and commands.txt tries to explain it somehow.
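If that reading is right, the masking rate just fixes the fraction of input tokens replaced by a mask symbol before training. A sketch with a made-up token encoding (`MASK_ID`) and caller-supplied uniform randoms, so nothing here is FastSpec’s actual pipeline:

```c
#include <assert.h>

#define MASK_ID (-1)  /* hypothetical encoding of the [MASK] token */

/* Replace roughly a `rate` fraction of tokens by MASK_ID. The uniform
 * random draws u[i] in [0,1) are passed in by the caller to keep the
 * sketch deterministic and testable. Returns the number of masked tokens. */
static int mask_tokens(int *tokens, int n, const double *u, double rate) {
    int masked = 0;
    for (int i = 0; i < n; i++)
        if (u[i] < rate) {
            tokens[i] = MASK_ID;
            masked++;
        }
    return masked;
}
```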

typo

Grading on a Curve: How Rust can Facilitate New Contri… §

Title: “Grading on a Curve: How Rust can Facilitate New Contributors while Decreasing Vulnerabilities” by Justin Tracey, Ian Goldberg [url] [dblp]
Published around 2023 in SecDev 2023 and I read it around 2023-12-20
Summary: New contributors are critical to open source projects. Without them, the project will eventually atrophy and become inactive, or its experienced contributors will bias the future directions the project takes. However, new contributors can also bring a greater risk of introducing vulnerable code. For projects that have a need for both secure implementations and a strong, diverse contributor community, this conflict is a pressing issue. One avenue being pursued that could facilitate this goal is rewriting components of C or C++ code in Rust—a language designed to apply to the same domains as C and C++, but with greater safety guarantees. Seeking to answer whether Rust can help keep new contributors from introducing vulnerabilities, and therefore ease the burden on maintainers, we examine the Oxidation project from Mozilla, which has replaced components of the Firefox web browser with equivalents written in Rust. We use the available data from these projects to derive parameters for a novel application of learning curves, which we use to estimate the proportion of commits that introduce vulnerabilities from new contributors in a manner that is directly comparable. We find that despite concerns about ease of use, first-time contributors to Rust projects are about 70 times less likely to introduce vulnerabilities than first-time contributors to C++ projects. We also found that the rate of new contributors increased overall after switching to Rust, implying that this decrease in vulnerabilities from new contributors does not result from a smaller pool of more skilled developers, and that Rust can in fact facilitate new contributors. In the process, we also qualitatively analyze the Rust vulnerabilities in these projects, and measure the efficacy of the common SZZ algorithm for identifying bug-inducing commits from their fixes.

quotes

  • “We find that despite concerns about ease of use, first-time contributors to Rust projects are about 70 times less likely to introduce vulnerabilities than first-time contributors to C++ projects.” (Tracey and Goldberg, 2023, p. 1)
  • “Project enhancements are less likely to be implemented when “heroes” dominate development in open source projects [1], and developers for projects such as the Tor Project have struggled to secure funding for maintenance work that is less attractive to traditional funding sources” (Tracey and Goldberg, 2023, p. 1)
  • “The scripts and hand-annotated data necessary to reproduce our results are publicly available.” (Tracey and Goldberg, 2023, p. 2)
  • “From this, we obtain a list of commits that nominally contributed to the bug, which are called “fix-inducing” commits (in our case, they are also the VCCs), and the information associated with those commits (authors, times, histories, etc.).” (Tracey and Goldberg, 2023, p. 2)
  • “possible source of of information” (Tracey and Goldberg, 2023, p. 4)
  • “For example, many vulnerabilities were hidden behind feature flags while new functionality was under development, and therefore could not execute in normal Firefox builds. In our analysis, these vulnerabilities would be introduced when some C++ code was committed that, when executed, could cause an exploit to succeed—not the commit that flipped the feature flag that allowed the vulnerable code to execute.” (Tracey and Goldberg, 2023, p. 4)
  • “we opted for a manual review of the issues in question; i.e., we read the relevant information on the issue tracker, understood what the vulnerability consisted of, then went through the history of the files until we found a commit that introduced the vulnerability from the perspective of a code reviewer.” (Tracey and Goldberg, 2023, p. 4)
  • “If the learning curve for a project has a more negative slope than the learning curve of another project, it indicates that contributors to this project more quickly learn to avoid adding vulnerabilities than the other project.” (Tracey and Goldberg, 2023, p. 5)
  • “In Rust, there exists a notion of “soundness”, which is a guarantee that all code, excluding lines explicitly annotated as “unsafe”, cannot have undefined behavior [27]. That is, soundness implies that if some Rust code does not make use of the unsafe keyword, and it compiles, then the code is well defined. Soundness bugs, then, are bugs that allow this guarantee to be violated in some way—e.g., a bug that allows some safe code written in a particular way to cause a null pointer to dereference somewhere. In the Rust ecosystem, soundness bugs are largely treated as security vulnerabilities, as they can violate the memory safety of the language, which can in turn be used to exploit traditional memory safety vulnerabilities.” (Tracey and Goldberg, 2023, p. 6)
  • “Because most C++ vulnerabilities were memory-safety vulnerabilities (at least 87% in our data), and Rust is a memory-safe language, it is expected that most C++ vulnerabilities would not translate into identical Rust vulnerabilities.” (Tracey and Goldberg, 2023, p. 7)
  • “Six vulnerabilities involved some form of direct interaction with C++ code (1557208, 1577439, 1593865, 1614971, 1622291, 1758223), functionality known as Foreign Function Interfacing (FFI). In every case, this was caused by a race condition with C++ threads.” (Tracey and Goldberg, 2023, p. 7)
  • “SZZ attributed a total of 130 commits over 43 issues, with a maximum of 14 commits attributed to one issue, versus 77 commits over 54 issues from our manual review, with a maximum of 4 commits attributed to one issue.” (Tracey and Goldberg, 2023, p. 7)
  • “The y-intercepts of the two learning curves imply that for Oxidation projects, a first-time contributor to a C++ project was approximately 70 times as likely to introduce a vulnerability as a first-time contributor to an equivalent Rust project.” (Tracey and Goldberg, 2023, p. 8)

summary

The authors applied a lot of reasoning to the questions of which security vulnerabilities occurred and how commits contributed to the issue at hand. They put a lot of thought into comprehending the empirical data and categorizing it properly. But the sample size is really not significant, and the central 70-times statement is a strong claim that I don’t think is backed by sufficient data.

The publicly available data deserves special recognition.

High-speed Instruction-set Coprocessor for Lattice-bas… §

Title: “High-speed Instruction-set Coprocessor for Lattice-based Key Encapsulation Mechanism: Saber in Hardware” by Sujoy Sinha Roy, Andrea Basso [url] [dblp]
Published around 2020 in CHES 2020 and I read it around 2020-11
Summary: In this paper, we present an instruction set coprocessor architecture for lattice-based cryptography and implement the module lattice-based post-quantum key encapsulation mechanism (KEM) Saber as a case study. To achieve fast computation time, the architecture is fully implemented in hardware, including CCA transformations. Since polynomial multiplication plays a performance-critical role in the module and ideal lattice-based public-key cryptography, a parallel polynomial multiplier architecture is proposed that overcomes memory access bottlenecks and results in a highly parallel yet simple and easy-to-scale design. Such multipliers can compute a full multiplication in 256 cycles, but are designed to target any area/performance trade-offs. Besides optimizing polynomial multiplication, we make important design decisions and perform architectural optimizations to reduce the overall cycle counts as well as improve resource utilization.

open questions

quotes

summary

A good read with appropriate comparisons to other schemes and implementations. Reasonable arguments are provided for the design decisions. The runtime results met expectations.

Instruction                   Keygen    Encapsulation    Decapsulation
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
SHA3-256                         339              585              303
SHA3-512                           0               62               62
SHAKE-128                      1,461            1,403            1,403
Vector sampling                  176              176              176
Polynomial multiplications     2,685            3,592            4,484
  (share of total)             (47%)            (54%)            (56%)
Remaining operations             792              800            1,606
Total cycles                   5,453            6,618            8,034
Total time at 250 MHz        21.8 μs          26.5 μs          32.1 μs
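Because Saber’s q = 2^13 is a power of two, NTT-based multiplication does not apply directly, and the coprocessor computes a schoolbook-style product in Z_q[x]/(x^256 + 1), one result column per cycle. A scalar reference sketch of that arithmetic (my code, not the paper’s design):

```c
#include <assert.h>

#define N 256
#define LOG_Q 13
#define Q (1 << LOG_Q)  /* Saber's q = 8192; reduction is just bit masking */

/* Negacyclic schoolbook multiplication: x^N = -1 in Z_q[x]/(x^N + 1), so
 * coefficient products that wrap past degree N-1 are subtracted. Masking
 * with Q-1 also reduces negative intermediates correctly in two's
 * complement, which is why a power-of-two q is so hardware-friendly. */
static void poly_mul(const int a[N], const int b[N], int r[N]) {
    for (int k = 0; k < N; k++) r[k] = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int k = i + j;
            if (k < N)
                r[k] = (r[k] + a[i] * b[j]) & (Q - 1);
            else
                r[k - N] = (r[k - N] - a[i] * b[j]) & (Q - 1);
        }
}
```

The proposed hardware evaluates one full column of this double loop per cycle with 256 parallel MAC units, which is how a complete multiplication fits in 256 cycles.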

typo

Historical Notes on the Fast Fourier Transform §

Title: “Historical Notes on the Fast Fourier Transform” by James W. Cooley, Peter A. W. Lewis, Peter D. Welch [url] [dblp]
Published around 1967 in IEEE 1967 and I read it around 2020-03
Summary: The fast Fourier transform algorithm has a long and interesting history that has only recently been appreciated. In this paper, the contributions of many investigators are described and placed in historical perspective.

summary

Horizontal side-channel vulnerabilities of post-quantu… §

Titel: “Horizontal side-channel vulnerabilities of post-quantum key exchange protocols” by Aydin Aysu, Youssef Tobah, Mohit Tiwari, Andreas Gerstlauer, Michael Orshansky [url] [dblp]
Veröffentlicht etwa 2018 in HOST 2018 und ich habe es gelesen etwa 2020-09
Zusammenfassung: Key exchange protocols establish a secret key to confidentially communicate digital information over public channels. Lattice-based key exchange protocols are a promising alternative for next-generation applications due to their quantum-cryptanalysis resistance and implementation efficiency. While these constructions rely on the theory of quantum-resistant lattice problems, their practical implementations have shown vulnerability against sidechannel attacks in the context of public-key encryption or digital signatures. Applying such attacks on key exchange protocols is, however, much more challenging because the secret key changes after each execution of the protocol, limiting the side-channel adversary to a single measurement.

quotes

summary

The research was okay, but this paper also does not show any unexpected results. I liked the paper's brevity. In essence, it is the application of a horizontal DPA to the Frodo and NewHope KEMs, but the evaluation on resource-constrained devices is more specific. It is also unusual to see an evaluation on custom-built hardware/FPGA.

  • It is awkward that Park does not get a proper citation (i.e. “[10]”) in section 3. The formulation of the error distribution (“Rényi divergence …”) is wonderful.
  • page 85, top left: “to extract one key bit at a time” … of course, you extract bits, but this is only the special case of bytes, which is the common/expected hypothesis with the Hamming weight. So either the authors did something unusual here (→ remarks needed) or “bits” should be “bytes”. Also, the first two sentences are difficult to comprehend (“intermediate sum […] is 0” → no, its initial value is zero).
  • page 85, bottom right: the bounds-check text is awkward; I don't get it. Algorithm 1 shows a non-constant-time implementation. How was constant time achieved? Parallel computation alone does not make it constant-time.
  • The citation “[11, Appendix A]” is just wrong: Appendix A does not show a zero-value attack. Did the authors mean one of the attacks from the “Exploring the Vulnerability of R-LWE Encryption to Fault Attacks” paper?
  • Section 4.B “Limitations of Our Attack” mentions the NTT as an “alternative method to implement polynomial multiplication”, but it is actually the default for NewHope. My point is that the authors did not answer whether their attack also works on NTT-based multiplication.
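For intuition, a toy, noise-free model of the horizontal idea (everything below is made up and far smaller than Frodo/NewHope): all multiplications of a secret coefficient s with known inputs c happen inside one trace, each leaking the Hamming weight of s·c mod q, and the key guess whose hypothesis profile best matches that single trace wins:

```c
#include <assert.h>

#define Q 16  /* toy modulus */

static int hw(int x) { int n = 0; while (x) { n += x & 1; x >>= 1; } return n; }

/* Simulated single-trace leakage: one Hamming-weight sample per known
 * multiplier c (noise-free, so the sketch stays deterministic). */
static void leak_trace(int s, int trace[Q - 1]) {
    for (int c = 1; c < Q; c++)
        trace[c - 1] = hw((s * c) % Q);
}

/* Horizontal recovery: score every key guess against the samples of the
 * ONE available trace (least squares here instead of Pearson correlation). */
static int recover_key(const int trace[Q - 1]) {
    int best = 0, best_d = 1 << 30;
    for (int s = 1; s < Q; s++) {
        int d = 0;
        for (int c = 1; c < Q; c++) {
            int diff = trace[c - 1] - hw((s * c) % Q);
            d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = s; }
    }
    return best;
}
```

The point of the “horizontal” framing is that all samples come from a single execution, which is exactly what a key exchange with ephemeral secrets forces on the attacker.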

typo

How ISO C became unusable for operating systems develo… §

Title: “How ISO C became unusable for operating systems development” by Victor Yodaiken [url] [dblp]
Published around 2021 in PLOS@SOSP 2021 and I read it around 2022-12-25
Summary: The C programming language was developed in the 1970s as a fairly unconventional systems and operating systems development tool, but has, through the course of the ISO Standards process, added many attributes of more conventional programming languages and become less suitable for operating systems development. Operating system programming continues to be done in non-ISO dialects of C. The differences provide a glimpse of operating system requirements for programming languages.
Comment: PLOS '21: Proceedings of the 11th Workshop on Programming Languages and Operating Systems October 2021

quotes

  • “The Rationale for the C standard [9] cited C’s capability to function as a "high-level assembler" and explained that "many operations are defined to be how the target machine’s hardware does it rather than by a general abstract rule" but C also has traditional attributes of an ALGOL style programming language.” (Yodaiken, 2021, p. 1)
  • “A common argument (made e.g. by Dietz [1]) is that programmers are wrong: their objections to changes in C semantics embody "a fundamental and pervasive misunderstanding: the compiler [is] not ’reinterpreting’ the semantics but rather [is] beginning to take advantage of leeway explicitly provided by the C standard."” (Yodaiken, 2021, p. 2)
  • “Limitations of ISO C for OS development have been noted in academic literature "Systems or library C codes often cannot be written in standard-conformant C" [20]. and by practitioners e.g. [38].” (Yodaiken, 2021, p. 2)
  • “For example, a well-known security issue in the Linux kernel was produced by a compiler incorrectly assuming a pointer null check was unnecessary ([40] fig. 6) and deleting it as an optimization.” (Yodaiken, 2021, p. 2)
  • “For an example of an implementation of malloc in K&R (page 187) the text explains there is a question about whether "pointers to different blocks ... can be meaningfully compared", something not guaranteed by the standard. The conclusion is "this version of malloc is portable only among machines for which general pointer comparison is meaningful."” (Yodaiken, 2021, p. 2)
  • “Ritchie’s main objection was to a type attribute intended to limit aliasing (two or more active pointers/references addressing the same storage).

    the committee is planting timebombs that are sure to explode in people’s faces. Assigning an ordinary pointer to a pointer to a ‘noalias’ object is a license for the compiler to undertake aggressive optimizations that are completely legal by the committee’s rules, but make hash of apparently safe programs.” (Yodaiken, 2021, p. 3)

  • “Even for something as apparently innocent as for(i=0; i < *b; i++) a[i] = a[i]+ *v; if it is possible that for some i, a+i == v or a+i == b or even a+i = &v so compiled code must reload both values on each iteration of the loop” (Yodaiken, 2021, p. 3)
  • “A year later, C89 imposed type restrictions on access to C objects as a way of facilitating type-based alias analysis (TBAA) (section 3.3 in C89). The basic idea is that ISO C forbade accessing an object of one type via a pointer (or other "left hand side") of a different enough type, with an exception for character pointers which can access everything (sort of as discussed below).” (Yodaiken, 2021, p. 3)
  • “The usual example of TBAA optimization involves "lifting" variables out of loops. For the loop above
    long b1 = *b; long v1 = *v; //lifted
    for(int i= 0; i < b1; i++) a[i]= a[i]+v1);
    is a legal optimization if the type of a[i] doesn’t match the types of *b and *v because the compiler "knows" that those pointers cannot alias objects of a different type.”
    (Yodaiken, 2021, p. 3)
  • “There is no general escape mechanism, even though char pointers are allowed to alias anything2. The absence of escapes is a major change in C’s type system as Brian Kernighan’s critique of Pascal makes clear [12]:

    There is no way [in Pascal] to override the type mechanism when necessary, nothing analogous to the “cast” mechanism in C. This means that it is not possible to write programs like storage allocators or I/O systems in Pascal, because there is no way to talk about the type of object that they return, and no way to force such objects into an arbitrary type for another use.” (Yodaiken, 2021, p. 4)

  • “Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.” (Yodaiken, 2021, p. 4)
  • “This modest view of undefined behavior is not, however, the prevailing one, which is that the compiler can assume undefined behavior is impossible and can optimize on the basis of that assumption. In fact, it is currently argued that the standard interpretation allows implementations to take any action at all, not just for (say) an overflowing execution but for the entire program, if they detect a single feasible instance of undefined behavior.” (Yodaiken, 2021, p. 4)
  • “By C18, the ISO C Standard document included a 10-page, incomplete list of undefined behaviors covering everything from type constraints to syntax errors and synchronization errors. Most C programs contain undefined behavior – certainly every operating system code base does. Perhaps more troubling, as [2] points out, this concept of undefined behavior makes C compilers unstable.” (Yodaiken, 2021, p. 4)
  • “And by 2011, Chris Lattner, the main architect of the Clang/LLVM compilers was echoing Ritchie’s warning [16]:

    To me, this is deeply dissatisfying, partially because the compiler inevitably ends up getting blamed, but also because it means that huge bodies of C code are land mines just waiting to explode. This is even worse because [...] there is no good way to determine whether a large scale application is free of undefined behavior, and thus not susceptible to breaking in the future.” (Yodaiken, 2021, p. 4)

  • “C "ints" are fixed-size sequences of bytes interpreted as 2s complement values that map into the ring ℤ/2𝑘 ℤ where (𝑥∗𝑦)/𝑦 = 𝑥 is not a theorem3. Gcc x86-64 with the optimizer on will reveal that (see the code and compilation [43]) if 𝑥 is an "int" and 𝑥 = 1, 000, 000, 000 then calculating (𝑥 ∗5)/5 directly produces 1, 000, 000, 000 but also 𝑧 = 𝑥 ∗ 5 = 705032704 and then 𝑧/5 = 141006540. The result depends on whether the compiler can recognize the overflows. Paradoxical results are easy to generate.” (Yodaiken, 2021, p. 5)
  • float *f = malloc(sizeof(float));
    *f = 3.14; //1
    int *a = (int *)f
    *a = 4; //2
    (Yodaiken, 2021, p. 5)
  • “So have we converted the object pointed to by f to an int in statement 2? Section 6.5 paragraph 7 says " An object shall have its stored value accessed only by an lvalue expression that has one of the following types", which are limited to compatible types and character types.” (Yodaiken, 2021, p. 5)
  • “The second part of Ritchie’s comment cited above is "and I don’t know how much performance improvement is likely to result." It is difficult to find any documentation of significant performance advantages of any kind of undefined behavior optimization [2].” (Yodaiken, 2021, p. 6)
  • “Lee [17] provides an example of the claimed advantage of undefined behavior for overflow.
    for(int i=0 ; i <= N; i++) a[i] = x+1;
    If 𝑖 and 𝑁 are both 32bit and the target machine is x86-64, it is sometimes assumed that permitting overflow requires a sign extend of i on each iteration.”
    (Yodaiken, 2021, p. 6)
  • “Assuming overflow is impossible allows omitting the sign extend. Sign extension is a fast operation so if the loop is non-trivial, the cost will be lost in the noise.” (Yodaiken, 2021, p. 6)
  • “Alias detection in the abstract is not Turing computable [29] and C pointers make approximate alias detection difficult [14], but there are effective algorithms that can detect most aliasing [7] and it is a design choice to make aliasing optimization rely on assumptions about program code that are not validated.” (Yodaiken, 2021, p. 6)
  • “The CompCert compiler is aimed at control systems that have many of the same properties as operating systems, does not do any undefined behavior based optimization4 (and does not optimize extensively) [19] and has a deterministic semantics [18]:

    The semantics is deterministic and makes precise a number of behaviors left unspecified or undefined in the ISO C standard [...] CompCert generates code that is more than twice as fast as that generated by gcc without optimizations, and competitive with gcc at optimization levels 1 and 2. On average, CompCert code is only 7% slower than gcc -O1 and 12% slower than gcc -O2.” (Yodaiken, 2021, p. 6)
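Yodaiken's (x∗5)/5 paradox can be reproduced deterministically in Rust, whose `wrapping_mul` performs exactly the mod-2^32 reduction that a C compiler may or may not apply to the overflowing signed multiplication. This is a sketch of the arithmetic only; the C behaviour itself remains undefined:

```rust
// Reproduce the paper's numbers with defined wrapping semantics:
// 1_000_000_000 * 5 = 5_000_000_000, reduced mod 2^32 = 705_032_704.
fn wrapped(x: i32) -> (i32, i32) {
    let z = x.wrapping_mul(5); // explicit two's-complement wraparound
    (z, z / 5)
}

fn main() {
    let (z, q) = wrapped(1_000_000_000);
    println!("{} {}", z, q); // 705032704 141006540
}
```

When the compiler instead assumes overflow cannot happen, it may fold (x∗5)/5 back to x, which is exactly the divergence the quote describes.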

summary

Victor Yodaiken picks up the central statement “ISO C is unusable for operating systems development” and discusses C’s history, optimization, the type system and undefined behavior with several illustrative examples to justify that statement. Good contextualization.

typos

  • page 3, the example has a spurious trailing right parenthesis
  • page 5, the example is missing a semicolon to close a statement

How Usable are Rust Cryptography APIs? §

Titel: “How Usable are Rust Cryptography APIs?” by Kai Mindermann, Philipp Keck, Stefan Wagner [url] [dblp]
Veröffentlicht etwa 2018 in QRS 2018 und ich habe es gelesen etwa 2021-12
Zusammenfassung: Context: Poor usability of cryptographic APIs is a severe source of vulnerabilities. Aim: We wanted to find out what kind of cryptographic libraries are present in Rust and how usable they are. Method: We explored Rust’s cryptographic libraries through a systematic search, conducted an exploratory study on the major libraries and a controlled experiment on two of these libraries with 28 student participants. Results: Only half of the major libraries explicitly focus on usability and misuse resistance, which is reflected in their current APIs. We found that participants were more successful using rustcrypto which we considered less usable than ring before the experiment. Conclusion: We discuss API design insights and make recommendations for the design of crypto libraries in Rust regarding the detail and structure of the documentation, higher-level APIs as wrappers for the existing low-level libraries, and selected, good-quality example code to improve the emerging cryptographic libraries of Rust.

quotes

summary

A well-written usability paper. The authors thoroughly compile a list of Rust cryptographic libraries and trace their history. Subsequently, one author tried to implement an advanced use case with the rust-crypto, ring, rust-openssl, rust_sodium, and octavo libraries. Then 22 participants with little cryptographic knowledge were asked to implement a basic use case. The results matched expectations but led to practical improvement suggestions (some of which were adopted).

typos

How did Dennis Ritchie produce his PhD thesis? A typog… §

Titel: “How did Dennis Ritchie produce his PhD thesis? A typographical mystery” by David F. Brailsford, Brian W. Kernighan, William A. Ritchie [url] [dblp]
Veröffentlicht etwa 2022-11 in und ich habe es gelesen etwa 2023-02
Zusammenfassung:

summary

An interesting look at Dennis Ritchie’s thesis. Ritchie never formally received his PhD, but he finished his thesis, of which two late revision states are known. In this report the authors examine the thesis (“How was the thesis prepared?”) and conclude that it was most likely typeset on an IBM 2741, but remarkably some details (the baseline grid; a ‘4’ glyph that is open at the top versus the closed form) do not align and point to manual intervention.

In defense of PowerPoint §

Titel: “In defense of PowerPoint” by N. Holmes [url] [dblp]
Veröffentlicht etwa 2004 in und ich habe es gelesen etwa 2021-08
Zusammenfassung:

quotes

summary

The argument is “blame the user, not the tool”. Otherwise this article just recites known arguments.

Is rust used safely by software developers? §

Titel: “Is rust used safely by software developers?” by Ana Nora Evans, Bradford Campbell, Mary Lou Soffa [url] [dblp]
Veröffentlicht etwa 2020-05 in ICSE 20 und ich habe es gelesen etwa 2023-01-08
Zusammenfassung: Rust, an emerging programming language with explosive growth, provides a robust type system that enables programmers to write memory-safe and data-race free code. To allow access to a machine’s hardware and to support low-level performance optimizations, a second language, Unsafe Rust, is embedded in Rust. It contains support for operations that are difficult to statically check, such as C-style pointers for access to arbitrary memory locations and mutable global variables. When a program uses these features, the compiler is unable to statically guarantee the safety properties Rust promotes. In this work, we perform a large-scale empirical study to explore how software developers are using Unsafe Rust in real-world Rust libraries and applications. Our results indicate that software engineers use the keyword unsafe in less than 30% of Rust libraries, but more than half cannot be entirely statically checked by the Rust compiler because of Unsafe Rust hidden somewhere in a library’s call chain. We conclude that although the use of the keyword unsafe is limited, the propagation of unsafeness offers a challenge to the claim of Rust as a memory-safe language. Furthermore, we recommend changes to the Rust compiler and to the central Rust repository’s interface to help Rust software developers be aware of when their Rust code is unsafe.

errors

  • page 2, top, “assume the call we be” should be “assume the call will be”
  • page 10, “we believe the way we segmented crates is makes our analysis representative of the larger Rust ecosystem.” should be “we believe the way we segmented crates makes our analysis representative of the larger Rust ecosystem.”
  • page 10, “and doe not include crates” should be “and does not include crates”

quotes

  • “To allow access to a machine’s hardware and to support low-level performance optimizations, a second language, Unsafe Rust, is embedded in Rust. It contains support for operations that are difficult to statically check, such as C-style pointers for access to arbitrary memory locations and mutable global variables. When a program uses these features, the compiler is unable to statically guarantee the safety properties Rust promotes” (Evans et al., 2020, p. 1)
  • “Our results indicate that software engineers use the keyword unsafe in less than 30% of Rust libraries, but more than half cannot be entirely statically checked by the Rust compiler because of Unsafe Rust hidden somewhere in a library’s call chain. We conclude that although the use of the keyword unsafe is limited, the propagation of unsafeness offers a challenge to the claim of Rust as a memory-safe language” (Evans et al., 2020, p. 1)
  • “Operations such as configuring hardware or reading a network socket involve manipulating memory in ways that the compiler cannot guarantee to be safe.” (Evans et al., 2020, p. 1)
  • “This functionality was originally described as “pragmatic safety” [14] when Rust was first introduced, and allows developers to use their own discretion when writing Rust code. Part of the justification for allowing Unsafe Rust code is that uses of unsafe would be easy to locate and audit, and that developers can decide how much unchecked code they are willing to accept in their software.” (Evans et al., 2020, p. 1)
  • “The Rust open source community recently formed a new Rust working group to create a “Unsafe Code Guideline Reference” to help guide developers [36].” (Evans et al., 2020, p. 1)
  • “The majority of unsafe uses in the Rust ecosystem are to call other Rust functions that are marked unsafe. We find that only 22% of these unsafe functions are to external libraries implemented in C, suggesting that a majority of the Unsafe Rust is actually from Rust code where the software developer decided to disable the compiler checks.” (Evans et al., 2020, p. 2)
  • “The ownership mechanism in Safe Rust requires that a unique variable is the owner for every memory location.” (Evans et al., 2020, p. 2)
  • “The definition of memory-safety used by Rust is similar to the one proposed by Szekeres et al. [30].” (Evans et al., 2020, p. 2)
  • “A Rust program is memory-safe if it is free of any memory errors such as dereferencing a null or dangling pointer, reading or writing unaligned pointers, and reading uninitialized memory [34].” (Evans et al., 2020, p. 2)
  • “A developer may directly use Unsafe Rust by creating a code block labeled with the keyword unsafe, which is required for the following operations:

    1. Calling a function marked unsafe, non-Rust external function, or a compiler intrinsic (a function whose implementation is handled specially by the compiler).
    2. Dereferencing a C-style pointer.
    3. Accessing a mutable global variable.
    4. Using inline assembly instructions.
    5. Accessing a field of a union type.” (Evans et al., 2020, p. 3)
  • “An example of a possibly dangerous unsafe function call is the mem::transmute() function used to coerce the contents of an arbitrary memory location into a specific Rust type” (Evans et al., 2020, p. 3)
  • “Using unsafe also makes mutable reference aliasing possible, leading to undefined behavior.” (Evans et al., 2020, p. 3)
  • “For example, a function that sets the length of a vector, Vector.set_length(), will be marked unsafe despite only setting an internal field because if it is called with a parameter larger than the capacity of the internal buffer, future vector operations may access unallocated memory. This example also demonstrates how memory safety errors may occur in Safe Rust code after any use of Unsafe Rust.” (Evans et al., 2020, p. 4)
  • “A trait may be declared unsafe if it contains any unsafe method or its implementations must satisfy an invariant. An example from the Rust standard library is the trait Send, and any type that implements Send declares that it is safe to change the ownership of the type to another thread.” (Evans et al., 2020, p. 4)
  • “The motivation for RQ3 is to understand if interactions with C are the main source of Unsafe Rust. If yes, then most Unsafe Rust can be eliminated by implementing the libraries in Rust.” (Evans et al., 2020, p. 4)
  • “Additionally, to compare crates contributed by the larger Rust community with those developed by members of the core Rust development team and Mozilla Research [32], we analyze the application Servo [3], a web browser engine from Mozilla. Servo, endemic of the larger Rust ecosystem, itself is implemented as a collection of about fifty discrete crates, and together with all its dependencies, compiling Servo involves compiling almost 400 different crates.” (Evans et al., 2020, p. 5)
  • “To generate a call graph, we use the Rust compiler to compile the crate to an intermediate representation of Rust known as MIR (Middle Intermediate Representation). We then use the "control flow graphs" of functions obtained directly from the MIR representation to run a context-sensitive analysis which uses type inference to find the precise functions that can be called. This analysis starts at the leaf terminal nodes of the control flow graph of a function.” (Evans et al., 2020, p. 5)
  • “To obtain a “popular” subset, we fetch the per-crate downloads numbers from crates.io and select the most downloaded crates that account for ninety percent of the downloads from crates.io. These 500 or so crates form a group we call most downloaded. From these most downloaded crates we were able to compile 473 crates.” (Evans et al., 2020, p. 6)
  • “Overall, 29% of crates directly include some sort of Unsafe Rust in them.” (Evans et al., 2020, p. 7)
  • “The number of unsafe blocks per crate is small for the majority of the crates, more than 90% of the crates have fewer than ten unsafe blocks” (Evans et al., 2020, p. 7)
  • “However, the crates for Servo use unsafe functions and unsafe traits two or three times as frequently as general crates, which is consistent with the growing preference in the Rust community that unsafe is encapsulated at a higher level than individual functions. That is, exposing unsafe functions is acceptable as long they are eventually enclosed within a safe interface.” (Evans et al., 2020, p. 8)
  • “On average, across all crates in our dataset, a crate depends on twelve other crates.” (Evans et al., 2020, p. 8)
  • “Among the Rust unsafe calls, 47.6% are calls to the Rust Core Library. Of these unsafe functions in Core, 36.4% are of functions in the ptr module used to manually manage memory through C-style(raw) pointers and 40% are calls of functions that are unsafe wrappers to SIMD instructions and architecture-specific intrinsics.” (Evans et al., 2020, p. 9)
  • “A majority (55%) indicated the use Unsafe Rust for higher performance, with Safe Rust being too restrictive as the next most common reason (40%). The other reasons selected include: the Safe Rust alternative is too verbose or complicated (25%), needed to make the code compile (10%), and faster to write code with Unsafe Rust (5%). Further, respondents provide other reasons, including: implementing fundamental data structures, custom concurrency primitives and system calls, interaction with specialized hardware, and integration with C or other languages.” (Evans et al., 2020, p. 9)
  • “Most respondents (65%) indicated they read the code very carefully, until they convince themselves that the code is correct. Another frequently used technique (55% selected this option) is adding runtime checks to prevent memory corruption. Half of the respondents write more unit tests for the function or method that uses unsafe Rust.” (Evans et al., 2020, p. 9)
  • “having discussions with experienced Rust developers in person or online, reading the documentation and Rust books, creating theoretical proofs, using available test generation, running fuzzing and analysis tools, and using Miri [17].” (Evans et al., 2020, p. 9)
  • “Anecdotally, over 100 unsound uses of unsafe in a popular Rust web framework, Actix [31], were only discovered and partially fixed by the Rust community after an online post by a concerned user [37].” (Evans et al., 2020, p. 10)
  • “Motivated by this example, we propose the following changes to the crates.io interface: i) a new tag or badge for crates that include Unsafe Rust; ii) a dependency tree for each library with the crates that use Unsafe Rust clearly marked; and iii) a list of code reviews for any Unsafe Rust. Previous research established that code review in open source software communities is common and successful in eliminating a large number of errors [7, 26, 27]. Alami et al. [2] find that open source software developers develop a mature attitude to negative feedback and improve their code through a cycle of review, rejection, and improvement.” (Evans et al., 2020, p. 10)
  • “Java is a safe language, but the runtime provides a “backdoor” that permits the circumvention of Java’s safety guarantees to enable high-performance systems-level code. Mastrangelo et al. [22] perform a large-scale analysis of Java bytecode to determine how these unsafe capabilities are used in real world. The authors determine that 25% of the Java code analyzed depends on unsafe Java code. One explanation for why the 25% of the analyzed Java code lacks safety because of use of unsafe API may be that it is not an integral part of the language, like in Rust, and it is not exposed by the java standard libraries. Huang et al. [16] study unsafe crash patterns and implement a bytecode-level transformation that introduces runtime checks to help diagnose and prevent some memory errors caused by the use of the unsafe API.” (Evans et al., 2020, p. 11)

summary

The authors evaluate the occurrence of unsafe code blocks in popular crates. Specifically, they download roughly 500 crates which account for 90% of crates.io downloads. Somehow, only 473 of these crates were compilable. After compilation into MIR, they look at the call graph; special care needs to be taken for monomorphized code (here a “conservative” and an “optimistic” assumption about unsafe usage are evaluated in parallel). They discover that 29% of crates use unsafe code blocks directly and that, through dependencies, more than 50% of crates are unsafe overall.

The main motivation for unsafe seems to be performance, but Safe Rust being too restrictive is another reason. Usage occurs when interacting with hardware, interfacing with C software, or avoiding overhead when implementing data structures. Rust developers are aware of the significance of unsafe blocks and thus pay extra attention through code reviews, expert consultation, or even running Miri on these blocks. There is a general tendency for low-level primitives in unsafe blocks to be abstracted behind a high-level API, yielding unsafe-free crates on top and optimized, system-level crates at the bottom.
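That encapsulation pattern can be sketched in Rust (type and method names are my own illustration, not from the paper): an unchecked slice access stays private behind a safe, bounds-checked API, so callers of the crate never see `unsafe`.

```rust
// Sketch of the encapsulation pattern: the unsafe primitive
// (slice::get_unchecked) is hidden behind a safe public API.
pub struct Buf {
    data: Vec<u8>,
}

impl Buf {
    pub fn new(data: Vec<u8>) -> Buf {
        Buf { data }
    }

    /// Safe wrapper: the bounds check makes the unchecked access sound.
    pub fn get(&self, i: usize) -> Option<u8> {
        if i < self.data.len() {
            // SAFETY: i < len was just verified above.
            Some(unsafe { *self.data.get_unchecked(i) })
        } else {
            None
        }
    }
}

fn main() {
    let b = Buf::new(vec![10, 20, 30]);
    println!("{:?} {:?}", b.get(1), b.get(3)); // Some(20) None
}
```

This is also why the keyword count understates unsafeness: the `unsafe` block lives in one crate, but every dependent crate inherits its soundness obligations.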

Well-written paper with well-defined research questions. Though I did not expect much from the research topic in general, the paper presents well-conducted research.

At first I was also not convinced that the crates are representative, because more popular crates might receive more attention and face higher performance demands. On the other hand, these crates are the ones used most often and thus reflect actual usage. Hence I reject my own argument.

I learned that Java also has an Unsafe API. Specifically, the paper “Use at Your Own Risk: The Java Unsafe API in the Wild” analyzed 86,500 Java bytecode archives and found that 25% of the analyzed code depends on the Unsafe API.

Languages and the computing profession §

Titel: “Languages and the computing profession” by N. Holmes [url] [dblp]
Veröffentlicht etwa 2004 in und ich habe es gelesen etwa 2021-09
Zusammenfassung:

quotes

summary

A very shallow read that proposes an intermediate language. He discusses some properties (specificity, precision, regularity, literality, neutrality) but fails to deliver a coherent requirements analysis.

Learn&Fuzz: Machine learning for input fuzzing §

Titel: “Learn&Fuzz: Machine learning for input fuzzing” by Patrice Godefroid, Hila Peleg, Rishabh Singh [url] [dblp]
Veröffentlicht etwa 2017 in ASE 2017 und ich habe es gelesen etwa 2021-07
Zusammenfassung: Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft’s new Edge browser. We discuss and measure the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of wellformed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.

quotes

summary

Interesting paper. It clearly explains the consecutive steps taken, with a focus on why each next step was necessary. I would love to see more examples of how the {NoSample, Sample, SampleSpace} strategies produce (kinds of) PDF objects.

However, due to the nature of the machine-learning approach, it is limited to non-binary PDF objects. Furthermore, I criticize the reproducibility of the paper, since neither the PDF parser executable, the fuzzing output, nor the reported bug is publicly accessible, so none of the results can be verified. I also keep wondering: the “smallest three PDF files of our set of 534 files […] are of size 26 Kb, 33 Kb and 16 Kb”, which is quite large. Why not a minimal PDF file?

The key result regarding the tension between coverage (→ fuzzing) and pass rate (→ learning) is intriguing.

Short summary: This paper generates non-binary PDF objects and adds them to small PDF files to create complete PDF files. The objects are generated with three different machine-learning strategies called NoSample, Sample, and SampleSpace.

The ML approach uses a recurrent neural network trained on sequences of 100 characters. An LSTM model with 2 hidden layers (128 hidden states each) is used. Training ran for {10, 20, 30, 40, or 50} passes (called epochs). They evaluated which strategy is most useful for coverage (“Sample-40e”) and for the pass rate (SampleSpace achieves a pass rate between 70% and 97% with 50 epochs). With this setup they found one stack overflow bug triggering an infinite recursion in the parser. One needs to point out that other fuzzers (e.g. SAGE) had been applied to the parser before.

pass rate = For each test execution, we programmatically check (grep) for the presence of parsing-error messages in the PDF-parser execution log. If there are no error messages, we call the test a pass, otherwise a fail. Passing tests correspond to PDF files that are considered to be well-formed by the Edge PDF parser.

coverage = For each test execution, we measure instruction coverage, that is, the set of all unique instructions executed during that test.
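The pass-rate metric defined above can be sketched in a few lines (the error-marker string and helper names are my own, not the paper's tooling):

```rust
// Sketch of the pass-rate metric: a test passes iff its parser log
// contains no parsing-error message. The marker string is illustrative.
fn is_pass(log: &str) -> bool {
    !log.contains("parsing error")
}

fn pass_rate(logs: &[&str]) -> f64 {
    let passes = logs.iter().filter(|log| is_pass(log)).count();
    passes as f64 / logs.len() as f64
}

fn main() {
    let logs = [
        "parsed 12 objects",
        "parsing error: malformed xref table",
        "parsed 3 objects",
    ];
    println!("{}", pass_rate(&logs)); // 2 of 3 logs pass
}
```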

Markup systems and the future of scholarly text proces… §

Titel: “Markup systems and the future of scholarly text processing” by James H. Coombs, Allen H. Renear, Steven J. DeRose [url] [dblp]
Veröffentlicht etwa 1987 in Communications of the ACM und ich habe es gelesen etwa 2024-11-19
Zusammenfassung: Markup practices can affect the move toward systems that support scholars in the process of thinking and writing. Whereas procedural and presentational markup systems retard that movement, descriptive markup systems accelerate the pace by simplifying mechanical tasks and allowing the authors to focus their attention on the content.

error

“For example, contiguous quotation marks are to be separated by “/ , “, which is a “typesetting command that causes TEX to insert a small amount of space”” (Coombs et al., 1987, p. 5)

Should be a backslash.

quotes

  • “Reid developed Scribe, freeing authors from formatting concerns and providing them with integrated tools for bibliography and citation management [50].” (Coombs et al., 1987, p. 1)
  • “Previously, developers worked with the model of scholar as researching and composing author. Recently, however, the dominant model construes the author as typist or, even worse, as typesetter. Instead of enabling scholars to perform tasks that were not possible before, today’s systems emulate typewriters.” (Coombs et al., 1987, p. 1)
  • “Those who have access to more powerful systems rarely have the time to learn to exploit them fully, and many find them too complicated to use at all. This is quite understandable, since most text formatters on minicomputers and mainframes were developed under a model that is even more inappropriate than author-as-typist. Written by and for programmers, these systems often require quasi-programming skills, treating authors as programmer-typists.” (Coombs et al., 1987, p. 1)
  • “Thus, we see far more attention paid to keyboards, printers, fonts, displays, graphics, colors, and similar features than to the retrieval and structuring of information or even to the verification of spelling and grammar.” (Coombs et al., 1987, p. 2)
  • “Second, developers and authors have lost sight of the fact that there are two products in the electronic development of a document: the printout and the “source” file. Currently, everything is directed toward producing the printout; the source is a mere byproduct, wasting valuable disk space, useful for little more than producing another printout of the same document.” (Coombs et al., 1987, p. 2)
  • “Actually, Goldfarb ignores presentational markup; SGML distinguishes “natural-language notation” (punctuational markup) from “formatted text notations” (presentational markup).” (Coombs et al., 1987, p. 3)
  • “Procedural markup is characteristically associated with batch text formatters, such as nroff/troff and TeX. Word processors like WordStar, however, supplement their presentational editing commands with “dot commands.”” (Coombs et al., 1987, p. 4)
  • “To summarize, there are six types of document markup, but only three are in full competition: presentational, procedural, and descriptive. Presentational markup clarifies the presentation of a document and makes it suitable for reading. Procedural markup instructs a text formatter to “do X,” for example, skip three lines, in order to create presentational markup. Finally, descriptive markup tells text formatters that “this is an X,” for example, a long quotation. Normally, text formatters treat presentational markup in source files as if it were text; that is, no special processing is performed. Procedural markup, however, is processed according to the rules specified in the documentation for the system; and descriptive markup is usually mapped onto procedural markup. In addition, descriptive markup is well suited for processing by an open set of applications.” (Coombs et al., 1987, p. 6)
  • “it should be clear that typesetting is a skilled task requiring special training in such concepts as typefaces, styles, and sizes; and leading, weighting, kerning, widows, rectos, versos, letterspacing, loose lines, and all the apparatus of professional designers.” (Coombs et al., 1987, p. 8)
  • “In addition, we have raised, somewhat obliquely, issues of what authors view. We have suggested briefly that interfaces that expose or display descriptive markup will help authors focus on content and stay aware of structure. This is clearly an oversimplification. First, not all markup provides significant structural information or, better, information that is important to the author at the moment. Second, even descriptive markup can become so dense as to obscure the text. Furthermore, it might be in the author’s best interests at any particular time to see, for example, the enumeration of items in a list instead of the descriptive markup. Ultimately, we have to conclude there is no simple relationship between viewing format and content orientation. An ideal system would provide authors with the ability to select among a number of different views of a file, and it should be possible to display the markup for some textual elements and to conceal the markup for others.” (Coombs et al., 1987, p. 12)

summary

The paper considers available tools for document preparation / text editing / word processing and extracts classifications for markup. Specifically it extracts the notions of descriptive, procedural, presentational, punctuational, and scribal markup but concludes (in section “Markup theory”) that only three of them are in competition. It proceeds to point out what advantages descriptive markup has over presentational markup.

McBits: Fast Constant-Time Code-Based Cryptography §

Titel: “McBits: Fast Constant-Time Code-Based Cryptography” by Daniel J. Bernstein, Tung Chou, Peter Schwabe [url] [dblp]
Veröffentlicht etwa 2015 in CHES 2013 und ich habe es gelesen etwa 2022-04
Zusammenfassung: This paper presents extremely fast algorithms for code-based public-key cryptography, including full protection against timing attacks. For example, at a 2^128 security level, this paper achieves a reciprocal decryption throughput of just 60493 cycles (plus cipher cost etc.) on a single Ivy Bridge core. These algorithms rely on an additive FFT for fast root computation, a transposed additive FFT for fast syndrome computation, and a sorting network to avoid cache-timing attacks.

quotes

  • “To summarize, all of these examples of bitsliced speed records are for small Sboxes or large binary fields, while code-based cryptography relies on medium-size fields and seems to make much more efficient use of table lookups.” (Bernstein et al., 2013, p. 3)
  • “[…] we point out several ways that our decryption algorithm improves upon the algorithm used in [44]: we use an additive FFT rather than separate evaluations at each point (“Chien search”); we use a transposed additive FFT rather than applying a syndrome-conversion matrix; we do not even need to store the syndrome-conversion matrix, the largest part of the data stored in [44]; and we use a simple hash (see Section 6) rather than a constant-weight-word-to-bit-string conversion” (Bernstein et al., 2013, p. 6)
  • “For multipoint evaluation we use a characteristic-2 “additive FFT” algorithm introduced in 2010 [39] by Gao and Mateer (improving upon an algorithm by von zur Gathen and Gerhard in [40], which in turn improves upon an algorithm proposed by Wang and Zhu in [77] and independently by Cantor in [29]), together with some new improvements described below.” (Bernstein et al., 2013, p. 7)
  • “The basic idea of the algorithm is to write f in the form f0(x² − x) + x·f1(x² − x) for two half-degree polynomials f0, f1 ∈ Fq[x]; this is handled efficiently by the ‘radix conversion’ described below. This form of f shows a large overlap between evaluating f(α) and evaluating f(α + 1). Specifically, (α + 1)² − (α + 1) = α² − α, so
    f(α) = f0(α² − α) + α·f1(α² − α),
    f(α + 1) = f0(α² − α) + (α + 1)·f1(α² − α).
    Evaluating both f0 and f1 at α² − α produces both f(α) and f(α + 1) with just a few more field operations: multiply the f1 value by α, add the f0 value to obtain f(α), and add the f1 value to obtain f(α + 1).” (Bernstein et al., 2013, p. 8)
  • “Consider the problem of computing the vector (∑α rα, ∑α rα·α, …, ∑α rα·α^d), given a sequence of q elements rα ∈ Fq indexed by elements α ∈ Fq, where q = 2^m. This vector is called a ‘syndrome’.” (Bernstein et al., 2013, p. 10)
  • “The transposition principle states that if a linear algorithm computes a matrix M (i.e., M is the matrix of coefficients of the inputs in the outputs) then reversing the edges of the linear algorithm, and exchanging inputs with outputs, computes the transpose of M .” (Bernstein et al., 2013, p. 12)
  • “In particular, since syndrome computation is the transpose of multipoint evaluation, reversing a fast linear algorithm for multipoint evaluation produces a fast linear algorithm for syndrome computation.” (Bernstein et al., 2013, p. 12)
  • “This procedure produced exactly the desired number of operations in Fq but was unsatisfactory for two reasons. First, there were a huge number of nodes in the graph, producing a huge number of variables in the final software. Second, this procedure eliminated all of the loops and functions in the original software, producing a huge number of lines of code in the final software. Consequently the C compiler, gcc, became very slow as m increased and ran out of memory around m = 13 or m = 14, depending on the machine we used for compilation.” (Bernstein et al., 2013, p. 13)
  • “A ‘sorting network’ uses a sequence of ‘comparators’ to sort an input array S. A comparator is a data-independent pair of indices (i, j); it swaps S[i] with S[j] if S[i] > S[j]. This conditional swap is easily expressed as a data-independent sequence of bit operations: first some bit operations to compute the condition S[i] > S[j], then some bit operations to overwrite (S[i], S[j]) with (min {S[i], S[j]}, max {S[i], S[j]}). There are many sorting networks in the literature. We use a standard ‘odd-even’ sorting network by Batcher [4], which uses exactly (m² − m + 4)·2ᵐ⁻² − 1 comparators to sort an array of 2ᵐ elements. This is more efficient than other sorting networks such as Batcher’s bitonic sort [4] or Shell sort [72]. The odd-even sorting network is known to be suboptimal when m is very large (see [2]), but we are not aware of noticeably smaller sorting networks for the range of m used in code-based cryptography.” (Bernstein et al., 2013, p. 14)
  • “Our goals in this paper are more conservative, so we avoid this approach: we are trying to reduce, not increase, the number of questions for cryptanalysts.” (Bernstein et al., 2013, p. 16)
  • “Code-based cryptography is often presented as encrypting fixed-length plaintexts. McEliece encryption multiplies the public key (a matrix) by a k-bit message to produce an n-bit codeword and adds t random errors to the codeword to produce a ciphertext. The Niederreiter variant (which has several well-known advantages, and which we use) multiplies the public key by a weight-t n-bit message to produce an (n − k)-bit ciphertext. If the t-error decoding problem is difficult for the public code then both of these encryption systems are secure against passive attackers who intercept valid ciphertexts for random plaintexts.” (Bernstein et al., 2013, p. 16)
  • “However, this argument relies implicitly on a detailed analysis of how much information the attacker actually obtains through timing. By systematically eliminating all timing leaks we eliminate the need for such arguments and analyses.” (Bernstein et al., 2013, p. 18)
  • “A security proof for Niederreiter KEM/DEM appeared very recently in Persichetti’s thesis [64]. The proof assumes that the t-error decoding problem is hard; it also assumes that a decoding failure for w is indistinguishable from a subsequent MAC failure. This requires care in the decryption procedure; see below.” (Bernstein et al., 2013, p. 18)
  • “Many authors have stated that Patterson’s method is somewhat faster than Berlekamp’s method.” (Bernstein et al., 2013, p. 19)
  • “CFS is a code-based public-key signature system proposed by Courtois, Finiasz, and Sendrier in [31]. The main drawbacks of CFS signatures are large public-key sizes and inefficient signing; the main advantages are short signatures, fast verification, and post-quantum security.” (Bernstein et al., 2013, p. 19)
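The odd-even sorting network quoted above is easy to reproduce. The sketch below uses the textbook recursive construction (my own illustration, not the paper's bitsliced implementation) and checks the comparator count against the quoted formula (m² − m + 4)·2ᵐ⁻² − 1.

```python
def oddeven_merge(lo, hi, r):
    # emit comparators that merge the two sorted subsequences
    # interleaved with stride r inside indices lo..hi (inclusive)
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)

def oddeven_merge_sort(lo, hi):
    # emit comparators that sort indices lo..hi (inclusive)
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)

def apply_network(net, xs):
    # a comparator (i, j) is a data-independent conditional swap
    xs = list(xs)
    for i, j in net:
        if xs[i] > xs[j]:
            xs[i], xs[j] = xs[j], xs[i]
    return xs

for m in range(1, 7):
    net = list(oddeven_merge_sort(0, 2**m - 1))
    # comparator count matches the quoted formula (integer form)
    assert len(net) == (m * m - m + 4) * 2**m // 4 - 1
```

For m = 3 this yields the well-known 19-comparator network on 8 inputs.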

strong statement

“We have found many claims that NTRU is orders of magnitude faster than RSA and ECC, but we have also found no evidence that NTRU can match our speeds” (Bernstein et al., 2013, p. 5)

summary

The paper describes multiple optimization techniques useful for code-based cryptography. It mentions 400,000 decryptions per second at the 2⁸⁰ security level and 200,000 per second at the 2¹²⁸ security level. Among the techniques, Horner’s rule, Chien search (and the Gao–Mateer additive FFT), and syndrome computation as the transpose of multipoint evaluation are mentioned.

Neat paper, starting from prior work, that describes optimized software implementations of Niederreiter’s cryptosystem as well as the CFS signature scheme. One question discussed is whether bitslicing is worth the effort. Interestingly, the scheme is explained at the end (section 6), not at the beginning. The data is described in text rather than summarized in a table. I think the main contribution is the optimization techniques in section 3.

Mind your language: on novices' interactions with erro… §

Title: “Mind your language: on novices' interactions with error messages” by Guillaume Marceau, Kathi Fisler, Shriram Krishnamurthi [url] [dblp]
Published around 2011 in Onward! 2011, and I read it around 2021-04
Abstract: Error messages are one of the most important tools that a language offers its programmers. For novices, this feedback is especially critical. Error messages typically contain both a textual description of the problem and an indication of where in the code the error occurred. This paper reports on a series of studies that explore beginning students' interactions with the vocabulary and source-expression highlighting in DrRacket. Our findings demonstrate that the error messages significantly fail to convey information accurately to students, while also suggesting alternative designs that might address these problems.

quotes

summary

Guillaume Marceau, Kathi Fisler, and Shriram Krishnamurthi are contributors to the Racket programming language. They evaluated potential problems with the error messages of DrRacket, an IDE for Racket. They point out that “the DrRacket development team put considerable effort into the design of the error messages. They carefully considered both form and terminology, and refined the messages many times over the years based on their observations of students.” They implemented a user study to evaluate problems: “We logged students’ programming sessions and explored the effectiveness of the error messages at helping students make progress.”

Fundamentally, error messages in DrRacket are presented with a highlighted code section, and the error message often follows the same structure. The study was done as part of one of the first lab sessions of the university’s introductory course for novice programmers. They collected data from 60 students (out of 120; self-selected via consent to participate in the study). Lab sessions ran 50 minutes per week for 6 weeks. “To analyze this data, two experienced instructors independently assessed the extent to which individual edits addressed the reported problems.”

An interesting observation was that terminology like “function body” was not familiar to users. In the end, a lack of familiarity with the vocabulary used in the error message was the main result of the paper.

Their approach of student interviews, vocabulary quizzes, and collaboration/interviews with programming teachers seems thorough, and the sample size is large enough (though a distinction between universities would not have been necessary in the illustrations, since the data points are similar).

I agree with the results except for Recommendation principle 1: “Error messages should not propose solutions”. To the best of my understanding, the current Rust compiler shows that the opposite is true. However, this only works because the compiler can deduce enough information about the written program to suggest correct solutions. This seems to be a difference from Racket.

I also have personal objections to the simplified vocabulary in Table 5, because it simplifies language to the point that important distinctions can no longer be made to uniquely identify an issue. E.g., “selector” versus “constructor” (instead of a simplified “function”) can help to identify the root cause. But this opinion is unsupported, whereas the paper's data shows that such simplification might be useful.

NTRU: A ring-based public key cryptosystem §

Title: “NTRU: A ring-based public key cryptosystem” by Jeffrey Hoffstein, Jill Pipher, Joseph H. Silverman [url] [dblp]
Published around 1998 in ANTS 1998, and I read it around 2021-11
Abstract: We describe NTRU, a new public key cryptosystem. NTRU features reasonably short, easily created keys, high speed, and low memory requirements. NTRU encryption and decryption use a mixing system suggested by polynomial algebra combined with a clustering principle based on elementary probability theory. The security of the NTRU cryptosystem comes from the interaction of the polynomial mixing system with the independence of reduction modulo two relatively prime integers p and q.

quotes

summary

Very fundamental, important paper. NTRU has the unique property among lattice schemes of switching between moduli (the “polynomial mixing system” or “lifting operation”). It is interesting to see that they saw RSA and GGH as competitors for NTRU. I think they did a very good job on its security analysis, but I didn't look into the details.
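The switch between moduli hinges on a center-lift: coefficients computed mod q are re-interpreted as small signed integers before reduction mod p. A minimal sketch of that one step (my own illustration; the parameter values are examples, not NTRU parameters):

```python
def center_lift(coeffs, q):
    # map each coefficient from {0, ..., q-1} into the balanced
    # range [-q/2, q/2) before reducing modulo the second modulus
    return [((c % q) + q // 2) % q - q // 2 for c in coeffs]

# in NTRU decryption, a = f*e mod q is center-lifted so that small
# coefficients stay small, and only then reduced mod p:
a = [0, 1, 2047, 1024, 3]         # example coefficients mod q = 2048
small = center_lift(a, 2048)      # -> [0, 1, -1, -1024, 3]
reduced = [c % 3 for c in small]  # reduction mod p = 3
```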

New directions in cryptography §

Title: “New directions in cryptography” by W. Diffie, M. Hellman [url] [dblp]
Published around 1976 as an invited paper in IEEE Transactions on Information Theory, Volume 22, Issue 6, and I read it around 2021-05
Abstract:

definitions

quotes

summary

A historically important document. The paper has SoK-style content, giving a look at contemporary asymmetric cryptography in 1976. I liked the high-level overview and the collection of established vocabulary. The only thing I failed to understand was the math related to “kn = 5000” after parameterizing Leslie Lamport's system.

It was very interesting to see their discussion of the relation to complexity theory. In particular, I have never seen this written in such clear words: “As systems whose strength had been so argued were repeatedly broken, the notion of giving mathematical proofs for the security of systems fell into disrepute and was replaced by certification via cryptanalytic assault.”

“We stand today on the brink of a revolution in cryptography” is Diffie and Hellman's opening statement, which is justified by the last sentence of the paper: “We hope this will inspire others to work in this fascinating area in which participation has been discouraged in the recent past by a nearly total government monopoly”.

typo

Number "Not" Used Once - Key Recovery Fault Attacks on… §

Title: “Number "Not" Used Once - Key Recovery Fault Attacks on LWE Based Lattice Cryptographic Schemes” by Prasanna Ravi, Shivam Bhasin, Anupam Chattopadhyay [url] [dblp]
Published around 2018 in COSADE 2019, and I read it around 2020-05
Abstract: This paper proposes a simple single bit flip fault attack applicable to several LWE (Learning With Errors Problem) based lattice based schemes like KYBER, NEWHOPE, DILITHIUM and FRODO which were submitted as proposals for the NIST call for standardization of post quantum cryptography. We have identified a vulnerability in the usage of nonce, during generation of secret and error components in the key generation procedure. Our fault attack, based on a practical bit flip model (single bit flip to very few bit flips for proposed parameter instantiations) enables us to retrieve the secret key from the public key in a trivial manner. We fault the nonce in order to maliciously use the same nonce to generate both the secret and error components which turns the LWE instance into an exactly defined set of linear equations from which the secret can be trivially solved for using Gaussian elimination.

ambiguity

What is z in the interval provided in section 2.6? It seems to be either an arbitrary integer or the z of Algorithm 1 (NewHope).

error

“later converted to the NTT domain using the PolyBitRev function” … no, it is a separate step

inconsistency

quotes

summary

Short paper with tiny errors and no practical implementation (how successfully can the attack on Frodo be implemented? Correction: unlike the paper, their presentation slides do show evaluation data). The structure of the text is beautiful and it is well-written. I particularly enjoyed the summary of weak LWE instances. Masking is another countermeasure, but it is heavyweight compared to the proposed error-correcting codes.
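The core of the attack is easy to replay: if the nonce fault forces e = s, then b = A·s + e = (A + I)·s (mod q), a plain linear system that Gaussian elimination solves for s. A toy sketch (my own reconstruction with illustrative dimensions, not the paper's code):

```python
import random

def solve_mod_q(M, b, q):
    """Solve M*x = b (mod q) for prime q via Gauss-Jordan elimination."""
    n = len(M)
    A = [row[:] + [v] for row, v in zip(M, b)]  # augmented matrix
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] % q != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, q)           # modular inverse (Python 3.8+)
        A[col] = [x * inv % q for x in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(x - f * y) % q for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

random.seed(0)
q, n = 3329, 8                                       # Kyber's prime, toy dimension
A = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
s = [random.randrange(-2, 3) % q for _ in range(n)]  # small secret
b = [(sum(a * x for a, x in zip(row, s)) + s[i]) % q # faulted: e = s
     for i, row in enumerate(A)]
A_plus_I = [[(row[j] + (i == j)) % q for j in range(n)]
            for i, row in enumerate(A)]
assert solve_mod_q(A_plus_I, b, q) == s              # secret recovered trivially
```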

typo

The aforementioned fault vulnerabilities associated with the nonce only occur due to the implementation strategies used and hence can we believe can be easily corrected albeit with a small performance overhead.

⇒ hence, we believe, can be easily corrected

On the Security of Password Manager Database Formats §

Title: “On the Security of Password Manager Database Formats” by Paolo Gasti, Kasper B. Rasmussen [url] [dblp]
Published around 2012 in ESORICS 2012, and I read it around 2021-05
Abstract: Password managers are critical pieces of software relied upon by users to securely store valuable and sensitive information, from online banking passwords and login credentials to passport- and social security numbers. Surprisingly, there has been very little academic research on the security these applications provide.

question

Why does Definition 2 (IND-CDBA security) include 1/2 + negl(κ) as threshold whereas Definition 3 (MAL-CDBA security) uses negl(κ)?

Game 1 (on which Definition 2 relies) asks the adversary to decide a binary value (⇒ 50% success with a guessing strategy). Game 2 (on which Definition 3 relies) asks the adversary to create a new, valid DB (⇒ essentially 0% success with a guessing strategy).
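Written as advantage bounds (my notation, not the paper's), the two thresholds line up with the two baselines:

```latex
\mathrm{Adv}^{\text{IND-CDBA}}_{\mathcal{A}}(\kappa)
  = \Bigl|\Pr[\mathcal{A}\ \text{wins Game 1}] - \tfrac{1}{2}\Bigr| \le \mathrm{negl}(\kappa),
\qquad
\mathrm{Adv}^{\text{MAL-CDBA}}_{\mathcal{A}}(\kappa)
  = \Pr[\mathcal{A}\ \text{wins Game 2}] \le \mathrm{negl}(\kappa).
```

In both cases the definition demands that the adversary beat the trivial strategy only negligibly; the baselines (1/2 vs. 0) differ because the games differ.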

quotes

summary

Well written paper with a clear agenda. It is perfectly structured and can be followed easily.

The paper defines two notions of security and evaluates password managers against this notion. The notion is well defined, but the practicality is debatable:

  • “We argue that MAL-CDBA security (together with IND-CDBA security) is an appropriate security notion for a password manager database format in practice.”
  • “We defined two realistic security models, designed to represent the capabilities of real-world attacks.”

I don't think so. Adding entries to the password manager has a different impact on security than actually replacing existing entries. To retrieve an actual password, the adversary has to set up a typo-squatting domain and then replace the stored URL with the malicious one. Such a scenario is discussed in the paper, was evaluated, and is practical, but the security notion does not distinguish between modification and extension. A pure extension attack has a much more limited impact.

I enjoyed the proper discussion of related work on security notions and the formal definition of password managers. As of 2021, some results are deprecated (e.g. Internet Explorer does not exist anymore) and some password managers are not maintained anymore (e.g. PINs). Recommendable for everyone to see how the database formats are defined.

typo

On the criteria to be used in decomposing systems into… §

Title: “On the criteria to be used in decomposing systems into modules” by David Lorge Parnas [url] [dblp]
Published around 1972 in CACM, Volume 15, 1972, and I read it around 2021-03
Abstract: This paper discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time. The effectiveness of a "modularization" is dependent upon the criteria used in dividing the system into modules. Two system design problems are presented, and for each, both a conventional and unconventional decomposition are described. It is shown that the unconventional decompositions have distinct advantages for the goals outlined. The criteria used in arriving at the decompositions are discussed. The unconventional decomposition, if implemented with the conventional assumption that a module consists of one or more subroutines, will be less efficient in most cases. An alternative approach to implementation which does not have this effect is sketched.

quotes

summary

First, one needs to get familiar with KWIC (to recognize the paper reference in the section below). KWIC felt like an arbitrary index someone came up with. I got confused by phrases like “the characters are packed four to a word”, which make little sense outside the index context of a book. But after reading the paper, I looked up the Wikipedia article and learned about its use case (an index of keywords before full-text search was available, or in print). The paper is considered an ACM Classic and Parnas received high praise for it.
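For reference, the KWIC task itself is tiny; a minimal sketch (my own illustration, not one of Parnas's decompositions) generates every circular shift of every line and sorts them alphabetically:

```python
def kwic(lines):
    # generate all circular shifts of each line, then sort them
    # case-insensitively to form the keyword-in-context index
    shifts = []
    for line in lines:
        words = line.split()
        for i in range(len(words)):
            shifts.append(" ".join(words[i:] + words[:i]))
    return sorted(shifts, key=str.lower)

# kwic(["The quick fox"])
# -> ["fox The quick", "quick fox The", "The quick fox"]
```

The interesting question of the paper is not this logic but where to draw the module boundaries around it.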

Essentially, I first had to understand the setting in which this paper was written. I grew up with data encapsulation in object-oriented programming and local scoping in programming languages; manipulating data behind a pointer was already a foreign, dangerous idea to me. The setting here is the transition from assembly language towards more high-level languages, with questions regarding information hiding arising.

In modular design, his double dictum of high cohesion within modules and loose coupling between modules is fundamental to modular design in software. However, in Parnas's seminal 1972 paper On the Criteria to Be Used in Decomposing Systems into Modules, this dictum is expressed in terms of information hiding, and the terms cohesion and coupling are not used. He never used them.
Wikipedia: David Parnas

I would define a module as a set of functionality (independent of its representation in a programming language). High cohesion within modules and loose coupling between modules is a defining criterion for a good programmer. What I consider an open question, and what often triggers bugs, is missing documentation for the interface between modules. Often a data structure transfers the data from one module to another; an informal description of it often triggers different expectations regarding its content.

Back to the paper, it illustrates the decomposition of a system by two exemplary modularizations. Whereas the first decomposition was created along the major steps of the processing routines, the second decomposition was created with information hiding in mind. Then several recommendable criteria for decompositions are mentioned:

  1. A data structure, its internal linkings, accessing procedures and modifying procedures are part of a single module.
  2. The sequence of instructions necessary to call a given routine and the routine itself are part of the same module
  3. The formats of control blocks used in queues in operating systems and similar programs must be hidden within a “control block module.”
  4. Character codes, alphabetic orderings, and similar data should be hidden in a module for greatest flexibility
  5. The sequence in which certain items will be processed should (as far as practical) be hidden within a single module

In the end, I think the paper advocates a clean style which (in some sense and with some limitations) still holds today (e.g. web frameworks heavily limit the way you can define your architecture). I recommend every programmer reflect on possible decompositions of a system, because the most intuitive one might not be the best. The notion of the flowchart approach being the sequential one is, however, awkward and foreign to me.

PDF/A considered harmful for digital preservation §

Title: “PDF/A considered harmful for digital preservation” by Marco Klindt [url] [dblp]
Published around 2017 in iPRES 2017, and I read it around 2020-12
Abstract: Today, the Portable Document Format (PDF) is the prevalent file format for the exchange of fixed content electronic documents for publication, research, and dissemination work in the academic and cultural heritage domains. Therefore it is not surprising that PDF/A is perceived to be an archival format suitable for digital archiving workflows.

clarification needed

errors

“The textual markup of Markdown variants is machine actionable while being human friendly to read at the same time. It is suitable for structured texts (including lists and tables) where the exact layout is not as important. Markdown is not well suited for validation.”

quotes

Interesting:

Well put:

summary

This paper summarizes shortcomings that explain why PDF/A is not the final solution for long-term archival of documents.

PDF features not available in PDF/A (list from an external resource):

  • embedded video and audio files
  • encryption
  • transparency
  • JavaScript
  • executable files
  • functions
  • external links
  • layers

I would summarize those arguments as a lack of annotations in the PDF to make PDFs accessible for machines and for people with visual disabilities. This makes it difficult to index and retrieve document information as part of a large corpus (→ database). The problems boil down to the design of PDF, which allows multiple ways of encoding text information and can thereby lose, e.g., reading-order information. Refer to Table 2, for example. In the end, the tooling support is not great, but a lack of alternatives can be identified. However, if we consider information extraction capabilities more important than, e.g., font embedding and reproducible layout, some alternatives are mentioned in section 5.1 (e.g. HTML/CSS or ODF/OOXML). One intriguing analogy is given in the paper: musical typesetting can be done with PDF as output, but retrieving information about the music is very difficult.

In the conclusion the author admits that the PDF/A authors were aware of its shortcomings: “The intent was not to claim that PDF-based solutions are the best way to preserve electronic documents. PDF/A simply defines an archival profile of PDF that is more amenable to long-term preservation than traditional PDF”

In the end, the paper is a summary which does not provide any solution as pointed out by the author. As a critic of Markdown, I am saddened to see that Markdown was even mentioned (but other markup languages are neglected).

Parsing Expression Grammars: A Recognition-Based Synta… §

Title: “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation” by Bryan Ford [url] [dblp]
Published around 2004 in POPL 2004, and I read it around 2025-11-23
Abstract: For decades we have been using Chomsky’s generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.

quotes

  • “{s ∈ a* | s = (aa)ⁿ} is a generative definition […] In contrast, {s ∈ a* | |s| mod 2 = 0} is a recognition-based definition of the same language” (Ford, p. 1)
  • “While most language theory adopts the generative paradigm, most practical language applications in computer science involve the recognition and structural decomposition, or parsing, of strings. Bridging the gap from generative definitions to practical recognizers is the purpose of our ever-expanding library of parsing algorithms with diverse capabilities and trade-offs [9].” (Ford, p. 1)
  • “Ambiguity in CFGs is difficult to avoid even when we want to, and it makes general CFG parsing an inherently super-linear-time problem [14, 23].” (Ford, p. 1)
  • “PEGs have far more syntactic expressiveness than the LL(k) language class typically associated with top-down parsers, however, and can express all deterministic LR(k) languages and many others, including some non-context-free languages.” (Ford, p. 2)
  • “The primary contribution of this work is to provide language and protocol designers with a new tool for describing syntax that is both practical and rigorously formalized. A secondary contribution is to render this formalism more amenable to further analysis by proving its equivalence to two simpler formal systems, originally named TS (“TMG recognition scheme”) and gTS (“generalized TS”) by Alexander Birman [4, 5], in reference to an early syntax-directed compiler-compiler. These systems were later called TDPL (“Top-Down Parsing Language”) and GTDPL (“Generalized TDPL”) respectively by Aho and Ullman [3]. By extension we prove that with minor caveats TS/TDPL and gTS/GTDPL are equivalent in recognition power, an unexpected result contrary to prior conjectures [5].” (Ford, p. 2)
  • “The expression ‘a* a’ for example can never match any string.” (Ford, p. 2)
  • “If the definition of Grammar in Figure 1 did not reference EndOfFile at the end, then any ASCII file starting with at least one correct Definition would be interpreted as a “correct” grammar, even if the file has unreadable garbage at the end.” (Ford, p. 4)
  • “We desugar an and-predicate &e to !(!e_d), where e_d is the desugaring of e.” (Ford, p. 5)
  • “parsing expression languages (PELs), the class of languages that can be expressed by PEGs.” (Ford, p. 5)
  • “PELs are closed under union, intersection, and complement. It is undecidable in general whether a PEG represents a nonempty language, or whether two PEGs represent the same language.” (Ford, p. 5)
  • “The classic example language aⁿbⁿcⁿ is not context-free, but we can recognize it with a PEG G = ({A, B, D}, {a, b, c}, R, D), where R contains the following definitions:

    A <- a A b / ε
    B <- b B c / ε
    D <- &(A !b) a* B !.
    (Ford, p. 6)
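The quoted grammar translates almost mechanically into recursive functions. The sketch below is my own transcription under the usual PEG semantics: each nonterminal returns the new input position on success, ordered choice falls back to the ε alternative only if the first alternative fails as a whole, and the &-predicate consumes no input.

```python
def A(s, i):
    # A <- a A b / ε
    if i < len(s) and s[i] == 'a':
        j = A(s, i + 1)
        if j < len(s) and s[j] == 'b':
            return j + 1
    return i  # ε: succeed without consuming input

def B(s, i):
    # B <- b B c / ε
    if i < len(s) and s[i] == 'b':
        j = B(s, i + 1)
        if j < len(s) and s[j] == 'c':
            return j + 1
    return i

def D(s):
    # D <- &(A !b) a* B !.
    j = A(s, 0)                        # predicate body: A ...
    if j < len(s) and s[j] == 'b':     # ... must not be followed by b
        return False
    i = 0                              # the &-predicate consumed nothing
    while i < len(s) and s[i] == 'a':  # a*
        i += 1
    return B(s, i) == len(s)           # !. : B must reach end of input
```

With this transcription, D("aabbcc") succeeds while D("aabbc") fails on the final !. check.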

summary

Very cool paper. It formalizes a pragmatic, recognition-based approach to parsing while contrasting it properly with prior work, and it does not shy away from formal proofs of the properties of PEGs/PELs.

Phonological Differences between Japanese and English:… §

Title: “Phonological Differences between Japanese and English: Several Potentially Problematic Areas of Pronunciation for Japanese ESL/EFL Learners” by Kota Ohata [url] [dblp]
Published around 2004 in the Asian EFL Journal, and I read it around 2023-10
Abstract: In light of the fact that L2 pronunciation errors are often caused by the transfer of well-established L1 sound systems, this paper examines some of the characteristic phonological differences between Japanese and English. Comparing segmental and suprasegmental aspects of both languages, this study also discusses several problematic areas of pronunciation for Japanese learners of English. Based on such contrastive analyses, some implications for L2 pronunciation teaching are drawn.

summary

Japanese and English are compared from a phonological perspective (rhythm, stress, pitch, intonation, segmentals).

Used in 2017 in our phonetics class for our presentation.

For me, the most interesting aspect is Table 3 which illustrates the differences between Japanese and English. But the text is not too long and provides some nice examples.

Good read, recommended!

Piret and Quisquater’s DFA on AES Revisited §

Title: “Piret and Quisquater’s DFA on AES Revisited” by Christophe Giraud, Adrian Thillard [url] [dblp]
Published around 2010, and I read it around 2020-06
Abstract: At CHES 2003, Piret and Quisquater published a very efficient DFA on AES which has served as a basis for many variants published afterwards. In this paper, we revisit P&Q’s DFA on AES and we explain how this attack can be much more efficient than originally claimed. In particular, we show that only 2 (resp. 3) faulty ciphertexts allow an attacker to efficiently recover the key in the case of AES-192 (resp. AES-256). Our attack on AES-256 is the most efficient attack on this key length published so far.

quotes

summary

Good paper. The ideas are immediate. Not all presented attacks report runtimes, but the (more important) number of faulty ciphertexts is analyzed properly. The original P&Q attack is neat in general.

The original attack from 2003 uniquely identifies the key with 2 faulty ciphertext pairs with probability 98%.

typo

Figure 2 swaps the last SubBytes and ShiftRows operations

Power analysis attack on Kyber §

Title: “Power analysis attack on Kyber” by Alexandre Karlov, Natacha Linard de Guertechin [url] [dblp]
Published around 2021-09, and I read it around 2021-09
Abstract: This paper describes a practical side-channel power analysis on CRYSTALS-Kyber key-encapsulation mechanism. In particular, we analyse the polynomial multiplication in the decapsulation phase to recover the secret key in a semi-static setting. The power analysis attack was performed against the KYBER512 implementation from pqm4 [1] running on an STM32F3 ARM Cortex-M4 CPU.

errors

quotes

summary

The idea is immediate (CPA on the multiplication in the decryption step; implementation from pqm4; ChipWhisperer Pro with the CW308 UFO board and an STM32F3). The scientific contribution is the evaluation data. 200 traces suffice, which is nice. The paper's writing is very preliminary (e.g., the definition of B in Bη is missing, as is the definition of the semi-static setting, etc.).

Practical CCA2-Secure and Masked Ring-LWE Implementati… §

Title: “Practical CCA2-Secure and Masked Ring-LWE Implementation” by Tobias Oder, Tobias Schneider, Thomas Pöppelmann [url] [dblp]
Published around 2018 in TCHES 2018, and I read it around 2020-10
Abstract: During the last years public-key encryption schemes based on the hardness of ring-LWE have gained significant popularity. For real-world security applications assuming strong adversary models, a number of practical issues still need to be addressed. In this work we thus present an instance of ring-LWE encryption that is protected against active attacks (i.e., adaptive chosen-ciphertext attacks) and equipped with countermeasures against side-channel analysis. Our solution is based on a post-quantum variant of the Fujisaki-Okamoto (FO) transform combined with provably secure first-order masking. To protect the key and message during decryption, we developed a masked binomial sampler that secures the re-encryption process required by FO. Our work shows that CCA2-secured RLWE-based encryption can be achieved with reasonable performance on constrained devices but also stresses that the required transformation and handling of decryption errors implies a performance overhead that has been overlooked by the community so far. With parameters providing 233 bits of quantum security, our implementation requires 4,176,684 cycles for encryption and 25,640,380 cycles for decryption with masking and hiding countermeasures on a Cortex-M4F. The first-order security of our masked implementation is also practically verified using the non-specific t-test evaluation methodology.

quotes

summary

iae

typo

Practical Evaluation of Masking for NTRUEncrypt on ARM… §

Title: “Practical Evaluation of Masking for NTRUEncrypt on ARM Cortex-M4” by Thomas Schamberger, Oliver Mischke, Johanna Sepulveda [url] [dblp]
Published around 2019 in COSADE 2019, and I read it around 2021-10
Abstract: To protect against the future threat of large scale quantum computing, cryptographic schemes that are considered appropriately secure against known quantum algorithms have gained in popularity and are currently in the process of standardization by NIST. One of the more promising so-called post-quantum schemes is NTRUEncrypt, which withstood scrutiny from the scientific community for over 20 years.

quotes

summary

A good read and average academic paper.

This paper presents a masking scheme for the NIST round-1 submission NTRUEncrypt, variants NTRU-443 and NTRU-743. They discuss 4 proposed countermeasures (1 of them is masking) and conclude that two of the countermeasures should only be applied in combination with masking. They go on to mask the polynomial multiplication per coefficient in the decryption as m = (f * (e + mask) - (f * mask)) = f * e. Then they evaluate the countermeasure on an STM32F303RCT7 ARM Cortex-M4 microcontroller mounted on the NewAE CW308 UFO board.
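The masking identity f*(e + mask) − f*mask = f*e holds for any mask because multiplication distributes over addition in the ring. A toy demo (my own illustration; N and q are arbitrary here, not the NTRU-443/-743 parameters):

```python
import random

N, q = 11, 2048  # illustrative ring Z_q[x]/(x^N - 1)

def polymul(a, b):
    # multiplication in Z_q[x]/(x^N - 1), i.e. cyclic convolution
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % q
    return c

def polyadd(a, b):
    return [(x + y) % q for x, y in zip(a, b)]

def polysub(a, b):
    return [(x - y) % q for x, y in zip(a, b)]

random.seed(1)
f = [random.randrange(q) for _ in range(N)]  # stand-in for the secret key
e = [random.randrange(q) for _ in range(N)]  # stand-in for the ciphertext
m = [random.randrange(q) for _ in range(N)]  # fresh random mask

# masked decryption path equals the unmasked product
masked = polysub(polymul(f, polyadd(e, m)), polymul(f, m))
assert masked == polymul(f, e)
```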

I think they could have skipped trivial figures 4 and 5 and instead argue in more detail why the masking only has to be applied to the multiplication but not other parts.

Region-Based Memory Management in Cyclone §

Title: “Region-Based Memory Management in Cyclone” by Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, James Cheney [url] [dblp]
Published circa 2002-06 in PLDI 2002, and I read it circa 2023-05-14
Summary: Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline. The design incorporates several advancements, including support for region subtyping and a coherent integration with stack allocation and a garbage collector. To support separate compilation, Cyclone requires programmers to write some explicit region annotations, but a combination of default annotations, local type inference, and a novel treatment of region effects reduces this burden. As a result, we integrate C idioms in a region-based framework. In our experience, porting legacy C to Cyclone has required altering about 8% of the code; of the changes, only 6% (of the 8%) were region annotations.

quotes

  • “Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline.” (Grossman et al., p. 1)
  • “to reduce the need for type casts, Cyclone has features like parametric polymorphism, subtyping, and tagged unions. To prevent bounds violations without making hidden data-representation changes, Cyclone has a variety of pointer types with different compile-time invariants and associated run-time checks.” (Grossman et al., p. 1)
  • “Following the seminal work of Tofte and Talpin [28], the system is region-based : each object lives in one region and, with the exception that a distinguished heap region may be garbage collected, a region’s objects are all deallocated simultaneously.” (Grossman et al., p. 2)
  • “In our experience, porting C code has required altering about 8% of the code, and the vast majority of changes have not been region annotations.” (Grossman et al., p. 2)
  • “Dynamic regions are created with the construct region r{s}, where r is an identifier and s is a statement. The region’s lifetime is the execution of s. In s, r is bound to a region handle, which primitives rmalloc and rnew use to allocate objects into the associated region. For example, rnew(r) 3 returns a pointer to an int allocated in the region of handle r and initialized to 3.” (Grossman et al., p. 2)
  • “Like a declaration block, a dynamic region is deallocated precisely when execution leaves the body of the enclosed statement.” (Grossman et al., p. 2)
  • “Pointers to arrays of unknown size (denoted τ ?) are implemented with extra fields to support bounds-checks, but this design is orthogonal to regions.” (Grossman et al., p. 2)
  • “int*ρ describes a pointer to an int that is in the region whose name is ρ.” (Grossman et al., p. 2)
  • “A handle for a region corresponding to ρ has the type region_t<ρ>.” (Grossman et al., p. 3)
  • “A block labeled L (e.g., L:{int x=0;s}) has name ρL and refers to the stack region that the block creates.” (Grossman et al., p. 3)
  • “We can now give types to some small examples. If e1 has type region_t<ρ> and e2 has type τ, then rnew(e1) e2 has type τ*ρ” (Grossman et al., p. 3)
  • “Preventing dangling-pointer dereferences. To dereference a pointer, safety demands that its region be live.” (Grossman et al., p. 3)
  • “Functions in Cyclone are region polymorphic; they can abstract the actual regions of their arguments or results. That way, functions can manipulate pointers regardless of whether they point into the stack, the heap, or a dynamic region.” (Grossman et al., p. 3)
  • “The ? is Cyclone notation for a pointer to a dynamically sized array.” (Grossman et al., p. 3)
  • “Because struct definitions can contain pointers, Cyclone allows these definitions to be parameterized by region names.” (Grossman et al., p. 3)
  • “To solve this problem, we observe that if the region corresponding to ρ1 outlives the region corresponding to ρ2, then it is sound to use a value of type τ*ρ1 where we expect one of type τ*ρ2. Cyclone supports such coercions implicitly. The last-in-first-out region discipline makes such outlives relationships common: when we create a region, we know every region currently alive will outlive it.” (Grossman et al., p. 4)
  • “To ensure soundness, we do not allow casting τ1*ρ to τ2*ρ, even if τ1 is a subtype of τ2, as this cast would allow putting a τ2 in a location where other code expects a τ1. (This problem is the usual one with covariant subtyping on references.)” (Grossman et al., p. 4)
  • “We emphasize that our approach to inference is purely intraprocedural and that prototypes for functions are never inferred. Rather, we use a default completion of partial prototypes to minimize region annotations. This approach permits separate compilation.” (Grossman et al., p. 4)
  • “the compiler deduces an appropriate annotation based on context: 1. For local declarations, a unification-based inference engine infers the annotation from the declaration’s (intraprocedural) uses. This local inference works well in practice, especially when declarations have initializers. 2. Omitted region names in argument types are filled in with fresh region names that are generalized implicitly. So by default, functions are region polymorphic without any region equalities. 3. In all other contexts (return types, globals, type definitions), omitted region names are filled in with ρH (i.e., the heap). This default works well for global variables and for functions that return heap-allocated results. However, it fails for functions like strcpy that return one of their parameters. Without looking at the function body, we cannot determine which parameter (or component of a parameter) the function might return.” (Grossman et al., p. 4)
  • “Cyclone does not have closures, but it has other typing constructs that hide regions. In particular, Cyclone provides existential types [22, 14], which suffice to encode closures [21] and simple forms of objects [5]. Therefore, it is possible in Cyclone for pointers to escape the scope of their regions.” (Grossman et al., p. 5)
  • “To address this problem, the Cyclone type system keeps track of the subset of region names that are considered live at each control-flow point. Following Walker, Crary, and Morrisett [29], we call the set of live regions the capability. To allow dereferencing a pointer, the type system ensures that the associated region name is in the capability. Similarly, to allow a function call, Cyclone ensures that regions the function might access are all live. To this end, function types carry an effect that records the set of regions the function might access.” (Grossman et al., p. 5)
  • “The second major departure from TT is that we do not have effect variables.” (Grossman et al., p. 5)
  • “To simplify the system while retaining the benefit of effect variables, we use a type operator, regions_of(τ). This novel operator is just part of the type system; it does not exist at runtime. Intuitively, regions_of(τ) represents the set of regions that occur free in τ” (Grossman et al., p. 5)
  • “For type variables, regions_of(α) is treated as an abstract set of region variables, much like effect variables. For example, regions_of(α*ρ) = {ρ} ∪ regions_of(α)” (Grossman et al., p. 6)
  • “Cyclone supports existential types, which allow programmers to encode closures.” (Grossman et al., p. 6)
  • “In a separate technical report [15], we have defined an operational model of Core Cyclone, formalized the type system, and proven type soundness.” (Grossman et al., p. 6)
  • “The code generator ensures that regions are deallocated even when their lifetimes end due to unstructured control flow.” (Grossman et al., p. 8)
  • “In this fashion, we ensure that a region is always deallocated when control returns.” (Grossman et al., p. 8)
  • “We took two approaches to porting. First, we changed all the programs as little as possible to make them correct Cyclone programs. Then, for cfrac and mini_httpd, we regionized the code: We made functions more region polymorphic and, where possible, eliminated heap allocation in favor of dynamic region allocation with rnew. We also added compiler-checked “not null” annotations to pointer types where possible to avoid some null checks.” (Grossman et al., p. 8)
  • “There are two interesting results regarding the difficulty of minimal porting. First, the overall changes in the programs are relatively small — less than 10% of the program code needed to be changed. The vast majority of the differences arise from pointer-syntax alterations. These changes are typically easy to make — e.g., the type of strings are changed from char * to char ?. We are currently experimenting with interpreting char * as a safe null-terminated string type by default; doing so allows many fewer changes. The most encouraging result is that the number of region annotations is small: only 124 changes (which account for roughly 6% of the total changes) in more than 18,000 lines of code. The majority of these changes were completely trivial, e.g., many programs required adding ρH annotations to argv so that arguments could be stored in global variables.” (Grossman et al., p. 9)
  • “The cost of porting a program to use dynamic regions was also reasonable; in this case roughly 13% of the total differences were region-related.” (Grossman et al., p. 9)
  • “For the non-web benchmarks (and some of the web benchmarks) the median and mean were essentially identical, and the standard deviation was at most 2% of the mean. The factor columns for the Cyclone programs show the slowdown factor relative to the C versions.” (Grossman et al., p. 10)
  • “we pay a substantial penalty for compute-intensive benchmarks; the worst is grobner, which is almost a factor of three slower than the C version.” (Grossman et al., p. 10)
  • “As Table 3 demonstrates, bounds-checks are also an important component of the overhead, but less than we expected. We found that a major cost is due to the representation of fat pointers. A fat pointer is represented with three words: the base address, the bounds address, and the current pointer location (essentially the same representation used by McGary’s bounded pointers [20]).” (Grossman et al., p. 10)
  • “Many systems, including but certainly not limited to LCLint [10, 9], SLAM [3], Safe-C [2], and CCured [25], aim to make C code safe.” (Grossman et al., p. 10)
  • “As they note, region-based programming in C is an old idea; they contribute language support for efficient reference counting to detect if a region is deallocated while there remain pointers to it (that are not within it). This dynamic system has no a priori restrictions on regions’ lifetimes and a pointer can point anywhere, so the RC approach can encode more memory-management idioms.” (Grossman et al., p. 11)
  • “Instead, we are currently developing a traditional intraprocedural flow analysis to track region aliasing and region lifetimes.” (Grossman et al., p. 11)

summary

This paper introduces region-based memory management for the C programming language, following the seminal work of Tofte and Talpin. A compiler is implemented that takes regions into account: dereferencing a dangling pointer becomes a compile-time error, memory is reclaimed by deallocating a whole region once its enclosing statement goes out of scope, and arbitrary-size arrays get run-time bounds checking via fat pointers.

The paper describes the extended type system, notions of subtyping, polymorphism, and “outliving” for regions, revises a soundness proof (provided fully in a technical report), and presents inference techniques that avoid most region annotations. As a result, only 8% of the code had to be changed when porting. Performance is often unaffected (the added checks are negligible), but run-time checks worsen the performance of CPU-intensive tasks.
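The region discipline can be mimicked dynamically in Python. This is only a toy sketch with made-up names: where Cyclone rejects dangling dereferences statically at compile time, this stand-in merely detects them at run time:

```python
class Region:
    """Toy arena mirroring Cyclone's `region r { s }` construct."""

    def __init__(self):
        self._cells = []
        self._live = True

    def rnew(self, value):
        """Allocate into this region; returns a deref closure (a 'pointer')."""
        cell = [value]
        self._cells.append(cell)
        def deref():
            if not self._live:
                raise RuntimeError("dangling pointer: region was deallocated")
            return cell[0]
        return deref

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # All objects in the region are deallocated simultaneously,
        # precisely when execution leaves the enclosed block.
        self._live = False
        self._cells.clear()
```

Inside `with Region() as r:` a pointer `p = r.rnew(3)` yields `p() == 3`; calling `p()` after the block raises, which is the error class Cyclone's type system rules out before the program ever runs.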

Revisiting “what is a document?” §

Title: “Revisiting “what is a document?”” by Bernd Frohmann [url] [dblp]
Published circa 2008-06 in Journal of Documentation, Volume 65 (2009), and I read it circa 2024-12
Summary: Purpose – The purpose of this paper is to provide a reconsideration of Michael Buckland’s important question, “What is a document?”, analysing the point and purpose of definitions of “document” and “documentation”.

quotes

  • “The philosopher’s responsibility is to discover what the meaning of a term should be, in order to contribute to a scientific classification of things, and thereby to bring order to thought and communication.” (Frohmann, 2009, p. 2)
  • “In our example, we began with some remarks (one fibre) on Otlet’s and Briet’s concept of a document, in which the concept of evidence was yoked to the prevailing concept of the document, thereby extending it to include physical objects.” (Frohmann, 2009, p. 8)

summary

A philosophical article mainly contrasting Wittgensteinian arguments with Putnam’s. I don’t feel that any of my questions were answered, but perhaps a document can be defined as “anything that can serve as evidence”, hence including physical objects, as the base article discusses.

I liked the article’s phrasing of the philosopher’s responsibility.

SEVurity: No Security Without Integrity §

Title: “SEVurity: No Security Without Integrity” by Luca Wilke, Jan Wichelmann, Mathias Morbitzer, Thomas Eisenbarth [url] [dblp]
Published circa 2020 in IEEE Symposium on Security and Privacy 2020, and I read it circa 2020-06
Summary: One reason for not adopting cloud services is the required trust in the cloud provider: As they control the hypervisor, any data processed in the system is accessible to them. Full memory encryption for Virtual Machines (VM) protects against curious cloud providers as well as otherwise compromised hypervisors. AMD Secure Encrypted Virtualization (SEV) is the most prevalent hardware-based full memory encryption for VMs. Its newest extension, SEV-ES, also protects the entire VM state during context switches, aiming to ensure that the host neither learns anything about the data that is processed inside the VM, nor is able to modify its execution state. Several previous works have analyzed the security of SEV and have shown that, by controlling I/O, it is possible to exfiltrate data or even gain control over the VM’s execution. In this work, we introduce two new methods that allow us to inject arbitrary code into SEV-ES secured virtual machines. Due to the lack of proper integrity protection, it is sufficient to reuse existing ciphertext to build a high-speed encryption oracle. As a result, our attack no longer depends on control over the I/O, which is needed by prior attacks. As I/O manipulation is highly detectable, our attacks are stealthier. In addition, we reverse-engineer the previously unknown, improved Xor-Encrypt-Xor (XEX) based encryption mode, that AMD is using on updated processors, and show, for the first time, how it can be overcome by our new attacks.

Intel SGX's role

Intel Software Guard Extensions (SGX) was the first widely available solution for protecting data in RAM. However, it can only protect a small chunk of RAM, not the VM as a whole.

Memory encryption systems for VMs

Comparison: S. Mofrad, F. Zhang, S. Lu, and W. Shi, “A comparison study of Intel SGX and AMD memory encryption technology,” 2018

Notes

On virtual memory with virtual machines

On virtualized systems, two different page tables are used. Within the VM, the VA used by the guest, the Guest Virtual Address (GVA), is translated to the Guest Physical Address (GPA). The GPA is the address which the VM considers to be the PA. However, on the host system itself another page table is introduced, to allow multiple VMs to run on the same physical machine. This second page table is called the Second Level Address Translation (SLAT), or Nested Page Table (NPT) [5]. The NPT translates the GPA into the Host Physical Address (HPA), the actual address of the data in physical memory.
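The two-level walk can be sketched as follows. The page-table contents are hypothetical, the translation works at page-number granularity, and real guest/NPT walks are multi-level and hardware-assisted:

```python
PAGE = 4096  # assume 4 KiB pages

def translate(addr, page_table):
    """One translation level: map the page number, keep the page offset."""
    page, offset = divmod(addr, PAGE)
    if page not in page_table:
        raise KeyError("page fault")
    return page_table[page] * PAGE + offset

guest_pt = {0x10: 0x22}  # guest page table: GVA page 0x10 -> GPA page 0x22
npt      = {0x22: 0x95}  # NPT/SLAT:         GPA page 0x22 -> HPA page 0x95

gva = 0x10 * PAGE + 0x123
gpa = translate(gva, guest_pt)  # first level, controlled by the guest
hpa = translate(gpa, npt)       # second level, controlled by the hypervisor
```

The split of control is the security-relevant part: the guest owns the first mapping, but the hypervisor owns the NPT and thereby decides which physical frames the VM actually touches.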

summary

Trust in cloud providers

One reason for not adopting cloud services is the required trust in the cloud provider: As they control the hypervisor, any data processed in the system is accessible to them.

Tweakable block ciphers

One popular method for storage encryption is tweakable block ciphers, such as AES-XTS [1], which is, e.g., used in Apple’s FileVault, MS Bitlocker and Android’s file-based encryption. Tweakable block ciphers provide encryption without data expansion as well as some protection against plaintext manipulation. A tweak allows one to securely change the behavior of the block cipher, similar to instantiating it with a new key, but with little overhead.

XEX and Xor-Encrypt (XE) are methods to turn a block cipher such as AES into a tweakable block cipher, where a tweak-derived value is XORed with the plaintext before encryption (and XORed again after encryption in the case of XEX).
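The XE/XEX structure can be sketched with a toy 16-byte Feistel permutation standing in for AES. Everything here is illustrative: real XEX derives per-block tweak values via GF(2^128) arithmetic, and the toy cipher has no cryptographic strength:

```python
import hashlib

def _f(key, rnd, half):
    """Keyed round function for the toy Feistel network."""
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def toy_encrypt(key, block):
    """4-round Feistel permutation on 16-byte blocks (AES stand-in)."""
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, bytes(a ^ b for a, b in zip(l, _f(key, i, r)))
    return l + r

def toy_decrypt(key, block):
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = bytes(a ^ b for a, b in zip(r, _f(key, i, l))), l
    return l + r

def xor16(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def xex_encrypt(key, tweak_key, sector, pt):
    t = toy_encrypt(tweak_key, sector.to_bytes(16, "big"))  # tweak value
    return xor16(toy_encrypt(key, xor16(pt, t)), t)  # XE would drop outer XOR

def xex_decrypt(key, tweak_key, sector, ct):
    t = toy_encrypt(tweak_key, sector.to_bytes(16, "big"))
    return xor16(toy_decrypt(key, xor16(ct, t)), t)
```

The tweak (here derived from a sector number) makes the same plaintext encrypt differently at different locations, without the cost of re-keying the cipher.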

typo

“before it continuous its operation” → “before it continues its operation.”

SOK: On the Analysis of Web Browser Security §

Title: “SOK: On the Analysis of Web Browser Security” by Jungwon Lim, Yonghwi Jin, Mansour Alharthi, Xiaokuan Zhang, Jinho Jung, Rajat Gupta, Kuilin Li, Daehee Jang, Taesoo Kim [url] [dblp]
Published circa 2021-12, and I read it circa 2022-02
Summary: Web browsers are integral parts of everyone’s daily life. They are commonly used for security-critical and privacy sensitive tasks, like banking transactions and checking medical records. Unfortunately, modern web browsers are too complex to be bug free (e.g., 25 million lines of code in Chrome), and their role as an interface to the cyberspace makes them an attractive target for attacks. Accordingly, web browsers naturally become an arena for demonstrating advanced exploitation techniques by attackers and state-of-the-art defenses by browser vendors. Web browsers, arguably, are the most exciting place to learn the latest security issues and techniques, but remain as a black art to most security researchers because of their fast-changing characteristics and complex code bases.

quotes

summary

The paper looks at the Chrome/Blink/V8, Safari/WebKit/JavaScriptCore, Firefox/Gecko/SpiderMonkey, and Internet Explorer/Trident/Chakra web browsers and their architecture with respect to security requirements. This covers rendering, IPC, site isolation, and JavaScript engine security, as well as web technologies like the Same-Origin Policy. Table 1 shows the many sandboxing/isolation mechanisms in place, especially in Chrome and Firefox. Browser exploitation scenarios and bug classifications are provided in Figure 2. The paper also traces the history and timeline of bugs and bug classes, as well as strategies to detect them. Table 4 provides a very detailed look at browser mitigations over time.

In summary, the paper provides a very detailed look at web browser security. It can serve as a list of keywords to get started on web security topics. In case you plan to build your own web browser engine, it will help you understand requirements and gives architectural design ideas. It also gives a historical overview of which mitigations were deployed as time progressed.

Lessons:

  1. Using memory safe languages is an effective mitigation against memory-safety bugs.
  2. Higher payouts motivate more bug reports.
  3. UAF mitigations are effective at reducing DOM bug exploits.
  4. Mitigating JS engine bugs is difficult.
  5. UXSS bugs are mostly mitigated by Site Isolation.
  6. Collaborative efforts on mitigations are good.

One controversial aspect:

“Although vendors are trying, they are consistently behind in this arms race. Mitigations from vendors are mostly reactive, which means they are developed long after each wave of attacks. By the time an attack surface is finally closed, attackers have already come up with a better exploit.”

I am not really sure about this statement; I think it is too strong. Did the authors consider how many mitigations were put in place that prevented security exploits in the first place? The statement gives no credit for those decisions.

citation 121 = “Safer Usage Of C++” by Google Security

Scribble: Closing the Book on Ad Hoc Documentation Too… §

Title: “Scribble: Closing the Book on Ad Hoc Documentation Tools” by Matthew Flatt, Eli Barzilay, Robert Bruce Findler [url] [dblp]
Published circa 2009-09 in ICFP'09, and I read it circa 2022-01
Summary: Scribble is a system for writing library documentation, user guides, and tutorials. It builds on PLT Scheme’s technology for language extension, and at its heart is a new approach to connecting prose references with library bindings. Besides the base system, we have built Scribble libraries for JavaDoc-style API documentation, literate programming, and conference papers. We have used Scribble to produce thousands of pages of documentation for PLT Scheme; the new documentation is more complete, more accessible, and better organized, thanks in large part to Scribble’s flexibility and the ease with which we cross-reference information across levels. This paper reports on the use of Scribble and on its design as both an extension and an extensible part of PLT Scheme.

quotes

summary

An excellent tool that expands on some existing concepts from Scribe (2005). Fundamentally, the prefix “#lang scribble/doc” dispatches parsing based on the loaded module. In particular, S-expressions serve as the building blocks of the language defining typesetting elements. The primary advantage is supporting the “program is data” paradigm in the typesetting context; the disadvantage is the tight coupling between the PLT Scheme infrastructure and the document.

Seven deadly sins of introductory programming language… §

Title: “Seven deadly sins of introductory programming language design” by L. McIver, D. Conway [url] [dblp]
Published circa 1996 in SEEP '96, and I read it circa 2023-06-22
Summary: We discuss seven undesirable features common to many programming languages used to teach first-time programmers, and illustrate typical pedagogical difficulties which stem from them with examples drawn from the programming languages ABC, Ada, C, C++, Eiffel, Haskell, LISP, Modula 3, Pascal, Prolog, Scheme, and Turing. We propose seven language design (or selection) principles which may reduce the incidence of such undesirable features.

quotes

  • “Scheme has effectively only one data type – the list – and one operation – evaluation of a list. While this abstraction is very simple to explain, and not difficult for the beginner to grasp superficially, it does result in code that is difficult to read because of large numbers of nested parentheses and the absence of other structuring punctuation” (McIver and Conway, 1996, p. 1)
  • “Furthermore, to support this extreme degree of homogeneity, a large number of inbuilt functions are required, many of which are quite sophisticated in their behaviour, and therefore difficult to understand and use correctly (for example: sort vs sortcar in Franz LISP [14]).” (McIver and Conway, 1996, p. 1)
  • “At first glance this approach seems quite reasonable, but two pedagogical problems frequently sabotage it. The first is that textbooks and other reference materials rarely confine themselves to the selected subset. The second is that, even if the textbook does limit itself to the required subset, the compiler almost certainly does not. The result is often worse than if the complete language was taught: students must still contend with the full semantics of the language, but much of it has deliberately not been explained to them!” (McIver and Conway, 1996, p. 2)
  • “Examples of this "creeping featuritis" are easy to cite: C++ provides over 50 distinct operators at 17 levels of precedence, Ada9X has 68 reserved words and over 50 predefined attributes, Modula 3 reserves over 100 keywords, and some commonly-used LISP dialects ([14] for example) define over 500 special functions. Because most textbooks and compilers attempt to cover the full language, novice programmers are forced to contend with all of these features, even if they are not using them.” (McIver and Conway, 1996, p. 2)
  • “Syntactic homonyms (constructs which are syntactically the same, but which have two or more different semantics depending on context) are perhaps a more serious flaw in a language. An extreme example of this3 may be seen in the pedagogically-oriented language Turing, in which the construct A(B) has five distinct meanings:

    • call function A and pass parameter B
    • dereference the pointer B to access an object in collection A
    • access the Bth element of array A
    • construct a set of type A with a single element having the value of B
    • create a one-letter substring of the string A consisting of its Bth character” (McIver and Conway, 1996, p. 2)
  • “Notice that referencing an element of array a with subscript i as in a(i) is notationally equivalent to [the pointer dereference] c(p). This is an example of uniform referents, which means that analogous ways of accessing data should be notationally equivalent. [17]” (McIver and Conway, 1996, p. 2)
  • “The blame for this minor failure can hardly be laid on the novice, who may simply have forgotten (or perhaps never grasped) that ABC lists are automatically sorted on input.” (McIver and Conway, 1996, p. 4)
  • “Likewise, we suggest that an introductory language need only provide a single numeric data type which stores rational numbers with arbitrary precision integer numerators and denominators. The restriction to rationals still allows the educator to discuss the general issue of representational limitations (such as the necessary approximation of common transfinite numbers such as π and e), but eliminates several large classes of common student error which are caused by misapplication of prior mathematical understanding to fixed precision arithmetic” (McIver and Conway, 1996, p. 6)
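The single arbitrary-precision rational type suggested in the last quote behaves much like Python's `fractions.Fraction`. A sketch of the idea, not anything from the paper:

```python
from fractions import Fraction

# Fixed-precision floats violate novice expectations...
assert 0.1 + 0.2 != 0.3

# ...while arbitrary-precision rationals match school arithmetic exactly.
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
assert Fraction(1, 3) * 3 == 1

# Values like pi and e would still need approximation, which preserves the
# representational-limitations discussion the authors want to keep available.
```

This eliminates the class of errors caused by applying prior mathematical understanding to fixed-precision arithmetic, while keeping one (rather than several) numeric types to teach.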

summary

The authors took a look at the programming languages ABC, Ada, C, C++, Eiffel, Haskell, LISP, Modula 3, Pascal, Prolog, Scheme, and Turing. They identified typical pain points for novice programmers to write correct programs. They broke down the faults into seven deadly sins: “Less is more”, “More is more”, “Grammatical traps”, “Hardware dependence”, “Backwards compatibility”, “Excessive cleverness”, “Violation of Expectations”. Furthermore they gave seven steps towards more teachable languages: “Start where the novice is”, “Differentiate semantics with syntax”, “Make the syntax readable and consistent”, “Provide a small and orthogonal set of features”, “Be careful especially with I/O”, “Provide better error diagnosis”, “Choose a suitable level of abstraction”. They conclude with a list of seven criteria for choosing a suitable introductory language.

It is difficult to judge the applicability of the criteria because of changes since 1996. However, “Violation of Expectations” reminds me of the principle of least astonishment from UX design. The given examples are the strong suit of the paper, but there are only a few. In the end I agree with the statements, but the points are fairly generic and I had expected more insight. One thing I learned from this paper is that there should be a roughly 1:1 correspondence between syntactic and semantic elements.

Seven great blunders of the computing world §

Title: “Seven great blunders of the computing world” by N. Holmes [url] [dblp]
Published circa 2002-07, and I read it circa 2021-08
Summary:

quotes

summary

The article discusses seven topics that the author considers to have been settled wrongly by the technical community.

Even though the mentioned ‘blunders’ are a neat collection of historically decided debates, I don't think the term ‘blunder’ is justified. Especially blunder 4, “Commercial programming”, is highly controversial in retrospect and mostly asserted without proper arguments. Blunder 1, “Terminology”, on the other hand, made me reflect on the terms “information” and “data”.

SoK: Eternal War in Memory §

Title: “SoK: Eternal War in Memory” by L. Szekeres, M. Payer, Tao Wei, Dawn Song [url] [dblp]
Published circa 2013 in IEEE S&P 2013, and I read it circa 2020-09
Summary: Memory corruption bugs in software written in low-level languages like C or C++ are one of the oldest problems in computer security. The lack of safety in these languages allows attackers to alter the program’s behavior or take full control over it by hijacking its control flow. This problem has existed for more than 30 years and a vast number of potential solutions have been proposed, yet memory corruption attacks continue to pose a serious threat. Real world exploits show that all currently deployed protections can be defeated.

quotes

summary

A well-written paper. Figure 1 provides a nice classification of the different attacks on memory vulnerabilities, which is a helpful contribution. The attacks themselves were not new to me. Section 2.C (“Control-flow hijack attack”) claims that “The instruction pointer can only be updated indirectly by executing an indirect control-flow transfer instruction, e.g., an indirect function call, indirect jump or function return instruction”, but this is not true for ARM platforms, where the program counter is addressable. This is one example where the paper does not specify the platform. One sentence in section 3 reads “One common way of leaking out memory contents is by overwriting the length field of a (e.g., JavaScript) string object before reading it out (in the user script).”, but the reference to CVE-2012-0469 in Table 1 does not refer to a CVE discussing JavaScript string lengths AFAICS. Furthermore, some claims are too strong and insufficiently justified, like “In case of most applications, however, this is much less of an issue than runtime performance”. But these are rare cases. In general, a good read to get an overview of the memory vulnerabilities of the previous decades.

typo

SoK: Exploiting Network Printers §

Title: “SoK: Exploiting Network Printers” by Jens Müller, Vladislav Mladenov, Juraj Somorovsky, Jörg Schwenk [url] [dblp]
Published circa 2017 in IEEE S&P 2017, and I read it circa 2020-08
Summary: The idea of a paperless office has been dreamed of for more than three decades. However, nowadays printers are still one of the most essential devices for daily work and common Internet users. Instead of getting rid of them, printers evolved from simple printing devices to complex network computer systems installed directly in company networks, and carrying lots of confidential data in their print jobs. This makes them an attractive attack target.

quotes

summary

The depth of evaluation is the strong suit of this paper, which seems appropriate for an SoK. I disliked the use of attacker models: they were generic, intuitive, and (as pointed out) non-exhaustive. I think the better approach for this paper would have been to describe the general printing process and briefly introduce attacker models wherever they are needed; correspondingly, AirPrint and other technologies should not be explained in the attacker-model section. In section 7.4, the number “524,280 bit” = 2^19 − 8 bit is wrong. ASCII represents 2^7 or 2^8 characters (depending on the definition). A length of 2^16 gives us 2^7 · 2^16 = 2^23 possibilities.
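The exponent arithmetic in the note above can be checked directly:

```python
# Re-checking the figures from the review of section 7.4:
assert 2**19 - 8 == 524_280   # the paper's "524,280 bit" figure
assert 2**7 * 2**16 == 2**23  # 2^7 ASCII characters times a length of 2^16
```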

typo

SoK: Science, Security and the Elusive Goal of Securit… §

Title: “SoK: Science, Security and the Elusive Goal of Security as a Scientific Pursuit” by Cormac Herley, P. C. van Oorschot [url] [dblp]
Published around 2017 in 2017 IEEE Symposium on Security and Privacy; I read it around 2020-05
Abstract: The past ten years has seen increasing calls to make security research more “scientific”. On the surface, most agree that this is desirable, given universal recognition of “science” as a positive force. However, we find that there is little clarity on what “scientific” means in the context of computer security research, or consensus on what a “Science of Security” should look like. We selectively review work in the history and philosophy of science and more recent work under the label “Science of Security”. We explore what has been done under the theme of relating science and security, put this in context with historical science, and offer observations and insights we hope may motivate further exploration and guidance. Among our findings are that practices on which the rest of science has reached consensus appear little used or recognized in security, and a pattern of methodological errors continues unaddressed.

Bacon's scientific approach

Bacon [8] formalized an inductive method of generalizing from observations—his method of observation, generalization and correction is acknowledged as an early articulation of what many historically view as the basic scientific method

Empiricism versus formal approaches

Mathematical models have proved enormously useful in various branches of Science. It is worth emphasizing that a mathematical model is judged on the accuracy of its predictions rather than the perceived reasonableness of its assumptions.

There is no confusion on this point when checking predictions against measurements is easy: when there’s a conflict between the model and the data, the data wins (see, e.g., remarks by Feynman and others in Section II-D). However, in disciplines where direct measurement and controlled experiments are hard (e.g., Economics and Security) it can be tempting to seek other forms of validation for a mathematical model.

This is a limitation of any mathematical model: how well a model matches reality can only be tested empirically.

inductive / deductive

Probably the most significant settled point in the philosophy of science is that inductive and deductive statements constitute different types of knowledge claims.

The importance of this distinction has long been known. Versions date back to Plato, who distinguished the messy, physical realm of things observed from the perfect non-physical world of “forms.”

Despite broad consensus in the scientific community, in Security there is repeated failure to respect the separation of inductive and deductive statements. Both types have value, provided their limitations are recognized and they are kept separate.

Kant’s very thorough treatment was influential [5]; he calls statements whose truth is independent
of experience a priori and those that depend on observation a posteriori.

Mill argued essentially the reverse: induction is our only path to understanding the world since, on its own, deduction is incapable of helping us discover anything about the world [9]:

But this is, in fact, to say that nothing ever was, or can be, proved by syllogism, which was not known, or assumed to be known, before.

Ayer, a logical positivist, summarizes [6, p.57]:

the characteristic mark of a purely logical inquiry is that it is concerned with the formal consequences of
our definitions and not with questions of empirical fact.

Is security special?

Biological and military systems must also guarantee robustness in the presence of adversaries (see Forrest et al. [78], [79]). In that many of its discoveries are expressible as laws, Physics is an exception rather than the rule [3, pp.197-208]. The compactness and elegance of the mathematical expression of much of Physics is unmatched in other Sciences

Main points

History/Philosophy of Science

Science of Security

Failures to apply lessons from science

Ways forward: Insights and discussion

summary observations and constructive suggestions following:

Appendix

Popper's criterion

A simplified statement of Popper’s criterion is that scientific theories should be falsifiable [14]:

A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice.

Einstein’s Relativity, by contrast, even though it assaulted basic intuitions about time and space, made testable predictions.

Programs as predicates

The origins of Computer Science and Computer Security are mathematical. WWII ciphers and code-breaking are justly revered. Without the accomplishments of modern cryptography, it is doubtful the World-Wide-Web would have attained such influence. This has contributed to the strongly mathematical flavor of our discipline. In 1984, Hoare describes a computer program as [18] “a logical predicate describing all of its permitted behaviors”; and further, that the final design meets the original requirements “can be mathematically proved before starting the implementation of its components.” This view of programs as implementing mathematical predicates, is unsurprising given the importance of the development of algorithms. For an algorithm that involves sorting, searching etc., it is important that its behavior is well-understood under all possible inputs.

Shamir and Schell on scientific security

Shamir, in accepting the 2002 Turing award, described non-crypto security as “a mess.” Schell, in 2001, described the field as being filled with “pseudo-science and flying pigs” [2].

Side-channel attacks reveal the gap between theory and practice

Side-channels continue to epitomize the difficulty of modeling real world problems—raising problems even with definitions.

summary

This paper tackles the difficult question of discussing IT security from a scientific perspective. A demarcation criterion distinguishes between “scientific” and “non-scientific” content. Are any approaches in security scientific at all? Is it an inductive or deductive process? How does security compare to other sciences? Where does security position itself in Pasteur's Quadrant, and is falsification an admissible demarcation criterion? What are the conclusions of the JASON report? A thorough discussion follows, closing with summary observations and constructive suggestions.

An extensive, well-written paper. Nice summary in section 2 “History/Philosophy of Science”, which can be debated with any curious mind from any field. The following sections are security-specific. Drawing conclusions on the central question is difficult, and the paper cannot provide them either. But it shows a lack of the procedures for empirical approaches that are established in other sciences. I think Appendix A is very important and should have been put into the main text (because falsification is such a prominently discussed criterion in the paper).

typo

Further, Science generally requires that a hypothesis survives severe tests before it is trusted.

(Appendix A)

typo

give a highly accessible exposition

Software-based Power Side-Channel Attacks on x86 §

Title: “Software-based Power Side-Channel Attacks on x86” by Moritz Lipp, Andreas Kogler, David Oswald, Michael Schwarz, Catherine Easdon, Claudio Canella, Daniel Gruss [url] [dblp]
Published around 2020-11; I read it around 2020-12
Abstract: Power side-channel attacks exploit variations in power consumption to extract secrets from a device, e.g., cryptographic keys. Prior attacks typically required physical access to the target device and specialized equipment such as probes and a high-resolution oscilloscope.

questions

quotes

summary

In this paper, we present PLATYPUS attacks which are novel software-based power side-channel attacks on Intel server, desktop, and laptop CPUs. We exploit unprivileged access to the Intel Running Average Power Limit (RAPL) interface that exposes values directly correlated with power consumption, forming a low-resolution side channel.

Some instructive mathematical errors §

Title: “Some instructive mathematical errors” by Richard P. Brent [url] [dblp]
Published around 2021-06-20; I read it around 2020-08
Abstract: We describe various errors in the mathematical literature, and consider how some of them might have been avoided, or at least detected at an earlier stage, using tools such as Maple or Sage. Our examples are drawn from three broad categories of errors. First, we consider some significant errors made by highly-regarded mathematicians. In some cases these errors were not detected until many years after their publication. Second, we consider in some detail an error that was recently detected by the author. This error in a refereed journal led to further errors by at least one author who relied on the (incorrect) result. Finally, we mention some instructive errors that have been detected in the author’s own published papers.

Comment: 25 pages, 75 references. Corrected typos and added footnote 5 in v2

feedback

Links to Wikipedia should be permalinks.

quotes

summary

Nice paper showing how some errors crept into the mathematical literature. Some examples are historical, others are related to the author's own work. Apparently the author works in the field of number theory and computational mathematics. Fewer examples are provided from algebra and geometry.

Reproducibility and numeric verification seem to be approaches to combat the underlying problems. I also wonder how much of a difference it would make if formulas were easier to search for. And I wonder how accessible SageMath, Maple and others are for researchers (can they easily verify theorems of fields they are not familiar with?).

Some well-known errors:

Ad claim 2 - disproof methods:

  1. algebraic argument
  2. numeric evaluation
  3. analytical argument

 

Strategies for Parallel Markup §

Title: “Strategies for Parallel Markup” by Bruce R. Miller [url] [dblp]
Published around 2015 in CICM 2015; I read it around 2024-09
Abstract: Cross-referenced parallel markup for mathematics allows the combination of both presentation and content representations while associating the components of each. Interesting applications are enabled by such arrangements: interaction with parts of the presentation to manipulate and query the corresponding content; enhanced search indexing. Although the idea of such markup is hardly new, effective techniques for creating and manipulating it are more difficult than it appears. Since the structures and tokens in the two formats often do not correspond one-to-one, decisions and heuristics must be developed to determine in which way each component refers to and is referred to by components of the other representation. Conversion between fine and coarse-grained parallel markup complicates xml identifier (ID) assignments. In this paper, we will describe the techniques developed for LaTeXML, a TeX/LaTeX to xml converter, to create cross-referenced parallel MathML. While not yet considering LaTeXML’s content MathML to be truly useful, the current effort is a step towards that continuing goal.

quotes

  • “Of course, the idea of parallel markup is hardly new. The m:semantics element has been part of the MathML specification [1] since the first version, in 1998!” (Miller, 2015, pp. -)
  • “Fine-grained parallelism is when the smallest sub-expressions are represented in multiple forms, whereas with coarse-grained parallelism the entire expression appears in several forms.” (Miller, 2015, p. 1)
  • “But what isn’t so clear is how to maintain the associations between the symbols and structures in the two trees. Indeed, there is typically no one-to-one correspondence between the elements of each format.” (Miller, 2015, p. 2)
  • “And, now that generating Content MathML is more fun, we must continue working towards generating good Content MathML. Ongoing work will attempt to establish appropriate OpenMath Content Dictionaries, probably in a FlexiFormal sense [5], improved math grammar, and exploring semantic analysis.” (Miller, 2015, p. 7)

references

  • sTeX
  • Digital Library of Mathematical Functions (DLMF)

summary

Weak paper. It was nice to see MathML examples. It was also nice to see XMath (apparently an intermediate representation used by LaTeXML). They explain one algorithm to convert XMath to MathML, but they explain neither XMath itself nor how they came up with it. And I think the paper title is a misnomer. It should be something like “Strategies to convert XMath to cMML and pMML”.

Sweeping for Leakage in Masked Circuit Layouts §

Title: “Sweeping for Leakage in Masked Circuit Layouts” by Danilo Sijacic, Josep Balasch, Ingrid Verbauwhede [url] [dblp]
Published around 2019 in DATE 2020; I read it around 2020-11
Abstract: Masking schemes are the most popular countermeasure against side-channel analysis. They theoretically decorrelate information leaked through inherent physical channels from the key-dependent intermediate values that occur during computation. Their provable security is devised under models that abstract complex physical phenomena of the underlying hardware. In this work, we investigate the impact of the physical layout to the side-channel security of masking schemes. For this we propose a model for co-simulation of the analog power distribution network with the digital logic core. Our study considers the drive of the power supply buffers, as well as parasitic resistors, inductors and capacitors. We quantify our findings using Test Vector Leakage Assessment by relative comparison to the parasitic-free model. Thus we provide a deeper insight into the potential layout sources of leakage and their magnitude.

quotes

summary

typo

TeXML: Resurrecting TeX in the XML world §

Title: “TeXML: Resurrecting TeX in the XML world” by Oleg Parashchenko [url] [dblp]
Published around 2007; I read it around 2022-11
Abstract: TeXML is an XML syntax for TeX, LaTeX and ConTeXt. This definition is extremely correct, but I dislike its formality. Instead, I prefer the following. Thanks to TeXML, you can reuse your TeX skills in the XML world. With TeXML, XML publishing becomes a case of TeX publishing. TeXML is a very simple thing. You can learn it in a minute by looking at the examples in the section ‘TeXML tour’. But knowing the syntax isn’t enough. To feel TeXML, you need to know its past and future, the ideas behind it, and understand the author’s intentions. That’s why the technical stuff is wrapped by the sections with my very subjective view on the topic of XML publishing. In the most cases, the words ‘TeX’ and ‘LaTeX’ are interchangeable, and they mean also any other TeX format. The author is from the XML world. The TeXML home page is http://getfo.org/texml/.

opinion

“Creating automata for TeXML could be a good master thesis or even a PhD work. If you know someone who might be interested in this task, don’t hesitate to mention TeXML.” (Parashchenko, 2007, p. 5)

This is pure engineering and seems pretty easy. How could this be a scientific project?

quotes

  • “What in XML looks like
     <environment> ...text... </environment>
    in LaTeX looks like this:
     \begin{environment} ...text... \end{environment}”
    (Parashchenko, 2007, pp. -)
  • “Among benefits of logical markup is the possibility of single source publishing, when the same source document can be converted to different output formats.” (Parashchenko, 2007, p. 1)
  • “On the other hand, the only correct TeX parser is TeX itself, and TeX is locked in its sandbox.” (Parashchenko, 2007, p. 1)
  • “Only a few tools implement XSL-FO in full, and all these tools are commercial, without open source alternatives (the best one is FOP, which is under development), and the W3 Consortium has started work on XSL-FO 2.0.” (Parashchenko, 2007, p. 1)
  • “And there are common errors when generating TeX code. (See bug databases for such projects as db2latex (Casellas and Devenish, 2004), dblatex (Guillon, 2006) and others.)” (Parashchenko, 2007, p. 2)
  • “The TeXML markup language is minimalistic. Most of the time, you use only three elements: cmd, env and group (the other elements are pdf, math, dmath, ctrl, spec and TeXML).” (Parashchenko, 2007, p. 2)
  • “This example demonstrates the three most often used TeXML elements:

    • cmd creates a LaTeX command,
    • env creates a LaTeX environment,
    • group creates a LaTeX group.” (Parashchenko, 2007, p. 3)
  • “Meanwhile, I also investigated how to deal with TeX and XSLT limitations. This activity resulted in the projects sTeXme (Parashchenko, 2004a) (TeX+Scheme) and XSieve (Parashchenko, 2006c) (XSLT+Scheme), one of the Google Summer of Code 2005 projects, presented at the XTech 2006 conference.” (Parashchenko, 2007, p. 4)
  • “I developed Consodoc (Parashchenko, 2006a), an XML to PDF publishing tool on top of TeXML.” (Parashchenko, 2007, p. 6)
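Based on the three core elements named in the quotes (cmd, env, group), a TeXML document can be generated from any XML toolchain. A minimal hypothetical sketch using Python's standard library; the `name` attribute and the exact serialization are my assumptions, not taken from the paper:

```python
# Sketch: building a TeXML-like fragment in Python. Element names cmd/env
# come from the paper's quotes; the "name" attribute is my assumption.
import xml.etree.ElementTree as ET

root = ET.Element("TeXML")
env = ET.SubElement(root, "env", name="document")   # -> \begin{document}...\end{document}
cmd = ET.SubElement(env, "cmd", name="textbf")      # -> \textbf{...}
cmd.text = "hello"

print(ET.tostring(root, encoding="unicode"))
# -> <TeXML><env name="document"><cmd name="textbf">hello</cmd></env></TeXML>
```

A TeXML processor would then serialize this tree to escaped TeX, which avoids the quoting bugs mentioned in the db2latex/dblatex quote above.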

summary

An XML syntax encoding common TeX markup; it generates escaped TeX output, which is then used to generate a PDF.

Templates vs. Stochastic Methods: A Performance Analys… §

Title: “Templates vs. Stochastic Methods: A Performance Analysis for Side Channel Cryptanalysis” by Benedikt Gierlichs, Kerstin Lemke-Rust, Christof Paar [url] [dblp]
Published around 2006 in CHES 2006; I read it around 2020-06
Abstract: Template Attacks and the Stochastic Model provide advanced methods for side channel cryptanalysis that make use of ‘a-priori’ knowledge gained from a profiling step. For a systematic comparison of Template Attacks and the Stochastic Model, we use two sets of measurement data that originate from two different microcontrollers and setups. Our main contribution is to capture performance aspects against crucial parameters such as the number of measurements available during profiling and classification. Moreover, optimization techniques are evaluated for both methods under consideration. Especially for a low number of measurements and noisy samples, the use of a T-Test based algorithm for the choice of relevant instants can lead to significant performance gains. As a main result, T-Test based Templates are the method of choice if a high number of samples is available for profiling. However, in case of a low number of samples for profiling, stochastic methods are an alternative and can reach superior efficiency both in terms of profiling and classification.

ambiguity

quotes

summary

Important empirical results. Parameters and assumptions are okay. Results are fundamental and significant. However, the description of the methods (templates, stochastic model) is quite poor. Looking up the references is required.

Notes:

The Aesthetics of Reading §

Title: “The Aesthetics of Reading” by Kevin Larson, Rosalind Picard [url] [dblp]
Published around 2005; I read it around 2021-08
Abstract: In this paper we demonstrate a new methodology that can be used to measure aesthetic differences by examining the cognitive effects produced by elevated mood. Specifically in this paper we examine the benefits of good typography and find that good typography induces a good mood. When participants were asked to read text with either good or poor typography in two studies, the participants who received the good typography performed better on relative subjective duration and on certain cognitive tasks.

quotes

summary

This paper tries to assess ClearType's typographic quality in a user study by evaluating performance in creative tasks. I think the assumptions are very strong (e.g. “Our hope is that RSD not only detects task difficulty, but also aesthetic differences”).

The Case of Correlatives: A Comparison between Natural… §

Title: “The Case of Correlatives: A Comparison between Natural and Planned Languages” by Federico Gobbo [url] [dblp]
Published around 2011 in Journal of Universal Language 2011; I read it around 2021-07
Abstract:

quotes    

summary

A very neat paper from the field of linguistics. Language elements are first established based on grammatical categories: Bausani (1974) and Gobbo (2008) distinguish esoteric/exoteric languages; Blanke (1985) differentiates projects from languages on a one-dimensional axis; Tesnière (1959) introduces Grammar Characters.

In the following, planned languages are discussed, with their peak of development at the transition from the 19th to the 20th century and a focus on the Standard Average European sprachbund. Here the distinction between similarity (with established languages) and regularity (formed along formal principles) is pointed out as fundamental, similar to the notion of suppletion. The paper then contributes the correlatives of various languages: {English, German, French, Latin, Volapük, Spelin, Zahlensprache, Lithuanian, Esperanto, Esperanto (1894), Ido, Novial, Neo, Interlingua, Klingon, Na'vi}. The paper appropriately discusses the relations between these languages.

The main result is given as “some natural languages are surprisingly far more regular than their planned daughters in spite of the fact that regularity was a major claim of the efforts in planning IALs during the late 19th and early 20th century”. In the end, the result is less unexpected than it sounds: as the discussed developments show, similarity is sometimes favored over regularity, thus neglecting a regular table of correlatives. Furthermore, some of the discussed auxiliary languages never had simplicity/easiness/regularity as their main goal. In essence, Esperanto, an auxiliary language, shows the most regular system.

typo

page 73, remove “took”

The European Union and the Semantic Web §

Title: “The European Union and the Semantic Web” by Neville Holmes [url] [dblp]
Published around 2008; I read it around 2021-09
Abstract:

quotes

summary

Without giving a particular reason, the author creates the E-speranto proposal, a “simplification” of Esperanto. The elements are appropriately discussed for an introductory article, but the relationship to the EU is likewise unmotivated. In the end, the author emphasizes the lexicon, which is interesting to consider as a concept separate from language.

The proposal:

Word endings: As an Indo-European language, E-speranto uses endings to express grammatical qualification of word stems. Thus, adverbs end in –e, infinitives in –i, and imperatives in –u. Synthesis starts showing in noun and adjective endings. Using a BNF-like notation, these are –[a|o](y)(n) where the required a or o signal an adjective or noun respectively, the optional y signals plurality, and the optional n signals accusativity. Verb endings use the five vowels for a kind of temporal placement. Thus a signals here and now, o signals ahead or the future, i behind or the past, u conditional or propositional, and e perpetual or definitional. The full verb endings are –as for present tense, –os for future tense, –is for past tense, –us for conditional, and –es for perpetual. These verb endings can also be used as morphemes, so that osa is the adjectival future and la aso means the present. The vowels are used as placers in other synthetic syllables that can be used as suffixes or ordinary morphemes. The formula [vowel](n)t yields 10 morphemes that qualify actions. With the n the action is active, without it passive, so amanta and amata are adjectives meaning loving and loved, and ante and ate are adverbs meaning actively and passively, all these set in the present. This construction can be simply extended to specify horizontal spatial placement by [vowel](m)p and vertical by [vowel](n)k, where the m and n specify placement relative to the sentence’s subject. Also, u means left and e means right. Thus la domompo means the house ahead while la domopo means the front of the house. Such compounds can take some getting used to, but they are regular and powerful.
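The "10 morphemes" claim can be enumerated directly. A small Python sketch; the vowel list comes from the article, and my reading of the bracket notation (optional n, required t) is the only assumption:

```python
# Enumerate the action-qualifying morphemes from the formula [vowel](n)t:
# five vowels, an optional n (active voice), then a required t.
vowels = ["a", "o", "i", "u", "e"]   # present, future, past, conditional, perpetual
morphemes = [v + n + "t" for v in vowels for n in ("n", "")]

assert len(morphemes) == 10          # "yields 10 morphemes"
assert "ant" in morphemes            # am + ant + a = amanta, "loving"
assert "at" in morphemes             # am + at + a  = amata,  "loved"
print(morphemes)
```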

Structural words: Synthesis at the phonemic level is even more expressive in its use when building the pronouns and correlatives essential to expressiveness. In Esperanto these center on the vowel i, with affixed phonemes to give the particular class of meaning. The singular pronouns are mi, ci, li, xi, and ji for I, thou, he, she, and it, and si for reflexion. Here I use E-speranto spellings where c is pronounced like the ts in tsunami, and x as sh. There are also two plural pronouns: ni and vi for we and you. There are also two prefixes that can be used: i– to widen the scope and o– to abstract it. Thus ili means he and his associates or they, and omi means someone like me or one. The pronouns can also take grammatical suffixes such as –n for the accusative (min means me) and –a for the adjectival (mia means my). The correlatives are two-dimensional, with one range of meanings given by a suffix, and an independent range by a prefix. The generic correlatives have no prefix. Having a simple vowel suffix, ia, ie, io, and iu mean respectively some kind of, somewhere, something, and somebody. Having a vowel+consonant suffix, ial, iam, iel, ies, iol, and iom mean respectively for some reason, at some time, in some way, someone’s, in some number, and in some amount. The specific correlatives apply a variety of prefixes to the generic correlatives. The prefixes are k–, t–, nen–, and q– (pronounced ch–), and they give selective, indicative, negative, and inclusive meanings. For example, kiam, tiam, neniam, and qiam mean when, then, never, and always, respectively. This description shows how phonemic synthesis can yield dramatic richness simply. Further, the correlatives can also take the i– and o– scoping prefixes, and both the pronouns and correlatives can be grammatically suffixed.
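The two-dimensional correlative system described above is just a cross product of prefixes and generic correlatives. A sketch; the meaning labels are reconstructed from the article's examples, and the table is deliberately partial:

```python
# Specific correlatives = prefix x generic correlative, as in the article.
prefixes = {"k": "selective", "t": "indicative", "nen": "negative", "q": "inclusive"}
generic = {"iam": "at some time", "ie": "somewhere", "io": "something", "iu": "somebody"}

correlatives = {p + g: (pm, gm) for p, pm in prefixes.items() for g, gm in generic.items()}

# The article's example row: kiam, tiam, neniam, qiam = when, then, never, always.
assert set(correlatives) >= {"kiam", "tiam", "neniam", "qiam"}
print(sorted(correlatives))
```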

The Honey Badger of BFT Protocols §

Title: “The Honey Badger of BFT Protocols” by Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, Dawn Song [url] [dblp]
Published around 2016 in CCS 2016; I read it around 2024-09
Abstract: The surprising success of cryptocurrencies has led to a surge of interest in deploying large scale, highly robust, Byzantine fault tolerant (BFT) protocols for mission-critical applications, such as financial transactions. Although the conventional wisdom is to build atop a (weakly) synchronous protocol such as PBFT (or a variation thereof), such protocols rely critically on network timing assumptions, and only guarantee liveness when the network behaves as expected. We argue these protocols are ill-suited for this deployment scenario.

notes

backlog := latest transactions not yet committed

byzantine system := system where nodes might drop messages or might forge messages

fairness property = censorship resilience := If transaction tx is input to N-f correct nodes then it is eventually output by every node

IND-CCA := indistinguishable ciphertext under (CCA1 → non-adaptive) (CCA2 → adaptive) chosen ciphertext attack

IND-CPA := indistinguishable ciphertext under chosen plaintext attack

minting := generating blocks w.r.t. proof of stake

PBFT := Practical Byzantine Fault Tolerance

proof of stake := set of validators (subset of nodes) agree consensus on next block; weight of each validator’s vote depends on the size of its deposit (=stake)

slashing := punishment if you break network rules (in particular if you are a validator/special node) (e.g. reduce or completely remove all staked tokens)

staking := a personal token is used as a wager and is at risk if your node misbehaves (via the Polkadot protocol)

Sybil attack := one person creates many pseudonyms to gain disproportional influence

threshold cryptography := more than t of n parties need to cooperate to decipher a ciphertext (t <= n)
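The "t of n" idea behind the threshold cryptography definition above can be illustrated with a minimal Shamir secret sharing sketch. This is a generic illustration only, not the Baek-Zheng threshold encryption or Boldyreva threshold signatures that HoneyBadgerBFT actually uses:

```python
# Minimal Shamir secret sharing over a prime field: any t of n shares
# reconstruct the secret; fewer reveal nothing about it.
import random

P = 2**61 - 1  # a Mersenne prime, used as the field modulus

def split(secret, n, t):
    """Create n shares of `secret`; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the shared polynomial at x = 0."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, t=3)
assert reconstruct(shares[:3]) == 123456789   # any 3 shares suffice
assert reconstruct(shares[2:]) == 123456789
```

The same t-of-n structure underlies the common coin and the censorship-resilient transaction encryption in the paper.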

quotes

  • “Second, even when the weak synchrony assumptions are satisfied in practice, weakly synchronous protocols degrade significantly in throughput when the underlying network is unpredictable.” (Miller et al., 2016, p. 2)
  • “We propose HoneyBadgerBFT, the first BFT atomic broadcast protocol to provide optimal asymptotic efficiency in the asynchronous setting. We therefore directly refute the prevailing wisdom that such protocols are necessarily impractical.” (Miller et al., 2016, p. 2)
  • “For this reason, many of Bitcoin’s enthusiastic supporters refer to it as the “Honey Badger of Money” [41].” (Miller et al., 2016, p. 1)
  • “For example, the Visa processes 2,000 tx/sec on average, with a peak of 59,000 tx/sec [1].” (Miller et al., 2016, p. 1)
  • “This inefficiency has two root causes. The first cause is redundant work among the parties.” (Miller et al., 2016, p. 2) “The second cause is the use of a suboptimal instantiation of the Asynchronous Common Subset (ACS) subcomponent.” (Miller et al., 2016, p. 2)
  • “By contrast, decentralized cryptocurrencies such as Bitcoin and Ethereum opt for a “permissionless” blockchain, where enrollment is open to anyone, and nodes may join and leave dynamically and frequently. To achieve security in this setting, known consensus protocols rely on proofs-of-work to defeat Sybil attacks, and pay an enormous price in terms of throughput and latency, e.g., Bitcoin commits transactions every 10 min, and its throughput limited by 7 tx/sec even when the current block size is maximized.” (Miller et al., 2016, p. 2)
  • “Traditionally, such a primitive is called total order or atomic broadcast [23]; in Bitcoin parlance, we would call it a blockchain.” (Miller et al., 2016, p. 2)
  • “The key invention we contribute is a novel reduction from ABC to ACS that provides better efficiency (by an O(N) factor) through batching, while using threshold encryption to preserve censorship resilience (see Section 4.4).” (Miller et al., 2016, p. 3)
  • “A convenient (but very strong) network assumption is synchrony: a ∆-synchronous network guarantees that every message sent is delivered after at most a delay of ∆ (where ∆ is a measure of real time).” (Miller et al., 2016, p. 3)
  • “Weaker timing assumptions come in several forms. In the unknown-∆ model, the protocol is unable to use the delay bound as a parameter. Alternatively, in the eventually synchronous model, the message delay bound ∆ is only guaranteed to hold after some (unknown) instant, called the “Global Stabilization Time.” Collectively, these two models are referred to as partial synchrony [26]. Yet another variation is weak synchrony [26], in which the delay bound is time varying, but eventually does not grow faster than a polynomial function of time [20].” (Miller et al., 2016, p. 3)
  • “At any given time, the designated leader is responsible for proposing the next batch of transactions. If progress isn’t made, either because the leader is faulty or because the network has stalled, then the nodes attempt to elect a new leader.” (Miller et al., 2016, p. 4)
  • “Our model particularly matches the deployment scenario of a “permissioned blockchain” where transactions can be submitted by arbitrary clients, but the nodes responsible for carrying out the protocol are fixed.” (Miller et al., 2016, p. 4)
  • “We assume each pair of nodes is connected by a reliable authenticated point-to-point channel that does not drop messages.” (Miller et al., 2016, p. 4)
  • “The adversary is given complete control of up to f faulty nodes, where f is a protocol parameter. Note that 3f + 1 ≤ N (which our protocol achieves) is the lower bound for broadcast protocols in this setting.” (Miller et al., 2016, p. 5)
  • “A finite transaction delay implies censorship resilience.” (Miller et al., 2016, p. 5)
  • “The protocol proceeds in epochs, where after each epoch, a new batch of transactions is appended to the committed log. At the beginning of each epoch, nodes choose a subset of the transactions in their buffer (by a policy we will define shortly), and provide them as input to an instance of a randomized agreement protocol. At the end of the agreement protocol, the final set of transactions for this epoch is chosen.” (Miller et al., 2016, p. 5)
  • “Therefore instead of simply choosing the first element(s) from its buffer (as in CKPS01 [15]), each node in our protocol proposes a randomly chosen sample, such that each transaction is, on average, proposed by only one node.” (Miller et al., 2016, p. 5)
  • “For deterministic erasure coding, we use the zfec library [52], which implements Reed-Solomon codes. For instantiating the common coin primitive, we implement Boldyreva’s pairing-based threshold signature scheme [11]. For threshold encryption of transactions, we use Baek and Zheng’s scheme [7] to encrypt a 256-bit ephemeral key, followed by AES-256 in CBC mode over the actual payload.” (Miller et al., 2016, p. 8)
  • “The total communication cost (per node) is estimated as: m_all = r(B m_T + N m_E) + N²((1 + log N) m_H + m_D + 4 m_S), where m_E and m_D are respectively the size of a ciphertext and decryption share in the TPKE scheme, and m_S is the size of a TSIG signature share.” (Miller et al., 2016, p. 9)
  • “Even for small networks, HoneyBadgerBFT provides significantly better robustness under adversarial conditions as noted in Section 3. In particular, PBFT would achieve zero throughput against an adversarial asynchronous scheduler, whereas HoneyBadgerBFT would complete epochs at a regular rate.” (Miller et al., 2016, p. 10)
  • “The Tor network consists of approximately 6,500 relays, which are listed in a public directory service.” (Miller et al., 2016, p. 10)
  • “We design our experiment setup such that we could run all N HoneyBadgerBFT nodes on a single desktop machine running the Tor daemon software, while being able to realistically reflect Tor relay paths.” (Miller et al., 2016, p. 10)
  • “We attain a maximum throughput of over 800 transactions per second of Tor. In general, messages transmitted over Tor’s relay network tends to have significant and highly variable latency.” (Miller et al., 2016, p. 11)
  • “The Byzantine agreement algorithm from Moustefaoui et al. [42] is shown in Figure 11” (Miller et al., 2016, p. 15)
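
The epoch structure and the random proposal policy quoted above can be sketched as follows (a minimal toy model; names are mine, and the real protocol additionally threshold-encrypts proposals so the adversary cannot censor specific transactions):

```python
import random

def propose_batch(buffer, batch_size, n_nodes, rng=random):
    # Each node samples B/N transactions from the first B entries of its
    # FIFO buffer, so on average each transaction is proposed by one node.
    pool = buffer[:batch_size]
    k = min(len(pool), max(1, batch_size // n_nodes))
    return rng.sample(pool, k)

# Toy epoch with N nodes drawing from identical buffers; the randomized
# agreement protocol (omitted here) decides which proposals are included.
N, B = 4, 8
buffer = [f"tx{i}" for i in range(20)]
proposals = [propose_batch(buffer, B, N) for _ in range(N)]
epoch_batch = sorted(set(tx for p in proposals for tx in p))  # committed set
```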

summary

Very good paper proposing a novel scheme, including experiments on a prototype implementation that confirm its claims. It assumes a lot of background from the field (which is fine), but also explains some primitives in the appendix.

https://github.com/amiller/HoneyBadgerBFT/tree/master

The Implementation of Lua 5.0 §

Title: “The Implementation of Lua 5.0” by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar Celes [url] [dblp]
Published ca. 2005; read ca. 2022-12-10
Abstract: We discuss the main novelties of the implementation of Lua 5.0: its register-based virtual machine, the new algorithm for optimizing tables used as arrays, the implementation of closures, and the addition of coroutines.

quotes

  • “Currently the compiler accounts for approximately 30% of the size of the Lua core.” (Ierusalimschy et al., p. 3)
  • “the hand-written parser is smaller, more efficient, more portable, and fully reentrant.” (Ierusalimschy et al., p. 3)
  • “Lua is really lightweight: for instance, on Linux its stand-alone interpreter, complete with all standard libraries, takes less than 150 Kbytes; the core is less than 100 Kbytes.” (Ierusalimschy et al., p. 4)
  • “We also consider Lua a simple language, being syntactically similar to Pascal and semantically similar to Scheme, but this is subjective.” (Ierusalimschy et al., p. 4)
  • “Lua represents values as tagged unions, that is, as pairs (t, v), where t is an integer tag identifying the type of the value v, which is a union of C types implementing Lua types.” (Ierusalimschy et al., p. 5)
  • “One consequence of using tagged unions to represent Lua values is that copying values is a little expensive: on a 32-bit machine with 64-bit doubles, the size of a TObject is 12 bytes (or 16 bytes, if doubles are aligned on 8-byte boundaries) and so copying a value requires copying 3 (or 4) machine words.” (Ierusalimschy et al., p. 5)
  • “The hash function does not look at all bytes of the string if the string is too long.” (Ierusalimschy et al., p. 6)
  • “The hash part uses a mix of chained scatter table with Brent's variation [3].” (Ierusalimschy et al., p. 8)
  • “Most procedural languages avoid this problem by restricting lexical scoping (e.g., Python), not providing first-class functions (e.g., Pascal), or both (e.g., C). Functional languages do not have those restrictions. Research in non-pure functional languages like Scheme and ML has created a vast body of knowledge about compilation techniques for closures (e.g., [19, 1,21]).” (Ierusalimschy et al., p. 9)
  • “For instance, just the control flow analysis of Bigloo, an optimizer Scheme compiler [20], is more than ten times larger than the whole Lua implementation: The source for module Cfa of Bigloo 2.6f has 106,350 lines, versus 10,155 lines for the core of Lua 5.0. As explained in Section 2, Lua needs something simpler.” (Ierusalimschy et al., p. 9)
  • “Lua uses a structure called an upvalue to implement closures. Any outer local variable is accessed indirectly through an upvalue. The upvalue originally points to the stack slot wherein the variable lives (Figure 4,left). When the variable goes out of scope, it migrates into a slot inside the upvalue itself (Figure 4, right).” (Ierusalimschy et al., p. 9)
  • “Coroutines in Lua are stackful, in the sense that we can suspend a coroutine from inside any number of nested calls.” (Ierusalimschy et al., p. 10)
  • “As such, they allow programmers to implement several advanced control mechanisms, such as cooperative multithreading, generators, symmetric coroutines, backtracking, etc. [7].” (Ierusalimschy et al., p. 10)
  • “A key point in the implementation of coroutines in Lua is that the interpreter cannot use its internal C stack to implement calls in the interpreted code. (The Python community calls an interpreter that follows that restriction a stackless interpreter [23].)” (Ierusalimschy et al., p. 11)
  • “Because a function running in a coroutine may have been created in another coroutine, it may refer to variables in a different stack. This leads to what some authors call a cactus structure [18]. The use of flat closures, as we discussed in Section 5, avoids this problem altogether.” (Ierusalimschy et al., p. 11)
  • “Most instructions in a stack machine have implicit operands.” (Ierusalimschy et al., p. 12)
  • “There are 35 instructions in Lua's virtual machine.” (Ierusalimschy et al., p. 12)
  • “Branch instructions pose a difficulty because they need to specify two operands to be compared plus a jump offset.” (Ierusalimschy et al., p. 14)
  • “The solution adopted in Lua is that, conceptually, a test instruction simply skips the next instruction when the test fails;” (Ierusalimschy et al., p. 14)
  • “For function calls, Lua uses a kind of register window. It evaluates the call arguments in successive registers, starting with the first unused register. When it performs the call, those registers become part of the activation record of the called function, which therefore can access its parameters as regular local variables.” (Ierusalimschy et al., p. 15)
  • “Lua uses two parallel stacks for function calls.” (Ierusalimschy et al., p. 15)
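
The upvalue mechanism described on p. 9 can be illustrated with a small model (illustrative Python, not the actual C structs): an open upvalue points into the stack, and closing it migrates the value into the upvalue itself.

```python
class Upvalue:
    """Toy model of a Lua 5.0 upvalue (illustrative names)."""
    def __init__(self, stack, slot):
        self.stack, self.slot = stack, slot  # open: points at a stack slot
        self.closed_value = None

    def get(self):
        return self.closed_value if self.stack is None else self.stack[self.slot]

    def set(self, value):
        if self.stack is None:
            self.closed_value = value
        else:
            self.stack[self.slot] = value

    def close(self):
        # Variable goes out of scope: its value migrates into the upvalue.
        self.closed_value = self.stack[self.slot]
        self.stack = None

stack = [10, 20, 30]
uv = Upvalue(stack, 1)  # open upvalue for the slot holding 20
uv.set(25)              # while open, writes go through to the stack
uv.close()              # scope exit
stack[1] = 999          # the stack slot is reused; the closure is unaffected
```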

summary

Wonderful read with a beautiful overview of the Lua 5.0 interpreter runtime design. The section titles illustrate the content: {The representation of Values, Tables, Functions and Closures, Threads and Coroutines, The Virtual Machine}.

unclarity

  • “The equivalent Lua program, a={[1000000000]=1}, creates a table with a single entry.” (Ierusalimschy et al., p. 6)
    But the two-part table design presented later seems to contradict this statement: since the array part skips storing the indices, we would again require a huge array allocation.
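
For what it is worth, here is a toy model of the two-part design as I understand it (illustrative Python, not Lua's actual C code): a lone sparse key such as 1000000000 does not extend the dense array prefix, so it would land in the hash part without any large allocation.

```python
class LuaTable:
    """Toy hybrid table: dense array part for keys 1..n, dict for the rest."""
    def __init__(self):
        self.array = []  # value for integer key i is stored at array[i - 1]
        self.hash = {}

    def set(self, key, value):
        if key == len(self.array) + 1:          # extends the dense prefix
            self.array.append(value)
        elif isinstance(key, int) and 1 <= key <= len(self.array):
            self.array[key - 1] = value
        else:
            self.hash[key] = value              # sparse or non-integer key

    def get(self, key):
        if isinstance(key, int) and 1 <= key <= len(self.array):
            return self.array[key - 1]
        return self.hash.get(key)

t = LuaTable()
t.set(1_000_000_000, 1)  # a = {[1000000000]=1}: goes to the hash part
```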

The Profession as a Culture Killer §

Title: “The Profession as a Culture Killer” by Neville Holmes [url] [dblp]
Published ca. 2007; read ca. 2021-09
Abstract:

quotes

summary

Mentions a myriad of topics. His discussion of the ASCII character set is informative, but his remarks about technology and writing systems are incomplete and biased.

The Security Risk of Lacking Compiler Protection in We… §

Title: “The Security Risk of Lacking Compiler Protection in WebAssembly” by Quentin Stievenart, Coen De Roover, Mohammad Ghafari [url] [dblp]
Published ca. 2021 in QRS 2021; read ca. 2022-12-12
Abstract: WebAssembly is increasingly used as the compilation target for cross-platform applications. In this paper, we investigate whether one can rely on the security measures enforced by existing C compilers when compiling C programs to WebAssembly. We compiled 4,469 C programs with known buffer overflow vulnerabilities to x86 code and to WebAssembly, and observed the outcome of the execution of the generated code to differ for 1,088 programs. Through manual inspection, we identified that the root cause for these is the lack of security measures such as stack canaries in the generated WebAssembly: while x86 code crashes upon a stack-based buffer overflow, the corresponding WebAssembly continues to be executed. We conclude that compiling an existing C program to WebAssembly without additional precautions may hamper its security, and we encourage more research in this direction.

potential contradiction

  • “We performed our evaluation using -O1 as the optimisation level.” (Stievenart et al., p. 7)
    “For this reason, we decided to use -O2 as the level of optimization.” (Stievenart et al., p. 6)

quotes

  • “We compiled 4,469 C programs with known buffer overflow vulnerabilities to x86 code and to WebAssembly, and observed the outcome of the execution of the generated code to differ for 1,088 programs. Through manual inspection, we identified that the root cause for these is the lack of security measures such as stack canaries in the generated WebAssembly: while x86 code crashes upon a stack-based buffer overflow, the corresponding WebAssembly continues to be executed.” (Stievenart et al., p. 1)
  • “The standard has been designed with security in mind, as evidenced among others by the strict separation of application memory from the execution environment’s memory. Thanks to this separation, a compromised WebAssembly binary cannot compromise the browser that executes the binary [2], [7].” (Stievenart et al., p. 1)
  • “This contradicts the design document of the WebAssembly standard [2] which states “common mitigations such as data execution prevention (DEP) and stack smashing protection (SSP) are not needed by WebAssembly programs”” (Stievenart et al., p. 1)
  • “The execution model of WebAssembly is stack-based” (Stievenart et al., p. 2)
  • “A WebAssembly program also contains a single linear memory, i.e., a consecutive sequence of bytes that can be read from and written to by specific instructions.” (Stievenart et al., p. 2)
  • “An example function is the following:
    (func $main (type 4)
     (param i32 i32)
     (result i32)
     (local i32)
     local.get 0)”
    (Stievenart et al., p. 2) [with two params, one return value, one local variable]
  • “After the last instruction, the function execution ends and the value remaining on the stack is the return value.” (Stievenart et al., p. 2)
  • “g0 acts as the stack pointer.” (Stievenart et al., p. 2)
  • “Moreover, multiple features of WebAssembly render programs less vulnerable than their native equivalent. Unlike in x86, the return address of a function in WebAssembly is implicit and can only be accessed by the execution environment, preventing return-oriented programming attacks among others, diminishing the potential of stack smashing attacks. Also, function pointers are supported in WebAssembly through indirect calls, where the target function of the call is contained in a statically-defined table: this reduces the number of possible control flow exploits.” (Stievenart et al., p. 3)
  • “We rely on the Juliet Test Suite 1.3 for C2 of the Software Assurance Reference Dataset [3], released in October 2017 and which has been used to compare static analysis tools that detect security issues in C and Java applications [5].” (Stievenart et al., p. 3)
  • “In total, for CWE 121, we have 2785/5906 programs (47%) that can be compiled to WebAssembly, and for CWE 122, we have 1666/3656 programs (46%) that can be compiled.” (Stievenart et al., p. 3)
  • “Moreover, stack smashing protection need to be ensured by the compiler rather than the WebAssembly runtime, that has no knowledge of how the program stack is managed.” (Stievenart et al., p. 5)
  • “We now turn our attention to the impact of optimisations on the ability to prevent vulnerable code to make it into the binary. To illustrate this, we inspect one specific example that exhibits different behaviour depending on the optimisation levels, which is in essence similar to our first example.” (Stievenart et al., p. 5)
  • “For instance, one aspect which we have not encountered in the programs we inspected is that there is no use of function pointers.” (Stievenart et al., p. 7)
  • “Arteaga et al. [4] propose an approach to achieve code diversification for WebAssembly: given an input program, multiple variants of this program can be generated. Narayan et al. [14] propose Swivel, a new compiler framework for hardening WebAssembly binaries against Spectre attacks, which can compromise the isolation guarantee of WebAssembly. Stiévenart and De Roover propose a static analysis framework [16] for WebAssembly, used to build an information flow analysis [17] to detect higher-level security concerns such as leaks of sensitive information.” (Stievenart et al., p. 7)

summary

4,469 C programs (from a corpus of static-analysis test cases) are compiled to both x86 and WASM, and differences in behavior between the two platforms are observed. 1,088 programs showed different behavior. Specifically, it is simply concluded that WASM does not provide stack-smashing detection. The study is useful, but limited in scope and with an expected result.
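
The core observation can be mimicked with a toy model of linear memory (a sketch of the failure mode, not the paper's setup): an overflow that stays inside linear memory does not trap, so only an explicit, compiler-inserted canary would reveal it.

```python
# Layout inside one flat "linear memory": 8-byte buffer | 4-byte canary | data
memory = bytearray(64)
BUF, CANARY, SAVED = 0, 8, 12
memory[CANARY:CANARY + 4] = b"\xde\xad\xbe\xef"  # canary value
memory[SAVED:SAVED + 4] = b"DATA"                # neighbouring data

payload = b"A" * 12                              # 12 bytes into an 8-byte buffer
memory[BUF:BUF + len(payload)] = payload         # in-bounds, hence no trap

# A stack-smashing check would compare the canary before returning:
overflow_detected = memory[CANARY:CANARY + 4] != b"\xde\xad\xbe\xef"
```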

The UNIX Time-Sharing System §

Title: “The UNIX Time-Sharing System” by Dennis M. Ritchie, Ken Thompson, Bell Laboratories [url] [dblp]
Published ca. 1974-07 in Communications of the ACM; read ca. 2021-03
Abstract:

contradiction

Funny process

9.2 Per day (24-hour day, 7-day week basis)
There is a "background" process that runs at the lowest possible priority; it is used to soak up any idle CPU time. It has been used to produce a million-digit approximation to the constant e - 2, and is now generating composite pseudoprimes (base 2).

quotes

summary

Interesting retrospective read.

It was interesting to observe what has changed since then. Though by 1974 they were surely quite experienced with their system design, such a huge design necessarily has a lot of controversial points.
The paper explains the filesystem and the shell as core concepts.

I think the idle background process and the casual statement that “there is about one crash every other day” are funny from today's perspective.

Underspecified statement

“To provide an indication of the overall efficiency of UNIX and of the file system in particular, timings were made of the assembly of a 7621-line program. The assembly was run alone on the machine; the total clock time was 35.9 sec, for a rate of 212 lines per sec.”

… which program? What does it do?
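
Whatever the program was, the quoted rate is at least internally consistent:

```python
lines, seconds = 7621, 35.9
rate = lines / seconds  # the paper reports 212 lines per second
```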

The design of a Unicode font §

Title: “The design of a Unicode font” by C. Bigelow, K. Holmes [url] [dblp]
Published ca. 1993; read ca. 2020-10
Abstract: The international scope of computing, digital information interchange, and electronic publishing has created a need for world-wide character encoding standards. Unicode is a comprehensive standard designed to meet such a need. To be readable by humans, character codes require fonts that provide visual images — glyphs — corresponding to the codes. The design of a font developed to provide a portion of the Unicode standard is described and discussed.

ambiguity

“The inclusion of non-alphabetic symbols and non-Latin letters in these 8-bit character
sets required font developers to decide whether the assorted symbols and non-Latin letters
should be style-specific or generic.”

A definition of style-specific and generic would be nice.

“Hence, although accents need to be clearly differentiated, they do not need to be emphatic, and, indeed, overly tall or heavy accents can be more distracting than helpful to readers.”

What does emphatic mean in this context?

nicely written

“To design such a font is a way to study and appreciate, on a microcosmic scale, the manifold variety of literate culture and history.”

“A ‘roman’ typeface design (perhaps upright would be a less culture-bound term, since a Unicode font is likely to include Greek, Cyrillic, Hebrew, and other scripts)” …

question

One of the advantages of Unicode is that it includes a Generic Diacritical Marks set of ‘floating’ diacritics that can be combined with arbitrary letters. These are not case-sensitive, i.e. there is only one set of floating diacritics for both capitals and lowercase

Isn't this a property of the font file format?

quotes

summary

A neat paper. Due to my lack of understanding of the status quo in 1993, I cannot judge the approach and quality. There are some considerations I am not familiar with (e.g. why we need to distinguish only two kinds of diacritics, lowercase and uppercase), but the paper gives a broad overview of the design decisions that need to be made when designing a ‘Unicode’ font. They designed 1,700 characters, but Unicode 1.1 specifies 34,168 characters. Part of the paper discusses “One big font vs. many little fonts”.

The problem with unicode §

Title: “The problem with unicode” by N. Holmes [url] [dblp]
Published ca. 2003-06; read ca. 2021-08
Abstract:

quotes

summary

In this article, the author argues that Unicode is a blunder: too complex and unstable. First, he contrasts markup with plain text. Then he discusses the Latin writing system, suggesting an eight-bit categorization system (without actually assigning glyphs). He continues with keyboards, leading towards CJK setups. In the end, he claims that Unicode does not take full advantage of the systematic graphical features of the various writing systems.

At least from today's perspective, the author claims to solve typesetting problems without actually offering a detailed solution. The modifiers suggested in Table 1 would require further discussion by typographers. Does the font specify the positioning of diacritics, or is that part of his suggested scheme? Today, these questions are solved in OpenType, and the original bit-level encoding does not matter. His suggestion of various 8-bit systems would require a proper annotation of the encoding per text file.

In the end, I think the author has some valid points, but his statements lack depth, and time has solved these issues differently than he suggested.

typo

esthetically → aesthetically

Too Much Crypto §

Title: “Too Much Crypto” by Jean-Philippe Aumasson [url] [dblp]
Published ca. 2020-01-03 in Real-World Crypto 2020; read ca. 2020-01
Abstract: We show that many symmetric cryptography primitives would not be less safe with significantly fewer rounds. To support this claim, we review the cryptanalysis progress in the last 20 years, examine the reasons behind the current number of rounds, and analyze the risk of doing fewer rounds. Advocating a rational and scientific approach to round numbers selection, we propose revised number of rounds for AES, BLAKE2, ChaCha, and SHA-3, which offer more consistent security margins across primitives and make them much faster, without increasing the security risk.

ambiguity

“Where we examine the reasons behind the number of rounds, comment on the risk posed by quantum computers, and finally propose new primitives for a future where less energy is wasted on computing superfluous rounds.”

Is this an English sentence?

notes

Good paper; its survey is its strong suit. However, the particular choice of proposed parameters has little justification.

typo

If all the best cryptanalysts could find was a distinguisher in the chosen-ciphertext related-key model, then even less likely are practical key recovery attack in the chosen-plaintext model.

typo

But as as noted in §2, numbers such as

Toward decent text encoding §

Title: “Toward decent text encoding” by N. Holmes [url] [dblp]
Published ca. 1998; read ca. 2020-08
Abstract:

quotes

summary

This article debates requirements for a decent text encoding scheme. It criticizes Unicode and argues in favor of Multicode.

Reading this 1998 article in 2021 certainly creates some issues. First and foremost, it is not clear to me why the author equates Unicode with a 16-bit encoding (last page, left column). Unicode is called an “obese character set” and 8-bit encodings are praised. The author neglects UTF-8, invented in 1992, which avoids the “obese” property. His limited understanding of writing systems shows when he describes “the traditional Chinese writing system” as “the one exception” “which encompasses thousands of distinct characters”. This statement excludes the rest of the CJK community, which includes Kanji in Japanese text and Hanja in Hangul (admittedly reduced to minority use since the 1970s and limited to South Korea).

At the same time, Holmes praises 8-bit encoding systems like the Multicode scheme, which make computations like “convert to lowercase” difficult to implement (thus ignoring the computational burden).

It seems to me the author did not properly research his topic of interest. But I agree with the goal mentioned in the article:

“For text encoding, the world needs a standard for each writing system that suits each and every language using that system.”

Transitivaj kaj netransitivaj verboj en Esperanto §

Title: “Transitivaj kaj netransitivaj verboj en Esperanto” by Kiselman Christer [url] [dblp]
Published ca. 1995 in La Stato kaj Estonteco de la Internacia Lingvo Esperanto; read ca. 2026-04-09
Abstract:

quotes

  • “Mi starigas la sekvajn demandojn, kaj esperas kontribui al ties respondoj. Ĉu entute indas distingi la netransitivajn disde la transitivaj verboj en Esperanto? Se jes, kiel oni devas difini transitivecon? Kiuj verboj estas pli oftaj, la transitivaj aŭ la netransitivaj? Ĉu ekzistas reguloj laŭ kiuj oni povas formi transitivajn respektive netransitivajn verbojn (krom per -igi kaj -iĝi, kompreneble)? Kial Zamenhof ne reguligis la aferon? Ĉu entute oni povas vidi iujn regulaĵojn en la verboj rilate al transitiveco?” (Christer, 1995, p. 1)
  • “ju pli la ago influas la objekton, des pli tiu tendencas esti rekta.” (Christer, 1995, p. 2)
  • “oni povas diri same bone mi dankas vin; mi dankas al vi ; simile pri helpi : helpu lin; helpu al li.” (Christer, 1995, p. 2)
  • “La sufiksoj -igi kaj -iĝi havas klaran efikon rilate al la valento: la unua pliigas la valenton je unu; la dua malpliigas la valenton je unu.” (Christer, 1995, p. 2)
  • “Se mi pagigas ŝuldanton, la ŝuldanto pagas; PIV havas nur tiun sencon de pagigi. Male, pri manĝigi du sencoj estas registritaj en PIV. Se mi manĝigas kokidon, ĉu mi donas al la kokido ion por manĝi aŭ ĉu mi igas iun manĝi ĝin? Ambaŭ interpretoj ekzistas; oni esprimas ilin dirante ke manĝigi povas signifi aŭ ‘manĝantigi’ aŭ ‘manĝatigi’.” (Christer, 1995, p. 3)
  • “Cetere oni povas daŭrigi la aplikon de la sufikso: la libro brulis, ĉar iu bruligis ĝin, kaj la reĝo bruligigis ĝin.” (Christer, 1995, p. 3)
  • “Tamen ni devas akcepti ke -igi ne povas esti aplikata al ĉiu transitiva verbo, ĉar ni ne povas amasigi sufiksojn senlime.” (Christer, 1995, p. 3)
  • “Ni notu ke estas la subjekto de la origina frazo kiu estas eliminita per la apliko de la sufikso, dum la origina objekto farigˆas subjekto en la nova frazo.” (Christer, 1995, p. 3)
  • “Sed kio do okazas se la verbo estas unuvalenta? […] Pro tio kelkaj opinias ke verbo kia sidiĝi, kvankam fundamenta kaj eĉ troviĝanta en la unua libro (Zamenhof 1887: faldfolio), estas nelogika (vidu Weidmann 1989:38). Por eliri Weidmann mencias du solvojn: aŭ oni konsideras sidiĝi kiel simpligon de sidigiĝi (kio do konservas la ideon pri valentpliigo kaj valentmalpliigo), aŭ oni diras ke sidiĝi tute ne venas de la verbo sidi sed de la adjektiva sida; sidiĝi tiam estas analoga al ruĝiĝi.” (Christer, 1995, p. 3)
  • “Lock (1989) komentis ke la vera malfacilaĵo ne temas pri transitiveco sed pri lernado de la signifo de la verboj.” (Christer, 1995, p. 5)
  • “Ŝajnas ke Zamenhof antaŭ la publikigo de Esperanto provis pli skemisman aliron al la vortfarado, precipe pri kelkaj verboj, sed ke li poste forlasis tiujn solvojn.” (Christer, 1995, p. 5)
  • “reciprokaj rilatoj inter la vortoj […] doni, preni ; vendi, aĉeti ; instrui, lerni” (Christer, 1995, p. 6)
  • “mi listigis ĉiujn verbojn kiuj estiĝas de radikoj en la Fundamento de Esperanto kaj estas registritaj en la Plena Ilustrita Vortaro (PIV). Tio signifas ke la listo enhavas ĉiujn primitivajn fundamentajn verbojn kaj ĉiujn derivitajn kaj kunmetitajn verbojn en PIV kies ĉefelemento estas fundamenta.” (Christer, 1995, p. 7)
  • “La klaso T estas pli ol duoble pli granda ol la klaso N.” (Christer, 1995, p. 7)
  • “En PIV estas la jena Zamenhofa ekzemplo, kiu montras la transitivecon de preterkuri : rivero, kiu preterkuras urbojn, kastelojn k vinberĝardenojn. Sed la rivero ne rajtas – laŭ PIV – preterflui la urbojn, ĉar la verbo preterflui estas ja netransitiva!” (Christer, 1995, p. 8)
  • “Ni povas noti ke la sekvantaj tre ofte uzataj verboj mankas en PIV: klari, necesi, pali, preti, proksimi, sami, varmi, same kiel ruĝi kaj ĉiuj similaj verboj derivitaj de koloradjektivoj, krom la menciita verdi. Ŝajnas do ke PIV en tiu rilato ne plu reprezentas la parolan Esperanton.” (Christer, 1995, p. 10)

typos

  • de la Internacia Lingvo (p. 1)
  • perfikso → prefikso (p. 8)

notes

  • number of involved participants = valence
  • forsarko = German Brachlegung/Verhinderung (‘prevention’; Esperanto: malebligo)
  • Did Zamenhof mention tr/ntr anywhere in the Fundamento? Not at all; he only contributed word examples
  • His questions and my answers based on the text:

    • “Ĉu entute indas distingi la netransitivajn disde la transitivaj verboj en Esperanto?”
      No; Kiselman distinguishes four categories:

      • Verbs that always require an accusative object.
      • Verbs that accept an accusative object of a general kind (but do not require one).
      • Verbs that accept an accusative object, but only of a limited kind.
      • Verbs that do not accept an accusative object.
    • “Se jes, kiel oni devas difini transitivecon?”
      Of these four categories, verbs in the first two are called “transitive” and verbs in the last two “intransitive”
    • “Kiuj verboj estas pli oftaj, la transitivaj aŭ la netransitivaj?”
      Transitive verbs occur roughly twice as often
    • “Ĉu ekzistas reguloj laŭ kiuj oni povas formi transitivajn respektive netransitivajn verbojn (krom per -igi kaj -iĝi, kompreneble)?”
      Well, affixes do have an influence… e.g. pri-, preter-, tra-, and trans-
    • “Kial Zamenhof ne reguligis la aferon?”
      Zamenhof tried a more schematic approach to word formation and later abandoned that solution
    • “Ĉu entute oni povas vidi iujn regulaĵojn en la verboj rilate al transitiveco?”
      In my view there are no rules, but Kiselman clearly discusses the effect of affixes. One tendency: the more the action affects the object, the more the verb tends to take it directly

summary

The article clearly describes the state of word formation with respect to verbs. He discusses the notion of transitivity and cites earlier opinions and results on the topic. When I first read the article, I did not sufficiently internalize the suffixes -igi and -iĝi. Two years later I debated the topic with my favorite linguist and wrote my own article. Afterwards I reread this article by Kiselman in detail. I have to admit that he did a better job than I did.

Tweaks and Keys for Block Ciphers: The TWEAKEY Framewo… §

Title: “Tweaks and Keys for Block Ciphers: The TWEAKEY Framework” by Jérémy Jean, Ivica Nikolić, Thomas Peyrin [url] [dblp]
Published ca. 2014 in ASIACRYPT 2014; read ca. 2020-06
Abstract: We propose the TWEAKEY framework with goal to unify the design of tweakable block ciphers and of block ciphers resistant to related-key attacks. Our framework is simple, extends the key-alternating construction, and allows to build a primitive with arbitrary tweak and key sizes, given the public round permutation (for instance, the AES round). Increasing the sizes renders the security analysis very difficult and thus we identify a subclass of TWEAKEY, that we name STK, which solves the size issue by the use of finite field multiplications on low hamming weight constants. We give very efficient instances of STK, in particular, a 128-bit tweak/key/state block cipher Deoxys-BC that is the first AES-based ad-hoc tweakable block cipher. At the same time, Deoxys-BC could be seen as a secure alternative to AES-256, which is known to be insecure in the related-key model. As another member of the TWEAKEY framework, we describe Kiasu-BC, which is a very simple and even more efficient tweakable variation of AES-128 when the tweak size is limited to 64 bits.

Properties of tweakable block ciphers:

quotes (on contributions)

quotes (on history)

Non-intuitive results on birthday paradox bounds

Future work

Tweakey

Performance and area

summary

I think this paper is a good piece of research. I am not an expert on symmetric cryptography and cannot judge the security analysis and possible attacks, but it seems to consider the relevant properties. Unrelated to the paper, I was not aware of beyond-birthday-attack security, which totally intrigued me. Related to the paper, follow-up work could address the question “What are sufficient conditions to achieve a secure tweakable block cipher with Tweakey?”. Well done!

typo

Underproduction: An Approach for Measuring Risk in Ope… §

Title: “Underproduction: An Approach for Measuring Risk in Open Source Software” by Kaylea Champion, Benjamin Mako Hill [url] [dblp]
Published ca. 2021-02 in SANER 2021; read ca. 2022-04
Abstract: The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call ‘underproduction’ which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis demonstrates both the utility of our approach and reveals the existence of widespread underproduction in a range of widely installed software components in Debian.

Lehman’s laws of software evolution

(via “Laws of Software Evolution Revisited” by M M Lehman)

“All relate specifically to E-type systems; that is, broadly speaking, to software systems that solve a problem or implement a computer application in the real world”:

  1. Continuing Change: An E-type program that is used must be continually adapted else it becomes progressively less satisfactory.
  2. Increasing Complexity: As a program is evolved its complexity increases unless work is done to maintain or reduce it.
  3. Self Regulation: The program evolution process is self regulating with close to normal distribution of measures of product and process attributes.
  4. Conservation of Organisational Stability: The average effective global activity rate on an evolving system is invariant over the product life time.
  5. Conservation of Familiarity: During the active life of an evolving program, the content of successive releases is statistically invariant.
  6. Continuing Growth: Functional content of a program must be continually increased to maintain user satisfaction over its lifetime.
  7. Declining Quality: E-type programs will be perceived as of declining quality unless rigorously maintained and adapted to a changing operational environment.
  8. Feedback System: E-type Programming Processes constitute Multi-loop, Multi-level Feedback systems and must be treated as such to be successfully modified or improved.

peer production

as defined by Yochai Benkler (in the 1990s):

  1. decentralized goal setting and execution
  2. a diverse range of participant motives, including non-financial ones
  3. non-exclusive approaches to property (e.g. copyleft or permissive licensing)
  4. governance through participation, notions of meritocracy, and charisma (rather than through property or contract)

quotes

  • “The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call ‘underproduction’ which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs.” (Champion and Hill, 2021, p. 1)
  • “In this paper, we describe an approach for identifying other important but poorly maintained FLOSS packages” (Champion and Hill, 2021, p. 1)
  • “In an early and influential practitioner account, Raymond argued that FLOSS would reach high quality through a process he dubbed ‘Linus’ law’ and defined as ‘given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone’ [5]. Benkler coined the term ‘peer production’ to describe the method through which many small contributions from large groups of diversely motivated individuals could be integrated together into high quality information goods like software [6].
    A growing body of research suggests reasons to be skeptical about Linus’ law [7] and the idea that simply opening the door to one’s code will attract a crowd of contributors [8, 9].”
    (Champion and Hill, 2021, p. 1)
  • “Over time, it has become clear that peer produced FLOSS projects’ reliance on volunteer labor and self-selection into tasks has introduced types of risk that traditional software engineering processes have typically not faced. Foremost among these is what we call ‘underproduction.’ We use the term underproduction to refer to the fact that although a large portion of volunteer labor is dedicated to the most widely used open source projects, there are many places where the supply of quality software and volunteer labor is far out of alignment with demand. Because underproduction may go unnoticed or unaddressed until it is too late, we argue that it represents substantial risk to the stability and security of software infrastructure.” (Champion and Hill, 2021, p. 2)
  • “How can we measure underproduction in FLOSS?” (Champion and Hill, 2021, p. 2)
  • “Our paper contributes to software engineering research in three distinct ways. First, we describe a broad conceptual framework to identify relative underproduction in peer produced FLOSS repositories: identifying software packages of lower relative quality than one would expect given their relative popularity. Second, we describe an application of this conceptual framework to a dataset of 21,902 source packages from the Debian GNU/Linux distribution using measures derived from multilevel Bayesian regression survival models. Finally, we present results from two experiments. The first experiment identifies a pool of relatively underproduced software in Debian. The second experiment seeks to validate our application of our framework for identifying underproduction by correlating underproduction with an alternate indicator of risk.” (Champion and Hill, 2021, p. 2)
  • “FLOSS began with the free software movement in the 1980s and its efforts to build the GNU operating system as a replacement for commercial UNIX operating systems [23]. Over time, free software developers discovered that their free licenses and practices of working openly supported new forms of mass collaboration and bug fixes [4].” (Champion and Hill, 2021, p. 2)
  • “‘Peer production’ is a term coined by Yochai Benkler to describe the model of organizing production discovered by FLOSS communities in the early 1990s that involved the mass aggregation of many small contributions from diversely motivated individuals. Benkler [9] defines peer production for online groups in terms of four criteria: (1) decentralized goal setting and execution, (2) a diverse range of participant motives, including non-financial ones, (3) non-exclusive approaches to property (e.g. copyleft or permissive licensing), and (4) governance through participation, notions of meritocracy, and charisma, rather than through property or contract.” (Champion and Hill, 2021, p. 2)
  • “The process of building and maintaining software is often collaborative and social, including not only code but code comments, commit messages, pull requests, and code reviews, as well as bug reporting, issue discussing, and shared problem-solving [24].” (Champion and Hill, 2021, p. 2)
  • “The team found that the laws are frequently not upheld in FLOSS, especially when effort from outside a core team is considered. This work suggests that the effort available to maintain a piece of FLOSS software may increase as it grows in popularity.” (Champion and Hill, 2021, p. 3)
  • “Prior studies have suggested that bug resolution rate is closely associated of a range of important software engineering outcomes, including codebase growth, code quality, release rate, and developer productivity [32, 33, 34]. By contrast, lack of maintenance activity as reflected in a FLOSS project’s bug tracking system can be considered a sign of failure [35].” (Champion and Hill, 2021, p. 3)
  • “In particular, we are inspired by a study of Wikipedia by Warncke-Wang et al. [36] who build off previous work by Gorbatâi [37] to formalize what Warncke-Wang calls the “perfect alignment hypothesis” (PAH). The PAH proposes that the most heavily used peer produced information goods (for Warncke-Wang et al., articles in Wikipedia) will be the highest quality, that the least used will be the lowest quality, and so on. In other words, the PAH proposes that if we rank peer production products in terms of both quality and importance—for example, in the simple conceptual diagram shown in Figure 1—the two ranked lists will be perfectly correlated.” (Champion and Hill, 2021, p. 3)
  • “Despite the central role that FLOSS plays in peer production, we know of no efforts to conceptualize or measure underproduction in software.” (Champion and Hill, 2021, p. 3)
  • “A low quality Wikipedia article on an extremely popular subject seems likely to pose much less risk to society than a bug like the Heartbleed vulnerability described earlier which could occur when FLOSS is underproduced.” (Champion and Hill, 2021, p. 3)
  • “The measure of deviation resulting from this process serves as our measure of (mis-)alignment between quality and importance (i.e., over- or underproduction).” (Champion and Hill, 2021, p. 3)
  • “With a community in operation since 1993, Debian is widely used and is the basis for other widely used distributions like Ubuntu. Debian had more than 1,400 different contributors in 2020 and contains more than 20,000 of the most important and widely used FLOSS packages.” (Champion and Hill, 2021, p. 4)
  • “A single source package may produce many binary packages. For examples, although it is an outlier, the Linux kernel source package produces up to 1,246 binary packages from its single source package (most are architecture specific subcollections of kernel modules).” (Champion and Hill, 2021, p. 4)
  • “However, software engineering researchers have noted that the quantity of bugs reported against a particular piece of FLOSS may be more related to the number of users of a package [50, 52], or the level of effort being expended on bug-finding [1] in ways that limit its ability to serve as a clear signal of software quality. In fact, Walden [1] found that OpenSSL had a lower bug count before Heartbleed than after. Walden [1] argued that measures of project activity and process improvements are a more useful sign of community recovery and software quality than bug count.” (Champion and Hill, 2021, p. 4)
  • “Time to resolution has been cited as an important measure of FLOSS quality by a series of software engineering scholars” (Champion and Hill, 2021, p. 4)
  • “A second challenge in measuring quality as time to resolution comes from the fact that the distribution of bugs across packages is highly unequal. Most of the packages we examine (14,604 of 21,902) have 10 or fewer bugs and more than one out of six (3,857 of 21,902) have only one bug reported.” (Champion and Hill, 2021, p. 5)
  • “Given this construction, Uj will be zero when a package is fully aligned, negative if it is overproduced, and positive if it is underproduced.” (Champion and Hill, 2021, p. 6)
  • “Our first experiment describes results from the application of our method described in §V and suggests that a minimum of 4,327 packages in Debian are underproduced.” (Champion and Hill, 2021, p. 6)
  • “Underproduction is a concept borrowed from economics and involves a relationship between supply and demand.” (Champion and Hill, 2021, p. 8)
  • “For example, resolution time is an imperfect and partial measure of quality.” (Champion and Hill, 2021, p. 8)
  • “Our results suggest that underproduction is extremely widespread in Debian. Our non-parametric survival analysis shown in Figure 2 suggests that Debian resolves most bugs quickly and that release-critical bugs in Debian are fixed much more quickly than non-release-critical bugs. The presence of substantial underproduction in widely-installed components of Debian exposes Debian’s users to risk.” (Champion and Hill, 2021, p. 9)
  • “One striking feature of our results is the predominance of visual and desktop-oriented components among the most underproduced packages (see Figure 5). Of the 30 most underproduced packages in Debian, 12 are directly part of the XWindows, GNOME, or KDE desktop windowing systems. For example, the “worst” ranking package, GNOME Power Manager (gnome-power-manager) tracks power usage statistics, allows configuration of power preferences, screenlocking, screensavers, and alerts users to power events such as an unplugged AC adaptor.” (Champion and Hill, 2021, p. 9)
  • “These results might simply reflect the difficulty of maintaining desktop-related packages. For example, maintaining gnomepower-manager includes complex integration work that spans from a wide range of low-level kernel features to high-level user-facing and usability issues.” (Champion and Hill, 2021, p. 9)
  • “FLOSS acts as global digital infrastructure. Failures in that infrastructure ripple through supply chains and across sectors.” (Champion and Hill, 2021, p. 9)

summary

The paper adapts the notion of underproduction from economics to software projects. Roughly, the notion captures the imbalance between the activity/attention an open-source package receives and the demand for it (underproduction occurs where supply falls short of demand). To quantify underproduction empirically, they compared the time to bug resolution against the number of installations for 21,902 Debian packages. 4,327 packages were identified as underproduced (about 20%).
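
Their U_j statistic comes from multilevel Bayesian survival models, but the rank-misalignment idea behind it (compare a package's quality rank with its importance rank; positive means underproduced, negative overproduced, zero aligned) can be sketched in a few lines. The package names and scores below are invented for illustration, not taken from the paper:

```rust
// Toy sketch of the rank-misalignment idea behind "underproduction".
// The paper derives quality from bug-resolution survival models; here
// quality and importance are just made-up scores in [0, 1].

struct Package {
    name: &'static str,
    quality: f64,    // higher = better (e.g. faster bug resolution)
    importance: f64, // higher = more widely installed
}

/// Rank positions (1 = highest score) for a slice of scores.
fn ranks(scores: &[f64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());
    let mut rank = vec![0; scores.len()];
    for (pos, &i) in idx.iter().enumerate() {
        rank[i] = pos + 1;
    }
    rank
}

/// U > 0: underproduced (more important than its quality warrants),
/// U < 0: overproduced, U = 0: perfectly aligned.
fn underproduction(pkgs: &[Package]) -> Vec<i64> {
    let quality: Vec<f64> = pkgs.iter().map(|p| p.quality).collect();
    let importance: Vec<f64> = pkgs.iter().map(|p| p.importance).collect();
    let qr = ranks(&quality);
    let ir = ranks(&importance);
    qr.iter().zip(ir.iter()).map(|(&q, &i)| q as i64 - i as i64).collect()
}

fn main() {
    let pkgs = [
        Package { name: "gnome-power-manager", quality: 0.2, importance: 0.9 },
        Package { name: "coreutils", quality: 0.9, importance: 0.95 },
        Package { name: "niche-tool", quality: 0.8, importance: 0.1 },
    ];
    for (p, u) in pkgs.iter().zip(underproduction(&pkgs).iter()) {
        println!("{:22} U = {:+}", p.name, u); // gnome-power-manager gets U = +1
    }
}
```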

All decisions are made with proper rationale, and limitations are discussed in section 7. The normalization of the data, in particular the assignment of bugs to packages through the BTS, must have taken a large effort. However, my confidence in the statistical model is rather low (for example, I am not sure a uniform model is appropriate for packages with such divergent bug-reporting behavior, as explained on page 5). A ‘control group’ of commercial software projects would be nice, but is obviously infeasible. I would like to point out that this is purely subjective and I cannot support it with a formal statement, since my empirical background is very limited.

The paper is well-written except for the screwup on page 5.

The result, that the worst underproduced applications are GUI applications, is interesting.

Understanding memory and thread safety practices and i… §

Title: “Understanding memory and thread safety practices and issues in real-world Rust programs” by Boqin Qin, Yilun Chen, Zeming Yu, Linhai Song, Yiying Zhang [url] [dblp]
Published around 2020-06 in PLDI 2020; I read it around 2021-01
Abstract: Rust is a young programming language designed for systems software development. It aims to provide safety guarantees like high-level languages and performance efficiency like low-level languages. The core design of Rust is a set of strict safety rules enforced by compile-time checking. To support more low-level controls, Rust allows programmers to bypass these compiler checks to write unsafe code.

questions

quotes

summary

This recent paper is a nice read about programming language design. One of its weaknesses is that its claims are not properly addressed. The abstract says it answers “how and why do programmers write unsafe code, what memory-safety issues real Rust programs have, and what concurrency bugs Rust programmers make”, which is a broad statement. However, the answer can only be found in the examples provided, and the discussion addresses different questions, namely: “When and why to use unsafe code? How to properly encapsulate unsafe operations? How to change unsafe code to safe code?”

Rust has many built-in features to prevent certain memory- and concurrency-related issues prevalent in unsafe programming languages like C/C++. In particular, the authors looked at Rust's unsafe{} feature, which gives programmers certain superpowers that are not available in “safe Rust”. These superpowers mean that checks are omitted and programmers need to guarantee the properties themselves.
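
A minimal illustration (my own, not an example from the paper) of such a superpower: dereferencing a raw pointer, which the compiler rejects outside an unsafe block:

```rust
fn main() {
    let x: i32 = 42;
    let p: *const i32 = &x; // creating a raw pointer is safe

    // Dereferencing it is one of the "superpowers" only allowed inside an
    // unsafe block: the compiler no longer proves validity for us, so the
    // programmer must guarantee that `p` is valid, aligned, and live here.
    let y = unsafe { *p };
    println!("{}", y); // prints 42
}
```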

The authors manually looked at bugs in popular software packages from various fields and categorized those bugs. Their goal is to answer the questions:

  • When and why to use unsafe code?
  • How to properly encapsulate unsafe operations?
  • How to change unsafe code to safe code?

It turns out that some issues are preventable through improved tooling support and a better understanding of primitives among programmers. At the same time, Rust proves that its safety features contribute to the correctness of programs.

One open question to me that is not covered: Did someone

Implementations considered:

Insights:

Suggestion:

typo

Unicode and math, a combination whose time has come — … §

Title: “Unicode and math, a combination whose time has come — Finally!” by Barbara Beeton [url] [dblp]
Published around 2000; I read it around 2023-12
Abstract: To technical publishers looking at ways to provide mathematical content in electronic form (Web pages, e-books, etc.), fonts are seen as an “f-word”. Without an adequate complement of symbols and alphabetic type styles available for direct presentation of mathematical expressions, the possibilities are limited to such workarounds as .gif and .pdf files, either of which limits the flexibility of presentation.

quotes

  • “The STIX project (Scientific and Technical Information eXchange), representing a consortium of scientific and technical publishers and scientific societies, has been trying to do something about filling this gap.” (Beeton, 2000, p. 176)
  • “Negotiations have been underway since mid-1997 (the wheels of standards organizations grind exceedingly slowly), but things are beginning to happen.” (Beeton, 2000, p. 176)
  • “Needless to say, font foundries have never been overly eager to provide an unlimited supply of new symbol shapes of arcane design and often intricate production requirements.” (Beeton, 2000, p. 176)
  • “In standardese, a term can have only one meaning. The basic ISO definition [5] is: character A member of a set of elements used for the organisation, control, or representation of data.” (Beeton, 2000, p. 177)
  • “glyph A recognizable abstract graphic symbol which is independent of any specific design.” (Beeton, 2000, p. 177)
  • “The alphanumeric soup of standardized codes has already been mentioned.” (Beeton, 2000, p. 177)
  • “ISO 646 is the “international” version of ASCII.” (Beeton, 2000, p. 177)
  • “Computer manufacturers and other commercial organizations dependent on computer technology became dissatisfied with the progress of the ISO working group responsible for standardizing codes, and, in 1988, formed the Unicode Consortium for the purpose of creating a unified international code standard on which new multinational computer technology could be based. The ISO old guard was joined or replaced by the Unicode members, and since 1991 Unicode and ISO 10646 have been parallel.” (Beeton, 2000, p. 178)
  • “In the Unicode 3.0 manual [8], only one reference can be unambiguously associated with math symbols: ISO 6862, Information and documentation — Mathematics character set for bibliographic information interchange (no explicit references are shown in the Unicode 2.0 manual).” (Beeton, 2000, p. 178)
  • “The Unicode Standard avoids duplicate encoding of characters by unifying them within scripts across languages; characters that are equivalent in form are given a single code.” (Beeton, 2000, p. 180)
  • “The first is the Hamiltonian formula well known in physics; the second is an unremarkable integral equation.” (Beeton, 2000, p. 183)
  • “These alphabets are needed for proper composition of mathematics:

    • lightface upright Latin, Greek and digits
    • boldface upright Latin, Greek and digits
    • lightface italic Latin, Greek and digits
    • boldface italic Latin, Greek and digits
    • script
    • fraktur
    • bold fraktur
    • open-face (blackboard bold) including digits
    • lightface upright sans serif Latin and digits
    • lightface italic sans serif Latin
    • boldface upright sans serif Latin, Greek, and digits
    • boldface italic sans serif Latin and Greek
    • monospace Latin and digits” (Beeton, 2000, p. 183)

summary

Exciting. The text describes early efforts, around Unicode 3.0, to include technical & math symbols in the Unicode standard. As an intro, a history of character sets and Unicode is given. Kudos to all people involved in the effort!

Very interesting: standardized variants defined using a Variation Selector (VS) (example list)
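
A tiny sketch (my own example) of how such a variation sequence is encoded: the variation selector is an ordinary, invisible code point that follows the base character. I use U+2268 (LESS-THAN BUT NOT EQUAL TO) with VARIATION SELECTOR-1 (U+FE00), which I believe is one of the standardized math variation sequences:

```rust
fn main() {
    // A variation sequence is base character + variation selector.
    // The selector (U+FE00) requests a specific glyph variant of U+2268
    // but remains a separate code point in the underlying text.
    let plain = "\u{2268}";
    let variant = "\u{2268}\u{FE00}";

    assert_eq!(plain.chars().count(), 1);
    assert_eq!(variant.chars().count(), 2); // selector is its own code point
    println!("{} vs {}", plain, variant);
}
```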

Vnodes: An Architecture for Multiple File System Types… §

Title: “Vnodes: An Architecture for Multiple File System Types in Sun UNIX” by S R Kleiman [url] [dblp]
Published around 1986 in USENIX summer 1986; I read it around 2023-07
Abstract:

quotes

  • “vfs_sync(vfsp): Write out all cached information for vfsp. Note that this is not necessarily done synchronously. When the operation returns all data has not necessarily been written out, however it has been scheduled.” (Kleiman, 1986, p. 7)
  • “The current interface has been in operation since the summer of 1984, and is a released Sun product.” (Kleiman, 1986, p. 9)
  • “Vnodes has been proven to provide a clean, well defined interface to different file system implementations.” (Kleiman, 1986, p. 9)
  • “In addition, a prototype "/proc" file system[5] has been implemented.” (Kleiman, 1986, p. 9)

summary

In this paper, the author S. R. Kleiman explains the filesystem interface developed for Sun UNIX. The basic idea is a uniform vfs and vnode interface of C structs that every filesystem implements. The interface and its proposed semantics are presented.
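
The core idea, one table of operations per filesystem type called through a filesystem-independent layer, can be sketched as follows. The original is a C struct of function pointers; this trait-based Rust sketch with invented names is only an analogy, not Kleiman's actual interface:

```rust
// Sketch of the vnode idea: each filesystem type supplies one table of
// operations; generic kernel code calls through it without knowing the
// concrete filesystem. Names and signatures here are invented.

trait VnodeOps {
    fn lookup(&self, name: &str) -> Option<u64>; // name -> inode-like id
    fn read(&self, id: u64, off: u64, len: usize) -> Vec<u8>;
}

struct MemFs; // a toy in-memory filesystem

impl VnodeOps for MemFs {
    fn lookup(&self, name: &str) -> Option<u64> {
        if name == "hello.txt" { Some(1) } else { None }
    }
    fn read(&self, _id: u64, _off: u64, len: usize) -> Vec<u8> {
        b"hello".iter().copied().take(len).collect()
    }
}

// Filesystem-independent code, analogous to the vnode layer:
fn cat(fs: &dyn VnodeOps, name: &str) -> Option<Vec<u8>> {
    let id = fs.lookup(name)?;
    Some(fs.read(id, 0, 5))
}

fn main() {
    let data = cat(&MemFs, "hello.txt").unwrap();
    println!("{}", String::from_utf8(data).unwrap()); // prints "hello"
}
```

Any other filesystem (NFS, a /proc prototype, ...) would plug in by providing its own operations table behind the same interface.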

Very nice to see a white paper on such fundamental software architecture.

I am not familiar with filesystem requirements, but the interface, examples, and some rationale are provided. I was surprised that fsync is not synchronous. The fid structure was not understandable to me either (does the unique file ID have a length of 1 byte?).

typo

page 6, “the file pointer is is changed”

What is a “document”? §

Title: “What is a “document”?” by Michael K. Buckland [url] [dblp]
Published around 1997 in JASIS 1997; I read it around 2023-10
Abstract:

quotes

  • “In the late 19th century, there was increasing concern with the rapid increase in the number of publications, especially of scientific and technical literature. Continued effectiveness in the creation, dissemination, and utilization of recorded knowledge was seen as needing new techniques for managing the growing literature.” (Buckland, 1997, p. 804)
  • “Early in the 20th century, the word ‘‘documentation’’ was increasingly adopted in Europe instead of ‘‘bibliography’’ to denote the set of techniques needed to manage this explosion of documents.” (Buckland, 1997, p. 804)
  • “Loosjes (1962, pp. 1–8) explained documentation in historical terms: Systematic access to written texts, he wrote, became more difficult after the invention of printing resulted in the proliferation of texts; scholars were increasingly obliged to delegate tasks to specialists; assembling and maintaining collections was the field of librarianship; bibliography was concerned with the descriptions of documents; the delegated task of creating access for scholars to the topical contents of documents, especially of parts within printed documents and without limitation to particular collections, was documentation.” (Buckland, 1997, p. 805)
  • “Paul Otlet (1868–1944), is known for his observation that documents could be three-dimensional, which enabled the inclusion of sculpture.” (Buckland, 1997, p. 805)
  • “Similarly, the International Institute for Intellectual Cooperation, an agency of the League of Nations, developed, in collaboration with Union Français des Organismes de Documentation, technical definitions of ‘‘document’’ and related technical terms in English, French, and German versions and adopted:
    Document: Any source of information, in material form, capable of being used for reference or study or as an authority. Examples: manuscripts, printed matter, illustrations, diagrams, museum specimens, etc.” (Buckland, 1997, p. 805)
  • “In 1951 Briet published a manifesto on the nature of documentation, Qu'est-ce que la documentation, which starts with the assertion that "A document is evidence in support of a fact." ("Un document est une preuve à l'appui d'un fait" (Briet, 1951, 7). She then elaborates: A document is "any physical or symbolic sign, preserved or recorded, intended to represent, to reconstruct, or to demonstrate a physical or conceptual phenomenon". ("Tout indice concret ou symbolique, conservé ou enregistré, aux fins de représenter, de reconstituer ou de prouver un phénomène ou physique ou intellectuel." p. 7.) The implication is that documentation should not be viewed as being concerned with texts but with access to evidence.” (Buckland, 1997, p. 806)
  • “We infer, however, from her discussion that:

    1. There is materiality: Physical objects and physical signs only;
    2. There is intentionality: It is intended that the object be treated as evidence;
    3. The objects have to be processed: They have to be made into documents; and, we think,
    4. There is a phenomenological position: The object is perceived to be a document.” (Buckland, 1997, p. 806)
  • “indexicality–the quality of having been placed in an organized, meaningful relationship with other evidence–” (Buckland, 1997, p. 806)
  • “A document is the repository of an expressed thought.” (Buckland, 1997, p. 806)
  • “Ranganathan's view of "document" as a synonym for "embodied micro thought" on paper "or other material, fit for physical handling, transport across space, and preservation through time" was adopted by the Indian Standards Institution (1963, 24), with a note explaining that the term "document" "is now extended in use to include any embodied thought, micro or macro and whether the physical embodiment is exclusive to one work or is shared by more than one work." (Buckland, 1997, p. 807)

summary

The paper revisits various notions of “document” in academic literature. Is an audio recording a document? Is a specimen in a museum a document? Is an expressed thought a document? The paper illustrates that over time, the definition shifted from its physical nature to a broader contextualized meaning. The paper does not provide an answer, but a summary.

When a Patch is Not Enough - HardFails: Software-Explo… §

Title: “When a Patch is Not Enough - HardFails: Software-Exploitable Hardware Bugs” by Ghada Dessouky, David Gens, Patrick Haney, Garrett Persyn, Arun Kanuparthi, Hareesh Khattri, Jason M. Fung, Ahmad-Reza Sadeghi, Jeyavijayan Rajendran [url] [dblp]
Published around 2018-12-01; I read it around 2020-07
Abstract: Modern computer systems are becoming faster, more efficient, and increasingly interconnected with each generation. Consequently, these platforms also grow more complex, with continuously new features introducing the possibility of new bugs. Hence, the semiconductor industry employs a combination of different verification techniques to ensure the security of System-on-Chip (SoC) designs during the development life cycle. However, a growing number of increasingly sophisticated attacks are starting to leverage cross-layer bugs by exploiting subtle interactions between hardware and software, as recently demonstrated through a series of real-world exploits with significant security impact that affected all major hardware vendors.

HCF

“A behavior humorously hinted at in IBM System/360 machines in the form of a Halt-and-Catch-Fire (HCF) instruction.”

→ See also Christopher Domas' research on x86 since 2017
→ Pentium F00F (C7C8) bug

quotes

summary

A technical report discussing how bugs are introduced in the hardware design process and slip through testing tools. Specifically, they define HardFails as RTL bugs that are difficult to detect and can be triggered from software, potentially compromising the entire platform. Their classification into 4 HardFails properties {Cross-modular effects, Timing-flow gap, Cache-state gap, Hardware/Firmware-interactions} is non-exhaustive (as pointed out in the conclusion). In my opinion, it is too verbose when discussing the hardware development process and laying out the advantages/disadvantages of various tools. I think it could have been more concise (e.g. in “Proof assistant and theorem-proving” the scalability issue is mentioned twice).
Besides that, I think it gives a nice overview of the issues hardware design has to deal with, and yes, we need better tool support. But there is (obviously) no solution to the mentioned problems in the paper.
Catherine pointed out that CLKScrew in Table 2 does not need Firmware interaction and Foreshadow has nothing to do with Firmware interaction.

You Really Shouldn't Roll Your Own Crypto: An Empirica… §

Title: “You Really Shouldn't Roll Your Own Crypto: An Empirical Study of Vulnerabilities in Cryptographic Libraries” by Jenny Blessing, Michael A. Specter, Daniel J. Weitzner [url] [dblp]
Published around 2021-07; I read it around 2021-10
Abstract: The security of the Internet rests on a small number of open-source cryptographic libraries: a vulnerability in any one of them threatens to compromise a significant percentage of web traffic. Despite this potential for security impact, the characteristics and causes of vulnerabilities in cryptographic software are not well understood. In this work, we conduct the first comprehensive analysis of cryptographic libraries and the vulnerabilities affecting them. We collect data from the National Vulnerability Database, individual project repositories and mailing lists, and other relevant sources for eight widely used cryptographic libraries.

quotes

summary

Well-designed methodology. All decisions made are well-documented and put into context. I am still a little bit skeptical about the statement “You really shouldn't roll your own crypto”. The data shows that cryptographic bugs occur and that cryptographic libraries should prevent errors at the API level. However, it does not really show results regarding “own crypto”: it provides no examples of custom-designed cryptographic libraries or their effects in industry.

Prior work:

π is the Minimum Value for Pi §

Title: “π is the Minimum Value for Pi” by C. L. Adler, James Tanton [url] [dblp]
Published around 2000; I read it around 2021-10
Abstract:

summary

If we consider Pi as the ratio of a circle's circumference to its diameter, the value depends on the chosen metric. Varying this metric yields values ranging from 4 down to π and back up to 4. Thus π is the minimum value of Pi. An analytic proof follows. Nice little result. Also attached is a proof that a² + b² ≥ 2ab: if the bright grey area equals a·b and the dark grey area equals a·b as well, then together they make up an area of 2·a·b. If the dark grey area outside the large square is a² and the large square is b², we can observe that a² + b² ≥ 2ab because of the white square at the top.
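
As I understand the result, the statement can be written compactly: let $\pi_p$ be half the circumference of the unit circle of the $\ell_p$ metric, measured in that same metric; then the minimum over $p$ is attained at the Euclidean case. The attached picture-proof of the inequality also has a one-line algebraic counterpart:

```latex
\min_{1 \le p \le \infty} \pi_p \;=\; \pi_2 \;=\; \pi,
\qquad \pi_1 = \pi_\infty = 4 .

% The picture-proof of the attached inequality, algebraically:
a^2 + b^2 - 2ab \;=\; (a - b)^2 \;\ge\; 0
\quad\Longrightarrow\quad
a^2 + b^2 \;\ge\; 2ab .
```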