Description

Welcome to my personal blog!

Personalized Notes

Here is some of the additional notes of my own interpretation of the materials.

Fourier Transform

Brief

This is a personal note on the Fourier series, the Fourier transform and the Laplace transform. The notes are written in LaTeX with the intention to learn to use LaTeX. The document's formation began on August 19, 2022.

Compiled Notes

Source

Statistics

Combination, Permutation and Ordering

Multinomial Coefficient

In context of combinations, the number of ways to choose $r$ objects from a total of $n$ is given by $(r n) = \frac{n !}{r ! ( n - r )!}$ . This is because the act of choosing is equvilant to categorizing or labeling the objects into 2 distinct categories, the chosen ones or the ones that is not chosen. There are $n!$ ways to order $n$ objects. However, the order of both the chosen group and the non chosen group does not matter, and there are $r!$ ways to order the chosen group, $(n - r)!$ ways of ordering the non chosen group; hence, the number of ways to categorize the objects is $(r n) = \frac{n !}{r ! ( n - r )!}$ .

Extending this concept, to partitioning the objects into $m$ categories, then if we need $k_{i}$ indistinguishable objects in the $i \in {1, 2, \dots, m}$ th category, then the number of ways to categorize is given by the multinomial coefficient $(k _{1} , k _{2} , \dots , k _{m} n) = \frac{n !}{k _{1} ! k _{2} ! \dots k _{m} !}$ .

Note: $k_{1} + k_{2} + \dots + k_{m} = n$ must be true.

Example

The number of distinct words that can be produced from the letters STATISTICS is $(3 , 3 , 1 , 2 , 1 10)$ , because the count of each letter is given by {S: 3, T: 3, A: 1, I: 2, C: 1}.

Permutation and Ordering

When the ordering is important, it is equivalent to categorizing $n$ objects into $r + 1$ distinct categories, the category for the non-chosen objects and the categories with labels first, second, third, etc... Each category except for the non-chosen category contains exactly 1 object. Therefore, the number of ways to choose $r$ objects from $n$ is $n^{(r)} = (n - r , 1 , 1 , \dots , 1 n) = \frac{n !}{( n - r )!}$ .

In other words, the act of ordering is fundamentally the same as giving unique labels.

Partition into Multiple Categories

The number of ways to partition $n$ indistinguishable objects into $k$ categories is given by
$(n n + k - 1) = (k - 1 n + k - 1) = (n , k - 1 n + k - 1)$ . To partition the objects into $k$ categories, we need $k - 1$ sticks to seperate them. Hence, it requires $n + k - 1$ spaces in total to place all objects and sticks. Out of those spots, $k - 1$ of them would be for sticks and $n$ of them for objects.

To ensure that each category has at least $c$ object, the sticks have to be seperated by at least $c$ object (e.g. for $c = 2$ , $OO ∣ OO ∣ OO ∣ \dots ∣ OO ∣ OO$ ). Then, we have to categorized (i.e. insert) the remaining $n - c k$ objects, and by the result above, there are $(k - 1 n + k ( 1 - c ) - 1)$ ways of doing so.

Note: Each category is distinguishable, so $(1, 2, 3)$ is different from $(2, 1, 3)$ .

The difference with the multinomial coefficient scenario is that this partition does not restrict the number of objects in each category, but both requires that every object is in one of the categories.

Examples

Number of ways to give $n$ indistinguishable coconuts to $k$ distinguishable friends: $(n n + k - 1)$
Number of possible tallies of 20 indistinguishable votes to 6 distinguishable canidates: $(6 25)$

Probability Distributions

Binomial and Multinomial Distribution

For $x$ Successes in $n$ bernoulli trials, there are $(x n)$ ways of arranging the Successes and Failures. Therefore, the probability of $x$ successes is given by $(x n) p^{x} (1 - p)^{n - x}$ where $p$ is the probability of success in each bernoulli trial.

For experiments that have more than 2 possible outcomes in each trial, the multinomial coefficient would be useful.

Note: each trial is independent and the union of all outcomes requires to be the entire sample space.

Example

3 dices are rolled independently, find the probability of getting 2 six's and 1 five. The outcome of each trial is either 5, 6 or other, so the probability is: $(2 , 1 , 0 3) (\frac{1}{6})^{2} (\frac{1}{6})^{1} (\frac{4}{6})^{0} = \frac{1}{72}$ .

Alternatively, the only outcomes of the experiment that satisfy the conditions are $(5, 6, 6), (6, 5, 6), (6, 6, 5)$ , and we get the same result: $\frac{3}{6 ^{3}} = \frac{1}{72}$ .

Poisson Distribution

For any interval $I$ of space time, it can be uniformly partitioned into $n$ non overlapping sub-intervals ( $I_{1}, I_{2}, \dots, I_{n}$ ). Assume that each event is independent from each other. If an event occured in $I$ , then it is equally likely to occur in any of the $n$ sub-intervals. For each of the event that happened, the probability of it being in any of the sub-interval follows the discrete uniform distribution.

This is a different interpretation of the binomial distribution, this interpretation can be extended further to obtain the poisson distribution. As $n$ approaches $\infty$ , each sub-interval becomes a single moment in space time. The distribution of the probability of an occurance being at any of the moments is still uniform due to the property of independence; In fact, a continuous uniform distribution. The probability of occurances at a given moment, however, approaches 0.

If $k = λ (b - a)$ occurances are expected in the interval $I = [a, b]$ , then the probability of $x$ occurances actually takes place is $P (X = x) = lim_{n \to \infty} (x n) (\frac{k}{n})^{x} (\frac{n - k}{n})^{n - x} = \frac{λ ^{x} e ^{- λ}}{x !}$ .

Due to their similarity, the poisson distribution can be approximated with the binomial distribution when $n$ is large.

Product of Two Normal PDFs

Notation

$f_{X_{i}} (x_{i}) = ϕ (x_{i}; μ_{i}, σ_{i}^{2})$ represents the PDF with $X_{i} \sim N (μ_{i}, σ_{i}^{2})$ , and
$Φ$ represent the CDF of the standard normal distribution

Derivation

Consider $X_{1} \sim N (μ_{1}, σ_{1}^{2})$ and $X_{2} \sim N (μ_{2}, σ_{2}^{2})$ independently, and try to find $f_{X_{1}} (x) f_{X_{2}} (x)$ .

$f_{X_{1}} (x) f_{X_{2}} (x) = \frac{1}{2 π σ _{1} σ _{2}} e^{- \frac{1}{2} [\frac{( x - μ _{1} ) ^{2}}{σ _{1}^{2}} + \frac{( x - μ _{2} ) ^{2}}{σ _{2}^{2}}]}$

Consider the exponent part and by completing the squares,

$\frac{( x - μ _{1} ) ^{2}}{σ _{1}^{2}} + \frac{( x - μ _{2} ) ^{2}}{σ _{2}^{2}} = \frac{1}{σ _{1}^{2} σ _{2}^{2}} [(σ_{1}^{2} + σ_{2}^{2}) x^{2} - 2 (μ_{1} σ_{2}^{2} + μ_{2} σ_{1}^{2}) x + (μ_{1}^{2} σ_{2}^{2} + μ_{2}^{2} σ_{1}^{2})]$ $= \frac{1}{σ _{1}^{2} σ _{2}^{2}} [x σ_{1}^{2} + σ_{2}^{2} - \frac{μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}]^{2} + \frac{1}{σ _{1}^{2} σ _{2}^{2}} [μ_{1}^{2} σ_{2}^{2} + μ_{2}^{2} σ_{1}^{2} - \frac{( μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2} ) ^{2}}{σ _{1}^{2} + σ _{2}^{2}}]$ $= \frac{σ _{1}^{2} + σ _{2}^{2}}{σ _{1}^{2} σ _{2}^{2}} [x - \frac{μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}]^{2} + \frac{1}{σ _{1}^{2} σ _{2}^{2} ( σ _{1}^{2} + σ _{2}^{2} )} [μ_{1} - μ_{2}]^{2}$

So,

$f_{X_{1}} (x) f_{X_{2}} (x) = ϕ (μ_{1}; μ_{2}, σ_{1}^{2} + σ_{2}^{2}) ϕ (x; \frac{μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}, \frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}})$

, which yields the result of a scaled normal pdf with $X \sim N (\frac{μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}, \frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}})$ .

Product of Three Normal PDFs

For the case of $f_{X_{1}} (x) f_{X_{2}} (x) f_{X_{3}} (x)$ , by using the above results, we get the result of a scaled normal pdf with a mean of $\frac{\frac{μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}} σ _{3}^{2} + μ _{3} \frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}}}{\frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}} + σ _{3}^{2}} = \frac{μ _{1} σ _{2}^{2} σ _{3}^{2} + μ _{2} σ _{1}^{2} σ _{3}^{2} + μ _{3} σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} σ _{2}^{2} + σ _{1}^{2} σ _{3}^{2} + σ _{2}^{2} σ _{3}^{2}}$ , a variance of $\frac{\frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}} σ _{3}^{2}}{\frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}} + σ _{3}^{2}} = \frac{σ _{1}^{2} σ _{2}^{2} σ _{3}^{2}}{σ _{1}^{2} σ _{2}^{2} + σ _{1}^{2} σ _{3}^{2} + σ _{2}^{2} σ _{3}^{2}}$ , and a scale factor of $ϕ (μ_{1}; μ_{2}, σ_{1}^{2} + σ_{2}^{2}) ϕ (\frac{μ _{1} σ _{2}^{2} + μ _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}; μ_{3}, \frac{σ _{1}^{2} σ _{2}^{2} + σ _{1}^{2} σ _{3}^{2} + σ _{2}^{2} σ _{3}^{2}}{σ _{1}^{2} + σ _{2}^{2}})$ .

Filters

Consider altitude sensors on a rocket, their measurements often comes with some error in repsect to the real altitude. Assume that the error is normally distributed centered at 0, and the actual altitude of the rocket is uniformlly distributed $Y \sim U (a, b)$ for some values $a$ and $b$ .

Let $X_{i} \sim N (y, σ_{i})$ be independent random variables for the predicted and measured values of the current altitude. For simplicity, only consider the bivariate case with $X_{1}$ and $X_{2}$ .

The purpose of sample filters are to filter out the error in measured and predicted value to give an best estimation of the actual value, in other words, find $f_{Y ∣ X_{1}, X_{2}} (y)$ .

Bayes Filter

By Bayes Theorem, $f_{Y ∣ X_{1}, X_{2}} (y) = \frac{f _{X_{1}, X_{2} ∣ Y} ( x _{1} , x _{2} ) f _{Y} ( y )}{f _{X_{1}, X_{2}} ( x _{1} , x _{2} )} = \frac{f _{Y, X_{1}, X_{2}} ( y , x _{1} , x _{2} )}{f _{X_{1}, X_{2}} ( x _{1} , x _{2} )}$ , and $f_{Y} (y) = \frac{1}{b - a}$ .

The denominator is an integral, $f_{X_{1}, X_{2}} (x_{1}, x_{2}) = \int_{a}^{b} f_{Y, X_{1}, X_{2}} (y, x_{1}, x_{2}) d y = \int_{a}^{b} f_{X_{1}, X_{2} ∣ Y} (x_{1}, x_{2}) f_{Y} (y) d y$ . However, since $f_{Y} (y)$ is independent from $y$ , then $f_{Y ∣ X_{1}, X_{2}} (y) = \frac{f _{X_{1}, X_{2} ∣ Y} ( x _{1} , x _{2} )}{\int _{a}^{b} f _{X_{1}, X_{2} ∣ Y} ( x _{1} , x _{2} ) d y}$

Since $X_{1}$ and $X_{2}$ are independent, then $f_{X_{1}, X_{2} ∣ Y} (x_{1}, x_{2}) = f_{X_{1} ∣ Y} (x_{1}) f_{X_{2} ∣ Y} (x_{2}) = ϕ (x_{1}; y, σ_{1}^{2}) ϕ (x_{2}; y, σ_{2}^{2})$ . By the results from the Product of Two Normal PDFs, $f_{X_{1}, X_{2} ∣ Y} (x_{1}, x_{2}) = ϕ (x_{1}; x_{2}, σ_{1}^{2} + σ_{2}^{2}) ϕ (y; \frac{x _{1} σ _{2}^{2} + x _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}, \frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}})$ . Therefore, $f_{Y ∣ X_{1}, X_{2}} (y) = \frac{ϕ ( y ; μ , σ ^{2} )}{Φ ( \frac{a - μ}{σ} \leq z < \frac{b - μ}{σ} )}$ , where $μ = \frac{x _{1} σ _{2}^{2} + x _{2} σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}$ and $σ^{2} = \frac{σ _{1}^{2} σ _{2}^{2}}{σ _{1}^{2} + σ _{2}^{2}}$ .

For the special case $a \to - \infty$ and $b \to \infty$ , $f_{Y ∣ X_{1}, X_{2}} (y) = ϕ (y; μ, σ^{2})$ .

$λ$ -Calculus

All mathematical objects can be described as a lambda expression, including numbers, data structures etc... It was also proven to be equivalently capable to Turing's machine.

Syntax and Notation
Beta-Reduction
Combinators
Boolean
Data Structure
Church Numeral
- Successor
- Addition
- Predecessor
- Subtraction
- Multiplication
- Power
- Is Zero
- Ordering

Syntax and Notation

In lambda calculus, the functions are defined to take only one argument, with a signifier $λ$ indicating the start of a function definition, followed by a placeholder for the parameter with a ' $.$ ' symbol to separate the parameter with the return expression.

For examples $λ x . x$ is the identity function.

For any functions and parameters, an application of function $f$ to the parameter $x$ is denoted as $f x$ . For multiple application of function, $f ab$ in lambda calculus is equivalent to $(f (a)) (b)$ in mathematics. One could use parenthesis to change the order of function application. $f (ab)$ would be equivalent to $(f \circ a) (b)$ .

This does not provide the ability for multiple parameters, but it could be achieved with curing. For example, $λ f . λ x . f x$ . A shorthand notation would be $λ f x . f x$ , but it is not a function with two parameters, but instead a function that takes in a function $f$ , and returns another function, which takes an argument $x$ and returns $f x$ . The ordering does matter, and in this example, this lambda expression is different from $λ x f . f x$ .

$β$ -Reduction

The $β$ -reduction is a process of reducing a given lambda expression to the " $β$ -normal form" (the simplest form). Example:

$((λa . a) λb . λ c . b) (x) λ e . f$

$⟹ (λb . λ c . b) (x) λ e . f$

$⟹ (λ c . x) λ e . f$

$⟹ x$

Combinators

A combinator is a lambda expression that every variable is quantified. For examples, $λ f x . f x$ is a combinator while $λ x . f x$ is not, because $f$ is not quantified in the second example.

Identity

$I := λ x . x$ is the identity, which maps all inputs to themselves.

Mockingbird

$M := λ f . ff$ is the mockingbird combinator. It applies itself to itself. When $I$ is supplied to $M$ , $M I$ can be reduced down to $I$ . However, $MM$ would result in an infinite number of self application when reducing it.

Kestrel

$K := λab . a$ always returns the first curried argument, so $K M I$ could be reduced to $M$ . It can be used as a constant, after applying an argument to it once, the result of a second application is not dependent on the second parameter. For example, $K M$ is a function that always returns $M$ no matter the input.

Kite

$K I$ always returns the second curried argument, so $K I M I$ could be reduced to $I$ . By expanding this combinator, we obtain $(λab . a) λ x . x = λb x . x$ .

Cardinal

$C := λ f ab . f ba$ reverses the arguments of a function application. For example, $λ f . C f$ can be reduced to $λab . f ba$ .

Bluebird

$B := λ f gx . f (gx)$ is a combinator that allows composition of functions. It composes the function $f$ and $g$ .

Y Fixed-Point Combinator

This combinator is defined as $Y := λ f . (λ x . f (xx)) (λ x . f (xx))$ , and it is equivalent to $λ f . M (λ x . f (M x))$ . By reducing this combinator we obtain:

$⟹ λ f . f (M (λ x . f (M x)))$

$⟹ λ f . f (f (M (λ x . f (M x))))$

$⟹ λ f . f (f (f (M (λ x . f (M x)))))$

and so on. This combinator's behavior is similar to a non-stopping recursion. With some modifications, it is equivalent to recursion in Turing machines.

Boolean

Logical operations can be performed with lambda calculus.

True and False

However, a definition for the logical true and false is required. Recall that everything in lambda calculus are functions, so the true and false can be defined as functions that take two arguments.

True: returns the first argument (the $K$ combinator)
False: returns the second argument (the $K I$ combinator)

Negation

$N_{o t} := λ x . x (K I) K$ is the combinator for the logical not operation. When x is true, it would return $K I$ which represents false. Otherwise, it would return $K$ which represents true. Notice that the cardinal combinator is exactly identical, which means that the cardinal is the combinator for boolean negation.

Conjunction

$A_{n d} := λab . ab (K I)$ is the combinator for the logical and operation. Consider both cases of $a$ , this expression is also equivalent to $λab . aba$

Disjunction

$O_{r} := λab . a K b$ is the combinator for the logical or operation. Consider both cases of $a$ , this expression is also equivalent to $λab . aab$

Data Structure

$λ x f . f x$ Data structure are used to hold data to be processed after, so the function would be given and applied after the data is stored. It would hold the data $x$ until $f$ is given, which will then return the value of the processed data.

For example, let the data structure $D$ be $λab f . f ab$ , this structure stores a pair of values. If we store the value true and false, we obtain $X := DK (K I)$ . At this point, $X$ is a function that takes in an argument $f$ and returns $f K (K I)$ , so applying the logical or operator would give $X (λab . aab) = K$ , while supplying the logical or and operator yields $X (λab . aba) = K I$ .

Since everything in lambda calculus is expressed in functions, there is no data types, so the same combinator can be used for different types of data.

Church Numeral

Everything in lambda calculus are functions, so the natural numbers can be defined as functions that has two arguments:

a function $f$
an argument $x$ that will be supplied to $f$

In lambda calculus, a natural number $n$ are defined as a combinator that applied $f$ $n$ time on $x$ .

For examples:

$λ f x . x$
$λ f x . f x$
$λ f x . f (f x)$
$λ f x . f (f (f x))$

Successor

A successor combinator is needed to allow the creation of any natural number by induction. $S_{u cc} := λn f x . f (n f x)$ It first applies $f$ $n$ times on $x$ , then an addition application. Notice that this is equivalent to $λn f . B f (n f)$ .

Addition

Addition of numbers $a$ and $b$ can be defined as the $b^{t h}$ succession of $a$ , or equivalently, the $a^{t h}$ succession of $b$ . $A_{dd} := λab . a (S_{u cc}) b = λab . b (S_{u cc}) a$

Predecessor

In order to perform subtraction, we would need a combinator for the predecessor of the church numerals. Suppose there is a pair of values, where the second value is one greater than the first value, then the first value is the predecessor of the second.

Definitions:

$P_{ai r} := λab f . f ab$ as defined in Data Structure

$1_{s t} := λab . a$ returns the first value of a pair

$2_{n d} := λab . b$ returns the second value of a pair

$P_{zero} := P_{ai r} (λ f x . x) (λ f x . x)$ is the pair $(0, 0)$

$Φ := λ p . (P_{ai r} (p 2_{n d}) S_{u cc} (p 2_{n d}))$ constructs a successor pair of the given pair.

Notice that when $Φ$ is applied $n$ times to the pair $(0, 0)$ , the second value of the resultant pair would be $n$ , and since the first value is always the predecessor of the second, the first value would be $n - 1$ for all $n > 0$ . For $n = 0$ . Hence, the predecessor combinator for all $n > 0$ would be: $P_{re d} := λn . (n Φ P_{zero}) 1_{s t}$

This definition of predecessor implies that the predecessor of 0 is 0 itself.

Subtraction

Subtraction of the natural numbers $a$ and $b$ can be defined as the $b^{t h}$ predecessor of $a$ . $S_{u b} := λab . b (P_{re d}) a$

For $b > a$ , after $a$ or more application of $P_{re d}$ on $a$ , the return value would always be 0.

Multiplication

Multiplication of natural numbers is repeated addition. $M_{u l} := λab f . a (b f)$

Power

Exponential of natural numbers is repeated multiplication. $P_{o w} := λb e . e b$

Is Zero

$I_{s 0} := λn . n (K (K I)) (K)$ When n is zero, there is no function application. Otherwise, the function $K (K I)$ , which is the constant function for false, would be called (perhaps repeatedly).

Ordering

$L_{e q} := λab . I_{s 0} (S_{u b} ab)$ This definition uses the property discussed in subtraction.

$E_{q} := λab . A_{n d} (I_{s 0} (S_{u b} ab)) (I_{s 0} (S_{u b} ab))$ If $a \leq b$ and $b \leq a$ , then $a = b$ .

$G_{t} := λab . N_{o t} (L_{e q} ab)$ If $\neg (a \leq b)$ , then $a > b$ .

$G_{e q} := λab . O_{r} (G_{t} ab) (E_{q} ab)$ If $(a > b) \lor (a = b)$ , then $a \geq b$ .

HTTP Proxy

Introduction

This project is meant to learn and understand how the internet works. This simple HTTP proxy server forwards all client requests to the targeted server, and provide an additional layer of security for the users.

Objectives

On a HTTP request to the server, the client establishes a TCP connection with the server. However, in order to do so, both client and server have to know the TCP port number of the other side. Similarly, to establish a TCP connection, IP packets containing information of the IP address of both end will be sent between them. Both connection ends will know the IP address and TCP port of the other end.

This can be a problem for the client. The location of the device can be easily figured out using IP lookups, and the server may respond differently based on the region/area of the client. For example, the same product on the same website might offer a different price for customers in Canada in comparison with the ones in Russia. Which is against the will of customers for goods of lower prices.

By hiding the IP address and TCP port information of the client from the server, proxy servers provides an additional level of anonymity to the clients.

Request

The initial request from client would be parsed to obtain the designated address or domain name, and the proxy server would establish a TCP connection with the given host. Immediately after this, the proxy server would modify the request and send it to the service provider.

URL

In a HTTP request from any client, a URL in absolute form is provided; however, the absolute form contains the domain name of the server, which often shows up in proxies connections. However, the goal is to prevent the server's acknowledgement of the proxy, so the proxy server would replace the entire URL with the requested path and the query string.

Proxy Parameters

The client would also include proxy parameters in the header of the request, such as Proxy-Connection or Cache-Control to configure the connection between the proxy and the client. However, this information does not need to be forwarded to the server, and hence, they are removed as well.

Error

The proxy server is required to handle some errors, such as an invalid address or domain that cannot be reached, or the client is requesting resources of the proxy server itself. These errors should be handled and returned to the client, then the connection would be terminated.

Additional Information

Click link for more information about HTTP Requests.

Response

All responses from the server would be blindly forwarded back to the client.

Environment

Running on macOS, this project is implemented using the language and version of C98 with the standard C library, r. The compiler used is Apple clang version 15.0.0.

Client Connection

In the main function, a TCP listener would be created and binded to the given address. After initialization, the proxy server would accept incoming client connections using a while loop and call the handler function with the file descriptor of the TCP connection.

For all clients, their initial request would be parsed to obtain the designated host and the proxy parameters, then this information would be removed from the request. This step is done by if-else statements with the strstr function provided by the standard C library. This is not the best approach for large scale application as it would result in lengthy code and poor readability; however, this proxy server only looks for a limited amount of parameters while leaving others untouched, this approach is a much simpler method to implement.

The function would then create a TCP connection to the requested server parsed in the previous step. Immediately, the modified request would be sent to the designated server.

After the initial request, a while loop would continuously attempt to read from both ends with blocking. If a message is received from either ends, then the message would be blindly forwarded to the other end. In the scenario where any of the connection is closed or broken, the proxy would close both connection and the handler function would return.

Concurrency

In the case when a connection is maintained for a while, the thread would be blocked to wait for incoming messages of the existing connection, thus, no new clients can be accepted, and only one client can be serviced at a time.

To solve this problem, when TCP connection is accepted, a new thread would be created to run the handler function with the TCP file descriptor supplied. Each thread handles one client.

Documentation

There are no function documentations.

Code

Style

The code is not properly formatted and documented for readability. It is filled with long functions with heavily nested if-else statements, and the control flow is not clear.

The messages printed to standard inputs and error are not properly formatted. In addition, the error messages do not contain enough information and no stack trace of any sort is presented.

Few comments are presented.

Robustness

There is neither error handling nor bound checking for indexing. An array is statically allocated to store pthread_ts created, but the index used to record the number of threads created grows with bound checking. It would definitely cause an overflow when the total amount of clients exceeds the limit. There is also no maintaining functionality for the array to continuously check the status of the threads and removed them from array when the thread is joined. This made the problem worse. However, this is simple to implement, and this project is meant for learning, so there won't be a huge amount of requests.

Testing

There are neither function tests nor unit tests. All tests performed were done by executing the compiled program and sending requests with the Google Chrome.

WiFi Controlled Robot

Concept

Remote controlled rescue and reconnaissance vehicle used in situations where people are unable to help. Programmed and controlled through Arduino, and contains various sensors for collecting data. The robot is controlled over the WiFi by a computer.

Applications

Rescue and reconnaissance are two of the future applications. There are many times when wired robots cannot be used in rescues and reconnaissance, because the wires would tangle up when moving around. It is very difficult and time-consuming to untangle them remotely, which wastes time in a rescue. Time is life.

Due to the limitations of Bluetooth, rescue and reconnaissance robots have a limited communication bandwidth. This limits the amount of sensors and equipment the robots can use, such as 360° camera, microphone, earthquake sensors, infrared temperature detection camera, etc... Not to mention the accelerating development of the internet and wireless internet communication. Wireless and high speed are the advantages of a network controlled robot that Bluetooth and wires cannot have and characteristics that make network controlled robots stand out. With that said, there are way more applications of this technology in the future than just simply a rescue and reconnaissance vehicle.

Log #1 - December 15th, 2021

What was done

I downloaded and installed the libraries for the ESP8266, then I tested it with a simple LED blinking code.

Brainstorm/Reflection

The importance of device drivers are crucial. They are responsible for the communication between the operating system and the devices. It is impossible to use the full abilities of the devices without the help of the drivers. The basic drivers that UEFI loads during the DXE stage can help operating system to utilize most devices, but certainly not all, and definitely don’t push the devices to their full capabilities.

It is important to look at the specifications of the products when purchasing a cable, because some only have two wires inside for power delivery and no wires for data transmission.

Issues and Fixes

The Arduino IDE was not able to detect the USB port connecting to the ESP8266. So I searched for the driver online and installed it. It still didn’t work. I looked for clues on the internet, and an article suggests that it might be the cable that I was using was a power delivering cable, which is not able to transmit data. So I replaced that cable with another one. The Arduino IDE was able to detect the port and upload the code.

Accomplishments and Next Steps

I tested the ESP8266 with a simple LED blinking code. Next step is to connect it to the Wi-Fi.

Log #2 - December 16th, 2021

What was done

I researched about Wi-Fi connections on ESP8266, looked at documentation of the ESP8266 from the official website of the manufacturer, and also found an Arduino library (ESP8266Wi-Fi) online. I wrote some code to connect it to my own Wi-Fi. It scans all the nearby available SSIDs and their signal strength.

Brainstorm/Reflection

Using Wi-Fi in this robot is a better choice than Bluetooth and wired-controlled. This conclusion stands because of multiple reasons. Firstly, Wi-Fi can provide faster communication between the robot and the controller, which allows more control of advanced sensors and rescue equipment that requires bandwidth. Secondly, using Wi-Fi allows the controller to control multiple robots at the same time which Bluetooth does not allow. Thirdly, the ability to connect to Wi-Fi means that the robot can be connected into the internet, which allows distant controls of the robots, unlike Bluetooth which would have a significant latency to convert the Bluetooth data. Even Though wired robots also have the advantages listed above, the data transmission wires of wired robots might tangle up, which has no benefits in a rescue. On the other hand, Bluetooth’s low power consumption is one of its advantages, but the rescue robot is big enough to fit a battery inside. In addition, the ESP8266 can turn off the antenna (the different sleep modes) to save battery.

Issues and Fixes

No issues today.

Accomplishments and Next Steps

I tested the ESP8266 by scanning nearby Wi-Fi routers and connecting it to my Wi-Fi. Next step is to make a connection to my computer.

Log #3 - December 17th, 2021

What was done

I researched about creating a TCP connection using the ESP8266Wi-Fi library, and wrote some code for the arduino to establish a TCP connection. I wrote some C code on my computer to establish a TCP connection with the robot using sockets.

Brainstorm/Reflection

The reason that I chose TCP instead of UDP connection is that I need the commands and data to be transferred in order, and ensure that my commands from the controller (computer) are received by the robot. This is a great advantage in rescue, because oftentimes, the robot is not in sight of the controller, which means that the controller would have a difficult time figuring out which commands are executed or not.

If I am able (if I have time) to put a camera on the robot, I would use a UDP connection for the video data instead. The reasoning behind this is that it is more about speed and latency than the smoothness of the images when it comes to live video connections. If a frame is lost during transmission, simply ignore it. Rescue teams need the newest images instead of the old ones.

Issues and Fixes

No issues today.

Accomplishments and Next Steps

I connected the ESP8266 to my computer through a TCP connection. Next step is to control it with my computer.

Log #4 - December 31th, 2021

What was done

I configured my VS Code for Arduino. I wrote a lot of code for the communication between the controller and the robot. I tested the communication and execution of commands. Then, I had made huge amounts of changes to the code for better communication supporting commands, messages, data etc…

During the winter break, Henry had typed out the documentation for the summative assignment. In the six hundred words documentation, he briefly mentioned the history of the technology, current and future applications and technology, as well as the possible improvements.

Brainstorm/Reflection

The choice of IDE is important in a project, especially in a large project. The coder (me) had to spend most of my time with the IDE to code. An efficient IDE would save a lot of time and energy. The Arduino IDE had a very limited amount of functionalities compared to VS Code. I found it annoying scrolling through the code that I didn’t want to look at. However, with VS Code, it shows the declarations when the mouse hovers over it and jumps to the line of code I wanted to see with a single click.

Issues and Fixes

It was not able to send messages and data. To solve this problem, the controller has to send an extra byte stating the type (command, message or data) of the data before sending the actual data.

There was a problem with sending messages between the robot and the controller. When the robot or the controller receives a message, it doesn’t know the length of the message/data. To solve this problem, two extra bytes of data have to be sent before the actual data is being sent to tell the receiver the length of the message. So the receiver would not receive the next command or data that is not relevant to the data that the receiver is handling.

Accomplishments and Next Steps

The robot is able to receive commands from the controller and execute them.

Log #5 - January 10th, 2022

What was done

I added a lot of code to handle errors for the controller, including the return of status from the robot, and the error handling when setting up the controller. It is much easier to debug.

Testing the error handling functionality by trying to allocate a huge amount of memory

Brainstorm/Reflection

Error handling is a big part of this rescue robot. If an error occurs during rescue, there is not much time for engineers to find out what happened and fix the problem. The robot has to tell the controller everything about its status.

Issues and Fixes

There was a bug in the error handling code. I later found out that it was because a big bracket was missing in the code, but the indentation was correct. It took quite a while to find that bug.

Accomplishments and Next Steps

It is able to handle errors.

Log #6 - January 11th, 2022

What was done

When I tried to work on the ESP8266, one of the diodes for supplying power was killed from high voltage (9 volts). I measured it with a multimeter. The voltage across the diode was low (around 1 volt). I try to remove the diode and replace it with a piece of copper wire, but it is too small to work with. Luckily, the ESP8266 can still run with battery, and I have done most of the coding for it.

Brainstorm/Reflection

Voltage protection is one of the most important parts of modern day computers. It is there to protect circuits and electrical components from damage. I remember when I was little, computers were more fragile. There are many grounding and temperature protections. That was the only classroom with air conditioning.

Issues and Fixes

One of the diodes on the ESP12e development kit was damaged. I was not able to fix it.

Accomplishments and Next Steps

I killed one of the diodes.

Log #7 - January 12th, 2022

What was done

I found out that my router was assigned with a public IP address, which allows me to connect to it from the internet with port forwarding. I looked up the public IP address and configured port forwarding on my modem. I asked Henry to request the IP address and the specific port on his browser. The ESP8266 was able to receive the requests, even though it cannot decode nor parse the requests.

Brainstorm/Reflection

There is not enough IPv4 address for every device on Earth. Engineers came up with two solutions, one of them is IPv6, and the other is Network Address Translation (NAT). NAT is based on IPv4, not all devices are assigned with a public IP address. NAT translates the private IP address into a public IP address that is being shared with other devices. There are a lot of changes and time needed to transition to IPv6, but since NAT is based on IPv4, it takes less time and money to use NAT than IPv6.

One of the drawbacks of NAT is that it doesn’t allow a device to access another device with a private IP address though the internet. Address translation happens at the modem, which I have access to. I can set up port forwarding on my modem, so when a device tries to access the specified port of the modem, the request would be forwarded to ESP8266.

It is lucky to have NAT instead of a double NAT. In a double NAT situation, I have to set up port forwarding in routers where NAT happens, but I do not have access to all the routers. In that case, Henry would not be able to access my ESP8266.

Issues and Fixes

No issues today.

Accomplishments and Next Steps

It is able to connect to the internet.

Log #8 - January 13th, 2022

What was done

I realized that I had made a big mistake in using the C language. I should have used other programming languages instead for cross-platform compatibility. So I read the documentation of Microsoft about creating sockets on Windows, and tried to write some code.

Reading the Windows API

Brainstorm/Reflection

Compatibility and portability are important. It saves a lot of time rewriting code just to run the same functionality on another computer.

Issues and Fixes

I wrote my code on macOS, which means that the code cannot compile on the Windows platform. I tried to learn about coding Windows programs and actually write some code for it.

Accomplishments and Next Steps

Learned a bit about Windows.

Log #9 - January 14th, 2022

What was done

I wrote more code for turning the motors using the motor driver on the ESP8266 development kit and tested it. It is able to go in a straight line. I also wrote code for the ultrasonic sensor.

Brainstorm/Reflection

The reason that the robot was not able to move forward in a straight line is that the motors were spinning at different speeds, even though the PWM signals used to control the L293D chip were identical. There are many factors contributing to this issue. The major factor is the errors during manufacturing. The motors and wheels are not manufactured perfectly, resulting in this issue of the robot.

Issues and Fixes

The ultrasonic sensor kept returning weird values, later found out that it was because of a wrong value in the code.

Accomplishments and Next Steps

The robot is able to go in a straight line and detect distances with the ultrasonic sensor.

Log #10 - January 23rd, 2022

What was done

I had rewrote the code for it to run on Windows operating system, and tested it on a virtual machine. I tested the wheels of the robot, and recorded data to find out the relationship between runtimes and distances, and constructed a function for the robot to travel a specific distance with the data collected.

Plotted Data Recorded

Brainstorm/Reflection

I measured the diameter of the wheel and used that to derive a function for computing the time needed. However, because of the various errors including manufacturing errors, I still have to fine tune the code and values, which would take a lot of time.

Issues and Fixes

The code was not able to read a file, later found out that it is because of a ‘+’ that should not be there.

Accomplishments and Next Steps

The robot is able to go to a specific location.

Log #11 - January 24th, 2023

What was done

I ran the program on a computer from a different network, the robot is controlled with commands.

Brainstorm/Reflection

Issues and Fixes

Accomplishments and Next Steps

Simple testing for long range.

Log #12 - January 25th, 2022

What was done

I did some optimizing to the code to improve both readability and performance, and tested it.

Brainstorm/Reflection

Oftentimes, the use of switch statements is a better choice than if-else statements. In terms of readability, switch statements give a sense of matching, which is exactly what is needed in most scenarios. Switch statements would create a table-like structure to store the addresses of jumps. This way it can jump directly to the right address and execute code faster. It has two advantages, firstly, switch statements do not evaluate all predicates, and secondly, CPU instruction pipelines will not be disturbed multiple times due to branch prediction failures.

Issues and Fixes

The program stopped when reading the help menu file. I later found out that the default buffer size was too small, so I fixed it, and the problem disappeared.

Accomplishments and Next Steps

I did some optimizing of the code. Next step is to compile it into an executable file.

Documentation

Many of the videos and images of the project were lost and cannot be documented in the engineering log.

Firewall

Under certain networks, the connection might be identified as a threat by the firewall and be terminated the connection between the client and the robot. This problem occured when testing under the TDSB (Toronto District School Board) network.

Code

Style

The code is not properly formated and documented for readibility. It is filled with long functions with heavily nested if-else statements, and the control flow is not clear.

Robustness

The error handling only does not cover all possible exceptions.

Testing

There are neither function tests nor unit tests. All tests performed were done by executing the compiled program.

Compatibility

There were issues with cross-platform compatibility, the Win32 API is different from the API for UNIX like operating systems, and code had to be re-written for the same functionality on a different operating system.

racketx

Link to repository: https://github.com/jianxun-p/racketx

Brief

This project is a meant for learning purpose. After taking the course CS135, which I learned Racket, I was inspired by the concepts of functional programming and the essentials of Racket, so I started this project that allows me to write Racket code in Rust.

Macros

Macros and meta programming are considered one of the significant part of the Rust programming language, it avoids many repetitions of code and embeds Domain Specific Languages. I started off with using macro_rules, which provided extensive pattern matching capabilities, but it was not enough for this complicated task. As a solution, I turned towards using proc_macro. It was very powerful, yet, it consumes surprisingly more time to compile.

Example

Here is an example of using the racket macro.

extern crate racketx;
use racketx::racket;
pub fn main() {
    racket!{
        (define add_2_num (lambda (x y) (+ x y)))
        (define result (add_2_num 239 (- 9)))
    };
    assert_eq!(result, 230);
}

Features

definition of constants
lambda
integer operations (+, -, *, /)
negative integer (alternate solution: express with subtraction, e.g. -8 would be (- 8))
non-integer
conditional statements
lists
function definitions (alternate solution: use constants definition with lambda)
modular arithmetic

TLS

Introduction

This implementation of the TLS protocol is meant for learning purposes.

Stage 1

Brief

At stage 1, this project implemented the server end of TLS1.2 with C++20. However, the build up of shit code made it nearly impossible to continue after successfully establishing the first TLS connection and deciphering the very first application data. As a result, the project was re-started.

TLS version 1.2

In the inital Client Hello handshake message, the client will specifiy all supported cipher suites and public crytographic parameters, it also includes a 32 bytes of random data and a sequence of bytes for session ID. The random bytes would be used later in shared-key derivation.

The server will respond with a Server Hello message, it also contains 32 random bytes and session ID, but it would choose one cipher suite and the required public crytographic parameters. The server would also send a Certificate signed by a CA to the client. The certificate contains a public key for the clients to encrypt a pre-master secret. Only the server is capable of decrypting the pre-master secret due to the properties of asymetric encryption, and the certificate is signed with the private key of the CA, which nobody acknowledges, so by using the public key of the CA, held by the operating systems, it is sufficient to assure that the public key contained inside the certificate has not been modified by any attackers in the network. On top of that, since only the server has the private key, nobody else except the client and the server knows the pre-master secret.

The pre-master secret itself is not long enough to be directly used as session keys, and to avoid bad implementation of TLS that use the same pre-master secret repeatedly, the pre-master secret with be extended to obtain the master secret using the random bytes sent. If the extended-master secret extension was specified in the client hello and server hello messages, all handshake message would also be included to add an additional level of security.

After key negotiation, both the server and client will send a Change Cipher Spec handshake message to indicate that all messages beyond would be encrypted with the session key. At the end of the handshake, it is required to send a hash of all messages encrypted with the new session key to ensure that the handshake messages are not modified by attackers and to ensure that there are no errors in key calculation of both sides.

Features

There is limited amount of features and support, and the cipher suites supported are not the most secure; in fact, it only supports RSA key exchange and AES in GCM mode for stream cipher using SHA2 as the hashing algorithm.

Learn

wireshark
openssl

Reflection

Issues

The structure and style of code resulted in poor readability. The comprimises made during development that were supposed to be fixed have never been revisted. The extensive use of stack memory, pointers and casting for fast implementation made it difficult to follow. This bad habit resulted in extremely unsafe code for deployment; in fact, segmentation fault and stack buffer overflow occured several times. The documentation can be improved by specifying the side effects of the code. The structure of the implmentation is done well. Few functions and classes are responsible of long heavy tasks. Despite the improvement of documentation from previous projects, the style cannot keep up with the growing size of this project.

Jianxun's Blog