Debian uses LDAP for storing information about users, hosts and other
objects. The wrapping around this is called userdir-ldap, or ud-ldap
for short. It provides a mail gateway, web UI and a couple of schemas
for different object types.
Back in late 2018 and early 2019, we (DSA) removed support for ISO5218
in userdir-ldap, and removed the corresponding data. This made some
people upset, since they were using that information, as imprecise as
it was, to infer people’s pronouns. ISO 5218 has four values for sex: unknown, male, female and N/A. This might have been acceptable when
the standard was new (in 1976), but it wasn’t acceptable any longer in
2018.
A couple of days ago, I finally got around to adding support to
userdir-ldap to let people specify their pronouns. As it should be,
it’s a free-form text field. (We don’t have localised fields in LDAP,
so it probably makes sense for people to put the English version of
their pronouns there, but the software does not try to control that.)
So far, it’s only exposed through the LDAP gateway, not in the web UI.
If you’re a Debian developer, you can set your pronouns using
echo "pronouns: he/him" | gpg --clearsign | mail changes@db.debian.org
I see that four people have already done so in the time I’ve taken to
write this post.
JP was puzzled that using podman run --memory=2G … would not result in the 2G limit being visible inside the container.
While we were able to identify this as a visualization problem — tools like free(1) only look at /proc/meminfo and that is not virtualized inside a container, you'd have to look at /sys/fs/cgroup/memory.max and friends instead — I couldn't leave it at that.
And then I remembered there is actually something that can provide a virtual (cgroup-aware) /proc for containers: LXCFS!
But does it work with Podman?!
I always used it with LXC, but there is technically no reason why it wouldn't work with a different container solution — cgroups are cgroups after all.
As we all know: there is only one way to find out!
Take a fresh Debian 12 VM, install podman and verify things behave as expected:
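A minimal sketch of those steps, with debian:12 as a stand-in image and the numbers purely illustrative:

  apt install podman
  podman run -it --rm --memory=2G debian:12 bash
  # inside the container, free(1) still reports the host's memory,
  # because /proc/meminfo is not virtualized:
  free -m
  # the limit that actually applies is the cgroup one:
  cat /sys/fs/cgroup/memory.max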
And after installing (and starting) lxcfs, we can use the virtual /proc/meminfo it generates by bind-mounting it into the container (LXC does that part automatically for us):
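Roughly like this, assuming the default lxcfs path under /var/lib/lxcfs and again an illustrative image name:

  apt install lxcfs
  systemctl start lxcfs
  podman run -it --rm --memory=2G \
    -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo \
    debian:12 bash
  # /proc/meminfo inside the container is now the cgroup-aware view from lxcfs
  grep MemTotal /proc/meminfo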
The same of course works with all the other proc entries lxcfs provides (cpuinfo, diskstats, loadavg, meminfo, slabinfo, stat, swaps, and uptime here), just bind-mount them.
And yes, free(1) now works too!
bash-5.1# free -m
               total        used        free      shared  buff/cache   available
Mem:            2048           3        1976           0          67        2044
Swap:              0           0           0
Just don't blindly mount the whole /var/lib/lxcfs/proc over the container's /proc.
It did work (as in: "bash and free didn't crash") for me, but with /proc/$PID etc missing, I bet things will go south pretty quickly.
The late Pope Francis asked a group of approximately four hundred
bishops to work together from 2021 to 2024 on a review of how people of
Catholic faith interact and advance as a movement. In formal
terms, this committee of bishops was given the title
Synod on Synodality. The term Synod is used widely
in all Christian religions to refer to committees, boards or meetings of
those groups at any level of the church hierarchy. The term
Synodality is specific to the Catholic Church. The Synod has
an official web page where they
attempt to explain Synodality.
Various working groups were created on a wide range of topics. In this
review, I am only looking at working group three, which examined the topic of the mission in the digital environment. I then go on to
provide some of my own evidence about the topics the working group
is considering.
Even
amateur radio packet repeaters are in scope although
amateur radio licensing doesn't allow the explicit transmission of
religious material.
The Vatican was an early adopter of shortwave radio. Pope Leo XIV and
Monsignor Lucio Adrian Ruiz, secretary of the Dicastero per la Comunicazione, visited Vatican Radio's broadcasting facility this week:
Reading the outputs from both the working group and the overall
Synod, I feel that the church as a whole did not decide to either
embrace or reject
social control media. The church acknowledges that it is part of the digital landscape and is trying to decide how
the church relates to it.
How the Synod process evolved at a high level
Before delving into the details, here is an overview of
the process and the reports that came out at different times,
with direct links to the translated editions.
The main web site for the Synod is at
www.Synod.va and it is available
in various languages. It appears that the content was created in
Italian and translated to English and other languages. This makes
it a little bit more difficult to read.
There was an extended gathering in Rome in October 2023 where
an initial draft report was produced.
Key points from the final report as it relates to the digital environment
At point 58, the report notes that Christians may be attempting to
proclaim the Gospel through their participation in a digital environment.
58. ... Christians, each according to their diverse roles - within the family and
other states of life; in the workplace and in their professions; engaged civilly, politically,
socially or ecologically; in the development of a culture inspired by the Gospel, including the
evangelisation of the digital environment - walk the paths of the world and proclaim the Gospel
where they live, sustained by the gifts of the Spirit.
59. In doing so, they ask the Church not to abandon them but rather to enable them to feel
that they are sent and sustained in mission.
This point appears to encourage the church to contemplate the situation
faced by those under the influence of a digital environment but it does not
necessarily imply the digital environment is good or bad.
At point 112, concerning mobility, which includes people from all levels
of society, the report notes:
Some maintain strong bonds with their country of origin, especially with
the help of digital media, and thus can find it difficult to form connections
in their new country; others find themselves living without roots.
This is an excellent observation. In Europe, I've met couples who
have relationships entirely dependent upon devices they use for
automated machine translation. When new people arrive in town, the
WhatsApp culture encourages neighbors to spend weeks or months talking
behind their backs without ever looking them in the eye.
113. The spread of digital culture, particularly evident among young people, is profoundly
changing their experience of space and time; it influences their daily activities, communication
and interpersonal relationships, including faith. The opportunities that the internet provides are
reshaping relationships, bonds and boundaries. Nowadays, we often experience loneliness and
marginalisation, even though we are more connected than ever. Moreover, those with their own
economic and political interests can exploit
social media to spread ideologies and generate
aggressive and manipulative forms of polarisation. We are not well prepared for this and ought
to dedicate resources to ensure that the digital environment becomes a prophetic space for
mission and proclamation. Local Churches should encourage, sustain and accompany those
who are engaged in mission in the digital environment. Christian digital communities and
groups, particularly young people, are also called to reflect on how they create bonds of
belonging, promoting encounter and dialogue. They need to offer formation among their peers,
developing a synodal way of being Church. The internet, constituted as a web of connections,
offers new opportunities to better live the synodal dimension of the Church.
This paragraph acknowledges the dangers of digital technology, especially
social control media and the key words are
"We are not well prepared for this". Yet it suggests that local churches
should "encourage" more of these online risks. I don't feel the word
"encourage" is the right word to use but I don't think they should
discourage either.
149. The synodal process has insistently drawn attention to some specific areas of
formation of the People of God for synodality. The first of these concerns the impact of the
digital environment on learning processes, concentration, the perception of self and the world,
and the building of interpersonal relationships. Digital culture constitutes a crucial dimension
of the Church’s witness in contemporary culture and an emerging missionary field. This
requires ensuring that the Christian message is present online in reliable ways that do not
ideologically distort its content. Although digital media has great potential to improve our lives,
it can also cause harm and injury through bullying, misinformation, sexual exploitation and
addiction. Church educational institutions must help children and adults develop critical skills
to safely navigate the web.
These comments are very relevant and very consistent with my own
testimony, some of which is reproduced later in this report.
150. Another area of great importance is the promotion in all ecclesial contexts of a
culture of safeguarding, making communities ever safer places for minors and vulnerable
persons.
When I raised this topic in the free software communities, my family
was attacked ruthlessly. See the
emails I sent at the end of 2017 and comments about IBM
Red Hat later in this
report.
Sources related to working group three, the mission in a digital environment
The Synod.va web site published a list of
all the working groups. The web site includes a brief video about
each group and a link to their most recent reports.
The video for working group three lasts a little bit less than two
minutes. Here are some of the key quotes and my own observations:
"Today, people, especially the young, have learnt to
live simultaneously and seamlessly in both digital and
physical spaces."
I feel that statement is quite wrong. People have learnt how to use
digital spaces. One recent research report suggests that
nearly seventy percent of young people feel bad after using social media.
In other words, they feel pressured into using it. Therefore, they
are not living seamlessly. People are suffering.
The statements made in the video are not the statements
presented in the final report. We will get to that. Nonetheless, whenever
social control media is mentioned, there is a tendency for
people to make these generalisations about being unable to live without
it. Every time we see a statement like this, it is important to
challenge it.
"How does the church use and approriate the digital culture?"
The rhetorical question is interesting. In reality, the Silicon
Valley overlords use and appropriate any content that we give them.
The church doesn't use them, they use us. How do you think they got
so rich?
A better question might be "how does the church
complement the shortcomings of digital cultures?".
"This environment
is now “indistinguishable from the sphere of everyday life.”",
Pope Francis was a smart guy and he had some smart people around him,
including the late Cardinal Pell. We can trace that quote right back to the
thinking of Alan Turing. Turing is considered to be the grandfather of computer
science and a martyr. Turing gave us exactly the same concept in the legendary Turing test, which he himself called the imitation game, in 1950.
Another way to interpret this phenomenon is to say that the masses
have been brainwashed by the Silicon Valley overlords.
The choices being made by
Facebook’s leadership are a huge problem — for children, for public safety,
for democracy — that is why I came forward. And let’s be clear:
it doesn’t have to be this way. We are here today because of
deliberate choices Facebook has made.
The summary from the working group goes on...
"To proclaim the Gospel effectively in our contemporary
culture, we must discern the opportunities and challenges
presented by this new dimension of the “place”"
That particular quote acknowledges that there are both
opportunities and challenges. The jubilee year is all about hope
and I really hope the working group members are reading the stuff
from whistleblowers, child psychologists and
even coroners who are warning us about the impact of Facebook and their ilk.
Nonetheless, the report includes the phrase "greater immersion"
and I feel the church should not assume "greater immersion" is a default
course of action.
The summary also touches on the concept of jurisdiction. The
Catholic Church has traditionally organized itself on a geographical
basis. The Internet allows people to connect and form virtual
communities without any geographical connection.
On a sidenote, in the days before the Internet, the church was
able to move high-risk priests from a parish on one side of the city
to the other side of the city and not worry about anybody joining
the dots. I went through the papers from Australia's Royal Commission
meticulously and found this note from the legendary Father X___:
That means that if anyone in Australia, learning that
Father Z___ had treatment because of something that happened in Boston
and going there to find out, would run into a dead end.
The letter in question was penned just before the Internet came
into public consciousness. Looking at those words today, it is a
stark reminder about how the Internet is tipping life on its head.
The working group goes on to comment that they are seeking
"practical recommendations or proposals" from across the community,
on any topic related to the Church's mission in the digital environment.
People engaged in the free software movement, whether they are
Catholic or not, can contact their local diocese to find out who
is locally coordinating the response to these challenges.
Another phrase that caught my eye:
"today we live in a digital culture"
Not exactly. Some people would say that a digital culture is being
imposed on us. Institutions like politics and the media are hooked on it
and they put it up on a pedestal. Therefore, it is even more vital that
other institutions, such as the church, take the role of questioning
everything about digital culture and also maintaining viable alternatives.
Life without mobile phones, life without apps
Mobile phones and apps are closely related. There are some people
who choose to live without a smart phone; in other words, they only have half the problems of a full mobile phone. Some people also
choose to have smart phones without the Google or Apple app store,
for example, people who install the
Replicant or
LineageOS and use the
F-Droid app store to limit their phone to ethical apps.
In practical terms, there are people who are unable to navigate their
home town without using their phone. An interesting question arises
for the church, what proportion of followers are unable to identify the
most direct route from their home to their closest church without looking
at an app? It would be interesting to analyze the responses based on
various factors such as age and years of residence in the parish.
Another key question, closely related to the above, is how many
parishioners can recall regular mass times and key events in the parish
calendar without looking at their phone? It is great to have this
information visible on the parish web site, nonetheless, when
people are truly engaged in the parish and the community, this
information will be committed to memory. The more pervasive this
information is in a community, the more resilient the community.
Authentication systems undermining human dignity
Today we frequently see companies insisting they need to have
our mobile phone numbers to "authenticate" us or to "sign" documents
by text message.
This type of thing is particularly creepy. Many people are familiar
with the Nazi-era practice of burning identification numbers into the
skin of Jewish prisoners. Mobile phone numbers serve a similar
functional purpose. Even though the numbers are not physically
burnt into our skin, it is often inconvenient for people to change
their number.
There are many closely related phenomena, including web sites
demanding users authenticate themselves from a Gmail or Facebook
account.
At the level of the church, the state, education, health care and
financial services, it is vital to ensure everybody can participate
in the way they want to without giving up their dignity.
The church needs to become just as vocal about these topics
as it is about themes such as abortion.
Need to emphasize consent
Concerns about consent and coercion have become a big topic in
the world today. Ironically, the
social control media platforms
pretending to help give women a platform are violating the
principle of consent in so many other ways.
Consider, for example, people who spent time creating a profile
on Facebook or Twitter, sometimes over many years, connecting with
hundreds or thousands of followers, and who are then confronted with the demand to add their mobile phone number to their account. If they
don't add their mobile phone number, their account is blocked. There
is no genuine technical reason to have a mobile phone number in the
account as many of these services worked exactly the same way for
many years before such demands became commonplace.
People are not freely consenting to share their phone numbers
with Mark Zuckerberg and Elon Musk. The services have been bastardized
to ambush their users with these demands.
Significantly, this culture of ambushing and coercing people
trickles down into society. In Australia, Chanel Contos started
a highly publicized petition/journal with stories from women at
elite private schools who felt they had been ambushed, bullied and
coerced into unwanted physical encounters.
Ironically, Miss Contos publicized her concerns through the very
same platforms that are undermining our understanding of consent and
privacy.
The church itself has had to do a lot of soul searching on topics
of consent and abuses of power. This puts the church in an interesting
position where we can say that even considering some of the most shocking
revelations about abuse, those responsible are the lesser evil compared to
the overlords in Silicon Valley.
It is remarkable how quickly the institutions of Silicon Valley have
abandoned all checks and balances and seen fit to do as they please.
The Catholic Church and other religious institutions can now
take what they have learnt from the critical analysis of their own mistakes
and warn society how stupid it would be to go down the same path again
with these digital gangsters.
Digital technology is much more than social control media
The church is not new to technology. Early printing presses
were installed in church premises. Caxton installed England's
first press at Westminster Abbey. Other sites included Oxford
and St Alban's Abbey. Prior to the printing press, reading and
writing were activities reserved for clerics and many of their
works only existed in Latin. The printing press enabled the
mass production of bibles in German and English languages. This,
in turn, had a huge impact on the standardization of the language
just as it helped standardize the moral attitudes that Silicon Valley is ripping out from underneath us. The King James Version of the bible is
widely recognized for its impact on the English language.
The standardization of language was only one side-effect of
this invention. The reformation was another. As people gained
books and the power of reading, they became less dependent upon
the clerics.
Likewise,
social control media today is having an impact on our culture,
for better or worse. Just as printing presses enabled the reformation,
social control media may lead to further changes in the way humans organize
ourselves around religious structures and beliefs. The overlords
in Silicon Valley are actively contemplating these roles for themselves.
Elon Musk has even dressed up as Satan. If the Catholic Church doesn't
offer a compelling alternative to these power shifts then it will
be taken out of the church's hands.
Frances Haugen (Facebook whistleblower): almost no one outside of Facebook knows
what happens inside Facebook. The company’s leadership keeps vital information from
the public, the U.S. government, its shareholders, and governments around the world.
The documents I have provided prove that Facebook has repeatedly misled us about
what its own research reveals about the safety of children, its role in spreading hateful
and polarizing messages, and so much more.
Whereas previous generations went to clerics for advice, followed
by reading the bible themselves, the youth today go to a search engine
and tomorrow people may be putting their faith in artificial intelligence.
We can already see evidence of search engines,
social control media and
AI bots guiding people to increased levels of conflict with their
neighbors or putting people on dark paths of isolation, self-harm and
suicide.
Catholic Church resources relevant to digital environment
The Catholic Church has a big role in education and schools; therefore,
the church can see the impact of
social control media and the church can
enforce bans for children and provide training to staff and parents.
Teachers, as employees of the church or the state, have reported a
rise in bullying from parents who group together on messaging apps.
In one recent case,
British police sent six officers to humiliate a parent who had used
WhatsApp to agitate about the local
school. The conflict, the adversarial nature of this environment and
the huge waste of police resources are all consequences of the way
the technology is designed and used in society. Each incident like
this provides an insight about opportunities for the Catholic Church
to ask "is there a better way?".
Words from Frances Haugen help explain the six police officers
laying siege to the parents of small children:
I saw that Facebook repeatedly encountered conflicts
between its own profits and our safety. Facebook consistently resolved those conflicts
in favor of its own profits. The result has been a system that amplifies division,
extremism, and polarization — and undermining societies around the world.
The Catholic Church is a large employer in many countries.
This gives the church the ability to make decisions about the use
of mobile phones and messaging apps in the employer/employee
relationship. An employer can't prohibit staff from using these
things in their personal time but they can decide to eliminate
any official use of these gimmicks for work purposes. The employer/employee
relationship provides another opportunity to provide training about the
importance of human dignity above the demands of our devices.
The public agenda in the digital environment, abortion of our species
With many politicians and journalists now living their lives through
social control media, their ability to evaluate which issues are worthy
of public debate is heavily influenced by the issues that are supposedly trending online. There is a notion that issues are trending online as a consequence of public interest, while the reality is that the managers
of online platforms exert influence to ensure some issues appear
to grow organically while significant but inconvenient topics are
conveniently buried in the flood of news.
In this context, the Catholic Church provides an alternative
route to put issues on the agenda for public discussion, regardless of
whether a particular issue appears to be "trending" or not. This
power is most often used for issues close to the church's teaching,
such as lobbying about abortion, but there is no reason the church
can't use the same resources to lobby against the abortion of
the human race by AI.
Aid for victims of discrimination by Silicon Valley overlords and online
mobs
The Catholic Church traces its origins to the persecution of Jesus
and the martyrs Saint Peter and Saint Paul.
"But let us pass from ancient examples, and come unto those who have in the times nearest to us, wrestled for the faith. Let us take the noble examples of our own generation. Through jealousy and envy the greatest and most just pillars of the Church were persecuted, and came even unto death. Let us place before our eyes the good Apostles. Peter, through unjust envy, endured not one or two but many labours, and at last, having delivered his testimony, departed unto the place of glory due to him. Through envy Paul, too, showed by example the prize that is given to patience: seven times was he cast into chains; he was banished; he was stoned; having become a herald, both in the East and in the West, he obtained the noble renown due to his faith; and having preached righteousness to the whole world, and having come to the extremity of the West, and having borne witness before rulers, he departed at length out of the world, and went to the holy place, having become the greatest example of patience." (first epistle of Clement to the Corinthians, 5:1 - 5:7)
These words account for the persecution of Peter and Paul under
the Emperor Nero almost two thousand years ago.
Eight hundred years ago, the Magna Carta arrived and over time,
it has inspired the US Bill of Rights, the Universal Declaration
of Human Rights and the abolition of capital punishment.
Yet today we see the Silicon Valley overlords wish to throw all of
that out the window and take us back to the time of Nero.
Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits.
Everyone has the right to the protection of the moral and material interests resulting from any scientific, literary or artistic production of which he is the author.
When we look at the web sites of well known free software projects like
Debian and Fedora, we see them openly proclaiming their desire to censor
certain people. Anybody who speaks up about ethical issues in our
industry has been subject to these extreme reprisals from time to time.
The similarities between these cases and the growing list of victims
is clear proof that they are not random. There is a coordinated effort
to roll back or circumvent civil rights. If a digital space or digital
world does exist, then it is eerily similar to the world where Roman
Emperors used grisly executions to perpetuate control through fear.
The Catholic Church can seek out the victims who have been canceled,
victims who have been de-platformed and people who have
something to say about human dignity in the era of AI. Whether these people are Catholics or not, the concerns
that independent experts
have been trying to research and publicize need to be elevated above
the noise from public relations departments.
At the same time, the horrific impact inflicted on our families is
often hidden from public view.
Children in the digital environment
It is telling that we found very similar tactics used by
Harvey Weinstein and Chris Lamb, former leader of the Debian Project.
This is significant because Lamb was trained through the Google
Summer of Code and funded by Google, including a large payment of
$300,000 shortly before three victims revealed the scandal.
Despite Debian's promise of transparency, the money was only revealed
more than six months later and Google's name is never publicly
connected to the numbers.
When Weinstein had concerns about the behavior of some women,
he would send nasty rumors about "behavior" to other people in the
industry. There's something snobby about these attitudes to
human behavior.
When women made complaints to the police, the film director
Peter Jackson spoke up and
confirmed Weinstein had been using these dirty tricks,
spreading rumors about behavior of women who were not
submissive enough for his liking.
"I recall Miramax telling us they were a nightmare to work with and we should avoid them at all costs. This was probably in 1998," Jackson said.
"At the time, we had no reason to question what these guys were telling us - but in hindsight, I realise that this was very likely the Miramax smear campaign in full swing."
A range of people have come forward showing that Chris Lamb was doing
exactly the same thing in his role at Debian. Under copyright law,
co-authors do not have any obligation to the person elected to
serve as Debian Project Leader from time to time. We are all equals.
Subject: Re: Debian Developer status
Date: Tue, 18 Dec 2018 10:36:09 +0900
From: Norbert Preining <norbert@preining.info>
To: Daniel Pocock <daniel@pocock.pro>
Hi Daniel,
even if, going through a lawsuite like this in the UK is out and above
my abilities and financial possibilities.
But I am scared that Lamb actually also hosed an application for a
company in NY, a job related to Debian. If that has happened, and I can
reasonably document it, I would consider a defamation law suite.
> Lamb is a resident of the UK and sending emails from the UK
> https://regainyourname.com/news/cyberbullying-cyberstalking-and-online-harassment-a-uk-study/
Thanks for the links, I will keep them in mind.
Norbert
--
PREINING Norbert http://www.preining.info
Accelia Inc. + JAIST + TeX Live + Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
Even more disturbing, Lamb started his attacks on my family at
the very same time that Cardinal George Pell was convicted in 2018.
My second cousin had been a member of Cardinal George Pell's former
choir in Melbourne. Lamb and his co-conspirators, funded by Google,
started anonymous rumors about abuse.
Multiple people came forward with evidence that Lamb was behaving
like Weinstein, spreading the rumors behind our backs. When
Dr Preining and I spoke up, a third victim saw the scandal and
identified himself publicly on Christmas Day:
Subject: Re: Censorship in Debian
Date: Tue, 25 Dec 2018 23:44:38 +0100
From: martin f krafft
Organization: The Debian project
To: debian-project@lists.debian.org
Hello project,
It's very sad to read about what's going on.
I know that there's been at least another case, in which DAM and AH
have acted outside their mandate, threatening with project
expulsion, and choosing very selectively with whom they communicate.
I know, because I was being targeted.
Neither DAM nor AH (the same people still active today) made
a single attempt to hear me. None of my e-mails to either DAM or AH
were ever answered.
Instead, DAM ruled a verdict, and influenced other people to the
point that "because DAM ruled" was given as a reason for other
measures. This was an unconstitutional abuse of DAM's powers, and in
the case of AH, the whole mess also bordered on libel. Among others,
the current DPL Chris Lamb promised a review in due time, but
nothing ever happened.
... [ snip ] ...
Yet if it is not safe for the engineers who make this technology,
it is certainly not safe for kids.
On 5 October 2021, I raised the concerns about children in this culture
with the report
Google, FSFE & Child Labor.
Red Hat, a subsidiary of IBM since 2019, started legal action to
censor and discredit my concerns. They accused me of
bad faith for publishing that article. Yet the legal panel ruled that
Red Hat was harassing me and engaged in an abuse of the administrative procedure.
The irony, of course, is that the Cardinals wear red hats, like the
name of the company
Red Hat who
were found to be abusing me. Chris Lamb
at Debian had started the rumors about my family when
Cardinal Pell was convicted.
The manner in which this intersected our lives and our faith,
the abuse rumors after the late Cardinal Pell's conviction,
my visit to the Carabinieri on the day the Cardinal died,
the wedding day, on Palm Sunday, being a copy-cat (unconfirmed) suicide,
the crucifixion of Dr Stallman at Easter and
the Debian Christmas lynchings, it is staggering. As they say in crime
movies, follow the money.
Digital environment subjects parishioners to third-party surveillance
The Catholic Church was born out of persecution and it has to be
remembered that surveillance is a cornerstone of persecution.
The fact that the largest services, like Google, Facebook and Twitter
are all ostensibly free is proof that they gain all of their profit
from their ability to conduct effective surveillance and manipulation
of the population.
At one time, the church used to fulfil similar roles. Followers
would submit themselves to a form of surveillance through the sacrament
of confession, where they would receive counsel from their priest.
Priests seek to exert some influence from the pulpit, with the threat of excommunication and, from time to time, the odd inquisition or
persecution of somebody who was ahead of his time like Galileo.
If tech companies can approximate all these functions so effectively
with algorithms, we run the risk that religion becomes redundant.
Therefore, attempting to perform the church's role through a medium
that is substituting itself for the role of religion is a lot like
digging one's own grave.
Through a series of public inquiries and whistleblowers, we've
heard the extent to which these overlords are stripping away our dignity.
Their goal is to anticipate our every decision, influence who we talk to,
influence how we vote and influence every last cent in our budget.
If every one of those decisions is controlled and even micromanaged
for us, with scientific precision, right down to the last cent in our
bank account each month, by the influence of algorithms,
what space is left in our consciousness for the influence of the Gospel?
Mission: remaining relevant
Therefore, the question assigned to the working group about the
mission in the digital environment
could be rephrased as how does religion, of any nature, remain
relevant at all?
For many families in affluent cultures today, the church is engaged
out of tradition for weddings, funerals and sometimes education for
the children.
For the church to empower parishioners with technology, rather than
losing parishioners to technology, we need to ask questions about some
of the topics raised by the free software movement.
How to ensure each person has full control over their devices,
including right to repair and right to change the operating system.
Develop strategies to protect people from the risks of technology.
For example,
social control media allows small but very noisy groups to
do intense harm to their victims with the deliberate and repeated spread
of gossip and defamation. It is becoming harder and harder to ensure that
no person or minority is excluded by online vendettas. How to provide
support to people targeted by these toxic people?
How to ensure that every person and group can take their turn to speak?
Mission: protecting society from the same mistakes
Australia went through the process of having a Royal Commission
into abuses by a wide range of institutions, including the church.
Yet that was too late for many of the people who have either died or
lost their family members, health and careers. Wouldn't it be great
to make such strong interventions before rather than after catastrophic
failures have occurred? It is high time for the same level of scrutiny on
social control media bosses and the exploitation and manipulation
of the public on multiple levels.
Conclusion
Social control media is rapidly becoming a front for artificial
intelligence. As the Turing test (imitation game) has suggested to us since 1950, it is inevitable that each new iteration of this phenomenon will become more and more indistinguishable from reality.
As such, it may present itself not only as a substitute for fellow
human beings but as an alternative
to the church. People may be duped into accepting it as their God.
In other words,
social control media may make the church irrelevant
and after it does that, it may go on to make humanity irrelevant.
Just look at the way people make faces at me after my father died.
The rudeness I experience on an almost daily basis started at a time of grief.
People are brainwashed to set aside even the most basic respect
for human dignity, the respect for a family at a time of grief
and it just becomes another opportunity to use each other for sport.
This aspect of my life was entirely created by
social control media
and the people who are defining that space in my own profession.
In her testimony to Congress, Frances Haugen told us:
I believe what I did was right and necessary for the common good — but I know
Facebook has infinite resources, which it could use to destroy me.
In 2018, I attended the UN Forum on Business and Human Rights in Geneva,
making some brief comments about Facebook and Twitter falling into the
wrong hands. The UN Forum occurred at the same time the jury was considering
the charges against Cardinal George Pell. Pell was convicted and these
social control media platforms filled up with rumors about my family and me, the very phenomenon Haugen herself seems to be afraid of.
A new minor release 0.2.6 of our RcppRedis
package arrived on CRAN today.
RcppRedis
is one of several packages connecting R to the fabulous Redis in-memory datastructure store (and
much more). It works equally well with the newer fork Valkey. RcppRedis
does not pretend to be feature complete, but it may do some things
faster than the other interfaces, and also offers an optional coupling
with MessagePack binary
(de)serialization via RcppMsgPack. The
package has been “deployed in production” as a risk / monitoring tool on
a trading floor for several years. It also supports pub/sub
dissemination of streaming market data as per this
earlier example.
This update brings new functions del, lrem,
and lmove (for the matching Redis / Valkey commands) which
may be helpful in using Redis (or Valkey) as a job queue.
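These map onto the Redis / Valkey commands of the same names, so the job-queue pattern they enable can be sketched with plain redis-cli (the key names here are just illustrative):

  redis-cli RPUSH jobs:pending "job-1"                  # enqueue a job
  redis-cli LMOVE jobs:pending jobs:working LEFT RIGHT  # atomically claim the oldest job
  redis-cli LREM jobs:working 1 "job-1"                 # remove it once processed
  redis-cli DEL jobs:pending jobs:working               # or drop both queues entirely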
We also extended the publish accessor by supporting text
(i.e. string) mode along with raw or
rds (the prior default which always serialized R objects), just as listen already worked with these three cases. The change makes it possible to publish from R to subscribers not running R, as they cannot rely on the R deserializer. An example is provided by almm, a live market
monitor, which we introduced in this
blog post. Apart from that the continuous integration script
received another mechanical update.
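For the new string mode, a non-R subscriber can be as simple as redis-cli listening on a channel; this is only a sketch with an invented channel name, and redis-cli standing in for the R publisher:

  # terminal 1: a subscriber that knows nothing about R serialization
  redis-cli SUBSCRIBE quotes
  # terminal 2: a plain-text message, as a string-mode publisher would send it
  redis-cli PUBLISH quotes "AAPL 123.45"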
The detailed changes list follows.
Changes in version 0.2.6
(2025-06-24)
The commands DEL, LREM and
LMOVE have been added
The continuous integration setup was updated once more
The pub/sub publisher now supports a type argument similar to the listener; this allows string message publishing for non-R subscribers
to /etc/rc.local. After that I only had to enable the Sensors plugin below Statistics -> Setup -> General plugins and check 'Monitor all except specified' in its "Configure" dialog.
Single signon is a pretty vital part of modern enterprise security. You have users who need access to a bewildering array of services, and you want to be able to avoid the fallout of one of those services being compromised and your users having to change their passwords everywhere (because they're clearly going to be using the same password everywhere), or you want to be able to enforce some reasonable MFA policy without needing to configure it in 300 different places, or you want to be able to disable all user access in one place when someone leaves the company, or, well, all of the above. There's any number of providers for this, ranging from it being integrated with a more general app service platform (eg, Microsoft or Google) or a third party vendor (Okta, Ping, any number of bizarre companies). And, in general, they'll offer a straightforward mechanism to either issue OIDC tokens or manage SAML login flows, requiring users present whatever set of authentication mechanisms you've configured.
This is largely optimised for web authentication, which doesn't seem like a huge deal - if I'm logging into Workday then being bounced to another site for auth seems entirely reasonable. The problem is when you're trying to gate access to a non-web app, at which point consistency in login flow is usually achieved by spawning a browser and somehow managing submitting the result back to the remote server. And this makes some degree of sense - browsers are where webauthn token support tends to live, and it also ensures the user always has the same experience.
But it works poorly for CLI-based setups. There are basically two options - you can use the device code authorisation flow, where you perform authentication on what is nominally a separate machine to the one requesting it (but in this case is actually the same) and as a result end up with a straightforward mechanism to have your users socially engineered into giving Johnny Badman a valid auth token despite webauthn nominally being unphishable (as described years ago), or you reduce that risk somewhat by spawning a local server and POSTing the token back to it - which works locally but doesn't work well if you're dealing with trying to auth on a remote device. The user experience for both scenarios sucks, and it reduces a bunch of the worthwhile security properties that modern MFA supposedly gives us.
There's a third approach, which is in some ways the obviously good approach and in other ways is obviously a screaming nightmare. All the browser is doing is sending a bunch of requests to a remote service and handling the response locally. Why don't we just do the same? Okta, for instance, has an API for auth. We just need to submit the username and password to that and see what answer comes back. This is great until you enable any kind of MFA, at which point the additional authz step is something that's only supported via the browser. And basically everyone else is the same.
Of course, when we say "That's only supported via the browser", the browser is still just running some code of some form and we can figure out what it's doing and do the same. Which is how you end up scraping constants out of Javascript embedded in the API response in order to submit that data back in the appropriate way. This is all possible but it's incredibly annoying and fragile - the contract with the identity provider is that a browser is pointed at a URL, not that any of the internal implementation remains consistent.
I've done this. I've implemented code to scrape an identity provider's auth responses to extract the webauthn challenges and feed those to a local security token without using a browser. I've also written support for forwarding those challenges over the SSH agent protocol to make this work with remote systems that aren't running a GUI. This week I'm working on doing the same again, because every identity provider does all of this differently.
There's no fundamental reason all of this needs to be custom. It could be a straightforward "POST username and password, receive list of UUIDs describing MFA mechanisms, define how those MFA mechanisms work". That even gives space for custom auth factors (I'm looking at you, Okta Fastpass). But instead I'm left scraping JSON blobs out of Javascript and hoping nobody renames a field, even though I only care about extremely standard MFA mechanisms that shouldn't differ across different identity providers.
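A purely hypothetical sketch of such a flow, just to make the idea concrete - every endpoint, field name and identifier below is invented, not any vendor's actual API:

  # step 1: primary authentication returns the list of available MFA factors
  curl -s -X POST https://idp.example.com/v1/authn \
       -H 'Content-Type: application/json' \
       -d '{"username":"jo","password":"hunter2"}'
  # -> {"factors":[{"id":"2f9c0d1e","type":"webauthn"},{"id":"7a413b90","type":"totp"}]}

  # step 2: complete one advertised factor and receive a token the CLI can use
  curl -s -X POST https://idp.example.com/v1/authn/factors/2f9c0d1e \
       -H 'Content-Type: application/json' \
       -d '{"assertion":"<webauthn assertion from the local security token>"}'
  # -> {"token":"..."}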
Someone, please, write a spec for this. Please don't make it be me.
If we ever thought a couple of years or decades of constant use would get
humankind to understand how an asymmetric key pair is to be handled… It’s
time we moved back to square one.
I had to do an online procedure (trámite) with the Mexican federal government to get a
statement certifying I successfully finished my studies, and I found this
jewel of user interface:
So… I have to:
Submit the asymmetric key I use for tax purposes, as that’s the ID the
government has registered for me. OK, I didn’t expect it to be used for
this purpose as well, but I’ll accept it. Of course, in our tax system
many people don’t require having a public key generated (“easier”
regimes are authenticated by password only), but all professionals with a cédula profesional (everybody who obtains a university degree) are now compelled to do this step.
Not only do I have to submit my certificate (public key)… but also the
private part (and, of course, the password that secures it).
I understand I’m interacting with a Javascript thingie that runs only
client-side, and I trust it is not shipping my private key to their
servers. But given it is an opaque script, I have no assurance about
it. And, of course, this irks me because I am who I am and because I’ve
spent several years thinking about cryptography. But for regular people,
it just looks like a stupid inconvenience: they have to upload two weird
files with odd names and provide a password. What for?
This is beyond stupid. I’m baffled.
(of course, I did it, because I need the fsckin’ document. Oh, and of
course, I paid my MX$1770, ≈€80, for it… which does not make me too
happy for a procedure that’s not even shuffling papers, only storing the right
bits in the right corner of the right datacenter, but anyhow…)
In 2021, the late Pope Francis started the
Synod on Synodality, a process which
finished with a final report in October 2024.
The
list of working groups includes a group dedicated to the challenges
of polygamy, especially in regions where the church may recruit new
followers who already have multiple partners in their family.
The final report from the Synod in October 2024 only mentioned
polygamy once. It appears the working group didn't identify a way forward
that the bishops could agree on and it remains an open topic for the church.
Out of all Christian religions, the Catholic church is one of the most
strict in relation to polygamy. Catholic Catechism, para. 2387:
polygamy is not in accord with the moral law. [Conjugal] communion is radically contradicted by polygamy; this, in fact, directly negates the plan of God that was revealed from the beginning, because it is contrary to the equal personal dignity of men and women who in matrimony give themselves with a love that is total and therefore unique and exclusive.
Notice the word exclusive is part of the
Catholic definition.
In our modern world with
social control media and
artificial intelligence, people's brains are being rewired, and this has
a direct impact on the way people form and perceive relationships.
It could be argued that some people are now so totally intertwined with
social control media that they no longer have an exclusive mental
bond with their real-world partner.
Facebook chooses what information billions of people see, shaping their
perception of reality. Even those who don’t use Facebook are impacted by the
radicalization of people who do. A company with control over our deepest thoughts,
feelings and behaviors needs real oversight.
In other words, Facebook's algorithms have become a third person in
many marriages. Facebook's algorithms are complementing the decisions
of parents over their children, and not in a good way.
I saw that Facebook repeatedly encountered conflicts
between its own profits and our safety. Facebook consistently resolved those conflicts
in favor of its own profits. The result has been a system that amplifies division,
extremism, and polarization — and undermining societies around the world. In some
cases, this dangerous online talk has led to actual violence that harms and even kills
people. In other cases, their profit optimizing machine is generating self-harm and
self-hate — especially for vulnerable groups, like teenage girls. These problems have
been confirmed repeatedly by Facebook’s own internal research.
Alan Turing forecast this phenomenon in 1950 with his proposal for the imitation game. Today we call it the Turing Test. The implication
of Turing's thinking is that as each new iteration of the algorithms emerges,
it becomes harder and harder for a human to distinguish the algorithms
from a real human being.
If the human is unable to distinguish the algorithms from another real
human being then it is only logical to suggest that the human may
begin forming emotional bonds with algorithms and the personas created by
artificial intelligence.
Much has been written in research studies about the interaction between
social control media and dopamine in the brain. Our brains can have
natural highs with dopamine, for example, when a baby smiles at us
and our brains can have highs when we see something artificial, like
an AI-generated video of a baby on Facebook. More research is needed
to understand the extent to which these substitute stimuli undermine
real-world family functioning.
But it’s not just dopamine getting in on the action. Oxytocin, often dubbed the “cuddle hormone,” also plays a role in our online social bonding. When we engage in positive interactions on social media, our brains release oxytocin, creating a sense of connection and trust. It’s as if our brains can’t quite tell the difference between a virtual hug and a real one.
Scary.
We need to look at this phenomenon as a form of virtual polygamy or
cyberpolygamy and when we discuss the challenges of polygamy, it may not
be fair to focus on polygamy in Africa and not simultaneously talk about
the virtual phenomena.
Looking at the open relationships in the open source software ecosystem,
a lot of these things are alluded to but never said out loud.
In 2016, people began spreading rumors about a developer,
Dr Jacob Appelbaum. Various news reports appeared. The magazine
Die Zeit published an article
"What has this man done?". Anybody sharing links to the article was
immediately punished in certain communities. The article notes:
Sitting across from them is a young American woman. She had gotten to
know the others just a couple of days before, but she appears to be
uncomfortable at this party. She doesn’t talk much but listens in a
friendly manner to what is being said.
...
Mr. Appelbaum’s party guests number about 20 and are programmers,
hackers and activists from all around the world.
One theme related to the Dr Appelbaum crisis is the notion of open
relationships in the free and open source software communities. When
the crisis began in 2016 there was a lot of discussion about what
really goes on at the parties. News reports appeared. People found
it embarrassing.
These are the people who are creating the technological foundation
for many of the online services we depend on. Therefore, if the polygamy
phenomenon is valid in these communities then it is inevitable that it
becomes morally acceptable in those technologies extrapolated from our work.
Woody Allen released the film
Vicky Cristina Barcelona in 2008. We saw parallels in the DebConf
room lists that people are now sharing. The
Debian Pregnancy Cluster followed and immediately after that,
in 2014, people decided to organize
Women's MiniDebConf in Barcelona, as in the movie. Other people quit.
As far as I can tell, the event has never been repeated.
The Debian cases may be an edge case, typical of cult-like groups
but the virtual polygamy phenomenon of
social control media feels like a much broader risk.
Frances Haugen, the Facebook whistleblower, handed over an enormous
volume of documents revealing the extent to which Facebook's algorithms
ingratiate themselves to their subjects. Haugen demonstrated what Facebook
does with chilling effect on certain types of subject, for example,
teenage girls with eating disorders.
The rewiring of the brain, substitution of virtual love for human love
isn't only an issue in the husband-wife, parent-child relationships.
Look at the
death of Abraham Raji at DebConf23 in India.
A couple of days after Abraham drowned, they took a group photo in
the hotel swimming pool and published it with the caption
"Come on in and join us".
Compare that to the way Amnesty International responded when two
staff committed suicide. Amnesty commissioned a series of external
reports and promptly published the reports for all their donors, volunteers
and staff to read them. After the
Debian Suicide Cluster, not one report was ever published.
Vast sums of money have been spent
trying to stop people publishing evidence about the deaths.
To the outside observer, the manner in which these groups cut-and-paste
a boilerplate statement about each death and then carry on as if nothing
happened may appear extremely callous. We need to look more closely
to understand the dynamics of these relationships. Many of these people
rarely meet each other in the real world. If ninety-nine percent of the
relationship with Abraham was based on electronic communications, does
that mean people had not formed a human relationship with him before
meeting for the first time at the conference?
This is perplexing. Stepping back, we find that people had a
less-than-human relationship with the volunteer who died but on the
other hand, when using
social control media, some people are bonding with the algorithms
and experiences even more strongly than they bond with family life
in the real world.
To put it another way, we can't simply worry about the impact of
hidden friendships on
social control media, we need to worry about the algorithms themselves
re-wiring those parts of the human mind that are normally reserved
for the exclusive part of a married relationship. Or what
was considered to be exclusive in healthy marriages that occurred
before the
social control media came into existence.
It is important to look at a complete diagram like this because some
of these people are actively involved in cyberbullying attacks against
other open source software developers. To stop cyberbullying, we need
to identify the origins.
For some time I’ve been noticing news reports about PFAs [1]. I hadn’t thought much about that issue, I grew up when leaded petrol was standard, when almost all thermometers had mercury, when all small batteries had mercury, and I had generally considered that I had already had so many nasty chemicals in my body that as long as I don’t eat bottom feeding seafood often I didn’t have much to worry about. I already had a higher risk of a large number of medical issues than I’d like due to decisions made before I was born and there’s not much to do about it given that there are regulations restricting the emissions of lead, mercury etc.
I just watched a Veritasium video about Teflon and the PFA poisoning related to its production [2]. This made me realise that it’s more of a problem than I realised and it’s a problem that’s getting worse. PFA levels in the parts-per-trillion range in the environment can cause parts-per-billion in the body which increases the risks of several cancers and causes other health problems. Fortunately there is some work being done on water filtering; you can get filters for a home level now and they are working on filters that can work at a sufficient scale for a city water plant.
Also they noted that donating blood regularly can decrease levels of PFAs in the bloodstream. So presumably people who have medical conditions that require receiving donated blood regularly will have really high levels.
When I was younger, and definitely naïve, I was so looking forward to AI, which
will help us write lots of good, reliable code faster. Well, principally me, not
thinking about what impact it would have industry-wide. Other more general concerns,
like societal issues, role of humans in the future and so on were totally not on
my radar.
At the same time, I didn’t expect this would actually happen. Even years later,
things didn’t change dramatically. Even the first release of ChatGPT a few years
back didn’t click for me, as the limitations were still significant.
Hints of serious change
The first hint of the change, for me, was when a few months ago (yes, behind the
curve), I asked ChatGPT to re-explain a concept to me, and it just wrote a lot
of words, but without a clear explanation. On a whim, I asked Grok—then recently
launched, I think—to do the same. And for the first time, the explanation
clicked and I felt I could have a conversation with it. Of course, now I forgot
again that theoretical CS concept, but the first step was done: I can ask an LLM
to explain something, and it will, and I can have a back and forth logical
discussion, even if on some theoretical concept. Additionally, I learned that
not all LLMs are the same, and that means there’s real competition and that leapfrogging is possible.
Another tool which I tried to adopt early and failed to get mileage out of was GitHub Copilot (in VSC). I tried it, it helped, but I didn’t feel any
speed-up at all. Then more recently, in May, I asked Grok what’s the state of
the art in AI-assisted coding. It said either Claude in a browser tab, or in VSC
via continue.dev extension.
The continue.dev extension/tooling is a bit of a strange/interesting thing. It
seems to want to be a middle-man between the user and the actual LLM services,
i.e. you pay a subscription to continue.dev, not to Anthropic itself, and they
manage the keys/APIs for whatever backend LLMs you want to use. The integration
with Visual Studio Code is very nice, but I don’t know whether their business
model will make sense long-term. Well, not my problem.
Claude: reverse engineering my old code and teaching new concepts
So I installed the latter and subscribed, thinking 20 CHF for a month is good
for testing. I skipped the tutorial model/assistant, created a new one from
scratch, just enabled Claude 3.7 Sonnet, and started using it. And then my mind
was blown, not just by the LLM, but by the ecosystem. As said, I’ve used GitHub
Copilot before, but it didn’t seem effective. I don’t know if a threshold has
been reached, or if Claude (3.7 at that time) is just better than ChatGPT.
I didn’t use the AI to write (non-trivial) code for me, at most boilerplate
snippets. But I used it both as a partner for discussion - “I want to do x, what
do you think, A or B?” - and as a teacher, especially for frontend topics, which
I’m not familiar with.
Since May, in mostly fragmented sessions, I’ve achieved more than in the last
two years: migrating from old-school JS to ECMA modules, setting up a webpack
bundler (reducing bundle size by 50%), replacing an old JavaScript library with
hand-written code using modern APIs, implementing the zoom feature together with
keyboard, mouse, touchpad and touchscreen support, simplifying the layout from
manually computed to automatic, and finding a bug in WebKit for which it also
wrote a cool minimal test (cool as in way better than I’d have ever, ever
written, because for me it didn’t matter that much). And more. Could I have done
all this? Yes, definitely; nothing was especially tricky here. But it would have
meant hours and hours of reading MDN, scouring Stack Overflow and Reddit, and
lots of trial and error. So doable, but much more toil.
This, to me, feels like cheating. 20 CHF per month to make me 3x more productive
is free money—well, except that I don’t make money on my code, which is written
basically for myself. However, I don’t get stuck anymore searching the web for
hours for guidance; I ask my question and I get at least a direction, if not an
answer, and I’m finished way earlier. I can now actually juggle more hobbies in
the same amount of time, since my personal code takes less time, or, said
differently, since I’m more efficient at it.
Not all is roses, of course. Once, it wrote code with such an endearing error
that it made me laugh. It was so blatantly obvious that you shouldn’t keep
other state in the array that holds pointer status, because that confuses the
calculation of “how many pointers are down” (probably obvious to itself too, if
I’d have asked). But I didn’t ask, since it felt a bit embarrassing to point
out such a dumb mistake. Yes, I’m anthropomorphising again, because this is the
easiest way to deal with things.
In general, it does an OK-to-good-to-sometimes-awesome job, and the best thing
is that it summarises documentation and all of Reddit and Stack Overflow. And
gives links to those.
Now, I have no idea yet what this means for the job of a software engineer. If
on open source code, my own code, it makes me 3x faster—reverse engineering my
code from 10 years ago is no small feat—for working on large codebases, it
should do at least the same, if not more.
As an example of how open-ended the assistance can be, at one point, I started
implementing a new feature—threading a new attribute to a large number of call
points. This is not complex at all, just add a new field to a Haskell record,
and modifying everything to take it into account, populate it, merge it when
merging the data structures, etc. The code is not complex, tending toward
boilerplate a bit, and I was wondering on a few possible choices for
implementation, so, with just a few lines of code written that were not even
compiling, I asked “I want to add a new feature, should I do A or B if I want it
to behave like this”, and the answer was something along the lines of “I see
you want to add the specific feature I was working on, but the implementation
is incomplete, you still need to do X, Y and Z”. My mind was blown at this
point, as I thought, if the code doesn’t compile, surely the computer won’t be
able to parse it, but this is not a program, this is an LLM, so of course it
could read it kind of as a human would. Again, the code complexity is not
great, but the fact that it was able to read a half-written patch, understand
what I was working towards, and reason about it, was mind-blowing, and scary.
Like always.
Non-code writing
Now, after all this, while writing a recent blog post, I thought—this is going
to be public anyway, so let me ask Claude what it thinks about it. And I was
very surprised, again: gone was all the pain of rereading my post three times to
catch typos (easy) or phrasing and structure issues. It gave me very clear
points, and helped me cut 30-40% of the total time. So not only coding, but
wordsmithing too has changed. If I were an author, I’d be delighted (and scared).
Here is the overall reply it gave me:
Spelling and grammar fixes, all of them on point except one mistake (I claimed
I didn’t capitalize one word, but I did). To the level of a good grammar
checker.
Flow Suggestions, which went way beyond normal spelling and grammar. It felt
like a teacher telling me to do better in my writing, i.e. nitpicking on
things that actually were true even if the text would still work: lousy phrase
structure, still understandable, but lousy nevertheless.
Other notes: an overall summary. This was mostly just praising my post 😅. I
wish LLMs were not so focused on “praise the user”.
So yeah, this speeds me up to about 2x on writing blog posts, too. It
definitely doesn’t feel fair.
Whither the future?
After all this, I’m a bit flabbergasted. Gone are the 2000’s with code without
unittests, gone are the 2010’s without CI/CD, and now, mid-2020’s, gone is the
lone programmer that scours the internet to learn new things, alone?
What this all means for our skills in software development, I have no idea,
except I know things have irreversibly changed (a Butlerian Jihad aside). Do I
learn better with a dedicated tutor, even if I don’t fight with the problem for
so long? Or is struggling to find good docs the main method of learning? I
don’t know yet. I feel like I understand the topics I’m discussing with the AI,
but who knows what it will mean long-term for the “stickiness” of the learning.
For better or for worse, things have changed. After all the advances of the
last five centuries in the mechanical sciences, it has now come to some aspects
of intellectual work.
Maybe this is the answer to the ever-growing complexity of tech stacks? I.e. a
return of the lone programmer that builds things end-to-end, but with AI taming
the complexity added in the last 25 years? I can dream, of course, but this also
means that the industry overall will increase in complexity even more, because
large companies tend to do that, so maybe a net effect of not much…
One thing I have learned so far is that my expectation that AI (at this level)
would only help junior/beginner people, i.e. that it would flatten the skills
band, is not true. I think AI can speed up at least the middle band, likely the
middle-top band; I don’t know about the 10x programmers (I’m not one of them).
So, my question about AI now is how to best use it, not to lament how all my
learning (90% self-learning, to be clear) is obsolete. No, it isn’t. AI helps
me start and finish one migration (that I delayed for ages), then start the
second, on the same day.
At the end of this—a bit rambling—reflection on the past month and a half, I
still have many questions about AI and humanity. But one has been answered: yes,
“AI”, quotes or no quotes, has already changed this field (producing software),
and we’ve not seen the end of it, for sure.
I had a peculiar question at work recently, and it went off on a tangent that
was way too long and somewhat interesting, so I wanted to share.
The question is: Can you create a set of N-bit numbers (codes), so that
a) No code is a subset of another, and
b) No code is a subset of the OR of two of the others?
Of course, you can trivially do this (e.g., for N=5, choose 10000, 01000,
00100 and so on), but how many can you make for a
given N? This is seemingly an open question, but at least I found that
they are called (1,2) superimposed codes and have a history going at least
back to this 1964 paper.
They present a fairly elegant (but definitely non-optimal) way of
constructing them for certain N; let me show an example for N=25:
We start by counting 3-digit numbers (k=3) in base 5 (q=5):
000
001
002
003
004
010
011
etc…
Now we have 5^3 numbers. Let's set out to give them the property that we
want.
This code (set of numbers) trivially has distance 1; that is, every number
differs from every other number by at least one digit. We'd like to increase
that distance so that it is at least as large as k.
Reed-Solomon gives us an
optimal way of doing that; for every number, we add two checksum digits and
R-S will guarantee that the resulting code has distance 3. (Just trust me
on this, I guess. It only works for q >= (k+1)/2, though, and q must be
a power of an odd prime because otherwise the group theory doesn't work out.)
We now have a set of 5-digit numbers with distance 3. But if we now take any
three numbers from this set, there is at least one digit where all three must
differ, since the distance is larger than half the number of digits: Two
numbers A and B differ from each other in at least 3 of the 5 digits, and A
and C also have to differ from each other in at least 3 of the 5 digits. There
just isn't room for A and B to be the same in all the places where A differs
from C.
To modify this property into the one that we want, we encode each digit into
binary using one-hot encoding (00001, 00010, 00100, etc.). Now our 5-digit
numbers are 25-bit numbers. And due to the "all different" property in the
previous paragraph, we also have our superimposition property; there's at
least one 5-bit group where A|B shares no bits with C. So this gives us a
25-bit set with 125 different values and our desired property.
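To make this concrete, here's a small self-contained sketch in Python (my own illustration of the construction described above, not code from the paper): it builds a [5,3] Reed-Solomon-style code over GF(5) by evaluating every degree-<3 polynomial at the five field elements, one-hot encodes each digit into 5 bits, and then brute-force checks the (1,2) superimposition property on the resulting 125 values.
from itertools import product, combinations

q, k = 5, 3

def rs_codeword(coeffs):
    # Evaluate the polynomial with the given coefficients at x = 0..q-1 (mod q).
    # Two distinct degree-<k polynomials agree on at most k-1 points, so any two
    # codewords differ in at least q-(k-1) = 3 of the 5 digits.
    return tuple(sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q
                 for x in range(q))

def one_hot(word):
    # Map each base-q digit to a q-bit one-hot group, packed into one integer.
    bits = 0
    for d in word:
        bits = (bits << q) | (1 << d)
    return bits

codes = [one_hot(rs_codeword(c)) for c in product(range(q), repeat=k)]
assert len(set(codes)) == q ** k   # 125 distinct 25-bit values

def is_superimposed(codes):
    # (a) no code is a subset of another,
    # (b) no code is a subset of the OR of two others.
    for a, b in combinations(codes, 2):
        if (a | b) in (a, b):
            return False
    for c in codes:
        for a, b in combinations(codes, 2):
            if c not in (a, b) and (a | b) | c == a | b:
                return False
    return True

print(is_superimposed(codes))   # True (a few seconds: roughly a million triples)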
This isn't necessarily an optimal code (and the authors are very clear on
that), but it's at least systematic and easy to extend to larger sizes.
(I used a SAT solver to extend this to 170 different values, just by keeping
the 125 first and asking for 45 more that were not in conflict. 55 more
was evidently hard.) The paper has tons more information, including some
stuff based on Steiner systems
that I haven't tried to understand. And of course, there are tons more
later papers, including one by Erdős. :-)
I've applied for an account at OEIS so I can add
a sequence for the maximum number of possible codes for each N.
It doesn't have many terms known yet, because the SAT solver struggles
hard with this (at least in my best formulation), but at least it will
give the next person something to find when they are searching. :-)
The Linux kernel has an interesting type of file descriptor called a pidfd. As the name implies, it is a file descriptor for a PID, or a specific process. The nice thing about it is that it is guaranteed to refer to the specific process you expected when you obtained that pidfd. A process ID, or PID, has no reuse guarantees, which means that what you think process 1234 is and what the kernel thinks process 1234 is could be different, because your process exited and the process IDs have wrapped around.
pidfds are *odd*: they’re half a “normal” file descriptor and half… something else. That means some file descriptor operations work and some fail in odd ways. stat() works, but using one as the first parameter of openat() will fail.
One thing you can do with them is use epoll() on them to get process status; in fact, the pidfd_open() manual page says:
A PID file descriptor returned by pidfd_open() (or by clone(2) with the CLONE_PIDFD flag) can be used for the following purposes:
…
A PID file descriptor can be monitored using poll(2), select(2), and epoll(7). When the process that it refers to terminates, these interfaces indicate the file descriptor as readable.
So if you want to wait until something terminates, then you can just find the pidfd of the process and sit an epoll_wait() on it. Simple, right? Except it’s not quite true.
procps issue #386 stated that if you had a list of processes, pidwait only finds half of them. I’d like to thank Steve, the issue reporter, for the initial work on this. The odd thing is that for every exited process you get two epoll events: an EPOLLIN first, then an EPOLLIN | EPOLLHUP after that. Steve suggested the first happens when the process exits, the second when the process has been collected by the parent.
I have a collection of oddball processes, including ones that make zombies. A zombie is a child that has exited but has not been wait()ed on by its parent. In other words, if a parent doesn’t collect its dead child, then the child becomes a zombie. The test program spawns a child, which exits after some seconds. The parent waits longer, calls wait(), waits some more, then exits. Running pidwait we can see the following epoll events:
When the child exits, EPOLLIN on the child is triggered. At this stage the child is a zombie.
When the parent calls wait(), then EPOLLIN | EPOLLHUP on the child is triggered.
When the parent exits, EPOLLIN then EPOLLIN | EPOLLHUP on the parent is triggered. That is, two events for the one thing.
If you want to use epoll() to know when a process terminates, then you need to decide on what you mean by that:
If you mean it has exited, but not collected yet (e.g. a zombie possibly) then you need to select on EPOLLIN only.
If you mean the process is fully gone, then EPOLLHUP is a better choice. You can even change the epoll_ctl() call to use this instead.
A “zombie trigger” (EPOLLIN with no subsequent EPOLLHUP) is a bit tricky to work out. There is no guarantee that the two events will arrive in the same epoll_wait() batch, especially if the parent is a bit tardy with its wait() call.
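As a rough illustration of the EPOLLIN case, here is a minimal sketch in Python (assuming Python 3.9+ on Linux, which exposes pidfd_open as os.pidfd_open); it is not the pidwait code itself, just the pattern described above:
import os
import select
import time

pid = os.fork()
if pid == 0:
    time.sleep(2)        # child: exit after a couple of seconds
    os._exit(0)

pidfd = os.pidfd_open(pid)           # a handle to exactly this process, no PID reuse worries
ep = select.epoll()
ep.register(pidfd, select.EPOLLIN)   # EPOLLHUP is reported anyway when it happens

for fd, events in ep.poll():         # blocks until the child terminates
    print(f"events={events:#x}, EPOLLHUP set: {bool(events & select.EPOLLHUP)}")
    # At this point the child has exited but has not been reaped yet: a zombie.

os.waitpid(pid, 0)                   # reap the zombie
ep.close()
os.close(pidfd)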
What does not work is having two variables which validate each other, e.g.
variable "nat_min_ports" {
description = "Minimal amount of ports to allocate for 'min_ports_per_vm'"
default = 32
type = number
validation {
condition = (
var.nat_min_ports >= 32 &&
var.nat_min_ports <= 32768 &&
var.nat_min_ports < var.nat_max_ports
)
error_message = "Must be between 32 and 32768 and less than 'nat_max_ports'"
}
}
variable "nat_max_ports" {
description = "Maximal amount of ports to allocate for 'max_ports_per_vm'"
default = 16384
type = number
validation {
condition = (
var.nat_max_ports >= 64 &&
var.nat_max_ports <= 65536 &&
var.nat_max_ports > var.nat_min_ports
)
error_message = "Must be between 64 and 65536 and above 'nat_min_ports'"
}
}
That led directly to the following rather opaque error message:
Received an error
Error: Cycle: module.gcp_project_network.var.nat_max_ports (validation), module.gcp_project_network.var.nat_min_ports (validation)
I removed the somewhat duplicate check var.nat_max_ports > var.nat_min_ports on
nat_max_ports to break the cycle.
23 years ago I was in a bad place. I'd quit my first attempt at a PhD for various reasons that were, with hindsight, bad, and I was suddenly entirely aimless. I lucked into picking up a sysadmin role back at TCM where I'd spent a summer a year before, but that's not really what I wanted in my life. And then Hanna mentioned that her PhD supervisor was looking for someone familiar with Linux to work on making Dasher, one of the group's research projects, more usable on Linux. I jumped.
The timing was fortuitous. Sun were pumping money and developer effort into accessibility support, and the Inference Group had just received a grant from the Gatsby Foundation that involved working with the ACE Centre to provide additional accessibility support. And I was suddenly hacking on code that was largely ignored by most developers, supporting use cases that were irrelevant to most developers. Being in a relatively green field space sounds refreshing, until you realise that you're catering to actual humans who are potentially going to rely on your software to be able to communicate. That's somewhat focusing.
This was, uh, something of an on-the-job learning experience. I had to catch up with a lot of new technologies very quickly, but that wasn't the hard bit - what was difficult was realising I had to cater to people who were dealing with use cases that I had no experience of whatsoever. Dasher was extended to allow text entry into applications without needing to cut and paste. We added support for introspection of the current application's UI so menus could be exposed via the Dasher interface, allowing people to fly through menu hierarchies and pop open file dialogs. Text-to-speech was incorporated so people could rapidly enter sentences and have them spoken out loud.
But what sticks with me isn't the tech, or even the opportunities it gave me to meet other people working on the Linux desktop and forge friendships that still exist. It was the cases where I had the opportunity to work with people who could use Dasher as a tool to increase their ability to communicate with the outside world, whose lives were transformed for the better because of what we'd produced. Watching someone use your code and realising that you could write a three line patch that had a significant impact on the speed they could talk to other people is an incomparable experience. It's been decades and in many ways that was the most impact I've ever had as a developer.
I left after a year to work on fruitflies and get my PhD, and my career since then hasn't involved a lot of accessibility work. But it's stuck with me - every improvement in that space is something that has a direct impact on the quality of life of more people than you expect, but is also something that goes almost unrecognised. The people working on accessibility are heroes. They're making all the technology everyone else produces available to people who would otherwise be blocked from it. They deserve recognition, and they deserve a lot more support than they have.
But when we deal with technology, we deal with transitions. A lot of the Linux accessibility support depended on X11 behaviour that is now widely regarded as a set of misfeatures. It's not actually good to be able to inject arbitrary input into an arbitrary window, and it's not good to be able to arbitrarily scrape out its contents. X11 never had a model to permit this for accessibility tooling while blocking it for other code. Wayland does, but suffers from the surrounding infrastructure not being well developed yet. We're seeing that happen now, though - Gnome has been performing a great deal of work in this respect, and KDE is picking that up as well. There isn't a full correspondence between X11-based Linux accessibility support and Wayland, but for many users the Wayland accessibility infrastructure is already better than with X11.
That's going to continue improving, and it'll improve faster with broader support. We've somehow ended up with the bizarre politicisation of Wayland as being some sort of woke thing while X11 represents the Roman Empire or some such bullshit, but the reality is that there is no story for improving accessibility support under X11 and sticking to X11 is going to end up reducing the accessibility of a platform.
When you read anything about Linux accessibility, ask yourself whether you're reading something written by either a user of the accessibility features, or a developer of them. If they're neither, ask yourself why they actually care and what they're doing to make the future better.
A few months ago I bought an Intel Arc B580 for the main purpose of getting 8K video going [1]. I had briefly got it working in a test PC but then I wanted to deploy it on my HP z840 that I use as a build server and for playing with ML stuff [2]. I had only done brief tests of it previously and this was my first attempt at installing it in a system I use. My plan was to keep the NVIDIA RTX A2000 in place and run 2 GPUs; that’s not an uncommon desire among people who want to do ML stuff and it’s the type of thing that the z840 is designed for. The machine has slots 2, 4, and 6 as PCIe x16, so it should be able to fit 3 cards that each take 2 slots. So having one full-size GPU, the half-height A2000, and an NVMe controller that uses x16 to run four NVMe devices should be easy.
Intel designed the B580 to use every millimeter of space possible while still being able to claim to be a 2-slot card. On the circuit board side there is a plastic cover over the board that takes all the space before the next slot, so a 2-slot card can’t go on that side without having its airflow blocked. On the other side it takes all the available space, so that any card that wants to blow air through can’t fit, and also such that a medium-size card (such as the card for 4 NVMe devices) would block its airflow. So it’s impossible to have a computer with 6 PCIe slots run the B580 as well as 2 other full-size x16 cards.
Support for this type of GPU is something vendors like HP should consider when designing workstation-class systems. For HP there is no issue of people installing motherboards in random cases (the HP motherboard in question uses proprietary power connectors and won’t even boot with an ATX PSU without significant work). So they could easily design a motherboard and case with a few extra mm of space between pairs of PCIe slots. The cards that are double width are almost always x16, so you could pair up an x16 slot and another slot and have extra space on each side of the pair. I think for most people a system with 6 PCIe slots with a bit of extra space for GPU cooling would be more useful than having 7 PCIe slots. But as HP have full design control they don’t even need to reduce the number of PCIe slots, they could just make the case taller. If they added another 4 slots and increased the case size accordingly it still wouldn’t be particularly tall by the standards of tower cases from the 90s! The z8 series of workstations are the biggest workstations that HP sells, so they should design them to do these things. At the time that the z840 was new there was a lot of ML work being done and HP was selling them as ML workstations; they should have known how people would use them and designed them accordingly.
So I removed the NVIDIA card and decided to run the system with just the Arc card. Things should have been fine, but Intel designed the card to be as tall as possible and put the power connector on top. This prevented installing the baffle for directing airflow over the PCIe slots, and due to the design of the z840 (which is either ingenious or stupid depending on your point of view) the baffle is needed to secure the PCIe cards in place. So now all the PCIe cards are just secured by friction in the slots; this isn’t an unusual situation for machines I assemble but it’s not something I desired.
This is the first time I’ve felt compelled to write a blog post reviewing a product before even getting it working. But the physical design of the B580 is outrageously impractical unless you are designing your entire computer around the GPU.
As an aside the B580 does look very nice. The plastic surround is very fancy, it’s a pity that it interferes with the operation of the rest of the system.
In short, the world has moved on to hosting and working with source code in Git repositories. In Debian, we work with source packages that are used to generate the binary artifacts that users know as .deb files. In Debian, there is so much tooling and culture built around this. For example, our workflow passes what we call the island test – you could take every source package in Debian along with you to an island with no Internet, and you’ll still be able to rebuild or modify every package. When changing the workflows, you risk losing benefits like this, and over the years there have been a number of different ideas on how to move to a purely or partially git-based flow for Debian, none of which really managed to gain enough momentum or project-wide support.
Tag2upload makes a lot of sense. It doesn’t take away any of the benefits of the current way of working (whether technical or social), but it does make some aspects of Debian packaging significantly simpler and faster. Even so, if you’re a Debian Developer and more familiar with how the sausage is made, you’ll have noticed that this has been a very long road for the tag2upload maintainers; they’ve hit multiple speed bumps since 2019, but with a lot of patience and communication and persistence from all involved (and almost even a GR), it is finally materializing.
Performing my first tag2upload
So, first, I needed to choose which package I want to upload. We’re currently in hard freeze for the trixie release, so I’ll look for something simple that I can upload to experimental.
I chose bundlewrap; it’s quite a straightforward Python package, and updates are usually just as straightforward, so it’s probably a good package to work on without having to deal with extra complexities while learning how to use tag2upload.
So, I do the usual uscan and dch -i to update my package…
And then I realise that I still want to build a source package to test it in cowbuilder. Hmm, I remember that Helmut showed me that building a source package isn’t necessary with sbuild, but I have a habit of somehow breaking my sbuild configs, so I guess I should revisit that.
So, I do a dpkg-buildpackage -S -sa and test it out with cowbuilder, because that’s just how I roll (at least for now, fixing my local sbuild setup is yak shaving for another day, let’s focus!).
I end up with a binary that looks good, so I’m satisfied that I can upload this package to the Debian archives. So, time to configure tag2upload.
The first step is to set up the webhook in Salsa. I was surprised to find two webhooks already configured:
I know of KGB, which posts to IRC; I didn’t know before that this was the mechanism it uses to do that. Nice! I also don’t know what the tagpending one does; I’ll go look into that some other time.
Configuring a tag2upload webhook is quite simple: add a URL, set the name to tag2upload, and select only tag push events:
I ran the test webhook, and it returned a code 400 message about a missing ‘message’ header, which the documentation says is normal.
Next, I install git-debpush from experimental.
The wiki page simply states that you can use the git-debpush command to upload, but doesn’t give any examples on how to use it, and its manpage doesn’t either. And when I run just git-debpush I get:
jonathan@lapcloud:~/devel/debian/python-team/bundlewrap/bundlewrap-4.23.1$ git-debpush
git-debpush: check failed: upstream tag upstream/4.22.0 is not an ancestor of refs/heads/debian/master; probably a mistake ('upstream-nonancestor' check)
pristine-tar is /usr/bin/pristine-tar
git-debpush: some check(s) failed; you can pass --force to ignore them
I have no idea what that’s supposed to mean. I was also not sure whether I should tag anything to begin with, or whether some part of the tag2upload machinery does that automatically. I think I might have tagged debian/4.23-1 before tagging upstream/4.23 and perhaps it didn’t like that; I reverted and did it the other way around and got a new error message. Progress!
jonathan@lapcloud:~/devel/debian/python-team/bundlewrap/bundlewrap-4.23.1$ git-debpush
git-debpush: could not determine the git branch layout
git-debpush: please supply a --quilt= argument
Looking at the manpage, it looks like --quilt=baredebian matches my package the best, so I try that:
Ooh! That looked like it did something! And a minute later I received the notification of the upload in my inbox:
So, I’m not 100% sure that this makes things much easier for me than doing a dput, but it’s not any more difficult or more work either (once you know how it works), so I’ll be using git-debpush from now on, and I’m sure that as I get more used to the git workflow of doing things I’ll understand more of the benefits. And at last, my one last use case for using FTP is now properly dead. RIP FTP :)
To run a SMP system with multiple CPUs you need to have CPUs that are “identical”, the question is what does “identical” mean. In this case I’m interested in Intel CPUs because SMP motherboards and server systems for Intel CPUs are readily available and affordable. There are people selling matched pairs of CPUs on ebay which tend to be more expensive than randomly buying 2 of the same CPU model, so if you can identify 2 CPUs that are “identical” which are sold separately then you can save some money. Also if you own a two CPU system with only one CPU installed then buying a second CPU to match the first is cheaper and easier than buying two more CPUs and removing a perfectly working CPU.
Above is a pic of 2 E5-2640v4 CPUs that were in an SMP system I purchased, along with a plain ASCII representation of the text on one of them. The bottom code (starting with “77”) is apparently the serial number; one of the two codes above it is what determines how “identical” those CPUs are.
The line below the sspec and above the serial number has J717B324, which doesn’t have a Google hit. I looked at more than 20 pics of E5-2640v4 CPUs on eBay; they all had the code SR2NZ but had different numbers on the line below. I conclude that the number on the line below probably indicates the model AND stepping, while SR2NZ just means E5-2640v4 regardless of stepping. As I wasn’t able to find another CPU on eBay with the same number on the line below the sspec, I believe that it will be unreasonably difficult to get a match for an existing CPU.
For the purpose of matching CPUs I believe that if the line above the serial number matches then the CPUs can be used together. I am not certain that CPUs with this number slightly mismatching won’t work but I definitely wouldn’t want to spend money on CPUs with this number being different.
When you boot Linux, the kernel identifies the CPU in a manner like the above; the combination of family and model seems to map to one spec number. The combination of family, model, and stepping should be all that’s required to have them work together.
I think that Intel did the wrong thing in not making this clearer. It would have been very easy to print the stepping on the CPU case next to the sspec or the CPU model name. It also wouldn’t have been too hard to make the CPU provide the magic number that is apparently the required match for SMP to the OS. Having the Intel web site provide a mapping of those numbers to steppings of CPUs also shouldn’t be difficult for them.
If anyone knows more about these issues please let me know.
I’ve been meaning to write a post about this bug for a while, so here
it is (before I forget the details!).
First, I’d like to thank a few people:
My friend Gabriel F. T. Gomes, who helped with debugging and simply
talking about the issue. I love doing some pair debugging, and I
noticed that he also had a great time diving into the internals of
glibc and libgcc.
My teammate Dann Frazier, who always provides invaluable insights
and was there to motivate me to push a bit further in order to
figure out what was going on.
The upstream GCC and glibc developers who finally drove the
investigation to completion and came up with an elegant fix.
I’ll probably forget some details because it’s been more than a week
(and life at $DAYJOB moves fast), but we’ll see.
The background story
Wolfi OS takes security seriously, and one of the things we have is a
package which sets the hardening compiler flags for C/C++ according to
the best practices recommended by OpenSSF. At the time of this
writing, these flags are (in GCC’s spec file parlance):
The important part for our bug is the usage of -z now and
-fno-strict-aliasing.
As I was saying, these flags are set for almost every build, but
sometimes things don’t work as they should and we need to disable
them. Unfortunately, one of these problematic cases has been glibc.
There was an attempt to enable hardening while building glibc, but
that introduced a strange breakage to several of our packages and had
to be reverted.
Things stayed pretty much the same until a few weeks ago, when I
started working on one of my roadmap items: figure out why hardening
glibc wasn’t working, and get it to work as much as possible.
Reproducing the bug
I started off by trying to reproduce the problem. It’s important to
mention this because I often see young engineers forgetting to check
if the problem is even valid anymore. I don’t blame them; the anxiety
to get the bug fixed can be really blinding.
Fortunately, I already had one simple test to trigger the failure.
All I had to do was install the py3-matplotlib package and then
invoke:
$ python3 -c 'import matplotlib'
This would result in an abort with a coredump.
I followed the steps above, and readily saw the problem manifesting
again. OK, first step is done; I wasn’t getting out easily from this
one.
Initial debug
The next step is to actually try to debug the failure. In an ideal
world you get lucky and are able to spot what’s wrong after just a few
minutes. Or even better: you also can devise a patch to fix the bug
and contribute it to upstream.
I installed GDB, and then ran the py3-matplotlib command inside it.
When the abort happened, I issued a backtrace command inside GDB
to see where exactly things had gone wrong. I got a stack trace
similar to the following:
#0 0x00007c43afe9972c in __pthread_kill_implementation () from /lib/libc.so.6
#1 0x00007c43afe3d8be in raise () from /lib/libc.so.6
#2 0x00007c43afe2531f in abort () from /lib/libc.so.6
#3 0x00007c43af84f79d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4 0x00007c43af86d4d8 in _Unwind_RaiseException () from /usr/lib/libgcc_s.so.1
#5 0x00007c43acac9014 in __cxxabiv1::__cxa_throw (obj=0x5b7d7f52fab0, tinfo=0x7c429b6fd218 <typeinfo for pybind11::attribute_error>, dest=0x7c429b5f7f70 <pybind11::reference_cast_error::~reference_cast_error() [clone .lto_priv.0]>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6 0x00007c429b5ec3a7 in ft2font__getattr__(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [clone .lto_priv.0] [clone .cold] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#7 0x00007c429b62f086 in pybind11::cpp_function::initialize<pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::scope, pybind11::sibling>(pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#1}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0] ()
from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#8 0x00007c429b603886 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
...
Huh. Initially this didn’t provide me with much information. There
was something strange seeing the abort function being called right
after _Unwind_RaiseException, but at the time I didn’t pay much
attention to it.
OK, time to expand our horizons a little. Remember when I said that
several of our packages would crash with a hardened glibc? I decided
to look for another problematic package so that I could make it crash
and get its stack trace. My thinking here is that maybe if I can
compare both traces, something will come up.
I happened to find an old discussion where Dann Frazier mentioned that
Emacs was also crashing for him. He and I share the Emacs passion,
and I totally agreed with him when he said that “Emacs crashing is
priority -1!” (I’m paraphrasing).
I installed Emacs, ran it, and voilà: the crash happened again. OK,
that was good. When I ran Emacs inside GDB and asked for a backtrace,
here’s what I got:
#0 0x00007eede329972c in __pthread_kill_implementation () from /lib/libc.so.6
#1 0x00007eede323d8be in raise () from /lib/libc.so.6
#2 0x00007eede322531f in abort () from /lib/libc.so.6
#3 0x00007eede262879d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4 0x00007eede2646e7c in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#5 0x00007eede3327b11 in backtrace () from /lib/libc.so.6
#6 0x000059535963a8a1 in emacs_backtrace ()
#7 0x000059535956499a in main ()
Ah, this backtrace is much simpler to follow. Nice.
Hmmm. Now the crash is happening inside _Unwind_Backtrace. A
pattern emerges! This must have something to do with stack unwinding
(or so I thought… keep reading to discover the whole truth). You
see, the backtrace function (yes, it’s a function) and C++’s
exception handling mechanism use similar techniques to do their jobs,
and it pretty much boils down to unwinding frames from the stack.
I looked into Emacs’ source code, specifically the emacs_backtrace
function, but could not find anything strange over there. This bug
was probably not going to be an easy fix…
The quest for a minimal reproducer
Being able to easily reproduce the bug is awesome and really helps
with debugging, but even better is being able to have a minimal
reproducer for the problem.
You see, py3-matplotlib is a huge package and pulls in a bunch of
extra dependencies, so it’s not easy to ask other people to “just
install this big package plus these other dependencies, and then run
this command…”, especially if we have to file an upstream bug and
talk to people who may not even run the distribution we’re using. So
I set up to try and come up with a smaller recipe to reproduce the
issue, ideally something that’s not tied to a specific package from
the distribution.
Having all the information gathered from the initial debug session,
especially the Emacs backtrace, I thought that I could write a very
simple program that just invoked the backtrace function from glibc
in order to trigger the code path that leads to _Unwind_Backtrace.
Here’s what I wrote:
After compiling it, I determined that yes, the problem did happen with
this small program as well. There was only a small nuisance: the
manifestation of the bug was not deterministic, so I had to execute
the program a few times until it crashed. But that’s much better than
what I had before, and a small price to pay. Having a minimal
reproducer pretty much allows us to switch our focus to what really
matters. I wouldn’t need to dive into Emacs’ or Python’s source code
anymore.
At the time, I was sure this was a glibc bug. But then something else
happened.
GCC 15
I had to stop my investigation efforts because something more
important came up: it was time to upload GCC 15 to Wolfi. I spent a
couple of weeks working on this (it involved rebuilding the whole
archive, filing hundreds of FTBFS bugs, patching some programs, etc.),
and by the end of it the transition went smooth. When the GCC 15
upload was finally done, I switched my focus back to the glibc
hardening problem.
The first thing I did was to… yes, reproduce the bug again. It had
been a few weeks since I had touched the package, after all. So I
built a hardened glibc with the latest GCC and… the bug did not
happen anymore!
Fortunately, the very first thing I thought was “this must be GCC”,
so I rebuilt the hardened glibc with GCC 14, and the bug was there
again. Huh, unexpected but very interesting.
Diving into glibc and libgcc
At this point, I was ready to start some serious debugging. And then
I got a message on Signal. It was one of those moments where two
minds think alike: Gabriel decided to check how I was doing, and I was
thinking about him because this involved glibc, and Gabriel
contributed to the project for many years. I explained what I was
doing, and he promptly offered to help. Yes, there are more people
who love low level debugging!
We spent several hours going through disassemblies of certain functions
(because we didn’t have any debug information in the beginning),
trying to make sense of what we were seeing. There was some heavy GDB
involved; unfortunately I completely lost the session’s history
because it was done inside a container running inside an ephemeral VM.
But we learned a lot. For example:
It was hard to actually understand the full stack trace leading to
uw_init_context_1[cold]. _Unwind_Backtrace obviously didn’t
call it (it called uw_init_context_1, but what was that [cold]
doing?). We had to investigate the disassembly of
uw_init_context_1 in order to determine where
uw_init_context_1[cold] was being called.
The [cold] suffix is a GCC function attribute that can be used to
tell the compiler that the function is unlikely to be reached. When
I read that, my mind immediately jumped to “this must be an
assertion”, so I went to the source code and found the spot.
We were able to determine that the return code of
uw_frame_state_for was 5, which means _URC_END_OF_STACK.
That’s why the assertion was triggering.
After finding these facts without debug information, I decided to bite
the bullet and recompiled GCC 14 with -O0 -g3, so that we could
debug what uw_frame_state_for was doing. After banging our heads a
bit more, we found that fde is NULL at this excerpt:
// ...
fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
                        &context->bases);
if (fde == NULL)
  {
#ifdef MD_FALLBACK_FRAME_STATE_FOR
    /* Couldn't find frame unwind info for this function.  Try a
       target-specific fallback mechanism.  This will necessarily
       not provide a personality routine or LSDA.  */
    return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
#else
    return _URC_END_OF_STACK;
#endif
  }
// ...
We’re debugging on amd64, which means that
MD_FALLBACK_FRAME_STATE_FOR is defined and therefore is called. But
that’s not really important for our case here, because we had
established before that _Unwind_Find_FDE would never return NULL
when using a non-hardened glibc (or a glibc compiled with GCC 15). So
we decided to look into what _Unwind_Find_FDE did.
The function is complex because it deals with .eh_frame, but we
were able to pinpoint the exact location where find_fde_tail (one of
the functions called by _Unwind_Find_FDE) is returning NULL:
if (pc < table[0].initial_loc + data_base)
  return NULL;
We looked at the addresses of pc and table[0].initial_loc + data_base, and
found that the former fell within libgcc’s text section, while the latter fell
within the text section of /lib/ld-linux-x86-64.so.2.
At this point, we were already too tired to continue. I decided to
keep looking at the problem later and see if I could get any further.
Bisecting GCC
The next day, I woke up determined to find what changed in GCC 15 that
caused the bug to disappear. Unless you know GCC’s internals like
they are your own home (which I definitely don’t), the best way to do
that is to git bisect the commits between GCC 14 and 15.
I spent a few days running the bisect. It took me more time than I’d
have liked to find the right range of commits to pass git bisect
(because of how branches and tags are done in GCC’s repository), and I
also had to write some helper scripts that:
Modified the gcc.yaml package definition to make it build with the
commit being bisected.
Built glibc using the GCC that was just built.
Ran tests inside a docker container (with the recently built glibc
installed) to determine whether the bug was present.
At the end, I had a commit to point to:
commit 99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date: Fri May 3 14:04:41 2024 +0200
tree-optimization/114589 - remove profile based sink heuristics
Makes sense, right?! No? Well, it didn’t for me either. Even after
reading what was changed in the code and the upstream bug fixed by the
commit, I was still clueless as to why this change “fixed” the problem
(I say “fixed” because it may very well be an unintended consequence
of the change, and some other problem might have been introduced).
Upstream takes over
After obtaining the commit that possibly fixed the bug, while talking
to Dann and explaining what I did, he suggested that I should file an
upstream bug and check with them. Great idea, of course.
It’s a bit long, very dense and complex, but ultimately upstream was
able to find the real problem and have a patch accepted in just two
days. Nothing like knowing the code base. The initial bug became:
In the end, the problem was indeed in how the linker defines
__ehdr_start, which, according to the code (from
elf/dl-support.c):
if (_dl_phdr == NULL)
  {
    /* Starting from binutils-2.23, the linker will define the
       magic symbol __ehdr_start to point to our own ELF header
       if it is visible in a segment that also includes the phdrs.
       So we can set up _dl_phdr and _dl_phnum even without any
       information from auxv.  */
    extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    _dl_phdr = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    _dl_phnum = __ehdr_start.e_phnum;
  }
But the following definition is the problematic one (from elf/rtld.c):
This symbol (along with its counterpart, __ehdr_end) was being
run-time relocated when it shouldn’t be. The fix that was pushed
added optimization barriers to prevent the compiler from doing the
relocations.
I don’t claim to fully understand what was done here, and Jakub’s
analysis is a thing to behold, but in the end I was able to confirm
that the patch fixed the bug. And in the end, it was indeed a glibc
bug.
Conclusion
This was an awesome bug to investigate. It’s one of those that
deserve a blog post, even though some of the final details of the fix
flew over my head.
I’d like to start blogging more about these sort of bugs, because I’ve
encountered my fair share of them throughout my career. And it was
great being able to do some debugging with another person, exchange
ideas, learn things together, and ultimately share that deep
satisfaction when we find why a crash is happening.
I have at least one more bug in my TODO list to write about (another
one with glibc, but this time I was able to get to the end of it and
come up with a patch). Stay tuned.
P.S.: After having published the post I realized that I forgot to
explain why the -z now and -fno-strict-aliasing flags were
important.
-z now is the flag that I determined to be the root cause of the
breakage. If I compiled glibc with every hardening flag except -z now, everything worked. So initially I thought that the problem had
to do with how ld.so was resolving symbols at runtime. As it turns
out, this ended up being more a symptom than the real cause of the
bug.
As for -fno-strict-aliasing, a Gentoo developer who commented on the
GCC bug above mentioned that this OpenSSF bug had a good point against
using this flag for hardening. I still have to do a deep dive on what
was discussed in the issue, but this is certainly something to take
into consideration. There’s this very good write-up about strict
aliasing in general if you’re interested in understanding it better.
Everybody is trying out AI assistants these days, so I figured I'd jump on that train and see how fast it derails.
I went with CodeRabbit because I've seen it on YouTube — ads work, I guess.
I am trying to answer the following questions:
Did the AI find things that humans did not find (or didn't bother to mention)?
Did the AI output help the humans with the review (useful summary etc.)?
Did the AI output help the humans with the code (useful suggestions etc.)?
Was the AI output misleading?
Was the AI output distracting?
To reduce the amount of output and not to confuse contributors, CodeRabbit was configured to only do reviews on demand.
What follows is a rather unscientific evaluation of CodeRabbit based on PRs in two Foreman-related repositories,
looking at the summaries CodeRabbit posted as well as the comments/suggestions it had about the code.
The summary CodeRabbit posted is technically correct.
This update introduces several changes across CI configuration, Ansible roles, plugins, and test playbooks. It expands CI test coverage to a new Ansible version, adjusts YAML key types in test variables, refines conditional logic in Ansible tasks, adds new default variables, and improves clarity and consistency in playbook task definitions and debug output.
Yeah, it does all of that, all right.
But it kinda misses the point that the addition here is "Ansible 2.19 support", which starts with adding it to the CI matrix and then adjusting the code to actually work with that version.
Also, the changes are not for "clarity" or "consistency", they are fixing bugs in the code that the older Ansible versions accepted, but the new one is more strict about.
Then it adds a table with the changed files and what changed in there.
To me, as the author, it felt redundant, and IMHO doesn't add any clarity to understand the changes.
(And yes, same "clarity" vs bugfix mistake here, but that makes sense as it apparently miss-identified the change reason)
And then the sequence diagrams…
They probably help if you have a dedicated change to a library or a library consumer,
but for this PR it's just noise, especially as it only covers two of the changes (addition of 2.19 to the test matrix and a change to the inventory plugin), completely ignoring other important parts.
Overall verdict: noise, don't need this.
comments posted
CodeRabbit also posted 4 comments/suggestions to the changes.
Guard against undefined result.task
IMHO a valid suggestion, even if on the picky side as I am not sure how to make it undefined here.
I ended up implementing it, even if with slightly different (and IMHO better readable) syntax.
Valid complaint? Probably.
Useful suggestion? So-So.
Wasted time? No.
Inconsistent pipeline in when for composite CV versions
That one was funny! The original complaint was that the when condition used slightly different data manipulation than the data that was passed when the condition was true.
The code was supposed to do "clean up the data, but only if there are any items left after removing the first 5, as we always want to keep 5 items".
And I do agree with the analysis that it's badly maintainable code.
But the suggested fix was to re-use the data in the variable we later use for performing the cleanup.
While this is (to my surprise!) valid Ansible syntax, it didn't make the code much more readable as you need to go and look at the variable definition.
The better suggestion then came from Ewoud: to compare the length of the data with the number we want to keep.
Humans, so smart!
But Ansible is not Ewoud's native turf, so he asked whether there is a more elegant way to count how much data we have than to use | list | count in Jinja (the data comes from a Python generator, so needs to be converted to a list first).
And the AI helpfully suggested to use | count instead!
However, count is just an alias for length in Jinja, so it behaves identically and needs a list.
Luckily the AI quickly apologized for being wrong after being pointed at the Jinja source and didn't try to waste my time any further.
Had I not known about the count alias, we'd have committed that suggestion and let CI fail, before reverting it again (see the quick check after the verdict below).
Valid complaint? Yes.
Useful suggestion? Nope.
Wasted time? Yes.
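For the record, a quick way to convince yourself of the count/length point (a small sketch of my own, not part of the review):
from jinja2 import Environment

env = Environment()

# count behaves exactly like length: it needs something with a length,
# so a bare generator has to go through | list first.
tmpl = "{{ items | list | count }}"
print(env.from_string(tmpl).render(items=(x for x in range(5))))   # -> 5

try:
    env.from_string("{{ items | count }}").render(items=(x for x in range(5)))
except TypeError as err:
    print("plain | count on a generator fails:", err)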
Apply the same fix for non-composite CV versions
The very same complaint was posted a few lines later, as the logic there is very similar — just slightly different data to be filtered and cleaned up.
Interestingly, here the suggestion also was to use the variable.
But there is no variable with the data!
The text actually says one needs to "define" it, yet the "committable suggestion" doesn't contain that part.
Interestingly, when asked where it sees the "inconsistency" in that hunk, it said the inconsistency is with the composite case above.
That however is nonsense, as while we want to keep the same number of composite and non-composite CV versions,
the data used in the task is different — it even gets consumed by a totally different playbook — so there can't be any real consistency between the branches.
Valid complaint? Yes (the expression really could use some cleanup).
Useful suggestion? Nope.
Wasted time? Yes.
I ended up applying the same logic as suggested by Ewoud above, as that refactoring was possible in a consistent way.
Ensure consistent naming for Oracle Linux subscription defaults
One of the changes in Ansible 2.19 is that Ansible fails when there are undefined variables, even if they are only undefined for cases where they are unused.
CodeRabbit complains that the names of the defaults I added are inconsistent.
And that is technically correct.
But those names are already used in other places in the code, so I'd have to refactor more to make it work properly.
Once pointed at the fact that the variables already exist,
the AI was, as usual, quick to apologize, yay.
The repository module was updated to support additional parameters for repository synchronization and authentication. New options were added for ansible collections, ostree, Python packages, and yum repositories, including authentication tokens, filtering controls, and version retention settings. All changes were limited to module documentation and argument specification.
But it doesn't add anything you'd not get from looking at the diff, especially as it contains a large documentation chunk explaining those parameters.
No sequence diagram this time.
That's a good thing!
Overall verdict: noise (even if the amount is small), don't need this.
comments posted
CodeRabbit generated two comments for this PR.
Interestingly, none of them overlapped with the issues ansible-lint and friends found.
get rid of the FIXMEs
Yepp, that's fair.
Valid complaint? Yes.
Useful suggestion? Nope. (But it's not possible in this case!)
Wasted time? No.
add validation for the new parameters
Yepp, I forgot these (not intentionally!).
The diff it suggests is nonsense, as it doesn't take into account the existing Ansible and Yum validations, but it clearly has read them, as the style etc. of the new ones matches.
It also managed to group the parameters correctly by repository type, so it's something.
 if module.foreman_params['content_type'] != 'ansible_collection':
     invalid_list = [key for key in ['ansible_collection_requirements'] if key in module.foreman_params]
     if invalid_list:
         module.fail_json(msg="({0}) can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ansible_collection specific parameters
+if module.foreman_params['content_type'] != 'ansible_collection':
+    invalid_list = [key for key in ['ansible_collection_auth_token', 'ansible_collection_auth_url'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="({0}) can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ostree specific parameters
+if module.foreman_params['content_type'] != 'ostree':
+    invalid_list = [key for key in ['depth', 'exclude_refs', 'include_refs'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="({0}) can only be used with content_type 'ostree'".format(",".join(invalid_list)))
+
+# Validate python package specific parameters
+if module.foreman_params['content_type'] != 'python':
+    invalid_list = [key for key in ['excludes', 'includes', 'package_types', 'keep_latest_packages'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="({0}) can only be used with content_type 'python'".format(",".join(invalid_list)))
+
+# Validate yum specific parameter
+if module.foreman_params['content_type'] != 'yum' and 'upstream_authentication_token' in module.foreman_params:
+    module.fail_json(msg="upstream_authentication_token can only be used with content_type 'yum'")
Interestingly, it also said "Note: If 'python' is not a valid content_type, please adjust the validation accordingly." which is quite a hint at a bug in itself.
The module currently does not even allow to create content_type=python repositories.
That should have been more prominent, as it's a BUG!
Valid complaint? Yes.
Useful suggestion? Mostly (I only had to merge the Yum and Ansible branches with the existing code).
It did mis-interpret the change to a test playbook as an actual "behavior" change:
"Introduced new playbook variables for database configuration" — there is no database configuration in this repository, just the test playbook using the same metadata as a consumer of the library.
Later on it does say "Playbook metadata and test fixtures", so… it's unclear whether this is a mis-interpretation or just badly summarized.
As long as you also look at the diff, it won't confuse you, but if you're using the summary as the sole source of information (bad!) it would.
This time the sequence diagram is actually useful, yay.
Again, not 100% accurate: it's missing the fact that saving the parameters is hidden behind an "if enabled" flag — something it did represent correctly for loading them.
Overall verdict: not really useful, don't need this.
comments posted
Here I was a bit surprised, especially as the nitpicks were useful!
Persist-path should respect per-user state locations (nitpick)
My original code used os.environ.get('OBSAH_PERSIST_PATH', '/var/lib/obsah/parameters.yaml') for the location of the persistence file.
CodeRabbit correctly pointed out that this won't work for non-root users and one should respect XDG_STATE_HOME.
Ewoud did point that out in his own review, so I am not sure whether CodeRabbit came up with this on its own, or also took the human comments into account.
The suggested code seems fine too — just doesn't use /var/lib/obsah at all anymore.
This might be a good idea for the generic library we're working on here, and could then be overridden to a static /var/lib path in a consumer (which always runs as root).
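As an illustration only (not the diff CodeRabbit posted), a minimal sketch of such an XDG-aware default could look like this; the helper name is made up for the example:

import os

def default_persist_path() -> str:
    # Hypothetical helper: root keeps a static system path,
    # everyone else falls back to XDG_STATE_HOME (default ~/.local/state).
    if os.geteuid() == 0:
        return '/var/lib/obsah/parameters.yaml'
    state_home = os.environ.get('XDG_STATE_HOME', os.path.expanduser('~/.local/state'))
    return os.environ.get('OBSAH_PERSIST_PATH', os.path.join(state_home, 'obsah', 'parameters.yaml'))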
In the end I did not implement it, but mostly because I was lazy and was sure we'd override it anyway.
Valid complaint? Yes.
Useful suggestion? Yes.
Wasted time? Nope.
Positional parameters are silently excluded from persistence (nitpick)
The library allows you to generate both positional (foo without --) and non-positional (--foo) parameters, but the code I wrote would only ever persist non-positional parameters.
This was intentional, but there is no documentation of the intent in a comment — which the rabbit thought would be worth pointing out.
It's a fair nitpick and I ended up adding a comment.
Valid complaint? Yes.
Useful suggestion? Yes.
Wasted time? Nope.
Enforce FQDN validation for database_host
The library has a way to perform type checking on passed parameters, and one of the supported types is "FQDN" — so a fully qualified domain name, with dots and stuff.
The test playbook I added has a database_host variable, but I didn't bother adding a type to it, as I don't really need any type checking here.
While using "FQDN" might be a bit too strict here — technically a working database connection can also use a non-qualified name or an IP address, I was positively surprised by this suggestion.
It shows that the rest of the repository was taken into context when preparing the suggestion.
Valid complaint? In the context of a test, no. Would that be a real command definition, yes.
Useful suggestion? Yes.
Wasted time? Nope.
reset_args() can raise AttributeError when a key is absent
This is a correct finding, the code is not written in a way that would survive if it tries to reset things that are not set.
However, that's only true for the case where users pass in --reset-<parameter> without ever having set the parameter before.
The complaint about the part where the parameter is part of the persisted set but not in the parsed args is wrong — as parsed args inherit from the persisted set.
The suggested code is not very readable, so I ended up fixing it slightly differently.
Valid complaint? Mostly.
Useful suggestion? Meh.
Wasted time? A bit.
Persisted values bypass argparse type validation
When persisting, I just yaml.safe_dump the parsed parameters, which means the YAML will contain native types like integers.
The argparse documentation warns that the type checking argparse does only applies to strings, and is skipped if you pass anything else (e.g. via default values).
While correct, it doesn't really hurt here as the persisting only happens after the values were type-checked.
So there is not really a reason to type-check them again.
Well, unless the type changes, anyway.
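For reference, here is a tiny, self-contained illustration of that argparse behaviour (nothing to do with the actual obsah code):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--port', type=int)

# Command-line values arrive as strings, so type= is applied:
# parser.parse_args(['--port', 'nonsense'])  # would exit with "invalid int value"

# Non-string defaults (e.g. values loaded back from YAML) skip the type= check entirely:
parser.set_defaults(port=[1, 2, 3])
print(parser.parse_args([]).port)  # prints [1, 2, 3], no complaint from argparse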
Not sure what I'll do with this comment.
Valid complaint? Nah.
Useful suggestion? Nope.
Wasted time? Not much.
consider using contextlib.suppress
This was added when I asked CodeRabbit for a re-review after pushing some changes.
Interestingly, the PR already contained try: … except: pass code before, and it did not flag that.
Also, the code suggestion contained import contextlib in the middle of the code, instead of at the top of the file.
Who would do that?!
But the comment as such was valid, so I fixed it in all places it is applicable, not only the one the rabbit found.
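For reference, the pattern in question, as a minimal sketch (the file name is just a placeholder):

import contextlib
import os

# before: swallow the error manually
try:
    os.remove('/tmp/example-state.yaml')
except FileNotFoundError:
    pass

# after: same behaviour, intent spelled out
with contextlib.suppress(FileNotFoundError):
    os.remove('/tmp/example-state.yaml')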
Valid complaint? Yes.
Useful suggestion? Nope.
Wasted time? Nope.
workaround to ensure LCE and CV are always sent together
A workaround was added to the _update_entity method in the ForemanAnsibleModule class to ensure that when updating a host, both content_view_id and lifecycle_environment_id are always included together in the update payload. This prevents partial updates that could cause inconsistencies.
Partial updates are not a thing.
The workaround is purely for the fact that Katello expects both parameters to be sent,
even if only one of them needs an actual update.
No diagram, good.
Overall verdict: misleading summaries are bad!
comments posted
Given a small patch, there was only one comment.
Implementation looks correct, but consider adding error handling for robustness.
This reads correct at first glance.
More error handling is always better, right?
But if you dig into the argumentation, you see it's wrong.
Either:
we're working with a Katello setup and the host we're updating has content, so CV and LCE will be present
we're working with a Katello setup and the host has no content (yet), so CV and LCE will be "updated" and we're not running into the workaround
we're working with a plain Foreman, then both parameters are not even accepted by Ansible
The AI accepted defeat once I asked it to analyze things in more detail, but why did I have to ask in the first place?!
Valid complaint? Nope.
Useful suggestion? Nope.
Wasted time? Yes, as I've actually tried to come up with a case where it can happen.
Summary
Well, idk, really.
Did the AI find things that humans did not find (or didn't bother to mention)?
Yes. It's debatable whether these were useful (see e.g. the database_host example), but I tend to be in the "better to nitpick/suggest more and dismiss than overlook" team, so IMHO a positive win.
Did the AI output help the humans with the review (useful summary etc)?
In my opinion it did not.
The summaries were either "lots of words, no real value" or plain wrong.
The sequence diagrams were not useful either.
Luckily all of that can be turned off in the settings, which is what I'd do if I were to continue using it.
Did the AI output help the humans with the code (useful suggestions etc)?
While the actual patches it posted were "meh" at best, there were useful findings that resulted in improvements to the code.
Was the AI output misleading?
Absolutely! The whole Jinja discussion would have been easier without the AI "help".
Same applies for the "error handling" in the workaround PR.
Was the AI output distracting?
The output is certainly a lot, so yes I think it can be distracting.
As mentioned, I think dropping the summaries can make the experience less distracting.
What does all that mean?
I will disable the summaries for the repositories, but will leave the @coderabbitai review trigger active if someone wants an AI-assisted review.
This won't be something that I'll force on our contributors and maintainers, but they surely can use it if they want.
But I don't think I'll be using this myself on a regular basis.
Yes, it can be made "usable". But so can vim ;-)
Also, I'd prefer to have a junior human asking all the questions and making bad suggestions, so they can learn from it, and not some planet burning machine.
I'm lucky enough to have a weird niche ISP available to me, so I'm paying $35 a month for around 600MBit symmetric data. Unfortunately they don't offer static IP addresses to residential customers, and nor do they allow multiple IP addresses per connection, and I'm the sort of person who'd like to run a bunch of stuff myself, so I've been looking for ways to manage this.
What I've ended up doing is renting a cheap VPS from a vendor that lets me add multiple IP addresses for minimal extra cost. The precise nature of the VPS isn't relevant - you just want a machine (it doesn't need much CPU, RAM, or storage) that has multiple world routeable IPv4 addresses associated with it and has no port blocks on incoming traffic. Ideally it's geographically local and peers with your ISP in order to reduce additional latency, but that's a nice to have rather than a requirement.
By setting that up you now have multiple real-world IP addresses that people can get to. How do we get them to the machine in your house you want to be accessible? First we need a connection between that machine and your VPS, and the easiest approach here is Wireguard. We only need a point-to-point link, nothing routable, and none of the IP addresses involved need to have anything to do with any of the rest of your network. So, on your local machine you want something like:
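A minimal sketch of that local-side config (keys, port and the VPS addresses are placeholders; 867.420.696.005 is the made-up local Wireguard address used below):

[Interface]
Address = 867.420.696.005/32
PrivateKey = <local machine private key>

[Peer]
PublicKey = <VPS public key>
Endpoint = <VPS public IP>:51820
AllowedIPs = <VPS wireguard address>/32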
The addresses here are (other than the VPS address) arbitrary - but they do need to be consistent, otherwise Wireguard is going to be unhappy and your packets will not have a fun time. Bring that interface up with wg-quick and make sure the devices can ping each other. Hurrah! That's the easy bit.
Now you want packets from the outside world to get to your internal machine. Let's say the external IP address you're going to use for that machine is 321.985.520.309 and the wireguard address of your local system is 867.420.696.005. On the VPS, you're going to want to do:
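One way to express that rewrite, using iptables and the made-up addresses from above (nftables would do the job just as well):

iptables -t nat -A PREROUTING -d 321.985.520.309 -j DNAT --to-destination 867.420.696.005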
Now, all incoming packets for 321.985.520.309 will be rewritten to head towards 867.420.696.005 instead (make sure you've set net.ipv4.ip_forward to 1 via sysctl!). Victory! Or is it? Well, no.
What we're doing here is rewriting the destination address of the packets so instead of heading to an address associated with the VPS, they're now going to head to your internal system over the Wireguard link. Which is then going to ignore them, because the AllowedIPs statement in the config only allows packets coming from your VPS, and these packets still have their original source IP. We could rewrite the source IP to match the VPS IP, but then you'd have no idea where any of these packets were coming from, and that sucks. Let's do something better. On the local machine, in the peer, let's update AllowedIPs to 0.0.0.0/0 to permit packets from any source to appear over our Wireguard link. But if we bring the interface up now, it'll try to route all traffic over the Wireguard link, which isn't what we want. So we'll add table = off to the interface stanza of the config to disable that, and now we can bring the interface up without breaking everything but still allowing packets to reach us. However, we do still need to tell the kernel how to reach the remote VPN endpoint, which we can do with ip route add vpswgaddr dev wg0. Add this to the interface stanza as:
PostUp = ip route add vpswgaddr dev wg0
PreDown = ip route del vpswgaddr dev wg0
That's half the battle. The problem is that they're going to show up there with the source address still set to the original source IP, and your internal system is (because Linux) going to notice it has the ability to just send replies to the outside world via your ISP rather than via Wireguard and nothing is going to work. Thanks, Linux. Thinux.
But there's a way to solve this - policy routing. Linux allows you to have multiple separate routing tables, and define policy that controls which routing table will be used for a given packet. First, let's define a new table reference. On the local machine, edit /etc/iproute2/rt_tables and add a new entry that's something like:
1 wireguard
where "1" is just a standin for a number not otherwise used there. Now edit your wireguard config and replace table=off with table=wireguard - Wireguard will now update the wireguard routing table rather than the global one. Now all we need to do is to tell the kernel to push packets into the appropriate routing table - we can do that with ip rule add from localaddr lookup wireguard, which tells the kernel to take any packet coming from our Wireguard address and push it via the Wireguard routing table. Add that to your Wireguard interface config as:
PostUp = ip rule add from localaddr lookup wireguard
PreDown = ip rule del from localaddr lookup wireguard
and now your local system is effectively on the internet.
You can do this for multiple systems - just configure additional Wireguard interfaces on the VPS and make sure they're all listening on different ports. If your local IP changes then your local machines will end up reconnecting to the VPS, but to the outside world their accessible IP address will remain the same. It's like having a real IP without the pain of convincing your ISP to give it to you.
The Internet has changed a lot in the last 40+ years. Fads have come and gone.
Network protocols have been designed, deployed, adopted, and abandoned.
Industries have come and gone. The types of people on the internet have changed
a lot. The number of people on the internet has changed a lot, creating an
information medium unlike anything ever seen before in human history. There’s a
lot of good things about the Internet as of 2025, but there’s also an
inescapable hole in what it used to be, for me.
I miss being able to throw a site up to send around to friends to play with
without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my
website, costing me thousands in network transfer for the privilege. I miss
being able to put a lightly authenticated game server up and not worry too much
at night – wondering if that process is now mining bitcoin. I miss being able
to run a server in my home closet. Decades of cat and mouse games have rendered
running a mail server nearly impossible. Those who are “brave” enough to try
are met with weekslong stretches of delivery failures and countless hours
yelling ineffectually into a pipe that leads from the cheerful lobby of some
disinterested corporation directly into a void somewhere 4 layers below ground
level.
I miss the spirit of curiosity, exploration, and trying new things. I miss
building things for fun without having to worry about being too successful,
after which “security” offices start demanding my supplier paperwork in
triplicate as heartfelt thanks from their engineering teams. I miss communities
that are run because it is important to them, not for ad revenue. I miss
community operated spaces and having more than four websites that are all full
of nothing except screenshots of each other.
Every other page I find myself on now has an AI generated click-bait title,
shared for rage-clicks all brought-to-you-by-our-sponsors–completely covered
wall-to-wall with popup modals, telling me how much they respect my privacy,
with the real content hidden at the bottom bracketed by deceptive ads served by
companies that definitely know which new coffee shop I went to last month.
This is wrong, and those who have seen what was know it.
I can’t keep doing it. I’m not doing it any more. I reject the notion that
this is as it needs to be. It is wrong. The hole left in what the Internet used
to be must be filled. I will fill it.
What comes before part b?
Throughout the 2000s, some of my favorite memories were from LAN parties at my
friends’ places. Dragging your setup somewhere, long nights playing games,
goofing off, even building software all night to get something working—being
able to do something fiercely technical in the context of a uniquely social
activity. It wasn’t really much about the games or the projects—it was an
excuse to spend time together, just hanging out. A huge reason I learned so
much in college was that campus was a non-stop LAN party – we could freely
stand up servers, talk between dorms on the LAN, and hit my dorm room computer
from the lab. Things could go from individual to social in the matter of
seconds. The Internet used to work this way—my dorm had public IPs handed out
by DHCP, and my workstation could serve traffic from anywhere on the internet.
I haven’t been back to campus in a few years, but I’d be surprised if this were
still the case.
In December of 2021, three of us got together and connected our houses together
in what we now call The Promised LAN. The idea is simple—fill the hole we feel
is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a
space that is intrinsically social, even though we’re doing technical things.
We can freely host insecure game servers or one-off side projects without
worrying about what someone will do with it.
Over the years, it’s evolved very slowly—we haven’t pulled any all-nighters.
Our mantra has become “old growth”, building each layer carefully. As of May
2025, the LAN is now 19 friends running around 25 network segments. Those 25
networks are connected to 3 backbone nodes, exchanging routes and IP traffic
for the LAN. We refer to the set of backbone operators as “The Bureau of LAN
Management”. Combined decades of operating critical infrastructure have
driven The Bureau to make a set of well-understood, boring, predictable,
interoperable and easily debuggable decisions to make this all happen.
Nothing here is exotic or even technically interesting.
Applications of trusting trust
The hardest part, however, is rejecting the idea that anything outside our own
LAN is untrustworthy—nearly irreversible damage inflicted on us by the
Internet. We have solved this by not solving it. We strictly control
membership—the absolute hard minimum for joining the LAN requires 10 years of
friendship with at least one member of the Bureau, with another 10 years of
friendship planned. Members of the LAN can veto new members even if all other
criteria are met. Even with those strict rules, there’s no shortage of friends
that meet the qualifications—but we are not equipped to take that many folks
on. It’s hard to join—both socially and technically. Doing something malicious
on the LAN requires a lot of highly technical effort upfront, and it would
endanger a decade of friendship. We have relied on those human, social,
interpersonal bonds to bring us all together. It’s worked for the last 4 years,
and it should continue working until we think of something better.
We assume roommates, partners, kids, and visitors all have access to The
Promised LAN. If they’re let into our friends' network, there is a level of
trust that works transitively for us—I trust them to be on mine. This LAN is
not for “security”, rather, the network border is a social one. Benign
“hacking”—in the original sense of misusing systems to do fun and interesting
things—is encouraged. Robust ACLs and firewalls on the LAN are, by definition,
an interpersonal—not technical—failure. We all trust every other network
operator to run their segment in a way that aligns with our collective values
and norms.
Over the last 4 years, we’ve grown our own culture and fads—around half of the
people on the LAN have thermal receipt printers with open access, for printing
out quips or jokes on each other’s counters. It’s incredible how much network
transport and a trusting culture gets you—there’s a 3-node IRC network, exotic
hardware to gawk at, radios galore, a NAS storage swap, LAN only email, and
even a SIP phone network of “redphones”.
DIY
We do not wish to, nor will we, rebuild the internet. We do not wish to, nor
will we, scale this. We will never be friends with enough people, as hard as we
may try. Participation hinges on us all having fun. As a result, membership
will never be open, and we will never have enough connected LANs to deal with
the technical and social problems that start to happen with scale. This is a
feature, not a bug.
This is a call for you to do the same. Build your own LAN. Connect it with
friends’ homes. Remember what is missing from your life, and fill it in. Use
software you know how to operate and get it running. Build slowly. Build your
community. Do it with joy. Remember how we got here. Rebuild a community space
that doesn’t need to be mediated by faceless corporations and ad revenue. Build
something sustainable that brings you joy. Rebuild something you use daily.
Took some time yesterday to upload the current state of what will
be at some point vym 3 to experimental. If you're a user of this
tool you can give it a try, but be aware that the file format changed, and
can't be processed with vym releases before 2.9.500! Thus it's
important to create a backup until you're sure that you're ready
to move on. On the technical side this is also the switch from Qt5 to Qt6.
I was not aware that one can write bad Markdown, since Markdown has such a
simple syntax, that I thought you just write, and it’s fine. Naïve, I know!
I’ve started editing the files for this blog/site with Visual Studio Code too,
and I had from another project the markdown lint
extension
installed, so as I was opening old files, more and more problems appeared. On a
whim, I searched and found the “lint all files” command, and after running it,
oops—more than 400 problems!
Now, some of them were entirely trivial and a matter of subjective style, like
mixing both underscore and asterisk for emphasis in a single file, and asterisks
and dashes for list items. Others, seemingly trivial like tab indentation, were
actually also causing rendering issues, so fixing that solved a real cosmetic
issue.
But some of the issues flagged were actual problems. For example, one sentence
that I had, was:
there seems to be some race condition between <something> and ntp
Here “something” was interpreted as an (invalid) HTML tag, and not rendered at
all.
Another problem, but more minor, was that I had links to Wikipedia with spaces
in the link name, which Visual Studio Code breaks at first space, rather than
encoded spaces or underscores-based, as Wikipedia generates today. In the
rendered output, Pandoc seemed to do the right thing though.
However, the most interesting issue flagged was non-descriptive link text, i.e. links of the form:
for more details, see [here](http://example.com).
Which works for non-visually impaired people, but not for people using assistive
technologies. And while trying to fix this, it turns out that you can do much
better, for everyone, because “here” is really non-descriptive. You can use either the content as label (“an article about configuring BIND”), or the destination (“an article on this-website”), rather than the plain “here”.
The only, really only check I disabled, was tweaking the trailing punctuation
checks in headers, as I really like to write a header that ends with exclamation
marks. I like exclamation marks in general! So why not use them in headers too.
The question mark is allowlisted by default, though I use that rarely.
During the changes/tweaks, I also did random improvements, but I didn’t change
the updated tag, since most of them were minor. But a non-minor thing was
tweaking the CSS for code blocks, since I had a really stupid asymmetry between top and bottom padding (5px vs 0), which I don’t know where it came
from. But the MDN article on
padding has as an
example exactly what I had (except combined, I had it split). Did I just copy
blindly? Possible…
So, all good then, and I hope this doesn’t trigger a flow of updates on any
aggregators, since all the changes were really trivial. And while I don’t write
often, I did touch about 60 posts or pages, ouch! Who knew that changing editors
can have such a large impact 😆
This time I seem to be settling on either Commit Mono or Space
Mono. For now I'm using Commit Mono because it's a little more
compressed than Fira and does have an italic version. I don't like how Space Mono's parentheses (()) are "squarish": they feel visually ambiguous with the square brackets ([]), a big no-no for my primary
use case (code).
So here I am using a new font, again. It required changing a bunch of
configuration files in my home directory (which is in a private
repository, sorry) and Emacs configuration (thankfully that's
public!).
One gotcha is I realized I didn't actually have a global font
configuration in Emacs, as some Faces define their own font
family, which overrides the frame defaults.
This is what it looks like, before:
Fira Mono
After:
Commit Mono
(Notice how those screenshots are not sharp? I'm surprised too. The
originals look sharp on my display, I suspect this is something to
do with the Wayland transition. I've tried with both grim and
flameshot, for what it's worth. Update: turns out this is a really
complicated issue having to do with displaying images as well as
screenshots, see the issues in shotman and grim.)
And here is an update of those in a single screenshot with the new
test sheet:
Fira and Commit mono with the new test sheet, generated
with foot -W 80x63 -T pop-up -f 'Commit mono:size=12' --hold sh -c
"sed -n '/```/,/```/{/```/d;p}' *fonts-again.md ; printf 'Commit
mono'" 2>/dev/null and foot -W 80x61 -T pop-up -f 'Fira
mono:size=12' --hold sh -c "sed -n '/```/,/```/{/```/d;p}'
*fonts-again.md ; printf 'Fira mono'" 2>/dev/null.
They are pretty similar! Commit Mono feels a bit more vertically
compressed, maybe too much so, actually -- the line height feels too
low. But it's heavily customizable so that's something that's
relatively easy to fix, if it's really a problem. Its weight is also a
little heavier and wider than Fira which I find a little distracting
right now, but maybe I'll get used to it.
I like how the ampersand (&) is more traditional, although I'll miss
the exotic one Fira produced... I like how the back quotes (`,
GRAVE ACCENT) drop down low, nicely aligned with the apostrophe. As
I mentioned before, I like how the bar on the "f" aligns with the tops of the other letters, something that really annoys me in Fira Mono now that I've noticed it (it's not aligned!).
A UTF-8 test file
Here's the test sheet I've made up to test various characters. I could
have sworn I had a good one like this lying around somewhere but
couldn't find it so here it is, I guess.
So there you have it, got completely nerd swiped by typography
again. Now I can go back to writing a too-long proposal again.
Sources and inspiration for the above:
the unicode(1) command, to lookup individual characters to
disambiguate, for example, - (U+002D HYPHEN-MINUS, the minus
sign next to zero on US keyboards) and − (U+2212 MINUS SIGN, a
math symbol)
searchable list of characters and their names - roughly
equivalent to the unicode(1) command, but in one page, amazingly
the /usr/share/unicode database doesn't have any one file like
this
UTF-8 encoded plain text file - nice examples of edge cases,
curly quotes example and box drawing alignment test which,
incidentally, showed me I needed specific faces customisation in
Emacs to get the Markdown code areas to display properly, also the
idea of comparing various dashes
In my previous blog post about fonts, I
had a list of alternative fonts, but it seems people are not digging
through this, so I figured I would redo the list here to preempt "but
have you tried Jetbrains mono" kind of comments.
My requirements are:
no ligatures: yes, in the previous post, I wanted ligatures but
I have changed my mind. after testing this, I find them distracting,
confusing, and they often break the monospace nature of the display
(note that some folks wrote emacs code to selectively enable
ligatures, which is an interesting compromise)
monospace: this is to display code
italics: often used when writing Markdown, where I do make use of
italics... Emacs falls back to underlining text when lacking italics
which is hard to read
free-ish, ultimately should be packaged in Debian
Here is the list of alternatives I have considered in the past and why
I'm not using them:
agave: recommended by tarzeau, not sure I like the lowercase
a, a bit too exotic, packaged as fonts-agave
Cascadia code: optional ligatures, multilingual, not liking the
alignment, ambiguous parenthesis (look too much like square
brackets), new default for Windows Terminal and Visual Studio,
packaged as fonts-cascadia-code
Fira Code: ligatures, was using Fira Mono from which it is derived,
lacking italics except for forks, interestingly, Fira Code succeeds
the alignment test but Fira Mono fails to show the X signs properly!
packaged as fonts-firacode
Hack: no ligatures, very similar to Fira, italics, good
alternative, fails the X test in box alignment, packaged as
fonts-hack
IBM Plex: irritating website, replaces Helvetica as the IBM
corporate font, no ligatures by default, italics, proportional alternatives,
serifs and sans, multiple languages, partial failure in box alignment test (X signs),
fancy curly braces contrast perhaps too much with the rest of the
font, packaged in Debian as fonts-ibm-plex
Inconsolata: no ligatures, maybe italics? more compressed than
others, feels a little out of balance because of that, packaged in
Debian as fonts-inconsolata
Intel One Mono: nice legibility, no ligatures, alignment issues
in box drawing, not packaged in Debian
Iosevka: optional ligatures, italics, multilingual, good
legibility, has a proportional option, serifs and sans, line height
issue in box drawing, fails dash test, not in Debian
Monoid: optional ligatures, feels much "thinner" than
Jetbrains, not liking alignment or spacing on that one, ambiguous
2Z, problems rendering box drawing, packaged as fonts-monoid
Mononoki: no ligatures, looks good, good alternative, suggested
by the Debian fonts team as part of fonts-recommended, problems
rendering box drawing, em dash bigger than en dash, packaged as
fonts-mononoki
spleen: bitmap font, old school, spacing issue in box drawing
test, packaged as fonts-spleen
sudo: personal project, no ligatures, zero originally not
dotted, relied on metrics for legibility, spacing issue in box
drawing, not in Debian
victor mono: italics are cursive by default (distracting),
ligatures by default, looks good, more compressed than commit mono,
good candidate otherwise, has a nice and compact proof sheet
So, if I get tired of Commit Mono, I'd probably try, in order:
Hack
Jetbrains Mono
IBM Plex Mono
Iosevka, Mononoki and Intel One Mono are also good options, but have
alignment problems. Iosevka is particularly disappointing as the EM
DASH metrics are just completely wrong (much too wide).
Also note that there is now a package in Debian called fnt to
manage fonts like this locally, including in-line previews (that don't
work in bookworm but should be improved in trixie and later).
Today we reconnect to a previous post, namely #36
on pub/sub for live market monitoring with R and Redis. It
introduced both Redis as well as the
(then fairly recent) extensions to RcppRedis to
support the publish-subscribe (“pub/sub”) model of Redis. In short, it manages both subscribing clients as well as producers for live, fast and lightweight data transmission. Using pub/sub is generally more efficient than the (conceptually simpler) ‘poll-sleep’ loops, as polling creates CPU and network load. Subscriptions are lighter-weight as they get notified; they
are also a little (but not much!) more involved as they require a
callback function.
We should mention that Redis has a
recent fork in Valkey that arose when
the former did one of these not-uncommon-among-db-companies license
suicides—which, happy to say, they reversed more recently—so that we now
have both the original as well as this leading fork (among others). Both
work, the latter is now included in several Linux distros, and the C
library hiredis used to
connect to either is still permissively licensed as well.
All this came about because Yahoo! Finance recently had another
‘hiccup’ in which they changed something leading to some data clients
having hiccups. This includes GNOME applet Stocks Extension
I had been running. There is a lively discussion on its issue #120 suggesting for example a curl wrapper (which then makes each
access a new system call).
Separating data acquisition and presentation
becomes an attractive alternative, especially given how the standard
Python and R accessors to the Yahoo! Finance service continued to work
(and how per post
#36 I already run data acquisition). Moreover, and somewhat
independently, it occurred to me that the cute (and both funny in its
pun, and very pretty in its display) ActivateLinux
program might offer an easy-enough way to display updates on the
desktop.
There were two aspects to address. First, the subscription side
needed to be covered in either plain C or C++. That, it turns out, is
very straightforward and there are existing documentation and prior
examples (e.g. at StackOverflow) as well as the ability to have an LLM
generate a quick stanza as I did with Claude. A modified variant is now
in the example
repo ‘redis-pubsub-examples’ in file subscriber.c.
It is deliberately minimal and the directory does not even have a
Makefile: just compile and link against both
libevent (for the event loop controlling this) and
libhiredis (for the Redis or Valkey connection). This
should work on any standard Linux (or macOS) machine with those two
(very standard) libraries installed.
The second aspect was trickier. While we can get Claude to modify the
program to also display under x11, it still uses a single controlling
event loop. It took a little bit of probing on my end to understand
how to modify (the x11 use of) ActivateLinux,
but as always it was reasonably straightforward in the end: instead of
one single while loop awaiting events we now first check
for pending events and deal with them if present but otherwise do not
idle and wait but continue … in another loop that also checks on the Redis or Valkey “pub/sub” events. So two thumbs up
to vibe coding
which clearly turned me into an x11-savvy programmer too…
The result is in a new (and currently fairly bare-bones) repo almm. It includes all
files needed to build the application, borrowed with love from ActivateLinux
(which is GPL-licensed, as is of course our minimal extension) and adds
the minimal modifications we made, namely linking with
libhiredis and some minimal changes to
x11/x11.c. (Supporting wayland as well is on the TODO list,
and I also need to release a new RcppRedis version
to CRAN as one currently needs
the GitHub version.)
We also made a simple mp4 video with a sound overlay which describes
the components briefly:
Comments and questions welcome. I will probably add a little bit of
command-line support to the almm. Selecting the
symbol subscribed to is currently done in the most minimal way via
environment variable SYMBOL (NB: not SYM as
the video using the default value shows). I also worked out how to show
the display on only one of my multiple monitors, so I may add an explicit
screen id selector too. A little bit of discussion (including minimal Docker use around r2u) is also in issue
#121 where I first floated the idea of having StocksExtension
listen to Redis (or Valkey). Other suggestions are most
welcome, please use issue tickets at the almm repository.
I have a few pictures on this blog, mostly in earlier years, because even with
small pictures, the git repository became 80MiB soon—this is not much in
absolute terms, but the actual Markdown/Haskell/CSS/HTML total size is tiny
compared to the pictures, PDFs and fonts. I realised I need a better solution,
probably about ten years ago, and that I should investigate
git-annex. Then time passed, and I heard
about git-lfs, so I thought that’s the way forward.
Now, I recently got interested again into doing something about this repository,
and started researching.
Detour: git-lfs
I was sure that git-lfs, being supported by large providers, would be the
modern solution. But to my surprise, git-lfs is very server centric, which in
hindsight makes sense, but for a home setup, it’s not very good. Maybe I
misunderstood, but git-lfs is more a protocol/method for a forge to store
files, rather than an end-user solution. But then you need to backup those files
separately (together with the rest of the forge), or implement another way of
safeguarding them.
Further details such as the fact that it keeps two copies of the files (one in
the actual checked-out tree, one in internal storage) means it’s not a good
solution. Well, for my blog yes, but not in general. Then posts on Reddit about
horror stories—people being locked out of github due to quota, as an example, or
this Stack Overflow
post
about git-lfs constraining how one uses git, convinced me that’s not what I
want. To each their own, but not for me—I might want to push this blog’s repo to
github, but I definitely wouldn’t want in that case to pay for github storage
for my blog images (which are copies, not originals). And yes, even in 2025,
those quotas are real—GitHub
limits—and
I agree with GitHub, storage and large bandwidth can’t be free.
Back to the future: git-annex
So back to git-annex. I thought it’s going to be a simple thing, but oh boy,
was I wrong. It took me half a week of continuous (well, in free time) reading
and discussions with LLMs to understand a bit how it works. I think, honestly,
it’s a bit too complex, which is why the workflows
page lists seven (!) levels of
workflow complexity, from fully-managed, to fully-manual. IMHO, respect to the
author for the awesome tool, but if you need a web app to help you manage git,
it hints that the tool is too complex.
I made the mistake of running git annex sync once, to realise it actually
starts pushing to my upstream repo and creating new branches and whatnot, so
after enough reading, I settled on workflow 6/7, since I don’t want another tool
to manage my git history. Maybe I’m an outlier here, but everything “automatic”
is a bit too much for me.
Once you do manage to understand how git-annex works (on the surface, at least), it
is a pretty cool thing. It uses a git-annex git branch to store
metainformation, and that is relatively clean. If you do run git annex sync,
it creates some extra branches, which I don’t like, but meh.
Trick question: what is a remote?
One of the most confusing things about git-annex was understanding its “remote” concept. I thought a “remote” is a place where you replicate your data. But no, that’s a special remote. A normal remote is a git remote, which is
expected to be git/ssh/with command line access. So if you have a git+ssh
remote, git-annex will not only try to push its above-mentioned branch, but
also copy the files. If such a remote is on a forge that doesn’t support
git-annex, then it will complain and get confused.
Of course, if you read the extensive docs, you just do git config remote.<name>.annex-ignore true, and it will understand that it should not
“sync” to it.
But, aside from this case, git-annex expects that all checkouts and clones of
the repository are both metadata and data. And if you do any annex commands in
them, all other clones will know about them! This can be unexpected, and you
find people complaining about it, but nowadays there’s a solution:
git clone … dir && cd dir
git config annex.private true
git annex init "temp copy"
This is important. Any “leaf” git clone must be followed by that annex.private true config, especially on CI/CD machines. Honestly, I don’t understand why
by default clones should be official data stores, but it is what it is.
I settled on not making any of my checkouts “stable”, but only the actual
storage places. Except those are not git repositories, but just git-annex
storage things. I.e., special remotes.
Is it confusing enough yet ? 😄
Special remotes
The special remotes, as said, are what I expected to be the normal git annex
remotes, i.e. places where the data is stored. But well, they exist, and while
I’m only using a couple simple ones, there is a large number of
them. Among the interesting
ones: git-lfs, a
remote that allows also storing the git repository itself
(git-remote-annex),
although I’m a bit confused about this one, and most of the common storage
providers via the rclone
remote.
Plus, all of the special remotes support encryption, so this is a really neat
way to store your files across a large number of things, and handle replication,
number of copies, from which copy to retrieve, etc. as you wish.
And many of other features
git-annex has tons of other features, so to some extent, the sky’s the limit.
Automatic selection of what to add to git-annex vs plain git, encryption handling,
number of copies, clusters, computed files, etc. etc. etc. I still think it’s
cool but too complex, though!
Uses
Aside from my blog, of course.
I’ve seen blog posts/comments about people using git-annex to track/store their
photo collection, and I could see very well how the remote encrypted repos—any
of the services supported by rclone could be an N+2 copy or so. For me, tracking
photos would be a bit too tedious, but it could maybe work after more research.
A more practical thing would probably be replicating my local movie collection
(all legal, to be clear) better than “just run rsync from time to time” and
tracking the large files in it via git-annex. That’s an exercise for another
day, though, once I get more mileage with it - my blog pictures are copies, so I
don’t care much if they get lost, but movies are primary online copies, and I
don’t want to re-dump the discs. Anyway, for later.
Migrating to git-annex
Migrating here means ending in a state where all large files are in git-annex,
and the plain git repo is small. Just moving the files to git annex at the
current head doesn’t remove them from history, so your git repository is still
large; it won’t grow in the future, but remains at its old size (and contains the
large files in its history).
In my mind, a nice migration would be: run a custom command, and all the history
is migrated to git-annex, so I can go back in time and still use git-annex.
I naïvely expected this would be easy and already available, only to find
comments on the git-annex site with unsure git-filter-branch calls and some
web discussions. This is the
discussion
on the git annex website, but it didn’t make me confident it would do the right
thing.
But that discussion is now 8 years old. Surely in 2025, with git-filter-repo,
it’s easier? And, maybe I’m missing something, but it is not. Not from the point
of view of plain git, that’s easy, but because interacting with git-annex, which
stores its data in git itself, so doing this properly across successive steps of
a repo (when replaying the commits) is, I think, not well defined behaviour.
So I was stuck here for a few days, until I got an epiphany: As I’m going to
rewrite the repository, of course I’m keeping a copy of it from before
git-annex. If so, I don’t need the history, back in time, to be correct in the
sense of being able to retrieve the binary files too. It just needs to be
correct from the point of view of the actual Markdown and Haskell files that
represent the “meat” of the blog.
This simplified the problem a lot. At first, I wanted to just skip these files,
but this could also drop commits (git-filter-repo, by default, drops the commits
if they’re empty), and removing the files loses information - when they were
added, what were the paths, etc. So instead I came up with a rather clever idea,
if I might say so: since git-annex replaces files with symlinks already, just
replace the files with symlinks in the whole history, except symlinks that
are dangling (to represent the fact that files are missing). One could also use
empty files, but empty files are more “valid” in a sense than dangling symlinks,
hence why I settled on those.
Doing this with git-filter-repo is easy, in newer versions, with the
new --file-info-callback. Here is the simple code I used:
This goes and replaces files with a symlink to nowhere, but the symlink should
explain why it’s dangling. Then later renames or moving the files around work
“naturally”, as the rename/mv doesn’t care about file contents. Then, when the filtering is done, the remaining steps are:
copy the (binary) files from the original repository
since they’re named the same, and in the same places, git sees a type change
then simply run git annex add on those files
For me it was easy as all such files were in a few directories, so just copying
those directories back, a few git-annex add commands, and done.
Of course, then adding a few rsync remotes, git annex copy --to, and the
repository was ready.
Well, I also found a bug in my own Hakyll setup: on a fresh clone, when the
large files are just dangling symlinks, the builder doesn’t complain, just
ignores the images. Will have to fix.
Other resources
This is a blog that I read at the beginning, and I found it very useful as an
intro: https://switowski.com/blog/git-annex/. It didn’t help me understand how
it works under the covers, but it is well written. The author does use the
‘sync’ command though, which is too magic for me, but also agrees about its
complexity 😅
The proof is in the pudding
And now, for the actual first image to be added that never lived in the old
plain git repository. It’s not full-res/full-size, it’s cropped a bit on the
bottom.
Earlier in the year, I went to Paris for a very brief work trip, and I walked
around a bit—it was more beautiful than what I remembered from way way back. So
a bit random selection of a picture, but here it is:
Large language models (LLMs) have awed the world, emerging as the fastest-growing application of all time–ChatGPT reached 100 million active users in January 2023, just two months after its launch. After an initial cycle, they have gradually been mostly accepted and incorporated into various workflows, and their basic mechanics are no longer beyond the understanding of people with moderate computer literacy. Now, given that the technology is better understood, we face the question of how convenient LLM chatbots are for different occupations. This paper embarks on the question of whether LLMs can be useful for networking applications.
This paper systematizes querying three popular LLMs (GPT-3.5, GPT-4, and Claude 3) with questions taken from several network management online courses and certifications, and presents a taxonomy of six axes along which the incorrect responses were classified:
Accuracy: the correctness of the answers provided by LLMs;
Detectability: how easily errors in the LLM output can be identified;
Cause: for each incorrect answer, the underlying causes behind the error;
Explainability: the quality of the explanations with which the LLMs support their answers;
Effects: the impact of wrong answers on users; and
Stability: whether a minor change, such as a change in the order of the prompts, yields vastly different answers for a single query.
The authors also measure four strategies toward improving answers:
Self-correction: giving the original question and received answer back to the LLM, as well as the expected correct answer, as part of the prompt;
One-shot prompting: adding to the prompt “when answering user questions, follow this example” followed by a similar correct answer;
Majority voting: using the answer that most models agree upon; and
Fine-tuning: further training on a specific dataset to adapt the LLM to a particular task or domain.
The authors observe that, while some of those strategies were marginally useful, they sometimes resulted in degraded performance.
The authors queried the commercially available instances of Gemini and GPT, which achieved scores over 90 percent for basic subjects but fared notably worse in topics that require understanding and converting between different numeric notations, such as working with Internet protocol (IP) addresses, even if they are trivial (that is, presenting the subnet mask for a given network address expressed as the typical IPv4 dotted-quad representation).
As a last item in the paper, the authors compare performance with three popular open-source models: Llama3.1, Gemma2, and Mistral with their default settings. Although those models are almost 20 times smaller than the GPT-3.5 commercial model used, they reached comparable performance levels. Sadly, the paper does not delve deeper into these models, which can be deployed locally and adapted to specific scenarios.
The paper is easy to read and does not require deep mathematical or AI-related knowledge. It presents a clear comparison along the described axes for the 503 multiple-choice questions presented. This paper can be used as a guide for structuring similar studies over different fields.
If you ever face the need to activate the PROXY protocol in HAProxy (e.g. if you're as unlucky as I am, and you have to use the Google Cloud TCP
proxy load balancer), be aware that there are two ways to do that.
Both are part of the frontend configuration.
accept-proxy
This one is the big hammer and forces the usage of the PROXY protocol
on all connections. Sample:
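A minimal sketch (the frontend name, port and backend are placeholders):

frontend fe_main
    bind :443 accept-proxy
    default_backend be_main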
If you have to receive traffic both directly (without the PROXY protocol header) and from a proxy (with the header), e.g. during a phase of migrations, there is also a more flexible option based on a tcp-request connection action. Sample:
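A sketch of that variant (frontend name, port and backend are placeholders; double-check the source ranges against the current GCP documentation):

frontend fe_main
    bind :443
    tcp-request connection expect-proxy layer4 if { src 130.211.0.0/22 35.191.0.0/16 }
    default_backend be_main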
Source addresses here are those of GCP global TCP proxy frontends. Replace with whatever
suits your case. Since this is happening just after establishing a TCP connection, there is barely anything else available to match on besides the source address.
Yes, something was setting an ACL on it. Thus began the saga to figure out what was doing that.
Firing up inotifywatch, I saw it was systemd-udevd or its udev-worker. But cranking up logging on that to maximum only showed me that uaccess was somehow doing this.
I started digging. uaccess turned out to be almost entirely undocumented. People say to use it, but there’s no description of what it does or how. Its purpose appears to be to grant access to devices to those logged in to a machine by dynamically adding them to ACLs for devices. OK, that’s a nice goal, but why was machine A doing this and not machine B?
I dug some more. I came across a hint that uaccess may only do that for a “seat”. A seat? I’ve not heard of that in Linux before.
Turns out there’s some information (older and newer) about this out there. Sure enough, on the machine with KDE, loginctl list-sessions shows me on seat0, but on the machine where I log in from ttyUSB0, it shows an empty seat.
But how to make myself part of the seat? I tried various udev rules to add the “seat” or “master-of-seat” tags, but nothing made any difference.
I finally gave up and did the old-fashioned rule to just make it work already:
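A rule along these lines is the usual shape of such a fix (the idVendor/idProduct values, here a common FTDI adapter, and the user name are placeholders for whatever your serial adapter and account actually are):

# /etc/udev/rules.d/99-ttyusb.rules
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", OWNER="youruser", MODE="0660"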
This was my hundred-thirty-first month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on:
[DLA 4168-1] openafs security update to fix three CVEs related to theft of credentials, crashes or buffer overflows.
[DLA 4196-1] kmail-account-wizard security update to fix one CVE related to a man-in-the-middle attack when using http instead of https to get some configuration.
[DLA 4198-1] espeak-ng security update to fix five CVEs related to buffer overflow or underflow in several functions and a floating point exception. Thanks to Samuel Thibault for having a look at my debdiff.
[#1106867] created Bookworm pu-bug for kmail-account-wizard. Thanks to Patrick Franz for having a look at my debdiff.
I also continued my work on libxmltok and suricata. This month I also had to do some support on seger, for example to inject packages newly needed for builds.
Debian ELTS
This month was the eighty-second ELTS month. During my allocated time I uploaded or worked on:
[ELA-1444-1] kmail-account-wizard security update to fix two CVEs in Buster related to a man-in-the-middle attack when using http instead of https to get some configuration. The other issue is about a misleading UI, in which the state of encryption is shown wrong.
[ELA-1445-1] espeak-ng security update to fix five CVEs in Stretch and Buster. The issues are related to buffer overflow or underflow in several functions and a floating point exception.
All packages I worked on have been on the list of longstanding packages. For example espeak-ng has been on this list for more than nine months. I now understood that there is a reason why packages are on this list. Some parts of the software have been almost completely reworked, so that the patches need a “reverse” rework. For some packages this is easy, but for others this rework needs quite some time. I also continued to work on libxmltok and suricata.
Debian Printing
Unfortunately I didn’t find any time to work on this topic.
Thanks a lot to the Release Team who quickly handled all my unblock bugs!
FTP master
It is this time of the year when just a few packages arrive in NEW: it is Hard Freeze. So I enjoy this period and basically just take care of kernels or other important packages. As people seem to be more interested in discussions than in fixing RC bugs, my period of rest seems to continue for a while. So thanks for all these valuable discussions and really thanks to the few people who still take care of Trixie. This month I accepted 146 and rejected 10 packages. The overall number of packages that got accepted was 147.
My Debian contributions this month were all
sponsored by
Freexian. Things were a bit quieter than usual, as for the most part I was
sticking to things that seemed urgent for the upcoming trixie release.
After my appeal for help last month to
debug intermittent sshd crashes, Michel
Casabona helped me put together an environment where I could reproduce it,
which allowed me to track it down to a root
cause and fix it. (I
also found a misuse of
strlcpy affecting at
least glibc-based systems in passing, though I think that was unrelated.)
I backported fixes for some security vulnerabilities to unstable (since
we’re in freeze now so it’s not always appropriate to upgrade to new
upstream versions):
Recently someone in our #remotees channel at work asked about WFH setups and given quite a few things changed in mine, I thought it's time to post an update.
But first, a picture!
(Yes, it's cleaner than usual, how could you tell?!)
desk
It's still the same Flexispot E5B, no change here. After 7 years (I bought mine in 2018) it still works fine.
If I had to buy a new one, I'd probably get a four-legged one for more stability (they got quite affordable now), but there is no immediate need for that.
chair
It's still the IKEA Volmar. Again, no complaints here.
hardware
Now here we finally have some updates!
laptop
A Lenovo ThinkPad X1 Carbon Gen 12, Intel Core Ultra 7 165U, 32GB RAM, running Fedora (42 at the moment).
It's connected to a Lenovo ThinkPad Thunderbolt 4 Dock. It just works™.
workstation
It's still the P410, but mostly unused these days.
monitor
An AOC U2790PQU 27" 4K. I'm running it at 150% scaling, which works quite decently these days (no comparison to when I got it).
speakers
As the new monitor didn't want to take the old Dell soundbar, I have upgraded to a pair of Alesis M1Active 330 USB.
It's not a Shure, for sure, but does the job well and Christian was quite satisfied with the results when we recorded the Debian and Foreman specials of Focus on Linux.
keyboard
It's still the ThinkPad Compact USB Keyboard with TrackPoint.
I had to print a few fixes and replacement parts for it, but otherwise it's doing great.
Replacement feet, because I broke one while cleaning the keyboard.
USB cable clamp, because it kept falling out and disconnecting.
Seems Lenovo stopped making those, so I really shouldn't break it any further.
mouse
Logitech MX Master 3S. The surface of the old MX Master 2 got very sticky at some point and it had to be replaced.
other
notepad
I'm still terrible at remembering things, so I still write them down in an A5 notepad.
whiteboard
I've also added a (small) whiteboard on the wall right of the desk, mostly used for long term todo lists.
coaster
Turns out Xeon-based coasters are super stable, so it lives on!
yubikey
Yepp, still a thing. Still USB-A because... reasons.
headphones
Still the Bose QC25, by now on the third set of ear cushions, but otherwise working great and the odd 15€ cushion replacement does not justify buying anything newer (which would have the same problem after some time, I guess).
I did add a cheap (~10€) Bluetooth-to-Headphonejack dongle, so I can use them with my phone too (shakes fist at modern phones).
And I do use the headphones more in meetings, as the Alesis speakers fill the room more with sound and thus sometimes produce a bit of an echo.
charger
The Bose need AAA batteries, and so do some other gadgets in the house, so there is a technoline BC 700 charger for AA and AAA on my desk these days.
light
Yepp, I've added an IKEA Tertial and an ALDI "face" light.
No, I don't use them much.
KVM switch
I've "built" a KVM switch out of a USB switch, but given I don't use the workstation that often these days, the switch is also mostly unused.
Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we’ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website.
Security audit of Reproducible Builds tools published
The Open Technology Fund’s (OTF) security partner Security Research Labs recently conducted an audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a “whitebox” audit, is a form of testing in which auditors have complete knowledge of the item being tested. The auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service.
The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to test reproducibility.
[Colleagues] approached me to talk about a reproducibility issue they’d been having with some R code. They’d been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren’t “just a little bit different” in the way that we’ve all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducibly different. Somewhere, somehow, something broke.
[We] present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare “attestable builds” with reproducible builds by noting an attestable build requires “only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it”, and proceed by determining that “the overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time.”
Timo Pohl, Pavel Novák, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the three authors find that the literature is “sparse”, focusing on few individual problems and ecosystems, and therefore identify space for more critical research.
Distribution work
In Debian this month:
Ian Jackson filed a bug against the debian-policy package in order to delve into an issue affecting Debian’s support for cross-architecture compilation, multiple-architecture systems, reproducible builds’ SOURCE_DATE_EPOCH environment variable and the ability to recompile already-uploaded packages to Debian with a new/updated toolchain (binNMUs). Ian identifies a specific case in the libopts25-dev package, involving a manual page that had interesting downstream effects, potentially affecting backup systems. The bug generated a large number of replies, some of which have references to similar or overlapping issues, such as this one from 2016/2017.
There is now a “Reproducibility Status” link for each app on f-droid.org, listed on every app’s page. Our verification server shows ✔️ or 💔 based on its build results, where ✔️ means our rebuilder reproduced the same APK file and 💔 means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a ✅ for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that “provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.”
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
Don’t rely on zipdetails’ --walk argument being available, and only add that argument on newer versions after we test for that. […]
Review and merge support for NuGet packages from Omair Majid. […]
Merge support for an lzma comparator from Will Hollywood. […][…]
Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues […]. This was then uploaded to Debian as version 0.6.0-1.
Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 […][…] and 297 […][…], and disorderfs to version 0.6.0 […][…].
Website updates
Once again, there were a number of improvements made to our website this month including:
Incorporated a number of fixes for the JavaScript SOURCE_DATE_EPOCH snippet from Sebastian Davis, which did not handle non-integer values correctly. […]
Remove the JavaScript example that uses a ‘fixed’ timezone on the SOURCE_DATE_EPOCH page. […]
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.
However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL’s public post, “recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year”. As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.
Migrating the central jenkins.debian.net server from AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because, with the upcoming release of Debian trixie, i386 is no longer supported as a ‘regular’ architecture — there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.
Outside of this, a number of smaller changes were also made by Holger Levsen:
Fix a (harmless) typo in the multiarch_versionskew script. […]
In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
Add out of memory detection to the statistics page. […]
Reverse the sorting order on the statistics page. […][…][…][…]
Improve the spacing between statistics groups. […]
Update a (hard-coded) line number in error message detection pertaining to a debrebuild line number. […]
Support Debian unstable in the rebuilder-debian.sh script. […][…]
Rely on rebuildctl to sync only ‘arch-specific’ packages. […][…]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
0xFFFF: Use SOURCE_DATE_EPOCH for date in manual pages.
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
The Two Cultures is a term first used by C.P. Snow in a 1959
speech and monograph focused on the split between humanities and the
sciences. Decades later, the term was (quite famously) re-used by Leo
Breiman in a (somewhat prophetic) 2001
article about the split between ‘data models’ and ‘algorithmic
models’. In this note, we argue that statistical computing practice and
deployment can also be described via this Two Cultures
moniker.
Referring to the term linking these foundational pieces is of course
headline bait. Yet when preparing for the discussion of r2u in the invited talk in
Mons (video,
slides),
it occurred to me that there is in fact a wide gulf between two
alternative approaches of using R and, specifically,
deploying packages.
On the one hand we have the approach described by my friend Jeff as “you go to the Apple store,
buy the nicest machine you can afford, install what you need and
then never ever touch it”. A computer / workstation / laptop is
seen as an immutable object where every attempt at change may
lead to breakage, instability, and general chaos—and is hence best
avoided. If you know Jeff, you know he exaggerates. Maybe only slightly
though.
Similarly, an entire sub-culture of users striving for
“reproducibility” (and sometimes also “replicability”) does the same.
This is for example evidenced by the popularity of package renv by Rcpp collaborator and pal Kevin. The expressed hope is
that by nailing down a (sub)set of packages, outcomes are constrained to
be unchanged. Hope springs eternal, clearly. (Personally, if need be, I
do the same with Docker containers and their respective
Dockerfile.)
On the other hand, ‘rolling’ is a fundamentally different approach. One
(well known) example is Google building “everything at @HEAD”. The entire (ginormous)
code base is considered as a mono-repo which at any point in
time is expected to be buildable as is. All changes made are pre-tested
to be free of side effects to other parts. This sounds hard, and likely
is more involved than an alternative of a ‘whatever works’ approach of
independent changes and just hoping for the best.
Another example is a rolling (Linux) distribution as for example Debian. Changes are first committed to
a ‘staging’ place (Debian calls this the ‘unstable’ distribution) and,
if no side effects are seen, propagated after a fixed number of days to
the rolling distribution (called ‘testing’). With this mechanism,
‘testing’ should always be installable too. And based on the rolling
distribution, at certain times (for Debian roughly every two years) a
release is made from ‘testing’ into ‘stable’ (following more elaborate
testing). The released ‘stable’ version is then immutable (apart from
fixes for seriously grave bugs and of course security updates). So this
provides the connection between frequent and rolling updates, and
produces an immutable fixed set: a release.
This Debian approach has been influential for many other
projects—including CRAN as can
be seen in aspects of its system providing a rolling set of curated
packages. Instead of a staging area for all packages, extensive tests
are made for candidate packages before adding an update. This aims to
ensure quality and consistency—and has worked remarkably well. We argue
that it has clearly contributed to the success and renown of CRAN.
Now, when accessing CRAN
from R, we fundamentally have
two accessor functions. But seemingly only one is widely known
and used. In what we may call ‘the Jeff model’, everybody is happy to
deploy install.packages() for initial
installations.
One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package
I go check for a new version because invariably it’s been updated with
18 new major features 😆
And that is why we have two cultures.
Because some of us, yours truly included, also use
update.packages() at recurring (frequent !!) intervals:
daily or near-daily for me. The goodness and, dare I say, gift of
packages is not limited to those by my pal Vincent. CRAN updates all the time, and
updates are (generally) full of (usually excellent) changes, fixes, or
new features. So update frequently! Doing (many but small) updates
(frequently) is less invasive than (large, infrequent) ‘waterfall’-style
changes!
But the fear of change, or disruption, is clearly pervasive. One can
only speculate why. Is the experience of updating so painful on other
operating systems? Is it maybe a lack of exposure / tutorials on best
practices?
These ‘Two Cultures’ coexist. When I delivered the talk in Mons, I
briefly asked for a show of hands among all the R users in the audience to see who
in fact does use update.packages() regularly. And maybe a
handful of hands went up: surprisingly few!
Now back to the context of installing packages: Clearly ‘only
installing’ has its uses. For continuous integration checks we generally
install into ephemeral temporary setups. Some debugging work may be with
one-off container or virtual machine setups. But all other uses may well
be under ‘maintained’ setups. So consider calling
update.packages() once in a while. Or even weekly or daily.
The rolling feature of CRAN is a real benefit, and it is
there for the taking and enrichment of your statistical computing
experience.
So to sum up, the real power is to use
install.packages() to obtain fabulous new statistical
computing resources, ideally in an instant; and
update.packages() to keep these fabulous resources
current and free of (known) bugs.
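To make that concrete, here is a minimal command-line sketch (my own illustration; the package name is arbitrary, and the explicit repos argument simply avoids the interactive mirror prompt):

# culture one: install once and move on
Rscript -e 'install.packages("curl", repos="https://cloud.r-project.org")'
# culture two: refresh everything installed, daily or weekly
Rscript -e 'update.packages(ask=FALSE, repos="https://cloud.r-project.org")'

The first call gets you a shiny new package; the second, run regularly, keeps the whole library current.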
For both tasks, relying on binary installations accelerates
and eases the process. And where available, using binary
installation with system-dependency support as r2u does makes it easier
still, following the r2u slogan of ‘Fast. Easy.
Reliable. Pick All Three.’ Give it a try!
As I wrote in my last post, Twitter's new encrypted DM infrastructure is pretty awful. But the amount of work required to make it somewhat better isn't large.
When Juicebox is used with HSMs, it supports encrypting the communication between the client and the backend. This is handled by generating a unique keypair for each HSM. The public key is provided to the client, while the private key remains within the HSM. Even if you can see the traffic sent to the HSM, it's encrypted using the Noise protocol and so the user's encrypted secret data can't be retrieved.
But this is only useful if you know that the public key corresponds to a private key in the HSM! Right now there's no way to know this, but it gets worse - the client doesn't have the public key built into it; it's supplied as a response to an API request made to Twitter's servers. Even if the current keys are associated with the HSMs, Twitter could swap them out with ones that aren't, terminate the encrypted connection at their endpoint, and then fake your query to the HSM and get the encrypted data that way. Worse, this could be done for specific targeted users, without any indication to the user that this has happened, making it almost impossible to detect in general.
This is at least partially fixable. Twitter could prove to a third party that their Juicebox keys were generated in an HSM, and the key material could be moved into clients. This makes attacking individual users more difficult (the backdoor code would need to be shipped in the public client), but can't easily help with the website version[1] even if a framework exists to analyse the clients and verify that the correct public keys are in use.
It's still worse than Signal. Use Signal.
[1] Since they could still just serve backdoored Javascript to specific users. This is, unfortunately, kind of an inherent problem when it comes to web-based clients - we don't have good frameworks to detect whether the site itself is malicious.
(Edit: Twitter could improve this significantly with very few changes - I wrote about that here. It's unclear why they'd launch without doing that, since it entirely defeats the point of using HSMs)
When Twitter[1] launched encrypted DMs a couple of years ago, it was the worst kind of end-to-end encrypted - technically e2ee, but in a way that made it relatively easy for Twitter to inject new encryption keys and get everyone's messages anyway. It was also lacking a whole bunch of features such as "sending pictures", so the entire thing was largely a waste of time. But a couple of days ago, Elon announced the arrival of "XChat", a new encrypted message platform built on Rust with (Bitcoin style) encryption, whole new architecture. Maybe this time they've got it right?
tl;dr - no. Use Signal. Twitter can probably obtain your private keys, and admit that they can MITM you and have full access to your metadata.
The new approach is pretty similar to the old one in that it's based on pretty straightforward and well tested cryptographic primitives, but merely using good cryptography doesn't mean you end up with a good solution. This time they've pivoted away from using the underlying cryptographic primitives directly and into higher level abstractions, which is probably a good thing. They're using Libsodium's boxes for message encryption, which is, well, fine? It doesn't offer forward secrecy (if someone's private key is leaked then all existing messages can be decrypted) so it's a long way from the state of the art for a messaging client (Signal's had forward secrecy for over a decade!), but it's not inherently broken or anything. It is, however, written in C, not Rust[2].
That's about the extent of the good news. Twitter's old implementation involved clients generating keypairs and pushing the public key to Twitter. Each client (a physical device or a browser instance) had its own private key, and messages were simply encrypted to every public key associated with an account. This meant that new devices couldn't decrypt old messages, and also meant there was a maximum number of supported devices and terrible scaling issues and it was pretty bad. The new approach generates a keypair and then stores the private key using the Juicebox protocol. Other devices can then retrieve the private key.
Doesn't this mean Twitter has the private key? Well, no. There's a PIN involved, and the PIN is used to generate an encryption key. The stored copy of the private key is encrypted with that key, so if you don't know the PIN you can't decrypt the key. So we brute force the PIN, right? Juicebox actually protects against that - before the backend will hand over the encrypted key, you have to prove knowledge of the PIN to it (this is done in a clever way that doesn't directly reveal the PIN to the backend). If you ask for the key too many times while providing the wrong PIN, access is locked down.
But this is true only if the Juicebox backend is trustworthy. If the backend is controlled by someone untrustworthy[3] then they're going to be able to obtain the encrypted key material (even if it's in an HSM, they can simply watch what comes out of the HSM when the user authenticates if there's no validation of the HSM's keys). And now all they need is the PIN. Turning the PIN into an encryption key is done using the Argon2id key derivation function, using 32 iterations and a memory cost of 16MB (the Juicebox white paper says 16KB, but (a) that's laughably small and (b) the code says 16 * 1024 in an argument that takes kilobytes), which makes it computationally and moderately memory expensive to generate the encryption key used to decrypt the private key. How expensive? Well, on my (not very fast) laptop, that takes less than 0.2 seconds. How many attempts do I need to crack the PIN? Twitter's chosen to fix that to 4 digits, so a maximum of 10,000. You aren't going to need many machines running in parallel to bring this down to a very small amount of time, at which point private keys can, to a first approximation, be extracted at will.
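If you want a feel for what a single guess costs, the reference argon2 command-line tool accepts roughly equivalent parameters (this is just my own illustration, not anything from Twitter's code; note that -m takes the log2 of the memory in KiB, so 14 corresponds to 16MB):

# time one Argon2id derivation: 32 iterations, 16MB of memory, a single lane
time echo -n 1234 | argon2 somesalt -id -t 32 -m 14 -p 1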
Juicebox attempts to defend against this by supporting sharding your key over multiple backends, and only requiring a subset of those to recover the original. Twitter does seem to be making use of this: it uses three backends and requires data from at least two, but all the backends used are under x.com so are presumably under Twitter's direct control. Trusting the keystore without needing to trust whoever's hosting it requires a trustworthy communications mechanism between the client and the keystore. If the device you're talking to can prove that it's an HSM that implements the attempt limiting protocol and has no other mechanism to export the data, this can be made to work. Signal makes use of something along these lines using Intel SGX for contact list and settings storage and recovery, and Google and Apple also have documentation about how they handle this in ways that make it difficult for them to obtain backed up key material. Twitter has no documentation of this, and as far as I can tell does nothing to prove that the backend is in any way trustworthy. (Edit to add: The Juicebox API does support authenticated communication between the client and the HSM, but that relies on you having some way to prove that the public key you're presented with corresponds to a private key that only exists in the HSM. Twitter gives you the public key whenever you communicate with them, so even if they've implemented this properly you can't prove they haven't made up a new key and MITMed you the next time you retrieve your key)
On the plus side, Juicebox is written in Rust, so Elon's not 100% wrong. Just mostly wrong.
But ok, at least you've got viable end-to-end encryption even if someone can put in some (not all that much, really) effort to obtain your private key and render it all pointless? Actually no, since you're still relying on the Twitter server to give you the public key of the other party and there's no out of band mechanism to do that or verify the authenticity of that public key at present. Twitter can simply give you a public key where they control the private key, decrypt the message, and then reencrypt it with the intended recipient's key and pass it on. The support page makes it clear that this is a known shortcoming and that it'll be fixed at some point, but they said that about the original encrypted DM support and it never was, so that's probably dependent on whether Elon gets distracted by something else again. And the server knows who and when you're messaging even if they haven't bothered to break your private key, so there's a lot of metadata leakage.
Signal doesn't have these shortcomings. Use Signal.
[1] I'll respect their name change once Elon respects his daughter
[2] There are implementations written in Rust, but Twitter's using the C one with these JNI bindings
[3] Or someone nominally trustworthy but who's been compelled to act against your interests - even if Elon were absolutely committed to protecting all his users, his overarching goals for Twitter require him to have legal presence in multiple jurisdictions that are not necessarily above placing employees in physical danger if there's a perception that they could obtain someone's encryption keys
Despite comments on my ikiwiki blog being fully moderated, spammers have
been increasingly posting link spam comments on my blog. While I used to use
the blogspam plugin, the
underlying service was likely retired circa
2017 and its public
repositories are all archived.
It turns out that there is a relatively simple way to drastically reduce the
amount of spam submitted to the moderation queue: ban the datacentre IP
addresses that spammers are using.
Looking up AS numbers
It all starts by looking at the IP address of a submitted comment, which we can then look up using whois:
$ whois -r 2a0b:7140:1:1:5054:ff:fe66:85c5
% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See https://docs.db.ripe.net/terms-conditions.html
% Note: this output has been filtered.
% To receive output for a database update, use the "-B" flag.
% Information related to '2a0b:7140:1::/48'
% Abuse contact for '2a0b:7140:1::/48' is 'abuse@servinga.com'
inet6num: 2a0b:7140:1::/48
netname: EE-SERVINGA-2022083002
descr: servinga.com - Estonia
geoloc: 59.4424455 24.7442221
country: EE
org: ORG-SG262-RIPE
mnt-domains: HANNASKE-MNT
admin-c: CL8090-RIPE
tech-c: CL8090-RIPE
status: ASSIGNED
mnt-by: MNT-SERVINGA
created: 2020-02-18T11:12:49Z
last-modified: 2024-12-04T12:07:26Z
source: RIPE
% Information related to '2a0b:7140:1::/48AS207408'
route6: 2a0b:7140:1::/48
descr: servinga.com - Estonia
origin: AS207408
mnt-by: MNT-SERVINGA
created: 2020-02-18T11:18:11Z
last-modified: 2024-12-11T23:09:19Z
source: RIPE
% This query was served by the RIPE Database Query Service version 1.114 (SHETLAND)
Alternatively, you can use this WHOIS server with much better output:
$ whois -h whois.cymru.com -v 2a0b:7140:1:1:5054:ff:fe66:85c5
AS | IP | BGP Prefix | CC | Registry | Allocated | AS Name
207408 | 2a0b:7140:1:1:5054:ff:fe66:85c5 | 2a0b:7140:1::/48 | DE | ripencc | 2017-07-11 | SERVINGA-EE, DE
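Another way to turn that AS number into the list of network blocks to ban (besides the ipinfo.io page referenced below) is to ask an IRR database for the route objects registered with that origin; this is just a sketch using RADB:

# list IPv4 and IPv6 prefixes registered to AS207408 in RADB
whois -h whois.radb.net -- '-i origin AS207408' | grep -E '^route6?:'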
While I do want to eliminate this source of spam, I don't want to block
these datacentre IP addresses outright since legitimate users could be using
these servers as VPN endpoints or crawlers.
I therefore added the following to my Apache config to restrict the CGI
endpoint (used only for write operations such as commenting):
<Location /blog.cgi>
Include /etc/apache2/spammers.include
Options +ExecCGI
AddHandler cgi-script .cgi
</Location>
and then put the following in /etc/apache2/spammers.include:
<RequireAll>
Require all granted
# https://ipinfo.io/AS207408
Require not ip 46.11.183.0/24
Require not ip 80.77.25.0/24
Require not ip 194.76.227.0/24
Require not ip 2a0b:7140:1::/48
</RequireAll>
Finally, I can restart the website and commit my changes:
$ apache2ctl configtest && systemctl restart apache2.service
$ git commit -a -m "Ban all IP blocks from Servinga"
Future improvements
I will likely automate this process in the future, but at the moment my
blog can go for a week without a single spam message (down from dozens every
day). It's possible that I've already cut off the worst offenders.
Internet users, software developers, academics, entrepreneurs – basically everybody is now aware of the importance of considering privacy as a core part of our online experience. User demand, and various national or regional laws, have made privacy a continuously present subject. And privacy is such an all-encompassing, complex topic that the angles from which it can be studied seem never to end; I recommend computer networking-oriented newcomers to the topic to refer to Brian Kernighan’s excellent work [1]. However, how do regular people – like ourselves, in our many capacities – feel about privacy? Lukas Antoine presents a series of experiments aiming at better understanding how people throughout the world understand privacy, and when privacy is held as more or less important than security in different contexts.
Particularly, privacy is often portrayed as a value set in tension with surveillance, and particularly state surveillance, in the name of security: conventional wisdom presents the idea of a privacy calculus. That is, it is often assumed that individuals continuously evaluate the costs and benefits of divulging their personal data, sharing data when they expect a positive net outcome, and withholding it otherwise. This framework has been accepted for decades, and the author wishes to challenge it. This book is clearly his doctoral thesis in political science, and its contents are as thorough as expected in this kind of product.
The author presents three empirical studies based on cross-survey analysis. The first experiment explores the security justifications given for surveillance and how they influence support for it. The second one examines whether the stance on surveillance can be made dependent on personal convenience or financial cost. The third study explores whether privacy attitude is context-dependent or can be seen as a stable personality trait. The studies aim to address the shortcomings of published literature in the field, mainly: (a) the lack of comprehensive research on state surveillance, needed for a better understanding of privacy appreciation; (b) while several studies have tackled the subjective measure of privacy, there is a lack of cross-national studies to explain wide-ranging phenomena; (c) most studies in this regard are based on population-based surveys, which cannot establish causal relationships; and (d) a seemingly blind acceptance of the privacy calculus mentioned above, with no strong evidence that it accurately measures people’s motivations for disclosing or withholding their data. The specific take, including the framing of the tension between privacy and surveillance, has long been studied, as can be seen in Steven Nock’s 1993 book [2], but as Sannon’s article in 2022 shows [3], social and technological realities require our understanding to be continuously kept up to date.
The book is full of theoretical references and does a very good job of explaining the path followed by the author. It is, though, a heavy read, and, for people not coming from the social sciences tradition, leads to the occasional feeling of being lost. The conceptual and theoretical frameworks and presented studies are thorough and clear. The author is honest in explaining when the data points at some of his hypotheses being disproven, while others are confirmed.
The book is aimed at people digging deep into this topic. Personally, I have authored several works on different aspects of privacy (such as a book [4] and a magazine issue [5]), but this book did get me thinking on many issues I had not previously considered. Looking for comparable works, I find the chapter organization of Friedewald et al.’s 2017 book [6] to follow a similar line of thought. My only complaint would be that, for a publication from such a highly prestigious publisher, little attention has been paid to editorial aspects: sub-subsection depth is often excessive and unclear. Also, when publishing monographs based on doctoral works, it is customary to no longer refer to the work as a “thesis” and to soften some of the formal requirements such a work often has, with the aim of producing a gentler, more readable book; this book seems just like the mass production of an (otherwise very interesting and well made) thesis work.
References:
[1] Kernighan, B. W. (2021). Understanding the digital world: What you need to know about computers, the internet, privacy, and security. Princeton University Press.
[2] Nock, S. L. (1993). The Costs of Privacy: Surveillance and Reputation in America. De Gruyter.
[3] Sannon, S., Sun, B., Cosley, D. (2022). Privacy, Surveillance, and Power in the Gig Economy. SIGCHI, Association for Computing Machinery.
[4] Wolf, G. (coord), 2021. Mecanismos de privacidad y anonimato en redes. Una visión transdisciplinaria. IIEc-UNAM, México https://www.priv-anon.unam.mx/libro/
[5] XRDS•Crossroads Summer 2018. Pseudonimity and Anonymity. Association for Computing Machinery https://xrds.acm.org/archives.cfm?iid=3239334
[6] Friedewald, M., Burgess, P., Čas, J., Bellanova, R., Peissl, W. (2017). Surveillance, Privacy and Security: Citizens’ Perspectives. Routeledge, Taylor & Francis Group.
Digital humanities is a young–though established–field. It deals with different expressions in which digital data manipulation techniques can be applied and used to analyze subjects that are identified as belonging to the humanities. Although most often used to analyze different aspects of literature or social network analysis, it can also be applied to other humanistic disciplines or artistic expressions. Digital humanities employs many tools, but those categorized as big data are among the most frequently employed. This book samples different takes on digital humanities, with the particularity that it focuses on Ibero-American uses. It is worth noting that this book is the second in a series of four volumes, published or set to be published between 2022 and 2026. Being the output of a field survey, I perceive this book to be targeted towards fellow Digital Humanists – people interested in applying computational methods to further understand and research topics in the humanities. It is not a technical book in the sense Computer Science people would recognize as such, but several of the presented works do benefit from understanding some technical concepts.
The 12 articles (plus an introduction) that make up this book are organized in three parts:
(1) “Theoretical Framework” presents the ideas and techniques of data science (that make up the tools for handling big data), and explores how data science can contribute to literary analysis, all while noting that many such techniques are usually frowned upon in Latin America as data science “smells neoliberal”;
(2) “Methodological Issues” looks at specific issues through the lens of how they can be applied to big data, with specific attention given to works in Spanish; and
(3) “Practical Applications” analyzes specific Spanish works and communities based on big data techniques.
Several chapters treat a recurring theme: the simultaneous resistance and appropriation of big data by humanists. For example, at least three of the chapters describe the tensions between humanism (“aesthesis”) and cold, number-oriented data analysis (“mathesis”).
The analyzed works of Parts 2 and 3 are interesting and relatively easy to follow.
A certain ideological positioning can be gleaned from several word choices – starting with the book’s and series’ name, which refers to the Spanish-speaking regions as “Ibero-America”, a term often seen as Eurocentric, in contrast with “Latin America”, which is much more widely used throughout the region.
I will end with some notes about the specific versions of the book I reviewed. I read both an EPUB version and a print copy. The EPUB did not include links for easy navigation to footnotes, that is, the typographical superscript markers are not hyperlinked to the location of the notes, so it is very impractical to try to follow them. The print version (unlike the EPUB) did not have an index, that is, the six pages before the introduction are missing from the print copy I received. For a book such as this one, not having an index hampers the ease of reading and referencing.
The current boom of artificial intelligence (AI) is based upon neural networks (NNs). In order for these to be useful, the network has to undergo a machine learning (ML) process: work over a series of inputs, and adjust the inner weights of the connections between neurons so that each of the data samples the network was trained on produces the right set of labels for each item. Federated learning (FL) appeared as a reaction to the data centralization power that traditional ML implies: instead of centrally controlling the whole training data, various different actors analyze disjoint subsets of data, and provide only the results of this analysis, thus increasing privacy while analyzing a large dataset. Finally, given multiple actors are involved in FL, how hard is it for a hostile actor to provide data that will confuse the NN, instead of helping it reach better performance? This kind of attack is termed a poisoning attack, and is the main focus of this paper. The authors set out to research how effective a hyperdimensional data poisoning attack (HDPA) can be at confusing a NN and causing it to misclassify both the items it was trained on and yet unseen items.
Data used for NN training is usually represented as a large set of orthogonal vectors, each describing a different aspect of the item, allowing for very simple vector arithmetic operations. Thus, NN training is termed as high-dimensional or hyperdimensional. The attack method described by the authors employs cosine similarity, that is, in order to preserve similarity, a target hypervector is reflected over a given dimension, yielding a cosine-similar result that will trick ML models, even if using byzantine-robust defenses.
The paper is clear, though not an easy read. It explains in detail the mathematical operations, following several related although different threat models. The authors present the results of the experimental evaluation of their proposed model, comparing it to several other well-known adversarial attacks for visual recognition tasks, over pre-labeled datasets frequently used as training data, such as MNIST, Fashion-MNIST and CIFAR-10. They show that their method is not only more effective as an attack, but falls within the same time range as other surveyed attacks.
Adversarial attacks are, all in all, an important way to advance any field of knowledge; by publishing this attack, the authors will surely spark other works to detect and prevent this kind of alteration. It is important for AI implementers to understand the nature of this field and be aware of the risks that this work, as well as others cited in it, highlight: ML will train a computer system to recognize a dataset, warts and all; efficient as AI is, if noise is allowed into the training data (particularly adversarially generated noise), the trained model might present impaired performance.
I saw this document on running DeepSeek R1 [1] and decided to give it a go. I downloaded the llama.cpp source and compiled it and downloaded the 131G of data as described. Running it with the default options gave about 7 CPU cores in use. Changing the --threads parameter to 44 caused it to use 17 CPU cores (changing it to larger numbers like 80 made it drop to 2.5 cores). I used the --n-gpu-layers parameter with the value of 1 as I currently have a GPU with only 6G of RAM (AliExpress is delaying my delivery of a PCIe power adaptor for a better GPU). Running it like this makes the GPU take 12W more power than standby and using 5.5G of VRAM according to nvidia-smi so it is doing a small amount of work, but not much. The documentation refers to the DeepSeek R1 1.58bit model which I’m using as having 61 layers so presumably less than 2% of the work is done on the GPU.
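The invocation this boils down to looks roughly like the following (a sketch only: the binary name depends on how you built llama.cpp, and the model path is a placeholder for wherever you put the downloaded GGUF data):

# ~17 cores busy with --threads 44, one layer offloaded to the 6G GPU
./llama-cli --model /models/DeepSeek-R1-UD-IQ1_S.gguf --threads 44 --n-gpu-layers 1 --prompt "hello"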
Running like this it takes 2 hours of CPU time (just over 3 minutes of elapsed time at 17 cores) to give 8 words of output. I didn’t let any tests run long enough to give complete output.
The documentation claims that it will run on CPU with 20G of RAM. In my tests it takes between 161G and 195G of RAM to run depending on the number of threads. The documentation describes running on the CPU as “very slow” which presumably means 3 words per minute on a system with a pair of E5-2699A v4 CPUs and 256G of RAM.
When I try to use more than 44 threads I get output like “system_info: n_threads = 200 (n_threads_batch = 200) / 44” and it seems that I only have a few threads actually in use. Apparently there’s some issue with having more threads than the 44 CPU cores in the system.
I was expecting this to go badly and it met my expectations in that regard. But it was interesting to see exactly how it went badly. It seems that if I had a GPU with 24G of VRAM I’d still have 54/61 layers running on the CPU so even the largest of home GPUs probably wouldn’t make much difference.
Maybe if I configured the server to have hyper-threading enabled and 88 HT cores then I could have 88 threads and about 34 CPU cores in use which might help. But even if I got the output speed from 3 to 6 words per minute that still wouldn’t be very usable.
Have you ever found yourself in the situation where you had no or
anonymized logs and still wanted to figure out where your traffic was
coming from?
Or you have multiple upstreams and are looking to see if you can save
fees by getting into peering agreements with some other party?
Or your site is getting heavy load but you can't pinpoint it on a
single IP and you suspect some amoral corporation is training their
degenerate AI on your content with a bot army?
(You might be getting onto something there.)
If that rings a bell, read on.
TL;DR:
... or just skip the cruft and install asncounter:
pip install asncounter
Also available in Debian 14 or later, or possibly in Debian 13
backports (soon to be released) if people are interested:
tcpdump -q -i eth0 -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl
Read on for why this matters, and why I wrote yet another weird tool
(almost) from scratch.
Background and manual work
This is a tool I've been dreaming of for a long, long time. Back in
2006, at Koumbit, a colleague had set up TAS ("Traffic
Accounting System", "Система учета трафика" in Russian, apparently), a
collection of Perl scripts that would do per-IP accounting. It was
pretty cool: it would count bytes per IP address and, from that, you
could do analysis. But the project died, and it was kind of bespoke.
Fast forward twenty years, and I find myself fighting off bots at the
Tor Project (the irony...), with our GitLab suffering pretty bad
slowdowns (see issue tpo/tpa/team#41677 for the latest public
issue, the juicier one is confidential, unfortunately).
(We did have some issues caused by overloads in CI, as we host, after
all, a fork of Firefox, which is a massive repository, but the
applications team did sustained, awesome work to fix issues on that
side, again and again (see tpo/applications/tor-browser#43121 for
the latest, and tpo/applications/tor-browser#43121 for some
pretty impressive correlation work, I work with really skilled
people). But those issues, I believe were fixed.)
So I had the feeling it was our turn to get hammered by the AI
bots. But how do we tell? I could tell something was hammering at
the costly /commit/ and (especially costly) /blame/ endpoint. So
at first, I pulled out the trusted awk, sort | uniq -c | sort -n |
tail pipeline I am sure others have worked out before:
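It looks something like this (a sketch; the log path is whatever your web server writes, and the client IP is assumed to be the first field):

# top 10 client IPs by request count
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -n | tail -10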
For people new to this, that pulls the first field out of web server
log files, sorts the list, counts the number of unique entries, and
sorts that so that the most common entries (or IPs) show up first,
then shows the top 10.
That, in other words, answers the question of "which IP address visits
this web server the most?" Based on this, I found a couple of IP
addresses that looked like Alibaba. I had already addressed an abuse
complaint to them (tpo/tpa/team#42152) but never got a response,
so I just blocked their entire network blocks, rather violently:
for cidr in 47.240.0.0/14 47.246.0.0/16 47.244.0.0/15 47.235.0.0/16 47.236.0.0/14; do
iptables-legacy -I INPUT -s $cidr -j REJECT
done
That made Ali Baba and his forty thieves (specifically their
AL-3 network) go away, but our load was still high, and I was
still seeing various IPs crawling the costly endpoints. And this time,
it was hard to tell who they were: you'll notice all the Alibaba IPs
are inside the same 47.0.0.0/8 prefix. Although it's not a /8
itself, it's all inside the same prefix, so it's visually easy to
pick it apart, especially for a brain like mine who's stared too long
at logs flowing by too fast for their own mental health.
What I had then was different, and I was tired of doing the stupid
thing I had been doing for decades at this point. I had recently
stumbled upon pyasn (in January, according to my notes)
and somehow found it again, and thought "I bet I could write a quick
script that loops over IPs and counts IPs per ASN".
(Obviously, there are lots of other tools out there for that kind of
monitoring. Argos, for example, presumably does this, but it's kind
of a huge stack. You can also get into netflows, but there's serious
privacy implications with those. There are also lots of per-IP
counters like promacct, but that doesn't scale.
Or maybe someone already had solved this problem and I just wasted a
week of my life, who knows. Someone will let me know, I hope, either
way.)
ASNs and networks
A quick aside, for people not familiar with how the internet
works. People that know about ASNs, BGP announcements and so on can
skip.
The internet is the network of networks. It's made of multiple
networks that talk to each other. The way this works is that there is the
Border Gateway Protocol (BGP), a relatively simple TCP-based protocol,
which the edge routers of those networks use to announce to each other
which networks they manage. Each of those networks is called an
Autonomous System (AS) and has an AS number (ASN) to uniquely identify
it. Just like IP addresses, ASNs are allocated by IANA and local
registries; they're pretty cheap and useful, so if you like running your
own routers, get one.
When you have an ASN, you'll use it to, say, announce to your BGP
neighbors "I have 198.51.100.0/24 over here" and the others might
say "okay, and I have 216.90.108.31/19 over here, and I know of this
other ASN over there that has 192.0.2.1/24 too!" And gradually, those
announcements flood the entire network, and you end up with each BGP
router having a routing table of the global internet, with a map of which
network block, or "prefix", is announced by which ASN.
It's how the internet works, and it's a useful thing to know, because
it's what, ultimately, makes an organisation responsible for an IP
address. There are "looking glass" tools like the one provided by
routeviews.org which allow you to effectively run "trace routes"
(but not the same as traceroute, which actively sends probes from
your location); type an IP address into that form to fiddle with it. You
will end up with an "AS path", the way to get from the looking glass
to the announced network. But I digress, and that's kind of out of
scope.
Point is, internet is made of networks, networks are autonomous
systems (AS), they have numbers (ASNs), and they announce IP
prefixes (or "network blocks"), which ultimately tells you who is
responsible for traffic on the internet.
Introducing asncounter
So my goal was to get from "lots of IP addresses" to "list of ASNs",
possibly also the list of prefixes (because why not). Turns out pyasn
makes that really easy. I managed to build a prototype in probably
less than an hour, just look at the first version, it's 44 lines
(sloccount) of Python, and it works, provided you have already
downloaded the required datafiles from routeviews.org. (Obviously, the
latest version is longer at close to 1000 lines, but it downloads the
data files automatically, and has many more features).
The way the first prototype (and later versions too, mostly) worked is
that you feed it a list of IP addresses on standard input, it looks up
the ASN and prefix associated with the IP, and increments a counter
for those, then print the result.
That showed me something like this:
root@gitlab-02:~/anarcat-scripts# tcpdump -q -i eth0 -n -Q in "(udp or tcp)" | ./asncounter.py --tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
INFO: collecting IPs from stdin, using datfile ipasn_20250523.1600.dat.gz
INFO: loading datfile /root/.cache/pyasn/ipasn_20250523.1600.dat.gz...
INFO: loading /root/.cache/pyasn/asnames.json
ASN count AS
136907 7811 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
[----] 359 [REDACTED]
[----] 313 [REDACTED]
8075 254 MICROSOFT-CORP-MSN-AS-BLOCK, US
[---] 164 [REDACTED]
[----] 136 [REDACTED]
24940 114 HETZNER-AS, DE
[----] 98 [REDACTED]
14618 82 AMAZON-AES, US
[----] 79 [REDACTED]
prefix count
166.108.192.0/20 1294
188.239.32.0/20 1056
166.108.224.0/20 970
111.119.192.0/20 951
124.243.128.0/18 667
94.74.80.0/20 651
111.119.224.0/20 622
111.119.240.0/20 566
111.119.208.0/20 538
[REDACTED] 313
Even without ratios and a total count (which will come later), it was
quite clear that Huawei was doing something big on the server. At that
point, it was responsible for a quarter to half of the traffic on our
GitLab server or about 5-10 queries per second.
But just looking at the logs, or per IP hit counts, it was really hard
to tell. That traffic is really well distributed. If you look more
closely at the output above, you'll notice I redacted a couple of
entries except major providers, for privacy reasons. But you'll also
notice almost nothing is redacted in the prefix list, why? Because
all of those networks are Huawei! Their announcements are kind of
bonkers: they have hundreds of such prefixes.
Now, clever people in the know will say "of course they do, it's a
hyperscaler; just ASN14618 (AMAZON-AES) alone has way more
announcements, they have 1416 prefixes!" Yes, of course, but they are
not generating half of my traffic (at least, not yet). But even then:
this also applies to Amazon! This way of counting traffic is way
more useful for large scale operations like this, because you group by
organisation instead of by server or individual endpoint.
And, ultimately, this is why asncounter matters: it allows you to
group your traffic by organisation, the place you can actually
negotiate with.
Now, of course, that assumes those are entities you can talk with. I
have written to both Alibaba and Huawei, and have yet to receive a
response. I assume I never will. In their defence, I wrote in English,
perhaps I should have made the effort of translating my message in
Chinese, but then again English is the Lingua Franca of the
Internet, and I doubt that's actually the issue.
The Huawei and Facebook blocks
Another aside, because this is my blog and I am not looking for a
Pulitzer here.
So I blocked Huawei from our GitLab server (and before you tear your
shirt open: only our GitLab server, everything else is still
accessible to them, including our email server to respond to my
complaint). I did so 24h after emailing them, and after examining
their user agent (UA) headers. Boy that was fun. In a sample of 268
requests I analyzed, they churned out 246 different UAs.
At first glance, they looked legit, like:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
Safari on a Mac, so far so good. But when you start digging, you
notice some strange things, like here's Safari running on Linux:
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.457.0 Safari/534.3
Was Safari ported to Linux? I guess that's.. possible?
But here is Safari running on a 15 year old Ubuntu release (10.10):
Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Ubuntu/10.10 Chromium/12.0.702.0 Chrome/12.0.702.0 Safari/534.24
Speaking of old, here's Safari again, but this time running on Windows
NT 5.1, AKA Windows XP, released 2001, EOL since 2019:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-CA) AppleWebKit/534.13 (KHTML like Gecko) Chrome/9.0.597.98 Safari/534.13
Really?
Here's Firefox 3.6, released 14 years ago, there were quite a lot of
those:
Mozilla/5.0 (Windows; U; Windows NT 6.1; lt; rv:1.9.2) Gecko/20100115 Firefox/3.6
I remember running those old Firefox releases, those were the days.
But to me, those look like entirely fake UAs, deliberately rotated to
make it look like legitimate traffic.
In comparison, Facebook seemed a bit more legit, in the sense that
they don't fake it. Most hits are from:
crawls the web for use cases such as training AI models or improving products by indexing content directly
From what I could tell, it was even respecting our rather liberal
robots.txt rules, in that it wasn't crawling the sprawling /blame/
or /commit/ endpoints, explicitly forbidden by robots.txt.
So I've blocked the Facebook bot in robots.txt and, amazingly, it
just went away. Good job Facebook, as much as I think you've given the
empire to neo-nazis, cause depression and genocide, you know how to
run a crawler, thanks.
Huawei was blocked at the web server level, with a friendly 429 status
code telling people to contact us (over email) if they need help. And
they don't care: they're still hammering the server, from what I can
tell, but then again, I didn't block the entire ASN just yet, just the
blocks I found crawling the server over a couple hours.
A full asncounter run
So what does a day in asncounter look like? Well, you start with a
problem, say you're getting too much traffic and want to see where
it's from. First you need to sample it. Typically, you'd do that with
tcpdump or tailing a log file:
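For example (a sketch; adjust the interface name or log path to your own setup):

# live capture of all incoming UDP/TCP traffic
tcpdump -q -i eth0 -n -Q in "(udp or tcp)" | asncounter --input-format=tcpdump --repl
# or feed it client IPs from an existing access log
tail -f /var/log/nginx/access.log | awk '{print $1}' | asncounter --repl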
If you really get a lot of traffic, you might want to get a subset
of that to avoid overwhelming asncounter, it's not fast enough to do
multiple gigabit/second, I bet, so here's only incoming SYN IPv4
packets:
tcpdump -q -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl
In any case, at this point you're staring at a process, just sitting
there. If you passed the --repl or --manhole arguments, you're
lucky: you have a Python shell inside the program. Otherwise, send
SIGHUP to the thing to have it dump the nice tables out:
pkill -HUP asncounter
Here's an example run:
> awk '{print $2}' /var/log/apache2/*access*.log | asncounter
INFO: using datfile ipasn_20250527.1600.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
INFO: finished reading data
INFO: loading /home/anarcat/.cache/pyasn/asnames.json
count percent ASN AS
12779 69.33 66496 SAMPLE, CA
3361 18.23 None None
366 1.99 66497 EXAMPLE, FR
337 1.83 16276 OVH, FR
321 1.74 8075 MICROSOFT-CORP-MSN-AS-BLOCK, US
309 1.68 14061 DIGITALOCEAN-ASN, US
128 0.69 16509 AMAZON-02, US
77 0.42 48090 DMZHOST, GB
56 0.3 136907 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
53 0.29 17621 CNCGROUP-SH China Unicom Shanghai network, CN
total: 18433
count percent prefix ASN AS
12779 69.33 192.0.2.0/24 66496 SAMPLE, CA
3361 18.23 None
298 1.62 178.128.208.0/20 14061 DIGITALOCEAN-ASN, US
289 1.57 51.222.0.0/16 16276 OVH, FR
272 1.48 2001:DB8::/48 66497 EXAMPLE, FR
235 1.27 172.160.0.0/11 8075 MICROSOFT-CORP-MSN-AS-BLOCK, US
94 0.51 2001:DB8:1::/48 66497 EXAMPLE, FR
72 0.39 47.128.0.0/14 16509 AMAZON-02, US
69 0.37 93.123.109.0/24 48090 DMZHOST, GB
53 0.29 27.115.124.0/24 17621 CNCGROUP-SH China Unicom Shanghai network, CN
Those numbers are actually from my home network, not GitLab. Over
there, the battle still rages on, but at least the vampire bots are
banging their heads against the solid Nginx wall instead of eating the
fragile heart of GitLab. We had a significant improvement in latency
thanks to the Facebook and Huawei blocks... Here are the "workhorse
request duration stats" for various time ranges, 20h after the block:
range  mean    max     stdev
20h    449ms   958ms   39ms
7d     1.78s   5m      14.9s
30d    2.08s   3.86m   8.86s
6m     901ms   27.3s   2.43s
We went from a two-second mean to under 500ms! And look at that standard
deviation: 39ms, when it was ten seconds before! I doubt we'll keep it
that way very long, but for now it feels like I won a battle, and I
didn't even have to set up anubis or go-away, although I suspect that
will unfortunately come.
Note that asncounter also supports exporting Prometheus metrics, but
you should be careful with this, as it can lead to cardinality explosion,
especially if you track by prefix (which can be disabled with
--no-prefixes).
Folks interested in more details should read the fine manual for
more examples, usage, and discussion. It shows, among other things,
how to effectively block lots of networks from Nginx, aggregate
multiple prefixes, block entire ASNs, and more!
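As a rough illustration of the Nginx side (not the manual's exact recipe, just one plausible shape, with made-up prefixes), the prefixes asncounter reports can feed a geo map that you then act on:
# /etc/nginx/conf.d/blocklist.conf (hypothetical file name)
geo $blocked_network {
    default       0;
    192.0.2.0/24  1;  # substitute the prefixes asncounter reports
    2001:db8::/32 1;
}
# then, inside the relevant server block:
if ($blocked_network) {
    return 429;
}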
So there you have it: I now have the tool I wish I had 20 years
ago. Hopefully it will stay useful for another 20 years, although I'm
not sure we'll still have internet in 20 years.
I welcome constructive feedback, "oh no you rewrote X", Grafana
dashboards, bug reports, pull requests, and "hell yeah"
comments. Hacker News, let it rip, I know you can give me another
juicy quote for my blog.
This work was done as part of my paid work for the Tor Project,
currently in a fundraising drive; give us money if you like what you
read.
Another short status update of what happened on my side last
month. Larger blocks besides the Phosh 0.47 release are on-screen
keyboard and cell broadcast improvements, work on separate volume
streams, the switch of phoc to wlroots 0.19.0, and an effort to make
Phosh work out of the box on Debian's upcoming stable release
(Trixie). Trixie will ship with Phosh 0.46; if you want to try out 0.47
you can fetch it from Debian's experimental suite.
Standardize audio stream roles (MR). Otherwise we'll have a hard time
with e.g. WirePlumber's role-based policy linking, as apps might use all kinds of types.
Reviews
This is not code by me but reviews of other people's code. The list is
(as usual) slightly incomplete. Thanks for the contributions!
Welcome to post 48 in the R4 series, and to
video 8 in this series.
Last week I had the honour of giving the opening talk at the 11eme Rencontres R at the Université de Mons in Belgium as
an invited plenary talk. Big thanks again to Philippe Grosjean and Kathy Huet for the
invitation, and for organising a lovely conference.
Since this was the opening talk, we were still sorting out projector issues
when I started, so I forgot to set a timer and consequently ran out of
time like a newbie. It occurred to me that I could simply re-record the
talk in front of my slides, just as I do for my STAT 447 students. So I sat down this
morning and did this, and the video is now online:
RcppDate wraps
the featureful date
library written by Howard
Hinnant for use with R. This header-only modern C++ library has been
in pretty widespread use for a while now, and adds to C++11/C++14/C++17
what is (with minor modifications) the ‘date’ library in C++20. The
RcppDate package
adds no extra R or C++ code and can therefore be a zero-cost dependency
for any other project; yet a number of other projects decided to
re-vendor it, resulting in less-efficient duplication. Oh well. C’est
la vie.
This release syncs with upstream release 3.0.4, made yesterday, which
contains a few PRs (including one by us) for
the clang++-20 changes, some of which we already had in release
0.0.5. We also made a routine update to the continuous
integration.
Debian 13 "Trixie" full freeze has started 2025-05-17, so this is
a good time to take a look at some of the features, that this release
will bring. Here we will focus on packages related to XMPP, a.k.a.
Jabber.
XMPP is a universal communication protocol for instant messaging, push
notifications, IoT, WebRTC, and social applications. It has existed since
1999, originally called "Jabber", it has a diverse and active developers
community.
Clients
Dino, a modern XMPP client, has been upgraded from 0.4.2 to 0.5.0.
Dino now uses OMEMO encryption by default. It also supports
XEP-0447: Stateless File Sharing for unencrypted file
transfers. Users can now see preview images or other file details
before downloading the file. Multiple widgets are redesigned to be
compatible with mobile devices, e.g. running Mobian.
Kaidan, a simple and user-friendly Jabber/XMPP client, has been
upgraded from 0.8.0 to 0.12.2.
Kaidan supports end-to-end encryption via OMEMO 2, Automatic Trust
Management and XMPP Providers. It has been migrated
to Qt 6 and many features have been added: XEP-0444: Message
Reactions, XEP-0461: Message Replies,
chat pinning, inline audio player, chat list filtering, local
message removal, etc.
Libervia is upgraded from 0.9.0~hg3993 to
0.9.0~hg4352
Among other features, it now also contains a gateway to ActivityPub,
e.g. to Mastodon.
Poezio, a console-based XMPP client, has been updated from 0.14
to 0.15.0.
It has better self-ping support and uses the system CA store by default.
Profanity, a console-based XMPP client, has been
upgraded from 0.13.1 to 0.15.0.
It adds support for XEP-0054: vcard-temp, improves MAM
support, shows encryption for messages from history, and handles
Alt+Enter as a newline character.
Psi+, a Qt-based XMPP client (basic version), has been
upgraded from 1.4.554 to 1.4.1456.
Prosŏdy, a lightweight extensible XMPP server, has been
upgraded from 0.12.3 to 13.0.1.
Admins can now disable and enable accounts as needed. There is a new
role and permissions framework, as well as storage and performance improvements.
libstrophe, an XMPP library in C has been upgraded from 0.12.2 to
0.14.0
It now supports XEP-0138: Stream Compression and
adds various modern SCRAM mechanisms.
omemo-dr, an OMEMO library used by Gajim is now in
Debian, in version 1.0.1
python-nbxmpp, a non-blocking Jabber/XMPP Python 3 library, upgraded
from 4.2.2 to 6.1.1
python-oldmemo, a python-omemo backend for OMEMO 1, 1.0.3 to 1.1.0
python-omemo, a Python 3 implementation of the OMEMO protocol, 1.0.2
to 1.2.0
python-twomemo, a python-omemo backend for OMEMO 2, 1.0.3 to 1.1.0
strophejs, a library for writing XMPP clients has been upgraded from
1.2.14 to 3.1.0
Gateways/Transports
Biboumi, a gateway between XMPP and IRC, upgrades from
9.0 to 9.0+20241124.
Debian 13 Trixie includes Slidge 0.2.12 and
Matridge 0.2.3 for the first time! Together they provide a
gateway between XMPP and Matrix, with support for many chat
features.
Not in Trixie
Spectrum 2, a gateway from XMPP to various other
messaging systems, did not make it into Debian 13, because it
depends on Swift, which has release critical bugs and
therefore cannot be part of a stable release.
I’ve been part of the Debian Project since 2019, when I attended DebConf held in Curitiba, Brazil. That event sparked my interest in the community, packaging, and how Debian works as a distribution.
In the early years of my involvement, I contributed to various teams such as the Python, Golang and Cloud teams, packaging dependencies and maintaining various tools. However, I soon felt the need to focus on packaging software I truly enjoyed, tools I was passionate about using and maintaining.
That’s when I turned my attention to Kubernetes within Debian.
A Broken Ecosystem
The Kubernetes packaging situation in Debian had been problematic for some time. Given its large codebase and complex dependency tree, the initial packaging approach involved vendorizing all dependencies. While this allowed a somewhat functional package to be published, it introduced several long-term issues, especially security concerns.
Vendorized packages bundle third-party dependencies directly into the source tarball. When vulnerabilities arise in those dependencies, it becomes difficult for Debian’s security team to patch and rebuild affected packages system-wide. This approach broke Debian’s best practices, and it eventually led to the abandonment of the Kubernetes source package, which had stalled at version 1.20.5.
Due to this abandonment, critical bugs emerged and the package was removed from Debian’s testing channel, as we can see in the package tracker.
New Debian Kubernetes Team
Around this time, I became a Debian Maintainer (DM), with permissions to upload certain packages. I saw an opportunity to both contribute more deeply to Debian and to fix Kubernetes packaging.
In early 2024, just before DebConf Busan in South Korea, I founded the Debian Kubernetes Team. The mission of the team was to repackage Kubernetes in a maintainable, security-conscious, and Debian-compliant way. At DebConf, I shared our progress with the broader community and received great feedback and more visibility, along with people interested in contributing to the team.
Our first task was to migrate existing Kubernetes-related tools such as kubectx, kubernetes-split-yaml and kubetail into a dedicated namespace on Salsa, Debian’s GitLab instance.
Many of these tools were stored across different teams (like the Go team), and consolidating them helped us organize development and focus our efforts.
De-vendorizing Kubernetes
Our main goal was to un-vendorize Kubernetes and bring it up-to-date with upstream releases.
This meant:
Removing the vendor directory and all embedded third-party code.
Trimming the build scope to focus solely on building kubectl, Kubernetes’ CLI.
Using Files-Excluded in debian/copyright to cleanly drop unneeded files during source imports.
Rebuilding the dependency tree, ensuring all Go modules were separately packaged in Debian.
We used uscan, a standard Debian packaging tool that fetches upstream tarballs and prepares them accordingly. The Files-Excluded directive in our debian/copyright file instructed uscan to automatically remove unnecessary files during the repackaging process:
$ uscan
Newest version of kubernetes on remote site is 1.32.3, specified download version is 1.32.3
Successfully repacked ../v1.32.3 as ../kubernetes_1.32.3+ds.orig.tar.gz, deleting 30616 files from it.
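For reference, the relevant stanza in debian/copyright looks roughly like this (a sketch with illustrative patterns, not the exact list we ship):
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Source: https://github.com/kubernetes/kubernetes
Files-Excluded:
 vendor
 third_party
The +ds suffix on the repacked tarball is the conventional marker for a repacked upstream tarball, typically configured through the repacksuffix option in debian/watch.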
The results were dramatic. By comparing the original upstream tarball with our repackaged version, we can see that our approach reduced the tarball size by over 75%:
This significant reduction wasn’t just about saving space. By removing over 30,000 files, we simplified the package, making it more maintainable. Each dependency could now be properly tracked, updated, and patched independently, resolving the security concerns that had plagued the previous packaging approach.
Dependency Graph
To give you an idea of the complexity involved in packaging Kubernetes for Debian, the image below is a dependency graph generated with debtree, visualizing all the Go modules and other dependencies required to build the kubectl binary.
This web of nodes and edges represents every module and its relationship during the compilation process of kubectl. Each box is a Debian package, and the lines connecting them show how deeply intertwined the ecosystem is. What might look like a mess of blue spaghetti is actually a clear demonstration of the vast and interconnected upstream world that tools like kubectl rely on.
But more importantly, this graph is a testament to the effort that went into making kubectl build entirely using Debian-packaged dependencies only, no vendoring, no downloading from the internet, no proprietary blobs.
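If you want to generate a similar picture yourself, debtree emits Graphviz output that dot can render; something along these lines should work (debtree has further options, e.g. for build dependencies, so check its manual page):
debtree kubectl | dot -T svg -o kubectl-deps.svg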
Upstream Version 1.32.3 and Beyond
After nearly two years of work, we successfully uploaded version 1.32.3+ds of kubectl to Debian unstable. The package ships with:
Zsh, Fish, and Bash completions installed automatically
Man pages and metadata for improved discoverability
Full integration with kind and docker for testing purposes
Integration Testing with Autopkgtest
To ensure the reliability of kubectl in real-world scenarios, we developed a new autopkgtest suite that runs integration tests using real Kubernetes clusters created via Kind.
Autopkgtest is a Debian tool used to run automated tests on binary packages. These tests are executed after the package is built but before it’s accepted into the Debian archive, helping catch regressions and integration issues early in the packaging pipeline.
Our test workflow validates kubectl by performing the following steps (see the shell sketch after this list):
Installing Kind and Docker as test dependencies.
Spinning up two local Kubernetes clusters.
Switching between cluster contexts to ensure multi-cluster support.
Deploying and scaling a sample nginx application using kubectl.
Cleaning up the entire test environment to avoid side effects.
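In shell terms, the workflow above boils down to roughly the following (a sketch with illustrative cluster names, not the literal test script):
# create two throwaway clusters and switch between their contexts
kind create cluster --name test-a
kind create cluster --name test-b
kubectl config use-context kind-test-a
kubectl config use-context kind-test-b
# deploy and scale a sample application with the packaged kubectl
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3
kubectl rollout status deployment nginx
# tear everything down to avoid side effects
kind delete cluster --name test-a
kind delete cluster --name test-b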
To measure real-world usage, we rely on data from Debian’s popularity contest (popcon), which gives insight into how many users have each binary installed.
Here’s what the data tells us:
kubectl (new binary): Already installed on 2,124 systems.
golang-k8s-kubectl-dev: This is the Go development package (a library), useful for other packages and developers who want to interact with Kubernetes programmatically.
kubernetes-client: The legacy package that kubectl is replacing. We expect this number to decrease in future releases as more systems transition to the new package.
Although the popcon data shows activity for kubectl before the official Debian upload date, it’s important to note that those numbers represent users who had it installed from upstream source-lists, not from the Debian repositories. This distinction underscores a demand that existed even before the package was available in Debian proper, and it validates the importance of bringing it into the archive.
Also worth mentioning: this number is not the real total number of installations, since users can choose not to participate in the popularity contest. So the actual adoption is likely higher than what popcon reflects.
Community and Documentation
The team also maintains a dedicated wiki page which documents:
The next stable release of Debian will ship with kubectl version 1.32.3, built from a clean, de-vendorized source. This version includes nearly all the latest upstream features, and will be the first time in years that Debian users can rely on an up-to-date, policy-compliant kubectl directly from the archive.
By comparing with upstream, our Debian package even delivers more out of the box, including shell completions, which the upstream still requires users to generate manually.
In 2025, the Debian Kubernetes team will continue expanding our packaging efforts for the Kubernetes ecosystem.
Our roadmap includes:
kubelet: The primary node agent that runs on each node. This will enable Debian users to create fully functional Kubernetes nodes without relying on external packages.
kubeadm: A tool for creating Kubernetes clusters. With kubeadm in Debian, users will then be able to bootstrap minimum viable clusters directly from the official repositories.
helm: The package manager for Kubernetes that helps manage applications through Kubernetes YAML files defined as charts.
kompose: A conversion tool that helps users familiar with docker-compose move to Kubernetes by translating Docker Compose files into Kubernetes resources.
Final Thoughts
This journey was only possible thanks to the amazing support of the debian-devel-br community and the collective effort of contributors who stepped up to package missing dependencies, fix bugs, and test new versions.
Special thanks to:
Carlos Henrique Melara (@charles)
Guilherme Puida (@puida)
João Pedro Nobrega (@jnpf)
Lucas Kanashiro (@kanashiro)
Matheus Polkorny (@polkorny)
Samuel Henrique (@samueloph)
Sergio Cipriano (@cipriano)
Sergio Durigan Junior (@sergiodj)
I look forward to continuing this work, bringing more Kubernetes tools into Debian and improving the developer experience for everyone.
I've been working on a multi-label email classification model.
It's been a frustrating slog, fraught with challenges, including
a lack of training data. Labeling emails is labor-intensive and
error-prone. Also, I habitually delete certain classes of email
immediately after their usefulness has passed. I use a
CRM-114-based spam filtering system (actually I use two
different instances of the same mailreaver config, but that's
another story), which is differently frustrating, but I
delete spam when it's detected or when it's trained.
Fortunately, there's no shortage of incoming spam, so I can
collect enough of that, but other, arguably more important labels
arrive infrequently. So those labels need to be excluded,
or the small sample sizes wreck the training feedback loop.
Currently, I have ten active labels, and even though the point
of this is not to be a spam filter, “spam” is one of the labels.
Out of curiosity, I decided to compare the performance of
my three different models, and to do so on a neutral corpus
(in other words, emails that none of them had ever been
trained on). I grabbed the full TREC 2007 corpus and ran
inference. The results were unexpected in many ways. For
example, the Pearson correlation coefficient between my
older CRM-114 model and my newer CRM-114 was only about
0.78.
I was even more surprised by how poorly all three performed.
Were they overfit to my email? So, I decided to look at
the TREC corpus for the first time, and lo and behold, the
first spam-labeled email I checked was something I would
definitely train all three models with as non-spam: ham for
CRM-114 and an entirely different label for my
experimental model.
I've been refreshing myself on the low-level guts of Linux
container technology. Here's some notes on mount namespaces.
In the below examples, I will use more than one root shell
simultaneously. To disambiguate them, the examples will feature
a numbered shell prompt: 1# for the first shell, and 2# for
the second.
Preliminaries
Namespaces are normally associated with processes and are
removed when the last associated process terminates. To make
a namespace persistent, you have to bind-mount the corresponding
virtual file from an associated process's entry in /proc
to another path.
The receiving path needs to have its "propagation" property set to "private".
Most likely your system's existing mounts are mostly "shared". You can check
the propagation setting for mounts with
1# findmnt -o+PROPAGATION
We'll create a new directory to hold the mount namespaces we create,
and set its propagation to private, via a bind mount of itself
onto itself.
1# mkdir /root/mntns
1# mount --bind --make-private /root/mntns /root/mntns
The namespace itself needs to be bind-mounted over a file rather
than a directory, so we'll create one.
1# touch /root/mntns/1
Creating and persisting a new mount namespace
1# unshare --mount=/root/mntns/1
We are now 'inside' the new namespace, in a new shell process.
We'll change the shell prompt to make this clearer:
PS1='inside# '
We can make a filesystem change, such as mounting a tmpfs
inside# mount -t tmpfs /mnt /mnt
inside# touch /mnt/hi-there
And observe it is not visible outside that namespace
2# findmnt /mnt
2# stat /mnt/hi-there
stat: cannot statx '/mnt/hi-there': No such file or directory
Back in the namespace shell, we can find an integer identifier for
the namespace via the shell process's /proc entry:
inside# readlink /proc/$$/ns/mnt
It will be something like mnt:[4026533646].
From another shell, we can list namespaces and see that it
exists:
2# lsns -t mnt
NS TYPE NPROCS PID USER COMMAND
…
4026533646 mnt 1 52525 root -bash
If we exit the shell that unshare created,
inside# exit
running lsns again should still list the namespace,
albeit with the NPROCS column now reading 0.
2# lsns -t mnt
We can see that a virtual filesystem of type nsfs is mounted at
the path we selected when we ran unshare:
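Presumably via something like this (identifiers and column widths will differ):
2# findmnt /root/mntns/1
TARGET        SOURCE                 FSTYPE OPTIONS
/root/mntns/1 nsfs[mnt:[4026533646]] nsfs   rw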
As a small addendum to the last post, here are the relevant
commands #debci helpfully provided.
First, you need to install the autopkgtest package,
obviously:
# apt install autopkgtest
Then you need to create a Debian virtual machine to run the
tests (put the sid.raw wherever you prefer):
# autopkgtest-build-qemu sid /tmp/sid.raw
Then you can run the tests themselves, using the just-created
virtual machine. The autopkgtest command can take its tests from
various sources, given as the last argument to the command. In my case
what was most helpful was to run the tests from my git clone
(which uses gbp) so I could edit the tests directly. So I didn't
give anything for testsrc (but
. would work as well, I guess).
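Putting it together, running the tests from inside the git clone then looks something like this (the -- separates autopkgtest's own arguments from the qemu backend's; passing . as testsrc explicitly should be equivalent):
# autopkgtest -- qemu /tmp/sid.raw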
We are very excited to announce that Debian has selected nine contributors to
work under mentorship on a variety of
projects with us during the
Google Summer of Code.
Here is a list of the projects and students, along with details of the tasks to
be performed.
Deliverables of the project: Continuous integration tests for Debian Med
applications lacking a test, Quality Assurance review and bug fixing if issues
might be uncovered.
Deliverables of the project: Analysis and discussion of the current
state of device tweaks management in Debian and Mobian. Proposal for a
unified, run-time approach. Packaging of this service and tweaks
data/configuration for at least one device.
Deliverables of the project: New Debian packages with GPU
support. Enhanced GPU support within existing Debian packages.
More autopackagetests running on the Debian ROCm CI.
Deliverables of the project: Refreshing the set of daily-built
images. Having the set of daily-built images become automatic
again (that is, go back to the promise of having them daily-built).
Write an Ansible playbook/Chef recipe/Puppet whatsitsname to define a
virtual server and have it build daily. Do the (very basic!) hardware
testing on several Raspberry Pi computers. Do note, naturally, this will
require having access to the relevant hardware.
Deliverables of the project: Eventually I hope we can get vLLM into
the Debian archive, based on which we can deliver something for LLM
inference out of the box. If the amount of work eventually turns out to be
beyond my expectations, I'm still happy to see how far we can go
towards this goal. If the amount of work required for vLLM is less
than I expected, we can also look at something else like SGLang,
another open-source LLM inference library.
Congratulations and welcome to all the contributors!
The Google Summer of Code program is possible in Debian thanks to the efforts of
Debian Developers and Debian Contributors that dedicate part of their free time
to mentor contributors and outreach tasks.
Join us and help extend Debian! You can follow the contributors' weekly reports
on the debian-outreach mailing-list, chat with us on our
IRC channel or reach out to the individual projects' team
mailing lists.
Each year on August the 16th, we celebrate the Debian Project Anniversary.
Several communities around the world join us in celebrating "Debian Day" with
local events, parties, or gatherings.
So, how about celebrating the 32nd anniversary of the Debian Project in 2025 in
your city? As the 16th of August falls on a Saturday this year, we believe it
is great timing to gather people around your event.
We invite you and your local community to organize a Debian Day by hosting an
event with talks, workshops, a
bug squashing party, an
OpenPGP keysigning gathering, etc.
You could also hold a meeting with others in the Debian community in a smaller
social setting like a bar/pizzeria/cafeteria/restaurant to celebrate. In other
words, any type of celebrating is valid!
One of Pope Francis' last activities before he passed away
was a visit to the Coeli prison in Rome. It reminded me about one
of our own prisons in Australia, the prison where I was baptised.
After all the falsification of
police rumors by rogue Debianists, and the case of the
arrested Outreachies, the prison story is a
curious twist of the truth.
Here is the main gate of Pentridge prison. The church is
in the background at the end of the prison wall:
The Pope presides over St Peter's basilica in Rome. In Coburg,
Australia, we have St Paul's church. Rome also has the
Basilica of Saint Paul Outside the Walls, just as St Paul's is
outside the walls of Pentridge.
Back in 1967, Ronald Ryan
gained notoriety as the last man to hang in Australia. His crime
was the murder of a prison guard while escaping from Melbourne's
Pentridge Prison. He maintained he was innocent and there was some
controversy over who fired the fatal shot.
Ryan's wikipedia page has a detailed description of the prison escape,
describing the fatal incident at the intersection of Sydney Road, O'Hea Street
and Champ Street.
St Paul's church is mentioned: Ryan's accomplice used a wall for
shelter.
Walker went south across Church Street toward the adjacent
Roman Catholic church in Sydney Road. Prison officer Bennett
had his rifle aimed at Walker and ordered Walker to halt or he
would shoot. Walker took cover behind a small wall that bordered
the church.
The report goes on to the murder itself in the middle of
this well known street.
George Hodson fell to the ground. He had been struck by a single bullet
that exited through Hodson's back, about an inch lower than the
point of entry in his right chest. Hodson died in the middle of
Sydney Road. Warder Robert Paterson, now with a rifle, ran back
outside and onto Champ Street.
On 30 March 1966, Ryan and his accomplice Walker were convicted
of murder and manslaughter respectively. Their appeals were
rejected in June 1966.
On 23 July 1966, shortly after Ryan's trial and appeal had both
failed, Fr Sean Patrick O'Connell was ordained a priest at
St Patrick's Cathedral, oblivious to the fact he would eventually
have a "life sentence", if you could call it that, to occupy the church
beside the gates of the prison.
Fr John Brosnan, a Jesuit, was the prison chaplain for 30 years from the
1950s to the 1980s. His work put him in touch with the prisoners,
the guards and their respective families. He ran a high profile
campaign to spare Ryan from the death penalty.
(obituary of Fr Brosnan).
My father had already been living in Coburg prior to the arrival
of Fr O'Connell. They knew each other throughout the entire period
of forty years that Fr O'Connell served the parish.
Fr Sean O'Connell served brief periods in the parishes of Flemington,
Werribee and Clifton Hill. In 1975 he became Assistant Priest for
the Coburg parish and in 1979 he was appointed as Parish Priest.
In other words, Fr O'Connell arrived shortly before Fr Brosnan
would finish his three decades of chaplaincy service on the other side of the
adjacent prison wall.
The long and distinguished service of these men is the thing that
really amplifies the sense of shock people feel about the wrongdoing
of some among their peers. The priests known for wrongdoing had
been moved from parish to parish every two or three years while
Fr O'Connell and Fr Brosnan both had decades of service in the same
locations.
In 1980, Bob Hawke was elected as the representative for Wills,
the federal district enclosing Coburg. On 8 February 1983,
Hawke became leader of the Labor Party and in March 1983,
he became Prime Minister, holding onto the top job until December 1991.
Hawke was not religious; nonetheless, he is widely remembered for
his 1987 election promise that within three years,
no Australian child will live in poverty.
Hawke himself, however, didn't live in the working-class district
of Coburg; he had a large house on the other side of Melbourne
in Sandringham. Australia's Prime Minister has an official residence in
Canberra, The Lodge, and another in Sydney, Kirribilli House.
Hawke's father was a Congregational minister but Hawke himself
was an atheist. News reports suggest Hawke
contemplated becoming a Catholic before his death. Is it possible the
influence of Fr O'Connell had a subconscious impact on the former Prime
Minister's thinking over the years?
I was born in the region and baptised right beside the prison at
St Paul's church in December 1978.
In Switzerland, Italian is the official language for one of
the 26 cantons, the Canton of Ticino. Around eight percent of
the Swiss population speak Italian. In Coburg, fifteen percent
speak Italian, yet it is not an official language in any part
of Australia. Fr O'Connell is well known for learning Italian
and giving ministry to the Italian community.
In this photo from a festival, the procession is walking between
the walls of the prison (left and rear of the photo) and the church
(right hand side of the photo).
On 17 June 1980, Maria James was brutally murdered at a bookshop
where she lived about fifty meters from St Mary's, the church in
Thornbury, a district adjacent to Coburg. A witness
claimed they saw Fr Anthony Bongiorno covered in blood.
Fr O'Connell provided an alibi, which police verified through
other means, proving that Fr Bongiorno was actually in Coburg on the
day of the murder. The crime remains unsolved.
In November 1982, gangland figure Brian Kane asked Father John
Brosnan to preside at his eventual funeral. A week later, Kane was
shot in the Quarry Hotel, Brunswick. Fr Brosnan described the request
from Kane in a news report:
For the prisoners, Fr Brosnan was like a stable family
that some of them never had before.
Likewise, Fr O'Connell's 40 years in Coburg gave him the
status of a family member for many of those who got to know
him over the decades.
Here is a photo of Father O'Connell with students from year 3
and their teacher Miss Keogh in 1985:
I never attended the school in Coburg. I did year 3 at St Patrick's
in Kilmore. St Patrick's primary school is on the opposite side of the
road from Assumption College, where Fr Brosnan attended high school
himself many years prior.
In 1989, the largest employer in the district, Kodak,
contemplated closing their factory. Prime Minister Hawke wasn't going
to allow that to happen under his nose and the Government made a deal
to keep the factory open. Nonetheless, by 2004, the rise of
digital cameras made the factory obsolete and it closed anyway.
In 1992, when Hawke resigned, there was a byelection for the
district and the winner was prominent local football personality
Phil Cleary running as an independent against the established
Labor party. His victory was a major coup. The rise of Cleary hints at
the special relationship between sport, politics and religion
in Australian society.
In 1996, I moved back to Coburg and for a while we lived in
O'Hea Street, one of the places described in the report about
Ronald Ryan's prison break.
Ronald Ryan's wife and daughters lived in Hawthorn, adjacent to Kew.
When I tell anybody in Melbourne that I used to cycle from
Pentridge to
Xavier College on a daily basis it sounds rather odd.
In 1997, the Virtual Moreland Community Network was established
and opened an office at 512 Sydney Road, also adjacent to the
churches and the notorious prison. Here is a map:
The prison itself was closed on 1 May 1997. Some of the original
heritage listed walls and buildings have been preserved.
Looking through official filings from the Australian Labor Party,
I found the Vice President of the Coburg branch,
an active member of the Upgrade Upfield Coordinating Committee,
was at one point living in a house owned by Fr O'Connell on Mackay Street,
Coburg. Was it community activism that saved the train or was it the
power of faith? It could have been a bit of both.
Nonetheless, it is another hint at the relationships between religion,
politics and sport that underpin Australian society.
Fr John Brosnan passed away in 2003. He was given a state funeral
in St Patrick's Cathedral (Eulogy for John Brosnan).
The St Patrick's Cathedral choir became very well known due to
the prosecution of Cardinal George Pell.
von Bidder's death was
discussed like a suicide and given that it happened shortly after
other confirmed suicides, it feels like it was part of a suicide
cluster on the day of our wedding. So I received the sacrament
of baptism meters away from the gates
of a notorious prison known for the murder of a prison guard and
then at the sacrament of marriage,
we had this Debian death that was avoidable and could even be a criminal
act of manslaughter under the British definition of the law.
The day of the baptism was the first Sunday of Advent and the
wedding, when Adrian von Bidder died, was Palm Sunday.
In 2010 I went to Zurich to
work on a contract for UBS. The Kanton told us that we had to
pay mandatory church taxes or we could not attend mass or be buried in a Swiss
cemetery if we died there. This felt totally inconsistent with
everything I had previously learnt about Christianity.
The church tax situation was even more confusing because they demanded
that we give money to the church but they were refusing to cover
the cost of medical services for Carla after somebody fell on her
in a yoga studio.
At the time, I felt there was significant inconsistency between
the manner in which Australian women were marching to support the
white, attractive Irish immigrant Jill Meagher while turning a blind
eye to the manner in which the government rounds up women from
Afghanistan and Iran and puts them into state-sponsored concentration
camps.
On 16 September 2015, researcher Val Noone gave a presentation about the
Irish in Coburg. The details were
subsequently published in a blog. Fr O'Connell and Michael Laporta
are credited as sources.
Throughout 2016, the Child Abuse Royal Commission conducted
a series of public and private hearings about abuse in the Catholic
Church. Fr O'Connell is not one of those accused of wrongdoing,
quite the opposite, the wrongdoing undermines his legacy. Nonetheless,
Fr O'Connell died shortly after the public scandal, just as my
father died shortly after Cardinal Pell was sent to prison in 2019.
Fr O'Connell's church and presbytery were surrounded on two sides by
very high prison walls. Ironically, after living there for forty years,
he may have only discovered at the same time as everybody else the extent
to which a small group of his colleagues belonged on the other side.
Fr O'Connell's Golden Jubilee as a priest was 23 July 2016.
Four days later, the ABC program 7:30 Report broadcast
a mixed bag of accusations that would subsequently be the basis for
the prosecution of Cardinal Pell.
On 18 December 2016, Fr O'Connell died at the Austin Hospital.
A few days later, on 23 December 2016, his funeral was held
as a Pontifical Requiem mass, in other words, the funeral was
conducted by the bishop.
Coincidentally, Australia's Child Abuse Royal Commission handed
down their report in December 2017, right in the middle of the period
where I had discovered the wrongdoing in open source software.
Rogue Debianists became upset when their blackmail racket was exposed.
They began censoring blogs at the end of 2018 and the Debian Christmas
lynchings quickly followed.
Paul Tagliamonte from the US Digital Service (White House) stomped on people
using metaphors about summary executions:
Subject: Re: Censorship in Debian
Date: Thu, 27 Dec 2018 10:39:19 -0500
From: Paul R. Tagliamonte <paultag@gmail.com>
To: Norbert Preining <norbert@preining.info>
CC: debian-project@lists.debian.org
This entire thread is so cringy, this is likely my last reply.
On Wed, Dec 26, 2018 at 9:31 PM Norbert Preining <norbert@preining.info> wrote:
>
> Paul,
>
> On Wed, 26 Dec 2018, Paul R. Tagliamonte wrote:
> > Please, all, get some perspective and stop with the comparisons to labor
> > camps, targeted killings, prisons and sentences of death. We sound like
>
> You did not understand the meaning of this comparison: The point was
> that the correct agreed upon and legal procedures have not been
> followed. And you deliberately removed this part from your email and
> consideration.
Gulags and military tribunals were both legal. They were not policy or
procedure fouls.
They were not foibles. It was intentional and targeted.
They were ways to murder dissidents. Say what you want about our ability to
self-govern the Debian community, and ways we've messed up, we've never
killed anyone as part of the expulsion process, and the comparisons need to
stop, even if I'm still "missing the point" and people consider what happened
with anti-harassment unfair. A-H is not killing DDs. Stop comparing them to it.
It's a very simple point.
> It is not about the planet, it is about expulsion that did not follow
> the rules. This *can* be consider a libel case due to influences on my
> professional life.
>
> Best
>
> Norbert
Paul
Tagliamonte's comment is wrong: people did die. Frans Pop and
Adrian von Bidder both died shortly after the lynching of Sven Luther.
Frans Pop wrote his suicide note / resignation email the night before
Debian Day. See the
full history of the Debian Harassment Culture. On the topic
of Debian giving volunteers sentences, here are the gallows constructed
to hang Ronald Ryan in D division at Pentridge:
Software in the Public Interest, Inc, a US non-profit,
filed accounts for 2022 showing they spent $120,000 on legal fees
to hide the fact Adrian von Bidder died, possibly as part of the suicide
cluster, on our wedding day. Ironically, the psychology and the legal tactics
used to evade liability for the suicides are remarkably similar to
the tactics that the church was criticized for.
From baptism at the site of death to $120,000 in Debian kill money ...
The church reasoned that they had to hide certain crimes by priests
to maintain the public perception of the church as infallible. Looking
at the lifetime of good work done by men like Fr Brosnan and Fr O'Connell,
their reputations have stood the test of time and their
legacy would not have been diminished in any way if rogue priests
had been managed more competently in the region throughout
the same period.
Even if they spend $120 million, the lawyers and judges cannot
bring back the volunteers who died. It is not easy to hide a death,
especially when the Debian logo is on the tombstone, along with the
date of our wedding:
Look at the email from Diana von Bidder-Senn, the widow. She was
completely in the dark about debian-private and all the
problems subsequent to the previous suicide. This is an example of
how the public is fooled by the messages that Paul Tagliamonte and
others were publishing to whitewash over the truth about
Debian harassment culture. Would she have sent an
email like this if she had read and understood all the emails about
Frans Pop in 2010?
Subject: Re: condolences for Adrian
Date: Mon, 25 Apr 2011 15:02:18 +0200
From: Diana von Bidder <diana@fortytwo.ch>
To: Stefano Zacchiroli <leader@debian.org>
Dear Stefano
Thank you for your wonderful mail! Yes Debian and people were very
important to Adrian. I was glad that he was not only sitting alone in
front of his computer but to know that there are people out there that
estimate him and are his friends even if most of you did not know each
other personally.
The way you describe him (empathy, calm, insight, ... - just the Adrian
I know) assures me on how good friends of Adrian are out there. And I
will always continue to think of this (in a good way!) when continuing
to use debian (which I became quite fond of because of Adrian).
It's a pity that he couldn't go to Banja Luca anymore which he did so
much look forward to. Anyway, I wish you all the best and hope you
continue your good work.
- Diana
Shortly after Cardinal Pell died,
I published a photo of our rowing crew. On 3 April 2023, the man sitting
behind me won the National Emergency Medal. The following day, 4 April 2023,
the Swiss financial regulator FINMA discreetly shut down Parreaux, Thiebaud & Partners,
leading to my investigation into the
JuristGate scandal.
So I was baptised at the scene of a notorious death connected to
the story of capital punishment in Australia and I went on to
expose another facet of the corruption in the Swiss legal system.
We don't know how many people have committed suicide due to invalid
and corrupt judgments, liquidated lawyers, miscarriages of justice
and other failings by racist Swiss hillbilly jurists. The suicide
victims around Geneva are every bit as dead as George Hodson and
Ronald Ryan.
This is a bug fix and minor feature release over INN 2.7.2, and the
upgrade should be painless. You can download the new release from
ISC or
my personal INN pages. The latter also has
links to the full changelog and the other INN documentation.
Father John Brosnan SJ passed away in 2003 and he was given
a state funeral at St Patrick's Cathedral in Melbourne.
Fr Brosnan was one of the most notable priests in Australia's
Catholic community due to his campaign against the
death penalty and his contact with Ronald Ryan, the last man to hang
in Australia.
Peter Norden AO, then Policy Director for Jesuit Social Services,
gave the eulogy. He makes some interesting comments about
Fr Brosnan's philosophy. This is invaluable to our understanding
of the flaws in the
Code of Conduct (CoC) gaslighting phenomena.
‘I was in prison … and you visited me’.
This must be the most succinct description of the public life of Father John Brosnan.
An Australian of quite remarkable qualities, who spent thirty years ministering to those on the other side of the walls:
The walls of Pentridge Prison, Coburg.
Those thirty years earned Father Brosnan the reputation of being ‘The Knockabout Priest.’
A priest who walked with a dignified and grace-filled presence the corridors of the most notorious prison in recent Australian history.
A pastor who combined Christian compassion and worldly wisdom as he advised and counselled thousands of inmates in their prison cells.
An advocate for human rights and civil liberties who undertook this task with discretion and subtlety and good humour.
A leading opponent of capital punishment, who knew from first hand experience the essential inconsistency of upholding the value of human life, by taking the life of another.
But there was much more to the life of Father John Brosnan than the thirty years he spent ‘in the nick’.
John Brosnan was born on 12 April 1919, at Keilambete, a small town between Terang and Mortlake, in the Western District of Victoria.
He was the third of four children, the second of three sons, of Jeremiah Joseph Brosnan, a railway fettler, and his wife, Mary Jane, known as Jenny. Jeremiah Brosnan was born in County Kerry, Ireland, and migrated to Australia in 1886.
John Brosnan grew up in the small town of Cudgee, near Warrnambool, with his sister, Mary, present here today, and his brothers, Denis and Jim, both now deceased.
John was educated at Cudgee State School and later at Assumption College, Kilmore.
His early years at Cudgee, he often recalled in later years, growing up largely with Baptist families rather than a Catholic environment, prepared him for later life, where he moved easily in circles outside of the more sheltered Catholic Church network.
He often said that they had discovered ecumenism in Cudgee long before the Second Vatican Council and before it became fashionable!
Young John Brosnan later boarded at Assumption College for four years from the age of fifteen, from 1934-1937. He played one game with the First XVIII of Assumption College, but was carried off with a corkey ten minutes into the first quarter.
Geelong Football Club won the premiership that year in 1937, and his devotion to that other form of religion was well established, even in those days.
Late that evening, young John Brosnan led an enthusiastic celebration march down the main street of Kilmore with fellow students. The Marist Headmaster at the time, Brother Hilary, suggested that it might not have been appropriate for a young man with intentions to join the seminary the following year!
Stopped by people in the street in later years, who began their conversation with: ‘Father, I am not of your faith, but …’, Father Brosnan would interrupt them and say: ‘You mean you don’t follow my beloved Cats?’
Last August, the Geelong Football Club was preparing a public tribute to Father Brosnan, at their last home game, to be played at Colonial Stadium. The tribute was postponed, after Father broke his hip a few weeks before.
Discussing the preparations for this event with the young marketing officer from the club in recent days, I asked him: ‘Do you know who Father Brosnan was?’ He admitted he didn’t. I told him: Father Brosnan was effectively the marketing man for the Geelong Football Club around Australia, before the term ‘marketing’ was even invented!
As a student of Assumption College, young John Brosnan did apply for the seminary, to Bishop Daniel Foley of Ballarat. Many years later, Father Brosnan still remembered the curt letter in reply:
‘Dear Mr Brosnan, we have no vacancies for students for the priesthood in the Diocese of Ballarat. The religious orders are always anxious for suitable candidates.’
His personal and spiritual references from Assumption had been first class, even if his academic achievements were not, and after failing Latin of all subjects in his first year of Matriculation, he repeated the year and was accepted into the Archdiocese of Melbourne by Archbishop Mannix the following year, in 1938.
In 1945, John Brosnan was ordained a priest by Archbishop Mannix, here at Saint Patrick’s Cathedral, at the age of twenty-six.
The next two years he worked in Geelong, as chaplain to the Saint Augustine’s orphanage. Then as assistant priest at Saint Joseph’s Church in Collingwood for two years. Then he was stationed here at Saint Patrick’s Cathedral for a further five years, until his appointment to the position of Chaplain to Pentridge Prison in 1956.
During the years as Assistant Priest here at Saint Patrick’s he came to know and admire deeply Archbishop Mannix. Much of his astute capacity to move so effectively in public life came from the lessons he learned watching and listening to Mannix during those years.
In his biography, Father Brosnan explained the impact that Mannix had on him:
‘Dr Mannix was the only person, man, woman or child, I have known in my life I couldn’t take my eyes off. His every movement was worth watching, his every word worth hearing. I could watch Don Bradman bat, I could watch Reg Hickey or Polly Farmer move on a football field and I could watch Dr Mannix drink his soup! Every movement of the man was worth watching. You realised you were in the presence of greatness.’
When he arrived at Pentridge Prison as Chaplain in 1956, at the age of thirty-five, John Brosnan was both astonished and disturbed to find so many of his former junior football players from the inner-city parishes and from the orphanage at Geelong serving time. Before the psychologists had worked it out, he spoke about ‘kids’ futures being written on their faces before they were born.’
The ten years of priestly ministry before his assignment to Pentridge had prepared Father Brosnan well for his assignment to those sentenced to Her Majesty’s prisons.
His priesthood was one deeply inculturated in the lives of ordinary people. He was as much at home in Hardiman’s Pub, on Flemington racetrack or at the dogs on Monday nights, as he was in the church buildings. But he was always the pastoral man, offering a word of recognition or encouragement when it was most needed.
A man with a big heart for those in real need, offering a generous and practical response when called for. But this was balanced by an honesty and an insight into human behaviour which was hard to parallel: ‘Nurse a mug long enough and he will die in your arms’ was one of his sayings.
His great love of people, his incredible knowledge of family trees, and his memory for names and places, remained with him through to the end. His last thirteen years of ministry after retirement from Pentridge in 1985 were spent in the parishes: firstly, at Glenhuntly, then eleven years as Parish Priest at Holy Redeemer Church in Surrey Hills.
At Glenhuntly, one of his pastoral responsibilities included the care of those who attended the nearby Caulfield Racecourse. At Surrey Hills, his involvement with the local families watching their children progress through primary school was one of his delights. He knew each child by name and would reward many by a little treat at the end of the school day, usually a Mars Bar! Late last year a Year 8 student at Saint Kevin’s College asked me to send his regards to Father Brosnan: ‘Tell him, from the punter.’
But Father Brosnan’s public persona was formed during his thirty years as Chaplain at ‘The College of Knowledge’ in Sydney Road, Coburg.
There were many thousands of people assisted by Father Brosnan’s presence within the walls of Pentridge Prison during those years. When opening a new site for the Brosnan Centre, then in Sydney Road, Brunswick, former Premier John Cain quipped: ‘Father Brosnan worked with a terrible lot of people.’
However, this generous hearted man, with such a wonderful insight into human behaviour, pastored not only to those behind the walls of the prison, but to many thousands of others, in particular their wives, their children and their friends, many of whom could be regarded as victims of crime.
For the first twenty years of his prison ministry, Father Brosnan lived in a little cottage in Abbotsford, provided by the Good Shepherd Sisters. Here a procession of friends and prison acquaintances would visit him after hours, especially on Saturday mornings. Supported in a practical and generous way by the Sisters, Father Brosnan operated one of the then most effective after-care services, from his own residence.
He was pleased to see this early work as the forerunner of the Brosnan Centre established by the Jesuits in 1977, and later named after him, on his retirement from prison ministry in 1985.
In his last ten years as prison chaplain, he lived in a centrally located flat behind the old Saint Vincent’s hospital, provided by the Sisters of Charity. Throughout his working life, he appeared to have just one pair of shoes, one suit, and a sports jacket. What he was given as a gift was generally passed on to someone in need.
Saint Vincent De Paul prison visitors and VACRO, assisting the families of prisoners, were key collaborators in his ministry.
VACRO’s former manager, Matt Derham, used to refer to Father’s ‘old boys association’ as ‘Bros’s menagerie.’
Just as the time with Archbishop Mannix was a formative period in his priestly life, so was his ministry to Ronald Ryan and Ryan’s family. The public campaign against capital punishment with which he was so centrally involved in late 1966 and early 1967, was in one sense a failure.
But Ryan’s last words before his execution, directed to Father Brosnan, tell another story: ‘Never forget, no matter how long you live, you were ordained for me.’
Father Brosnan’s involvement with Ryan was one of the clearest, and certainly the most public, forms of witness he could give to the unconditional love of God.
Many Christian people mistakenly believe that this love must be earned or deserved. Father Brosnan had learned through his own life experience, especially through 30 years of prison ministry, that it is freely given.
It is significant, and a tribute to Father Brosnan’s involvement in the campaign against capital punishment, that Ryan was the last person executed by the State in Australia’s history and that capital punishment has now been removed from the statutes of every State and Territory in this country.
One of the most endearing qualities of John Brosnan was his refusal to sit in judgement on others. When it was suggested that one of his friends had been found to be involved in some form of dubious or illegal activity, ‘so they say’ he would comment.
While traditional in his theological beliefs, he had an enormous pastoral capacity and personal freedom to respond creatively to the circumstances of the person seeking his advice or guidance.
He moved with grace and with dignity across all levels of our society, and was well received by persons of all political persuasions and religious beliefs or ideologies.
The demand for his presence in public forums and as an after-dinner speaker was unbelievable and his capacity for this did not diminish with the years. He was often asked how he survived 30 years in the Nick. He would refer to four ancient documents that were a big help, written by Matthew, Mark, Luke and John. He would also quote words of wisdom from Henry Lawson.
John Brosnan was able to speak on sensitive issues, such as the need for prison reform, in a way that made it hard to take offence, in an entertaining but always respectful manner. Through this means, he was able to help the wider community consider and reflect on the complex issues of crime and punishment.
A notable example was when he was invited by the then Minister for Prisons, Pauline Toner, to join her in addressing an angry crowd of more than a thousand local residents opposed to the construction of Barwon Prison at Lara.
Father Brosnan was, as always, the essence of diplomacy and a builder of bridges between different points of view.
Many people will be affected by the departure of Father John Brosnan: Mary, his sister, foremost of course, and the members of Father Brosnan’s family.
Throughout this Cathedral today many people, from many different walks of life, will shed a tear as they reflect on the impact that this remarkable priest has had on their lives.
It may have been a quiet word of encouragement at a time of personal crisis. Or a contact made that led to a job opportunity or a decent place to live. Or his presence in court, when it seemed little could be said on one’s behalf. Or a quiet word of advice to a politician or public servant.
This legacy of Father Brosnan will live on in the centre that bears his name: The Brosnan Centre.
But what we will miss most of all is his friendship.
I can just hear John Brosnan asking the question, at the pearly gates, with some wonderment:
‘Lord, when did I see you hungry, and feed you; or thirsty and give you drink? When did I see you a stranger and make you welcome; sick or in prison and go to see you?’
And the Lord will answer him:
‘I tell you solemnly, in so far as you did this to one of the least of these brothers or sisters of mine, you did it to me.’
Father John Brosnan, a faith-filled life that brought hope and encouragement where it was most needed.
A life of respectful and committed service, with much to say to our divided world at the present time. Father Brosnan, we thank you!
Blocking comment spammers on an Ikiwiki blog
Despite comments on my ikiwiki blog being fully moderated, spammers have increasingly been posting link spam comments. While I used to use the blogspam plugin, the underlying service was likely retired circa 2017 and its public repositories are all archived.
It turns out that there is a relatively simple way to drastically reduce the amount of spam submitted to the moderation queue: ban the datacentre IP addresses that spammers are using.
Looking up AS numbers
It all starts by looking at the IP address of a submitted comment. From there, we can look it up using whois. The important bit in the whois output is the line that refers to Autonomous System 207408, owned by a hosting company in Germany called Servinga.
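The original commands and output aren't reproduced here, so here is a minimal sketch of the lookup, using a documentation address (203.0.113.45) as a stand-in for the IP taken from the pending comment in the moderation queue:

# Hypothetical example; substitute the address found in the moderation queue.
$ whois 203.0.113.45 | grep -i '^origin'
# Expected to print something along the lines of:
#   origin:         AS207408

With the case-insensitive match, the same pattern also catches ARIN's "OriginAS:" spelling, so it should work regardless of which regional registry holds the address.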
Alternatively, you can use a WHOIS server that produces much better output for this kind of lookup.
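The specific server isn't named in this copy of the post; one commonly used option (an assumption on my part, not necessarily the one the author had in mind) is Team Cymru's IP-to-ASN WHOIS service, which returns the AS number, prefix and AS name on a single line:

# Assumed service: whois.cymru.com (Team Cymru IP-to-ASN mapping); the
# " -v" inside the query asks for verbose, column-headed output.
$ whois -h whois.cymru.com " -v 203.0.113.45"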
Looking up IP blocks
Autonomous Systems are essentially organizations to which IPv4 and IPv6 blocks have been allocated.
These allocations can be looked up easily on the command line, either using a third-party service or a local database downloaded from IPtoASN.
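As an illustration of both approaches (the exact third-party service and the dump file layout are assumptions on my part, not taken from the post):

# Third-party service: RADb supports inverse queries by origin AS, which
# list the route objects registered for that AS.
$ whois -h whois.radb.net -- '-i origin AS207408' | grep -E '^route6?:'

# Local database: iptoasn.com publishes TSV dumps whose columns are
# range_start, range_end, AS number, country and description.
$ wget https://iptoasn.com/data/ip2asn-combined.tsv.gz
$ zcat ip2asn-combined.tsv.gz | awk -F'\t' '$3 == 207408 {print $1 " - " $2}'

Registered route objects and a BGP-derived dump don't always match exactly, so the two methods can return slightly different lists.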
This is what I ended up with in the case of Servinga:
Preventing comment submission
While I do want to eliminate this source of spam, I don't want to block these datacentre IP addresses outright since legitimate users could be using these servers as VPN endpoints or crawlers.
I therefore added a block to my Apache config to restrict the CGI endpoint (used only for write operations such as commenting), and put the list of networks to block in /etc/apache2/spammers.include. Finally, I can restart the website and commit my changes.
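The configuration itself isn't reproduced above, so here is a minimal sketch of how this can be wired together with Apache 2.4 authorization directives; the CGI path, the example networks and the exact commands are placeholders rather than the author's real setup:

# In the site's virtual host: apply the ban list only to the CGI endpoint.
<Location /ikiwiki.cgi>
    Include /etc/apache2/spammers.include
</Location>

# /etc/apache2/spammers.include: allow everyone except the listed networks.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.0/24
    Require not ip 2001:db8::/32
</RequireAll>

# Check the configuration, reload Apache and commit the change.
$ sudo apache2ctl configtest && sudo systemctl reload apache2
$ git add spammers.include && git commit -m 'Block spammer networks'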
Future improvements
I will likely automate this process in the future, but at the moment my blog can go for a week without a single spam message (down from dozens every day). It's possible that I've already cut off the worst offenders.
I have published the list I am currently using.