Updated my timezone tool.
08 July, 2025 01:56AM by Junichi Uekawa
This was my hundred-thirty-second month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on:
This month I also did a week of FD duties and attended the monthly LTS/ELTS meeting.
This month was the eighty-third ELTS month. During my allocated time I uploaded or worked on:
This month I also did a week of FD duties and attended the monthly LTS/ELTS meeting.
This month I uploaded bugfix versions of:
Thanks a lot again to the Release Team who quickly handled all my unblock bugs!
This work is generously funded by Freexian!
This month I uploaded bugfix versions of:
Unfortunately I didn’t find any time to work on this topic.
This month I uploaded bugfix versions of:
Unfortunately I stumbled over a discussion about RFPs. One part of those involved wanted to automatically close older RFPs, the other part just wanted to keep them. But nobody suggested actually taking care of those RFPs. Why is it easier to spend time talking about something instead of solving the real problem? Anyway, I had a look at those open RFPs. Some of them can just be closed because they weren’t closed when the corresponding package was uploaded. For some others the corresponding software has not seen any upstream activity for several years and depends on older software no longer in Debian (like Python 2). Such bugs can just be closed. Some requested software only works together with long-gone technology (for example the open Twitter API). Such bugs can just be closed. Last but not least, even the old RFPs contain nice software that is still maintained upstream and useful. One example is ta-lib that I uploaded in June. So, please, let’s put our money where our mouths are. My diary of closed RFP bugs is on people.d.o. If only ten people follow suit, all bugs can be closed within a year.
It is still this time of the year when just a few packages arrive in NEW: it is Hard Freeze. So please don’t hold it against me that I enjoy the sun more than processing packages in NEW. This month I accepted 104 and rejected 13 packages. The overall number of packages that got accepted was 105.
07 July, 2025 09:40AM by alteholz
Dear Debian community,
This is bits from the DPL for June.
In June there was an extended discussion about the ongoing challenges around mentoring newcomers in Debian. As many of you know, this is a topic I’ve cared about deeply--long before becoming DPL. In my view, the issue isn’t just a matter of lacking tools or needing to “try harder” to attract contributors. Anyone who followed the discussion will likely agree that it’s more complex than that.
I sometimes wonder whether Debian’s success contributes to the problem. From the outside, things may appear to “just work”, which can lead to the impression: “Debian is doing fine without me--they clearly have everything under control.” But that overlooks how much volunteer effort it takes to keep the project running smoothly.
We should make it clearer that help is always needed--not only in packaging, but also in writing technical documentation, designing web pages, reaching out to upstreams about license issues, finding sponsors, or organising events. (Speaking from experience, I would have appreciated help in patiently explaining Free Software benefits to upstream authors.) Sometimes we think too narrowly about what newcomers can do, and also about which tasks could be offloaded from overcommitted contributors.
In fact, one of the most valuable things a newcomer can contribute is better documentation. Those of us who’ve been around for years may be too used to how things work--or make assumptions about what others already know. A person who just joined the project is often in the best position to document what’s confusing, what’s missing, and what they wish they had known sooner.
In that sense, the recent "random new contributor’s experience" posts might be a useful starting point for further reflection. I think we can learn a lot from positive user stories, like this recent experience of a newcomer adopting the courier package. I'm absolutely convinced that those who just found their way into Debian have valuable perspectives--and that we stand to learn the most from listening to them.
We should also take seriously what Russ Allbery noted in the discussion: "This says bad things about the project's sustainability and I think everyone knows that." Volunteers move on--that’s normal and expected. But it makes it all the more important that we put effort into keeping Debian's contributor base at least stable, if not growing.
Lucas Nussbaum has volunteered to handle the paperwork and submit a request on Debian’s behalf to LLM providers, aiming to secure project-wide access for Debian Developers. If successful, every DD will be free to use this access--or not--according to their own preferences.
Kind regards Andreas.
05 July, 2025 10:00PM by Andreas Tille
For at least 12 years laptops have been defaulting to not having the traditional PC 101 key keyboard function key functionality and instead have had other functions like controlling the volume and have had a key labelled Fn to toggle the functions. It’s been a BIOS option to control whether traditional function keys or controls for volume etc are the default and for at least 12 years I’ve configured all my laptops to have the traditional function keys as the default.
Recently I’ve been working in corporate IT and having exposure to many laptops with the default BIOS settings for those keys to change volume etc and no reasonable option for addressing it. This has made me reconsider the options for configuring these things.
Here’s a page listing the standard uses of function keys [1]. Here is a summary of the relevant part of that page:
The keys F1, F3, F4, F7, F9, F10, and F12 don’t get much use for me and for the people I observe. The F2 and F8 keys aren’t useful in most programs, F6 is only really used in web browsers – but the web browser counts as “most programs” nowadays.
Here’s the description of Thinkpad Fn keys [2]. I use Thinkpads for fun and Dell laptops for work, so it would be nice if they both worked in similar ways but of course they don’t. Dell doesn’t document how their Fn keys are laid out, but the relevant bit is that F1 to F4 are the same as on Thinkpads which is convenient as they are the ones that are likely to be commonly used and needed in a hurry.
I have used the KDE settings on my Thinkpad to map the F1 to F3 function keys to their Fn equivalents, which are F1 for mute-audio, F2 for vol-down, and F3 for vol-up, to allow using them without holding down the Fn key while having other function keys such as F5 and F6 keep their usual GUI functionality. Now I have to train myself to use F8 in situations where I usually use F2, at least when using a laptop.
The only other Fn combinations I use are F5 and F6 for controlling screen brightness, but that’s not something I use much.
It’s annoying that the laptop manufacturers forced me to this. Having a Fn key to get extra functions and not need 101+ keys on a laptop size device is a reasonable design choice. But they could have done away with the PrintScreen key to make space for something else. Also for Thinkpads a touch pad is something that could obviously be removed to gain some extra space as the Trackpoint does all that’s needed in that regard.
04 July, 2025 11:44AM by etbe
There are many negative articles about “AI” (which is not about actual Artificial Intelligence, also known as “AGI”), which I think are mostly overblown and often ridiculous.
Complaints about resource usage are common, training Llama 3.1 could apparently produce as much pollution as “10,000 round trips by car between Los Angeles and New York City”. That’s not great but when you compare to the actual number of people doing such drives in the US and the number of people taking commercial flights on that route it doesn’t seem like such a big deal. Apparently commercial passenger jets cause CO2 emissions per passenger about equal to a car with 2 people. Why is it relevant whether pollution comes from running servers, driving cars, or steel mills? Why not just tax polluters for the damage they do and let the market sort it out? People in the US make a big deal about not being communist, so why not have a capitalist solution, make it more expensive to do undesirable things and let the market sort it out?
ML systems are a less bad use of compute resources than Bitcoin, at least ML systems give some useful results while Bitcoin has nothing good going for it.
People often complain about the apparent impossibility of “AI” companies doing what investors think they will do. But this isn’t anything new, that all happened before with the “dot com boom”. I’m not the first person to make this comparison, The Daily WTF (a high quality site about IT mistakes) has an interesting article making this comparison [1]. But my conclusions are quite different.
The result of that was a lot of Internet companies going bankrupt, the investors in those companies losing money, and other companies then bought up their assets and made profitable companies. The cheap Internet we now have was built on the hardware from bankrupt companies which was sold for far less than the manufacture price. That allowed it to scale up from modem speeds to ADSL without the users paying enough to cover the purchase of the infrastructure. In the early 2000s I worked for two major Dutch ISPs that went bankrupt (not my fault) and one of them continued operations in the identical manner after having the stock price go to zero (I didn’t get to witness what happened with the other one). As far as I’m aware random Dutch citizens and residents didn’t suffer from this and employees just got jobs elsewhere.
There are good things being done with ML systems and when companies like OpenAI go bankrupt other companies will buy the hardware and do good things.
NVidia isn’t ever going to have the future sales that would justify a market capitalisation of almost 4 Trillion US dollars. This market cap can support paying for new research and purchasing rights to patented technology in a similar way to the high stock price of Google supported buying YouTube, DoubleClick, and Motorola Mobility which are the keys to Google’s profits now.
Until recently I worked for a company that used ML systems to analyse drivers for signs of fatigue, distraction, or other inappropriate things (smoking which is illegal in China, using a mobile phone, etc). That work was directly aimed at saving human lives with a significant secondary aim of saving wear on vehicles (in the mining industry drowsy drivers damage truck tires and that’s a huge business expense).
There are many applications of ML in medical research such as recognising cancer cells in tissue samples.
There are many less important uses for ML systems, such as recognising different types of pastries to correctly bill bakery customers – technology that was apparently repurposed for recognising cancer cells.
The ability to recognise objects in photos is useful. It can be used for people who want to learn about random objects they see and could be used for helping young children learn about their environment. It also has some potential for assistance for visually impaired people, it wouldn’t be good for safety critical systems (don’t cross a road because a ML system says there are no cars coming) but could be useful for identifying objects (is this a lemon or a lime). The Humane AI pin had some real potential to do good things but there wasn’t a suitable business model [2], I think that someone will develop similar technology in a useful way eventually.
Even without trying to do what the Humane AI Pin attempted, there are many ways for ML based systems to assist phone and PC use.
ML systems allow analysing large quantities of data and giving information that may be correct. When used by a human who knows how to recognise good answers this can be an efficient way of solving problems. I personally have solved many computer problems with the help of LLM systems while skipping over many results that were obviously wrong to me. I believe that any expert in any field that is covered in the LLM input data could find some benefits from getting suggestions from an LLM. It won’t necessarily allow them to solve problems that they couldn’t solve without it but it can provide them with a set of obviously wrong answers mixed in with some useful tips about where to look for the right answers.
I don’t think it’s reasonable to expect ML systems to make as much impact on society as the industrial revolution, and the agricultural revolutions which took society from more than 90% farm workers to less than 5%. That doesn’t mean everything will be fine but it is something that can seem OK after the changes have happened. I’m not saying “apart from the death and destruction everything will be good”, the death and destruction are optional. Improvements in manufacturing and farming didn’t have to involve poverty and death for many people, improvements to agriculture didn’t have to involve overcrowding and death from disease. This was an issue of political decisions that were made.
Political decisions that are being made now have the aim of making the rich even richer and leaving more people in poverty and in many cases dying due to being unable to afford healthcare. The ML systems that aim to facilitate such things haven’t been as successful as evil people have hoped but it will happen and we need appropriate legislation if we aren’t going to have revolutions.
There are documented cases of suicide being inspired by ChatGPT systems [4]. There have been people inspired towards murder by ChatGPT systems but AFAIK no-one has actually succeeded in such a crime yet. There are serious issues that need to be addressed with the technology and with legal constraints about how people may use it. It’s interesting to consider the possible uses of ChatGPT systems for providing suggestions to a psychologist; maybe ChatGPT systems could be used to alleviate mental health problems.
The cases of LLM systems being used for cheating on assignments etc aren’t a real issue. People have been cheating on assignments since organised education was invented.
There is a real problem of ML systems based on biased input data that issue decisions that are the average of the bigotry of the people who provided input. That isn’t going to be worse than the current situation of bigoted humans making decisions based on hate and preconceptions but it will be more insidious. It is possible to search for that, so for example a bank could test its mortgage approval ML system by changing one factor at a time (name, gender, age, address, etc) and see if it changes the answer. If it turns out that the ML system is biased on names then the input data could have names removed. If it turns out to be biased about address then there could be weights put in to oppose that.
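As a concrete illustration of that one-factor-at-a-time test, here is a small sketch in Python; the approve() function, field names and values are all hypothetical stand-ins for whatever model and application data a real bank would have.

def probe_bias(approve, applicant, factors):
    # Flip one factor at a time and report every change that flips the decision.
    baseline = approve(applicant)
    findings = []
    for field, alternatives in factors.items():
        for value in alternatives:
            if value == applicant[field]:
                continue
            variant = dict(applicant, **{field: value})
            if approve(variant) != baseline:
                findings.append((field, applicant[field], value))
    return findings

# Toy model, deliberately biased on the name field (purely for illustration).
def toy_model(a):
    return a["income"] > 50000 and a["name"] != "unlucky"

applicant = {"name": "alice", "gender": "f", "age": 40, "address": "suburb A", "income": 60000}
factors = {"name": ["bob", "unlucky"], "gender": ["m"], "address": ["suburb B"]}
print(probe_bias(toy_model, applicant, factors))   # [('name', 'alice', 'unlucky')]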
For a long time there has been excessive trust in computers. Computers aren’t magic they just do maths really fast and implement choices based on the work of programmers – who have all the failings of other humans. Excessive trust in a rule based system is less risky than excessive trust in a ML system where no-one really knows why it makes the decisions it makes.
Self driving cars kill people; this is the truth that Tesla stockholders don’t want people to know.
Companies that try to automate everything with “AI” are going to be in for some nasty surprises. Getting computers to do everything that humans do in any job is going to be a large portion of an actual intelligent computer which if it is achieved will raise an entirely different set of problems.
I’ve previously blogged about ML Security [5]. I don’t think this will be any worse than all the other computer security problems in the long term, although it will be more insidious.
Companies spending billions of dollars without firm plans for how to make money are going to go bankrupt no matter what business they are in. Companies like Google and Microsoft can waste some billions of dollars on AI Chat systems and still keep going as successful businesses. Companies like OpenAI that do nothing other than such chat systems won’t do well. But their assets can be used by new companies when sold at less than 10% of the purchase price.
Companies like NVidia that have high stock prices based on the supposed ongoing growth in use of their hardware will have their stock prices crash. But the new technology they develop will be used by other people for other purposes. If hospitals can get cheap diagnostic ML systems because of unreasonable investment into “AI” then that could be a win for humanity.
Companies that bet their entire business on AI even when it’s not necessarily their core business (as Tesla has done with self driving) will have their stock price crash dramatically at a minimum and have the possibility of bankruptcy. Having Tesla go bankrupt is definitely better than having people try to use them as self driving cars.
03 July, 2025 10:21AM by etbe
Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language–and is widely used by (currently) 1241 other packages on CRAN, downloaded 40.4 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 634 times according to Google Scholar.
Conrad released a minor version 14.6.0 yesterday which offers new accessors for non-finite values. And despite being in Beautiful British Columbia on vacation, I had wrapped up two rounds of reverse dependency checks preparing his 14.6.0 release, and shipped this to CRAN this morning where it passed with flying colours and no human intervention—even with over 1200 reverse dependencies. The changes since the last CRAN release are summarised below.
Changes in RcppArmadillo version 14.6.0-1 (2025-07-02)
- Upgraded to Armadillo release 14.6.0 (Caffe Mocha)
  - Added balance() to transform matrices so that column and row norms are roughly the same
  - Added omit_nan() and omit_nonfinite() to extract elements while omitting NaN and non-finite values
  - Added find_nonnan() for finding indices of non-NaN elements
  - Added standalone replace() function
- The fastLm() help page now mentions that options to solve() can control its behavior.
Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.
With a friendly Canadian hand wave from vacation in Beautiful British Columbia, and speaking on behalf of the Rcpp Core Team, I am excited to share that the (regularly scheduled bi-annual) update to Rcpp just brought version 1.1.0 to CRAN. Debian builds have been prepared and uploaded, Windows and macOS builds should appear at CRAN in the next few days, as will builds in different Linux distributions–and of course r2u should catch up tomorrow as well.
The key highlight of this release is the switch to C++11 as the minimum standard. R itself did so in release 4.0.0 more than half a decade ago; if someone is really tied to an older version of R and an equally old compiler then using an older Rcpp with it has to be acceptable. Our own tests (using continuous integration at GitHub) still go back all the way to R 3.5.* and work fine (with a new-enough compiler). In the previous release post, we commented that we had only one reverse dependency (falsely) come up in the tests by CRAN; this time there was none among the well over 3000 packages using Rcpp at CRAN. Which really is quite amazing, and possibly also a testament to our rigorous continued testing of our development and snapshot releases on the key branch.
This release continues with the six-months January-July cycle started with release 1.0.5 in July 2020. As just mentioned, we do of course make interim snapshot ‘dev’ or ‘rc’ releases available. While we no longer regularly update the Rcpp drat repo, the r-universe page and repo now really fill this role admirably (and with many more builds besides just source). We continue to strongly encourage their use and testing—I run my systems with these versions which tend to work just as well, and are of course also fully tested against all reverse-dependencies.
Rcpp has long established itself as the most popular way of enhancing R with C or C++ code. Right now, 3038 packages on CRAN depend on Rcpp for making analytical code go faster and further. On CRAN, 13.6% of all packages depend (directly) on Rcpp, and 61.3% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 100.8 million times. The two published papers (also included in the package as preprint vignettes) have, respectively, 2023 (JSS, 2011) and 380 (TAS, 2018) citations, while the book (Springer useR!, 2013) has another 695.
As mentioned, this release switches to C++11 as the minimum standard.
The diffstat display in the CRANberries comparison to the previous release shows how several (generated) source files with C++98 boilerplate have now been removed; we also flattened a number of if/else sections no longer needed to cater to older compilers (see below for details). We also managed more accommodation for the demands of tighter use of the C API of R by removing DATAPTR and CLOENV use. A number of other changes are detailed below.
The full list below details all changes, their respective PRs and, if applicable, issue tickets. Big thanks from all of us to all contributors!
Changes in Rcpp release version 1.1.0 (2025-07-01)
Changes in Rcpp API:
- C++11 is now the required minimal C++ standard
- The std::string_view type is now covered by wrap() (Lev Kandel in #1356 as discussed in #1357)
- A last remaining DATAPTR use has been converted to DATAPTR_RO (Dirk in #1359)
- Under R 4.5.0 or later, R_ClosureEnv is used instead of CLOENV (Dirk in #1361 fixing #1360)
- Use of lsInternal switched to lsInternal3 (Dirk in #1362)
- Removed compiler detection macro in a header cleanup setting C++11 as the minimum (Dirk in #1364 closing #1363)
- Variadic templates are now used unconditionally given C++11 (Dirk in #1367 closing #1366)
- Remove RCPP_USING_CXX11 as a #define as C++11 is now a given (Dirk in #1369)
- Additional cleanup for __cplusplus checks (Iñaki in #1371 fixing #1370)
- Unordered set construction no longer needs a macro for the pre-C++11 case (Iñaki in #1372)
- Lambdas are supported in Rcpp Sugar functions (Iñaki in #1373)
- The Date(time)Vector classes now have a default ctor (Dirk in #1385 closing #1384)
- Fixed an issue where Rcpp::Language would duplicate its arguments (Kevin in #1388, fixing #1386)
Changes in Rcpp Attributes:
Changes in Rcpp Documentation:
Changes in Rcpp Deployment:
Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bug reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues).
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.
02 July, 2025 01:12AM by Junichi Uekawa
01 July, 2025 07:08PM by Ben Hutchings
On gordon1, unload the mediatek module first.
The following seems to work, either from the console or under sway:
echo devices > /sys/power/pm_test
echo reboot > /sys/power/disk
rmmod mt76x2u
echo disk > /sys/power/state
modprobe mt76x2u
Another short status update of what happened on my side last month. Phosh 0.48.0 is out with nice improvements, phosh.mobi e.V. is alive, helped a bit to get cellbroadcastd out, osk bugfixes and some more:
See below for details on the above and more:
- GApplication (MR)
- STARTTLS behavior in docs (MR)
This is not code by me but reviews of other people's code. The list is (as usual) slightly incomplete. Thanks for the contributions!
If you want to support my work see donations.
Join the Fediverse thread
This month I didn't have any particular focus. I just worked on issues in my info bubble.
All work was done on a volunteer basis.
My Debian contributions this month were all sponsored by Freexian. This was a very light month; I did a few things that were easy or that seemed urgent for the upcoming trixie release, but otherwise most of my energy went into Debusine. I’ll be giving a talk about that at DebConf in a couple of weeks; this is the first DebConf I’ll have managed to make it to in over a decade, so I’m pretty excited.
You can also support my work directly via Liberapay or GitHub Sponsors.
After reading a bunch of recent discourse about X11 and Wayland, I decided to try switching my laptop (a Framework 13 AMD running Debian trixie with GNOME) over to Wayland. I don’t remember why it was running X; I think I must have either inherited some configuration from my previous laptop (in which case it could have been due to anything up to ten years ago or so), or else I had some initial problem while setting up my new laptop and failed to make a note of it. Anyway, the switch was hardly noticeable, which was great.
One problem I did notice is that my preferred terminal emulator, pterm, crashed after the upgrade. I run a slightly-modified version from git to make some small terminal emulation changes that I really must either get upstream or work out how to live without one of these days, so it took me a while to notice that it only crashed when running from the packaged version, because the crash was in code that only runs when pterm has a set-id bit. I reported this upstream, they quickly fixed it, and I backported it to the Debian package.
Upstream bug #67169 reported URLs being dropped from PDF output in some cases. I investigated the history both upstream and in Debian, identified the correct upstream patch to backport, and uploaded a fix.
I upgraded libfido2 to 1.16.0 in experimental.
I upgraded pydantic-extra-types to a new upstream version, and fixed some resulting fallout in pendulum.
I updated python-typing-extensions in bookworm-backports, to help fix python3-tango: python3-pytango from bookworm-backports does not work (10.0.2-1~bpo12+1).
I upgraded twisted to a new upstream version in experimental.
I fixed or helped to fix a few release-critical bugs:
30 June, 2025 11:30PM by Colin Watson
script: https://docs.kernel.org/power/basic-pm-debugging.html
kernel is 6.15.4-1~exp1+reform20250628T170930Z
normal reboot works
Either from the console or from sway, the initial test of reboot mode hibernate fails. In both cases it looks very similar to halting.
As I often do, this year I have also prepared a set of personalized maps for your OpenPGP keysigning in DebConf25, in Brest!
What is that, dare you ask?
One of the not-to-be-missed traditions of DebConf is a Key-Signing Party (KSP) that spans the whole conference! Travelling from all the corners of the world to a single, large group gathering, we have the ideal opportunity to spread some communicable diseases (er, trust) on your peers’ identities and strengthen Debian’s OpenPGP keyring.
But whom should you approach for keysigning?
Go find yourself in the nice listing I have prepared. By clicking on your long keyid (in my case, the link labeled 0x2404C9546E145360), anybody can download your certificate (public key + signatures). The SVG and PNG links will yield a graphic version of your position within the DC25 keyring, and the TXT link will give you a textual explanation of it. (of course, your links will differ, yada yada…)
Please note this is still a preview of our KSP information: you will notice there are several outstanding things for me to fix before marking the file as final. First, some names have encoding issues I will fix. Second, some keys might be missing — if you submitted your key as part of the conference registration form but it is not showing, it must be because my scripts didn’t find it in any of the queried keyservers. My scripts are querying the following servers:
hkps://keyring.debian.org/
hkps://keys.openpgp.org/
hkps://keyserver.computer42.org/
hkps://keyserver.ubuntu.com/
hkps://pgp.mit.edu/
hkps://pgp.pm/
hkps://pgp.surf.nl/
hkps://pgpkeys.eu/
hkps://the.earth.li/
Make sure your key is available in at least some of them; I will try to do a further run on Friday, before travelling, or shortly after arriving in France.
If you didn’t submit your key in time, but you will be at DC25, please mail me stating [DC25 KSP] in your mail title, and I will manually add it to the list.
On (hopefully!) Friday, I’ll post the final, canonical KSP coordination page which you should download and calculate its SHA256-sum. We will have printed out convenience sheets to help you do your keysigning at the front desk.
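For what it’s worth, computing that checksum is a one-liner with sha256sum, or, as a small Python sketch (the file name is just a placeholder for whatever the final page gets published as):

import hashlib, sys

# SHA256 of the downloaded KSP coordination file; the default name is a placeholder.
path = sys.argv[1] if len(sys.argv) > 1 else "ksp-dc25.txt"
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(65536), b""):
        h.update(chunk)
print(f"{h.hexdigest()}  {path}")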
Just when you thought it was safe to go to court, think again. Your lawyer might not have your best interests at heart and even worse, they may be working for the other side.
In 2014, journalists discovered Victoria Police had a secret informer, a mole snitching on the underworld, identified by the code name Lawyer X.
Initially, police were so concerned they sought a restraining order to prevent the media from publishing anything about the scandal. Police even sought to have the code name Lawyer X suppressed from publication.
It was beyond embarrassing: not only did police have the burden of protecting their secret informer, they may also have to protect her relatives who share the same name. The most notable among them is the informer's uncle, James Gobbo, a supreme court judge who subsequently served as Governor of the State of Victoria.
There is absolutely no suggestion that Lawyer X's relatives had anything to do with her misdeeds. Nonetheless, the clients she betrayed were the biggest crooks in town, until, of course, her unethical behavior gave them the opportunity to have those convictions overturned and present themselves as model citizens once again. Any relatives or former business associates of Lawyer X, including the former governor, would be in danger for the rest of their lives.
James Gobbo and his son James Gobbo junior are both Old Xaverians, graduates of Melbourne's elite Jesuit school for boys, like my father and I.
Lawyer X was eventually revealed to be Nicola Gobbo, a graduate of the elite girls school Genazzano FCJ College. My aunt, that is my father's sister, also went to Genazzano.
Alumni communications typically refer to Old Xaverians with the symbols "OX" and the year of graduation, for example, "OX96" for somebody who graduated in 1996.
Whenever a scandal like this arises, if the suspect is a graduate of one of these elite schools, the newspapers will be very quick to dramatize the upper class background. The case of Lawyer X was head and shoulders above any other scandal: a former prefect and class captain who made a career out of partying with drug lords, having their children and simultaneously bugging their conversations for the police.
Stories like this are inconvenient for those elite schools but in reality, I don't feel the schools are responsible when one of these unlucky outcomes arises. The majority of students are getting a head start in life but there is simply nothing that any school can do to prevent one or two alumni going off the rails like this.
Having been through this environment myself, I couldn't believe what I was seeing in 2023 when the Swiss financial regulator (FINMA) voluntarily published a few paragraphs from a secret judgment, using the code name "X" to refer to a whole law office (cabinet juridique in French) of jurists in Geneva who had ripped off their clients.
The Gobbo family, Genazzano FCJ College and alumni have finally been vindicated. The misdeeds of Lawyer X pale in comparison to the crimes of the Swiss law firm X.
Remember, Lawyer X operated in secrecy, her identity only known to a small number of handlers inside the police department. Thanks to my own research, I was able to prove that the activities of Law firm X were fully known to the bar association and the financial regulator for at least two years before they belatedly closed the firm.
Lawyer X claims she contributed evidence to the arrest of 386 suspects during her time as a police informer. Law firm X had over twenty thousand clients at the time they were shut down. They admit that client records fell into the hands of Walder Wyss, a rival law firm engaged in legal proceedings against some of the clients who were abandoned by the Swiss jurists.
Lawyer X was a woman and in her most recent bid for compensation, she claimed she was exploited by the police. Law firm X trafficked at least one woman from France to come and work in Geneva helping them promote an unauthorized insurance service to residents of both France and Switzerland.
Lawyer X was a former member of a political party. One of the jurists from Law firm X was working for the rogue law office at the same time that he was a member of Geneva city council. He is a member of the same political party as the Swiss president from that era.
In 1993, Lawyer X was an editor of Farrago, Australia's leading student newspaper. Law firm X used the Swiss media to write positive stories about their company. When the same company was outlawed, nanny-state laws prevented the media reporting anything at all about its downfall. Ironically, one of my former clients was also an editor of Farrago before he became Australia's Minister for Finance. The word Farrago gives a fascinating insight into the life of Lawyer X. Here is a sample sentence using the word Farrago in the Cambridge dictionary:
... told us a farrago of lies
When FINMA revealed the secret judgment shuttering Law Firm X, Urban Angehrn, the FINMA director, resigned citing health reasons. His dramatic resignation helped bury news stories about the Law firm X judgment. In Australia, a number of chief commissioners have resigned. In fact, Victoria Police have been through three leaders in the last year.
In 2018, I attended the UN Forum on Business and Human Rights, where I made this brief intervention predicting the future of Facebook and Twitter. When Elon Musk purchased Twitter in 2022, he called it X. Go figure.
Jonathan McDowell wrote part 2 of his blog series about setting up a voice assistant on Debian, I look forward to reading further posts [1]. I’m working on some related things for Debian that will hopefully work with this.
I’m testing out OpenSnitch on Trixie inspired by this blog post, it’s an interesting package [2].
Valerie wrote an informative article about creating mesh networks using LORA for emergency use [3].
Insightful article about AI and the end of prestige [5]. We should all learn about LLMs.
Jonathan Dowland wrote an informative blog post about how to manage namespaces on Linux [6].
Interesting article about Schizophrenia and the cliff-edge function of evolution [8].
30 June, 2025 01:58PM by etbe
On Monday I had my Viva Voce (PhD defence), and passed (with minor corrections).
It's a relief to have passed after 8 years of work. I'm not quite done of course, as I have the corrections to make! Once those are accepted I'll upload my thesis here.
We are pleased to announce that AMD has committed to sponsor DebConf25 as a Platinum Sponsor.
The AMD ROCm platform includes programming models, tools, compilers, libraries, and runtimes for AI and HPC solution development on AMD GPUs. Debian is an officially supported platform for AMD ROCm and a growing number of components are now included directly in the Debian distribution.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. AMD is deeply committed to supporting and contributing to open-source projects, foundations, and open-standards organizations, taking pride in fostering innovation and collaboration within the open-source community.
With this commitment as Platinum Sponsor, AMD is contributing to the annual Debian Developers’ Conference, directly supporting the progress of Debian and Free Software. AMD contributes to strengthening the worldwide community that collaborates on Debian projects year-round.
Thank you very much, AMD, for your support of DebConf25!
DebConf25 will take place from 14 to 20 July 2025 in Brest, France, and will be preceded by DebCamp, from 7 to 13 July 2025.
DebConf25 is accepting sponsors! Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf25 website at https://debconf25.debconf.org/sponsors/become-a-sponsor/.
26 June, 2025 09:37PM by Daniel Lange
Debian uses LDAP for storing information about users, hosts and other objects. The wrapping around this is called userdir-ldap, or ud-ldap for short. It provides a mail gateway, web UI and a couple of schemas for different object types.
Back in late 2018 and early 2019, we (DSA) removed support for ISO5218 in userdir-ldap, and removed the corresponding data. This made some people upset, since they were using that information, as imprecise as it was, to infer people’s pronouns. ISO5218 has four values for sex: unknown, male, female and N/A. This might have been acceptable when the standard was new (in 1976), but it wasn’t acceptable any longer in 2018.
A couple of days ago, I finally got around to adding support to userdir-ldap to let people specify their pronouns. As it should be, it’s a free-form text field. (We don’t have localised fields in LDAP, so it probably makes sense for people to put the English version of their pronouns there, but the software does not try to control that.)
So far, it’s only exposed through the LDAP gateway, not in the web UI.
If you’re a Debian developer, you can set your pronouns using
echo "pronouns: he/him" | gpg --clearsign | mail changes@db.debian.org
I see that four people have already done so in the time I’ve taken to write this post.
JP was puzzled that using podman run --memory=2G … would not result in the 2G limit being visible inside the container. While we were able to identify this as a visualization problem — tools like free(1) only look at /proc/meminfo and that is not virtualized inside a container, you'd have to look at /sys/fs/cgroup/memory.max and friends instead — I couldn't leave it at that.
And then I remembered there is actually something that can provide a virtual (cgroup-aware) /proc for containers: LXCFS!
But does it work with Podman?! I always used it with LXC, but there is technically no reason why it wouldn't work with a different container solution — cgroups are cgroups after all.
As we all know: there is only one way to find out!
Take a fresh Debian 12 VM, install podman and verify things behave as expected:
user@debian12:~$ podman run -ti --rm --memory=2G centos:stream9
bash-5.1# grep MemTotal /proc/meminfo
MemTotal:        6067396 kB
bash-5.1# cat /sys/fs/cgroup/memory.max
2147483648
And after installing (and starting) lxcfs, we can use the virtual /proc/meminfo it generates by bind-mounting it into the container (LXC does that part automatically for us):
user@debian12:~$ podman run -ti --rm --memory=2G --mount=type=bind,source=/var/lib/lxcfs/proc/meminfo,destination=/proc/meminfo centos:stream9
bash-5.1# grep MemTotal /proc/meminfo
MemTotal:        2097152 kB
bash-5.1# cat /sys/fs/cgroup/memory.max
2147483648
The same of course works with all the other proc entries lxcfs provides (cpuinfo, diskstats, loadavg, meminfo, slabinfo, stat, swaps, and uptime here), just bind-mount them.
And yes, free(1) now works too!
bash-5.1# free -m
               total        used        free      shared  buff/cache   available
Mem:            2048           3        1976           0          67        2044
Swap:              0           0           0
Just don't blindly mount the whole /var/lib/lxcfs/proc over the container's /proc. It did work (as in: "bash and free didn't crash") for me, but with /proc/$PID etc missing, I bet things will go south pretty quickly.
24 June, 2025 07:46PM by evgeni
A new minor release 0.2.6 of our RcppRedis package arrived on CRAN today. RcppRedis is one of several packages connecting R to the fabulous Redis in-memory datastructure store (and much more). It works equally well with the newer fork Valkey. RcppRedis does not pretend to be feature complete, but it may do some things faster than the other interfaces, and also offers an optional coupling with MessagePack binary (de)serialization via RcppMsgPack. The package has been “deployed in production” as a risk / monitoring tool on a trading floor for several years. It also supports pub/sub dissemination of streaming market data as per this earlier example.
This update brings new functions del, lrem, and lmove (for the matching Redis / Valkey commands) which may be helpful in using Redis (or Valkey) as a job queue. We also extended the publish accessor by supporting text (i.e. string) mode along with raw or rds (the prior default which always serialized R objects), just as listen already worked with these three cases. The change makes it possible to publish from R to subscribers not running R as they cannot rely on the R deserializer. An example is provided by almm, a live market monitor, which we introduced in this blog post. Apart from that the continuous integration script received another mechanical update.
The detailed changes list follows.
Changes in version 0.2.6 (2025-06-24)
- The commands DEL, LREM and LMOVE have been added
- The continuous integration setup was updated once more
- The pub/sub publisher now supports a type argument similar to the listener; this allows string message publishing for non-R subscribers
Courtesy of my CRANberries, there is also a diffstat report for this release. More information is on the RcppRedis page and at the repository and its issue tracker.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.
I have a SHT3x humidity and temperature sensor connected to the i2c bus of my Turris Omnia that runs OpenWrt.
To make it produce nice graphs shown in the webif I installed the packages collectd-mod-sensors, luci-app-statistics and kmod-hwmon-sht3x.
To make the sht3x driver bind to the device I added
echo 'sht3x 0x44' > /sys/bus/i2c/devices/0-0070/channel-6/new_device
to /etc/rc.local. After that I only had to enable the Sensors plugin below Statistics -> Setup -> General plugins and check 'Monitor all except specified' in its "Configure" dialog.
If we ever thought a couple of years or decades of constant use would get humankind to understand how an asymmetric key pair is to be handled… It’s time we moved back to square one.
I had to do an online tramit with the Mexican federal government to get a statement certifying I successfully finished my studies, and I found this jewel of a user interface:
So… I have to:
Not only do I have to submit my certificate (public key)… but also the private part (and, of course, the password that secures it).
I understand I’m interacting with a Javascript thingie that runs only client-side, and I trust it is not shipping my private key to their servers. But given it is an opaque script, I have no assurance about it. And, of course, this irks me because I am who I am and because I’ve spent several years thinking about cryptography. But for regular people, it just looks like a stupid inconvenience: they have to upload two weird files with odd names and provide a password. What for?
This is beyond stupid. I’m baffled.
(of course, I did it, because I need the fsckin’ document. Oh, and of course, I paid my MX$1770, ≈€80, for it… which does not make me too happy for a tramit that’s not even shuffling papers, only storing the right bits in the right corner of the right datacenter, but anyhow…)
For some time I’ve been noticing news reports about PFAs [1]. I hadn’t thought much about that issue, I grew up when leaded petrol was standard, when almost all thermometers had mercury, when all small batteries had mercury, and I had generally considered that I had already had so many nasty chemicals in my body that as long as I don’t eat bottom feeding seafood often I didn’t have much to worry about. I already had a higher risk of a large number of medical issues than I’d like due to decisions made before I was born and there’s not much to do about it given that there are regulations restricting the emissions of lead, mercury etc.
I just watched a Veritasium video about Teflon and the PFA poisoning related to its production [2]. This made me realise that it’s more of a problem than I had thought and it’s a problem that’s getting worse. PFA levels in the parts-per-trillion range in the environment can cause parts-per-billion levels in the body, which increases the risks of several cancers and causes other health problems. Fortunately there is some work being done on water filtering; you can get home-level filters now and they are working on filters that can work at a sufficient scale for a city water plant.
There is a map showing PFAs in the environment in Australia which shows some sites with concerning levels that are near residential areas [3]. One of the major causes for that in Australia is fire retardant foam – Australia has never had much if any Teflon manufacturing AFAIK.
Also they noted that donating blood regularly can decrease levels of PFAs in the bloodstream. So presumably people who have medical conditions that require receiving donated blood regularly will have really high levels.
23 June, 2025 12:26PM by etbe
When I was younger, and definitely naïve, I was so looking forward to AI, which will help us write lots of good, reliable code faster. Well, principally me, not thinking what impact it will have industry-wide. Other more general concerns, like societal issues, role of humans in the future and so on were totally not on my radar.
At the same time, I didn’t expect this would actually happen. Even years later, things didn’t change dramatically. Even the first release of ChatGPT a few years back didn’t click for me, as the limitations were still significant.
The first hint of the change, for me, was when a few months ago (yes, behind the curve), I asked ChatGPT to re-explain a concept to me, and it just wrote a lot of words, but without a clear explanation. On a whim, I asked Grok—then recently launched, I think—to do the same. And for the first time, the explanation clicked and I felt I could have a conversation with it. Of course, now I forgot again that theoretical CS concept, but the first step was done: I can ask an LLM to explain something, and it will, and I can have a back and forth logical discussion, even if on some theoretical concept. Additionally, I learned that not all LLMs are the same, and that means there’s real competition and that leap frogging is possible.
Another thing I tried to adopt early but failed to get mileage out of was GitHub Copilot (in VSC). I tried it, it helped, but I didn’t feel any speed-up at all. Then more recently, in May, I asked Grok what’s the state of the art in AI-assisted coding. It said either Claude in a browser tab, or in VSC via the continue.dev extension.
The continue.dev extension/tooling is a bit of a strange/interesting thing. It seems to want to be a middle-man between the user and actual LLM services, i.e. you pay a subscription to continue.dev, not to Anthropic itself, and they manage the keys/APIs, for whatever backend LLMs you want to use. The integration with Visual Studio Code is very nice, but I don’t know if long-term their business model will make sense. Well, not my problem.
So I installed the latter and subscribed, thinking 20 CHF for a month is good for testing. I skipped the tutorial model/assistant, created a new one from scratch, just enabled Claude 3.7 Sonnet, and started using it. And then, my mind was blown: not just by the LLM, but by the ecosystem. As said, I’ve used GitHub Copilot before, but it didn’t seem effective. I don’t know if a threshold has been reached, or Claude (3.7 at that time) is just better than ChatGPT.
I didn’t use the AI to write (non-trivial) code for me, at most boilerplate snippets. But I used it both as a partner for discussion - “I want to do x, what do you think, A or B?”, and as a teacher, especially for frontend topics, which I’m not familiar with.
Since May, in mostly fragmented sessions, I’ve achieved more than in the last two years. Migration from old school JS to ECMA modules, a webpacker (reducing bundle size by 50%), replacing an old Javascript library with hand written code using modern APIs, implementing the zoom feature together with all of keyboard, mouse, touchpad and touchscreen support, simplifying layout from manually computed to automatic layout, and finding a bug in webkit for which it also wrote a cool minimal test (cool, as in, way better than I’d have ever, ever written, because for me it didn’t matter that much). And more. Could I have done all this? Yes, definitely, nothing was especially tricky here. But hours and hours of reading MDN, scouring Stack Overflow and Reddit, and lots of trial and error. So doable, but much more toily.
This, to me, feels like cheating. 20 CHF per month to make me 3x more productive is free money—well, except that I don’t make money on my code which is written basically for myself. However, I don’t get stuck anymore searching hours in the web for guidance, I ask my question, and I get at least direction if not answer, and I’m finished way earlier. I can now actually juggle more hobbies, in the same amount of time, if my personal code takes less time or differently said, if I’m more efficient at it.
Not all is roses, of course. Once, it did write code with such an endearing error that it made me laugh. It was so blatantly obvious that you shouldn’t keep other state in the array that holds pointer status because that confuses the calculation of “how many pointers are down”, probably to itself too if I’d have asked. But I didn’t, since it felt a bit embarrassing to point out such a dumb mistake. Yes, I’m anthropomorphising again, because this is the easiest way to deal with things.
In general, it does an OK-to-good-to-sometimes-awesome job, and the best thing is that it summarises documentation and all of Reddit and Stack Overflow. And gives links to those.
Now, I have no idea yet what this means for the job of a software engineer. If on open source code, my own code, it makes me 3x faster—reverse engineering my code from 10 years ago is no small feat—for working on large codebases, it should do at least the same, if not more.
As an example of how open-ended the assistance can be, at one point, I started implementing a new feature—threading a new attribute to a large number of call points. This is not complex at all, just add a new field to a Haskell record, and modify everything to take it into account, populate it, merge it when merging the data structures, etc. The code is not complex, tending toward boilerplate a bit, and I was wondering about a few possible choices for implementation, so, with just a few lines of code written that were not even compiling, I asked “I want to add a new feature, should I do A or B if I want it to behave like this”, and the answer was something along the lines of “I see you want to add the specific feature I was working on, but the implementation is incomplete, you still need to do X, Y and Z”. My mind was blown at this point, as I thought, if the code doesn’t compile, surely the computer won’t be able to parse it, but this is not a program, this is an LLM, so of course it could read it kind of as a human would. Again, the code complexity is not great, but the fact that it was able to read a half-written patch, understand what I was working towards, and reason about it, was mind-blowing, and scary. Like always.
Now, after all this, while writing a recent blog post, I thought—this is going to be public anyway, so let me ask Claude what it thinks about it. And I was very surprised, again: gone was all the pain of rereading my post three times to catch typos (easy) or phrasing structure issues. It gave me very clear points, and helped me cut 30-40% of the total time. So not only coding, but word smithing too is changed. If I were an author, I’d be delighted (and scared). Here is the overall reply it gave me:
So yeah, this speeds me up to about 2x on writing blog posts, too. It definitely feels not fair.
After all this, I’m a bit flabbergasted. Gone are the 2000’s with code without unittests, gone are the 2010’s without CI/CD, and now, mid-2020’s, gone is the lone programmer that scours the internet to learn new things, alone?
What this all means for our skills in software development, I have no idea, except I know things have irreversibly changed (a butlerian jihad aside). Do I learn better with a dedicated tutor even if I don’t fight with the problem for so long? Or is struggling to find good docs the main method of learning? I don’t know yet. I feel like I understand the topics I’m discussing with the AI, but who knows in reality what it will mean long term in terms of “stickiness” of learning. For the better, or for worse, things have changed. After all the advances over the last five centuries in mechanical sciences, it has now come to some aspects of the intellectual work.
Maybe this is the answer to the ever-growing complexity of tech stacks? I.e. a return of the lone programmer that builds things end-to-end, but with AI taming the complexity added in the last 25 years? I can dream, of course, but this also means that the industry overall will increase in complexity even more, because large companies tend to do that, so maybe a net effect of not much…
One thing I did learn so far is that my expectation that AI (at this level) will only help junior/beginner people, i.e. it would flatten the skills band, is not true. I think AI can speed up at least the middle band, likely the middle top band, I don’t know about the 10x programmers (I’m not one of them). So, my question about AI now is how to best use it, not to lament how all my learning (90% self learning, to be clear) is obsolete. No, it isn’t. AI helps me start and finish one migration (that I delayed for ages), then start the second, in the same day.
At the end of this—a bit rambling—reflection on the past month and a half, I still have many questions about AI and humanity. But one has been answered: yes, “AI”, quotes or no quotes, already has changed this field (producing software), and we’ve not seen the end of it, for sure.
I had a peculiar question at work recently, and it went off of a tangent that was way too long and somewhat interesting, so I wanted to share.
The question is: Can you create a set of N-bit numbers (codes), so that
a) Neither is a subset of each other, and
b) Neither is a subset of the OR of two of the others?
Of course, you can trivially do this (e.g., for N=5, choose 10000, 01000, 00100 and so on), but how many can you make for a given N? This is seemingly an open question, but at least I found that they are called (1,2) superimposed codes and have history at least back to this 1964 paper. They present a fairly elegant (but definitely non-optimal) way of constructing them for certain N; let me show an example for N=25:
We start by counting 3-digit numbers (k=3) in base 5 (q=5):
Now we have 5^3 numbers. Let's set out to give them the property that we want.
This code (set of numbers) trivially has distance 1; that is, every number differs from every other number by at least one digit. We'd like to increase that distance so that it is at least as large as k. Reed-Solomon gives us an optimal way of doing that; for every number, we add two checksum digits and R-S will guarantee that the resulting code has distance 3. (Just trust me on this, I guess. It only works for q >= (k+1)/2, though, and q must be a power of an odd prime because otherwise the group theory doesn't work out.)
We now have a set of 5-digit numbers with distance 3. But if we now take any three numbers from this set, there is at least one digit where all three must differ, since the distance is larger than half the number of digits: two numbers A and B differ from each other in at least 3 of the 5 digits, and A and C also have to differ from each other in at least 3 of the 5 digits. There just isn't room for A and B to be the same in all the places where A differs from C.
To modify this property into the one that we want, we encode each digit into binary using one-hot encoding (00001, 00010, 00100, etc.). Now our 5-digit numbers are 25-bit numbers. And due to the "all different" property in the previous paragraph, we also have our superimposition property; there's at least one 5-bit group where A|B shares no bits with C. So this gives us a 25-bit set with 125 different values and our desired property.
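For the curious, here is a rough sketch (mine, not from the paper or the post) of that construction in Python. It uses the evaluation form of Reed-Solomon over GF(5), which is just arithmetic mod 5 and gives the same distance 3 as adding two check digits, only not in systematic form:

from itertools import product

q, k = 5, 3

def rs_codeword(msg):
    # Evaluate the degree-<k message polynomial at every point of GF(5);
    # two distinct polynomials agree on at most k-1 points, so distance is q-k+1 = 3.
    return [sum(c * pow(x, i, q) for i, c in enumerate(msg)) % q for x in range(q)]

def one_hot(codeword):
    # Each base-5 digit becomes a 5-bit one-hot group, giving a 25-bit number.
    bits = 0
    for d in codeword:
        bits = (bits << q) | (1 << d)
    return bits

codes = [one_hot(rs_codeword(msg)) for msg in product(range(q), repeat=k)]
assert len(codes) == 125

# (1,2) superimposition: no code is covered by the OR of any two others.
for a in codes:
    for b in codes:
        for c in codes:
            if c != a and c != b:
                assert c & ~(a | b) != 0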
This isn't necessarily an optimal code (and the authors are very clear on that), but it's at least systematic and easy to extend to larger sizes. (I used a SAT solver to extend this to 170 different values, just by keeping the 125 first and asking for 45 more that were not in conflict. 55 more was evidently hard.) The paper has tons more information, including some stuff based on Steiner systems that I haven't tried to understand. And of course, there are tons more later papers, including one by Erdős. :-)
I've applied for an account at OEIS so I can add a sequence for the maximum number of possible codes for each N. It doesn't have many terms known yet, because the SAT solver struggles hard with this (at least in my best formulation), but at least it will give the next person something to find when they are searching. :-)
The Linux kernel has an interesting file descriptor called pidfd. As the name implies, it is a file descriptor to a pid or a specific process. The nice thing about it is that it is guaranteed to refer to the specific process you expected when you got that pidfd. A process ID, or PID, has no reuse guarantees, which means what you think process 1234 is and what the kernel knows process 1234 is could be different, because your process exited and the process IDs have looped around.
pidfds are *odd*: they’re half a “normal” file descriptor and half… something else. That means some file descriptor things work and some fail in odd ways. stat() works, but using them as the first parameter of openat() will fail.

One thing you can do with them is use epoll() on them to get process status; in fact, the pidfd_open() manual page says:
A PID file descriptor returned by pidfd_open() (or by clone(2) with the CLONE_PIDFD flag) can be used for the following purposes:
…
A PID file descriptor can be monitored using poll(2), select(2), and epoll(7). When the process that it refers to terminates, these interfaces indicate the file descriptor as readable.
So if you want to wait until something terminates, then you can just find the pidfd of the process and sit an epoll_wait() onto it. Simple, right? Except it’s not quite true.
procps issue #386 stated that if you have a list of processes, pidwait only finds half of them. I’d like to thank Steve, the issue reporter, for the initial work on this. The odd thing is that for every exited process, you get two epoll events: an EPOLLIN first, then an EPOLLIN | EPOLLHUP after that. Steve suggested the first was when the process exits, the second when the process has been collected by the parent.
I have a collection of oddball processes, including ones that make zombies. A zombie is a child that has exited but has not been wait()ed by its parent. In other words, if a parent doesn’t collect its dead child, then the child becomes a zombie. The test program spawns a child, which exits after some seconds. The parent waits longer, calls wait(), waits some more, then exits. Running pidwait we can see the following epoll events:
When the child exits, an EPOLLIN on the child is triggered; after the parent calls wait(), an EPOLLIN | EPOLLHUP is triggered.

If you want to use epoll() to know when a process terminates, then you need to decide what you mean by that: just exited (the first EPOLLIN) or exited and collected by the parent (the EPOLLIN | EPOLLHUP), and set up your epoll_ctl() call to match.

A “zombie trigger” (EPOLLIN with no subsequent EPOLLHUP) is a bit tricky to work out. There is no guarantee the two events have to be in the same epoll, especially if the parent is a bit tardy on their wait() call.
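To make that sequence concrete, here is a minimal sketch (not the pidwait code itself; the event flags are simply the ones described above) using Python's os.pidfd_open() and select.epoll on Linux:

import os
import select
import time

pid = os.fork()
if pid == 0:
    time.sleep(2)              # child: exit after a couple of seconds
    os._exit(0)

pidfd = os.pidfd_open(pid)     # refers to this exact process, immune to PID reuse
ep = select.epoll()
ep.register(pidfd, select.EPOLLIN)

print(ep.poll())               # first event: the child has exited (it is a zombie for now)
os.waitpid(pid, 0)             # collect the child...
print(ep.poll())               # ...after which the EPOLLIN | EPOLLHUP event shows up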
22 June, 2025 07:32AM by dropbear
Some time ago, Terraform 1.9 introduced the capability to reference other variables in an input variable validation condition, not only the one you're validating.
What does not work is having two variables which validate each other, e.g.
variable "nat_min_ports" {
description = "Minimal amount of ports to allocate for 'min_ports_per_vm'"
default = 32
type = number
validation {
condition = (
var.nat_min_ports >= 32 &&
var.nat_min_ports <= 32768 &&
var.nat_min_ports < var.nat_max_ports
)
error_message = "Must be between 32 and 32768 and less than 'nat_max_ports'"
}
}
variable "nat_max_ports" {
description = "Maximal amount of ports to allocate for 'max_ports_per_vm'"
default = 16384
type = number
validation {
condition = (
var.nat_max_ports >= 64 &&
var.nat_max_ports <= 65536 &&
var.nat_max_ports > var.nat_min_ports
)
error_message = "Must be between 64 and 65536 and above 'nat_min_ports'"
}
}
That led directly to the following rather opaque error message:
Received an error
Error: Cycle: module.gcp_project_network.var.nat_max_ports (validation), module.gcp_project_network.var.nat_min_ports (validation)
I removed the sort-of-duplicate check var.nat_max_ports > var.nat_min_ports on nat_max_ports to break the cycle.
A few months ago I bought an Intel Arc B580 for the main purpose of getting 8K video going [1]. I had briefly got it working in a test PC but then I wanted to deploy it on my HP z840 that I use as a build server and for playing with ML stuff [2]. I only did brief tests of it previously and this was my first attempt at installing it in a system I use. My plan was to keep the NVidia RTX A2000 in place and run 2 GPUs; that’s not an uncommon desire among people who want to do ML stuff and it’s the type of thing that the z840 is designed for: the machine has slots 2, 4, and 6 being PCIe x16, so it should be able to fit 3 cards that each take 2 slots. So having one full size GPU, the half-height A2000, and an NVMe controller that uses x16 to run four NVMe devices should be easy.
Intel designed the B580 to use every millimeter of space possible while still being able to claim to be a 2 slot card. On the circuit board side there is a plastic cover over the board that takes all the space before the next slot, so a 2 slot card can’t go on that side without having its airflow blocked. On the other side it takes all the available space so that any card that wants to blow air through can’t fit, and also such that a medium size card (such as the card for 4 NVMe devices) would block its air flow. So it’s impossible to have a computer with 6 PCIe slots run the B580 as well as 2 other full size x16 cards.
Support for this type of GPU is something vendors like HP should consider when designing workstation class systems. For HP there is no issue of people installing motherboards in random cases (the HP motherboard in question uses proprietary power connectors and won’t even boot with an ATX PSU without significant work). So they could easily design a motherboard and case with a few extra mm of space between pairs of PCIe slots. The cards that are double width are almost always x16, so you could pair up an x16 slot and another slot and have extra space on each side of the pair. I think for most people a system with 6 PCIe slots with a bit of extra space for GPU cooling would be more useful than having 7 PCIe slots. But as HP have full design control they don’t even need to reduce the number of PCIe slots; they could just make the case taller. If they added another 4 slots and increased the case size accordingly it still wouldn’t be particularly tall by the standards of tower cases from the 90s! The z8 series of workstations are the biggest workstations that HP sells, so they should design them to do these things. At the time that the z840 was new there was a lot of ML work being done and HP was selling them as ML workstations, so they should have known how people would use them and designed them accordingly.
So I removed the NVidia card and decided to run the system with just the Arc card, things should have been fine but Intel designed the card to be as high as possible and put the power connector on top. This prevented installing the baffle for directing air flow over the PCIe slots and due to the design of the z840 (which is either ingenious or stupid depending on your point of view) the baffle is needed to secure the PCIe cards in place. So now all the PCIe cards are just secured by friction in the slots, this isn’t an unusual situation for machines I assemble but it’s not something I desired.
This is the first time I’ve felt compelled to write a blog post reviewing a product before even getting it working. But the physical design of the B580 is outrageously impractical unless you are designing your entire computer around the GPU.
As an aside the B580 does look very nice. The plastic surround is very fancy, it’s a pity that it interferes with the operation of the rest of the system.
20 June, 2025 02:02AM by etbe
The tag2upload service has finally gone live for Debian Developers in an open beta.
If you’ve never heard of tag2upload before, here is a great primer presented by Ian Jackson and prepared by Ian Jackson and Sean Whitton.
In short, the world has moved on to hosting and working with source code in Git repositories. In Debian, we work with source packages that are used to generate the binary artifacts that users know as .deb files. In Debian, there is so much tooling and culture built around this. For example, our workflow passes what we call the island test – you could take every source package in Debian along with you to an island with no Internet, and you’ll still be able to rebuild or modify every package. When changing the workflows, you risk losing benefits like this, and over the years there have been a number of different ideas on how to move to a purely or partially git flow for Debian, none that really managed to gain enough momentum or project-wide support.
Tag2upload makes a lot of sense. It doesn’t take away any of the benefits of the current way of working (whether technical or social), but it does make some aspects of Debian packages significantly simpler and faster. Even so, if you’re a Debian Developer and more familiar with how the sausage is made, you’ll have noticed that this has been a very long road for the tag2upload maintainers; they’ve hit multiple speed bumps since 2019, but with a lot of patience and communication and persistence from all involved (and almost even a GR), it is finally materializing.
So, first, I needed to choose which package I want to upload. We’re currently in hard freeze for the trixie release, so I’ll look for something simple that I can upload to experimental.
I chose bundlewrap; it’s quite a straightforward python package, and updates are usually just as straightforward, so it’s probably a good package to work on without having to deal with extra complexities in learning how to use tag2upload.
So, I do the usual uscan and dch -i to update my package…
And then I realise that I still want to build a source package to test it in cowbuilder. Hmm, I remember that Helmut showed me that building a source package isn’t necessary in sbuild, but I have a habit of breaking my sbuild configs somehow, so I guess I should revisit that.
So, I do a dpkg-buildpackage -S -sa and test it out with cowbuilder, because that’s just how I roll (at least for now, fixing my local sbuild setup is yak shaving for another day, let’s focus!).
I end up with a binary that looks good, so I’m satisfied that I can upload this package to the Debian archives. So, time to configure tag2upload.
The first step is to set up the webhook in Salsa. I was surprised to find two webhooks already configured:
I know of KGB, which posts to IRC; I didn’t know before that this was the mechanism it uses to do that. Nice! I also don’t know what the tagpending one does, I’ll go look into that some other time.
Configuring a tag2upload webhook is quite simple: add a URL, set the name to tag2upload, and select only tag push events:
I ran the test webhook, and it returned a code 400 message about a missing ‘message’ header, which the documentation says is normal.
Next, I install git-debpush from experimental.
The wiki page simply states that you can use the git-debpush command to upload, but doesn’t give any examples on how to use it, and its manpage doesn’t either. And when I run just git-debpush I get:
jonathan@lapcloud:~/devel/debian/python-team/bundlewrap/bundlewrap-4.23.1$ git-debpush
git-debpush: check failed: upstream tag upstream/4.22.0 is not an ancestor of refs/heads/debian/master; probably a mistake ('upstream-nonancestor' check)
pristine-tar is /usr/bin/pristine-tar
git-debpush: some check(s) failed; you can pass --force to ignore them
I have no idea what that’s supposed to mean. I was also not sure whether I should tag anything to begin with, or if some part of the tag2upload machinery automatically does it. I think I might have tagged debian/4.23-1 before tagging upstream/4.23 and perhaps it didn’t like it, I reverted and did it the other way around and got a new error message. Progress!
jonathan@lapcloud:~/devel/debian/python-team/bundlewrap/bundlewrap-4.23.1$ git-debpush
git-debpush: could not determine the git branch layout
git-debpush: please supply a --quilt= argument
Looking at the manpage, it looks like --quilt=baredebian matches my package the best, so I try that:
jonathan@lapcloud:~/devel/debian/python-team/bundlewrap/bundlewrap-4.23.1$ git-debpush --quilt=baredebian
Enumerating objects: 70, done.
Counting objects: 100% (70/70), done.
Delta compression using up to 12 threads
Compressing objects: 100% (37/37), done.
Writing objects: 100% (37/37), 8.97 KiB | 2.99 MiB/s, done.
Total 37 (delta 30), reused 0 (delta 0), pack-reused 0 (from 0)
To salsa.debian.org:python-team/packages/bundlewrap.git
6f55d99..3d5498f debian/master -> debian/master
* [new tag] upstream/4.23.1 -> upstream/4.23.1
* [new tag] debian/4.23.1-1_exp1 -> debian/4.23.1-1_exp1
Ooh! That looked like it did something! And a minute later I received the notification of the upload in my inbox:
So, I’m not 100% sure that this makes things much easier for me than doing a dput, but, it’s not any more difficult or more work either (once you know how it works), so I’ll be using git-debpush from now on, and I’m sure as I get more used to the git workflow of doing things I’ll understand more of the benefits. And at last, my one last use case for using FTP is now properly dead. RIP FTP :)
19 June, 2025 07:49PM by jonathan
I’ve been meaning to write a post about this bug for a while, so here it is (before I forget the details!).
First, I’d like to thank a few people:
I’ll probably forget some details because it’s been more than a week (and life at $DAYJOB moves fast), but we’ll see.
Wolfi OS takes security seriously, and one of the things we have is a package which sets the hardening compiler flags for C/C++ according to the best practices recommended by OpenSSF. At the time of this writing, these flags are (in GCC’s spec file parlance):
*self_spec:
+ %{!O:%{!O1:%{!O2:%{!O3:%{!O0:%{!Os:%{!0fast:%{!0g:%{!0z:-O2}}}}}}}}} -fhardened -Wno-error=hardened -Wno-hardened %{!fdelete-null-pointer-checks:-fno-delete-null-pointer-checks} -fno-strict-overflow -fno-strict-aliasing %{!fomit-frame-pointer:-fno-omit-frame-pointer} -mno-omit-leaf-frame-pointer
*link:
+ --as-needed -O1 --sort-common -z noexecstack -z relro -z now
The important part for our bug is the usage of -z now and -fno-strict-aliasing.
As I was saying, these flags are set for almost every build, but sometimes things don’t work as they should and we need to disable them. Unfortunately, one of these problematic cases has been glibc.
There was an attempt to enable hardening while building glibc, but that introduced a strange breakage to several of our packages and had to be reverted.
Things stayed pretty much the same until a few weeks ago, when I started working on one of my roadmap items: figure out why hardening glibc wasn’t working, and get it to work as much as possible.
I started off by trying to reproduce the problem. It’s important to mention this because I often see young engineers forgetting to check if the problem is even valid anymore. I don’t blame them; the anxiety to get the bug fixed can be really blinding.
Fortunately, I already had one simple test to trigger the failure.
All I had to do was install the py3-matplotlib package and then invoke:
$ python3 -c 'import matplotlib'
This would result in an abort with a core dump.
I followed the steps above, and readily saw the problem manifesting again. OK, first step is done; I wasn’t getting out easily from this one.
The next step is to actually try to debug the failure. In an ideal world you get lucky and are able to spot what’s wrong after just a few minutes. Or even better: you also can devise a patch to fix the bug and contribute it to upstream.
I installed GDB, and then ran the py3-matplotlib command inside it. When the abort happened, I issued a backtrace command inside GDB to see where exactly things had gone wrong. I got a stack trace similar to the following:
#0 0x00007c43afe9972c in __pthread_kill_implementation () from /lib/libc.so.6
#1 0x00007c43afe3d8be in raise () from /lib/libc.so.6
#2 0x00007c43afe2531f in abort () from /lib/libc.so.6
#3 0x00007c43af84f79d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4 0x00007c43af86d4d8 in _Unwind_RaiseException () from /usr/lib/libgcc_s.so.1
#5 0x00007c43acac9014 in __cxxabiv1::__cxa_throw (obj=0x5b7d7f52fab0, tinfo=0x7c429b6fd218 <typeinfo for pybind11::attribute_error>, dest=0x7c429b5f7f70 <pybind11::reference_cast_error::~reference_cast_error() [clone .lto_priv.0]>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6 0x00007c429b5ec3a7 in ft2font__getattr__(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [clone .lto_priv.0] [clone .cold] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#7 0x00007c429b62f086 in pybind11::cpp_function::initialize<pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::scope, pybind11::sibling>(pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#1}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0] ()
from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#8 0x00007c429b603886 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
...
Huh. Initially this didn’t provide me with much information. There was something strange seeing the abort function being called right after _Unwind_RaiseException, but at the time I didn’t pay much attention to it.
OK, time to expand our horizons a little. Remember when I said that several of our packages would crash with a hardened glibc? I decided to look for another problematic package so that I could make it crash and get its stack trace. My thinking here is that maybe if I can compare both traces, something will come up.
I happened to find an old discussion where Dann Frazier mentioned that Emacs was also crashing for him. He and I share the Emacs passion, and I totally agreed with him when he said that “Emacs crashing is priority -1!” (I’m paraphrasing).
I installed Emacs, ran it, and voilà: the crash happened again. OK, that was good. When I ran Emacs inside GDB and asked for a backtrace, here’s what I got:
#0 0x00007eede329972c in __pthread_kill_implementation () from /lib/libc.so.6
#1 0x00007eede323d8be in raise () from /lib/libc.so.6
#2 0x00007eede322531f in abort () from /lib/libc.so.6
#3 0x00007eede262879d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4 0x00007eede2646e7c in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#5 0x00007eede3327b11 in backtrace () from /lib/libc.so.6
#6 0x000059535963a8a1 in emacs_backtrace ()
#7 0x000059535956499a in main ()
Ah, this backtrace is much simpler to follow. Nice.
Hmmm. Now the crash is happening inside _Unwind_Backtrace. A pattern emerges! This must have something to do with stack unwinding (or so I thought… keep reading to discover the whole truth). You see, the backtrace function (yes, it’s a function) and C++’s exception handling mechanism use similar techniques to do their jobs, and it pretty much boils down to unwinding frames from the stack.
I looked into Emacs’ source code, specifically the emacs_backtrace function, but could not find anything strange over there. This bug was probably not going to be an easy fix…
Being able to easily reproduce the bug is awesome and really helps with debugging, but even better is being able to have a minimal reproducer for the problem.
You see, py3-matplotlib is a huge package and pulls in a bunch of extra dependencies, so it’s not easy to ask other people to “just install this big package plus these other dependencies, and then run this command…”, especially if we have to file an upstream bug and talk to people who may not even run the distribution we’re using. So I set out to try and come up with a smaller recipe to reproduce the issue, ideally something that’s not tied to a specific package from the distribution.
Having all the information gathered from the initial debug session, especially the Emacs backtrace, I thought that I could write a very simple program that just invoked the backtrace function from glibc in order to trigger the code path that leads to _Unwind_Backtrace. Here’s what I wrote:
#include <execinfo.h>

int
main(int argc, char *argv[])
{
  void *a[4096];
  backtrace (a, 100);
  return 0;
}
After compiling it, I determined that yes, the problem did happen with this small program as well. There was only a small nuisance: the manifestation of the bug was not deterministic, so I had to execute the program a few times until it crashed. But that’s much better than what I had before, and a small price to pay. Having a minimal reproducer pretty much allows us to switch our focus to what really matters. I wouldn’t need to dive into Emacs’ or Python’s source code anymore.
At the time, I was sure this was a glibc bug. But then something else happened.
I had to stop my investigation efforts because something more important came up: it was time to upload GCC 15 to Wolfi. I spent a couple of weeks working on this (it involved rebuilding the whole archive, filing hundreds of FTBFS bugs, patching some programs, etc.), and by the end of it the transition went smooth. When the GCC 15 upload was finally done, I switched my focus back to the glibc hardening problem.
The first thing I did was to… yes, reproduce the bug again. It had been a few weeks since I had touched the package, after all. So I built a hardened glibc with the latest GCC and… the bug did not happen anymore!
Fortunately, the very first thing I thought was “this must be GCC”, so I rebuilt the hardened glibc with GCC 14, and the bug was there again. Huh, unexpected but very interesting.
At this point, I was ready to start some serious debugging. And then I got a message on Signal. It was one of those moments where two minds think alike: Gabriel decided to check how I was doing, and I was thinking about him because this involved glibc, and Gabriel contributed to the project for many years. I explained what I was doing, and he promptly offered to help. Yes, there are more people who love low level debugging!
We spent several hours going through disassemblies of certain functions (because we didn’t have any debug information in the beginning), trying to make sense of what we were seeing. There was some heavy GDB involved; unfortunately I completely lost the session’s history because it was done inside a container running inside an ephemeral VM. But we learned a lot. For example:
It was hard to actually understand the full stack trace leading to uw_init_context_1[cold]. _Unwind_Backtrace obviously didn’t call it (it called uw_init_context_1, but what was that [cold] doing?). We had to investigate the disassembly of uw_init_context_1 in order to determine where uw_init_context_1[cold] was being called.
The [cold] suffix is a GCC function attribute that can be used to tell the compiler that the function is unlikely to be reached. When I read that, my mind immediately jumped to “this must be an assertion”, so I went to the source code and found the spot.
We were able to determine that the return code of uw_frame_state_for was 5, which means _URC_END_OF_STACK. That’s why the assertion was triggering.
After finding these facts without debug information, I decided to bite the bullet and recompiled GCC 14 with -O0 -g3, so that we could debug what uw_frame_state_for was doing. After banging our heads a bit more, we found that fde is NULL at this excerpt:
// ...
fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
&context->bases);
if (fde == NULL)
{
#ifdef MD_FALLBACK_FRAME_STATE_FOR
/* Couldn't find frame unwind info for this function. Try a
target-specific fallback mechanism. This will necessarily
not provide a personality routine or LSDA. */
return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
#else
return _URC_END_OF_STACK;
#endif
}
// ...
We’re debugging on amd64, which means that MD_FALLBACK_FRAME_STATE_FOR is defined and therefore is called. But that’s not really important for our case here, because we had established before that _Unwind_Find_FDE would never return NULL when using a non-hardened glibc (or a glibc compiled with GCC 15). So we decided to look into what _Unwind_Find_FDE did.
The function is complex because it deals with .eh_frame, but we were able to pinpoint the exact location where find_fde_tail (one of the functions called by _Unwind_Find_FDE) is returning NULL:
if (pc < table[0].initial_loc + data_base)
return NULL;
We looked at the addresses of pc and table[0].initial_loc + data_base, and found that the former fell within libgcc’s text section, while the latter fell within the text of /lib/ld-linux-x86-64.so.2.
At this point, we were already too tired to continue. I decided to keep looking at the problem later and see if I could get any further.
The next day, I woke up determined to find what changed in GCC 15 that caused the bug to disappear. Unless you know GCC’s internals like they are your own home (which I definitely don’t), the best way to do that is to git bisect the commits between GCC 14 and 15.
I spent a few days running the bisect. It took me more time than I’d have liked to find the right range of commits to pass git bisect (because of how branches and tags are done in GCC’s repository), and I also had to write some helper scripts that adjusted the gcc.yaml package definition to make it build with the commit being bisected.

At the end, I had a commit to point to:
commit 99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date: Fri May 3 14:04:41 2024 +0200
tree-optimization/114589 - remove profile based sink heuristics
Makes sense, right?! No? Well, it didn’t for me either. Even after reading what was changed in the code and the upstream bug fixed by the commit, I was still clueless as to why this change “fixed” the problem (I say “fixed” because it may very well be an unintended consequence of the change, and some other problem might have been introduced).
After obtaining the commit that possibly fixed the bug, while talking to Dann and explaining what I did, he suggested that I should file an upstream bug and check with them. Great idea, of course.
I filed the following upstream bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120653
It’s a bit long, very dense and complex, but ultimately upstream was able to find the real problem and have a patch accepted in just two days. Nothing like knowing the code base. The initial bug became:
https://sourceware.org/bugzilla/show_bug.cgi?id=33088
In the end, the problem was indeed in how the linker defines __ehdr_start, which, according to the code (from elf/dl-support.c), is used like this:
if (_dl_phdr == NULL)
{
/* Starting from binutils-2.23, the linker will define the
magic symbol __ehdr_start to point to our own ELF header
if it is visible in a segment that also includes the phdrs.
So we can set up _dl_phdr and _dl_phnum even without any
information from auxv. */
extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
_dl_phdr = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
_dl_phnum = __ehdr_start.e_phnum;
}
But the following definition is the problematic one (from elf/rtld.c):
extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
This symbol (along with its counterpart, __ehdr_end) was being run-time relocated when it shouldn’t be. The fix that was pushed added optimization barriers to prevent the compiler from doing the relocations.
I don’t claim to fully understand what was done here, and Jakub’s analysis is a thing to behold, but in the end I was able to confirm that the patch fixed the bug. And in the end, it was indeed a glibc bug.
This was an awesome bug to investigate. It’s one of those that deserve a blog post, even though some of the final details of the fix flew over my head.
I’d like to start blogging more about this sort of bug, because I’ve encountered my fair share of them throughout my career. And it was great being able to do some debugging with another person, exchange ideas, learn things together, and ultimately share that deep satisfaction when we find out why a crash is happening.
I have at least one more bug on my TODO list to write about (another one with glibc, but this time I was able to get to the end of it and come up with a patch). Stay tuned.
P.S.: After having published the post I realized that I forgot to explain why the -z now and -fno-strict-aliasing flags were important.
-z now is the flag that I determined to be the root cause of the breakage. If I compiled glibc with every hardening flag except -z now, everything worked. So initially I thought that the problem had to do with how ld.so was resolving symbols at runtime. As it turns out, this ended up being more a symptom than the real cause of the bug.
As for -fno-strict-aliasing, a Gentoo developer who commented on the GCC bug above mentioned that this OpenSSF bug had a good point against using this flag for hardening. I still have to do a deep dive on what was discussed in the issue, but this is certainly something to take into consideration. There’s this very good write-up about strict aliasing in general if you’re interested in understanding it better.
Everybody is trying out AI assistants these days, so I figured I'd jump on that train and see how fast it derails.
I went with CodeRabbit because I've seen it on YouTube — ads work, I guess.
I am trying to answer the following questions:
To reduce the amount of output and not to confuse contributors, CodeRabbit was configured to only do reviews on demand.
What follows is a rather unscientific evaluation of CodeRabbit based on PRs in two Foreman-related repositories, looking at the summaries CodeRabbit posted as well as the comments/suggestions it had about the code.
PR: theforeman/foreman-ansible-modules#1848
The summary CodeRabbit posted is technically correct.
This update introduces several changes across CI configuration, Ansible roles, plugins, and test playbooks. It expands CI test coverage to a new Ansible version, adjusts YAML key types in test variables, refines conditional logic in Ansible tasks, adds new default variables, and improves clarity and consistency in playbook task definitions and debug output.
Yeah, it does all of that, all right. But it kinda misses the point that the addition here is "Ansible 2.19 support", which starts with adding it to the CI matrix and then adjusting the code to actually work with that version. Also, the changes are not for "clarity" or "consistency", they are fixing bugs in the code that the older Ansible versions accepted, but the new one is more strict about.
Then it adds a table with the changed files and what changed in there. To me, as the author, it felt redundant, and IMHO it doesn't add any clarity to understanding the changes. (And yes, same "clarity" vs bugfix mistake here, but that makes sense as it apparently mis-identified the change reason.)
And then the sequence diagrams… They probably help if you have a dedicated change to a library or a library consumer, but for this PR it's just noise, especially as it only covers two of the changes (addition of 2.19 to the test matrix and a change to the inventory plugin), completely ignoring other important parts.
Overall verdict: noise, don't need this.
CodeRabbit also posted 4 comments/suggestions to the changes.
result.task
IMHO a valid suggestion, even if on the picky side as I am not sure how to make it undefined here. I ended up implementing it, even if with slightly different (and IMHO better readable) syntax.
when for composite CV versions

That one was funny! The original complaint was that the when condition used slightly different data manipulation than the data that was passed when the condition was true.
The code was supposed to do "clean up the data, but only if there are any items left after removing the first 5, as we always want to keep 5 items".
And I do agree with the analysis that it's badly maintainable code. But the suggested fix was to re-use the data in the variable we later use for performing the cleanup. While this is (to my surprise!) valid Ansible syntax, it didn't make the code much more readable as you need to go and look at the variable definition.
The better suggestion then came from Ewoud: to compare the length of the data with the number we want to keep. Humans, so smart!
But Ansible is not Ewoud's native turf, so he asked whether there is a more elegant way to count how much data we have than to use | list | count in Jinja (the data comes from a Python generator, so it needs to be converted to a list first).

And the AI helpfully suggested to use | count instead!

However, count is just an alias for length in Jinja, so it behaves identically and needs a list.

Luckily the AI quickly apologized for being wrong after being pointed at the Jinja source and didn't try to waste my time any further. Had I not known about the count alias, we'd have committed that suggestion and let CI fail before reverting again.
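A quick way to check that alias claim directly against the jinja2 package (just an aside, not something from the PR):

import jinja2

env = jinja2.Environment()
print(env.filters["count"] is env.filters["length"])   # True: both are plain len()
gen = (x for x in range(7))
print(env.from_string("{{ data | list | count }}").render(data=gen))   # 7, after the list conversion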
The very same complaint was posted a few lines later, as the logic there is very similar — just slightly different data to be filtered and cleaned up.
Interestingly, here the suggestion also was to use the variable. But there is no variable with the data!
The text actually says one need to "define" it, yet the "committable suggestion" doesn't contain that part.
Interestingly, when asked where it sees the "inconsistency" in that hunk, it said the inconsistency is with the composite case above. That however is nonsense, as while we want to keep the same number of composite and non-composite CV versions, the data used in the task is different — it even gets consumed by a totally different playbook — so there can't be any real consistency between the branches.
I ended up applying the same logic as suggested by Ewoud above, as that refactoring was possible in a consistent way.
One of the changes in Ansible 2.19 is that Ansible fails when there are undefined variables, even if they are only undefined for cases where they are unused.
CodeRabbit complains that the names of the defaults I added are inconsistent. And that is technically correct. But those names are already used in other places in the code, so I'd have to refactor more to make it work properly.
Once being pointed at the fact that the variables already exist, the AI is as usual quick to apologize, yay.
PR: theforeman/foreman-ansible-modules#1860
Again, the summary is technically correct
The repository module was updated to support additional parameters for repository synchronization and authentication. New options were added for ansible collections, ostree, Python packages, and yum repositories, including authentication tokens, filtering controls, and version retention settings. All changes were limited to module documentation and argument specification.
But it doesn't add anything you'd not get from looking at the diff, especially as it contains a large documentation chunk explaining those parameters.
No sequence diagram this time. That's a good thing!
Overall verdict: noise (even if the amount is small), don't need this.
CodeRabbit generated two comments for this PR.
Interestingly, none of them overlapped with the issues ansible-lint and friends found.
Yepp, that's fair
Yepp, I forgot these (not intentionally!).
The diff it suggests is nonsense, as it doesn't take into account the existing Ansible and Yum validations, but it clearly has read them as the style etc of the new ones matches. It also managed to group the parameters correctly by repository type, so it's something.
 if module.foreman_params['content_type'] != 'ansible_collection':
     invalid_list = [key for key in ['ansible_collection_requirements'] if key in module.foreman_params]
     if invalid_list:
         module.fail_json(msg="({0}) can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ansible_collection specific parameters
+if module.foreman_params['content_type'] != 'ansible_collection':
+    invalid_list = [key for key in ['ansible_collection_auth_token', 'ansible_collection_auth_url'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="({0}) can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ostree specific parameters
+if module.foreman_params['content_type'] != 'ostree':
+    invalid_list = [key for key in ['depth', 'exclude_refs', 'include_refs'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="({0}) can only be used with content_type 'ostree'".format(",".join(invalid_list)))
+
+# Validate python package specific parameters
+if module.foreman_params['content_type'] != 'python':
+    invalid_list = [key for key in ['excludes', 'includes', 'package_types', 'keep_latest_packages'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="({0}) can only be used with content_type 'python'".format(",".join(invalid_list)))
+
+# Validate yum specific parameter
+if module.foreman_params['content_type'] != 'yum' and 'upstream_authentication_token' in module.foreman_params:
+    module.fail_json(msg="upstream_authentication_token can only be used with content_type 'yum'")
Interestingly, it also said "Note: If 'python' is not a valid content_type, please adjust the validation accordingly." which is quite a hint at a bug in itself.
The module currently does not even allow creating content_type=python repositories. That should have been more prominent, as it's a BUG!
Mostly correct.
It did misinterpret the change to a test playbook as an actual "behavior" change: "Introduced new playbook variables for database configuration" — there is no database configuration in this repository, just the test playbook using the same metadata as a consumer of the library. Later on it does say "Playbook metadata and test fixtures", so… it's unclear whether this is a misinterpretation or just badly summarized. As long as you also look at the diff, it won't confuse you, but if you're using the summary as the sole source of information (bad!) it would.
This time the sequence diagram is actually useful, yay. Again, not 100% accurate: it's missing the fact that saving the parameters is hidden behind an "if enabled" flag — something it did represent correctly for loading them.
Overall verdict: not really useful, don't need this.
Here I was a bit surprised, especially as the nitpicks were useful!
My original code used os.environ.get('OBSAH_PERSIST_PATH', '/var/lib/obsah/parameters.yaml') for the location of the persistence file. CodeRabbit correctly pointed out that this won't work for non-root users and one should respect XDG_STATE_HOME. Ewoud did point that out in his own review, so I am not sure whether CodeRabbit came up with this on its own, or also took the human comments into account.

The suggested code seems fine too — it just doesn't use /var/lib/obsah at all anymore. This might be a good idea for the generic library we're working on here, which could then be overridden to a static /var/lib path in a consumer (which always runs as root).

In the end I did not implement it, but mostly because I was lazy and was sure we'd override it anyway.
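For reference, a rough sketch (my wording, not CodeRabbit's actual diff) of what respecting XDG_STATE_HOME while keeping the existing environment override and a root default could look like:

import os

def persist_path() -> str:
    # explicit override wins, as in the original code
    if 'OBSAH_PERSIST_PATH' in os.environ:
        return os.environ['OBSAH_PERSIST_PATH']
    # root keeps the system-wide location
    if os.geteuid() == 0:
        return '/var/lib/obsah/parameters.yaml'
    # everyone else gets a per-user state directory
    state_home = os.environ.get('XDG_STATE_HOME', os.path.expanduser('~/.local/state'))
    return os.path.join(state_home, 'obsah', 'parameters.yaml')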
The library allows you to generate both positional (foo without --) and non-positional (--foo) parameters, but the code I wrote would only ever persist non-positional parameters. This was intentional, but there is no documentation of the intent in a comment — which the rabbit thought would be worth pointing out.

It's a fair nitpick and I ended up adding a comment.
database_host
The library has a way to perform type checking on passed parameters, and one of the supported types is "FQDN" — so a fully qualified domain name, with dots and stuff.
The test playbook I added has a database_host variable, but I didn't bother adding a type to it, as I don't really need any type checking here.

While using "FQDN" might be a bit too strict here (technically a working database connection can also use a non-qualified name or an IP address), I was positively surprised by this suggestion. It shows that the rest of the repository was taken into context when preparing the suggestion.
reset_args() can raise AttributeError when a key is absent

This is a correct finding: the code is not written in a way that would survive if it tries to reset things that are not set. However, that's only true for the case where users pass in --reset-<parameter> without ever having set parameter before.

The complaint about the part where the parameter is part of the persisted set but not in the parsed args is wrong — as parsed args inherit from the persisted set.

The suggested code is not well readable, so I ended up fixing it slightly differently.
argparse type validation

When persisting, I just yaml.safe_dump the parsed parameters, which means the YAML will contain native types like integers.

The argparse documentation warns that the type checking argparse does only applies to strings and is skipped if you pass anything else (via default values).

While correct, it doesn't really hurt here as the persisting only happens after the values were type-checked. So there is not really a reason to type-check them again. Well, unless the type changes, anyway.

Not sure what I'll do with this comment.
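A tiny illustration of that argparse caveat (generic, not obsah's code): type= is only applied to string values, so a non-string default (like a value loaded back from YAML) passes through untouched.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--retries', type=int, default=2.5)   # non-string default
print(parser.parse_args([]).retries)                      # 2.5: int() was never applied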
contextlib.suppress

This was added when I asked CodeRabbit for a re-review after pushing some changes. Interestingly, the PR already contained try: … except: pass code before, and it did not flag that. Also, the code suggestion contained import contextlib in the middle of the code, instead of at the top of the file. Who would do that?!

But the comment as such was valid, so I fixed it in all places it is applicable, not only the one the rabbit found.
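For context, the pattern in question looks roughly like this (a generic illustration, not the code from the PR):

import contextlib
import os

# Instead of:
#   try:
#       os.remove('/tmp/state.lock')   # hypothetical path
#   except FileNotFoundError:
#       pass
# ...the suggestion boils down to:
with contextlib.suppress(FileNotFoundError):
    os.remove('/tmp/state.lock')       # hypothetical path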
PR: theforeman/foreman-ansible-modules#1867
A workaround was added to the _update_entity method in the ForemanAnsibleModule class to ensure that when updating a host, both content_view_id and lifecycle_environment_id are always included together in the update payload. This prevents partial updates that could cause inconsistencies.
Partial updates are not a thing.
The workaround is purely for the fact that Katello expects both parameters to be sent, even if only one of them needs an actual update.
No diagram, good.
Overall verdict: misleading summaries are bad!
Given a small patch, there was only one comment.
This reads correctly at first glance. More error handling is always better, right?
But if you dig into the argumentation, you see it's wrong. Either:
The AI accepted defeat once I asked it to analyze things in more detail, but why did I have to ask in the first place?!
Well, idk, really.
Yes. It's debatable whether these were useful (see e.g. the database_host example), but I tend to be in the "better to nitpick/suggest more and dismiss than overlook" team, so IMHO a positive win.
In my opinion it did not. The summaries were either "lots of words, no real value" or plain wrong. The sequence diagrams were not useful either.
Luckily all of that can be turned off in the settings, which is what I'd do if I'd continue using it.
While the actual patches it posted were "meh" at best, there were useful findings that resulted in improvements to the code.
Absolutely! The whole Jinja discussion would have been easier without the AI "help". Same applies for the "error handling" in the workaround PR.
The output is certainly a lot, so yes I think it can be distracting. As mentioned, I think dropping the summaries can make the experience less distracting.
I will disable the summaries for the repositories, but will leave the @coderabbitai review trigger active if someone wants an AI-assisted review.
This won't be something that I'll force on our contributors and maintainers, but they surely can use it if they want.
But I don't think I'll be using this myself on a regular basis.
Yes, it can be made "usable". But so can vim ;-)
Also, I'd prefer to have a junior human asking all the questions and making bad suggestions, so they can learn from it, and not some planet burning machine.
17 June, 2025 03:19PM by evgeni
The Internet has changed a lot in the last 40+ years. Fads have come and gone. Network protocols have been designed, deployed, adopted, and abandoned. Industries have come and gone. The types of people on the internet have changed a lot. The number of people on the internet has changed a lot, creating an information medium unlike anything ever seen before in human history. There’s a lot of good things about the Internet as of 2025, but there’s also an inescapable hole in what it used to be, for me.
I miss being able to throw a site up to send around to friends to play with without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my website, costing me thousands in network transfer for the privilege. I miss being able to put a lightly authenticated game server up and not worry too much at night – wondering if that process is now mining bitcoin. I miss being able to run a server in my home closet. Decades of cat and mouse games have rendered running a mail server nearly impossible. Those who are “brave” enough to try are met with weekslong stretches of delivery failures and countless hours yelling ineffectually into a pipe that leads from the cheerful lobby of some disinterested corporation directly into a void somewhere 4 layers below ground level.
I miss the spirit of curiosity, exploration, and trying new things. I miss building things for fun without having to worry about being too successful, after which “security” offices start demanding my supplier paperwork in triplicate as heartfelt thanks from their engineering teams. I miss communities that are run because it is important to them, not for ad revenue. I miss community operated spaces and having more than four websites that are all full of nothing except screenshots of each other.
Every other page I find myself on now has an AI generated click-bait title, shared for rage-clicks all brought-to-you-by-our-sponsors–completely covered wall-to-wall with popup modals, telling me how much they respect my privacy, with the real content hidden at the bottom bracketed by deceptive ads served by companies that definitely know which new coffee shop I went to last month.
This is wrong, and those who have seen what was know it.
I can’t keep doing it. I’m not doing it any more. I reject the notion that this is as it needs to be. It is wrong. The hole left in what the Internet used to be must be filled. I will fill it.
Throughout the 2000s, some of my favorite memories were from LAN parties at my friends’ places. Dragging your setup somewhere, long nights playing games, goofing off, even building software all night to get something working—being able to do something fiercely technical in the context of a uniquely social activity. It wasn’t really much about the games or the projects—it was an excuse to spend time together, just hanging out. A huge reason I learned so much in college was that campus was a non-stop LAN party – we could freely stand up servers, talk between dorms on the LAN, and hit my dorm room computer from the lab. Things could go from individual to social in a matter of seconds. The Internet used to work this way—my dorm had public IPs handed out by DHCP, and my workstation could serve traffic from anywhere on the internet. I haven’t been back to campus in a few years, but I’d be surprised if this were still the case.
In December of 2021, three of us got together and connected our houses together in what we now call The Promised LAN. The idea is simple—fill the hole we feel is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a space that is intrinsically social, even though we’re doing technical things. We can freely host insecure game servers or one-off side projects without worrying about what someone will do with it.
Over the years, it’s evolved very slowly—we haven’t pulled any all-nighters. Our mantra has become “old growth”, building each layer carefully. As of May 2025, the LAN is now 19 friends running around 25 network segments. Those 25 networks are connected to 3 backbone nodes, exchanging routes and IP traffic for the LAN. We refer to the set of backbone operators as “The Bureau of LAN Management”. Combined decades of operating critical infrastructure has driven The Bureau to make a set of well-understood, boring, predictable, interoperable and easily debuggable decisions to make this all happen. Nothing here is exotic or even technically interesting.
The hardest part, however, is rejecting the idea that anything outside our own LAN is untrustworthy—nearly irreversible damage inflicted on us by the Internet. We have solved this by not solving it. We strictly control membership—the absolute hard minimum for joining the LAN requires 10 years of friendship with at least one member of the Bureau, with another 10 years of friendship planned. Members of the LAN can veto new members even if all other criteria are met. Even with those strict rules, there’s no shortage of friends that meet the qualifications—but we are not equipped to take that many folks on. It’s hard to join—both socially and technically. Doing something malicious on the LAN requires a lot of highly technical effort upfront, and it would endanger a decade of friendship. We have relied on those human, social, interpersonal bonds to bring us all together. It’s worked for the last 4 years, and it should continue working until we think of something better.
We assume roommates, partners, kids, and visitors all have access to The Promised LAN. If they’re let into our friends' network, there is a level of trust that works transitively for us—I trust them to be on mine. This LAN is not for “security”, rather, the network border is a social one. Benign “hacking”—in the original sense of misusing systems to do fun and interesting things—is encouraged. Robust ACLs and firewalls on the LAN are, by definition, an interpersonal—not technical—failure. We all trust every other network operator to run their segment in a way that aligns with our collective values and norms.
Over the last 4 years, we’ve grown our own culture and fads—around half of the people on the LAN have thermal receipt printers with open access, for printing out quips or jokes on each other’s counters. It’s incredible how much network transport and a trusting culture gets you—there’s a 3-node IRC network, exotic hardware to gawk at, radios galore, a NAS storage swap, LAN only email, and even a SIP phone network of “redphones”.
We do not wish to, nor will we, rebuild the internet. We do not wish to, nor will we, scale this. We will never be friends with enough people, as hard as we may try. Participation hinges on us all having fun. As a result, membership will never be open, and we will never have enough connected LANs to deal with the technical and social problems that start to happen with scale. This is a feature, not a bug.
This is a call for you to do the same. Build your own LAN. Connect it with friends’ homes. Remember what is missing from your life, and fill it in. Use software you know how to operate and get it running. Build slowly. Build your community. Do it with joy. Remember how we got here. Rebuild a community space that doesn’t need to be mediated by faceless corporations and ad revenue. Build something sustainable that brings you joy. Rebuild something you use daily.
Bring back what we’re missing.
Took some time yesterday to upload the current state of what will at some point be vym 3 to experimental. If you’re a user of this tool you can give it a try, but be aware that the file format changed and can’t be processed with vym releases before 2.9.500! Thus it’s important to create a backup until you’re sure that you’re ready to move on. On the technical side this is also the switch from Qt5 to Qt6.
I was not aware that one can write bad Markdown, since Markdown has such a simple syntax, that I thought you just write, and it’s fine. Naïve, I know!
I’ve started editing the files for this blog/site with Visual Studio Code too, and I had the markdown lint extension installed from another project, so as I was opening old files, more and more problems appeared. On a whim, I searched and found the “lint all files” command, and after running it, oops—more than 400 problems!
Now, some of them were entirely trivial and a matter of subjective style, like mixing both underscore and asterisk for emphasis in a single file, and asterisks and dashes for list items. Others, seemingly trivial like tab indentation, were actually also causing rendering issues, so fixing that solved a real cosmetic issue.
But some of the issues flagged were actual problems. For example, one sentence that I had, was:
Here “something” was interpreted as an (invalid) HTML tag, and not rendered at all.
Another problem, but more minor, was that I had links to Wikipedia with spaces in the link name, which Visual Studio Code breaks at the first space, rather than using encoded spaces or underscores as Wikipedia generates today. In the rendered output, Pandoc seemed to do the right thing though.
However, the most interesting issue that was flagged was no details in HTML links, i.e. links of the form:
Which works for non-visually impaired people, but not for people using assistive technologies. And while trying to fix this, it turns out that you can do much better, for everyone, because “here” is really non-descriptive. You can use either the content as label (“an article about configuring BIND”), or the destination (“an article on this-website”), rather than the plain “here”.
The only, really only check I disabled, was tweaking the trailing punctuation checks in headers, as I really like to write a header that ends with exclamation marks. I like exclamation marks in general! So why not use them in headers too. The question mark is allowlisted by default, though that I use rarely.
During the changes/tweaks, I also did random improvements, but I didn’t change the updated tag, since most of them were minor. But a non-minor thing was tweaking the CSS for code blocks, since I had a really stupid non-symmetry between top and bottom padding (5px vs 0), and which I don’t know where it came from. But the MDN article on padding has as an example exactly what I had (except combined, I had it split). Did I just copy blindly? Possible…
So, all good and then, and I hope this doesn’t trigger a flow of updates on any aggregators, since all the changes were really trivial. And while I don’t write often, I did touch about 60 posts or pages, ouch! Who knew that changing editors can have such a large impact 😆
Welcome to post 50 in the R4 series.
Today we reconnect to a previous post, namely #36 on pub/sub for live market monitoring with R and Redis. It introduced both Redis as well as the (then fairly recent) extensions to RcppRedis to support the publish-subscribe (“pub/sub”) model of Redis. In short, it manages both subscribing clients as well as producers for live, fast and lightweight data transmission. Using pub/sub is generally more efficient than the (conceptually simpler) ‘poll-sleep’ loops, as polling creates CPU and network load. Subscriptions are lighter-weight as they get notified; they are also a little (but not much!) more involved as they require a callback function.
We should mention that Redis has a recent fork in Valkey that arose when the former committed one of these not-uncommon-among-db-companies license suicides—which, happy to say, they reversed more recently—so that we now have both the original as well as this leading fork (among others). Both work, the latter is now included in several Linux distros, and the C library hiredis used to connect to either is still licensed permissively as well.
All this came about because Yahoo! Finance recently had another ‘hiccup’ in which they changed something, leaving some data clients with hiccups of their own. This includes the GNOME applet Stocks Extension I had been running. There is a lively discussion on its issue #120, suggesting for example a curl wrapper (which then makes each access a new system call).
Separating data acquisition and presentation becomes an attractive alternative, especially given how the standard Python and R accessors to the Yahoo! Finance service continued to work (and how, per post #36, I already run data acquisition). Moreover, and somewhat independently, it occurred to me that the cute (both funny in its pun and very pretty in its display) ActivateLinux program might offer an easy-enough way to display updates on the desktop.
There were two aspects to address. First, the subscription side needed to be covered in either plain C or C++. That, it turns out, is very straightforward: there is existing documentation and there are prior examples (e.g. at StackOverflow), as well as the ability to have an LLM generate a quick stanza, as I did with Claude. A modified variant is now in the example repo ‘redis-pubsub-examples’ in file subscriber.c. It is deliberately minimal and the directory does not even have a Makefile: just compile and link against both libevent (for the event loop controlling this) and libhiredis (for the Redis or Valkey connection). This should work on any standard Linux (or macOS) machine with those two (very standard) libraries installed.
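For reference, the compile step is a one-liner along these lines (compiler choice and flags are of course a matter of taste):

cc -Wall -O2 -o subscriber subscriber.c -lhiredis -levent

On Debian-based systems the two dependencies come from the libhiredis-dev and libevent-dev packages.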
The second aspect was trickier. While we can get Claude to modify the program to also display under x11, it still uses a single controlling event loop. It took a little bit of probing on my end to understand how to modify (the x11 use of) ActivateLinux, but as always it was reasonably straightforward in the end: instead of one single while loop awaiting events, we now first check for pending events and deal with them if present, but otherwise do not idle and wait; we continue … in another loop that also checks on the Redis or Valkey “pub/sub” events. So two thumbs up to vibe coding, which clearly turned me into an x11-savvy programmer too…
The result is in a new (and currently fairly bare-bones) repo almm. It includes all files needed to build the application, borrowed with love from ActivateLinux (which is GPL-licensed, as is of course our minimal extension), and adds the minimal modifications we made, namely linking with libhiredis and some minimal changes to x11/x11.c. (Supporting wayland as well is on the TODO list, and I also need to release a new RcppRedis version to CRAN as one currently needs the GitHub version.)
We also made a simple mp4 video with a sound overlay which describes the components briefly:
Comments and questions welcome. I will probably add a little bit of command-line support to almm. Selecting the symbol subscribed to is currently done in the most minimal way via the environment variable SYMBOL (NB: not SYM, as the video, using the default value, shows). I also worked out how to show the display on only one of my multiple monitors, so I may add an explicit screen id selector too. A little bit of discussion (including minimal Docker use around r2u) is also in issue #121 where I first floated the idea of having StocksExtension listen to Redis (or Valkey). Other suggestions are most welcome, please use issue tickets at the almm repository.
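A hedged usage sketch (the ticker symbol and binary name are illustrative rather than taken from the repo):

SYMBOL=AAPL ./almm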
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.
I have a few pictures on this blog, mostly in earlier years, because even with small pictures, the git repository soon became 80MiB—this is not much in absolute terms, but the actual Markdown/Haskell/CSS/HTML total size is tiny compared to the pictures, PDFs and fonts. I realised I needed a better solution probably about ten years ago, and that I should investigate git-annex. Then time passed, and I heard about git-lfs, so I thought that’s the way forward.
Now, I recently got interested again in doing something about this repository, and started researching.
git-lfs
I was sure that git-lfs, being supported by large providers, would be the modern solution. But to my surprise, git-lfs is very server-centric, which in hindsight makes sense, but for a home setup it’s not very good. Maybe I misunderstood, but git-lfs is more a protocol/method for a forge to store files, rather than an end-user solution. But then you need to back up those files separately (together with the rest of the forge), or implement another way of safeguarding them.
Further details, such as the fact that it keeps two copies of the files (one in the actual checked-out tree, one in internal storage), mean it’s not a good solution. Well, for my blog yes, but not in general. Then posts on Reddit about horror stories (people being locked out of GitHub due to quota, for example) and this Stack Overflow post about git-lfs constraining how one uses git convinced me that’s not what I want. To each their own, but not for me—I might want to push this blog’s repo to GitHub, but I definitely wouldn’t want in that case to pay for GitHub storage for my blog images (which are copies, not originals). And yes, even in 2025, those quotas are real—GitHub limits—and I agree with GitHub, storage and large bandwidth can’t be free.
git-annex
So back to git-annex. I thought it was going to be a simple thing, but oh boy, was I wrong. It took me half a week of continuous (well, in free time) reading and discussions with LLMs to understand a bit how it works. I think, honestly, it’s a bit too complex, which is why the workflows page lists seven (!) levels of workflow complexity, from fully-managed to fully-manual. IMHO, respect to the author for the awesome tool, but if you need a web app to help you manage git, it hints that the tool is too complex.
I made the mistake of running git annex sync once, only to realise it actually starts pushing to my upstream repo and creating new branches and whatnot, so after enough reading, I settled on workflow 6/7, since I don’t want another tool to manage my git history. Maybe I’m an outlier here, but everything “automatic” is a bit too much for me.
Once you do manage to understand how git-annex works (on the surface, at least), it is a pretty cool thing. It uses a git-annex git branch to store metainformation, and that is relatively clean. If you do run git annex sync, it creates some extra branches, which I don’t like, but meh.
One of the most confusing things about git-annex was understanding its “remote” concept. I thought a “remote” is a place where you replicate your data. But no, that’s a special remote. A normal remote is a git remote, but one that is expected to offer git/ssh command-line access. So if you have a git+ssh remote, git-annex will not only try to push its above-mentioned branch, but also copy the files. If such a remote is on a forge that doesn’t support git-annex, then it will complain and get confused.
Of course, if you read the extensive docs, you just do git config remote.<name>.annex-ignore true, and it will understand that it should not “sync” to it.
But, aside from this case, git-annex expects that all checkouts and clones of the repository are both metadata and data. And if you do any annex commands in them, all other clones will know about them! This can be unexpected, and you find people complaining about it, but nowadays there’s a solution:
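The fix boils down to a single configuration call in such a clone, along these lines:

git config annex.private true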
This is important. Any “leaf” git clone must be followed by that annex.private true config, especially on CI/CD machines. Honestly, I don’t understand why clones should be official data stores by default, but it is what it is.
I settled on not making any of my checkouts “stable”, but only the actual storage places. Except those are not git repositories, but just git-annex storage things, i.e. special remotes.
Is it confusing enough yet? 😄
The special remotes, as said, are what I expected to be the normal git-annex remotes, i.e. places where the data is stored. But well, they exist, and while I’m only using a couple of simple ones, there is a large number of them. Among the interesting ones: git-lfs; a remote that allows also storing the git repository itself (git-remote-annex), although I’m a bit confused about this one; and most of the common storage providers via the rclone remote.
Plus, all of the special remotes support encryption, so this is a really neat way to store your files across a large number of things, and handle replication, number of copies, from which copy to retrieve, etc. as you wish.
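As a small illustration of the sort of policy handling meant here (remote and file names are made up):

git annex numcopies 2               # require at least two copies of each annexed file
git annex copy --to=backupdisk      # push file contents to a (special) remote
git annex whereis photos/paris.jpg  # list which remotes hold a given file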
git-annex has tons of other features, so to some extent, the sky’s the limit. Automatic selection of what to add to git-annex vs plain git, encryption handling, number of copies, clusters, computed files, etc. etc. etc. I still think it’s cool but too complex, though!
Aside from my blog post, of course.
I’ve seen blog posts/comments about people using git-annex to track/store their photo collection, and I could see very well how the remote encrypted repos—any of the services supported by rclone—could be an N+2 copy or so. For me, tracking photos would be a bit too tedious, but it could maybe work after more research.
A more practical thing would probably be replicating my local movie collection (all legal, to be clear) better than “just run rsync from time to time”, and tracking the large files in it via git-annex. That’s an exercise for another day, though, once I get more mileage with it - my blog pictures are copies, so I don’t care much if they get lost, but the movies are primary online copies, and I don’t want to re-dump the discs. Anyway, for later.
Migrating here means ending in a state where all large files are in git-annex, and the plain git repo is small. Just moving the files to git-annex at the current head doesn’t remove them from history, so your git repository is still large; it won’t grow in the future, but it remains at its old size (and contains the large files in its history).
In my mind, a nice migration would be: run a custom command, and all the history is migrated to git-annex, so I can go back in time and still use git-annex. I naïvely expected this would be easy and already available, only to find comments on the git-annex site with unsure git-filter-branch calls and some web discussions. This is the discussion on the git-annex website, but it didn’t make me confident it would do the right thing.
But that discussion is now 8 years old. Surely in 2025, with git-filter-repo, it’s easier? And, maybe I’m missing something, but it is not. Not from the point of view of plain git, that’s easy, but because of interacting with git-annex, which stores its data in git itself, so doing this properly across successive steps of a repo (when replaying the commits) is, I think, not well-defined behaviour.
So I was stuck here for a few days, until I got an epiphany: as I’m going to rewrite the repository, of course I’m keeping a copy of it from before git-annex. If so, I don’t need the history, back in time, to be correct in the sense of being able to retrieve the binary files too. It just needs to be correct from the point of view of the actual Markdown and Haskell files that represent the “meat” of the blog.
This simplified the problem a lot. At first, I wanted to just skip these files, but this could also drop commits (git-filter-repo, by default, drops commits if they’re empty), and removing the files loses information - when they were added, what the paths were, etc. So instead I came up with a rather clever idea, if I might say so: since git-annex replaces files with symlinks already, just replace the files with symlinks in the whole history, except these symlinks are dangling (to represent the fact that the files are missing). One could also use empty files, but empty files are more “valid” in a sense than dangling symlinks, hence why I settled on those.
Doing this with git-filter-repo is easy, in newer versions, with the new --file-info-callback. Here is the simple code I used:
import os
import os.path
import pathlib

# Large binary files to be replaced by (dangling) symlinks in the rewritten history.
SKIP_EXTENSIONS = {'jpg', 'jpeg', 'png', 'pdf', 'woff', 'woff2'}
FILE_MODES = {b"100644", b"100755"}
SYMLINK_MODE = b"120000"

# Body of the --file-info-callback: git-filter-repo calls it with
# (filename, mode, blob_id, value) and expects a (filename, mode, blob_id) tuple back.
fas_string = filename.decode()
path = pathlib.PurePosixPath(fas_string)
ext = path.suffix.removeprefix('.')
if ext not in SKIP_EXTENSIONS:
    return (filename, mode, blob_id)
if mode not in FILE_MODES:
    return (filename, mode, blob_id)
print(f"Replacing '{filename}' (extension '.{ext}') in {os.getcwd()}")
symlink_target = '/none/binary-file-removed-from-git-history'.encode()
new_blob_id = value.insert_file_with_contents(symlink_target)
return (filename, SYMLINK_MODE, new_blob_id)
This goes and replaces files with a symlink to nowhere, but the symlink target should explain why it’s dangling. Later renames or moves of the files then work “naturally”, as the rename/mv doesn’t care about file contents. Then, when the filtering is done via:
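A sketch of that invocation, assuming the callback body above is saved as replace-binaries.py (the file name is my own; git-filter-repo also accepts the callback body directly as a string):

git filter-repo --file-info-callback "$(cat replace-binaries.py)"   # add --force if the clone is not fresh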
It is easy to onboard to git annex: copy the original files back into place and run git annex add on those files. For me it was easy as all such files were in a few directories, so just copying those directories back, a few git-annex add commands, and done.
Of course, then adding a few rsync remotes, git annex copy --to, and the repository was ready.
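For reference, such an rsync special remote is only a couple of commands (remote name, host and path are placeholders):

git annex initremote backupdisk type=rsync rsyncurl=user@backuphost:/srv/annex encryption=none
git annex copy --to=backupdisk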
Well, I also found a bug in my own Hakyll setup: on a fresh clone, when the large files are just dangling symlinks, the builder doesn’t complain, just ignores the images. Will have to fix.
This is a blog post that I read at the beginning, and I found it very useful as an intro: https://switowski.com/blog/git-annex/. It didn’t help me understand how it works under the covers, but it is well written. The author does use the ‘sync’ command, though, which is too magic for me, but he also agrees about its complexity 😅
And now, for the actual first image to be added that never lived in the old plain git repository. It’s not full-res/full-size, it’s cropped a bit on the bottom.
Earlier in the year, I went to Paris for a very brief work trip, and I walked around a bit—it was more beautiful than what I remembered from way way back. So a bit random selection of a picture, but here it is:
Enjoy!
This post is a review for Computing Reviews of Understanding Misunderstandings - Evaluating LLMs on Networking Questions, an article published in the Association for Computing Machinery (ACM) SIGCOMM Computer Communication Review.
Large language models (LLMs) have awed the world, emerging as the fastest-growing application of all time–ChatGPT reached 100 million active users in January 2023, just two months after its launch. After an initial cycle, they have gradually been mostly accepted and incorporated into various workflows, and their basic mechanics are no longer beyond the understanding of people with moderate computer literacy. Now, given that the technology is better understood, we face the question of how convenient LLM chatbots are for different occupations. This paper embarks on the question of whether LLMs can be useful for networking applications.
This paper systematizes querying three popular LLMs (GPT-3.5, GPT-4, and Claude 3) with questions taken from several network management online courses and certifications, and presents a taxonomy of six axes along which the incorrect responses were classified:
The authors also measure four strategies toward improving answers:
The authors observe that, while some of those strategies were marginally useful, they sometimes resulted in degraded performance.
The authors queried the commercially available instances of Gemini and GPT, which achieved scores over 90 percent for basic subjects but fared notably worse in topics that require understanding and converting between different numeric notations, such as working with Internet protocol (IP) addresses, even if they are trivial (that is, presenting the subnet mask for a given network address expressed as the typical IPv4 dotted-quad representation).
As a last item in the paper, the authors compare performance with three popular open-source models: Llama3.1, Gemma2, and Mistral with their default settings. Although those models are almost 20 times smaller than the GPT-3.5 commercial model used, they reached comparable performance levels. Sadly, the paper does not delve deeper into these models, which can be deployed locally and adapted to specific scenarios.
The paper is easy to read and does not require deep mathematical or AI-related knowledge. It presents a clear comparison along the described axes for the 503 multiple-choice questions presented. This paper can be used as a guide for structuring similar studies over different fields.
If you ever face the need to activate the PROXY protocol in HAProxy (e.g. if you're as unlucky as I am, and you have to use the Google Cloud TCP proxy load balancer), be aware that there are two ways to do that. Both are part of the frontend configuration.
This one is the big hammer and forces the usage of the PROXY protocol on all connections. Sample:
frontend vogons
bind *:2342 accept-proxy ssl crt /etc/haproxy/certs/vogons/tls.crt
If you have to receive both traffic directly, without the PROXY protocol header, and traffic from a proxy with the header (e.g. during a phase of migrations), there is also a more flexible option based on a tcp-request connection action. Sample:
frontend vogons
bind *:2342 ssl crt /etc/haproxy/certs/vogons/tls.crt
tcp-request connection expect-proxy layer4 if { src 35.191.0.0/16 130.211.0.0/22 }
Source addresses here are those of the GCP global TCP proxy frontends. Replace them with whatever suits your case. Since this is happening just after establishing a TCP connection, there is barely anything else available to match on besides the source address.
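To test the first (accept-proxy) variant without the load balancer in front, curl can send the PROXY protocol header itself; a rough example with a placeholder hostname:

curl --haproxy-protocol -k https://vogons.example.com:2342/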
I recently wrote about How to Use SSH with FIDO2/U2F Security Keys, which I now use on almost all of my machines.
The last one that needed this was my Raspberry Pi hooked up to my DEC vt510 terminal and IBM mechanical keyboard. Yes I do still use that setup!
To my surprise, generating a key on it failed. I very quickly saw that /dev/hidraw0 had incorrect permissions, accessible only to root.
On other machines, it looks like this:
crw-rw----+ 1 root root 243, 16 May 24 16:47 /dev/hidraw16
And, if I run getfacl on it, I see:
# file: dev/hidraw16
# owner: root
# group: root
user::rw-
user:jgoerzen:rw-
group::---
mask::rw-
other::---
Yes, something was setting an ACL on it. Thus began the saga to figure out what was doing that.
Firing up inotifywatch, I saw it was systemd-udevd or its udev-worker. But cranking up logging on that to maximum only showed me that uaccess was somehow doing this.
I started digging. uaccess turned out to be almost entirely undocumented. People say to use it, but there’s no description of what it does or how. Its purpose appears to be to grant access to devices to those logged in to a machine by dynamically adding them to ACLs for devices. OK, that’s a nice goal, but why was machine A doing this and not machine B?
I dug some more. I came across a hint that uaccess may only do that for a “seat”. A seat? I’ve not heard of that in Linux before.
Turns out there’s some information (older and newer) about this out there. Sure enough, on the machine with KDE, loginctl list-sessions shows me on seat0, but on the machine where I log in from ttyUSB0, it shows an empty seat.
But how to make myself part of the seat? I tried various udev rules to add the “seat” or “master-of-seat” tags, but nothing made any difference.
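For reference, one such attempt looked roughly like this (a sketch; as said, it made no difference):

# /etc/udev/rules.d/72-ttyusb-seat.rules
SUBSYSTEM=="tty", KERNEL=="ttyUSB0", TAG+="seat", TAG+="master-of-seat"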
I finally gave up and did the old-fashioned rule to just make it work already:
TAG=="security-device",SUBSYSTEM=="hidraw",GROUP="mygroup"
I still don’t know how to teach logind to add a seat for ttyUSB0, but oh well. At least I learned something. An annoying something, but hey.
This all had a laudable goal, but when there are so many layers of indirection, poorly documented, with poor logging, it gets pretty annoying.
11 June, 2025 02:12PM by John Goerzen
This was my hundred-thirty-first month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on:
I also continued my work on libxmltok and suricata. This month I also had to do some support on seger, for example to inject packages newly needed for builds.
This month was the eighty-second ELTS month. During my allocated time I uploaded or worked on:
All packages I worked on have been on the list of longstanding packages. For example espeak-ng has been on this list for more than nine months. I now understood that there is a reason why packages are on this list. Some parts of the software have been almost completely reworked, so that the patches need a “reverse” rework. For some packages this is easy, but for others this rework needs quite some time. I also continued to work on libxmltok and suricata.
Unfortunately I didn’t find any time to work on this topic.
This month I uploaded bugfix versions of:
This month I uploaded bugfix versions of:
This month I uploaded bugfix versions of:
Thanks a lot to the Release Team who quickly handled all my unblock bugs!
It is this time of the year when just a few packages arrive in NEW: it is Hard Freeze. So I enjoy this period and basically just take care of kernels or other important packages. As people seem to be more interested in discussions than in fixing RC bugs, my period of rest seems to continue for a while. So thanks for all these valuable discussions and really thanks to the few people who still take care of Trixie. This month I accepted 146 and rejected 10 packages. The overall number of packages that got accepted was 147.
08 June, 2025 05:48PM by alteholz
My Debian contributions this month were all sponsored by Freexian. Things were a bit quieter than usual, as for the most part I was sticking to things that seemed urgent for the upcoming trixie release.
You can also support my work directly via Liberapay or GitHub Sponsors.
After my appeal for help last month to
debug intermittent sshd
crashes, Michel
Casabona helped me put together an environment where I could reproduce it,
which allowed me to track it down to a root
cause and fix it. (I
also found a misuse of
strlcpy
affecting at
least glibc-based systems in passing, though I think that was unrelated.)
I worked with Daniel Kahn Gillmor to fix a regression in ssh-agent
socket
handling.
I fixed a reproducibility bug depending on whether passwd
is installed on
the build system, which would have
affected security updates during the lifetime of trixie.
I backported openssh 1:10.0p1-5 to bookworm-backports.
I issued bookworm and bullseye updates for CVE-2025-32728.
I backported a fix for incorrect output when formatting multiple documents as PDF/PostScript at once.
I added a simple autopkgtest.
I upgraded these packages to new upstream versions:
In bookworm-backports, I updated these packages:
I fixed problems building these packages reproducibly:
I backported fixes for some security vulnerabilities to unstable (since we’re in freeze now so it’s not always appropriate to upgrade to new upstream versions):
I fixed various other build/test failures:
I added non-superficial autopkgtests to these packages:
I packaged python-django-hashids and python-django-pgbulk, needed for new upstream versions of python-django-pgtrigger.
I ported storm to Python 3.14.
I fixed a build failure in apertium-oci-fra.
08 June, 2025 12:20AM by Colin Watson
Back in 2020 I posted about my desk setup at home.
Recently someone in our #remotees
channel at work asked about WFH setups and given quite a few things changed in mine, I thought it's time to post an update.
But first, a picture!
(Yes, it's cleaner than usual, how could you tell?!)
It's still the same Flexispot E5B, no change here. After 7 years (I bought mine in 2018) it still works fine. If I had to buy a new one, I'd probably get a four-legged one for more stability (they have become quite affordable now), but there is no immediate need for that.
It's still the IKEA Volmar. Again, no complaints here.
Now here we finally have some updates!
A Lenovo ThinkPad X1 Carbon Gen 12, Intel Core Ultra 7 165U, 32GB RAM, running Fedora (42 at the moment).
It's connected to a Lenovo ThinkPad Thunderbolt 4 Dock. It just works™.
It's still the P410, but mostly unused these days.
An AOC U2790PQU 27" 4K. I'm running it at 150% scaling, which works quite decently these days (no comparison to when I got it).
As the new monitor didn't want to take the old Dell soundbar, I have upgraded to a pair of Alesis M1Active 330 USB.
They sound good and were not too expensive.
I had to fix the volume control after some time though.
It's still the Logitech C920 Pro.
The built-in mic of the C920 is really fine, but to do conference-grade talks (and some podcasts 😅), I decided to get something better.
I got a FIFINE K669B, with a nice arm.
It's not a Shure, for sure, but does the job well and Christian was quite satisfied with the results when we recorded the Debian and Foreman specials of Focus on Linux.
It's still the ThinkPad Compact USB Keyboard with TrackPoint.
I had to print a few fixes and replacement parts for it, but otherwise it's doing great.
Seems Lenovo stopped making those, so I really shouldn't break it any further.
Logitech MX Master 3S. The surface of the old MX Master 2 got very sticky at some point and it had to be replaced.
I'm still terrible at remembering things, so I still write them down in an A5 notepad.
I've also added a (small) whiteboard on the wall right of the desk, mostly used for long term todo lists.
Turns out Xeon-based coasters are super stable, so it lives on!
Yepp, still a thing. Still USB-A because... reasons.
Still the Bose QC25, by now on the third set of ear cushions, but otherwise working great and the odd 15€ cushion replacement does not justify buying anything newer (which would have the same problem after some time, I guess).
I did add a cheap (~10€) Bluetooth-to-Headphonejack dongle, so I can use them with my phone too (shakes fist at modern phones).
And I do use the headphones more in meetings, as the Alesis speakers fill the room more with sound and thus sometimes produce a bit of an echo.
The Bose need AAA batteries, and so do some other gadgets in the house, so there is a technoline BC 700 charger for AA and AAA on my desk these days.
Yepp, I've added an IKEA Tertial and an ALDI "face" light. No, I don't use them much.
I've "built" a KVM switch out of a USB switch, but given I don't use the workstation that often these days, the switch is also mostly unused.
07 June, 2025 03:17PM by evgeni
Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we’ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website.
In this report:
The Open Technology Fund’s (OTF) security partner Security Research Labs recently conducted an audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a “whitebox” audit, is a form of testing in which auditors have complete knowledge of the item being tested. The auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service.
The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to test reproducibility.
OTF’s announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.
Danielle Navarro published an interesting and amusing article on their blog on When good pseudorandom numbers go bad. Danielle sets the stage as follows:
[Colleagues] approached me to talk about a reproducibility issue they’d been having with some R code. They’d been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using
set.seed()
to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren’t “just a little bit different” in the way that we’ve all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducible different. Somewhere, somehow, something broke.
Thanks to David Wheeler for posting about this article on our mailing list.
There were two scholarly articles published this month that related to reproducibility:
Daniel Hugenroth and Alastair R. Beresford of the University of Cambridge in the United Kingdom and Mario Lins and René Mayrhofer of Johannes Kepler University in Linz, Austria published an article titled Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments. In their paper, they:
present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare “attestable builds” with reproducible builds by noting that an attestable build requires “only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it”, and proceed by determining that “the overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time.”
Timo Pohl, Pavel Novák, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the three authors find that the literature is “sparse”, focusing on few individual problems and ecosystems, and therefore identify space for more critical research.
In Debian this month:
Ian Jackson filed a bug against the debian-policy
package in order to delve into an issue affecting Debian’s support for cross-architecture compilation, multiple-architecture systems, reproducible builds’ SOURCE_DATE_EPOCH
environment variable and the ability to recompile already-uploaded packages to Debian with a new/updated toolchain (binNMUs). Ian identifies a specific case, specifically in the libopts25-dev
package, involving a manual page that had interesting downstream effects, potentially affecting backup systems. The bug generated a large number of replies, some of which have references to similar or overlapping issues, such as this one from 2016/2017.
Chris Hofstaedtler filed a bug against the metasnap.debian.net service to note that some packages are not available in metasnap API.
22 reviews of Debian packages were added, 24 were updated and 11 were removed this month, all adding to our knowledge about identified issues.
Hans-Christoph Steiner of the F-Droid catalogue of open source applications for the Android platform published a blog post on Making reproducible builds visible. Noting that “Reproducible builds are essential in order to have trustworthy software”, Hans also mentions that “F-Droid has been delivering reproducible builds since 2015”. However:
There is now a “Reproducibility Status” link for each app on f-droid.org, listed on every app’s page. Our verification server shows ✔️ or 💔 based on its build results, where ✔️ means our rebuilder reproduced the same APK file and 💔 means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a ✅ for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that “provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.”
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
Lastly in Fedora news, Jelle van der Waa opened issues tracking reproducible issues in Haskell documentation, Qt6 recording the host kernel and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
- Don’t rely on the --walk argument being available, and only add that argument on newer versions after we test for that. […]
- Merge an lzma comparator from Will Hollywood. […][…]

Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues […]. This was then uploaded to Debian as version 0.6.0-1.
Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 […][…] and 297 […][…], and disorderfs to version 0.6.0 […][…].
Once again, there were a number of improvements made to our website this month, including:

Chris Lamb:
- the SOURCE_DATE_EPOCH example page […]
- a SOURCE_DATE_EPOCH snippet from Sebastian Davis, which did not handle non-integer values correctly. […]

David A. Wheeler:
- the README.md file. […]

Hans-Christoph Steiner:
- the LICENSE file. […]

Jochen Sprickerhof:

Sebastian Davids:
- the SOURCE_DATE_EPOCH page. […]
- the SOURCE_DATE_EPOCH page. […]

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.
However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL’s public post, “recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year”. As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.
Separate to this, there were various changes to the Jenkins setup this month, which is used as the backend driver for both tests.reproducible-builds.org and reproduce.debian.net, including:
- The jenkins.debian.net server was migrated from AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
- The i386 architecture has been dropped from tests.reproducible-builds.org. This is because, with the upcoming release of Debian trixie, i386 is no longer supported as a ‘regular’ architecture — there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
- ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS), has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
- More riscv64 architecture boards were brought online, so now we have seven such nodes, all with 16GB memory and 4 cores, that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.

Outside of this, a number of smaller changes were also made by Holger Levsen:
reproduce.debian.net-related:
- ppc64el architecture due to RAM size. […]
- nginx_request and nginx_status with the Munin monitoring system. […][…]
- rebuilderd-cache-cleanup.service and run it daily via timer. […][…][…][…][…]
- $HOSTNAME variable in the rebuilderd logfiles. […]
- equivs package on all worker nodes. […][…]

Jenkins nodes:
- sudo tool to fix up permission issues. […][…]
- riscv64, FreeBSD, etc. […][…][…][…]
- ntpsec-ntpdate (instead of ntpdate) as the former is available on Debian trixie and bookworm. […][…]
- ControlPath for all nodes. […]
- munin user uses the same SSH config as the jenkins user. […]

tests.reproducible-builds.org-related:

Misc:
- multiarch_versionskew script. […]

In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
- debrebuild line number. […]
- rebuilder-debian.sh script. […]…]
- rebuildctl to sync only ‘arch-specific’ packages. […][…]

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
Bernhard M. Wiedemann:
- cmake/musescore
- netdiscover
- autotrace, ck, cmake, crash, cvsps, gexif, gq, gtkam, ibus-table-others, krb5-appl, ktoblzcheck-data, leafnode, lib2geom, libexif-gtk, libyui, linkloop, meson, MozillaFirefox, ncurses, notify-sharp, pcsc-acr38, pcsc-asedriveiiie-serial, pcsc-asedriveiiie-usb, pcsc-asekey, pcsc-eco5000, pcsc-reflex60, perl-Crypt-RC, python-boto3, python-gevent, python-pytest-localserver, qt6-tools, seamonkey, seq24, smictrl, sobby, solfege, urfkill, uwsgi, wsmancli, xine-lib, xkeycaps, xquarto, yast-control-center, yast-ruby-bindings and yast
- libmfx-gen, libmfx, liboqs
Chris Hofstaedtler:
- jabber-muc.

Chris Lamb:
- golang-github-lucas-clemente-quic-go.

Jelle van der Waa:

Jochen Sprickerhof:

Zhaofeng Li:
- --mtime and --clamp-mtime to bsdtar.

James Addison:
- python3 — requested enabling a LTO-adjacent option that should improve build reproducibility.
- freezegun: for a timezone issue causing unit tests to fail during testing.
- tutanota: in an attempt to resolve a long-standing reproducibility issue.

Zbigniew Jędrzejewski-Szmek:
- 0xFFFF: Use SOURCE_DATE_EPOCH for date in manual pages.
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
IRC: #reproducible-builds on irc.oftc.net.
Mastodon: @reproducible_builds@fosstodon.org
Mailing list: rb-general@lists.reproducible-builds.org
Welcome to post 49 in the R4 series.
The Two Cultures is a term first used by C.P. Snow in a 1959 speech and monograph focused on the split between humanities and the sciences. Decades later, the term was (quite famously) re-used by Leo Breiman in a (somewhat prophetic) 2001 article about the split between ‘data models’ and ‘algorithmic models’. In this note, we argue that statistical computing practice and deployment can also be described via this Two Cultures moniker.
Referring to the term linking these foundational pieces is of course headline bait. Yet when preparing for the discussion of r2u in the invited talk in Mons (video, slides), it occurred to me that there is in fact a wide gulf between two alternative approaches of using R and, specifically, deploying packages.
On the one hand we have the approach described by my friend Jeff as “you go to the Apple store, buy the nicest machine you can afford, install what you need and then never ever touch it”. A computer / workstation / laptop is seen as an immutable object where every attempt at change may lead to breakage, instability, and general chaos—and is hence best avoided. If you know Jeff, you know he exaggerates. Maybe only slightly though.
Similarly, an entire sub-culture of users striving for “reproducibility” (and sometimes also “replicability”) does the same. This is for example evidenced by the popularity of the package renv by Rcpp collaborator and pal Kevin. The expressed hope is that by nailing down a (sub)set of packages, outcomes are constrained to be unchanged. Hope springs eternal, clearly. (Personally, if need be, I do the same with Docker containers and their respective Dockerfile.)
On the other hand, ‘rolling’ is a fundamentally different approach. One (well known) example is Google building “everything at @HEAD”. The entire (ginormous) code base is considered as a mono-repo which at any point in time is expected to be buildable as is. All changes made are pre-tested to be free of side effects to other parts. This sounds hard, and likely is more involved than an alternative of a ‘whatever works’ approach of independent changes and just hoping for the best.
Another example is a rolling (Linux) distribution as for example Debian. Changes are first committed to a ‘staging’ place (Debian calls this the ‘unstable’ distribution) and, if no side effects are seen, propagated after a fixed number of days to the rolling distribution (called ‘testing’). With this mechanism, ‘testing’ should always be installable too. And based on the rolling distribution, at certain times (for Debian roughly every two years) a release is made from ‘testing’ into ‘stable’ (following more elaborate testing). The released ‘stable’ version is then immutable (apart from fixes for seriously grave bugs and of course security updates). So this provides the connection between frequent and rolling updates, and produces immutable fixed set: a release.
This Debian approach has been influential for many other projects—including CRAN, as can be seen in aspects of its system providing a rolling set of curated packages. Instead of a staging area for all packages, extensive tests are made for candidate packages before adding an update. This aims to ensure quality and consistency—and has worked remarkably well. We argue that it has clearly contributed to the success and renown of CRAN.
Now, when accessing CRAN from R, we fundamentally have two accessor functions. But seemingly only one is widely known and used. In what we may call ‘the Jeff model’, everybody is happy to deploy install.packages() for initial installations.
That sentiment is clearly expressed by this bsky post:
One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package I go check for a new version because invariably it’s been updated with 18 new major features 😆
And that is why we have two cultures.
Because some of us, yours truly included, also use update.packages() at recurring (frequent!!) intervals: daily or near-daily for me. The goodness and, dare I say, gift of packages is not limited to those by my pal Vincent. CRAN updates all the time, and updates are (generally) full of (usually excellent) changes, fixes, or new features. So update frequently! Doing (many but small) updates (frequently) is less invasive than (large, infrequent) ‘waterfall’-style changes!
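One minimal way to make that a habit (assuming Rscript is on the PATH and the package library is writable) is a one-liner that can also go into a daily cron job:

Rscript -e 'update.packages(ask = FALSE, checkBuilt = TRUE)'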
But the fear of change, or disruption, is clearly pervasive. One can only speculate why. Is the experience of updating so painful on other operating systems? Is it maybe a lack of exposure / tutorials on best practices?
These ‘Two Cultures’ coexist. When I delivered the talk in Mons, I briefly asked for a show of hands among all the R users in the audience to see who in fact does use update.packages() regularly. And maybe a handful of hands went up: surprisingly few!
Now back to the context of installing packages: clearly ‘only installing’ has its uses. For continuous integration checks we generally install into ephemeral temporary setups. Some debugging work may be with one-off container or virtual machine setups. But all other uses may well be under ‘maintained’ setups. So consider calling update.packages() once in a while. Or even weekly or daily. The rolling feature of CRAN is a real benefit, and it is there for the taking and enrichment of your statistical computing experience.
So to sum up, the real power is to use:
- install.packages() to obtain fabulous new statistical computing resources, ideally in an instant; and
- update.packages() to keep these fabulous resources current and free of (known) bugs.

For both tasks, relying on binary installations accelerates and eases the process. And where available, using binary installation with system-dependency support as r2u does makes it easier still, following the r2u slogan of ‘Fast. Easy. Reliable. Pick All Three.’ Give it a try!
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.
built on Rust with (Bitcoin style) encryption, whole new architecture. Maybe this time they've got it right?
This post is an unpublished review for The subjective value of privacy • Assessing individuals' calculus of costs and benefits in the context of state surveillance
Internet users, software developers, academics, entrepreneurs – basically everybody is now aware of the importance of considering privacy as a core part of our online experience. User demand, and various national or regional laws, have made privacy a continuously present subject. And privacy is such an all-encompassing, complex topic that the angles from which it can be studied seem never to end; I recommend computer networking-oriented newcomers to the topic to refer to Brian Kernighan’s excellent work [1]. However, how do regular people – like ourselves, in our many capacities – feel about privacy? Lukas Antoine presents a series of experiments aiming at better understanding how people throughout the world understand privacy, and when privacy is held as more or less important than security in different aspects.
Particularly, privacy is often portrayed as a value set in tension against surveillance, and particularly state surveillance, in the name of security: conventional wisdom presents the idea of a privacy calculus. That is, it is often assumed that individuals continuously evaluate the costs and benefits of divulging their personal data, sharing data when they expect a positive net outcome, and denying it otherwise. This framework has been accepted for decades, and the author wishes to challenge it. This book is clearly his doctoral thesis on political science, and its contents are as thorough as expected in this kind of product.
The author presents three empirical studies based on cross-survey analysis. The first experiment explores the security justifications for surveillance and how they influence its support. The second one examines whether the stance on surveillance can be made dependent on personal convenience or financial cost. The third study explores whether privacy attitude is context-dependent or can be seen as a stable personality trait. The studies aim to address the shortcomings of the published literature in the field, mainly: (a) the lack of comprehensive research on state surveillance, needed for a better understanding of privacy appreciation; (b) while several studies have tackled the subjective measure of privacy, there is a lack of cross-national studies to explain wide-ranging phenomena; (c) most studies in this regard are based on population-based surveys, which cannot establish causal relationships; and (d) a seemingly blind acceptance of the privacy calculus mentioned above, with no strong evidence that it accurately measures people’s motivations for disclosing or withholding their data. The specific take, including the framing of the tension between privacy and surveillance, has long been studied, as can be seen in Steven Nock’s 1993 book [2], but as Sannon’s 2022 article shows [3], social and technological realities require our understanding to be continuously kept up to date.
The book is full of theoretical references and does a very good job of explaining the path followed by the author. It is, though, a heavy read, and, for people not coming from the social sciences tradition, it leads to the occasional feeling of being lost. The conceptual and theoretical frameworks and the presented studies are thorough and clear. The author is honest in explaining when the data points at some of his hypotheses being disproven, while others are confirmed.
The aim of the book is for people digging deep into this topic. Personally, I have authored several works on different aspects of privacy (such as a book [4] and a magazine issue [5]), but this book did get me thinking about many issues I had not previously considered. Looking for comparable works, I find that Friedewald et al.’s 2017 book [6] follows a similar thought line in its chapter organization. My only complaint would be that, for a publication from such a highly prestigious publisher, little attention has been paid to editorial aspects: sub-subsection depth is often excessive and unclear. Also, when publishing monographs based on doctoral works, it is customary to no longer refer to the work as a “thesis” and to soften some of the formal requirements such a work often has, with the aim of producing a more gentle and readable book; this book seems just like the mass production of an (otherwise very interesting and well made) thesis work.
References:
This post is an unpublished review for Humanities and big data in Ibero-America • Theory, methodology and practical applications
Digital humanities is a young–though established–field. It deals with different expressions in which digital data manipulation techniques can be applied and used to analyze subjects that are identified as belonging to the humanities. Although most often used to analyze different aspects of literature or social network analysis, it can also be applied to other humanistic disciplines or artistic expressions. Digital humanities employs many tools, but those categorized as big data are among the most frequently employed. This book samples different takes on digital humanities, with the particularity that it focuses on Ibero-American uses. It is worth noting that this book is the second in a series of four volumes, published or set to be published between 2022 and 2026. Being the output of a field survey, I perceive this book to be targeted towards fellow Digital Humanists – people interested in applying computational methods to further understand and research topics in the humanities. It is not a technical book in the sense Computer Science people would recognize as such, but several of the presented works do benefit from understanding some technical concepts.
The 12 articles (plus an introduction) that make up this book are organized in three parts:
(1) “Theoretical Framework” presents the ideas and techniques of data science (that make up the tools for handling big data), and explores how data science can contribute to literary analysis, all while noting that many such techniques are usually frowned upon in Latin America as data science “smells neoliberal”;
(2) “Methodological Issues” looks at specific issues through the lens of how they can be applied to big data, with specific attention given to works in Spanish; and
(3) “Practical Applications” analyzes specific Spanish works and communities based on big data techniques.
Several chapters treat a recurring theme: the simultaneous resistance and appropriation of big data by humanists. For example, at least three of the chapters describe the tensions between humanism (“aesthesis”) and cold, number-oriented data analysis (“mathesis”).
The analyzed works of Parts 2 and 3 are interesting and relatively easy to follow.
Some ideological leanings can inescapably be gleaned from several word choices – starting with the book’s and series’ name, which refers to the Spanish-speaking regions as “Ibero-America”, a term often seen as Eurocentric, in contrast with “Latin America”, which is much more widely used throughout the region.
I will end with some notes about the specific versions of the book I reviewed. I read both an EPUB version and a print copy. The EPUB did not include links for easy navigation to footnotes; that is, the typographical superscripts are not hyperlinked to the location of the notes, so it is very impractical to try to follow them. The print version (unlike the EPUB) did not have an index; that is, the six pages before the introduction are missing from the print copy I received. For a book such as this one, not having an index hampers the ease of reading and referencing.
Blocking comment spammers on an Ikiwiki blog
Despite comments on my ikiwiki blog being fully moderated, spammers have been increasingly posting link spam comments on my blog. While I used to use the blogspam plugin, the underlying service was likely retired circa 2017 and its public repositories are all archived.
It turns out that there is a relatively simple way to drastically reduce the amount of spam submitted to the moderation queue: ban the datacentre IP addresses that spammers are using.
Looking up AS numbers
It all starts by looking at the IP address of a submitted comment:
From there, we can look it up using whois. The important bit in the output is this line:
which refers to Autonomous System 207408, owned by a hosting company in Germany called Servinga.
Alternatively, you can use this WHOIS server with much better output:
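For example, the Team Cymru whois service (one common choice for this kind of lookup, whether or not it is the exact server meant here) maps an address straight to its AS number and name; the IP below is a placeholder:

whois -h whois.cymru.com " -v 192.0.2.10"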
Looking up IP blocks
Autonomous Systems are essentially organizations to which IPv4 and IPv6 blocks have been allocated.
These allocations can be looked up easily on the command line either using a third-party service:
or a local database downloaded from IPtoASN.
This is what I ended up with in the case of Servinga:
Preventing comment submission
While I do want to eliminate this source of spam, I don't want to block these datacentre IP addresses outright since legitimate users could be using these servers as VPN endpoints or crawlers.
I therefore added the following to my Apache config to restrict the CGI endpoint (used only for write operations such as commenting):
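A minimal sketch of that restriction (the CGI path below is a placeholder; adapt it to wherever the ikiwiki CGI lives):

<Location "/ikiwiki.cgi">
    <RequireAll>
        Require all granted
        Include /etc/apache2/spammers.include
    </RequireAll>
</Location>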
and then put the following in /etc/apache2/spammers.include:
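The include file then simply carries one negated Require directive per banned network; the ranges below are placeholders rather than the real list:

Require not ip 192.0.2.0/24
Require not ip 198.51.100.0/24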
Finally, I can restart the website and commit my changes:
Future improvements
I will likely automate this process in the future, but at the moment my blog can go for a week without a single spam message (down from dozens every day). It's possible that I've already cut off the worst offenders.
I have published the list I am currently using.
04 June, 2025 08:28PM