veer66

Prerequisite

I use Ubuntu 20.04, and I have already installed Node.js.

Install neovim

sudo apt-get install neovim

Install vim-plug

sh -c 'curl -fLo "${XDG_DATA_HOME:-$HOME/.local/share}"/nvim/site/autoload/plug.vim --create-dirs \
       https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim'

(source: https://github.com/junegunn/vim-plug)

Set up vim-plug

nvim $HOME/.config/nvim/init.vim

In $HOME/.config/nvim/init.vim

call plug#begin(stdpath("data") . '/plugged')

call plug#end()

Install coc.nvim

In $HOME/.config/nvim/init.vim

call plug#begin(stdpath("data") . '/plugged')
Plug 'neoclide/coc.nvim', {'branch': 'release'}
call plug#end()

In nvim,

:PlugInstall

Install rust-analyzer

mkdir -p ~/.local/bin
curl -L https://github.com/rust-analyzer/rust-analyzer/releases/latest/download/rust-analyzer-linux -o ~/.local/bin/rust-analyzer
chmod +x ~/.local/bin/rust-analyzer

(source: https://rust-analyzer.github.io/manual.html#rust-analyzer-language-server-binary)

Install coc-rust-analyzer

In nvim,

:CocInstall coc-rust-analyzer

Set up coc

In $HOME/.config/nvim/coc-settings.json

{"rust-analyzer.server.path": "/home/YOURNAME/.local/bin/rust-analyzer"}

Please replace YOURNAME with your username.

Open a Rust source file

It should work.

I didn't do anything fancy. I created a C-compatible wrapper in Rust and called the wrapper from Common Lisp using CFFI.

My C-compatible wrapper functions

use std::ffi::{c_void, CStr, CString};
use std::fs::File;
use std::os::raw::c_char;
use std::ptr;

// DecoderWrapper and BUF_SIZE are defined elsewhere in this crate.

/// Opens a zstd file and returns an opaque reader handle, or null on failure.
#[no_mangle]
pub extern "C" fn zstd_line_read_new(zstd_file_path: *const c_char) -> *mut c_void {
    let r_zstd_file_path = unsafe { CStr::from_ptr(zstd_file_path) };
    let path = r_zstd_file_path.to_str().unwrap();
    let file = match File::open(path) {
        Ok(file) => file,
        Err(_) => {
            eprintln!("Cannot open file {}", path);
            return ptr::null_mut();
        }
    };
    let wrapper = DecoderWrapper::new(file);
    Box::into_raw(Box::new(wrapper)) as *mut c_void
}

/// Reads the next line; returns null at the end of the input.
#[no_mangle]
pub extern "C" fn zstd_line_read<'a>(reader: *mut c_void) -> *const c_char {
    let wrapper: *mut DecoderWrapper<'a> = reader as *mut DecoderWrapper<'a>;
    let mut line = Vec::with_capacity(BUF_SIZE);
    unsafe {
        match (*wrapper).read_line(&mut line) {
            Ok(0) => ptr::null(),
            // Ownership of the C string passes to the caller.
            Ok(_) => CString::from_vec_unchecked(line).into_raw(),
            Err(e) => panic!("{}", e),
        }
    }
}
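One thing the wrapper leaves open is who frees each line. CString::into_raw hands ownership of the buffer to the caller, and the memory is only released if the pointer comes back to Rust through CString::from_raw. Below is a minimal sketch of a companion function. The name zstd_line_free is hypothetical and is not part of the original code.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical companion to `zstd_line_read`: takes a line pointer that was
/// produced by `CString::into_raw` and releases its memory.
#[no_mangle]
pub extern "C" fn zstd_line_free(line: *mut c_char) {
    // A null pointer means "no line", so there is nothing to free.
    if line.is_null() {
        return;
    }
    // SAFETY (assumption): `line` came from `CString::into_raw` and has not
    // been freed before. Rebuilding the CString and dropping it frees it.
    unsafe {
        drop(CString::from_raw(line));
    }
}
```

To call this from Lisp, the binding would presumably need to keep the raw pointer, for example by declaring the return type of zstd-read-line as :pointer and converting it with cffi:foreign-string-to-lisp, instead of using :string.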

I added this to Cargo.toml

[lib]
crate-type = ["cdylib"]

It makes Cargo build a DLL on Windows or a .so on Linux instead of a Rust-specific library format.

In my Common Lisp code:

(ql:quickload :cffi)
(defpackage :t1
  (:use :cl :cffi))
(in-package :t1)

(define-foreign-library zstd_read_line
  (:win32 (:default "./target/release/zstd_read_line"))
  (t (:default "./target/release/libzstd_read_line")))

(use-foreign-library zstd_read_line)

(defcfun ("zstd_line_read_new" create-zstd-reader) :pointer (zstd_archive_path :string))
(defcfun ("zstd_line_read" zstd-read-line) :string (reader :pointer))

(let ((reader (create-zstd-reader "test1.txt.zst")))
  (loop for line = (zstd-read-line reader)
    while line
    do (print line)))

This requires Quicklisp to install CFFI. The process doesn't need cbindgen because I define the C interface manually in Common Lisp, as you can see in the defcfun forms.

And then it works even on Windows 10.

I tried different ways to serialize my data before saving it into RocksDB. #rustlang

serde_json

Time

real    4m11.713s
user    13m32.809s
sys     1m33.887s

Space

2.1GB

bincode

Time

real    2m13.772s
user    10m45.541s
sys     1m41.670s

Space

1.1GB

serde-lexpr (S-Expression)

Time

real    10m54.622s
user    22m11.570s
sys     1m10.663s

Space

2.2GB

serde_cbor

Time

real    2m52.010s
user    12m17.019s
sys     1m32.687s

Space

1.7GB

In this test, CBOR and bincode were clearly faster than JSON and S-expressions. CBOR was less efficient than bincode in both time and space. However, more programming languages support CBOR.

Common Lisp is a system that does not die easily. If a program runs for only 0.5 seconds, restarting it is no big deal. But if it has to load data for 10 minutes before it can continue, or it has to run for three days straight, I don't want to restart it often. Saving the state to disk frequently would only make the program finish even later.

When an error occurs in Common Lisp, it signals (signal as a verb) a condition that describes the error, with the environment of the error attached. Handlers, which specify in advance what to do for each condition, can do all sorts of things: just print a message, skip part of the work, or go back and retry. If nothing handles the condition, the system calls the debugger and asks the person at the screen what to do next. The choices offered there can also be defined in advance, so you can pick plan B or plan C. Moreover, if the person at the screen is a programmer, they can fix part of the program and tell it to continue running.
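Rust has no condition system, but the core idea, that low-level code reports a problem and a handler supplied by the caller decides how to continue without unwinding first, can be mimicked with a callback. The sketch below is only an imitation of that shape. The names Restart and parse_all are made up for illustration, and nothing here reproduces the interactive debugger.

```rust
/// Choices a handler can make after being told about a bad entry.
/// (Hypothetical names, loosely modeled on Common Lisp restarts.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Restart {
    /// Continue, using this value in place of the bad entry.
    UseValue(i64),
    /// Continue, ignoring the bad entry.
    Skip,
    /// Give up on the whole operation.
    Abort,
}

/// Parses every entry. On a bad entry it asks `handler` what to do
/// while the loop state is still intact, instead of returning early.
fn parse_all<F>(entries: &[&str], mut handler: F) -> Result<Vec<i64>, String>
where
    F: FnMut(&str) -> Restart,
{
    let mut out = Vec::new();
    for &entry in entries {
        match entry.trim().parse::<i64>() {
            Ok(n) => out.push(n),
            Err(_) => match handler(entry) {
                Restart::UseValue(v) => out.push(v),
                Restart::Skip => {}
                Restart::Abort => return Err(format!("aborted at {:?}", entry)),
            },
        }
    }
    Ok(out)
}
```

For the input "1", "oops", "3", a handler that always returns Restart::Skip yields 1 and 3, while one that returns Restart::UseValue(0) yields 1, 0 and 3. The work already done is kept in both cases.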

I specified the source path in $HOME/.config/common-lisp/source-registry.conf.

source-registry.conf:

(:source-registry
  (:tree (:home "Develop/t/mt2021"))
  (:tree (:home "Develop/free"))
  :inherit-configuration)

Common Lisp projects can be put in $HOME/Develop/t/mt2021 or $HOME/Develop/free.

Now I create a new project using Quickproject.

In SBCL, I ran this:

* (ql:quickload 'quickproject)
* (quickproject:make-project #p"~/Develop/free/amphi" :depends-on '(arrow-macros cl-ppcre jonathan))

I just realized yesterday that I don't have to use

(asdf:load-system :amphi)

Instead, I can use

(ql:quickload :amphi)

to load my local projects. The upside of ql:quickload is that it also installs all dependencies specified in the .asd file.

Now the process is easy and is starting to make sense. 😅

With the external-program package, I can easily call an external command-line program. For example:

(external-program:run "ls" '("-l") :output *standard-output*)

What if I want to keep streaming (piping) input and reading output? (1) Instead of run, I use start.

(setf *x* (external-program:start "cat" '() :output :stream :input :stream))

(2) I create a thread for reading the output and print it. Bordeaux-threads API looks quite similar to POSIX.

(setf *read-thread*
      (bordeaux-threads:make-thread
       (lambda ()
         (loop for line = (read-line (external-program:process-output-stream *x*) nil)
               while line
               do (format t "OUT: ~A~%" line)))
       :name "titi"))

(3) I get an output stream for passing data to the external program.

(setf *out* (external-program:process-input-stream *x*))

(4) I write something to *out* and flush it.

(progn
  (loop for i from 1 to 10
    do (format *out* "!!! ~A~%" i))
  (force-output *out*))

At this point, you should see output from the thread created in step (2):

OUT: !!! 1
OUT: !!! 2
...
OUT: !!! 10

You can keep playing with it.

(5) And finally I clean everything up.

(close *out*)
(bordeaux-threads:join-thread *read-thread*)

I tried to use cl-lzma. I didn't know that the vector should be a static vector (from static-vectors) instead of an ordinary vector. Anyway, now I know.

(ql:quickload 'cl-lzma)
(use-package :cl-lzma)
(use-package :static-vectors)
(setq a (multiple-value-list
         (with-static-vector (v 100 :initial-contents (loop for i from 0 below 100 collect 8))
           (lzma-compress v))))
(apply #'lzma-decompress a)

And finally, I got

#(8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
  8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
  8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
  8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
  8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8)

for decompression, which is correct!

Suppose I have a file of document IDs and text unit IDs. For example:

doc-id, text-unit-ids
1,100,200,300
2,50,8,1,6

I want to keep them in a list of doc-id, text-unit-id pairs. For example:

1,100
1,200
1,300
2,50
2,8
2,1
2,6

So I defined my variable in Rust as follows:

let word_id_tu_ids: Vec<(u32, u32)> = vec![];

Sometimes I flip them by mistake. I put a pair of text-unit-id and doc-id instead of doc-id and text-unit-id. For example, I put 100,1 instead of 1,100.

With (u32, u32), the Rust compiler cannot catch this. So I tried:

type WordId = u32;
type DocId = u32;

fn push(w: WordId, d: DocId) {
    let mut v: Vec<(WordId, DocId)> = vec![];
    v.push((d, w)); // swapped by mistake, but it compiles: both are u32
}

It didn't help, because a type alias is just another name for u32. So I tried structs instead.

struct WordId(u32);
struct DocId(u32);

fn push(w: WordId, d: DocId) {
    let mut v: Vec<(WordId, DocId)> = vec![];
    v.push((d, w)); // swapped: now rejected as a type mismatch
}

Now the compiler can detect the error. However, maybe it is not clear to human eyes. So I defined another struct.

struct WordId(u32);
struct DocId(u32);

struct WordIdDocId {
    word_id: WordId,
    doc_id: DocId,
}

fn push(w: WordId, d: DocId) {
    let mut v: Vec<WordIdDocId> = vec![];
    v.push(WordIdDocId {word_id: w, doc_id: d});
}

Now it is clear to both human eyes and the compiler. Still, is this what people call over-engineering? What if I want to write:

let u = w + 1;

We can do it in Rust, but the code is going to be even longer.
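As a rough idea of how much longer, here is a minimal sketch that makes w + 1 compile for the WordId newtype by implementing std::ops::Add. The derives are my own additions so that the value can be copied and compared; they are not in the original snippet.

```rust
use std::ops::Add;

// Newtype around u32; the derives are added here for convenience.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WordId(u32);

// Allow `WordId + u32`, keeping the result a WordId.
impl Add<u32> for WordId {
    type Output = WordId;

    fn add(self, rhs: u32) -> WordId {
        WordId(self.0 + rhs)
    }
}
```

With this in place, let u = WordId(1) + 1; gives WordId(2), while WordId + DocId still does not compile. A shorter but less safe alternative is to reach into the field directly with w.0 + 1, which yields a plain u32.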


Prerequisite

Emacs

(setq inferior-lisp-program "sbcl")

I need more memory, so I use this instead:

(setq inferior-lisp-program "sbcl --dynamic-space-size 13000")

ASDF

  • Edit ~/.config/common-lisp/source-registry.conf
(:source-registry
  (:tree (:home "Develop/thesis"))
  :inherit-configuration)

Change Develop/thesis to the path of your own directory, relative to your home directory.

The project

My project name is mt-seq and I put it in ~/Develop/thesis.

Update 2020-12-06: Or the project can be created using quickproject. Thanks Michał “phoe” Herda.

  • ~/Develop/thesis/mt-seq.asd
(defsystem mt-seq
  :description "mt-seq"
  :author "Vee Satayamas"
  :license "LLGPL"
  :depends-on ("asdf" "lparallel" "bt-semaphore")
  :components ((:module "src"
		:serial t
		:components ((:file "packages")
			     (:file "mt")))))
  • ~/Develop/thesis/src/mt.lisp
(in-package :mt-seq)

(defun toto (x)
  x)
  • ~/Develop/thesis/src/packages.lisp
(defpackage :mt-seq
  (:use :cl :lparallel :bt-semaphore)
  (:export #:toto))

I expect that this should be sufficient for starting a new project in Common Lisp in a proper format.

When I write #rustlang with a tuple like

(word_id, textunit_id)

I can remember the order for a day. But after writing code for a long time, I might mistakenly write

(textunit_id, word_id)

instead.

Static typing may miss this, because textunit-id and word-id are both u32. One fix is to create new types, WordId and TextunitId, but that is tiring.

Or I can use a struct instead, for example:

struct WordIdTextunitId {
    word_id: u32,
    textunit_id: u32,
}

This is easy to write, but the compiler does not help much. The programmer has to check by eye. If I write

WordIdTextunitId { word_id: textunit_id, textunit_id: word_id }

it is wrong, and a programmer can spot it, but the compiler still accepts the swapped fields.

Thinking it over, this is about the same as writing Clojure:

{:word-id word-id
 :textunit-id textunit-id}

This is already clear, and there is no need to declare a struct. Or I could just use defstruct.

Common Lisp is similar. I can use defstruct, or I can make an alist:

(list (cons :word-id word-id) 
       (cons :textunit-id textunit-id))

This works too.

Back to Rust. Even if I go all out like this, it still blows up:

type WordId = u32;
type TextunitId = u32;

struct WordIdTextUnitId {
    word_id: WordId,
    textunit_id: TextunitId,
}

fn main() {
   let w: WordId = 1;
   let t: TextunitId = 2;
   let x = WordIdTextUnitId { word_id: t, textunit_id: w};
}

The values can be swapped, and the compiler does not help.

But if I make everything a struct:

struct WordId(u32);
struct TextunitId(u32);

struct WordIdTextUnitId {
    word_id: WordId,
    textunit_id: TextunitId,
}

fn main() {
    let w = WordId(1);
    let t = TextunitId(2);
    let x = WordIdTextUnitId { word_id: t, textunit_id: w};
}

Now the compiler can help, because it knows the values are swapped.

But then I run into wanting to write

let v = w + 1; 

How do I do this? There are several ways, but all of them take some effort.
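One way to cut down that effort, sketched here under my own assumptions, is a small macro_rules! macro that declares each ID newtype together with its + u32 implementation. The macro name id_newtype and the set of derives are hypothetical choices for illustration.

```rust
// Declares a u32 newtype and lets a plain u32 be added to it.
// `id_newtype` is a hypothetical helper, not an existing library macro.
macro_rules! id_newtype {
    ($name:ident) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
        struct $name(u32);

        impl std::ops::Add<u32> for $name {
            type Output = $name;

            fn add(self, rhs: u32) -> $name {
                $name(self.0 + rhs)
            }
        }
    };
}

// Each invocation creates a distinct type, so the two cannot be swapped.
id_newtype!(WordId);
id_newtype!(TextunitId);
```

After this, let v = WordId(1) + 1; compiles and gives WordId(2), while passing a TextunitId where a WordId is expected is still a type error. The boilerplate is paid once in the macro rather than once per ID type.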